8 Contingency Tables
8.1 ⭐️Overview
In Epi III (and epidemiology in general) we use a lot of contingency tables - especially 2x2 contingency tables. In this note, we play around with several different ways of creating, and working with, contingency tables. This can include converting them to a data frame format for some analyses. We explore some of the pros and cons of each method.
Need to add: I think ftable() could be useful. I need to work it in somehow. https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ftable.html
8.4 Terminology
In the text below, we use the following terminology to distinguish three different structural representations of our data:
Data frame of observations
: A data frame where each row represents one observation (typically an individual person).
# Basic example
df <- tibble(
medication = c(rep("No", 10), rep("Yes", 10)),
fall = c(rep("No", 8), rep("Yes", 2), rep("No", 6), rep("Yes", 4))
) %>%
print()
## # A tibble: 20 × 2
## medication fall
## <chr> <chr>
## 1 No No
## 2 No No
## 3 No No
## 4 No No
## 5 No No
## 6 No No
## 7 No No
## 8 No No
## 9 No Yes
## 10 No Yes
## 11 Yes No
## 12 Yes No
## 13 Yes No
## 14 Yes No
## 15 Yes No
## 16 Yes No
## 17 Yes Yes
## 18 Yes Yes
## 19 Yes Yes
## 20 Yes Yes
Contingency table
: A grid-like table with the categories of one variable (typically the exposure of interest) making up the rows and the categories of a second variable (typically the outcome of interest) making up the columns of the table. The table contains the number of observations with the particular combination of row and column values that intersect at each cell.
## fall
## medication No Yes
## No 8 2
## Yes 6 4
Frequency table
: A data frame of counts (and optionally, other relevant statistics), where each row represents a particular combination of values from two or more categorical variables.
## # A tibble: 4 × 3
## medication fall n
## <chr> <chr> <int>
## 1 No No 8
## 2 No Yes 2
## 3 Yes No 6
## 4 Yes Yes 4
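Both summaries can be derived from the data frame of observations in a line or two each. The following is a minimal base R sketch, not necessarily the code that produced the output above (the `n` column in the frequency table suggests that dplyr's `count()` was used, but that is an assumption). The object names `ct` and `freq` are ours.

```r
# Data frame of observations (same values as the basic example above)
df <- data.frame(
  medication = c(rep("No", 10), rep("Yes", 10)),
  fall       = c(rep("No", 8), rep("Yes", 2), rep("No", 6), rep("Yes", 4))
)

# Contingency table: cross-tabulate the two columns
ct <- table(df)
ct

# Frequency table: one row per combination of values, with its count
freq <- as.data.frame(ct, responseName = "n")
freq
```

Note that `as.data.frame()` lists the rows with the first variable varying fastest, so the row order differs from the tibble shown above even though the counts are the same.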
As the examples above illustrate, it’s pretty easy and straightforward to go from a data frame of observations to either a contingency table or a frequency table. What is currently less straightforward (or at least less well documented) is to:
Manually create a contingency table.
Convert a contingency table into a frequency table.
Convert a contingency table into a data frame of observations.
Convert a frequency table into a contingency table.
Convert a frequency table into a data frame of observations.
We go through each of these operations below.
8.5 Scenario
This scenario is borrowed from the Boston University website.
Data Summary
Consider the following example regarding the management of Hodgkin lymphoma, a cancer of the lymphatic system.
Years ago when a patient was diagnosed with Hodgkin Disease, they would frequently undergo a surgical procedure called a “staging laparotomy.” The purpose of the staging laparotomy was to determine the extent to which the cancer had spread, because this was important information for determining the patient’s prognosis and optimizing treatment. At times, the surgeons performing this procedure would also remove the patient’s appendix, not because it was inflamed; it was done “incidentally” in order to ensure that the patient never had to worry about getting appendicitis. However, performing an appendectomy requires transecting it, and this has the potential to contaminate the abdomen and the wound edges with bacteria normally contained inside the appendix. Some surgeons felt that doing this “incidental appendectomy” did the patient a favor by ensuring that they would never get appendicitis, but others felt that it meant unnecessarily increasing the patient’s risk of getting a post-operative wound infection by spreading around the bacteria that was once inside the appendix.
To address this, the surgeons at a large hospital performed a retrospective cohort study. They began by going through the hospital’s medical records to identify all subjects who had had a “staging laparotomy performed for Hodgkin.” They then reviewed the medical record and looked at the operative report to determine whether the patient had an incidental appendectomy or not. They then reviewed the progress notes, the laboratory reports, the nurses notes, and the discharge summary to determine whether the patient had developed a wound infection during the week after surgery.
The investigators reviewed the records of 210 patients who had undergone the staging procedure and found that 131 had also had an incidental appendectomy, while the other 79 had not. The data from that study are summarized in the table below. The numbers in the second and third columns indicate the number of subjects who did or did not develop a post-operative wound infection among those who had the incidental appendectomy (in the “Yes” row) and those who did not have the incidental appendectomy (in the “No” row). For example, the upper left cell indicates that seven of the subjects who had an incidental appendectomy (the exposure of interest) subsequently developed a wound infection. The upper right cell indicates that the other 124 subjects who had an incidental appendectomy did NOT develop a wound infection.
| Had incidental appendectomy? | Wound infection | No wound infection | Total |
|------------------------------|-----------------|--------------------|-------|
| Yes | 7 | 124 | 131 |
| No | 1 | 78 | 79 |
| Total | 8 | 202 | 210 |
8.6 Manually creating each data structure
In this section, we manually create the incidental appendectomy data in all three structural representations (i.e., data frame of observations, contingency table, and frequency table).
8.6.1 Data frame of observations
First, we can manually create a tibble with one row for each person represented in the data above. Ordinarily, this is how the data would come to us. Then, we can use various techniques, some of which are demonstrated below, to summarize the data as a 2x2 contingency table. In this case, we are working backwards from the data summary to the raw data just to show one way that it can be done. This isn’t necessarily a good way to do it, however. Later, we will demonstrate more efficient and less error-prone ways to create raw data from summary tables.
df <- tibble(
appendectomy = factor(c(rep("Yes", 7), rep("Yes", 124), "No", rep("No", 78))),
infection = factor(c(rep("Yes", 7), rep("No", 124), "Yes", rep("No", 78)))
) %>%
print()
## # A tibble: 210 × 2
## appendectomy infection
## <fct> <fct>
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 Yes No
## 9 Yes No
## 10 Yes No
## # ℹ 200 more rows
8.6.2 Frequency table
Next, we will manually create a frequency table. In our experience, it isn’t common for analysts or investigators to manually create a frequency table representation of data. However, creating frequency tables this way is pretty straightforward and easy to do.
We can do so using the tibble() (or data.frame()) function.
freq_tbl <- tibble(
appendectomy = c("Yes", "Yes", "No", "No"),
infection = c("Yes", "No", "Yes", "No"),
count = c(7, 124, 1, 78)
) %>%
print()
## # A tibble: 4 × 3
## appendectomy infection count
## <chr> <chr> <dbl>
## 1 Yes Yes 7
## 2 Yes No 124
## 3 No Yes 1
## 4 No No 78
Or by using the tribble() function.
freq_tbl <- tribble(
~appendectomy, ~infection, ~count,
"Yes", "Yes", 7,
"Yes", "No", 124,
"No", "Yes", 1,
"No", "No", 78
) %>%
print()
## # A tibble: 4 × 3
## appendectomy infection count
## <chr> <chr> <dbl>
## 1 Yes Yes 7
## 2 Yes No 124
## 3 No Yes 1
## 4 No No 78
Either way, the end result is exactly the same. Which method of data entry you use is largely a matter of preference.
As you can see below, we can also add marginal totals to our frequency table, which can be useful for other analyses we may want to do.
# Add margins
freq_tbl %>%
group_by(appendectomy) %>%
mutate(appendectomy_totals = sum(count)) %>%
group_by(infection) %>%
mutate(infection_totals = sum(count)) %>%
ungroup() %>%
mutate(margin_total = sum(count))
## # A tibble: 4 × 6
## appendectomy infection count appendectomy_totals infection_totals margin_total
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Yes Yes 7 131 8 210
## 2 Yes No 124 131 202 210
## 3 No Yes 1 79 8 210
## 4 No No 78 79 202 210
8.6.3 Contingency tables
The third type of data structure we will manually create is the contingency table – the focus of this chapter. There are several ways to manually create contingency tables in R. We will demonstrate many of them below.
8.6.3.1 Matrix object
First, we can manually create a contingency table as a matrix object.
## [,1] [,2]
## [1,] 7 124
## [2,] 1 78
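The output above can be reproduced with `matrix()`. This sketch reuses the same call that appears in the "putting it all together" code later in this section; we are assuming it is also what produced the output shown here.

```r
# Fill the matrix row by row (byrow = TRUE) so the values read like the 2x2 table.
# The names a, b, c, d only label the cells for the reader; matrix() does not keep them.
matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
)
matrix_ct
```

Without `byrow = TRUE`, `matrix()` fills column by column, which would put 124 in the lower left cell instead of the upper right.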
Then, we can add row and column names to make the matrix more readable.
# Add names to make the matrix more readable
rownames(matrix_ct) <- c("Appendectomy", "No Appendectomy")
colnames(matrix_ct) <- c("Infection", "No Infection")
matrix_ct
## Infection No Infection
## Appendectomy 7 124
## No Appendectomy 1 78
Or alternatively like this.
dimnames(matrix_ct) <- list(
c("Appendectomy", "No Appendectomy"),
c("Infection", "No Infection")
)
matrix_ct
## Infection No Infection
## Appendectomy 7 124
## No Appendectomy 1 78
Or alternatively like this.
## Infection
## Appendectomy Yes No
## Yes 7 124
## No 1 78
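The output above has a label for each dimension (Appendectomy, Infection) in addition to labels for the levels. A minimal sketch of one way to get that result, assuming a named list is passed to `dimnames()`:

```r
matrix_ct <- matrix(c(7, 124, 1, 78), ncol = 2, byrow = TRUE)

# Naming the list elements labels the dimensions themselves,
# not just their levels
dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection    = c("Yes", "No")
)
matrix_ct

# The dimension labels are stored as the names of the dimnames list
names(dimnames(matrix_ct))
```

These dimension labels matter later on: they become the column names when the contingency table is converted to a frequency table.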
Notice that the third method of adding row and column names produces slightly different results.
And we can add marginal totals to the matrix.
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
So, putting it all together, there are at least two processes to create a matrix with marginal totals.
# Method 1. Calculate the marginal totals before using dimnames()
matrix_ct <- matrix(
c(a = 7, b = 124, c = 1, d = 78),
ncol = 2,
byrow = TRUE
)
matrix_ct_margins <- addmargins(matrix_ct)
dimnames(matrix_ct_margins) <- list(
Appendectomy = c("Yes", "No", "colsum"),
Infection = c("Yes", "No", "rowsum")
)
matrix_ct_margins
## Infection
## Appendectomy Yes No rowsum
## Yes 7 124 131
## No 1 78 79
## colsum 8 202 210
# Method 2. Use dimnames() before calculating the marginal totals
matrix_ct <- matrix(
c(a = 7, b = 124, c = 1, d = 78),
ncol = 2,
byrow = TRUE
)
dimnames(matrix_ct) <- list(
Appendectomy = c("Yes", "No"),
Infection = c("Yes", "No")
)
matrix_ct_margins <- addmargins(matrix_ct)
matrix_ct_margins
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
At this point, I think I prefer method 2, only because it leaves matrix_ct more readable.
8.6.3.2 Base R data frame with rownames
Another option for manually creating a contingency table is to start with a base R data frame with row names (tibbles drop row names by default, which is usually a good thing).
df_ct <- data.frame(
Infection = c(7, 1),
`No Infection` = c(124, 78)
)
# Add row names
rownames(df_ct) <- c("Appendectomy", "No Appendectomy")
df_ct
## Infection No.Infection
## Appendectomy 7 124
## No Appendectomy 1 78
And we can add marginal totals to the data frame. However, we cannot do so with the addmargins() function.
## Error in FUN(newX[, i], ...): invalid 'type' (list) of argument
Therefore, it takes a little bit more code to add margins to a contingency table created as a base R data frame.
df_ct_margins <- df_ct
df_ct_margins <- cbind(df_ct_margins, rowsum = rowSums(df_ct_margins))
df_ct_margins <- rbind(df_ct_margins, colsum = colSums(df_ct_margins))
df_ct_margins
## Infection No.Infection rowsum
## Appendectomy 7 124 131
## No Appendectomy 1 78 79
## colsum 8 202 210
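A possible shortcut, which is our own suggestion rather than part of the workflow above, is to convert the data frame to a matrix first. `addmargins()` fails on the data frame itself, but it accepts the result of `as.matrix()`. The trade-off is that the result is a matrix, not a data frame.

```r
df_ct <- data.frame(
  Infection      = c(7, 1),
  `No Infection` = c(124, 78)   # check.names = TRUE turns this into No.Infection
)
rownames(df_ct) <- c("Appendectomy", "No Appendectomy")

# addmargins() expects an array or table, so convert the data frame first
df_ct_margins <- addmargins(as.matrix(df_ct))
df_ct_margins
```

The margins are labeled "Sum" here rather than "rowsum" and "colsum".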
At this point, I prefer manually creating a contingency table by starting with a matrix rather than a base R data frame. This is primarily because it’s easier to add margins to the matrix than it is to the data frame.
8.7 Table objects
The table() function “uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels” (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/table), and this is really R’s built-in method for working with contingency tables. However, I haven’t found a way to create a table object from scratch (i.e., without making a matrix or data frame first).
In this section, we briefly demonstrate how to create a table object from a matrix and from a data frame.
8.7.1 Table object from matrix
First, we will create a table from a matrix object. The code used here to first create the matrix is identical to the code above.
matrix_ct <- matrix(
c(a = 7, b = 124, c = 1, d = 78),
ncol = 2,
byrow = TRUE
)
dimnames(matrix_ct) <- list(
Appendectomy = c("Yes", "No"),
Infection = c("Yes", "No")
)
matrix_ct
## Infection
## Appendectomy Yes No
## Yes 7 124
## No 1 78
## Infection
## Appendectomy Yes No
## Yes 7 124
## No 1 78
The results look the same, but they are different under the hood.
## $matrix_ct
## [1] "matrix" "array"
##
## $table_from_matrix
## [1] "table"
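The conversion and the class comparison behind the output above can be sketched as follows. We are assuming that `as.table()` was used for the conversion, since the object is named `table_from_matrix` in the output.

```r
matrix_ct <- matrix(
  c(7, 124, 1, 78), ncol = 2, byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)

# Convert the matrix to a table object; the counts and dimnames carry over
table_from_matrix <- as.table(matrix_ct)
table_from_matrix

# Compare the classes of the two objects
list(
  matrix_ct = class(matrix_ct),
  table_from_matrix = class(table_from_matrix)
)
```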
And again, we can add margins to the table object with the addmargins() function.
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
Tables in R – A quick practical overview shows an alternative way to create a table object using a matrix as an intermediate step: the rbind() function.
table_from_matrix <- as.table(
rbind(
c(7, 124),
c(1, 78)
)
)
dimnames(table_from_matrix) <- list(
Appendectomy = c("Yes", "No"),
Infection = c("Yes", "No")
)
table_from_matrix
## Infection
## Appendectomy Yes No
## Yes 7 124
## No 1 78
And again, we can add margins to the table object with the addmargins() function.
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
8.7.2 Table object from df
Next, we will create a table from a data frame of observations. This is where the table() function comes in really handy!
## infection
## appendectomy No Yes
## No 78 1
## Yes 124 7
⚠️Warning: Notice that the “No” category comes before the “Yes” category by default when passing a data frame of observations to the table() function. This is typically not the order we would put them in for analysis. To prevent this from happening, change the ordering of the factor levels in the data frame of observations (demonstrated below).
# Make "Yes" the first factor level. Then, create the table.
df_yn <- df %>%
mutate(
appendectomy = factor(appendectomy, levels = c("Yes", "No")),
infection = factor(infection, levels = c("Yes", "No"))
)
## infection
## appendectomy Yes No
## Yes 7 124
## No 1 78
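The two tables in this section can be reproduced with a sketch like the one below. It uses base R only and rebuilds the data frame of observations so that it runs on its own; the object names `table_from_df` and `table_from_df_yn` are ours.

```r
# Data frame of observations; factor() sorts levels alphabetically ("No", "Yes")
df <- data.frame(
  appendectomy = factor(c(rep("Yes", 131), rep("No", 79))),
  infection    = factor(c(rep("Yes", 7), rep("No", 124), "Yes", rep("No", 78)))
)

# Default level order: "No" comes first in the rows and columns
table_from_df <- table(df)
table_from_df

# Make "Yes" the first level of each factor, then tabulate again
df_yn <- df
df_yn$appendectomy <- factor(df_yn$appendectomy, levels = c("Yes", "No"))
df_yn$infection    <- factor(df_yn$infection, levels = c("Yes", "No"))
table_from_df_yn <- table(df_yn)
table_from_df_yn
```

The counts are identical in both tables; only the position of each cell changes. That matters for functions that assume the exposed and diseased group is in the upper left cell.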
And again, we can add margins to the table object with the addmargins() function.
## infection
## appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
8.7.3 Why use table objects?
I used to think the table class allowed us to more easily manipulate and perform calculations on contingency tables than the matrix class does, but now I’m not sure it does. For example, addmargins() and prop.table() both work on the matrix object. Calculations (like incidence proportions below) work on both too.
## Infection
## Appendectomy Yes No
## Yes 0.033333333 0.5904762
## No 0.004761905 0.3714286
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
# Add incidence proportion to a matrix contingency table
ip <- matrix_ct_margins[, "Yes"] / matrix_ct_margins[, "Sum"]
matrix_ct_margins_ip <- cbind(matrix_ct_margins, ip)
matrix_ct_margins_ip
## Yes No Sum ip
## Yes 7 124 131 0.05343511
## No 1 78 79 0.01265823
## Sum 8 202 210 0.03809524
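As a quick check of the claim that matrices and tables behave the same way here, this sketch runs `prop.table()` on both classes. The name `table_ct` is ours. The `margin = 1` call is an alternative route to the incidence proportions calculated above; it is not the code used in this section.

```r
matrix_ct <- matrix(
  c(7, 124, 1, 78), ncol = 2, byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)
table_ct <- as.table(matrix_ct)

# Cell proportions: each cell divided by the grand total (210)
prop.table(matrix_ct)
prop.table(table_ct)

# Row proportions: the "Yes" column is the incidence proportion in each group
row_props <- prop.table(matrix_ct, margin = 1)
row_props
```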
However, using the table() function is still the easiest way to convert a data frame of observations to a contingency table.
Further, when we convert a contingency table into a data frame of observations below, converting to a table object is a necessary intermediate step.
8.8 Convert a contingency table into a frequency table
8.8.1 Matrix object
When starting with a matrix contingency table, the easiest way to convert it to a frequency table is to first convert it to a table object (see Table objects above) using as.table(). Then, we pass that result to as.data.frame() to create a frequency table. This solution comes from Stack Overflow.
## Appendectomy Infection Freq
## 1 Yes Yes 7
## 2 No Yes 1
## 3 Yes No 124
## 4 No No 78
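The two-step conversion described above can be sketched like this. The name `freq_from_matrix` is ours.

```r
matrix_ct <- matrix(
  c(7, 124, 1, 78), ncol = 2, byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)

# Matrix -> table -> data frame of counts (one row per cell).
# The names of the dimnames list become the column names.
freq_from_matrix <- as.data.frame(as.table(matrix_ct))
freq_from_matrix
```

Calling `as.data.frame()` directly on the matrix would not work for this purpose: it returns a 2x2 data frame instead of one row per cell.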
8.8.2 Table object from df
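When starting with a table object made from a df (see Table object from df above), we only need to pass it to the as.data.frame() function. Notice that the result is slightly different than above: the column names are lowercase because they are taken from the column names of the data frame of observations.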
## appendectomy infection Freq
## 1 Yes Yes 7
## 2 No Yes 1
## 3 Yes No 124
## 4 No No 78
8.8.3 Data frame contingency table
When starting from a base R data frame contingency table (see Base R data frame with rownames above; again, we don’t recommend doing this), there are a couple of options. These two come from Stack Overflow.
The first method is a bit convoluted, but uses only base R functions.
## Var1 Var2 Freq
## 1 Appendectomy Infection 7
## 2 No Appendectomy Infection 1
## 3 Appendectomy No.Infection 124
## 4 No Appendectomy No.Infection 78
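One base R route that yields the output above is to go through a matrix and a table. This is a sketch under the assumption that the Stack Overflow solution works along these lines; the name `freq_from_df_ct` is ours.

```r
df_ct <- data.frame(
  Infection      = c(7, 1),
  `No Infection` = c(124, 78)   # check.names = TRUE turns this into No.Infection
)
rownames(df_ct) <- c("Appendectomy", "No Appendectomy")

# Data frame -> matrix -> table -> data frame of counts.
# The dimnames are unnamed, so the columns get the default names Var1 and Var2.
freq_from_df_ct <- as.data.frame(as.table(as.matrix(df_ct)))
freq_from_df_ct
```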
The second method is a Tidyverse solution.
## # A tibble: 4 × 3
## rowname name value
## <chr> <chr> <dbl>
## 1 Appendectomy Infection 7
## 2 Appendectomy No.Infection 124
## 3 No Appendectomy Infection 1
## 4 No Appendectomy No.Infection 78
8.9 Convert a contingency table into a data frame of observations
This is a really common conversion to want to make. We often come across data that is already in a 2x2 table and decide that we want to convert it into a data frame of observations to experiment with it or for some statistical procedures (e.g., regression).
All of the methods I’ve found so far require creating a frequency table as an intermediate step.
8.9.1 Matrix object
# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
# From: https://cran.r-project.org/web/packages/DescTools/vignettes/TablesInR.pdf
countsToCases <- function(x, countcol = "Freq") {
# Get the row indices to pull from x
idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
# Drop count column
x[[countcol]] <- NULL
# Get the rows from x
x[idx, ]
}
## # A tibble: 210 × 2
## Appendectomy Infection
## <fct> <fct>
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 No Yes
## 9 Yes No
## 10 Yes No
## # ℹ 200 more rows
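For reference, a call that produces this kind of result might look like the sketch below. It stays in base R, so it returns a plain data frame rather than the tibble printed above; the exact pipeline used to produce that output is not shown in the text, so treat this as an assumption.

```r
countsToCases <- function(x, countcol = "Freq") {
  # Repeat each row index as many times as its count
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  # Drop the count column, then pull the repeated rows
  x[[countcol]] <- NULL
  x[idx, ]
}

matrix_ct <- matrix(
  c(7, 124, 1, 78), ncol = 2, byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)

# Matrix -> table -> frequency table -> data frame of observations
freq <- as.data.frame(as.table(matrix_ct))
df_obs <- countsToCases(freq)
rownames(df_obs) <- NULL
head(df_obs, 10)
nrow(df_obs)
```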
(Delete) Works, but I don’t like it. I’m trying to figure out why. Maybe because it uses a table object and a frequency table object as an intermediate step? Why do I care about that, though? Also, perhaps I don’t like the final result? Do I want Yes/No values or variable name values in the final result (compare this result to the result below for “Convert a frequency table into a data frame of observations”).
Let’s pull this function apart and really understand how it works.
## Appendectomy Infection Freq
## 1 Yes Yes 7
## 2 No Yes 1
## 3 Yes No 124
## 4 No No 78
At this point, we have a frequency table of class data frame.
# Create a vector of the number of times each combination (each row) from the
# frequency table is to appear in the new data frame of observations
# Count the number of rows in the df
rows <- nrow(test_mat_ct_to_df)
# Create a sequence of integers from 1 to rows
one_to_n_rows <- seq_len(rows)
# Pull vector of frequencies
freqs <- test_mat_ct_to_df[["Freq"]]
# Rep.int is just a faster, simplified version of rep
combo_reps <- rep.int(one_to_n_rows, freqs)
combo_reps
## [1] 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [38] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [75] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [149] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [186] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# Drop the count (Freq) column from the frequency table
test_mat_ct_to_df[["Freq"]] <- NULL
test_mat_ct_to_df
## Appendectomy Infection
## 1 Yes Yes
## 2 No Yes
## 3 Yes No
## 4 No No
# Repeat each row of test_mat_ct_to_df according to combo_reps
df_obs_from_matrix_ct <- test_mat_ct_to_df[combo_reps, ]
# Drop weird numbered rownames
rownames(df_obs_from_matrix_ct) <- NULL
df_obs_from_matrix_ct
## Appendectomy Infection
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 No Yes
## 9 Yes No
## 10 Yes No
## # ... rows 11 through 210 omitted: rows 11 to 132 are Yes/No and rows 133 to 210 are No/No
8.9.2 Table object
Same as the matrix example above, there is just one less step because you’ve already converted the matrix to a table object.
## # A tibble: 210 × 2
## Appendectomy Infection
## <fct> <fct>
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 No Yes
## 9 Yes No
## 10 Yes No
## # ℹ 200 more rows
And for completeness, here is the table object we previously created from a data frame of observations.
## # A tibble: 210 × 2
## appendectomy infection
## <fct> <fct>
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 No Yes
## 9 Yes No
## 10 Yes No
## # ℹ 200 more rows
8.10 Convert a frequency table into a contingency table.
The Cookbook for R also shows how you can convert a frequency table into a contingency table using the base R xtabs() function.
As a reminder, freq_tbl was created above in the section on manually creating a frequency table.
## infection
## appendectomy No Yes
## No 78 1
## Yes 124 7
⚠️Warning: Notice that the “No” category comes before the “Yes” category by default when passing a frequency table to the xtabs() function. This is typically not the order we would put them in for analysis. To prevent this from happening, change the ordering of the factor levels in the frequency table (demonstrated below).
freq_tbl_yn <- freq_tbl %>%
mutate(
appendectomy = factor(appendectomy, levels = c("Yes", "No")),
infection = factor(infection, levels = c("Yes", "No"))
)
## infection
## appendectomy Yes No
## Yes 7 124
## No 1 78
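The `xtabs()` calls behind the two tables in this section can be sketched as follows. The formula puts the count column on the left and the cross-classifying variables on the right. We use a base R data frame here so the example runs on its own; the names `ct_default` and `ct_yn` are ours.

```r
freq_tbl <- data.frame(
  appendectomy = c("Yes", "Yes", "No", "No"),
  infection    = c("Yes", "No", "Yes", "No"),
  count        = c(7, 124, 1, 78)
)

# Default: character columns are sorted alphabetically, so "No" comes first
ct_default <- xtabs(count ~ appendectomy + infection, data = freq_tbl)
ct_default

# Set the factor levels explicitly so that "Yes" comes first
freq_tbl_yn <- freq_tbl
freq_tbl_yn$appendectomy <- factor(freq_tbl_yn$appendectomy, levels = c("Yes", "No"))
freq_tbl_yn$infection    <- factor(freq_tbl_yn$infection, levels = c("Yes", "No"))
ct_yn <- xtabs(count ~ appendectomy + infection, data = freq_tbl_yn)
ct_yn
```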
8.11 Convert a frequency table into a data frame of observations.
We already saw how to do this above as an intermediate step between a contingency table and a data frame of observations.
## # A tibble: 210 × 2
## appendectomy infection
## <chr> <chr>
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 Yes No
## 9 Yes No
## 10 Yes No
## # ℹ 200 more rows
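Because the count column in `freq_tbl` is named `count` rather than `Freq`, the `countcol` argument has to be set. A minimal sketch, using a base R data frame in place of the tibble and the name `df_obs_from_freq`, which is ours:

```r
countsToCases <- function(x, countcol = "Freq") {
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  x[[countcol]] <- NULL
  x[idx, ]
}

freq_tbl <- data.frame(
  appendectomy = c("Yes", "Yes", "No", "No"),
  infection    = c("Yes", "No", "Yes", "No"),
  count        = c(7, 124, 1, 78)
)

# Tell the function which column holds the counts
df_obs_from_freq <- countsToCases(freq_tbl, countcol = "count")
rownames(df_obs_from_freq) <- NULL
head(df_obs_from_freq, 10)
```

The rows come out in the same order as the rows of the frequency table, which is why row 8 here is Yes/No rather than No/Yes as in the matrix-based result.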
8.12 Bottom line
This section briefly distills everything from above into the most common scenario we are actually working on: manually create a contingency table and then convert it to a data frame of observations.
When we open a book or journal article and see a 2x2 table, or when we just decide to create a 2x2 table from scratch to do some experimenting, it looks like it’s best to start with a matrix and convert it to a table object.
matrix_ct <- matrix(
c(a = 7, b = 124, c = 1, d = 78),
ncol = 2,
byrow = TRUE
)
dimnames(matrix_ct) <- list(
Appendectomy = c("Yes", "No"),
Infection = c("Yes", "No")
)
matrix_ct_margins <- addmargins(matrix_ct)
matrix_ct_margins
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
You can go ahead and convert to a table at this point, but you don’t have to. Look back at the section on why to consider using a table object, but for now, I can’t come up with a good reason to do this off the top of my head.
To then convert that contingency table to a data frame of observations, we can simply use the following code.
## # A tibble: 210 × 2
## Appendectomy Infection
## <fct> <fct>
## 1 Yes Yes
## 2 Yes Yes
## 3 Yes Yes
## 4 Yes Yes
## 5 Yes Yes
## 6 Yes Yes
## 7 Yes Yes
## 8 No Yes
## 9 Yes No
## 10 Yes No
## # ℹ 200 more rows
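If you would rather not define a helper function, the whole conversion fits in a few lines of base R. This is a sketch of the same idea as countsToCases(), written inline; it returns a plain data frame rather than the tibble printed above, and the names `freq` and `df_obs` are ours.

```r
matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78), ncol = 2, byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)

# Step 1: contingency table -> frequency table
freq <- as.data.frame(as.table(matrix_ct))

# Step 2: repeat each row according to its count and drop the count column
df_obs <- freq[rep(seq_len(nrow(freq)), times = freq$Freq), c("Appendectomy", "Infection")]
rownames(df_obs) <- NULL

# Round trip: tabulating the observations recovers the original counts
table(df_obs)
```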
8.13 Helper functions
In some of the cases above, the solutions may be slightly unsatisfying, but they seem to work reasonably well. In this section, we create some helper functions to make working with contingency tables even easier, especially in the context of working with the freqtables package. At some point, these functions may work their way into the freqtables package.
8.13.1 Create a contingency table
After playing around with this a little bit, I couldn’t come up with a function that was really any better (i.e., fewer lines of code or easier to understand) than just creating the matrix using the standard base R functions. This is especially true when contingency tables [...]. However, I’m leaving the code here just in case.
# Code for manually creating a matrix contingency table.
matrix_ct <- matrix(
c(a = 7, b = 124, c = 1, d = 78),
ncol = 2,
byrow = TRUE
)
dimnames(matrix_ct) <- list(
Appendectomy = c("Yes", "No"),
Infection = c("Yes", "No")
)
matrix_ct_margins <- addmargins(matrix_ct)
matrix_ct_margins
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
Here is the function code
contingency_table <- function(.values, ncol, dim_names, margins = FALSE) {
# Create the matrix
matrix_ct <- matrix(.values, ncol = ncol, byrow = TRUE)
# Optionally add dimnames
if (!missing(dim_names)) {
dimnames(matrix_ct) <- dim_names
}
# Optionally add margins
if (margins) {
matrix_ct <- addmargins(matrix_ct)
}
# Return contingency table
matrix_ct
}
Testing with a 2x2 table
contingency_table(
.values = c(
7, 124,
1, 78
),
ncol = 2,
dim_names = list(
Appendectomy = c("Yes", "No"),
Infection = c("Yes", "No")
),
margins = TRUE
)
## Infection
## Appendectomy Yes No Sum
## Yes 7 124 131
## No 1 78 79
## Sum 8 202 210
What if we need to make a contingency table for variables with more than two levels? For example:
matrix_3_ct <- matrix(
c(
1, 2, 1,
0, 1, 1,
0, 1, 2
),
ncol = 3,
byrow = TRUE
)
dimnames(matrix_3_ct) <- list(
age = c("Young", "Middle", "Old"),
severity = c("Mild", "Moderate", "Severe")
)
addmargins(matrix_3_ct)
## severity
## age Mild Moderate Severe Sum
## Young 1 2 1 4
## Middle 0 1 1 2
## Old 0 1 2 3
## Sum 1 4 4 9
Testing with a 3x3 table
contingency_table(
.values = c(
1, 2, 1,
0, 1, 1,
0, 1, 2
),
ncol = 3,
dim_names = list(
age = c("Young", "Middle", "Old"),
severity = c("Mild", "Moderate", "Severe")
),
margins = TRUE
)
## severity
## age Mild Moderate Severe Sum
## Young 1 2 1 4
## Middle 0 1 1 2
## Old 0 1 2 3
## Sum 1 4 4 9
This works, but I’m not sure what advantage it has over the manual code. It’s roughly the same number of lines and it’s not necessarily any easier to read or understand.
What would the ideal solution even look like? Maybe something like this?
# Doesn't run
contingency_table(
| Wound infection |
|-------------------------|-----------------|-----|
| Incidental appendectomy | Yes | No |
| Yes | 7 | 124 |
| No | 1 | 78 |
)
Or perhaps a Shiny app that would allow us to enter data into a spreadsheet-like interface?
8.13.3 Counts to cases
Rewrite this function
# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
# From: https://cran.r-project.org/web/packages/DescTools/vignettes/TablesInR.pdf
countsToCases <- function(x, countcol = "Freq") {
# Get the row indices to pull from x
idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
# Drop count column
x[[countcol]] <- NULL
# Get the rows from x
x[idx, ]
}
[Insert function code here]
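As a starting point for that rewrite, here is one possible sketch. The function name `counts_to_cases()`, its argument names, and the input checks are all hypothetical choices of ours; they are not taken from DescTools or freqtables.

```r
# Hypothetical rewrite of countsToCases(); the name, argument names, and
# checks are assumptions for illustration
counts_to_cases <- function(.data, count_col = "Freq") {
  stopifnot(is.data.frame(.data), count_col %in% names(.data))
  counts <- .data[[count_col]]
  # Counts must be non-missing, non-negative whole numbers
  stopifnot(
    is.numeric(counts),
    !anyNA(counts),
    all(counts >= 0),
    all(counts == round(counts))
  )
  # Repeat each row index as many times as its count
  idx <- rep(seq_len(nrow(.data)), times = counts)
  # Keep every column except the count column; drop = FALSE keeps a data frame
  out <- .data[idx, setdiff(names(.data), count_col), drop = FALSE]
  # Reset the row names so they run from 1 to n
  rownames(out) <- NULL
  out
}

# Usage with a frequency table whose count column is named "count"
freq_tbl <- data.frame(
  appendectomy = c("Yes", "Yes", "No", "No"),
  infection    = c("Yes", "No", "Yes", "No"),
  count        = c(7, 124, 1, 78)
)
df_obs <- counts_to_cases(freq_tbl, count_col = "count")
nrow(df_obs)
```

Compared with the original, this version fails early on a missing or malformed count column and resets the row names, so the caller does not have to.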