8 Contingency Tables

8.1 ⭐️Overview

In Epi III (and epidemiology in general), we use a lot of contingency tables, especially 2x2 contingency tables. In this note, we play around with several different ways of creating and working with contingency tables, including converting them to a data frame format for some analyses. We explore some of the pros and cons of each method.

To do: ftable() could be useful here and still needs to be worked in somehow. https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ftable.html
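As a rough sketch of why ftable() might be worth working in: it flattens a table of three or more variables into a single two-dimensional display. The data below are made up purely for illustration; the object names (toy, three_way, flat) and the counts are assumptions and are not part of this chapter's examples.

```r
# Hypothetical data: three categorical variables, 8 observations
toy <- data.frame(
  medication = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  sex        = c("F", "F", "M", "M", "F", "F", "M", "M"),
  fall       = c("No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes")
)

# A three-way table prints as one two-way slice per level of the third variable
three_way <- table(toy)

# ftable() flattens it: row variables down the side, column variable across the top
flat <- ftable(three_way, row.vars = c("medication", "sex"), col.vars = "fall")
flat
```

With only two variables, ftable() adds little over table(), so it is probably most relevant once a stratifying variable enters the picture.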

8.2 🌎Useful websites

8.3 📦Load packages

library(dplyr, warn.conflicts = FALSE)

8.4 Terminology

In the text below, we use the following terminology to distinguish three different structural representations of our data:

Data frame of observations: A data frame where each row represents one observation (typically an individual person).

# Basic example
df <- tibble(
  medication = c(rep("No", 10), rep("Yes", 10)),
  fall = c(rep("No", 8), rep("Yes", 2), rep("No", 6), rep("Yes", 4))
) %>% 
  print()

## # A tibble: 20 × 2
##    medication fall 
##    <chr>      <chr>
##  1 No         No   
##  2 No         No   
##  3 No         No   
##  4 No         No   
##  5 No         No   
##  6 No         No   
##  7 No         No   
##  8 No         No   
##  9 No         Yes  
## 10 No         Yes  
## 11 Yes        No   
## 12 Yes        No   
## 13 Yes        No   
## 14 Yes        No   
## 15 Yes        No   
## 16 Yes        No   
## 17 Yes        Yes  
## 18 Yes        Yes  
## 19 Yes        Yes  
## 20 Yes        Yes

Contingency table: A grid-like table with the categories of one variable (typically the exposure of interest) making up the rows and the categories of a second variable (typically the outcome of interest) making up the columns of the table. The table contains the number of observations with the particular combination of row and column values that intersect at each cell.

# Basic example
table(df)

##           fall
## medication No Yes
##        No   8   2
##        Yes  6   4

Frequency table: A data frame of counts (and optionally, other relevant statistics), where each row represents a particular combination of values from two or more categorical variables.

# Basic example
df %>% 
  count(medication, fall)

## # A tibble: 4 × 3
##   medication fall      n
##   <chr>      <chr> <int>
## 1 No         No        8
## 2 No         Yes       2
## 3 Yes        No        6
## 4 Yes        Yes       4

As the examples above illustrate, it’s pretty easy and straightforward to go from a data frame of observations to either a contingency table or a frequency table. What is currently less straightforward (or at least less well documented) is to:

Manually create a contingency table.
Convert a contingency table into a frequency table.
Convert a contingency table into a data frame of observations.
Convert a frequency table into a contingency table.
Convert a frequency table into a data frame of observations.

We go through each of these operations below.

8.5 Scenario

This scenario is borrowed from the Boston University website.

Data Summary

Consider the following example regarding the management of Hodgkin lymphoma, a cancer of the lymphatic system.

Years ago when a patient was diagnosed with Hodgkin Disease, they would frequently undergo a surgical procedure called a “staging laparotomy.” The purpose of the staging laparotomy was to determine the extent to which the cancer had spread, because this was important information for determining the patient’s prognosis and optimizing treatment. At times, the surgeons performing this procedure would also remove the patient’s appendix, not because it was inflamed; it was done “incidentally” in order to ensure that the patient never had to worry about getting appendicitis. However, performing an appendectomy requires transecting it, and this has the potential to contaminate the abdomen and the wound edges with bacteria normally contained inside the appendix. Some surgeons felt that doing this “incidental appendectomy” did the patient a favor by ensuring that they would never get appendicitis, but others felt that it meant unnecessarily increasing the patient’s risk of getting a post-operative wound infection by spreading around the bacteria that was once inside the appendix.

To address this, the surgeons at a large hospital performed a retrospective cohort study. They began by going through the hospital’s medical records to identify all subjects who had had a “staging laparotomy performed for Hodgkin.” They then reviewed the medical record and looked at the operative report to determine whether the patient had an incidental appendectomy or not. They then reviewed the progress notes, the laboratory reports, the nurses notes, and the discharge summary to determine whether the patient had developed a wound infection during the week after surgery.

The investigators reviewed the records of 210 patients who had undergone the staging procedure and found that 131 had also had an incidental appendectomy, while the other 79 had not. The data from that study are summarized in the table below. The numbers in the second and third columns indicate the number of subjects who did or did not develop a post-operative wound infection among those who had the incidental appendectomy (in the “Yes” row) and those who did not have the incidental appendectomy (in the “No” row). For example, the upper left cell indicates that seven of the subjects who had an incidental appendectomy (the exposure of interest) subsequently developed a wound infection. The upper right cell indicates that the other 124 subjects who had an incidental appendectomy did NOT develop a wound infection.

Had incidental appendectomy?	Wound infection	No wound infection	Total
Yes	7	124	131
No	1	78	79
Total	8	202	210
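To make the table concrete, the sketch below works through the arithmetic it implies: the incidence proportion (risk) of wound infection in each group, and the ratio of the two. The object names are ours and are only for illustration; the counts come straight from the table above.

```r
# Cell counts and row totals from the table above
infected_appendectomy    <- 7
total_appendectomy       <- 131
infected_no_appendectomy <- 1
total_no_appendectomy    <- 79

# Incidence proportion (risk) of wound infection in each exposure group
risk_exposed   <- infected_appendectomy / total_appendectomy        # 7 / 131
risk_unexposed <- infected_no_appendectomy / total_no_appendectomy  # 1 / 79

# Risk ratio comparing incidental appendectomy to no incidental appendectomy
risk_ratio <- risk_exposed / risk_unexposed

round(
  c(risk_exposed = risk_exposed, risk_unexposed = risk_unexposed, risk_ratio = risk_ratio),
  4
)
```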

8.6 Manually creating each data structure

In this section, we manually create the incidental appendectomy data in all three structural representations (i.e., Data frame of observations, Contingency table, Frequency table).

8.6.1 Data frame of observations

First, we can manually create a tibble with one row for each person represented in the data above. Ordinarily, this is how the data would come to us. Then, we can use various techniques, some of which are demonstrated below, to summarize the data as a 2x2 contingency table. In this case, we are working backwards from the data summary to the raw data just to show one way that it can be done. This isn’t necessarily a good way to do it, however. Later, we will demonstrate more efficient and less error-prone ways to create raw data from summary tables.

df <- tibble(
  appendectomy = factor(c(rep("Yes", 7), rep("Yes", 124), "No", rep("No", 78))),
  infection    = factor(c(rep("Yes", 7), rep("No", 124), "Yes", rep("No", 78)))
) %>% 
  print()

## # A tibble: 210 × 2
##    appendectomy infection
##    <fct>        <fct>    
##  1 Yes          Yes      
##  2 Yes          Yes      
##  3 Yes          Yes      
##  4 Yes          Yes      
##  5 Yes          Yes      
##  6 Yes          Yes      
##  7 Yes          Yes      
##  8 Yes          No       
##  9 Yes          No       
## 10 Yes          No       
## # ℹ 200 more rows

8.6.2 Frequency table

Next, we will manually create a frequency table. In our experience, it isn’t common for analysts or investigators to manually create a frequency table representation of data. However, creating frequency tables this way is pretty straightforward and easy to do.

We can do so using the tibble() (or data.frame()) function.

freq_tbl <- tibble(
  appendectomy = c("Yes", "Yes", "No", "No"),
  infection    = c("Yes", "No", "Yes", "No"),
  count        = c(7, 124, 1, 78)
) %>% 
  print()

## # A tibble: 4 × 3
##   appendectomy infection count
##   <chr>        <chr>     <dbl>
## 1 Yes          Yes           7
## 2 Yes          No          124
## 3 No           Yes           1
## 4 No           No           78

Or by using the tribble() function.

freq_tbl <- tribble(
  ~appendectomy, ~infection, ~count,
  "Yes", "Yes", 7,
  "Yes", "No",  124,
  "No",  "Yes", 1,
  "No",  "No",  78
) %>% 
  print()

## # A tibble: 4 × 3
##   appendectomy infection count
##   <chr>        <chr>     <dbl>
## 1 Yes          Yes           7
## 2 Yes          No          124
## 3 No           Yes           1
## 4 No           No           78

Either way, the end result is exactly the same. Which method of data entry you use is largely a matter of preference.

As you can see below, we can also add marginal totals to our frequency table, which can be useful for other analyses we may want to do.

# Add margins
freq_tbl %>% 
  group_by(appendectomy) %>% 
  mutate(appendectomy_totals = sum(count)) %>% 
  group_by(infection) %>% 
  mutate(infection_totals = sum(count)) %>% 
  ungroup() %>% 
  mutate(margin_total = sum(count))

## # A tibble: 4 × 6
##   appendectomy infection count appendectomy_totals infection_totals margin_total
##   <chr>        <chr>     <dbl>               <dbl>            <dbl>        <dbl>
## 1 Yes          Yes           7                 131                8          210
## 2 Yes          No          124                 131              202          210
## 3 No           Yes           1                  79                8          210
## 4 No           No           78                  79              202          210

8.6.3 Contingency tables

The third type of data structure we will manually create is the contingency table – the focus of this chapter. There are several ways to manually create contingency tables in R. We will demonstrate many of them below.

8.6.3.1 Matrix object

First, we can manually create a contingency table as a matrix object.

matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
) %>% 
  print()

##      [,1] [,2]
## [1,]    7  124
## [2,]    1   78

Then, we can add row and column names to make the matrix more readable.

# Add names to make the matrix more readable
rownames(matrix_ct) <- c("Appendectomy", "No Appendectomy")
colnames(matrix_ct) <- c("Infection", "No Infection")

matrix_ct

##                 Infection No Infection
## Appendectomy            7          124
## No Appendectomy         1           78

Or alternatively like this.

dimnames(matrix_ct) <- list(
  c("Appendectomy", "No Appendectomy"),
  c("Infection", "No Infection")
)

matrix_ct

##                 Infection No Infection
## Appendectomy            7          124
## No Appendectomy         1           78

Or alternatively like this.

dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

matrix_ct

##             Infection
## Appendectomy Yes  No
##          Yes   7 124
##          No    1  78

Notice that the third method of adding row and column names produces slightly different results.
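The difference comes from whether the list passed to dimnames() is itself named. The sketch below, which uses a throwaway matrix m so that it does not overwrite matrix_ct, shows one way to check this.

```r
m <- matrix(c(7, 124, 1, 78), ncol = 2, byrow = TRUE)

# Unnamed list: only the row and column labels are set
dimnames(m) <- list(
  c("Appendectomy", "No Appendectomy"),
  c("Infection", "No Infection")
)
names(dimnames(m))  # NULL, so no variable names are printed

# Named list: the list names become the variable names shown when printing
dimnames(m) <- list(
  Appendectomy = c("Yes", "No"),
  Infection    = c("Yes", "No")
)
names(dimnames(m))
```

Named dimnames also carry over when the matrix is later converted to a table or a data frame, which is why the third method is used in the rest of this chapter.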

And we can add marginal totals to the matrix.

addmargins(matrix_ct)

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

So, putting it all together, there are at least two processes to create a matrix with marginal totals.

# Method 1. Calculate the marginal totals before using dimnames()
matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
)

matrix_ct_margins <- addmargins(matrix_ct)

dimnames(matrix_ct_margins) <- list(
  Appendectomy = c("Yes", "No", "colsum"),
  Infection = c("Yes", "No", "rowsum")
)

matrix_ct_margins

##             Infection
## Appendectomy Yes  No rowsum
##       Yes      7 124    131
##       No       1  78     79
##       colsum   8 202    210

# Method 2. Use dimnames() before calculating the marginal totals
matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
)

dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

matrix_ct_margins <- addmargins(matrix_ct)
matrix_ct_margins

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

At this point, I think I prefer method 2, but only because it leaves matrix_ct more readable.

8.6.3.2 Base R data frame with rownames

Another option for manually creating a contingency table is to start with a base R data frame with row names (tibbles drop row names by default, which is usually a good thing).

df_ct <- data.frame(
  Infection = c(7, 1),
  `No Infection` = c(124, 78)
)

# Add row names
rownames(df_ct) <- c("Appendectomy", "No Appendectomy")

df_ct

##                 Infection No.Infection
## Appendectomy            7          124
## No Appendectomy         1           78

And we can add marginal totals to the data frame. However, we cannot do so with the addmargins() function.

addmargins(df_ct)

## Error in FUN(newX[, i], ...): invalid 'type' (list) of argument

Therefore, it takes a little bit more code to add margins to a contingency table created as a base R data frame.

df_ct_margins <- df_ct
df_ct_margins <- cbind(df_ct_margins, rowsum = rowSums(df_ct_margins))
df_ct_margins <- rbind(df_ct_margins, colsum = colSums(df_ct_margins))
df_ct_margins

##                 Infection No.Infection rowsum
## Appendectomy            7          124    131
## No Appendectomy         1           78     79
## colsum                  8          202    210

At this point, I prefer manually creating a contingency table by starting with a matrix rather than a base R data frame, primarily because it’s easier to add margins to the matrix than to the data frame.
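If we do end up with a data frame contingency table, one possible workaround for the addmargins() error is to convert the data frame to a matrix first. This is only a sketch: it rebuilds df_ct so that it is self-contained, and the check.names = FALSE argument is our addition to keep the space in the "No Infection" column name.

```r
df_ct <- data.frame(
  Infection      = c(7, 1),
  `No Infection` = c(124, 78),
  row.names      = c("Appendectomy", "No Appendectomy"),
  check.names    = FALSE  # keep "No Infection" instead of "No.Infection"
)

# addmargins() expects a matrix, array, or table, so convert the data frame first
df_ct_margins <- addmargins(as.matrix(df_ct))
df_ct_margins
```

The result is a matrix rather than a data frame, which is one more reason to start with a matrix in the first place.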

8.7 Table objects

The table() function “uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.” (https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/table), and this is really R’s built-in method for working with contingency tables. However, I haven’t found a way to create a table object from scratch (i.e., without making a matrix or data frame first).

In this section, we briefly demonstrate how to create a table object from a matrix and from a data frame.

8.7.1 Table object from matrix

First, we will create a table from a matrix object. The code used here to first create the matrix is identical to the code above.

matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
)

dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

matrix_ct

##             Infection
## Appendectomy Yes  No
##          Yes   7 124
##          No    1  78

table_from_matrix <- as.table(matrix_ct)
table_from_matrix

##             Infection
## Appendectomy Yes  No
##          Yes   7 124
##          No    1  78

The results look the same, but they are different under the hood.

list(
  matrix_ct = class(matrix_ct),
  table_from_matrix = class(table_from_matrix)
)

## $matrix_ct
## [1] "matrix" "array" 
## 
## $table_from_matrix
## [1] "table"

And again, we can add margins to the table object with the addmargins() function.

addmargins(table_from_matrix)

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

Tables in R – A quick practical overview shows an alternative way to create a table object using a matrix as an intermediate step – the rbind() function.

table_from_matrix <- as.table(
  rbind(
    c(7, 124), 
    c(1, 78)
  )
)

dimnames(table_from_matrix) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

table_from_matrix

##             Infection
## Appendectomy Yes  No
##          Yes   7 124
##          No    1  78

And again, we can add margins to the table object with the addmargins() function.

addmargins(table_from_matrix)

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

8.7.2 Table object from df

Next, we will create a table from a data frame of observations. This is where the table() function comes in really handy!

table_from_df <- table(df)
table_from_df

##             infection
## appendectomy  No Yes
##          No   78   1
##          Yes 124   7

⚠️Warning: Notice that the “No” category comes before the “Yes” category by default when passing a data frame of observations to the table() function, because factor levels are ordered alphabetically by default. This is typically not the order we would put them in for analysis. To prevent this from happening, change the ordering of the factor levels in the data frame of observations (demonstrated below).

# Make "Yes" the first factor level. Then, create the table.
df_yn <- df %>% 
  mutate(
    appendectomy = factor(appendectomy, levels = c("Yes", "No")),
    infection = factor(infection, levels = c("Yes", "No"))
  )

table_from_df <- table(df_yn)
table_from_df

##             infection
## appendectomy Yes  No
##          Yes   7 124
##          No    1  78
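If the table has already been created with the default ordering, another option is to reorder its rows and columns by name instead of going back to the data frame. The sketch below rebuilds the observations under the hypothetical name obs so that it runs on its own.

```r
# Rebuild the 210 observations; table() will sort the categories alphabetically
obs <- data.frame(
  appendectomy = c(rep("Yes", 131), rep("No", 79)),
  infection    = c(rep("Yes", 7), rep("No", 124), "Yes", rep("No", 78))
)

tab <- table(obs)  # "No" comes first in both dimensions

# Reorder the rows and columns by category name
tab_yn <- tab[c("Yes", "No"), c("Yes", "No")]
tab_yn
```

Setting the factor levels up front, as shown above, is usually the safer habit because the ordering then carries through to every later summary, not just this one table.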

And again, we can add margins to the table object with the addmargins() function.

# Add margins
table_from_df_margins <- addmargins(table_from_df)
table_from_df_margins

##             infection
## appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

8.7.3 Why use table objects?

I used to think the table class allowed us to more easily manipulate and perform calculations on contingency tables than the matrix class does, but now I’m not sure it does. For example, addmargins() and prop.table() both work on the matrix object. Calculations (like incidence proportions below) work on both too.

prop.table(matrix_ct)

##             Infection
## Appendectomy         Yes        No
##          Yes 0.033333333 0.5904762
##          No  0.004761905 0.3714286

addmargins(matrix_ct)

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

# Add incidence proportion to a matrix contingency table
ip <- matrix_ct_margins[, "Yes"] / matrix_ct_margins[, "Sum"]
matrix_ct_margins_ip <- cbind(matrix_ct_margins, ip)
matrix_ct_margins_ip

##     Yes  No Sum         ip
## Yes   7 124 131 0.05343511
## No    1  78  79 0.01265823
## Sum   8 202 210 0.03809524

However, using the table() function is still the easiest way to convert a data frame of observations to a contingency table.

Further, when we convert a contingency table into a data frame of observations below, converting to a table object is a necessary intermediate step.
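The incidence proportions calculated above can also be obtained directly with the margin argument of prop.table(). The sketch below rebuilds the matrix so that it is self-contained; row_props is a name we made up for the example.

```r
matrix_ct <- matrix(
  c(7, 124, 1, 78),
  ncol = 2,
  byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)

# margin = 1 divides each cell by its row total, so every row sums to 1
row_props <- prop.table(matrix_ct, margin = 1)
row_props

# The "Yes" column holds the incidence proportion in each appendectomy group
row_props[, "Yes"]
```

Without the margin argument, prop.table() divides by the grand total instead, which is what the earlier call in this section shows.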

8.8 Convert a contingency table into a frequency table

8.8.1 Matrix object

When starting with a matrix contingency table, the easiest way to convert it to a frequency table is to first convert it to a table object (see “Table objects” above) using as.table(). Then, we pass that result to as.data.frame() to create a frequency table. This solution comes from Stack Overflow.

matrix_ct %>% 
  as.table() %>% 
  as.data.frame()

##   Appendectomy Infection Freq
## 1          Yes       Yes    7
## 2           No       Yes    1
## 3          Yes        No  124
## 4           No        No   78
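The as.data.frame() method for tables also accepts a couple of arguments that can save some cleanup later. As a sketch, the call below renames the count column and keeps the categories as character vectors; the matrix is rebuilt so that the example runs on its own, and freq_from_matrix is a hypothetical name.

```r
matrix_ct <- matrix(
  c(7, 124, 1, 78),
  ncol = 2,
  byrow = TRUE,
  dimnames = list(Appendectomy = c("Yes", "No"), Infection = c("Yes", "No"))
)

freq_from_matrix <- as.data.frame(
  as.table(matrix_ct),
  responseName     = "count",  # name of the counts column (the default is "Freq")
  stringsAsFactors = FALSE     # keep the categories as character vectors
)
freq_from_matrix
```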

8.8.2 Table object from df

When starting with a table object made from a df (see “Table object from df” above), we only need to pass it to the as.data.frame() function. Notice that the result is slightly different than above – the order of

df_yn %>% 
  table() %>% 
  as.data.frame()

##   appendectomy infection Freq
## 1          Yes       Yes    7
## 2           No       Yes    1
## 3          Yes        No  124
## 4           No        No   78

8.8.3 Data frame contingency table

When starting from a base R data frame contingency table (see “Base R data frame with rownames” above; again, we don’t recommend doing this), there are a couple of options. These two come from Stack Overflow.

The first method is a bit convoluted, but uses only base R functions.

df_ct %>% 
  as.matrix() %>% 
  as.table() %>% 
  as.data.frame()

##              Var1         Var2 Freq
## 1    Appendectomy    Infection    7
## 2 No Appendectomy    Infection    1
## 3    Appendectomy No.Infection  124
## 4 No Appendectomy No.Infection   78

The second method is a Tidyverse solution.

df_ct %>% 
  tibble::rownames_to_column() %>% 
  tidyr::pivot_longer(c(Infection, No.Infection))

## # A tibble: 4 × 3
##   rowname         name         value
##   <chr>           <chr>        <dbl>
## 1 Appendectomy    Infection        7
## 2 Appendectomy    No.Infection   124
## 3 No Appendectomy Infection        1
## 4 No Appendectomy No.Infection    78
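The default column names (rowname, name, value) are not very informative. As a sketch, assuming the tibble and tidyr packages are installed, the var, names_to, and values_to arguments can be used to name the columns as the frequency table is built. The column names chosen here are ours.

```r
df_ct <- data.frame(
  Infection    = c(7, 1),
  No.Infection = c(124, 78),
  row.names    = c("Appendectomy", "No Appendectomy")
)

# Move the row names into a regular column called "appendectomy"
with_rownames <- tibble::rownames_to_column(df_ct, var = "appendectomy")

# Stack the two count columns, naming the new columns as we go
freq_from_df_ct <- tidyr::pivot_longer(
  with_rownames,
  cols      = c(Infection, No.Infection),
  names_to  = "infection",
  values_to = "count"
)
freq_from_df_ct
```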

8.9 Convert a contingency table into a data frame of observations

This is a really common conversion to want to make. We often come across data that is already in a 2x2 table and decide that we want to convert it into a data frame of observations to experiment with it or for some statistical procedures (e.g., regression).

All of the methods I’ve found so far require creating a frequency table as an intermediate step.

8.9.1 Matrix object

# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
# From: https://cran.r-project.org/web/packages/DescTools/vignettes/TablesInR.pdf
countsToCases <- function(x, countcol = "Freq") {
    # Get the row indices to pull from x
    idx <- rep.int(seq_len(nrow(x)), x[[countcol]])

    # Drop count column
    x[[countcol]] <- NULL

    # Get the rows from x
    x[idx, ]
}

matrix_ct %>% 
  as.table() %>% 
  as.data.frame() %>% 
  countsToCases() %>% 
  tibble()

## # A tibble: 210 × 2
##    Appendectomy Infection
##    <fct>        <fct>    
##  1 Yes          Yes      
##  2 Yes          Yes      
##  3 Yes          Yes      
##  4 Yes          Yes      
##  5 Yes          Yes      
##  6 Yes          Yes      
##  7 Yes          Yes      
##  8 No           Yes      
##  9 Yes          No       
## 10 Yes          No       
## # ℹ 200 more rows

This works, but I’m not fully satisfied with it, and I’m still trying to figure out why. It may be because it uses a table object and a frequency table as intermediate steps, although I’m not sure why that should matter. It may also be the final result: do I want Yes/No values or variable name values in the final result? (Compare this result to the result below for “Convert a frequency table into a data frame of observations.”)

Let’s pull this function apart and really understand how it works.

test_mat_ct_to_df <- matrix_ct %>% 
  as.table() %>% 
  as.data.frame()

test_mat_ct_to_df

##   Appendectomy Infection Freq
## 1          Yes       Yes    7
## 2           No       Yes    1
## 3          Yes        No  124
## 4           No        No   78

At this point, we have a frequency table of class data frame.

# Create a vector of row indices in which each combination (each row) from the
# frequency table is repeated as many times as it is to appear in the new data
# frame of observations

# Count the number of rows in the df
rows <- nrow(test_mat_ct_to_df)
# Create a sequence of integers from 1 to rows
one_to_n_rows <- seq_len(rows)
# Pull vector of frequencies
freqs <- test_mat_ct_to_df[["Freq"]]
# Rep.int is just a faster, simplified version of rep
combo_reps <- rep.int(one_to_n_rows, freqs)
combo_reps

##   [1] 1 1 1 1 1 1 1 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
##  [38] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
##  [75] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [149] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [186] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
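With 210 values, the pattern in combo_reps is hard to see. The toy example below, with made-up counts that are not from the appendectomy data, shows the same indexing idea on a smaller scale.

```r
# Three hypothetical rows with counts of 2, 1, and 3
row_index <- seq_len(3)
counts    <- c(2, 1, 3)

# Each row index is repeated as many times as its count
idx <- rep.int(row_index, counts)
idx  # 1 1 2 3 3 3
```

Using idx to subset a three-row data frame would return row 1 twice, row 2 once, and row 3 three times, which is exactly what happens with combo_reps in the next step.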

# Drop the count (Freq) column from the frequency table
test_mat_ct_to_df[["Freq"]] <- NULL
test_mat_ct_to_df

##   Appendectomy Infection
## 1          Yes       Yes
## 2           No       Yes
## 3          Yes        No
## 4           No        No

# Repeat each row of test_mat_ct_to_df according to combo_reps
df_obs_from_matrix_ct <- test_mat_ct_to_df[combo_reps, ]

# Drop weird numbered rownames
rownames(df_obs_from_matrix_ct) <- NULL

df_obs_from_matrix_ct

##     Appendectomy Infection
## 1            Yes       Yes
## 2            Yes       Yes
## 3            Yes       Yes
## 4            Yes       Yes
## 5            Yes       Yes
## 6            Yes       Yes
## 7            Yes       Yes
## 8             No       Yes
## 9            Yes        No
## 10           Yes        No
## 11           Yes        No
## 12           Yes        No
## 13           Yes        No
## 14           Yes        No
## 15           Yes        No
## 16           Yes        No
## 17           Yes        No
## 18           Yes        No
## 19           Yes        No
## 20           Yes        No
## 21           Yes        No
## 22           Yes        No
## 23           Yes        No
## 24           Yes        No
## 25           Yes        No
## 26           Yes        No
## 27           Yes        No
## 28           Yes        No
## 29           Yes        No
## 30           Yes        No
## 31           Yes        No
## 32           Yes        No
## 33           Yes        No
## 34           Yes        No
## 35           Yes        No
## 36           Yes        No
## 37           Yes        No
## 38           Yes        No
## 39           Yes        No
## 40           Yes        No
## 41           Yes        No
## 42           Yes        No
## 43           Yes        No
## 44           Yes        No
## 45           Yes        No
## 46           Yes        No
## 47           Yes        No
## 48           Yes        No
## 49           Yes        No
## 50           Yes        No
## 51           Yes        No
## 52           Yes        No
## 53           Yes        No
## 54           Yes        No
## 55           Yes        No
## 56           Yes        No
## 57           Yes        No
## 58           Yes        No
## 59           Yes        No
## 60           Yes        No
## 61           Yes        No
## 62           Yes        No
## 63           Yes        No
## 64           Yes        No
## 65           Yes        No
## 66           Yes        No
## 67           Yes        No
## 68           Yes        No
## 69           Yes        No
## 70           Yes        No
## 71           Yes        No
## 72           Yes        No
## 73           Yes        No
## 74           Yes        No
## 75           Yes        No
## 76           Yes        No
## 77           Yes        No
## 78           Yes        No
## 79           Yes        No
## 80           Yes        No
## 81           Yes        No
## 82           Yes        No
## 83           Yes        No
## 84           Yes        No
## 85           Yes        No
## 86           Yes        No
## 87           Yes        No
## 88           Yes        No
## 89           Yes        No
## 90           Yes        No
## 91           Yes        No
## 92           Yes        No
## 93           Yes        No
## 94           Yes        No
## 95           Yes        No
## 96           Yes        No
## 97           Yes        No
## 98           Yes        No
## 99           Yes        No
## 100          Yes        No
## 101          Yes        No
## 102          Yes        No
## 103          Yes        No
## 104          Yes        No
## 105          Yes        No
## 106          Yes        No
## 107          Yes        No
## 108          Yes        No
## 109          Yes        No
## 110          Yes        No
## 111          Yes        No
## 112          Yes        No
## 113          Yes        No
## 114          Yes        No
## 115          Yes        No
## 116          Yes        No
## 117          Yes        No
## 118          Yes        No
## 119          Yes        No
## 120          Yes        No
## 121          Yes        No
## 122          Yes        No
## 123          Yes        No
## 124          Yes        No
## 125          Yes        No
## 126          Yes        No
## 127          Yes        No
## 128          Yes        No
## 129          Yes        No
## 130          Yes        No
## 131          Yes        No
## 132          Yes        No
## 133           No        No
## 134           No        No
## 135           No        No
## 136           No        No
## 137           No        No
## 138           No        No
## 139           No        No
## 140           No        No
## 141           No        No
## 142           No        No
## 143           No        No
## 144           No        No
## 145           No        No
## 146           No        No
## 147           No        No
## 148           No        No
## 149           No        No
## 150           No        No
## 151           No        No
## 152           No        No
## 153           No        No
## 154           No        No
## 155           No        No
## 156           No        No
## 157           No        No
## 158           No        No
## 159           No        No
## 160           No        No
## 161           No        No
## 162           No        No
## 163           No        No
## 164           No        No
## 165           No        No
## 166           No        No
## 167           No        No
## 168           No        No
## 169           No        No
## 170           No        No
## 171           No        No
## 172           No        No
## 173           No        No
## 174           No        No
## 175           No        No
## 176           No        No
## 177           No        No
## 178           No        No
## 179           No        No
## 180           No        No
## 181           No        No
## 182           No        No
## 183           No        No
## 184           No        No
## 185           No        No
## 186           No        No
## 187           No        No
## 188           No        No
## 189           No        No
## 190           No        No
## 191           No        No
## 192           No        No
## 193           No        No
## 194           No        No
## 195           No        No
## 196           No        No
## 197           No        No
## 198           No        No
## 199           No        No
## 200           No        No
## 201           No        No
## 202           No        No
## 203           No        No
## 204           No        No
## 205           No        No
## 206           No        No
## 207           No        No
## 208           No        No
## 209           No        No
## 210           No        No

8.9.2 Table object

This is the same as the matrix example above, with one less step because we’ve already converted the matrix to a table object.

table_from_matrix %>% 
  as.data.frame() %>% 
  countsToCases() %>% 
  tibble()

## # A tibble: 210 × 2
##    Appendectomy Infection
##    <fct>        <fct>    
##  1 Yes          Yes      
##  2 Yes          Yes      
##  3 Yes          Yes      
##  4 Yes          Yes      
##  5 Yes          Yes      
##  6 Yes          Yes      
##  7 Yes          Yes      
##  8 No           Yes      
##  9 Yes          No       
## 10 Yes          No       
## # ℹ 200 more rows

And for completeness, here is the table object we previously created from a data frame of observations.

table_from_df %>% 
  as.data.frame() %>% 
  countsToCases() %>% 
  tibble()

## # A tibble: 210 × 2
##    appendectomy infection
##    <fct>        <fct>    
##  1 Yes          Yes      
##  2 Yes          Yes      
##  3 Yes          Yes      
##  4 Yes          Yes      
##  5 Yes          Yes      
##  6 Yes          Yes      
##  7 Yes          Yes      
##  8 No           Yes      
##  9 Yes          No       
## 10 Yes          No       
## # ℹ 200 more rows

8.10 Convert a frequency table into a contingency table

The Cookbook for R also shows how you can convert a frequency table into a contingency table using the base R xtabs() function.

As a reminder, freq_tbl was created above in the section on manually creating a frequency table.

xtabs(count ~ appendectomy + infection, data = freq_tbl)

##             infection
## appendectomy  No Yes
##          No   78   1
##          Yes 124   7

⚠️Warning: Notice that the “No” category comes before the “Yes” category by default when passing a frequency table to the xtabs() function. This is typically not the order we would put them in for analysis. To prevent this from happening, change the ordering of the factor levels in the frequency table (demonstrated below).

freq_tbl_yn <- freq_tbl %>% 
  mutate(
    appendectomy = factor(appendectomy, levels = c("Yes", "No")),
    infection = factor(infection, levels = c("Yes", "No"))
  )

xtabs(count ~ appendectomy + infection, data = freq_tbl_yn)

##             infection
## appendectomy Yes  No
##          Yes   7 124
##          No    1  78
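The default ordering in the warning above comes from how R builds factors. When a character column is converted to a factor without explicit levels (which xtabs() does for us), the levels are the sorted unique values, so “No” lands before “Yes”. Here is a minimal base R sketch of that behavior. The vector `x` is made up purely for illustration.

```r
# Hypothetical character vector, used only for illustration
x <- c("Yes", "No", "Yes")

# Default: levels are the sorted unique values, so "No" comes first
default_levels <- levels(factor(x))
default_levels

# Explicit levels put "Yes" first, which is the order we usually want
explicit_levels <- levels(factor(x, levels = c("Yes", "No")))
explicit_levels
```

Setting the levels once in the frequency table, as we did with freq_tbl_yn, means every downstream table inherits the “Yes”-first ordering.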

8.11 Convert a frequency table into a data frame of observations.

We already saw how to do this above as an intermediate step between a contingency table and a data frame of observations.

freq_tbl %>% 
   countsToCases("count")

## # A tibble: 210 × 2
##    appendectomy infection
##    <chr>        <chr>    
##  1 Yes          Yes      
##  2 Yes          Yes      
##  3 Yes          Yes      
##  4 Yes          Yes      
##  5 Yes          Yes      
##  6 Yes          Yes      
##  7 Yes          Yes      
##  8 Yes          No       
##  9 Yes          No       
## 10 Yes          No       
## # ℹ 200 more rows
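As an alternative sketch, the uncount() function from the tidyr package does the same expansion in a single call. Note that tidyr is not loaded anywhere else in this note, so this assumes it is installed. The freq_tbl below is reconstructed from the output shown above and may not match the original construction exactly.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Reconstructed for illustration; assumed to match the freq_tbl created earlier
freq_tbl <- tibble(
  appendectomy = c("Yes", "Yes", "No", "No"),
  infection    = c("Yes", "No", "Yes", "No"),
  count        = c(7, 124, 1, 78)
)

# Repeat each row `count` times; the count column is dropped by default
obs_df <- freq_tbl %>% 
  uncount(weights = count)

# One row per person
nrow(obs_df)
```

This avoids depending on a copied helper function, at the cost of one more package.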

8.12 Bottom line

This section briefly distills everything from above down to the most common scenario we actually work with: manually creating a contingency table and then converting it to a data frame of observations.

When we open a book or journal article and see a 2x2 table, or when we just decide to create a 2x2 table from scratch to do some experimenting, it looks like it’s best to start with a matrix and convert it to a table object.

matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
)

dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

matrix_ct_margins <- addmargins(matrix_ct)
matrix_ct_margins

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

You can go ahead and convert to a table at this point, but you don’t have to. Look back at the section on why to consider using a table object, but for now, I can’t come up with a good reason to do this off the top of my head.

To then convert that contingency table to a data frame of observations, we can simply use the following code.

matrix_ct %>% 
  as.table() %>% 
  as.data.frame() %>% 
  countsToCases() %>% 
  tibble()

## # A tibble: 210 × 2
##    Appendectomy Infection
##    <fct>        <fct>    
##  1 Yes          Yes      
##  2 Yes          Yes      
##  3 Yes          Yes      
##  4 Yes          Yes      
##  5 Yes          Yes      
##  6 Yes          Yes      
##  7 Yes          Yes      
##  8 No           Yes      
##  9 Yes          No       
## 10 Yes          No       
## # ℹ 200 more rows
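A quick way to convince ourselves that nothing was lost in the conversion is to tabulate the data frame of observations and compare it with the matrix we started from. The sketch below is self-contained, so it repeats the matrix and the countsToCases() helper from this note. The names `obs_df` and `round_trip` are just illustrative.

```r
# Same matrix as above
matrix_ct <- matrix(c(7, 124, 1, 78), ncol = 2, byrow = TRUE)
dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

# countsToCases() as defined in the helper functions section
countsToCases <- function(x, countcol = "Freq") {
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  x[[countcol]] <- NULL
  x[idx, ]
}

# Matrix -> table -> frequency table -> data frame of observations
obs_df <- countsToCases(as.data.frame(as.table(matrix_ct)))

# Tabulating the observations should give back the original cell counts
round_trip <- table(obs_df)
all(round_trip == matrix_ct)
```

Because as.data.frame() keeps the factor levels in the order given in dimnames(), the round-trip table also keeps “Yes” before “No”.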

8.13 Helper functions

In some of the cases above, the solutions may be slightly unsatisfying, but they seem to work reasonably well. In this section, we create some helper functions to make working with contingency tables even easier – especially in the context of working with the freqtables package. At some point, these functions may work their way into the freqtables package.

8.13.1 Create a contingency table

After playing around with this a little bit, I couldn’t come up with a function that was really any better (i.e., fewer lines of code or easier to understand) than just creating the matrix using the standard base R functions. This is especially true when contingency tables However, I’m leaving the code here just in case.

# Code for manually creating a matrix contingency table.
matrix_ct <- matrix(
  c(a = 7, b = 124, c = 1, d = 78),
  ncol = 2,
  byrow = TRUE
)

dimnames(matrix_ct) <- list(
  Appendectomy = c("Yes", "No"),
  Infection = c("Yes", "No")
)

matrix_ct_margins <- addmargins(matrix_ct)
matrix_ct_margins

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

Here is the function code

contingency_table <- function(.values, ncol, dim_names, margins = FALSE) {
  # Create the matrix
  matrix_ct <- matrix(.values, ncol = ncol, byrow = TRUE)
  
  # Optionally add dimnames
  if (!missing(dim_names)) {
    dimnames(matrix_ct) <- dim_names
  }
  
  # Optionally add margins
  if (margins) {
    matrix_ct <- addmargins(matrix_ct)
  }
  
  # Return contingency table
  matrix_ct
}

Testing with a 2x2 table

contingency_table(
  .values = c(
    7, 124,
    1, 78
  ),
  ncol = 2,
  dim_names = list(
    Appendectomy = c("Yes", "No"),
    Infection    = c("Yes", "No")
  ),
  margins = TRUE
)

##             Infection
## Appendectomy Yes  No Sum
##          Yes   7 124 131
##          No    1  78  79
##          Sum   8 202 210

What if we need to make a contingency table for variables with more than two levels? For example:

matrix_3_ct <- matrix(
  c(
    1, 2, 1,
    0, 1, 1,
    0, 1, 2
  ),
  ncol = 3,
  byrow = TRUE
)

dimnames(matrix_3_ct) <- list(
  age      = c("Young", "Middle", "Old"),
  severity = c("Mild", "Moderate", "Severe")
)

addmargins(matrix_3_ct)

##         severity
## age      Mild Moderate Severe Sum
##   Young     1        2      1   4
##   Middle    0        1      1   2
##   Old       0        1      2   3
##   Sum       1        4      4   9

Testing with a 3x3 table

contingency_table(
  .values = c(
    1, 2, 1,
    0, 1, 1,
    0, 1, 2
  ),
  ncol = 3,
  dim_names = list(
    age      = c("Young", "Middle", "Old"),
    severity = c("Mild", "Moderate", "Severe")
  ),
  margins = TRUE
)

##         severity
## age      Mild Moderate Severe Sum
##   Young     1        2      1   4
##   Middle    0        1      1   2
##   Old       0        1      2   3
##   Sum       1        4      4   9

This works, but I’m not sure what advantage it has over the manual code. It’s roughly the same number of lines and it’s not necessarily any easier to read or understand.

What would the ideal solution even look like? Maybe something like this?

# Doesn't run
contingency_table(
                          | Wound infection |
|-------------------------|-----------------|-----|
| Incidental appendectomy | Yes             | No  | 
| Yes                     | 7               | 124 |
| No                      | 1               | 78  |
)

Or perhaps a Shiny app that would allow us to enter data into a spreadsheet-like interface?
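One rough approximation of that grid-style input that does run is to type the table as plain text and let base R’s read.table() parse it. This is only a sketch. The helper name `ct_from_text()` and its arguments are hypothetical, and it assumes the category labels contain no spaces.

```r
# Hypothetical helper: parse a plain-text grid into a matrix contingency table
ct_from_text <- function(text, row_var, col_var) {
  # The header line has one fewer field than the data lines, so read.table()
  # uses the first field of each data line as the row name
  df <- read.table(text = text, header = TRUE)
  ct <- as.matrix(df)
  # Label the two dimensions with the variable names
  names(dimnames(ct)) <- c(row_var, col_var)
  ct
}

# Column labels on the first line, then one line per row of the table
ct_text <- "Yes No
Yes 7 124
No 1 78"

text_ct <- ct_from_text(ct_text, row_var = "Appendectomy", col_var = "Infection")
addmargins(text_ct)
```

The layout on the page mirrors the layout of the 2x2 table, which is most of what the “ideal” version above is after, although the variable names still have to be passed separately.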

8.13.3 Counts to cases

Rewrite this function

# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
# From: https://cran.r-project.org/web/packages/DescTools/vignettes/TablesInR.pdf
countsToCases <- function(x, countcol = "Freq") {
    # Get the row indices to pull from x
    idx <- rep.int(seq_len(nrow(x)), x[[countcol]])

    # Drop count column
    x[[countcol]] <- NULL

    # Get the rows from x
    x[idx, ]
}
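Here is one possible direction for that rewrite, offered as a sketch rather than a finished function. The name `counts_to_cases()` and its argument names are hypothetical. It does the same row expansion as countsToCases(), but checks its inputs first and resets the row names afterward.

```r
# Hypothetical rewrite; the function and argument names are not from the original
counts_to_cases <- function(.data, count_col = "Freq") {
  # Fail early with a clear message if the count column is missing
  if (!count_col %in% names(.data)) {
    stop("Column '", count_col, "' not found in .data.")
  }
  counts <- .data[[count_col]]
  # Counts must be non-missing, non-negative whole numbers
  if (!is.numeric(counts) || anyNA(counts) ||
      any(counts < 0) || any(counts %% 1 != 0)) {
    stop("'", count_col, "' must contain non-negative whole numbers.")
  }

  # Repeat each row index `counts` times, then drop the count column
  idx <- rep.int(seq_len(nrow(.data)), counts)
  out <- .data[idx, names(.data) != count_col, drop = FALSE]

  # Reset row names so they run 1..n instead of 1, 1.1, 1.2, ...
  rownames(out) <- NULL
  out
}

# Usage with a small base R frequency table
freq_df <- data.frame(
  appendectomy = c("Yes", "Yes", "No", "No"),
  infection    = c("Yes", "No", "Yes", "No"),
  count        = c(7, 124, 1, 78)
)
cases_df <- counts_to_cases(freq_df, "count")
nrow(cases_df)
```

The input checks are the main difference from the original: a misspelled count column or a fractional count now stops with a readable message instead of failing inside rep.int().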

[Insert function code here]
