5 Confidence intervals
5.1 ⭐️Overview
This R Notebook contains a demonstration that is intended to help you develop an intuition for frequentist interpretations of 95% confidence intervals.
5.4 🔢Simulate data
Generating 10,000 random participants. About half male and half female.
## # A tibble: 2 × 9
## var cat n n_total percent se t_crit lcl ucl
## <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 sex Female 5017 10000 50.2 0.500 1.96 49.2 51.1
## 2 sex Male 4983 10000 49.8 0.500 1.96 48.9 50.8
Assigning heights to the generated participants at random (within certain parameters).
set.seed(123)
random_data <- random_data %>%
mutate(
height = if_else(
sex == "Female",
rnorm(n, 71.00, 4.32),
rnorm(n, 64.30, 2.11)
)
)
## # A tibble: 2 × 5
## group_cat mean sd lcl ucl
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Female 71.0 4.33 70.9 71.1
## 2 Male 64.3 2.1 64.2 64.4
5.5 Draw samples from population
Randomly select samples of 100 participants
## # A tibble: 10 × 2
## sex height
## <chr> <dbl>
## 1 Male 65.8
## 2 Male 65.1
## 3 Male 68.4
## 4 Male 61.9
## 5 Male 64.0
## 6 Male 64.3
## 7 Male 64.3
## 8 Male 62.7
## 9 Male 61.2
## 10 Male 63.1
## [[1]]
## # A tibble: 10 × 2
## sex height
## <chr> <dbl>
## 1 Male 65.8
## 2 Male 65.1
## 3 Male 68.4
## 4 Male 61.9
## 5 Male 64.0
## 6 Male 64.3
## 7 Male 64.3
## 8 Male 62.7
## 9 Male 61.2
## 10 Male 63.1
##
## [[2]]
## # A tibble: 10 × 2
## sex height
## <chr> <dbl>
## 1 Female 66.9
## 2 Male 68.9
## 3 Female 74.1
## 4 Male 66.8
## 5 Female 71.2
## 6 Male 71.2
## 7 Male 64.8
## 8 Male 61.1
## 9 Male 62.8
## 10 Male 64.7
set.seed(137)
n_samples <- 100
random_samples <- tibble(
sample = 1:n_samples,
data = map(1:n_samples, ~ sample_n(random_data, 10))
)
## # A tibble: 10 × 2
## sex height
## <chr> <dbl>
## 1 Male 61.5
## 2 Male 64.2
## 3 Female 74.3
## 4 Female 63.5
## 5 Male 60.0
## 6 Male 69.6
## 7 Male 63.1
## 8 Female 69.2
## 9 Female 62.6
## 10 Female 76.1
# Demonstrate passing an extracted df to the mean_table() function
random_samples$data[[1]] %>%
mean_table(height) %>%
select(mean, lcl, ucl)
## # A tibble: 1 × 3
## mean lcl ucl
## <dbl> <dbl> <dbl>
## 1 66.4 62.4 70.4
Calculate the stats of interest for each of the samples
## # A tibble: 1 × 9
## response_var n mean sd sem lcl ucl min max
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 height 10 66.4 5.55 1.75 62.4 70.4 60.0 76.1
Unnest the results
## # A tibble: 100 × 11
## sample data response_var n mean sd sem lcl ucl min max
## <int> <list> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 <tibble> height 10 66.4 5.55 1.75 62.4 70.4 60.0 76.1
## 2 2 <tibble> height 10 67.2 4.79 1.52 63.8 70.7 60.9 73.9
## 3 3 <tibble> height 10 69.0 5.18 1.64 65.3 72.8 59.6 77.0
## 4 4 <tibble> height 10 70.6 5.46 1.73 66.7 74.5 62.4 77.9
## 5 5 <tibble> height 10 65.3 3.14 0.992 63.1 67.6 61.6 73.0
## 6 6 <tibble> height 10 66.2 5.91 1.87 61.9 70.4 60.1 76.7
## 7 7 <tibble> height 10 67 4.6 1.46 63.7 70.3 59.4 73.6
## 8 8 <tibble> height 10 68.5 4.87 1.54 65.0 72.0 60.1 76.8
## 9 9 <tibble> height 10 69.1 5.03 1.59 65.5 72.7 62.0 79.0
## 10 10 <tibble> height 10 68.2 5.69 1.80 64.2 72.3 59.5 76.5
## # ℹ 90 more rows
Count the number of CI’s that do not include the true population mean
## # A tibble: 1 × 9
## response_var n mean sd sem lcl ucl min max
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 height 10000 67.7 4.78 0.0478 67.6 67.8 56.0 87.6
## # A tibble: 100 × 5
## sample mean lcl ucl contain_pop_mean
## <int> <dbl> <dbl> <dbl> <lgl>
## 1 1 66.4 62.4 70.4 TRUE
## 2 2 67.2 63.8 70.7 TRUE
## 3 3 69.0 65.3 72.8 TRUE
## 4 4 70.6 66.7 74.5 TRUE
## 5 5 65.3 63.1 67.6 FALSE
## 6 6 66.2 61.9 70.4 TRUE
## 7 7 67 63.7 70.3 TRUE
## 8 8 68.5 65.0 72.0 TRUE
## 9 9 69.1 65.5 72.7 TRUE
## 10 10 68.2 64.2 72.3 TRUE
## # ℹ 90 more rows
## # A tibble: 2 × 9
## var cat n n_total percent se t_crit lcl ucl
## <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 contain_pop_mean FALSE 5 100 5 2.19 1.98 2.06 11.6
## 2 contain_pop_mean TRUE 95 100 95 2.19 1.98 88.4 97.9