Based on Chapter 8 of ModernDive. Code for Quiz 12.
Load the R package we will use.
What is the average age of members who have served in Congress?
Set the random seed to 123, take a sample of 100 from congress_age, and assign it to congress_age_100
set.seed(123)
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)
congress_age is the population and congress_age_100 is the sample
18,635 is the number of observations in the population, and 100 is the number of observations in your sample
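These two counts can be confirmed directly from the data. The sketch below assumes that congress_age comes from the fivethirtyeight package and rep_sample_n() from infer; the package-loading code is not shown in this post, so the library() calls are an assumption.

```r
library(dplyr)            # provides %>%
library(infer)            # assumed source of rep_sample_n()
library(fivethirtyeight)  # assumed source of congress_age

set.seed(123)
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)

nrow(congress_age)      # observations in the population
nrow(congress_age_100)  # observations in the sample
```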
Construct the confidence interval
1. Use specify to indicate the variable from congress_age_100 that you are interested in
Response: age (numeric)
# A tibble: 100 × 1
age
<dbl>
1 53.1
2 54.9
3 65.3
4 60.1
5 43.8
6 57.9
7 55.3
8 46
9 42.1
10 37
# … with 90 more rows
2. Generate 1000 replicates of your sample of 100
Response: age (numeric)
# A tibble: 100,000 × 2
# Groups: replicate [1,000]
replicate age
<int> <dbl>
1 1 42.1
2 1 71.2
3 1 45.6
4 1 39.6
5 1 56.8
6 1 71.6
7 1 60.5
8 1 56.4
9 1 43.3
10 1 53.1
# … with 99,990 more rows
The output has 100,000 rows (1,000 replicates of 100 observations each); the first 10 are printed, leaving 99,990 more
3. Calculate the mean for each replicate
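The code for steps 1 to 3 is not shown in this post. Going by the printed output and the infer workflow from ModernDive, the three steps can plausibly be chained as one pipeline. This is a reconstruction, not necessarily the exact code that was run, and the library() calls are assumptions.

```r
library(dplyr)            # provides %>%
library(infer)            # specify(), generate(), calculate()
library(fivethirtyeight)  # assumed source of congress_age

set.seed(123)
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)

bootstrap_distribution_mean_age <- congress_age_100 %>%
  specify(response = age) %>%                     # step 1: pick the variable
  generate(reps = 1000, type = "bootstrap") %>%   # step 2: resample with replacement
  calculate(stat = "mean")                        # step 3: one mean per replicate

bootstrap_distribution_mean_age
```

The result has one row per replicate, with the replicate number and its mean in a column named stat.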
Assign the result to bootstrap_distribution_mean_age and display it
bootstrap_distribution_mean_age has 1,000 means, one per replicate
4. Visualize the bootstrap distribution
visualize(bootstrap_distribution_mean_age)
Calculate the 95% confidence interval using the percentile method
Assign the output to congress_ci_percentile and display it
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_interval(type = "percentile", level = 0.95)
congress_ci_percentile
# A tibble: 1 × 2
lower_ci upper_ci
<dbl> <dbl>
1 51.5 55.2
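The percentile method keeps the middle 95% of the bootstrap means, so the endpoints are the 2.5th and 97.5th percentiles of the bootstrap distribution. Here is a base-R sketch of the same idea. The ages are simulated as a hypothetical stand-in for the sample, so the numbers will not match the interval above.

```r
set.seed(123)

# Hypothetical stand-in for the 100 sampled ages (not the real data)
ages <- rnorm(100, mean = 53, sd = 10)

# Resample the sample with replacement 1000 times, keeping each mean
boot_means <- replicate(
  1000,
  mean(sample(ages, size = length(ages), replace = TRUE))
)

# Percentile method: cut off 2.5% of the means in each tail
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```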
Calculate the observed point estimate of the mean, assign it to obs_mean_age, and display it
obs_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  calculate(stat = "mean") %>%
  pull()
obs_mean_age
[1] 53.36
Shade the confidence interval, add a line at the observed mean, obs_mean_age, to your visualization and color it “hotpink”
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1)
Calculate the population mean of age, assign it to pop_mean_age, and display it
pop_mean_age
[1] 53.31373
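The code that computed pop_mean_age is not shown. One way to get it, assuming congress_age comes from the fivethirtyeight package, is to average age over the whole population:

```r
library(dplyr)            # provides %>%, summarize(), pull()
library(fivethirtyeight)  # assumed source of congress_age

pop_mean_age <- congress_age %>%
  summarize(pop_mean_age = mean(age)) %>%  # mean over all rows
  pull()                                   # extract the single value

pop_mean_age
```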
Add a line to the visualization at the population mean, pop_mean_age, and set its color to "purple" and its size to 3
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1) +
  geom_vline(xintercept = pop_mean_age, color = "purple", size = 3)
Is the population mean within the 95% confidence interval constructed using the bootstrap distribution? Yes.
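The answer can also be checked numerically instead of read off the plot. This small sketch copies the printed values from the output above; the endpoints are rounded as displayed, so it is an approximate check.

```r
# Values copied from the printed output (rounded as displayed)
lower_ci <- 51.5
upper_ci <- 55.2
pop_mean_age <- 53.31373

# TRUE when the population mean lies between the two endpoints
pop_mean_in_ci <- lower_ci <= pop_mean_age & pop_mean_age <= upper_ci
pop_mean_in_ci
```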
Change set.seed(123) to set.seed(4346). Rerun all the code.