Project Part 1

Dataset for the age of schooling

I downloaded the ‘School Life Expectancy’ data from Our World in Data. I chose this dataset because I was interested in how the expected number of years of schooling varies across countries.
This is the link to the data.
The following code chunk loads the package I will use to read in and prepare the data for analysis.

library(tidyverse)

Read the data in

school_years <- read_csv(here::here("_posts/2022-05-08-project-part-1/expected-years-of-schooling.csv"))

Use glimpse to see the names and types of the columns

glimpse(school_years)

Rows: 5,142
Columns: 4
$ Entity                                <chr> "Afghanistan", "Afghan…
$ Code                                  <chr> "AFG", "AFG", "AFG", "…
$ Year                                  <dbl> 1990, 1991, 1992, 1993…
$ `Expected Years of Schooling (years)` <dbl> 2.6, 2.9, 3.2, 3.6, 3.…

# View(school_years)

Use output from glimpse (and View) to prepare the data for analysis

Create the object countries, a character vector of the countries that I want to extract from the dataset
Rename the 1st column to Country and the 4th column to Age (Age here holds the expected years of schooling)
Use filter to keep the rows that I want: Year >= 2000 and Country in countries
Select the columns to keep: Country, Year, Age
No need to mutate the data (all values are already in years)
Assign the output to countries_age
Display the first 10 rows of countries_age

countries  <- c("China",
               "United States",
               "South Korea",
               "Philippines",
               "India", 
               "Ghana",
               "Ethiopia")

countries_age <- school_years %>% 
  rename(Country = 1, Age = 4) %>% 
  filter(Year >= 2000, Country %in% countries) %>% 
  select(Country, Year, Age)

countries_age

# A tibble: 108 × 3
   Country  Year   Age
   <chr>   <dbl> <dbl>
 1 China    2000   9.6
 2 China    2001   9.7
 3 China    2002   9.9
 4 China    2003  10.2
 5 China    2004  10.6
 6 China    2005  11  
 7 China    2006  11.5
 8 China    2007  12  
 9 China    2008  12.3
10 China    2009  12.6
# … with 98 more rows
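
Because %in% matches strings exactly, a country name spelled differently from the Entity column is dropped by filter without any warning. The following is a minimal sketch of a check for that, using base R's setdiff. The vectors entities and requested are hypothetical stand-ins (for the Entity column and for countries), not the real data.

```r
# Hypothetical stand-in for the unique values of school_years$Entity
entities <- c("China", "United States", "South Korea", "Philippines",
              "India", "Ghana", "Ethiopia")

# Hypothetical request vector; one name is deliberately misspelled
requested <- c("China", "Phillippines", "India")

# Names in `requested` that have no exact match in `entities`
unmatched <- setdiff(requested, entities)
unmatched
```

With the real data, the same idea would be setdiff(countries, school_years$Entity); an empty result means every requested country was found.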

Check that the maximum expected years of schooling in 2000 equals the maximum in the graph

countries_age %>% filter(Year == 2000)  %>% 
  summarise(total_age = max(Age))

# A tibble: 1 × 1
  total_age
      <dbl>
1      15.6
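
The check above returns only the maximum value. To also see which country it belongs to, one option is dplyr's slice_max, sketched here on a small made-up data frame. The object toy and its values are illustrative assumptions, not rows from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for countries_age; the values are made up
toy <- data.frame(
  Country = c("A", "B", "C"),
  Year    = c(2000, 2000, 2000),
  Age     = c(10, 15, 8)
)

# Keep the 2000 rows, then keep the row(s) with the largest Age
top_2000 <- toy %>%
  filter(Year == 2000) %>%
  slice_max(Age, n = 1)

top_2000
```

By default slice_max keeps ties, so more than one row can come back if several countries share the maximum.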

Add a picture

Write the data to file in the project directory

write_csv(countries_age, file='countries_age.csv')
