This is the project website and repository for Hannum, Kim, and Wang (2024), hereafter HKW.

In this project repository, we provide (1) the HKW children, teachers, and schools database, (2) the programs and scripts used to process and analyze the data, and (3) the scripts used to generate the graphical and tabular outputs presented in the paper.

Raw data cleaning and aggregation

First, data-raw contains raw input data gathered from international and national statistical agencies. Raw global data and data for each country are stored in separate subfolders. Please see the paper's online appendix for additional information on the sources of the country-specific administrative data.

Second, stata-script/01_Data_cleaning.do cleans and aggregates the raw input files and generates data-raw/ppts_easia_weuro_world_raw.csv.

Third, data-raw/ppts_easia_weuro_world.R generates a data skeleton with all potential country-year pairings and merges in the aggregated raw input files to produce the finalized raw aggregated file ppts_easia_weuro_world.rda, which is also stored as ppts_easia_weuro_world.csv. The data set is documented here: reference/ppts_easia_weuro_world.html.
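
The skeleton-and-merge step described above can be illustrated with a minimal Python sketch. This is not the project's R script: the function names, the column names (country, year, students), and the toy values are hypothetical and serve only to show the idea of building every potential country-year pairing first and then merging the raw rows onto it.

```python
from itertools import product


def build_skeleton(countries, years):
    """Return one empty record for every potential country-year pairing."""
    return {(c, y): {} for c, y in product(countries, years)}


def merge_raw(skeleton, raw_rows):
    """Merge raw rows onto the skeleton; pairings without raw data stay empty."""
    merged = {key: dict(values) for key, values in skeleton.items()}
    for row in raw_rows:
        key = (row["country"], row["year"])
        if key in merged:
            # Copy every measurement column, skipping the two key columns.
            merged[key].update(
                {k: v for k, v in row.items() if k not in ("country", "year")}
            )
    return merged


# Hypothetical toy input, not taken from the HKW data.
raw_rows = [
    {"country": "KOR", "year": 2000, "students": 100.0},
    {"country": "KOR", "year": 2002, "students": 90.0},
]
panel = merge_raw(build_skeleton(["KOR", "JPN"], range(2000, 2003)), raw_rows)
```

Because the skeleton is built before the merge, country-years with no raw observation remain in the panel as explicit gaps, which is what makes later interpolation possible.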

Interpolation and computing statistics of changes

First, we develop functions for data interpolation: R/ffp_ppts_interp.R. When there are gaps in the population, student, teacher, or school measurements, we use the rate of change between the closest dates with available data to linearly interpolate the missing values.
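
As a rough illustration of the gap-filling idea, here is a minimal Python sketch. It is not the code in R/ffp_ppts_interp.R; it assumes that "linearly interpolate" means applying a constant per-year change between the two closest observed dates, and the function name and toy series are hypothetical.

```python
def interpolate_gaps(series):
    """Linearly fill interior gaps (None) in a {year: value} mapping.

    Years before the first or after the last observation are left as None,
    because filling them would be extrapolation rather than interpolation.
    """
    observed = sorted(y for y, v in series.items() if v is not None)
    filled = dict(series)
    for left, right in zip(observed, observed[1:]):
        # Constant per-year change between the two closest observed dates.
        slope = (series[right] - series[left]) / (right - left)
        for year in range(left + 1, right):
            if year in filled:
                filled[year] = series[left] + slope * (year - left)
    return filled


# Hypothetical toy series: observed in 2000 and 2003, missing otherwise.
students = {2000: 100.0, 2001: None, 2002: None, 2003: 70.0, 2004: None}
filled_students = interpolate_gaps(students)
```

In this toy series the two interior gaps are filled along the straight line between the 2000 and 2003 observations, while 2004 stays missing because it lies outside the observed range.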

Second, in the vignette ffv_gen_percent_changes, we apply the interpolation routine, along with a close-boundary extrapolation routine, to our global dataset on child population, students, teachers, and schools. This generates the output file ppts_easia_weuro_world_pchg.rda.

Third, the vignette articles/ffv_gen_elasticities.html generates the output file ppts_easia_weuro_world_elas_interp1.rda, which contains percentage changes and elasticities computed over year bins of different spans.

We also have three functions that compute levels, ratios, changes, and elasticities sequentially. They are tested in this vignette: articles/ffv_dev_child_teacher_ratio.html. The three functions are:

  1. Function to generate ratios: reference/ff_ppts_lrce_flr.html
  2. Function to compute percentage changes across time: reference/ff_ppts_lrce_fpc.html
  3. Function to generate elasticities: reference/ff_ppts_lrce_fel.html
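
The sequence of ratios, percentage changes, and elasticities can be sketched in Python as follows. These are not the signatures of the three package functions. The sketch assumes that an elasticity is the percentage change in an education quantity (for example, teachers) divided by the percentage change in the child population over the same span; that definition, the function names, and the numbers are illustrative assumptions.

```python
def ratio(numerator, denominator):
    """Level ratio, e.g. children per teacher."""
    return numerator / denominator


def percent_change(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


def elasticity(y_start, y_end, x_start, x_end):
    """Assumed definition: percent change in y per percent change in x."""
    return percent_change(y_start, y_end) / percent_change(x_start, x_end)


# Hypothetical toy values: children fall from 1000 to 800, teachers from 50 to 45.
children_per_teacher_start = ratio(1000.0, 50.0)
children_pct = percent_change(1000.0, 800.0)
teachers_pct = percent_change(50.0, 45.0)
teacher_elasticity = elasticity(50.0, 45.0, 1000.0, 800.0)
```

Under this assumed definition, an elasticity below one in absolute value means the education quantity changed proportionally less than the child population over the span.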

Data analysis and visualization

The following scripts generate visualizations for the paper:

  1. stata-script/02_Global_youth_population_trends.do
  2. stata-script/03_Global_education_responses.do
  3. stata-script/04_East_Asia_Western_Europe_education_responses.do
  4. stata-script/05_Korea_education_responses.do

The resulting figures are stored in the res-fig folder.

Tabulations

The following vignettes generate the tables contained in the appendix of the paper:

  1. Global population and students:
  2. Global population and teachers:
  3. Western Europe and East Asia youth, students, teachers, and schools:
  4. Korean youth, students, teachers, and schools:

Because of space limits in the paper, this website also presents additional tables that are not included in the paper PDF:

  1. Table: Global students
  2. Table: Global students and teachers
  3. Table: Compare Raw and Interpolated Global Panel. For each country, this table shows the share of data points that are interpolated or extrapolated from administrative data.
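
The per-country share reported in the raw-versus-interpolated comparison can be computed along the following lines. This Python sketch is only an illustration: the row layout, the source labels ("raw", "interpolated", "extrapolated"), and the function name are hypothetical and do not describe how the website's table is actually built.

```python
from collections import defaultdict


def share_filled(rows):
    """Return, per country, the share of rows that are not raw observations."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for row in rows:
        totals[row["country"]] += 1
        if row["source"] in ("interpolated", "extrapolated"):
            filled[row["country"]] += 1
    return {country: filled[country] / totals[country] for country in totals}


# Hypothetical toy panel, not taken from the HKW data.
rows = [
    {"country": "KOR", "year": 2000, "source": "raw"},
    {"country": "KOR", "year": 2001, "source": "interpolated"},
    {"country": "KOR", "year": 2002, "source": "interpolated"},
    {"country": "KOR", "year": 2003, "source": "extrapolated"},
    {"country": "JPN", "year": 2000, "source": "raw"},
    {"country": "JPN", "year": 2001, "source": "raw"},
]
shares = share_filled(rows)
```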