1 Nested within Group Stats

Go to the RMD, R, PDF, or HTML version of this file. Go back to fan’s REconTools Package, R Code Examples Repository (bookdown site), or Intro Stats with R Repository (bookdown site).

By Multiple within Individual Groups Variables, Averages for All Numeric Variables within All Groups of All Group Variables (Long to very Wide). Suppose you have an individual level final outcome. The individual is observed for N periods, where each period the inputs differ. What inputs impacted the final outcome?

Suppose we can divide N periods in which the individual is in the data into a number of years, a number of semi-years, a number of quarters, or uneven-staggered lengths. We might want to generate averages across individuals and within each of these different possible groups averages of inputs.

Then we want to version of the data where each row is an individual, one of the variables is the final outcome, and the other variables are these different averages: averages for the 1st, 2nd, 3rd year in which indivdiual is in data, averages for 1st, …, final quarter in which indivdiual is in data.

1.1 Build Function

This function takes as inputs:

  1. vars.not.groups2avg: a list of variables that are not the within-indivdiual or across-individual grouping variables, but the variables we want to average over. Withnin indivdiual grouping averages will be calculated for these variables using the not-listed variables as within indivdiual groups (excluding vars.indi.grp groups).
  2. vars.indi.grp: a list or individual variables, and also perhaps villages, province, etc id variables that are higher than individual ID. Note the groups are are ACROSS individual higher level group variables.
  3. the remaining variables are all within individual grouping variables.

the function output is a dataframe:

  1. each row is an individual
  2. initial variables individual ID and across individual groups from vars.indi.grp.
  3. other variables are all averages for the variables in vars.not.groups2avg
    • if there are 2 within individual group variables, and the first has 3 groups (years), the second has 6 groups (semi-years), then there would be 9 average variables.
    • each average variables has the original variable name from vars.not.groups2avg plus the name of the within individual grouping variable, and at the end ‘c_x’, where x is a integer representing the category within the group (if 3 years, x=1, 2, 3)
# Data Function
# https://fanwangecon.github.io/R4Econ/summarize/summ/ByGroupsSummWide.html
f.by.groups.summ.wide <- function(df.groups.to.average,
                                  vars.not.groups2avg,
                                  vars.indi.grp = c('S.country','ID'),
                                  display=TRUE) {

# 1. generate categoricals for full year (m.12), half year (m.6), quarter year (m.4)
# 2. generate categoricals also for uneven years (m12t14) using 
#  stagger (+2 rather than -1)
# 3. reshape wide to long, so that all categorical date groups appear in var=value,
    # and categories in var=variable
# 4. calculate mean for all numeric variables for all date groups
# 5. combine date categorical variable and value, single var:
    # m.12.c1= first year average from m.12 averaging

#### #### #### #### #######
# Step 1
#### #### #### #### #######
# 1. generate categoricals for full year (m.12), half year (m.6), quarter year (m.4)
# 2. generate categoricals also for uneven years (m12t14) using stagger 
#  (+2 rather than -1)

#### #### #### #### #######
# S2: reshape wide to long, so that all categorical date groups appear in var=value,
# and categories in var=variable; calculate mean for all 
# numeric variables for all date groups
#### #### #### #### #######
df.avg.long <- df.groups.to.average %>%
       gather(variable, value, -one_of(c(vars.indi.grp,
                                         vars.not.groups2avg))) %>%
       group_by(!!!syms(vars.indi.grp), variable, value) %>%
       summarise_if(is.numeric, funs(mean(., na.rm = TRUE)))

if (display){
  dim(df.avg.long)
  options(repr.matrix.max.rows=10, repr.matrix.max.cols=20)
  print(df.avg.long)
}

#### #### #### #### #######
# S3 combine date categorical variable and value, single var:
# m.12.c1= first year average from m.12 averaging; to do this make 
# data even longer first
#### #### #### #### #######

# We already have the averages, but we want them to show up as variables,
    # mean for each group of each variable.
df.avg.allvars.wide <- df.avg.long %>%
   ungroup() %>%
   mutate(all_m_cate = paste0(variable, '_c', value)) %>%
   select(all_m_cate, everything(), -variable, -value) %>%
   gather(variable, value, -one_of(vars.indi.grp), -all_m_cate) %>%
   unite('var_mcate', variable, all_m_cate) %>%
   spread(var_mcate, value)

if (display){
  dim(df.avg.allvars.wide)
  options(repr.matrix.max.rows=10, repr.matrix.max.cols=10)
  print(df.avg.allvars.wide)
}

return(df.avg.allvars.wide)
}

1.2 Test Program

In our sample dataset, the number of nutrition/height/income etc information observed within each country and month of age group are different. We have a panel dataset for children observed over different months of age.

We have two key grouping variables: 1. country: data are observed for guatemala and cebu 2. month-age (survey month round=svymthRound): different months of age at which each individual child is observed

A child could be observed for many months, or just a few months. A child’s height information could be observed for more months-of-age than nutritional intake information. We eventually want to run regressions where the outcome is height/weight and the input is nutrition. The regressions will be at the month-of-age level. We need to know how many times different variables are observed at the month-of-age level.

# Library
library(tidyverse)

# Load Sample Data
setwd('C:/Users/fan/R4Econ/_data/')
df <- read_csv('height_weight.csv')

1.2.1 Generate Within Individual Groups

In the data, children are observed for different number of months since birth. We want to calculate quarterly, semi-year, annual, etc average nutritional intakes. First generate these within-individual grouping variables. We can also generate uneven-staggered calendar groups as shown below.

mth.var <- 'svymthRound'
df.groups.to.average<- df %>%
        filter(!!sym(mth.var) >= 0 & !!sym(mth.var) <= 24)  %>%
        mutate(m12t24=(floor((!!sym(mth.var) - 12) %/% 14) + 1),
               m8t24=(floor((!!sym(mth.var) - 8) %/% 18) + 1),
               m12 = pmax((floor((!!sym(mth.var)-1) %/% 12) + 1), 1),
               m6 = pmax((floor((!!sym(mth.var)-1) %/% 6) + 1), 1),
               m3 = pmax((floor((!!sym(mth.var)-1) %/% 3) + 1), 1))
# Show Results
options(repr.matrix.max.rows=30, repr.matrix.max.cols=20)
vars.arrange <- c('S.country','indi.id','svymthRound')
vars.groups.within.indi <- c('m12t24', 'm8t24', 'm12', 'm6', 'm3')
as.tibble(df.groups.to.average %>%
          group_by(!!!syms(vars.arrange)) %>%
          arrange(!!!syms(vars.arrange)) %>%
          select(!!!syms(vars.arrange), !!!syms(vars.groups.within.indi)))

1.2.2 Within Group Averages

With the within-group averages created, we can generate averages for all variables within these groups.

vars.not.groups2avg <- c('prot', 'cal')
vars.indi.grp <- c('S.country', 'indi.id')
vars.groups.within.indi <- c('m12t24', 'm8t24', 'm12', 'm6', 'm3')

df.groups.to.average.select <- df.groups.to.average %>%
                        select(one_of(c(vars.indi.grp,
                                        vars.not.groups2avg,
                                        vars.groups.within.indi)))
df.avg.allvars.wide <- f.by.groups.summ.wide(df.groups.to.average.select,
                                             vars.not.groups2avg,
                                             vars.indi.grp, display=FALSE)

This is the tabular version of results

dim(df.avg.allvars.wide)
## [1] 2023   38
names(df.avg.allvars.wide)
##  [1] "S.country"      "indi.id"        "cal_m12_c1"     "cal_m12_c2"     "cal_m12t24_c0"  "cal_m12t24_c1"  "cal_m3_c1"      "cal_m3_c2"     
##  [9] "cal_m3_c3"      "cal_m3_c4"      "cal_m3_c5"      "cal_m3_c6"      "cal_m3_c7"      "cal_m3_c8"      "cal_m6_c1"      "cal_m6_c2"     
## [17] "cal_m6_c3"      "cal_m6_c4"      "cal_m8t24_c0"   "cal_m8t24_c1"   "prot_m12_c1"    "prot_m12_c2"    "prot_m12t24_c0" "prot_m12t24_c1"
## [25] "prot_m3_c1"     "prot_m3_c2"     "prot_m3_c3"     "prot_m3_c4"     "prot_m3_c5"     "prot_m3_c6"     "prot_m3_c7"     "prot_m3_c8"    
## [33] "prot_m6_c1"     "prot_m6_c2"     "prot_m6_c3"     "prot_m6_c4"     "prot_m8t24_c0"  "prot_m8t24_c1"
df.avg.allvars.wide[1:20,] %>% kable() %>% kable_styling_fc_wide()
S.country indi.id cal_m12_c1 cal_m12_c2 cal_m12t24_c0 cal_m12t24_c1 cal_m3_c1 cal_m3_c2 cal_m3_c3 cal_m3_c4 cal_m3_c5 cal_m3_c6 cal_m3_c7 cal_m3_c8 cal_m6_c1 cal_m6_c2 cal_m6_c3 cal_m6_c4 cal_m8t24_c0 cal_m8t24_c1 prot_m12_c1 prot_m12_c2 prot_m12t24_c0 prot_m12t24_c1 prot_m3_c1 prot_m3_c2 prot_m3_c3 prot_m3_c4 prot_m3_c5 prot_m3_c6 prot_m3_c7 prot_m3_c8 prot_m6_c1 prot_m6_c2 prot_m6_c3 prot_m6_c4 prot_m8t24_c0 prot_m8t24_c1
Cebu 1 132.15714 NaN 97.08333 342.6000 9.10 95.50 85.3 315.30 NaN NaN NaN NaN 52.300 238.63333 NaN NaN 52.300 238.6333 5.3571429 NaN 4.3666667 11.300000 0.65 3.65 2.6 13.15 NaN NaN NaN NaN 2.150 9.6333333 NaN NaN 2.150 9.633333
Cebu 2 90.72857 255.6500 81.46667 240.0286 83.35 12.30 155.1 144.35 228.0 152.85 305.0 347.60 47.825 147.93333 177.9000 333.4000 47.825 219.7444 3.1857143 8.550000 2.7333333 8.171429 3.20 1.25 5.2 4.10 5.4 5.15 7.7 13.95 2.225 4.4666667 5.233333 11.866667 2.225 7.188889
Cebu 3 96.80000 658.8167 31.56667 634.4429 0.50 28.85 57.0 280.95 459.3 549.95 612.0 890.85 14.675 206.30000 519.7333 797.9000 14.675 507.9778 4.5000000 21.116667 1.6833333 21.157143 1.05 2.15 2.3 11.40 18.5 18.05 18.0 27.05 1.600 8.3666667 18.200000 24.033333 1.600 16.866667
Cebu 4 27.45714 371.7000 24.55000 325.0143 4.50 25.95 39.4 45.95 221.2 271.00 581.3 442.85 15.225 43.76667 254.4000 489.0000 15.225 262.3889 0.8714286 6.850000 0.9000000 5.971429 0.75 1.10 1.2 0.60 1.8 4.85 10.1 9.75 0.925 0.8000000 3.833333 9.866667 0.925 4.833333
Cebu 5 101.34286 1080.8500 79.15000 959.9429 14.10 143.80 71.3 161.15 452.6 1345.20 1178.1 1082.00 78.950 131.20000 1047.6667 1114.0333 78.950 764.3000 2.4000000 19.483333 2.3166667 17.114286 1.35 3.00 3.4 2.35 7.1 23.15 24.5 19.50 2.175 2.7000000 17.800000 21.166667 2.175 13.888889
Cebu 6 185.35714 521.5333 162.23333 493.3286 23.85 184.70 169.1 355.65 653.4 506.50 416.8 523.00 104.275 293.46667 555.4667 487.6000 104.275 445.5111 8.4000000 15.116667 7.3833333 15.028571 0.85 7.40 9.8 16.25 26.8 14.10 11.4 12.15 4.125 14.1000000 18.333333 11.900000 4.125 14.777778
Cebu 7 157.25714 570.9800 145.50000 513.7833 8.30 137.80 407.8 200.40 390.6 637.10 688.1 569.55 73.050 269.53333 513.8500 609.0667 73.050 457.9375 3.3000000 20.440000 2.7833333 18.100000 0.95 1.70 8.6 4.60 16.4 23.00 21.5 20.65 1.325 5.9333333 19.700000 20.933333 1.325 15.000000
Cebu 8 471.92857 844.8333 379.20000 871.0429 158.95 423.00 417.5 861.05 691.3 897.95 637.1 972.35 290.975 713.20000 829.0667 860.6000 290.975 800.9556 13.6857143 32.716667 11.0166667 32.285714 3.90 11.35 10.8 27.25 42.7 26.45 25.8 37.45 7.625 21.7666667 31.866667 33.566667 7.625 29.066667
Cebu 9 32.27143 415.2167 16.58333 373.9571 5.05 10.40 15.1 89.95 142.4 203.60 753.2 594.25 7.725 65.00000 183.2000 647.2333 7.725 298.4778 0.9571429 18.283333 0.9166667 15.842857 0.50 0.50 0.5 2.10 4.2 10.85 39.5 22.15 0.500 1.5666667 8.633333 27.933333 0.500 12.711111
Cebu 10 67.18571 395.2500 68.58333 347.1857 9.55 26.40 164.6 116.90 296.6 303.00 385.1 541.90 17.975 132.80000 300.8667 489.6333 17.975 307.7667 2.0428571 8.466667 1.9333333 7.642857 0.85 0.50 4.9 3.35 7.5 6.05 9.2 11.00 0.675 3.8666667 6.533333 10.400000 0.675 6.933333
Cebu 11 14.90000 245.3833 11.80000 215.1143 0.50 5.20 30.0 31.45 126.2 223.05 239.6 330.20 2.850 30.96667 190.7667 300.0000 2.850 173.9111 1.0285714 6.833333 1.1166667 5.928571 0.80 1.70 1.2 0.50 3.6 6.35 7.3 8.70 1.250 0.7333333 5.433333 8.233333 1.250 4.800000
Cebu 12 453.61429 745.6833 419.51667 733.1857 325.60 483.65 463.0 546.90 766.8 676.85 998.5 677.55 404.625 518.93333 706.8333 784.5333 404.625 670.1000 14.7714286 22.133333 14.1666667 21.600000 7.40 13.60 26.6 17.40 18.6 20.35 25.6 23.95 10.500 20.4666667 19.766667 24.500000 10.500 21.577778
Cebu 13 47.51429 210.2500 36.81667 196.1714 17.45 40.00 28.5 94.60 216.9 195.15 281.3 186.50 28.725 72.56667 202.4000 218.1000 28.725 164.3556 1.9571429 6.800000 1.6666667 6.357143 0.70 1.50 2.1 3.60 8.3 4.75 6.2 8.40 1.100 3.1000000 5.933333 7.666667 1.100 5.566667
Cebu 14 608.85714 924.5833 527.30000 949.3857 259.05 554.30 688.1 973.60 525.5 1039.60 800.0 1071.40 406.675 878.43333 868.2333 980.9333 406.675 909.2000 24.6714286 28.050000 23.0833333 28.928571 15.05 28.85 32.6 26.15 12.7 29.75 29.5 33.30 21.950 28.3000000 24.066667 32.033333 21.950 28.133333
Cebu 15 74.67143 440.0833 64.21667 396.8429 62.40 39.60 80.1 119.30 292.5 237.45 607.8 632.65 51.000 106.23333 255.8000 624.3667 51.000 328.8000 2.2571429 10.633333 1.6333333 9.971429 1.65 0.90 1.5 4.60 9.4 5.80 13.6 14.60 1.275 3.5666667 7.000000 14.266667 1.275 8.277778
Cebu 16 128.45714 519.9333 90.50000 496.5429 4.80 11.00 205.3 331.15 290.8 354.50 563.3 778.25 7.900 289.20000 333.2667 706.6000 7.900 443.0222 4.6857143 18.083333 4.1000000 16.671429 0.65 2.70 8.7 8.70 7.8 12.95 25.9 24.45 1.675 8.7000000 11.233333 24.933333 1.675 14.955556
Cebu 17 130.78571 718.8667 97.48333 663.4000 5.50 7.80 249.9 319.50 774.6 892.90 551.9 600.45 6.650 296.30000 853.4667 584.2667 6.650 578.0111 2.8142857 19.716667 2.2166667 17.814286 0.60 0.50 6.1 5.70 20.1 21.55 24.9 15.10 0.550 5.8333333 21.066667 18.366667 0.550 15.088889
Cebu 18 172.64000 497.9500 172.64000 497.9500 29.60 234.40 NaN 335.20 NaN NaN NaN 497.95 132.000 335.20000 NaN 497.9500 132.000 443.7000 11.7000000 19.100000 11.7000000 19.100000 2.95 17.90 NaN 16.80 NaN NaN NaN 19.10 10.425 16.8000000 NaN 19.100000 10.425 18.333333
Cebu 19 74.45714 314.7333 80.10000 275.5714 3.65 95.70 171.3 75.60 131.3 350.50 304.6 375.75 49.675 107.50000 277.4333 352.0333 49.675 245.6556 2.5428571 10.466667 2.8833333 9.042857 0.50 2.95 6.7 2.10 3.4 11.40 12.3 12.15 1.725 3.6333333 8.733333 12.200000 1.725 8.188889
Cebu 20 110.90000 583.2000 80.51667 541.7714 7.30 120.65 77.8 221.30 391.2 582.10 466.1 738.85 63.975 173.46667 518.4667 647.9333 63.975 446.6222 3.2000000 16.966667 2.1833333 15.871429 0.50 3.85 2.7 5.50 7.9 20.15 11.8 20.90 2.175 4.5666667 16.066667 17.866667 2.175 12.833333