Go to the RMD, R, PDF, or HTML version of this file. Go back to fan’s REconTools Package, R Code Examples Repository (bookdown site), or Intro Stats with R Repository (bookdown site).
By Multiple within Individual Groups Variables, Averages for All Numeric Variables within All Groups of All Group Variables (Long to very Wide). Suppose you have an individual level final outcome. The individual is observed for N periods, where each period the inputs differ. What inputs impacted the final outcome?
Suppose we can divide N periods in which the individual is in the data into a number of years, a number of semi-years, a number of quarters, or uneven-staggered lengths. We might want to generate averages across individuals and within each of these different possible groups averages of inputs.
Then we want to version of the data where each row is an individual, one of the variables is the final outcome, and the other variables are these different averages: averages for the 1st, 2nd, 3rd year in which indivdiual is in data, averages for 1st, …, final quarter in which indivdiual is in data.
This function takes as inputs:
the function output is a dataframe:
# Data Function
# https://fanwangecon.github.io/R4Econ/summarize/summ/ByGroupsSummWide.html
f.by.groups.summ.wide <- function(df.groups.to.average,
vars.not.groups2avg,
vars.indi.grp = c('S.country','ID'),
display=TRUE) {
# 1. generate categoricals for full year (m.12), half year (m.6), quarter year (m.4)
# 2. generate categoricals also for uneven years (m12t14) using
# stagger (+2 rather than -1)
# 3. reshape wide to long, so that all categorical date groups appear in var=value,
# and categories in var=variable
# 4. calculate mean for all numeric variables for all date groups
# 5. combine date categorical variable and value, single var:
# m.12.c1= first year average from m.12 averaging
#### #### #### #### #######
# Step 1
#### #### #### #### #######
# 1. generate categoricals for full year (m.12), half year (m.6), quarter year (m.4)
# 2. generate categoricals also for uneven years (m12t14) using stagger
# (+2 rather than -1)
#### #### #### #### #######
# S2: reshape wide to long, so that all categorical date groups appear in var=value,
# and categories in var=variable; calculate mean for all
# numeric variables for all date groups
#### #### #### #### #######
df.avg.long <- df.groups.to.average %>%
gather(variable, value, -one_of(c(vars.indi.grp,
vars.not.groups2avg))) %>%
group_by(!!!syms(vars.indi.grp), variable, value) %>%
summarise_if(is.numeric, funs(mean(., na.rm = TRUE)))
if (display){
dim(df.avg.long)
options(repr.matrix.max.rows=10, repr.matrix.max.cols=20)
print(df.avg.long)
}
#### #### #### #### #######
# S3 combine date categorical variable and value, single var:
# m.12.c1= first year average from m.12 averaging; to do this make
# data even longer first
#### #### #### #### #######
# We already have the averages, but we want them to show up as variables,
# mean for each group of each variable.
df.avg.allvars.wide <- df.avg.long %>%
ungroup() %>%
mutate(all_m_cate = paste0(variable, '_c', value)) %>%
select(all_m_cate, everything(), -variable, -value) %>%
gather(variable, value, -one_of(vars.indi.grp), -all_m_cate) %>%
unite('var_mcate', variable, all_m_cate) %>%
spread(var_mcate, value)
if (display){
dim(df.avg.allvars.wide)
options(repr.matrix.max.rows=10, repr.matrix.max.cols=10)
print(df.avg.allvars.wide)
}
return(df.avg.allvars.wide)
}
In our sample dataset, the number of nutrition/height/income etc information observed within each country and month of age group are different. We have a panel dataset for children observed over different months of age.
We have two key grouping variables: 1. country: data are observed for guatemala and cebu 2. month-age (survey month round=svymthRound): different months of age at which each individual child is observed
A child could be observed for many months, or just a few months. A child’s height information could be observed for more months-of-age than nutritional intake information. We eventually want to run regressions where the outcome is height/weight and the input is nutrition. The regressions will be at the month-of-age level. We need to know how many times different variables are observed at the month-of-age level.
# Library
library(tidyverse)
# Load Sample Data
setwd('C:/Users/fan/R4Econ/_data/')
df <- read_csv('height_weight.csv')
In the data, children are observed for different number of months since birth. We want to calculate quarterly, semi-year, annual, etc average nutritional intakes. First generate these within-individual grouping variables. We can also generate uneven-staggered calendar groups as shown below.
mth.var <- 'svymthRound'
df.groups.to.average<- df %>%
filter(!!sym(mth.var) >= 0 & !!sym(mth.var) <= 24) %>%
mutate(m12t24=(floor((!!sym(mth.var) - 12) %/% 14) + 1),
m8t24=(floor((!!sym(mth.var) - 8) %/% 18) + 1),
m12 = pmax((floor((!!sym(mth.var)-1) %/% 12) + 1), 1),
m6 = pmax((floor((!!sym(mth.var)-1) %/% 6) + 1), 1),
m3 = pmax((floor((!!sym(mth.var)-1) %/% 3) + 1), 1))
# Show Results
options(repr.matrix.max.rows=30, repr.matrix.max.cols=20)
vars.arrange <- c('S.country','indi.id','svymthRound')
vars.groups.within.indi <- c('m12t24', 'm8t24', 'm12', 'm6', 'm3')
as.tibble(df.groups.to.average %>%
group_by(!!!syms(vars.arrange)) %>%
arrange(!!!syms(vars.arrange)) %>%
select(!!!syms(vars.arrange), !!!syms(vars.groups.within.indi)))
With the within-group averages created, we can generate averages for all variables within these groups.
vars.not.groups2avg <- c('prot', 'cal')
vars.indi.grp <- c('S.country', 'indi.id')
vars.groups.within.indi <- c('m12t24', 'm8t24', 'm12', 'm6', 'm3')
df.groups.to.average.select <- df.groups.to.average %>%
select(one_of(c(vars.indi.grp,
vars.not.groups2avg,
vars.groups.within.indi)))
df.avg.allvars.wide <- f.by.groups.summ.wide(df.groups.to.average.select,
vars.not.groups2avg,
vars.indi.grp, display=FALSE)
This is the tabular version of results
dim(df.avg.allvars.wide)
## [1] 2023 38
names(df.avg.allvars.wide)
## [1] "S.country" "indi.id" "cal_m12_c1" "cal_m12_c2" "cal_m12t24_c0" "cal_m12t24_c1" "cal_m3_c1" "cal_m3_c2"
## [9] "cal_m3_c3" "cal_m3_c4" "cal_m3_c5" "cal_m3_c6" "cal_m3_c7" "cal_m3_c8" "cal_m6_c1" "cal_m6_c2"
## [17] "cal_m6_c3" "cal_m6_c4" "cal_m8t24_c0" "cal_m8t24_c1" "prot_m12_c1" "prot_m12_c2" "prot_m12t24_c0" "prot_m12t24_c1"
## [25] "prot_m3_c1" "prot_m3_c2" "prot_m3_c3" "prot_m3_c4" "prot_m3_c5" "prot_m3_c6" "prot_m3_c7" "prot_m3_c8"
## [33] "prot_m6_c1" "prot_m6_c2" "prot_m6_c3" "prot_m6_c4" "prot_m8t24_c0" "prot_m8t24_c1"
df.avg.allvars.wide[1:20,] %>% kable() %>% kable_styling_fc_wide()
S.country | indi.id | cal_m12_c1 | cal_m12_c2 | cal_m12t24_c0 | cal_m12t24_c1 | cal_m3_c1 | cal_m3_c2 | cal_m3_c3 | cal_m3_c4 | cal_m3_c5 | cal_m3_c6 | cal_m3_c7 | cal_m3_c8 | cal_m6_c1 | cal_m6_c2 | cal_m6_c3 | cal_m6_c4 | cal_m8t24_c0 | cal_m8t24_c1 | prot_m12_c1 | prot_m12_c2 | prot_m12t24_c0 | prot_m12t24_c1 | prot_m3_c1 | prot_m3_c2 | prot_m3_c3 | prot_m3_c4 | prot_m3_c5 | prot_m3_c6 | prot_m3_c7 | prot_m3_c8 | prot_m6_c1 | prot_m6_c2 | prot_m6_c3 | prot_m6_c4 | prot_m8t24_c0 | prot_m8t24_c1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Cebu | 1 | 132.15714 | NaN | 97.08333 | 342.6000 | 9.10 | 95.50 | 85.3 | 315.30 | NaN | NaN | NaN | NaN | 52.300 | 238.63333 | NaN | NaN | 52.300 | 238.6333 | 5.3571429 | NaN | 4.3666667 | 11.300000 | 0.65 | 3.65 | 2.6 | 13.15 | NaN | NaN | NaN | NaN | 2.150 | 9.6333333 | NaN | NaN | 2.150 | 9.633333 |
Cebu | 2 | 90.72857 | 255.6500 | 81.46667 | 240.0286 | 83.35 | 12.30 | 155.1 | 144.35 | 228.0 | 152.85 | 305.0 | 347.60 | 47.825 | 147.93333 | 177.9000 | 333.4000 | 47.825 | 219.7444 | 3.1857143 | 8.550000 | 2.7333333 | 8.171429 | 3.20 | 1.25 | 5.2 | 4.10 | 5.4 | 5.15 | 7.7 | 13.95 | 2.225 | 4.4666667 | 5.233333 | 11.866667 | 2.225 | 7.188889 |
Cebu | 3 | 96.80000 | 658.8167 | 31.56667 | 634.4429 | 0.50 | 28.85 | 57.0 | 280.95 | 459.3 | 549.95 | 612.0 | 890.85 | 14.675 | 206.30000 | 519.7333 | 797.9000 | 14.675 | 507.9778 | 4.5000000 | 21.116667 | 1.6833333 | 21.157143 | 1.05 | 2.15 | 2.3 | 11.40 | 18.5 | 18.05 | 18.0 | 27.05 | 1.600 | 8.3666667 | 18.200000 | 24.033333 | 1.600 | 16.866667 |
Cebu | 4 | 27.45714 | 371.7000 | 24.55000 | 325.0143 | 4.50 | 25.95 | 39.4 | 45.95 | 221.2 | 271.00 | 581.3 | 442.85 | 15.225 | 43.76667 | 254.4000 | 489.0000 | 15.225 | 262.3889 | 0.8714286 | 6.850000 | 0.9000000 | 5.971429 | 0.75 | 1.10 | 1.2 | 0.60 | 1.8 | 4.85 | 10.1 | 9.75 | 0.925 | 0.8000000 | 3.833333 | 9.866667 | 0.925 | 4.833333 |
Cebu | 5 | 101.34286 | 1080.8500 | 79.15000 | 959.9429 | 14.10 | 143.80 | 71.3 | 161.15 | 452.6 | 1345.20 | 1178.1 | 1082.00 | 78.950 | 131.20000 | 1047.6667 | 1114.0333 | 78.950 | 764.3000 | 2.4000000 | 19.483333 | 2.3166667 | 17.114286 | 1.35 | 3.00 | 3.4 | 2.35 | 7.1 | 23.15 | 24.5 | 19.50 | 2.175 | 2.7000000 | 17.800000 | 21.166667 | 2.175 | 13.888889 |
Cebu | 6 | 185.35714 | 521.5333 | 162.23333 | 493.3286 | 23.85 | 184.70 | 169.1 | 355.65 | 653.4 | 506.50 | 416.8 | 523.00 | 104.275 | 293.46667 | 555.4667 | 487.6000 | 104.275 | 445.5111 | 8.4000000 | 15.116667 | 7.3833333 | 15.028571 | 0.85 | 7.40 | 9.8 | 16.25 | 26.8 | 14.10 | 11.4 | 12.15 | 4.125 | 14.1000000 | 18.333333 | 11.900000 | 4.125 | 14.777778 |
Cebu | 7 | 157.25714 | 570.9800 | 145.50000 | 513.7833 | 8.30 | 137.80 | 407.8 | 200.40 | 390.6 | 637.10 | 688.1 | 569.55 | 73.050 | 269.53333 | 513.8500 | 609.0667 | 73.050 | 457.9375 | 3.3000000 | 20.440000 | 2.7833333 | 18.100000 | 0.95 | 1.70 | 8.6 | 4.60 | 16.4 | 23.00 | 21.5 | 20.65 | 1.325 | 5.9333333 | 19.700000 | 20.933333 | 1.325 | 15.000000 |
Cebu | 8 | 471.92857 | 844.8333 | 379.20000 | 871.0429 | 158.95 | 423.00 | 417.5 | 861.05 | 691.3 | 897.95 | 637.1 | 972.35 | 290.975 | 713.20000 | 829.0667 | 860.6000 | 290.975 | 800.9556 | 13.6857143 | 32.716667 | 11.0166667 | 32.285714 | 3.90 | 11.35 | 10.8 | 27.25 | 42.7 | 26.45 | 25.8 | 37.45 | 7.625 | 21.7666667 | 31.866667 | 33.566667 | 7.625 | 29.066667 |
Cebu | 9 | 32.27143 | 415.2167 | 16.58333 | 373.9571 | 5.05 | 10.40 | 15.1 | 89.95 | 142.4 | 203.60 | 753.2 | 594.25 | 7.725 | 65.00000 | 183.2000 | 647.2333 | 7.725 | 298.4778 | 0.9571429 | 18.283333 | 0.9166667 | 15.842857 | 0.50 | 0.50 | 0.5 | 2.10 | 4.2 | 10.85 | 39.5 | 22.15 | 0.500 | 1.5666667 | 8.633333 | 27.933333 | 0.500 | 12.711111 |
Cebu | 10 | 67.18571 | 395.2500 | 68.58333 | 347.1857 | 9.55 | 26.40 | 164.6 | 116.90 | 296.6 | 303.00 | 385.1 | 541.90 | 17.975 | 132.80000 | 300.8667 | 489.6333 | 17.975 | 307.7667 | 2.0428571 | 8.466667 | 1.9333333 | 7.642857 | 0.85 | 0.50 | 4.9 | 3.35 | 7.5 | 6.05 | 9.2 | 11.00 | 0.675 | 3.8666667 | 6.533333 | 10.400000 | 0.675 | 6.933333 |
Cebu | 11 | 14.90000 | 245.3833 | 11.80000 | 215.1143 | 0.50 | 5.20 | 30.0 | 31.45 | 126.2 | 223.05 | 239.6 | 330.20 | 2.850 | 30.96667 | 190.7667 | 300.0000 | 2.850 | 173.9111 | 1.0285714 | 6.833333 | 1.1166667 | 5.928571 | 0.80 | 1.70 | 1.2 | 0.50 | 3.6 | 6.35 | 7.3 | 8.70 | 1.250 | 0.7333333 | 5.433333 | 8.233333 | 1.250 | 4.800000 |
Cebu | 12 | 453.61429 | 745.6833 | 419.51667 | 733.1857 | 325.60 | 483.65 | 463.0 | 546.90 | 766.8 | 676.85 | 998.5 | 677.55 | 404.625 | 518.93333 | 706.8333 | 784.5333 | 404.625 | 670.1000 | 14.7714286 | 22.133333 | 14.1666667 | 21.600000 | 7.40 | 13.60 | 26.6 | 17.40 | 18.6 | 20.35 | 25.6 | 23.95 | 10.500 | 20.4666667 | 19.766667 | 24.500000 | 10.500 | 21.577778 |
Cebu | 13 | 47.51429 | 210.2500 | 36.81667 | 196.1714 | 17.45 | 40.00 | 28.5 | 94.60 | 216.9 | 195.15 | 281.3 | 186.50 | 28.725 | 72.56667 | 202.4000 | 218.1000 | 28.725 | 164.3556 | 1.9571429 | 6.800000 | 1.6666667 | 6.357143 | 0.70 | 1.50 | 2.1 | 3.60 | 8.3 | 4.75 | 6.2 | 8.40 | 1.100 | 3.1000000 | 5.933333 | 7.666667 | 1.100 | 5.566667 |
Cebu | 14 | 608.85714 | 924.5833 | 527.30000 | 949.3857 | 259.05 | 554.30 | 688.1 | 973.60 | 525.5 | 1039.60 | 800.0 | 1071.40 | 406.675 | 878.43333 | 868.2333 | 980.9333 | 406.675 | 909.2000 | 24.6714286 | 28.050000 | 23.0833333 | 28.928571 | 15.05 | 28.85 | 32.6 | 26.15 | 12.7 | 29.75 | 29.5 | 33.30 | 21.950 | 28.3000000 | 24.066667 | 32.033333 | 21.950 | 28.133333 |
Cebu | 15 | 74.67143 | 440.0833 | 64.21667 | 396.8429 | 62.40 | 39.60 | 80.1 | 119.30 | 292.5 | 237.45 | 607.8 | 632.65 | 51.000 | 106.23333 | 255.8000 | 624.3667 | 51.000 | 328.8000 | 2.2571429 | 10.633333 | 1.6333333 | 9.971429 | 1.65 | 0.90 | 1.5 | 4.60 | 9.4 | 5.80 | 13.6 | 14.60 | 1.275 | 3.5666667 | 7.000000 | 14.266667 | 1.275 | 8.277778 |
Cebu | 16 | 128.45714 | 519.9333 | 90.50000 | 496.5429 | 4.80 | 11.00 | 205.3 | 331.15 | 290.8 | 354.50 | 563.3 | 778.25 | 7.900 | 289.20000 | 333.2667 | 706.6000 | 7.900 | 443.0222 | 4.6857143 | 18.083333 | 4.1000000 | 16.671429 | 0.65 | 2.70 | 8.7 | 8.70 | 7.8 | 12.95 | 25.9 | 24.45 | 1.675 | 8.7000000 | 11.233333 | 24.933333 | 1.675 | 14.955556 |
Cebu | 17 | 130.78571 | 718.8667 | 97.48333 | 663.4000 | 5.50 | 7.80 | 249.9 | 319.50 | 774.6 | 892.90 | 551.9 | 600.45 | 6.650 | 296.30000 | 853.4667 | 584.2667 | 6.650 | 578.0111 | 2.8142857 | 19.716667 | 2.2166667 | 17.814286 | 0.60 | 0.50 | 6.1 | 5.70 | 20.1 | 21.55 | 24.9 | 15.10 | 0.550 | 5.8333333 | 21.066667 | 18.366667 | 0.550 | 15.088889 |
Cebu | 18 | 172.64000 | 497.9500 | 172.64000 | 497.9500 | 29.60 | 234.40 | NaN | 335.20 | NaN | NaN | NaN | 497.95 | 132.000 | 335.20000 | NaN | 497.9500 | 132.000 | 443.7000 | 11.7000000 | 19.100000 | 11.7000000 | 19.100000 | 2.95 | 17.90 | NaN | 16.80 | NaN | NaN | NaN | 19.10 | 10.425 | 16.8000000 | NaN | 19.100000 | 10.425 | 18.333333 |
Cebu | 19 | 74.45714 | 314.7333 | 80.10000 | 275.5714 | 3.65 | 95.70 | 171.3 | 75.60 | 131.3 | 350.50 | 304.6 | 375.75 | 49.675 | 107.50000 | 277.4333 | 352.0333 | 49.675 | 245.6556 | 2.5428571 | 10.466667 | 2.8833333 | 9.042857 | 0.50 | 2.95 | 6.7 | 2.10 | 3.4 | 11.40 | 12.3 | 12.15 | 1.725 | 3.6333333 | 8.733333 | 12.200000 | 1.725 | 8.188889 |
Cebu | 20 | 110.90000 | 583.2000 | 80.51667 | 541.7714 | 7.30 | 120.65 | 77.8 | 221.30 | 391.2 | 582.10 | 466.1 | 738.85 | 63.975 | 173.46667 | 518.4667 | 647.9333 | 63.975 | 446.6222 | 3.2000000 | 16.966667 | 2.1833333 | 15.871429 | 0.50 | 3.85 | 2.7 | 5.50 | 7.9 | 20.15 | 11.8 | 20.90 | 2.175 | 4.5666667 | 16.066667 | 17.866667 | 2.175 | 12.833333 |