Go to the RMD, R, PDF, or HTML version of this file. Go back to fan’s REconTools Package, R Code Examples Repository (bookdown site), or Intro Stats with R Repository (bookdown site).
There is a dataframe composed of M mini-dataframes. Group by a variable that identifies each unique sub-dataframe, and use the sub-dataframes with P rows as inputs to a function.
The function outputs Q by Z rows and columns of results, stack the results. The output file has MxQ rows and the Z columns of additional results should be appended.
M Grouping characteristics, P rows for each group, and N Variables.
# Define
it_M <- 3
it_P <- 5
svr_m <- 'group_m'
svr_mp <- 'info_mp'
# dataframe
set.seed(123)
df_panel_skeleton <- as_tibble(matrix(it_P, nrow=it_M, ncol=1)) %>%
rowid_to_column(var = svr_m) %>%
uncount(V1) %>%
group_by(!!sym(svr_m)) %>% mutate(!!sym(svr_mp) := row_number()) %>%
ungroup() %>%
rowwise() %>% mutate(wage = rnorm(1, 100, 10),
savings = rnorm(1, 200, 30)) %>%
ungroup() %>%
rowid_to_column(var = "id_ji")
# Print
kable(df_panel_skeleton) %>% kable_styling_fc()
id_ji | group_m | info_mp | wage | savings |
---|---|---|---|---|
1 | 1 | 1 | 94.39524 | 253.6074 |
2 | 1 | 2 | 97.69823 | 214.9355 |
3 | 1 | 3 | 115.58708 | 141.0015 |
4 | 1 | 4 | 100.70508 | 221.0407 |
5 | 1 | 5 | 101.29288 | 185.8163 |
6 | 2 | 1 | 117.15065 | 167.9653 |
7 | 2 | 2 | 104.60916 | 193.4608 |
8 | 2 | 3 | 87.34939 | 169.2199 |
9 | 2 | 4 | 93.13147 | 178.1333 |
10 | 2 | 5 | 95.54338 | 181.2488 |
11 | 3 | 1 | 112.24082 | 149.3992 |
12 | 3 | 2 | 103.59814 | 225.1336 |
13 | 3 | 3 | 104.00771 | 204.6012 |
14 | 3 | 4 | 101.10683 | 165.8559 |
15 | 3 | 5 | 94.44159 | 237.6144 |
Use the M sub-dataframes, generate Q by Z result for each of the M groups. Stack all results together.
Base on all the wages for each individual, generate individual specific mean and standard deviations. Do this for three things, the wage variable, the savings variable, and the sum of wage and savings:
First, here is the processing function that takes the dataframe as input, with a parameter for rounding:
# define function
ffi_subset_mean_sd <- function(df_sub, it_round=1) {
#' A function that generates mean and sd for several variables
#'
#' @description
#' Assume there are two variables in df_sub wage and savings
#'
#' @param df_sub dataframe where each individual row is a different
#' data point, over which we compute mean and sd, Assum there are two
#' variables, savings and wage
#' @param it_round integer rounding for resulting dataframe
#' @return a dataframe where each row is aggregate for a different type
#' of variablea and each column is a different statistics
fl_wage_mn = mean(df_sub$wage)
fl_wage_sd = sd(df_sub$wage)
fl_save_mn = mean(df_sub$savings)
fl_save_sd = sd(df_sub$savings)
fl_wgsv_mn = mean(df_sub$wage + df_sub$savings)
fl_wgsv_sd = sd(df_sub$wage + df_sub$savings)
ar_mn <- c(fl_wage_mn, fl_save_mn, fl_wgsv_mn)
ar_sd <- c(fl_wage_sd, fl_save_sd, fl_wgsv_sd)
ar_st_row_lab <- c('wage', 'savings', 'wage_and_savings')
mt_stats <- cbind(ar_mn, ar_sd)
mt_stats <- round(mt_stats, it_round)
ar_st_varnames <- c('mean', 'sd', 'variables')
df_combine <- as_tibble(mt_stats) %>%
add_column(ar_st_row_lab) %>%
rename_all(~c(ar_st_varnames)) %>%
select(variables, 'mean', 'sd') %>%
rowid_to_column(var = "id_q")
return(df_combine)
}
# testing function
ffi_subset_mean_sd(df_panel_skeleton %>% filter(!!sym(svr_m)==1))
Second, call ffi_subset_mean_sd function for each of the groups indexed by j and stack results together with j index:
# run group stats and stack dataframes
df_outputs <- df_panel_skeleton %>% group_by(!!sym(svr_m)) %>%
do(df_stats = ffi_subset_mean_sd(., it_round=2)) %>%
unnest() %>%
rowid_to_column(var = "id_mq")
# print
kable(df_outputs) %>% kable_styling_fc()
id_mq | group_m | id_q | variables | mean | sd |
---|---|---|---|---|---|
1 | 1 | 1 | wage | 101.94 | 8.11 |
2 | 1 | 2 | savings | 203.28 | 42.33 |
3 | 1 | 3 | wage_and_savings | 305.22 | 34.83 |
4 | 2 | 1 | wage | 99.56 | 11.63 |
5 | 2 | 2 | savings | 178.01 | 10.34 |
6 | 2 | 3 | wage_and_savings | 277.56 | 15.48 |
7 | 3 | 1 | wage | 103.08 | 6.39 |
8 | 3 | 2 | savings | 196.52 | 37.86 |
9 | 3 | 3 | wage_and_savings | 299.60 | 33.50 |
In the resulting file, we went from a matrix with MxP rows to a matrix with MxQ Rows.