INPUT MATRIX: There are $N$ students in a class, but only a subset of them attend on any given day. If student $id_i$ is in class on day $Q$, the teacher records the date and the student ID on a sheet. So if a student has been in class 10 times, the teacher has ten rows of records for that student, with two columns: column one is the student ID, and column two is the date on which the student was in class. If there were 50 students who each attended 10 classes on average during the semester, there would be $10 \cdot 50 = 500$ rows of data in total, with a different number of rows for each student. This long roster is the input matrix for the function.
Each row of the input matrix is assumed to be a unique id/date combination; the function keeps only unique rows.
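A minimal base-R sketch of the input structure and the uniqueness rule. The roster `df_roster_long` and its values are hypothetical, made up for illustration: one row per attendance record, with an accidental duplicate that `unique()` drops.

```r
# Hypothetical long roster: one row per (student, date) attendance record.
# The last row accidentally repeats the first record.
df_roster_long <- data.frame(
  student_id    = c(1L, 1L, 2L, 2L, 2L, 1L),
  date_in_class = c(2L, 4L, 1L, 2L, 5L, 2L)
)
# Keep only unique id/date combinations
df_roster_unique <- unique(df_roster_long)
print(nrow(df_roster_unique))  # [1] 5
# Rows per student = number of classes attended over the semester
tb_attend_total <- table(df_roster_unique$student_id)
print(tb_attend_total)
```

The per-student row counts are the totals that should appear in the last row of the cumulative wide output.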
OUTPUT MATRIX: We now want to generate a new dataframe in which each row is a date and each column is a student. The values show, on the $Q^{th}$ day, how many classes student $i$ has attended so far. This transformation is implemented as a REconTools function, and its output is the dataframe generated by the code below.
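One way to build both outputs is to pivot the long roster to wide with `tidyr::pivot_wider()` and then take column-wise running sums, treating a missing (NA) cell as zero attendance. This is a sketch assuming dplyr >= 1.0 and tidyr >= 1.0; `expand_roster_sketch()` is a hypothetical helper, not the REconTools source. Its toy data reproduce the attendance records printed later in this section.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not the REconTools implementation): long roster -> wide
expand_roster_sketch <- function(df, svr_id_t, svr_id_i, st_idcol_prefix = "id_") {
  df_wide <- df %>%
    select(all_of(c(svr_id_i, svr_id_t))) %>%
    distinct() %>%                        # keep unique id/date rows only
    mutate(attended = 1) %>%              # 1 marks attendance on that date
    arrange(.data[[svr_id_i]]) %>%        # wide columns follow sorted IDs
    pivot_wider(id_cols = all_of(svr_id_t),
                names_from = all_of(svr_id_i),
                names_prefix = st_idcol_prefix,
                values_from = attended) %>%  # absent id/date pairs become NA
    arrange(.data[[svr_id_t]])            # one row per unique date, sorted
  # Treat NA as 0, then take running sums down each individual's column
  df_cumu <- df_wide %>%
    mutate(across(-all_of(svr_id_t), ~ cumsum(coalesce(.x, 0))))
  list(df_roster_wide = df_wide, df_roster_wide_cumu = df_cumu)
}

# Toy long roster matching the attendance records printed in this section
df_long <- data.frame(student_id    = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
                      date_in_class = c(2L, 4L, 1L, 2L, 5L, 2L, 3L, 5L))
ls_res <- expand_roster_sketch(df_long, "date_in_class", "student_id", "sid_")
print(ls_res$df_roster_wide)
print(ls_res$df_roster_wide_cumu)
```

Sorting by ID before pivoting matters because `pivot_wider()` orders the new columns by first appearance by default.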
This function is useful beyond the roster example; it is also used in the optimal allocation problem. There are individual recipients of allocations, and each can receive some $Q$ units. Given the total resources available, in what sequence should allocations be given to each individual? The input dataframe has two columns: the ID of each individual, and the queue rank of a particular unit of allocation for that individual. Expanding to wide gives a new dataframe where each row is an additional unit of aggregate allocation, and each column is a different individual. The values show, at the current level of overall resources, how many units go to each individual.
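To make the allocation reading concrete, here is a base-R sketch with a hypothetical queue of five units shared by two individuals (`df_queue` and its ranks are made up for illustration). It uses `table()` and `cumsum()` instead of the REconTools function, so unallocated cells are 0 rather than NA.

```r
# Hypothetical queue: each row is one unit of allocation, given to individual
# `id` when aggregate resources reach queue position `rank`.
df_queue <- data.frame(id = c(1L, 2L, 1L, 2L, 2L), rank = c(1L, 2L, 3L, 4L, 5L))
ar_rank <- sort(unique(df_queue$rank))
ar_id <- sort(unique(df_queue$id))
# Indicator table: rows are queue ranks, columns are individuals
mt_unit <- table(factor(df_queue$rank, levels = ar_rank),
                 factor(df_queue$id, levels = ar_id))
# Running sums down each column: units held by each individual at each resource level
mt_cumu <- apply(mt_unit, 2, cumsum)
print(mt_cumu)
# With one unit per rank, each row total equals the aggregate resource level
print(rowSums(mt_cumu))
```

Because each queue rank hands out exactly one unit, summing across individuals in any row recovers the total resources allocated at that point.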
ff_panel_expand_longrosterwide(df, svr_id_t, svr_id_i, st_idcol_prefix = "id_")
svr_id_t: string, name of the time variable
svr_id_i: string, name of the individual ID variable
st_idcol_prefix: string, prefix for the wide ID columns
Returns a list of two dataframes:
df_roster_wide - a wide dataframe in which rows are unique dates and columns are individuals; a cell is 1 if the individual attended on that date and NA otherwise
df_roster_wide_cumu - a wide dataframe with the same rows and columns, in which each cell shows the individual's cumulative attendance up to and including that date
https://fanwangecon.github.io/REconTools/reference/ff_panel_expand_longrosterwide.html
https://fanwangecon.github.io/R4Econ/panel/widelong/fs_pivotwider.html
https://github.com/FanWangEcon/REconTools/blob/master/R/ff_panel_expand.R
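# Generate Input Data Structure
# Packages: tibble (as_tibble, rowid_to_column), tidyr (uncount),
# dplyr (pipe, data verbs, and the re-exported rlang sym())
library(tibble)
library(tidyr)
library(dplyr)
# Define
it_N <- 3  # number of students
it_M <- 5  # number of class dates per student
svr_id <- 'student_id'
# from : support/rand/fs_rand_draws.Rmd
# Expand to one row per student-date pair, then keep a pair when a standard
# normal draw is below 0 (probability 0.5). No seed is set, so rows vary by run.
# Naming the matrix column V1 avoids the tibble name-repair warning.
df_panel_attend_date <- as_tibble(matrix(it_M, nrow=it_N, ncol=1,
                                         dimnames=list(NULL, 'V1'))) %>%
  rowid_to_column(var = svr_id) %>%
  uncount(V1) %>%
  group_by(!!sym(svr_id)) %>% mutate(date = row_number()) %>%
  ungroup() %>% mutate(in_class = case_when(rnorm(n(),mean=0,sd=1) < 0 ~ 1, TRUE ~ 0)) %>%
  filter(in_class == 1) %>% select(!!sym(svr_id), date) %>%
  rename(date_in_class = date)
print(df_panel_attend_date)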
#> # A tibble: 8 x 2
#> student_id date_in_class
#> <int> <int>
#> 1 1 2
#> 2 1 4
#> 3 2 1
#> 4 2 2
#> 5 2 5
#> 6 3 2
#> 7 3 3
#> 8 3 5
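The dplyr pipeline above keeps each student-by-date pair with probability one half. A base-R sketch of the same kind of draw, assuming `rbinom()` Bernoulli draws as a stand-in for the `rnorm() < 0` rule and an arbitrary seed for reproducibility (so the rows will differ from the table printed above):

```r
# Hypothetical base-R sketch of the random long roster (not from the source)
set.seed(8)           # arbitrary seed so the draw is reproducible
it_N <- 3             # number of students
it_M <- 5             # number of class dates
# Every student-by-date pair is a potential attendance record
df_all <- expand.grid(student_id = seq_len(it_N), date_in_class = seq_len(it_M))
# Keep each pair with probability 0.5
df_all$in_class <- rbinom(nrow(df_all), size = 1, prob = 0.5)
df_attend <- df_all[df_all$in_class == 1, c("student_id", "date_in_class")]
# Sort by student, then date, to match the long-roster layout
df_attend <- df_attend[order(df_attend$student_id, df_attend$date_in_class), ]
rownames(df_attend) <- NULL
print(df_attend)
```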
# Parameters
df <- df_panel_attend_date
svr_id_i <- 'student_id'
svr_id_t <- 'date_in_class'
st_idcol_prefix <- 'sid_'
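# Invoke Function (ff_panel_expand_longrosterwide is from the REconTools package)
library(REconTools)
ls_df_rosterwide <- ff_panel_expand_longrosterwide(df, svr_id_t, svr_id_i, st_idcol_prefix)
df_roster_wide_func <- ls_df_rosterwide$df_roster_wide
df_roster_wide_cumu_func <- ls_df_rosterwide$df_roster_wide_cumu
# Print
print(df_roster_wide_func)
print(df_roster_wide_cumu_func)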
#> # A tibble: 5 x 4
#> date_in_class sid_1 sid_2 sid_3
#> <int> <dbl> <dbl> <dbl>
#> 1 1 NA 1 NA
#> 2 2 1 1 1
#> 3 3 NA NA 1
#> 4 4 1 NA NA
#> 5 5 NA 1 1
#> # A tibble: 5 x 4
#> date_in_class sid_1 sid_2 sid_3
#> <int> <dbl> <dbl> <dbl>
#> 1 1 0 1 0
#> 2 2 1 2 1
#> 3 3 1 2 2
#> 4 4 2 2 2
#> 5 5 2 3 3