INPUT MATRIX: There are $N$ students in a class, but only a subset of them attends on any given day. If student $id_i$ is in class on day $Q$, the teacher records the date and the student ID on a sheet. So if a student has been in class 10 times, the teacher has ten rows of recorded data for that student, with two columns: column one is the student ID, and column two is the date on which the student was in class. Suppose there are 50 students who each attended 10 classes on average during the semester. Then we have \(10 \cdot 50 = 500\) rows of data, with differing numbers of rows for each student. This is the input matrix for this function.

Each row of the input matrix is assumed to be a unique id/time combination. The function keeps only unique rows.
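As a minimal sketch of this input structure, the block below builds a small long roster and drops a duplicated id/time row with `dplyr::distinct()`. The column names `student_id` and `date_in_class` and the values are hypothetical, and this only illustrates the uniqueness assumption; it is not necessarily how the function handles duplicates internally.

```r
library(dplyr)

# Hypothetical long roster: one row per (student, date) attendance record.
# The second row accidentally repeats the first.
df_long <- data.frame(
  student_id    = c(1L, 1L, 1L, 2L),
  date_in_class = c(2L, 2L, 4L, 1L)
)

# Keep only unique id/time combinations
df_long_unique <- df_long %>% distinct(student_id, date_in_class)
print(df_long_unique)
```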

OUTPUT MATRIX: Now we want to generate a new dataframe, where each row is a date, and each column is a student. The values in the new dataframe show, on the $Q^{th}$ day, how many classes student $i$ has attended so far. This REconTools function implements that expansion, and the resulting dataframe is printed in the examples below.
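The long-to-wide expansion described here can be sketched with `tidyr::pivot_wider()` followed by a column-wise `cumsum()`. This is an illustrative sketch under stated assumptions, not necessarily the function's internal implementation: the helper column `attended` and the object names are hypothetical, and the roster values are the ones printed in the Examples section.

```r
library(dplyr)
library(tidyr)

# Long roster with the same values as the printed example input
df_long <- data.frame(
  student_id    = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  date_in_class = c(2L, 4L, 1L, 2L, 5L, 2L, 3L, 5L)
)

# Step 1: one row per date, one column per student, 1 if attended, 0 otherwise
df_wide <- df_long %>%
  mutate(attended = 1) %>%
  arrange(student_id, date_in_class) %>%
  pivot_wider(
    id_cols      = date_in_class,
    names_from   = student_id,
    names_prefix = "sid_",
    values_from  = attended,
    values_fill  = 0
  ) %>%
  arrange(date_in_class)

# Step 2: cumulative attendance per student, running down the date rows
df_wide_cumu <- df_wide %>%
  mutate(across(starts_with("sid_"), cumsum))
print(df_wide_cumu)
```

Filling absent days with 0 before cumulating is what lets `cumsum()` carry the running count forward across days a student missed.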

This function is useful beyond the roster example. It is also used in the optimal allocation problem. There are individual recipients of allocations, and each can receive some $Q$ units. Given the total resources available, in what sequence should allocations be given to each individual? The input dataframe has two columns: the ID of each individual, and the queue rank of a particular allocation for that individual. Expanding to wide gives a new dataframe where each row is an additional unit of aggregate allocation available, and each column is a different individual. The values show, at the current level of overall resources, how many units of resources go to each individual.
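The allocation reading of the same expansion can be sketched in base R. The data, the column names `id` and `queue_rank`, and the object names are all hypothetical. The sketch uses `table()` and `cumsum()` rather than the function itself, purely to illustrate what the wide cumulative output means in this setting.

```r
# Hypothetical allocation queue: each row is one unit of allocation, and
# queue_rank is the order in which that unit is handed out.
df_queue <- data.frame(
  id         = c(1L, 1L, 2L, 3L, 3L),
  queue_rank = c(1L, 4L, 2L, 3L, 5L)
)

# Cross-tabulate rank by individual: 1 where the unit at that rank goes to id
mt_alloc <- table(df_queue$queue_rank, df_queue$id)

# Cumulate down the rows: units held by each individual at each resource level
mt_alloc_cumu <- apply(mt_alloc, 2, cumsum)
print(mt_alloc_cumu)
```

Each row of `mt_alloc_cumu` sums to its row index, since row $k$ describes how the first $k$ units of aggregate resources are split across individuals.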

ff_panel_expand_longrosterwide(df, svr_id_t, svr_id_i, st_idcol_prefix = "id_")

Arguments

df

dataframe long roster input, one row per unique id/time combination

svr_id_t

string time variable name

svr_id_i

string individual ID name

st_idcol_prefix

string prefix for wide id

Value

a list of two dataframes

  • df_roster_wide - a wide dataframe: rows are unique dates, columns are individuals, and cells are 1 if the individual attended that day (NA otherwise)

  • df_roster_wide_cumu - a wide dataframe: rows are unique dates, columns are individuals, and cells show cumulative attendance up to and including that date
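The two returned dataframes are related in a simple way, which the sketch below checks against the values printed in the Examples section: treating NA as 0 and cumulating each individual's column down the dates reproduces the cumulative table. The object names `df_wide`, `df_cumu`, and `ar_idcols` are hypothetical, and whether the function computes the cumulative table this way internally is an assumption.

```r
# Indicator values copied from the printed df_roster_wide example output
df_wide <- data.frame(
  date_in_class = 1:5,
  sid_1 = c(NA, 1, NA, 1, NA),
  sid_2 = c(1, 1, NA, NA, 1),
  sid_3 = c(NA, 1, 1, NA, 1)
)

# Treat NA (absent) as 0, then cumulate each student column down the dates
ar_idcols <- grep("^sid_", names(df_wide), value = TRUE)
df_cumu <- df_wide
df_cumu[ar_idcols] <- lapply(
  df_wide[ar_idcols],
  function(x) cumsum(ifelse(is.na(x), 0, x))
)
print(df_cumu)
```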

Author

Fan Wang, http://fanwangecon.github.io

Examples

library(dplyr)
library(tidyr)
library(tibble)
# Generate Input Data Structure
# Define
it_N <- 3
it_M <- 5
svr_id <- 'student_id'

# from : support/rand/fs_rand_draws.Rmd
set.seed(222)
df_panel_attend_date <- tibble(V1 = rep(it_M, it_N)) %>%
  rowid_to_column(var = svr_id) %>%
  # expand to it_M candidate dates per student
  uncount(V1) %>%
  group_by(!!sym(svr_id)) %>% mutate(date = row_number()) %>%
  # randomly draw attendance, then keep only the days attended
  ungroup() %>% mutate(in_class = case_when(rnorm(n(),mean=0,sd=1) < 0 ~ 1, TRUE ~ 0)) %>%
  filter(in_class == 1) %>% select(!!sym(svr_id), date) %>%
  rename(date_in_class = date)

print(df_panel_attend_date)
#> # A tibble: 8 x 2
#>   student_id date_in_class
#>        <int>         <int>
#> 1          1             2
#> 2          1             4
#> 3          2             1
#> 4          2             2
#> 5          2             5
#> 6          3             2
#> 7          3             3
#> 8          3             5

# Parameters
df <- df_panel_attend_date
svr_id_i <- 'student_id'
svr_id_t <- 'date_in_class'
st_idcol_prefix <- 'sid_'

# Invoke Function
ls_df_rosterwide <- ff_panel_expand_longrosterwide(df, svr_id_t, svr_id_i, st_idcol_prefix)
df_roster_wide_func <- ls_df_rosterwide$df_roster_wide
df_roster_wide_cumu_func <- ls_df_rosterwide$df_roster_wide_cumu

# Print
print(df_roster_wide_func)
#> # A tibble: 5 x 4
#>   date_in_class sid_1 sid_2 sid_3
#>           <int> <dbl> <dbl> <dbl>
#> 1             1    NA     1    NA
#> 2             2     1     1     1
#> 3             3    NA    NA     1
#> 4             4     1    NA    NA
#> 5             5    NA     1     1
print(df_roster_wide_cumu_func)
#> # A tibble: 5 x 4
#>   date_in_class sid_1 sid_2 sid_3
#>           <int> <dbl> <dbl> <dbl>
#> 1             1     0     1     0
#> 2             2     1     2     1
#> 3             3     1     2     2
#> 4             4     2     2     2
#> 5             5     2     3     3