1 Draw Random Rows

Go back to fan’s REconTools research support package, R4Econ examples page, PkgTestR packaging guide, or Stat4Econ course page.

1.1 Draw Random Subset of Sample

  • r random discrete

We have a sample of \(N\) individuals in a dataframe. Draw a subset of \(M<N\) rows without replacement.

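# packages: tibble (as_tibble, rowid_to_column), dplyr (%>%), knitr (kable), kableExtra
library(tibble)
library(dplyr)
library(knitr)
library(kableExtra)

# parameters, it_M < it_N
it_N <- 10
it_M <- 5

# Draw it_M indexes from 1, ..., it_N without replacement
set.seed(123)
ar_it_rand_idx <- sample(it_N, it_M, replace=FALSE)

# dataframe with it_N rows and 4 columns
# note: rnorm(4) draws only 4 values, which matrix() recycles to fill all it_N*4 cells,
# hence the repeated values in the tables below; use rnorm(it_N*4) for independent draws
df_full <- as_tibble(matrix(rnorm(4,mean=0,sd=1), nrow=it_N, ncol=4)) %>% rowid_to_column(var = "ID")

# random subset, using the pre-drawn indexes
df_rand_sub_a <- df_full[ar_it_rand_idx,]

# random subset, drawing the indexes inline
df_rand_sub_b <- df_full[sample(nrow(df_full), it_M, replace=FALSE),]

# Display; kable_styling_fc() is assumed to be a styling helper from the author's setup,
# not a kableExtra function (kableExtra::kable_styling() is the standard alternative)
kable(df_full) %>% kable_styling_fc()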
ID V1 V2 V3 V4
1 0.1292877 0.4609162 0.1292877 0.4609162
2 1.7150650 -1.2650612 1.7150650 -1.2650612
3 0.4609162 0.1292877 0.4609162 0.1292877
4 -1.2650612 1.7150650 -1.2650612 1.7150650
5 0.1292877 0.4609162 0.1292877 0.4609162
6 1.7150650 -1.2650612 1.7150650 -1.2650612
7 0.4609162 0.1292877 0.4609162 0.1292877
8 -1.2650612 1.7150650 -1.2650612 1.7150650
9 0.1292877 0.4609162 0.1292877 0.4609162
10 1.7150650 -1.2650612 1.7150650 -1.2650612
kable(df_rand_sub_a) %>% kable_styling_fc()
ID V1 V2 V3 V4
3 0.4609162 0.1292877 0.4609162 0.1292877
10 1.7150650 -1.2650612 1.7150650 -1.2650612
2 1.7150650 -1.2650612 1.7150650 -1.2650612
8 -1.2650612 1.7150650 -1.2650612 1.7150650
6 1.7150650 -1.2650612 1.7150650 -1.2650612
kable(df_rand_sub_b) %>% kable_styling_fc()
ID V1 V2 V3 V4
5 0.1292877 0.4609162 0.1292877 0.4609162
3 0.4609162 0.1292877 0.4609162 0.1292877
9 0.1292877 0.4609162 0.1292877 0.4609162
1 0.1292877 0.4609162 0.1292877 0.4609162
4 -1.2650612 1.7150650 -1.2650612 1.7150650
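The same draw can also be written with dplyr's `slice_sample()` (dplyr 1.0.0 and later). This is a minimal sketch of an alternative, not the approach used above: it fills the matrix with `it_N * 4` independent normal draws so that no values are recycled, and the names `df_full_indep` and `df_rand_sub_c` are hypothetical. Because the random-number path differs, its rows will not match the tables above.

```r
# Sketch: draw it_M of it_N rows without replacement with dplyr::slice_sample()
library(dplyr)
library(tibble)

it_N <- 10
it_M <- 5

set.seed(123)
# it_N * 4 independent standard normal draws, one per cell (no recycling)
mt_vals <- matrix(rnorm(it_N * 4, mean = 0, sd = 1), nrow = it_N, ncol = 4)
colnames(mt_vals) <- paste0("V", 1:4)
# hypothetical name, chosen so the df_full defined above is not overwritten
df_full_indep <- as_tibble(mt_vals) %>% rowid_to_column(var = "ID")

# slice_sample() samples rows without replacement by default (replace = FALSE)
df_rand_sub_c <- df_full_indep %>% slice_sample(n = it_M)

print(df_rand_sub_c)
```

To draw a share of rows rather than a count, `slice_sample(prop = 0.5)` can replace `n = it_M`.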

1.2 Random Subset of Panel

There are \(N\) individuals, each of whom could be observed \(M\) times. We keep only a random subset of the rows, so each individual is observed on only a random subset of the \(M\) dates: each individual-date row is kept independently with probability 0.5 (a standard normal draw below zero). Specifically, there are 3 students, identified by `student_id`, and the second variable records the dates, out of the 10 classes available, on which each student showed up in class.

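# packages: tibble (as_tibble, rowid_to_column), tidyr (uncount), dplyr
library(tibble)
library(tidyr)
library(dplyr)

# Define: it_N students, each with it_M potential class dates
it_N <- 3
it_M <- 10
svr_id <- 'student_id'

# dataframe
set.seed(123)
df_panel_rand <- as_tibble(matrix(it_M, nrow=it_N, ncol=1)) %>%
  # one row per student; V1 = it_M is the number of copies to make
  rowid_to_column(var = svr_id) %>%
  # expand to it_N*it_M rows: each student repeated it_M times
  uncount(V1) %>%
  # date = 1, ..., it_M within each student
  group_by(!!sym(svr_id)) %>% mutate(date = row_number()) %>%
  # in class with probability 0.5: a standard normal draw below 0
  ungroup() %>% mutate(in_class = case_when(rnorm(n(),mean=0,sd=1) < 0 ~ 1, TRUE ~ 0)) %>%
  dplyr::filter(in_class == 1) %>% select(!!sym(svr_id), date) %>%
  rename(date_in_class = date)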

# Print
kable(df_panel_rand) %>% kable_styling_fc()
student_id date_in_class
1 1
1 2
1 8
1 9
1 10
2 5
2 8
2 10
3 1
3 2
3 3
3 4
3 5
3 6
3 9
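The panel construction can also be sketched in base R, assuming only what the code above does: each student-date pair is kept independently with probability one half. `expand.grid()` builds the full balanced student-by-date panel and `runif()` drives the keep decision; the attendance probability `fl_p_attend` and the other object names are hypothetical. Since the draws come from `runif()` rather than `rnorm()`, the kept dates will not match the table above.

```r
# Sketch: random unbalanced panel in base R (independent thinning of a full panel)
it_N <- 3           # number of students
it_M <- 10          # number of classes (dates)
fl_p_attend <- 0.5  # hypothetical name; rnorm() < 0 above implies probability 0.5

set.seed(123)
# full balanced panel; expand.grid() varies its first argument fastest,
# so listing date first gives rows sorted by student, then date
df_full_panel <- expand.grid(date_in_class = 1:it_M, student_id = 1:it_N)
df_full_panel <- df_full_panel[, c("student_id", "date_in_class")]

# keep each student-date row independently with probability fl_p_attend
ar_bl_keep <- runif(nrow(df_full_panel)) < fl_p_attend
df_panel_rand_base <- df_full_panel[ar_bl_keep, ]
rownames(df_panel_rand_base) <- NULL

print(df_panel_rand_base)
# number of classes attended by each student
print(table(df_panel_rand_base$student_id))
```

A student who attends no class drops out of the result entirely, in this sketch and in the tidyverse version above alike.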