R/ff_summ_count.R
ff_summ_count_unique_by_groups.Rd
Given one or more grouping variables (for example country and village), count the number of unique individuals within each group. In addition, report the total number of observations for each variable within each country/village group; these totals include repeated observations of the same individual.
ff_summ_count_unique_by_groups(
  df,
  ar_svr_group = c("S.country", "vil.id"),
  svr_unique_identifier = "indi.id"
)
df: dataframe, the input dataframe of interest
ar_svr_group: array, string array of variables to group by
svr_unique_identifier: string, name of the variable that holds the unique key of interest
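A dataframe with one row per group, containing the number of unique identifiers (unique_indi) and, for each non-grouping variable, an observation count in a column suffixed _n.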
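The shape of the returned dataframe can be illustrated with a minimal dplyr sketch. This is not the package's implementation; it assumes that the `_n` columns count non-missing values, which the zero `momEdu_n` entries for Guatemala in the examples suggest. The toy data `df_toy` and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy panel: three individuals observed twice each,
# with one missing height measurement.
df_toy <- tibble(
  country = c("A", "A", "A", "A", "B", "B"),
  indi_id = c(1, 1, 2, 2, 3, 3),
  hgt     = c(50, 55, NA, 60, 48, 52)
)

# Assumption: per group, count distinct identifiers and the number of
# non-missing observations of each listed variable.
df_counts <- df_toy %>%
  group_by(country) %>%
  summarise(
    unique_indi = n_distinct(indi_id),
    across(c(indi_id, hgt), ~ sum(!is.na(.x)), .names = "{.col}_n"),
    .groups = "drop"
  )

print(df_counts)
```

For group A this gives 2 unique individuals, 4 identifier observations, and 3 height observations, because one height value is missing.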
https://fanwangecon.github.io/REconTools/reference/ff_summ_count_unique_by_groups.html https://github.com/FanWangEcon/REconTools/blob/master/R/ff_summ_count.R
df_uniques_count_by_vil <- ff_summ_count_unique_by_groups(df_hgt_wgt,
  ar_svr_group = c('S.country', 'vil.id'),
  svr_unique_identifier = 'indi.id')
#> `mutate_if()` ignored the following grouping variables:
#> Columns `S.country`, `vil.id`
#> Adding missing grouping variables: `S.country`, `vil.id`
print(df_uniques_count_by_vil, n=50)
#> # A tibble: 37 x 15
#> # Groups: S.country, vil.id [37]
#> S.country vil.id unique_indi indi.id_n svymthRound_n momEdu_n wealthIdx_n
#> <chr> <dbl> <int> <int> <int> <int> <int>
#> 1 Cebu 1 66 1188 1188 1188 1188
#> 2 Cebu 2 34 612 612 612 612
#> 3 Cebu 3 90 1620 1620 1620 1620
#> 4 Cebu 4 44 792 792 792 792
#> 5 Cebu 5 4 72 72 72 72
#> 6 Cebu 6 54 972 972 972 972
#> 7 Cebu 7 73 1314 1314 1314 1314
#> 8 Cebu 8 42 756 756 756 756
#> 9 Cebu 9 75 1349 1349 1349 1349
#> 10 Cebu 10 70 1260 1260 1260 1260
#> 11 Cebu 11 65 1170 1170 1170 1170
#> 12 Cebu 12 44 792 792 792 792
#> 13 Cebu 13 138 2484 2484 2484 2484
#> 14 Cebu 14 35 630 630 630 630
#> 15 Cebu 15 39 702 702 702 702
#> 16 Cebu 16 39 702 702 702 702
#> 17 Cebu 17 114 2052 2052 2052 2052
#> 18 Cebu 18 14 252 252 252 252
#> 19 Cebu 19 13 234 234 234 234
#> 20 Cebu 20 38 684 684 684 684
#> 21 Cebu 21 8 144 144 144 144
#> 22 Cebu 22 8 144 144 144 144
#> 23 Cebu 23 12 216 216 216 216
#> 24 Cebu 24 7 126 126 126 126
#> 25 Cebu 25 14 252 252 252 252
#> 26 Cebu 26 29 522 522 522 522
#> 27 Cebu 27 30 540 540 540 540
#> 28 Cebu 28 66 1188 1188 1188 1188
#> 29 Cebu 29 16 288 288 288 288
#> 30 Cebu 30 17 306 306 306 306
#> 31 Cebu 31 25 450 450 450 450
#> 32 Cebu 32 19 342 342 342 342
#> 33 Cebu 33 7 126 126 126 126
#> 34 Guatemala 3 186 2976 2976 0 2976
#> 35 Guatemala 6 196 3136 3136 0 3136
#> 36 Guatemala 8 151 2416 2416 0 2416
#> 37 Guatemala 14 141 2256 2256 0 2256
#> # ... with 8 more variables: hgt_n <int>, wgt_n <int>, hgt0_n <int>,
#> # wgt0_n <int>, prot_n <int>, cal_n <int>, p.A.prot_n <int>,
#> # p.A.nProt_n <int>
df_uniques_count_by_mth <- ff_summ_count_unique_by_groups(df_hgt_wgt,
  ar_svr_group = c('S.country', 'svymthRound'),
  svr_unique_identifier = 'indi.id')
#> `mutate_if()` ignored the following grouping variables:
#> Columns `S.country`, `svymthRound`
#> Adding missing grouping variables: `S.country`, `svymthRound`
print(df_uniques_count_by_mth, n=50)
#> # A tibble: 34 x 15
#> # Groups: S.country, svymthRound [34]
#> S.country svymthRound unique_indi vil.id_n indi.id_n momEdu_n wealthIdx_n
#> <chr> <dbl> <int> <int> <int> <int> <int>
#> 1 Cebu 0 1349 1349 1349 1349 1349
#> 2 Cebu 2 1349 1349 1349 1349 1349
#> 3 Cebu 4 1349 1349 1349 1349 1349
#> 4 Cebu 6 1349 1349 1349 1349 1349
#> 5 Cebu 8 1349 1349 1349 1349 1349
#> 6 Cebu 10 1349 1349 1349 1349 1349
#> 7 Cebu 12 1349 1349 1349 1349 1349
#> 8 Cebu 14 1349 1349 1349 1349 1349
#> 9 Cebu 16 1349 1349 1349 1349 1349
#> 10 Cebu 18 1349 1349 1349 1349 1349
#> 11 Cebu 20 1349 1349 1349 1349 1349
#> 12 Cebu 22 1349 1349 1349 1349 1349
#> 13 Cebu 24 1349 1349 1349 1349 1349
#> 14 Cebu 102 1349 1349 1349 1349 1349
#> 15 Cebu 138 1349 1349 1349 1349 1349
#> 16 Cebu 187 1349 1349 1349 1349 1349
#> 17 Cebu 224 1348 1348 1348 1348 1348
#> 18 Cebu 258 1349 1349 1349 1349 1349
#> 19 Guatemala 0 674 674 674 0 674
#> 20 Guatemala 3 674 674 674 0 674
#> 21 Guatemala 6 674 674 674 0 674
#> 22 Guatemala 9 674 674 674 0 674
#> 23 Guatemala 12 674 674 674 0 674
#> 24 Guatemala 15 674 674 674 0 674
#> 25 Guatemala 18 674 674 674 0 674
#> 26 Guatemala 21 674 674 674 0 674
#> 27 Guatemala 24 674 674 674 0 674
#> 28 Guatemala 30 674 674 674 0 674
#> 29 Guatemala 36 674 674 674 0 674
#> 30 Guatemala 42 674 674 674 0 674
#> 31 Guatemala 48 674 674 674 0 674
#> 32 Guatemala 60 674 674 674 0 674
#> 33 Guatemala 72 674 674 674 0 674
#> 34 Guatemala 84 674 674 674 0 674
#> # ... with 8 more variables: hgt_n <int>, wgt_n <int>, hgt0_n <int>,
#> # wgt0_n <int>, prot_n <int>, cal_n <int>, p.A.prot_n <int>,
#> # p.A.nProt_n <int>
df_uniques_count_by_country <- ff_summ_count_unique_by_groups(df_hgt_wgt,
  ar_svr_group = c('S.country'),
  svr_unique_identifier = 'indi.id')
#> `mutate_if()` ignored the following grouping variables:
#> Column `S.country`
#> Adding missing grouping variables: `S.country`
print(df_uniques_count_by_country)
#> # A tibble: 2 x 15
#> # Groups: S.country [2]
#> S.country unique_indi vil.id_n indi.id_n svymthRound_n momEdu_n wealthIdx_n
#> <chr> <int> <int> <int> <int> <int> <int>
#> 1 Cebu 1349 24281 24281 24281 24281 24281
#> 2 Guatemala 674 10784 10784 10784 0 10784
#> # ... with 8 more variables: hgt_n <int>, wgt_n <int>, hgt0_n <int>,
#> # wgt0_n <int>, prot_n <int>, cal_n <int>, p.A.prot_n <int>,
#> # p.A.nProt_n <int>
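One way to read the country-level output is to divide the total identifier observations by the number of unique individuals, which gives the average number of observations per individual. The sketch below re-enters the values printed above by hand; the column name `obs_per_indi` is a hypothetical addition, not something the function returns.

```r
# Values copied from the country-level output above.
df_country <- data.frame(
  S.country   = c("Cebu", "Guatemala"),
  unique_indi = c(1349, 674),
  indi.id_n   = c(24281, 10784)
)

# Hypothetical derived column: average observations per unique individual.
df_country$obs_per_indi <- df_country$indi.id_n / df_country$unique_indi

print(df_country)
```

Guatemala works out to exactly 16 observations per individual, matching the 16 survey rounds listed in the by-month output, while Cebu comes to just under 18, consistent with one round (224) having 1348 rather than 1349 individuals.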