Given data with one or more grouping variables (for example, country and village), count the number of unique individuals within each group. In addition, count the total number of observations for each variable within each group. These totals count every row, so an individual with repeated measurements contributes multiple observations.
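The counting described above can be sketched on toy data as follows. This is a minimal base R illustration, not the package's implementation. The data frame `df_toy` and its columns are hypothetical, and treating the `_n` columns as counts of non-missing values is an assumption drawn from the example output further down (where `momEdu_n` is 0 for Guatemala).

```r
# Toy long-format panel (hypothetical data): each row is one measurement.
df_toy <- data.frame(
  country = c("A", "A", "A", "A", "B", "B", "B"),
  vil_id  = c(1, 1, 1, 2, 1, 1, 1),
  indi_id = c(1, 1, 2, 3, 4, 4, 4),
  hgt     = c(50.1, 52.3, NA, 49.0, 51.2, NA, 53.8),
  stringsAsFactors = FALSE
)

# Number of unique individuals within each country/village group.
df_unique <- aggregate(
  indi_id ~ country + vil_id, data = df_toy,
  FUN = function(x) length(unique(x))
)
names(df_unique)[3] <- "unique_indi"

# Number of non-missing hgt observations within each group
# (assumption: the *_n columns count non-NA values).
# na.pass keeps rows with NA so that FUN can count them explicitly.
df_n <- aggregate(
  hgt ~ country + vil_id, data = df_toy,
  FUN = function(x) sum(!is.na(x)), na.action = na.pass
)
names(df_n)[3] <- "hgt_n"

# One row per group, with both counts side by side.
df_counts <- merge(df_unique, df_n, by = c("country", "vil_id"))
print(df_counts)
```

In group (A, 1) there are three rows but only two individuals, which is the distinction between the unique count and the per-variable totals.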

Usage

ff_summ_count_unique_by_groups(
  df,
  ar_svr_group = c("S.country", "vil.id"),
  svr_unique_identifier = "indi.id"
)

Arguments

df

dataframe. Input dataframe of interest.

ar_svr_group

array. String array of the names of the variables to group by.

svr_unique_identifier

string. Name of the variable that holds the unique key of interest (for example, the individual ID).

Value

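A dataframe (tibble) with one row per group, grouped by the variables in ar_svr_group. It contains a column with the number of unique values of the identifier (unique_indi in the examples below) and, for each remaining variable, a column suffixed _n with that variable's observation count.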

Author

Fan Wang, http://fanwangecon.github.io

Examples

df_uniques_count_by_vil <- ff_summ_count_unique_by_groups(df_hgt_wgt,
                         ar_svr_group=c('S.country', 'vil.id'),
                         svr_unique_identifier = 'indi.id')
#> `mutate_if()` ignored the following grouping variables:
#> Columns `S.country`, `vil.id`
#> Adding missing grouping variables: `S.country`, `vil.id`
print(df_uniques_count_by_vil, n=50)
#> # A tibble: 37 x 15
#> # Groups:   S.country, vil.id [37]
#>    S.country vil.id unique_indi indi.id_n svymthRound_n momEdu_n wealthIdx_n
#>    <chr>      <dbl>       <int>     <int>         <int>    <int>       <int>
#>  1 Cebu           1          66      1188          1188     1188        1188
#>  2 Cebu           2          34       612           612      612         612
#>  3 Cebu           3          90      1620          1620     1620        1620
#>  4 Cebu           4          44       792           792      792         792
#>  5 Cebu           5           4        72            72       72          72
#>  6 Cebu           6          54       972           972      972         972
#>  7 Cebu           7          73      1314          1314     1314        1314
#>  8 Cebu           8          42       756           756      756         756
#>  9 Cebu           9          75      1349          1349     1349        1349
#> 10 Cebu          10          70      1260          1260     1260        1260
#> 11 Cebu          11          65      1170          1170     1170        1170
#> 12 Cebu          12          44       792           792      792         792
#> 13 Cebu          13         138      2484          2484     2484        2484
#> 14 Cebu          14          35       630           630      630         630
#> 15 Cebu          15          39       702           702      702         702
#> 16 Cebu          16          39       702           702      702         702
#> 17 Cebu          17         114      2052          2052     2052        2052
#> 18 Cebu          18          14       252           252      252         252
#> 19 Cebu          19          13       234           234      234         234
#> 20 Cebu          20          38       684           684      684         684
#> 21 Cebu          21           8       144           144      144         144
#> 22 Cebu          22           8       144           144      144         144
#> 23 Cebu          23          12       216           216      216         216
#> 24 Cebu          24           7       126           126      126         126
#> 25 Cebu          25          14       252           252      252         252
#> 26 Cebu          26          29       522           522      522         522
#> 27 Cebu          27          30       540           540      540         540
#> 28 Cebu          28          66      1188          1188     1188        1188
#> 29 Cebu          29          16       288           288      288         288
#> 30 Cebu          30          17       306           306      306         306
#> 31 Cebu          31          25       450           450      450         450
#> 32 Cebu          32          19       342           342      342         342
#> 33 Cebu          33           7       126           126      126         126
#> 34 Guatemala      3         186      2976          2976        0        2976
#> 35 Guatemala      6         196      3136          3136        0        3136
#> 36 Guatemala      8         151      2416          2416        0        2416
#> 37 Guatemala     14         141      2256          2256        0        2256
#> # ... with 8 more variables: hgt_n <int>, wgt_n <int>, hgt0_n <int>,
#> #   wgt0_n <int>, prot_n <int>, cal_n <int>, p.A.prot_n <int>,
#> #   p.A.nProt_n <int>
df_uniques_count_by_mth <- ff_summ_count_unique_by_groups(df_hgt_wgt,
                         ar_svr_group=c('S.country', 'svymthRound'),
                         svr_unique_identifier = 'indi.id')
#> `mutate_if()` ignored the following grouping variables:
#> Columns `S.country`, `svymthRound`
#> Adding missing grouping variables: `S.country`, `svymthRound`
print(df_uniques_count_by_mth, n=50)
#> # A tibble: 34 x 15
#> # Groups:   S.country, svymthRound [34]
#>    S.country svymthRound unique_indi vil.id_n indi.id_n momEdu_n wealthIdx_n
#>    <chr>           <dbl>       <int>    <int>     <int>    <int>       <int>
#>  1 Cebu                0        1349     1349      1349     1349        1349
#>  2 Cebu                2        1349     1349      1349     1349        1349
#>  3 Cebu                4        1349     1349      1349     1349        1349
#>  4 Cebu                6        1349     1349      1349     1349        1349
#>  5 Cebu                8        1349     1349      1349     1349        1349
#>  6 Cebu               10        1349     1349      1349     1349        1349
#>  7 Cebu               12        1349     1349      1349     1349        1349
#>  8 Cebu               14        1349     1349      1349     1349        1349
#>  9 Cebu               16        1349     1349      1349     1349        1349
#> 10 Cebu               18        1349     1349      1349     1349        1349
#> 11 Cebu               20        1349     1349      1349     1349        1349
#> 12 Cebu               22        1349     1349      1349     1349        1349
#> 13 Cebu               24        1349     1349      1349     1349        1349
#> 14 Cebu              102        1349     1349      1349     1349        1349
#> 15 Cebu              138        1349     1349      1349     1349        1349
#> 16 Cebu              187        1349     1349      1349     1349        1349
#> 17 Cebu              224        1348     1348      1348     1348        1348
#> 18 Cebu              258        1349     1349      1349     1349        1349
#> 19 Guatemala           0         674      674       674        0         674
#> 20 Guatemala           3         674      674       674        0         674
#> 21 Guatemala           6         674      674       674        0         674
#> 22 Guatemala           9         674      674       674        0         674
#> 23 Guatemala          12         674      674       674        0         674
#> 24 Guatemala          15         674      674       674        0         674
#> 25 Guatemala          18         674      674       674        0         674
#> 26 Guatemala          21         674      674       674        0         674
#> 27 Guatemala          24         674      674       674        0         674
#> 28 Guatemala          30         674      674       674        0         674
#> 29 Guatemala          36         674      674       674        0         674
#> 30 Guatemala          42         674      674       674        0         674
#> 31 Guatemala          48         674      674       674        0         674
#> 32 Guatemala          60         674      674       674        0         674
#> 33 Guatemala          72         674      674       674        0         674
#> 34 Guatemala          84         674      674       674        0         674
#> # ... with 8 more variables: hgt_n <int>, wgt_n <int>, hgt0_n <int>,
#> #   wgt0_n <int>, prot_n <int>, cal_n <int>, p.A.prot_n <int>,
#> #   p.A.nProt_n <int>
df_uniques_count_by_country <- ff_summ_count_unique_by_groups(df_hgt_wgt,
                         ar_svr_group=c('S.country'),
                         svr_unique_identifier = 'indi.id')
#> `mutate_if()` ignored the following grouping variables:
#> Column `S.country`
#> Adding missing grouping variables: `S.country`
print(df_uniques_count_by_country)
#> # A tibble: 2 x 15
#> # Groups:   S.country [2]
#>   S.country unique_indi vil.id_n indi.id_n svymthRound_n momEdu_n wealthIdx_n
#>   <chr>           <int>    <int>     <int>         <int>    <int>       <int>
#> 1 Cebu             1349    24281     24281         24281    24281       24281
#> 2 Guatemala         674    10784     10784         10784        0       10784
#> # ... with 8 more variables: hgt_n <int>, wgt_n <int>, hgt0_n <int>,
#> #   wgt0_n <int>, prot_n <int>, cal_n <int>, p.A.prot_n <int>,
#> #   p.A.nProt_n <int>
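The three outputs above can be cross-checked against one another. The sketch below only re-uses numbers printed above; the variable names are made up for illustration, and it assumes each individual belongs to exactly one village, so that village-level unique counts can be summed.

```r
# Unique individuals per village, copied from the village-level output above.
ar_it_cebu_vil <- c(66, 34, 90, 44, 4, 54, 73, 42, 75, 70, 65, 44, 138, 35,
                    39, 39, 114, 14, 13, 38, 8, 8, 12, 7, 14, 29, 30, 66, 16,
                    17, 25, 19, 7)
ar_it_guat_vil <- c(186, 196, 151, 141)

# Summing over villages should recover the country-level unique counts.
it_cebu_indi <- sum(ar_it_cebu_vil)
it_guat_indi <- sum(ar_it_guat_vil)

# Summing indi.id_n over survey rounds should recover the country-level
# row counts: Cebu has 17 rounds with 1349 rows and one (round 224) with 1348;
# Guatemala has 16 rounds with 674 rows each.
it_cebu_rows <- 17 * 1349 + 1348
it_guat_rows <- 16 * 674

print(c(it_cebu_indi, it_guat_indi, it_cebu_rows, it_guat_rows))
```

The single missing Cebu row in round 224 also shows up in the village-level output: village 9 has 75 individuals but 1349 rows rather than 75 * 18 = 1350.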