flowchart TD
subgraph inputs [Inputs]
whitem[data-raw/whitem160aggregate_wthhhkey_loanamount.rda]
loans[data/tstm_loans.rda]
end
subgraph step0 [Section0_Loan_amounts]
loans --> loanAmt[tstm_loans_amount]
end
subgraph step1 [Section1_Load_and_select]
whitem --> sel[tstm_asset_sel]
end
subgraph step2 [Section2_BS_composites]
sel --> bsVars[agg_BS_101x_102x_201x_301x]
end
subgraph step3 [Section3_IS_composites]
bsVars --> isVars[agg_IS_101x_111x_201x_301x]
end
subgraph step4 [Section4_Loans_and_merge]
loanAmt --> loanSel[tstm_loan_sel]
isVars --> merge[left_join_and_loan_NA_to_zero]
loanSel --> merge
merge --> prc[tstm_asset_loan_prc]
end
subgraph step5 [Section5_QC_and_filter]
prc --> nacheck[agg_nonna_count]
prc --> zerocheck[agg_sum_is0]
nacheck --> filter[keep_full_info_rows]
zerocheck --> filter
filter --> assetLoan[tstm_asset_loan]
end
subgraph step6 [Section6_HH_span]
assetLoan --> hhSpan[tstm_hh_mthspan]
end
subgraph step7 [Section7_Save_validate_report]
loanAmt --> saveOut[ffv_save_gateway_tstm]
assetLoan --> saveOut
hhSpan --> saveOut
saveOut --> dataOut[data_or_data_temp]
saveOut --> extdata[inst/extdata_csv_dta]
dataOut --> compare[compare_vs_canonical_data_folder]
compare --> report[report_identical_or_not]
end
This vignette implements PrjThaiHFID-#5: merge the White balance-sheet panel with household-month loan flows, build composite agg_BS_* and agg_IS_* variables, filter to full-information rows, and derive household survey spans.
Production logic is ported from data-raw/tstm_loans_amount.R, data-raw/ffs_hfid_gen_a_data.R (Step A), and data-raw/ffs_hfid_gen_b_data.R (Step B) into a single in-memory pipeline (no data-temp/tstm_asset_loan_prc.csv or data-csv/tstm_loans_amount.csv bridge files).
Unit of observation (final): household × month (tstm_asset_loan, tstm_loans_amount); one row per household for span metadata (tstm_hh_mthspan).
Outputs at a glance
This vignette produces three datasets. Use the links below to jump to the rendered data documentation page:
| Object | Data page |
|---|---|
tstm_loans_amount |
reference |
tstm_asset_loan |
reference |
tstm_hh_mthspan |
reference |
The three outputs are documented in R/data-res.R; the upstream input tstm_loans is documented in R/data.R.
What ships vs. what is generated locally
- Shipped with the installed package: the packaged datasets in
data/(lazy-loaded, e.g.tstm_loans).- Generated locally when this vignette runs (neither pushed to GitHub nor shipped in the package): the
data/(ordata-temp/).rdaoutputs above; theirinst/extdata/*.csv/*.dtaexports; and any review tables written tores/res_gen_asset_loan/(.tex/.csv, only when the matchingls_save_rescontrol isTRUE). Theinst/extdata/andres/artifacts duplicate the vignette content and are git-ignored, saved only for local convenience.
Required inputs
| File | Object | Role |
|---|---|---|
data-raw/whitem160aggregate_wthhhkey_loanamount.rda |
whitem160aggregate_wthhhkey_loanamount |
White balance-sheet panel (household × month); anonymized id (tmid_hh); raw agg_BS_*, agg_IS_*, agg_CF_*
|
data/tstm_loans.rda |
tstm_loans |
Loan-level file with anonymized hhid_Num (tmid_hh); aggregated to household-month loan flows |
Both inputs already store anonymized household IDs (tmid_hh, assigned by the anonymization pipeline in data-raw/id_anonymize/). The whitem panel .rda is stripped of all true identifiers at the anonymization step (true province / village / household / hhid / hhid_month are dropped before it is saved), and the pipeline keys only on the anonymized IDs (id / hhid_Num) and month, so no true identifiers ever enter the pipeline.
Saved outputs
| Object | Unit |
.rda destination |
Also exported |
|---|---|---|---|
tstm_loans_amount |
household × month |
data/ or data-temp/
|
inst/extdata/tstm_loans_amount.{csv,dta} |
tstm_asset_loan |
household × month (full-info rows) |
data/ or data-temp/
|
inst/extdata/tstm_asset_loan.{csv,dta} |
tstm_hh_mthspan |
household |
data/ or data-temp/
|
inst/extdata/tstm_hh_mthspan.{csv,dta} |
Because both inputs use anonymized household IDs and true geography is not carried through, all three saved objects contain only anonymized tmid_hh identifiers (id in tstm_asset_loan / tstm_hh_mthspan, hhid_Num in tstm_loans_amount). No true household, province, or village codes are written to data/, data-temp/, or inst/extdata/.
Set bl_replace_data_output <- TRUE to overwrite canonical data/*.rda; FALSE (default) writes to gitignored data-temp/ for safe validation against data/.
Downstream use
tstm_loans_amount
Downstream: input to the asset–loan merge in this vignette; formerly loaded from data-csv/tstm_loans_amount.csv in data-raw/ffs_hfid_gen_a_data.R.
Package: documented in R/data-res.R; hhid_Num matches anonymized id in tstm_asset_loan and tstm_loans_panel.
tstm_asset_loan
Vignettes
| Vignette | Role |
|---|---|
ffv_invest_loan_bridge.qmd |
Gateway Group B input |
ffv_invest_return_bridge.qmd |
Gateway + ff_hfid_invest_winstats() balance windows |
ffv_invest_freq_sizes.qmd |
Investment frequency / asset baseline tables |
ffv_bridge_timing.qmd |
References data/tstm_asset_loan.rda
|
R-script
| Script | Role |
|---|---|
R-script/ffv_mbf_size/ffs_mbf_village_size_main.R |
Village asset/income baselines |
R-script/ffv_mbf_size/ffs_mbf_size_by_village_plan.R |
ID crosswalk + village aggregates |
R-script/ffs_misc/ffs_depreciation_rate/ |
Composite agg_BS_* definitions documented in ?tstm_asset_loan
|
Package R code: ffp_hfid_invest_loan_linked_abc_investloan_char_gateway() (Group B); ff_hfid_invest_winstats().
tstm_hh_mthspan
Vignettes: ffv_invest_freq_sizes.qmd — household survey span for investment-frequency analysis.
Package: documented in R/data-res.R alongside tstm_asset_loan.
Pipeline structure
Control parameters
library(PrjThaiHFID)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(readr)
library(glue)
library(kableExtra)
#>
#> Attaching package: 'kableExtra'
#> The following object is masked from 'package:dplyr':
#>
#> group_rows
bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)
verbose <- TRUE
verbose_detail <- FALSE
it_verbose_detail_nrow <- 100L
it_file_code <- 963158L
it_file_code_qc <- 54767L
st_kableformat <- "html"
# TRUE: overwrite canonical data/*.rda; FALSE: write to data-temp/ (gitignored)
bl_replace_data_output <- FALSE
# here::here() anchors on _quarto.yml under Quarto (-> vignettes/); find the
# package root explicitly so paths resolve under both Quarto and knitr.
spn_pkg_root <- rprojroot::find_root(rprojroot::has_file("DESCRIPTION"))
st_data_out <- file.path(
spn_pkg_root,
if (bl_replace_data_output) "data" else "data-temp",
fsep = .Platform$file.sep
)
st_data_canonical <- file.path(spn_pkg_root, "data")
st_extdata_dir <- file.path(spn_pkg_root, "inst", "extdata")
for (st_dir in c(st_data_out, st_extdata_dir)) {
if (!dir.exists(st_dir)) {
dir.create(st_dir, recursive = TRUE)
}
}
# res/ auto-save controls. spt_res is the per-vignette output folder; each entry
# of ls_save_res toggles whether that table is written (TRUE -> .tex + .csv via
# ffp_save_res_table()). Default all FALSE so a knit never writes to res/.
# Flip any subset, or `ls_save_res[] <- TRUE` to enable everything.
spt_res <- file.path(spn_pkg_root, "res", "res_gen_asset_loan")
ls_save_res <- list(
tb_lender_formal = FALSE,
tb_nonna = FALSE,
tb_zero = FALSE,
tb_nonna_zero = FALSE,
tb_hh_span = FALSE,
tb_funnel = FALSE,
tb_cols_loans_amount = FALSE,
tb_cols_asset_loan = FALSE,
tb_cols_hh_span = FALSE,
tb_validate = FALSE
)
ar_svr_capital_var <- c(
"agg_BS_08",
"agg_BS_09", "agg_BS_10", "agg_BS_11", "agg_BS_12",
"agg_BS_13"
)
ar_svr_stock_var <- c(
"agg_BS_01",
"agg_BS_03",
"agg_BS_17"
)
ar_svr_flow_var <- c(
"agg_IS_01", "agg_IS_02", "agg_IS_03", "agg_IS_04", "agg_IS_05", "agg_IS_06",
"agg_IS_07",
"agg_IS_08", "agg_IS_09", "agg_IS_12", "agg_IS_13", "agg_IS_14", "agg_IS_15",
"agg_IS_16",
"agg_IS_20",
"agg_IS_29",
"agg_CF_01"
)
ffv_save_gateway_tstm <- function(obj, name, out_dir, extdata_dir = st_extdata_dir) {
if (!dir.exists(out_dir)) {
dir.create(out_dir, recursive = TRUE)
}
if (!dir.exists(extdata_dir)) {
dir.create(extdata_dir, recursive = TRUE)
}
st_rda <- file.path(out_dir, paste0(name, ".rda"))
st_csv <- file.path(extdata_dir, paste0(name, ".csv"))
st_dta <- file.path(extdata_dir, paste0(name, ".dta"))
tmp_env <- new.env(parent = emptyenv())
tmp_env[[name]] <- obj
save(list = name, file = st_rda, envir = tmp_env)
readr::write_csv(obj, st_csv)
if (!requireNamespace("haven", quietly = TRUE)) {
stop(
"Package 'haven' is required to write .dta to inst/extdata. ",
"Install with: install.packages('haven')",
call. = FALSE
)
}
haven::write_dta(obj, st_dta)
invisible(list(rda = st_rda, csv = st_csv, dta = st_dta))
}
ffv_render_table <- function(df, caption, st_format = st_kableformat, name = NULL) {
# Optionally persist a LaTeX + CSV copy to res/ when this table's control is on.
if (!is.null(name) && isTRUE(ls_save_res[[name]])) {
kbl_latex <- kbl(
df,
format = "latex",
booktabs = TRUE,
caption = caption,
digits = 4
)
ffp_save_res_table(kbl_latex, name, spt_res, df = df, bl_save = TRUE)
}
kbl(
df,
format = st_format,
booktabs = TRUE,
caption = caption,
digits = 4
) %>%
kable_styling(
bootstrap_options = bs_style,
full_width = FALSE,
position = "left"
)
}
ffv_compare_tstm <- function(generated, canonical_path, name,
sort_keys = c("id", "month")) {
e_can <- new.env()
load(canonical_path, envir = e_can)
gen <- as.data.frame(generated) %>% arrange(across(all_of(sort_keys)))
can <- as.data.frame(get(name, envir = e_can)) %>% arrange(across(all_of(sort_keys)))
ae <- all.equal(gen, can)
msg <- if (isTRUE(ae)) {
"TRUE"
} else if (is.character(ae)) {
ae[[1]]
} else {
"FALSE (no detail)"
}
list(
object = name,
dims_match = identical(dim(gen), dim(can)),
cols_match = identical(sort(names(gen)), sort(names(can))),
identical = isTRUE(ae),
all_equal_msg = as.character(msg)[1]
)
}
ffv_load_tstm_rda <- function(path, name, sort_keys = c("id", "month")) {
e <- new.env()
load(path, envir = e)
get(name, envir = e) %>% arrange(across(all_of(sort_keys)))
}
if (verbose) {
print(glue::glue("f-{it_file_code}, controls"))
print(glue::glue(" bl_replace_data_output: {bl_replace_data_output} -> {st_data_out}/"))
}
#> f-963158, controls
#> bl_replace_data_output: FALSE -> /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/Section 0 — Aggregate tstm_loans to household-month loan flows
Port of data-raw/tstm_loans_amount.R: formal/informal classification, positive S_Init_Amount sums by household-month, wide format with amt_all_lenders, amt_formal, amt_informal.
data(tstm_loans)
tstm_loans_fi <- tstm_loans %>%
mutate(
formal = case_when(
G_LenderType == "Agri Coop" ~ FALSE,
G_LenderType == "PCG" ~ FALSE,
G_LenderType == "MoneyLender" ~ FALSE,
G_LenderType == "Neighbor" ~ FALSE,
G_LenderType == "Relatives" ~ FALSE,
G_LenderType == "Others" ~ FALSE,
G_LenderType == "Other Non-Indi Formal or Informal" & G_Location == "Village" ~ FALSE,
TRUE ~ TRUE
),
formal_st = if_else(formal, "amt_formal", "amt_informal")
)
tstm_loans_fi_amt <- tstm_loans_fi %>%
select(
contains("Num"), surveymonth,
formal_st, G_LenderType,
S_Init_Amount
) %>%
filter(S_Init_Amount > 0)
tstm_loans_fi_amt_jnt <- tstm_loans_fi_amt %>%
group_by(hhid_Num, surveymonth) %>%
summarize(new_loan_amt = sum(S_Init_Amount, na.rm = TRUE), .groups = "drop") %>%
filter(new_loan_amt > 0) %>%
mutate(lender_type_mix = "amt_all_lenders")
tstm_loans_fi_amt_fi <- tstm_loans_fi_amt %>%
group_by(hhid_Num, surveymonth, formal_st) %>%
summarize(new_loan_amt = sum(S_Init_Amount, na.rm = TRUE), .groups = "drop") %>%
filter(new_loan_amt > 0) %>%
rename(lender_type_mix = formal_st)
tstm_loas_amt_m <- bind_rows(
tstm_loans_fi_amt_jnt,
tstm_loans_fi_amt_fi
)
tstm_loans_amount <- tstm_loas_amt_m %>%
pivot_wider(
id_cols = c("hhid_Num", "surveymonth"),
names_from = lender_type_mix,
names_prefix = "",
values_from = new_loan_amt
)
if (verbose) {
print(glue::glue("f-{it_file_code}, loans-amount: {nrow(tstm_loans_amount)} household-months"))
}
#> f-963158, loans-amount: 15999 household-months| G_LenderType | formal | n |
|---|---|---|
| Agri Coop | FALSE | 577 |
| BAAC | TRUE | 3396 |
| Commercial Bank | TRUE | 64 |
| MoneyLender | FALSE | 625 |
| Neighbor | FALSE | 998 |
| Other Non-Indi Formal or Informal | FALSE | 2732 |
| Other Non-Indi Formal or Informal | TRUE | 1927 |
| Others | FALSE | 1349 |
| PCG | FALSE | 501 |
| Relatives | FALSE | 1363 |
| Village Fund | TRUE | 7408 |
Section 1 — Load White panel and select variables
load(file.path(spn_pkg_root, "data-raw", "whitem160aggregate_wthhhkey_loanamount.rda"))
tstm_asset <- whitem160aggregate_wthhhkey_loanamount
tstm_asset_sel <- tstm_asset %>%
select(
id, month,
dplyr::any_of(c(ar_svr_capital_var, ar_svr_stock_var, ar_svr_flow_var))
)
if (verbose) {
print(glue::glue("f-{it_file_code}, load: whitem {nrow(tstm_asset)} rows"))
print(glue::glue(" tstm_asset_sel: {nrow(tstm_asset_sel)} rows x {ncol(tstm_asset_sel)} cols"))
}
#> f-963158, load: whitem 127351 rows
#> tstm_asset_sel: 127351 rows x 28 colsSection 2 — BS composite variables
Joint, agriculture, and business revenue-asset aggregates (agg_BS_101*–agg_BS_302*).
tstm_asset_sel <- tstm_asset_sel %>%
rowwise() %>%
mutate(
# 1. Joint group
# All rev-assets: livestock + household + Agri + business + land
# problem: is household asset really revenue generating?
agg_BS_1011 = sum(agg_BS_08, agg_BS_10, agg_BS_11, agg_BS_12, agg_BS_13,
na.rm = TRUE
),
# Core 1 rev-assets: livestock + Agri + business + land
# all are potentially revenue generating
agg_BS_1012 = sum(agg_BS_08, agg_BS_11, agg_BS_12, agg_BS_13,
na.rm = TRUE
),
# Core 2 rev-assets: livestock + agri + business
# Livestock assets/revenue and land might be vastly different
agg_BS_1021 = sum(agg_BS_08, agg_BS_11, agg_BS_12, na.rm = TRUE),
# Core 3 rev-assets: agri + business
# Livestock assets/revenue and land might be vastly different
agg_BS_1022 = sum(agg_BS_11, agg_BS_12, na.rm = TRUE),
# 2025-05-15 05:56:00, Core 4 rev-assets: agri + business + land
# land might relate to cultivation revenue, livestock capital gains loss confusing
agg_BS_1023 = sum(agg_BS_11, agg_BS_12, agg_BS_13, na.rm = TRUE),
# land alone
agg_BS_1024 = sum(agg_BS_13, na.rm = TRUE),
# 2025-05-17 09:41:12, Livestock + agri assets combined, these are the combined agri assets
agg_BS_1025 = sum(agg_BS_08, agg_BS_11, na.rm = TRUE),
# 2. Agri group
# All agri-assets: livestock + agri + land
agg_BS_2011 = sum(agg_BS_08, agg_BS_11, agg_BS_13, na.rm = TRUE),
# core agri-assets: agri asset only
agg_BS_2021 = sum(agg_BS_11, na.rm = TRUE),
# 3. Business group
# All biz-assets: household + business
# household assets might faciliate business, a car for example.
agg_BS_3011 = sum(agg_BS_10, agg_BS_12, na.rm = TRUE),
# core biz-assets: business asset only
agg_BS_3021 = sum(agg_BS_12, na.rm = TRUE),
# core biz-assets: household asset alone
agg_BS_3022 = sum(agg_BS_10, na.rm = TRUE)
) %>%
ungroup()
if (verbose) {
print(glue::glue("f-{it_file_code}, bs-composites: {ncol(tstm_asset_sel)} cols"))
}
#> f-963158, bs-composites: 40 colsSection 3 — IS composite variables
Income and net-income aggregates (agg_IS_101*–agg_IS_302*); drop raw capital/flow inputs used only for construction.
tstm_asset_sel <- tstm_asset_sel %>%
rowwise() %>%
mutate(
# 1. All
# All income
agg_IS_1011 = sum(agg_IS_07, na.rm = TRUE),
# Income minus wage
agg_IS_1012 = sum(agg_IS_07, (-1) * agg_IS_05, na.rm = TRUE),
# income minus wage, minus other income, don't know what other income is
# to match up with BS_1021
agg_IS_1013 = sum(agg_IS_07, (-1) * sum(agg_IS_05, agg_IS_06,
na.rm = TRUE
), na.rm = TRUE),
# income minus wage, minus other income, fish and livestock income
agg_IS_1021 = sum(agg_IS_07, (-1) * sum(
agg_IS_02, agg_IS_03, agg_IS_05, agg_IS_06,
na.rm = TRUE
), na.rm = TRUE),
# total income minus total costs
agg_IS_1111 = sum(agg_IS_07, (-1) * agg_IS_16, na.rm = TRUE),
# 2025-05-15 05:47:08, total income minus total costs, excluding income and costs from labor
agg_IS_1112 = sum(
agg_IS_07, (-1) * (agg_IS_05),
(-1) * (agg_IS_16), (agg_IS_14),
na.rm = TRUE),
# 2025-05-15 05:47:15, total income minus total costs, excluding income and costs from labor and "others"
agg_IS_1113 = sum(
agg_IS_07, (-1) * (agg_IS_05), (-1) * (agg_IS_06),
(-1) * (agg_IS_16), (agg_IS_14), (agg_IS_15),
na.rm = TRUE),
# 2025-05-15 05:49:48, Fish/shrimp + biz + cultivation only, no livestock
agg_IS_1114 = sum(
agg_IS_07, (-1) * (agg_IS_05), (-1) * (agg_IS_06), (-1) * (agg_IS_02),
(-1) * (agg_IS_16), (agg_IS_14), (agg_IS_15), (agg_IS_09),
na.rm = TRUE),
# from 1111: subtract from gross profit also depreciation of fixed asset
agg_IS_1115 = sum(
agg_IS_07, (-1) * agg_IS_16, (-1) * agg_IS_20,
na.rm = TRUE),
# from 1113: subtract from gross profit also depreciation of fixed asset
agg_IS_1116 = sum(
agg_IS_07, (-1) * (agg_IS_05), (-1) * (agg_IS_06),
(-1) * (agg_IS_16), (agg_IS_14), (agg_IS_15),
(-1) * agg_IS_20,
na.rm = TRUE),
# 2. Agri group
# cultivation + livestock + fish
agg_IS_2011 = sum(agg_IS_01, agg_IS_02, agg_IS_03, na.rm = TRUE),
# net for (cultivation + livestock + fish)
agg_IS_2012 = sum(
agg_IS_01, agg_IS_02, agg_IS_03,
(-1) * (agg_IS_08), (-1) * (agg_IS_09), (-1) * (agg_IS_12),
na.rm = TRUE),
# 2025-05-15 05:59:13, cultivation + livestock
agg_IS_2013 = sum(agg_IS_01, agg_IS_02, na.rm = TRUE),
# net for cultivation + livestock
agg_IS_2014 = sum(
agg_IS_01, agg_IS_02,
(-1) * (agg_IS_08), (-1) * (agg_IS_09),
na.rm = TRUE),
# 2025-05-15 05:59:22, cultivation + shrimp
agg_IS_2015 = sum(agg_IS_01, agg_IS_03, na.rm = TRUE),
# net for cultivation + shrimp
agg_IS_2016 = sum(
agg_IS_01, agg_IS_03,
(-1) * (agg_IS_08), (-1) * (agg_IS_12),
na.rm = TRUE),
# cultivation only
agg_IS_2021 = sum(agg_IS_01, na.rm = TRUE),
# net for cultivation only
agg_IS_2022 = sum(
agg_IS_01,
(-1) * (agg_IS_08),
na.rm = TRUE),
# 3. Business group
# Business and labor income, labor if they pay themselves?
agg_IS_3011 = sum(agg_IS_04, agg_IS_05, na.rm = TRUE),
# Pure business
agg_IS_3021 = sum(agg_IS_04, na.rm = TRUE),
# Business revenue minus profits
agg_IS_3022 = sum(
agg_IS_04,
(-1) * agg_IS_13,
na.rm = TRUE)
) %>%
select(
id, month,
dplyr::any_of(ar_svr_stock_var),
matches("agg_BS_101|agg_BS_102"),
matches("agg_BS_201|agg_BS_202"),
matches("agg_BS_301|agg_BS_302"),
matches("agg_IS_101|agg_IS_102"),
matches("agg_IS_111"),
matches("agg_IS_201|agg_IS_202"),
matches("agg_IS_301|agg_IS_302")
) %>%
ungroup()
if (verbose) {
print(glue::glue("f-{it_file_code}, is-composites: {ncol(tstm_asset_sel)} cols"))
}
#> f-963158, is-composites: 38 colsSection 4 — Loan variables and merge
tstm_loan_sel <- tstm_loans_amount %>%
mutate(
id = as.integer(hhid_Num),
month = as.integer(surveymonth)
) %>%
rename(
agg_loanflow_all = amt_all_lenders,
agg_loanflow_inf = amt_informal,
agg_loanflow_for = amt_formal
) %>%
select(id, month, agg_loanflow_all, agg_loanflow_inf, agg_loanflow_for)
tstm_asset_loan_prc <- tstm_asset_sel %>%
left_join(tstm_loan_sel, by = c("id", "month"))
tstm_asset_loan_prc <- tstm_asset_loan_prc %>%
mutate(
agg_loanflow_all = ifelse(
is.na(agg_loanflow_all) & !is.na(agg_BS_01), 0, agg_loanflow_all
),
agg_loanflow_inf = ifelse(
is.na(agg_loanflow_inf) & !is.na(agg_BS_01), 0, agg_loanflow_inf
),
agg_loanflow_for = ifelse(
is.na(agg_loanflow_for) & !is.na(agg_BS_01), 0, agg_loanflow_for
)
)
if (verbose) {
print(glue::glue("f-{it_file_code}, merge: tstm_asset_loan_prc {nrow(tstm_asset_loan_prc)} rows"))
summary(tstm_asset_loan_prc)
str(tstm_asset_loan_prc)
}
#> f-963158, merge: tstm_asset_loan_prc 127351 rows
#> tibble [127,351 × 41] (S3: tbl_df/tbl/data.frame)
#> $ id : int [1:127351] 3078 3078 3078 3078 3078 3078 3078 3078 3078 3078 ...
#> $ month : num [1:127351] 0 1 2 3 4 5 6 7 8 9 ...
#> $ agg_BS_01 : num [1:127351] 283 1186 1953 31 0 ...
#> $ agg_BS_03 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_17 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_1011 : num [1:127351] 152226 152205 152183 152162 152141 ...
#> $ agg_BS_1012 : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#> $ agg_BS_1021 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_1022 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_1023 : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#> $ agg_BS_1024 : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#> $ agg_BS_1025 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_2011 : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#> $ agg_BS_2021 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_3011 : num [1:127351] 1226 1205 1183 1162 1141 ...
#> $ agg_BS_3021 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_BS_3022 : num [1:127351] 1226 1205 1183 1162 1141 ...
#> $ agg_IS_1011 : num [1:127351] 0 1600 1400 1530 1500 1500 1200 1200 1220 1200 ...
#> $ agg_IS_1012 : num [1:127351] 0 0 0 20 0 0 0 0 20 0 ...
#> $ agg_IS_1013 : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#> $ agg_IS_1021 : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#> $ agg_IS_1111 : num [1:127351] 0 1600 1400 1330 1500 1500 1200 1200 1220 1200 ...
#> $ agg_IS_1112 : num [1:127351] 0 0 0 -180 0 0 0 0 20 0 ...
#> $ agg_IS_1113 : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#> $ agg_IS_1114 : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#> $ agg_IS_1115 : num [1:127351] 0 1578 1378 1309 1479 ...
#> $ agg_IS_1116 : num [1:127351] 0 -21.9 -21.5 -201.1 -20.7 ...
#> $ agg_IS_2011 : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#> $ agg_IS_2012 : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#> $ agg_IS_2013 : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#> $ agg_IS_2014 : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#> $ agg_IS_2015 : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#> $ agg_IS_2016 : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#> $ agg_IS_2021 : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#> $ agg_IS_2022 : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#> $ agg_IS_3011 : num [1:127351] 0 1600 1400 1510 1500 1500 1200 1200 1200 1200 ...
#> $ agg_IS_3021 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_IS_3022 : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_loanflow_all: num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_loanflow_inf: num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#> $ agg_loanflow_for: num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...Section 5 — QC diagnostics and full-info filter
tstm_asset_loan_nacheck <- tstm_asset_loan_prc %>%
mutate(across(
contains("agg"),
function(x) ifelse(is.na(x), 0, 1)
)) %>%
mutate(
agg_nonna_count = base::rowSums(
dplyr::pick(contains("agg")),
na.rm = TRUE
)
) %>%
select(id, month, agg_nonna_count)
tstm_asset_loan_sum <- tstm_asset_loan_prc %>%
mutate(
agg_sum = base::rowSums(
dplyr::pick(contains("agg")),
na.rm = TRUE
),
agg_sum_is0 = if_else(agg_sum == 0, 1, 0)
) %>%
select(id, month, agg_sum, agg_sum_is0)
tstm_asset_loan_nasumcheck <- tstm_asset_loan_nacheck %>%
left_join(tstm_asset_loan_sum, by = c("id", "month"))
it_agg_nonna_count_max <- max(tstm_asset_loan_nacheck$agg_nonna_count)
tstm_asset_loan_idmonth <- tstm_asset_loan_nacheck %>%
filter(agg_nonna_count == it_agg_nonna_count_max)
tstm_asset_loan <- tstm_asset_loan_prc %>%
left_join(tstm_asset_loan_idmonth, by = c("id", "month")) %>%
filter(agg_nonna_count == it_agg_nonna_count_max) %>%
select(-agg_nonna_count)
if (verbose) {
print(glue::glue(
"f-{it_file_code_qc}, filter: agg_nonna_count_max={it_agg_nonna_count_max}; ",
"tstm_asset_loan {nrow(tstm_asset_loan)} rows"
))
}
#> f-54767, filter: agg_nonna_count_max=39; tstm_asset_loan 112718 rows
if (verbose_detail) {
summary(tstm_asset_loan %>% filter(id == 70201, month == 23))
summary(tstm_asset_loan %>% filter(id == 271452, month == 13))
print(tstm_asset_loan %>% filter(id == 271452), n = 180)
}Section 6 — Household month span
tstm_hh_mthspan <- tstm_asset_loan %>%
group_by(id) %>%
arrange(month, .by_group = TRUE) %>%
mutate(
month_start = min(month),
month_end = max(month),
month_n = dplyr::n()
) %>%
select(id, month_start, month_end, month_n) %>%
slice_head(n = 1) %>%
ungroup()
if (verbose) {
print(glue::glue(
"f-{it_file_code_qc}, hh-span: {nrow(tstm_hh_mthspan)} households"
))
}
#> f-54767, hh-span: 791 householdsTables: QC diagnostics
tb_nonna <- tstm_asset_loan_nacheck %>%
group_by(agg_nonna_count) %>%
tally(name = "n") %>%
arrange(agg_nonna_count)
tb_zero <- tstm_asset_loan_sum %>%
group_by(agg_sum_is0) %>%
tally(name = "n") %>%
arrange(agg_sum_is0)
tb_nonna_zero <- tstm_asset_loan_nasumcheck %>%
group_by(agg_nonna_count, agg_sum_is0) %>%
tally(name = "n") %>%
arrange(agg_nonna_count, agg_sum_is0)
tb_hh_span <- tstm_hh_mthspan %>%
group_by(month_start, month_end, month_n) %>%
tally(name = "n_households") %>%
arrange(month_start, month_end, month_n)
tb_funnel <- data.frame(
stage = c(
"tstm_loans_amount",
"tstm_asset_loan_prc (merged)",
"tstm_asset_loan (full-info filter)",
"tstm_hh_mthspan (households)"
),
n_rows = c(
nrow(tstm_loans_amount),
nrow(tstm_asset_loan_prc),
nrow(tstm_asset_loan),
nrow(tstm_hh_mthspan)
)
)
ffv_render_table(tb_nonna, caption = "Distribution of agg_nonna_count (pre-filter)", name = "tb_nonna")| agg_nonna_count | n |
|---|---|
| 33 | 14554 |
| 35 | 46 |
| 36 | 33 |
| 39 | 112718 |
ffv_render_table(tb_zero, caption = "Distribution of agg_sum_is0 (all-zero rows)", name = "tb_zero")| agg_sum_is0 | n |
|---|---|
| 0 | 112797 |
| 1 | 14554 |
ffv_render_table(tb_nonna_zero, caption = "Cross-tab agg_nonna_count x agg_sum_is0", name = "tb_nonna_zero")| agg_nonna_count | agg_sum_is0 | n |
|---|---|---|
| 33 | 1 | 14554 |
| 35 | 0 | 46 |
| 36 | 0 | 33 |
| 39 | 0 | 112718 |
ffv_render_table(tb_hh_span, caption = "Household month-span combinations", name = "tb_hh_span")| month_start | month_end | month_n | n_households |
|---|---|---|---|
| 0 | 2 | 3 | 13 |
| 0 | 3 | 4 | 1 |
| 0 | 4 | 5 | 1 |
| 0 | 6 | 7 | 2 |
| 0 | 7 | 8 | 1 |
| 0 | 8 | 9 | 2 |
| 0 | 10 | 11 | 1 |
| 0 | 12 | 13 | 1 |
| 0 | 22 | 23 | 1 |
| 0 | 25 | 26 | 6 |
| 0 | 27 | 28 | 7 |
| 0 | 28 | 29 | 6 |
| 0 | 34 | 35 | 4 |
| 0 | 35 | 36 | 3 |
| 0 | 36 | 37 | 1 |
| 0 | 38 | 39 | 1 |
| 0 | 39 | 40 | 1 |
| 0 | 46 | 47 | 1 |
| 0 | 49 | 50 | 1 |
| 0 | 50 | 51 | 1 |
| 0 | 52 | 53 | 2 |
| 0 | 54 | 55 | 1 |
| 0 | 55 | 56 | 2 |
| 0 | 61 | 62 | 1 |
| 0 | 66 | 67 | 3 |
| 0 | 67 | 68 | 2 |
| 0 | 71 | 72 | 4 |
| 0 | 73 | 74 | 2 |
| 0 | 75 | 76 | 1 |
| 0 | 78 | 79 | 1 |
| 0 | 82 | 83 | 1 |
| 0 | 84 | 85 | 1 |
| 0 | 89 | 90 | 1 |
| 0 | 91 | 92 | 1 |
| 0 | 93 | 94 | 1 |
| 0 | 94 | 95 | 1 |
| 0 | 160 | 161 | 606 |
| 2 | 160 | 159 | 15 |
| 3 | 160 | 158 | 2 |
| 4 | 160 | 157 | 2 |
| 5 | 160 | 156 | 1 |
| 6 | 160 | 155 | 1 |
| 7 | 160 | 154 | 1 |
| 8 | 27 | 20 | 1 |
| 8 | 160 | 153 | 1 |
| 10 | 160 | 151 | 1 |
| 13 | 160 | 148 | 1 |
| 22 | 160 | 139 | 1 |
| 26 | 160 | 135 | 6 |
| 27 | 160 | 134 | 10 |
| 28 | 71 | 44 | 1 |
| 28 | 160 | 133 | 4 |
| 29 | 71 | 43 | 1 |
| 30 | 160 | 131 | 1 |
| 34 | 160 | 127 | 4 |
| 35 | 71 | 37 | 1 |
| 35 | 160 | 126 | 2 |
| 36 | 160 | 125 | 1 |
| 38 | 160 | 123 | 1 |
| 39 | 160 | 122 | 1 |
| 47 | 160 | 114 | 1 |
| 49 | 160 | 112 | 1 |
| 50 | 160 | 111 | 1 |
| 52 | 160 | 109 | 2 |
| 54 | 160 | 107 | 1 |
| 55 | 60 | 6 | 1 |
| 55 | 160 | 106 | 1 |
| 60 | 160 | 101 | 1 |
| 61 | 160 | 100 | 1 |
| 66 | 160 | 95 | 3 |
| 67 | 160 | 94 | 2 |
| 71 | 160 | 90 | 17 |
| 72 | 160 | 89 | 1 |
| 73 | 160 | 88 | 2 |
| 74 | 160 | 87 | 2 |
| 75 | 160 | 86 | 1 |
| 78 | 160 | 83 | 1 |
| 82 | 160 | 79 | 1 |
| 84 | 160 | 77 | 1 |
| 89 | 160 | 72 | 1 |
| 91 | 160 | 70 | 1 |
| 93 | 160 | 68 | 1 |
| 94 | 160 | 67 | 1 |
ffv_render_table(tb_funnel, caption = "Pipeline row counts", name = "tb_funnel")| stage | n_rows |
|---|---|
| tstm_loans_amount | 15999 |
| tstm_asset_loan_prc (merged) | 127351 |
| tstm_asset_loan (full-info filter) | 112718 |
| tstm_hh_mthspan (households) | 791 |
Tables: column inventory
tb_cols_asset_loan <- data.frame(
var = names(tstm_asset_loan),
stringsAsFactors = FALSE
)
tb_cols_hh_span <- data.frame(
var = names(tstm_hh_mthspan),
stringsAsFactors = FALSE
)
tb_cols_loans_amount <- data.frame(
var = names(tstm_loans_amount),
stringsAsFactors = FALSE
)
ffv_render_table(tb_cols_loans_amount, caption = "tstm_loans_amount column names", name = "tb_cols_loans_amount")| var |
|---|
| hhid_Num |
| surveymonth |
| amt_all_lenders |
| amt_informal |
| amt_formal |
ffv_render_table(tb_cols_asset_loan, caption = "tstm_asset_loan column names", name = "tb_cols_asset_loan")| var |
|---|
| id |
| month |
| agg_BS_01 |
| agg_BS_03 |
| agg_BS_17 |
| agg_BS_1011 |
| agg_BS_1012 |
| agg_BS_1021 |
| agg_BS_1022 |
| agg_BS_1023 |
| agg_BS_1024 |
| agg_BS_1025 |
| agg_BS_2011 |
| agg_BS_2021 |
| agg_BS_3011 |
| agg_BS_3021 |
| agg_BS_3022 |
| agg_IS_1011 |
| agg_IS_1012 |
| agg_IS_1013 |
| agg_IS_1021 |
| agg_IS_1111 |
| agg_IS_1112 |
| agg_IS_1113 |
| agg_IS_1114 |
| agg_IS_1115 |
| agg_IS_1116 |
| agg_IS_2011 |
| agg_IS_2012 |
| agg_IS_2013 |
| agg_IS_2014 |
| agg_IS_2015 |
| agg_IS_2016 |
| agg_IS_2021 |
| agg_IS_2022 |
| agg_IS_3011 |
| agg_IS_3021 |
| agg_IS_3022 |
| agg_loanflow_all |
| agg_loanflow_inf |
| agg_loanflow_for |
ffv_render_table(tb_cols_hh_span, caption = "tstm_hh_mthspan column names", name = "tb_cols_hh_span")| var |
|---|
| id |
| month_start |
| month_end |
| month_n |
Section 7 — Save and validate
When bl_replace_data_output is FALSE: save to data-temp/, then compare written .rda files to canonical data/.
When bl_replace_data_output is TRUE: validate in-memory objects against canonical data/ first, then save to data/. Validation never stops the knit; mismatches emit warning().
ar_outputs <- list(
list(
obj = tstm_loans_amount,
name = "tstm_loans_amount",
sort_keys = c("hhid_Num", "surveymonth")
),
list(obj = tstm_asset_loan, name = "tstm_asset_loan", sort_keys = c("id", "month")),
list(obj = tstm_hh_mthspan, name = "tstm_hh_mthspan", sort_keys = "id")
)
if (bl_replace_data_output) {
ls_validate <- lapply(ar_outputs, function(spec) {
ffv_compare_tstm(
generated = spec$obj,
canonical_path = file.path(st_data_canonical, paste0(spec$name, ".rda")),
name = spec$name,
sort_keys = spec$sort_keys
)
})
ls_saved <- lapply(ar_outputs, function(spec) {
ffv_save_gateway_tstm(spec$obj, spec$name, st_data_out)
})
} else {
ls_saved <- lapply(ar_outputs, function(spec) {
ffv_save_gateway_tstm(spec$obj, spec$name, st_data_out)
})
ls_validate <- lapply(ar_outputs, function(spec) {
ffv_compare_tstm(
generated = ffv_load_tstm_rda(
file.path(st_data_out, paste0(spec$name, ".rda")),
spec$name,
spec$sort_keys
),
canonical_path = file.path(st_data_canonical, paste0(spec$name, ".rda")),
name = spec$name,
sort_keys = spec$sort_keys
)
})
}
tb_validate <- dplyr::bind_rows(lapply(ls_validate, function(x) {
data.frame(
object = x$object,
dims_match = x$dims_match,
cols_match = x$cols_match,
identical = x$identical,
all_equal_msg = substr(x$all_equal_msg, 1, 80),
stringsAsFactors = FALSE
)
}))
ffv_render_table(
tb_validate,
caption = paste0(
"Validation vs canonical data/ (bl_replace_data_output = ",
bl_replace_data_output, ")"
),
name = "tb_validate"
)| object | dims_match | cols_match | identical | all_equal_msg |
|---|---|---|---|---|
| tstm_loans_amount | TRUE | TRUE | TRUE | TRUE |
| tstm_asset_loan | TRUE | TRUE | TRUE | TRUE |
| tstm_hh_mthspan | TRUE | TRUE | TRUE | TRUE |
for (res in ls_validate) {
print(glue::glue(
"f-{it_file_code}, validate: {res$object} identical to data/ canonical: {res$identical}"
))
if (!res$identical) {
warning(
res$object, " differs from canonical data/: ", res$all_equal_msg,
call. = FALSE
)
}
}
#> f-963158, validate: tstm_loans_amount identical to data/ canonical: TRUE
#> f-963158, validate: tstm_asset_loan identical to data/ canonical: TRUE
#> f-963158, validate: tstm_hh_mthspan identical to data/ canonical: TRUE
if (verbose) {
if (bl_replace_data_output) {
print(glue::glue("f-{it_file_code}, save: replaced canonical data/*.rda"))
} else {
print(glue::glue(
"f-{it_file_code}, save: wrote data-temp/*.rda (data/ unchanged)"
))
}
for (ls in ls_saved) {
print(glue::glue("f-{it_file_code}, save: {ls$rda}"))
print(glue::glue("f-{it_file_code}, save: {ls$csv}"))
print(glue::glue("f-{it_file_code}, save: {ls$dta}"))
}
}
#> f-963158, save: wrote data-temp/*.rda (data/ unchanged)
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/tstm_loans_amount.rda
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_loans_amount.csv
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_loans_amount.dta
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/tstm_asset_loan.rda
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_asset_loan.csv
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_asset_loan.dta
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/tstm_hh_mthspan.rda
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_hh_mthspan.csv
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_hh_mthspan.dtaWhen bl_replace_data_output is FALSE and validation reports identical = TRUE for all objects, set bl_replace_data_output <- TRUE in the setup chunk and re-knit to refresh canonical data/.
Note: tstm_loans_amount validation may report a mismatch if canonical data/tstm_loans_amount.rda was built before tstm_loans used anonymized hhid_Num; tstm_asset_loan and tstm_hh_mthspan are the primary regression/gateway checks.
Outputs produced by this run
The pipeline above wrote the following artifacts (the same links appear in Outputs at a glance at the top). Local CSV/Stata copies are also written to inst/extdata/ (git-ignored, not shipped):
tstm_loans_amounttstm_asset_loantstm_hh_mthspan- Upstream input:
tstm_loans
Session info
sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] kableExtra_1.4.0 glue_1.8.1 readr_2.2.0 tidyr_1.3.2
#> [5] dplyr_1.2.1 PrjThaiHFID_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] bit_4.6.0 jsonlite_2.0.0 crayon_1.5.3 compiler_4.6.1
#> [5] tidyselect_1.2.1 xml2_1.6.0 stringr_1.6.0 parallel_4.6.1
#> [9] textshaping_1.0.5 systemfonts_1.3.2 scales_1.4.0 yaml_2.3.12
#> [13] fastmap_1.2.0 R6_2.6.1 generics_0.1.4 knitr_1.51
#> [17] forcats_1.0.1 tibble_3.3.1 rprojroot_2.1.1 svglite_2.2.2
#> [21] pillar_1.11.1 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.2.0
#> [25] stringi_1.8.7 xfun_0.59 bit64_4.8.2 otel_0.2.0
#> [29] viridisLite_0.4.3 cli_3.6.6 withr_3.0.3 magrittr_2.0.5
#> [33] digest_0.6.39 vroom_1.7.1 rstudioapi_0.19.0 haven_2.5.5
#> [37] hms_1.1.4 lifecycle_1.0.5 vctrs_0.7.3 evaluate_1.0.5
#> [41] farver_2.1.2 rmarkdown_2.31 purrr_1.2.2 tools_4.6.1
#> [45] pkgconfig_2.0.3 htmltools_0.5.9