Skip to contents

This vignette implements PrjThaiHFID-#5: merge the White balance-sheet panel with household-month loan flows, build composite agg_BS_* and agg_IS_* variables, filter to full-information rows, and derive household survey spans.

Production logic is ported from data-raw/tstm_loans_amount.R, data-raw/ffs_hfid_gen_a_data.R (Step A), and data-raw/ffs_hfid_gen_b_data.R (Step B) into a single in-memory pipeline (no data-temp/tstm_asset_loan_prc.csv or data-csv/tstm_loans_amount.csv bridge files).

Unit of observation (final): household × month (tstm_asset_loan, tstm_loans_amount); one row per household for span metadata (tstm_hh_mthspan).

Outputs at a glance

This vignette produces three datasets. Use the links below to jump to the rendered data documentation page:

Object Data page
tstm_loans_amount reference
tstm_asset_loan reference
tstm_hh_mthspan reference

The three outputs are documented in R/data-res.R; the upstream input tstm_loans is documented in R/data.R.

What ships vs. what is generated locally

  • Shipped with the installed package: the packaged datasets in data/ (lazy-loaded, e.g. tstm_loans).
  • Generated locally when this vignette runs (neither pushed to GitHub nor shipped in the package): the data/ (or data-temp/) .rda outputs above; their inst/extdata/*.csv / *.dta exports; and any review tables written to res/res_gen_asset_loan/ (.tex/.csv, only when the matching ls_save_res control is TRUE). The inst/extdata/ and res/ artifacts duplicate the vignette content and are git-ignored, saved only for local convenience.

Required inputs

File Object Role
data-raw/whitem160aggregate_wthhhkey_loanamount.rda whitem160aggregate_wthhhkey_loanamount White balance-sheet panel (household × month); anonymized id (tmid_hh); raw agg_BS_*, agg_IS_*, agg_CF_*
data/tstm_loans.rda tstm_loans Loan-level file with anonymized hhid_Num (tmid_hh); aggregated to household-month loan flows

Both inputs already store anonymized household IDs (tmid_hh, assigned by the anonymization pipeline in data-raw/id_anonymize/). The whitem panel .rda is stripped of all true identifiers at the anonymization step (true province / village / household / hhid / hhid_month are dropped before it is saved), and the pipeline keys only on the anonymized IDs (id / hhid_Num) and month, so no true identifiers ever enter the pipeline.

Saved outputs

Object Unit .rda destination Also exported
tstm_loans_amount household × month data/ or data-temp/ inst/extdata/tstm_loans_amount.{csv,dta}
tstm_asset_loan household × month (full-info rows) data/ or data-temp/ inst/extdata/tstm_asset_loan.{csv,dta}
tstm_hh_mthspan household data/ or data-temp/ inst/extdata/tstm_hh_mthspan.{csv,dta}

Because both inputs use anonymized household IDs and true geography is not carried through, all three saved objects contain only anonymized tmid_hh identifiers (id in tstm_asset_loan / tstm_hh_mthspan, hhid_Num in tstm_loans_amount). No true household, province, or village codes are written to data/, data-temp/, or inst/extdata/.

Set bl_replace_data_output <- TRUE to overwrite canonical data/*.rda; FALSE (default) writes to gitignored data-temp/ for safe validation against data/.

Downstream use

tstm_loans_amount

Downstream: input to the asset–loan merge in this vignette; formerly loaded from data-csv/tstm_loans_amount.csv in data-raw/ffs_hfid_gen_a_data.R.

Package: documented in R/data-res.R; hhid_Num matches anonymized id in tstm_asset_loan and tstm_loans_panel.

tstm_asset_loan

Vignettes

Vignette Role
ffv_invest_loan_bridge.qmd Gateway Group B input
ffv_invest_return_bridge.qmd Gateway + ff_hfid_invest_winstats() balance windows
ffv_invest_freq_sizes.qmd Investment frequency / asset baseline tables
ffv_bridge_timing.qmd References data/tstm_asset_loan.rda

R-script

Script Role
R-script/ffv_mbf_size/ffs_mbf_village_size_main.R Village asset/income baselines
R-script/ffv_mbf_size/ffs_mbf_size_by_village_plan.R ID crosswalk + village aggregates
R-script/ffs_misc/ffs_depreciation_rate/ Composite agg_BS_* definitions documented in ?tstm_asset_loan

Package R code: ffp_hfid_invest_loan_linked_abc_investloan_char_gateway() (Group B); ff_hfid_invest_winstats().

tstm_hh_mthspan

Vignettes: ffv_invest_freq_sizes.qmd — household survey span for investment-frequency analysis.

Package: documented in R/data-res.R alongside tstm_asset_loan.

Pipeline structure

flowchart TD
  subgraph inputs [Inputs]
    whitem[data-raw/whitem160aggregate_wthhhkey_loanamount.rda]
    loans[data/tstm_loans.rda]
  end
  subgraph step0 [Section0_Loan_amounts]
    loans --> loanAmt[tstm_loans_amount]
  end
  subgraph step1 [Section1_Load_and_select]
    whitem --> sel[tstm_asset_sel]
  end
  subgraph step2 [Section2_BS_composites]
    sel --> bsVars[agg_BS_101x_102x_201x_301x]
  end
  subgraph step3 [Section3_IS_composites]
    bsVars --> isVars[agg_IS_101x_111x_201x_301x]
  end
  subgraph step4 [Section4_Loans_and_merge]
    loanAmt --> loanSel[tstm_loan_sel]
    isVars --> merge[left_join_and_loan_NA_to_zero]
    loanSel --> merge
    merge --> prc[tstm_asset_loan_prc]
  end
  subgraph step5 [Section5_QC_and_filter]
    prc --> nacheck[agg_nonna_count]
    prc --> zerocheck[agg_sum_is0]
    nacheck --> filter[keep_full_info_rows]
    zerocheck --> filter
    filter --> assetLoan[tstm_asset_loan]
  end
  subgraph step6 [Section6_HH_span]
    assetLoan --> hhSpan[tstm_hh_mthspan]
  end
  subgraph step7 [Section7_Save_validate_report]
    loanAmt --> saveOut[ffv_save_gateway_tstm]
    assetLoan --> saveOut
    hhSpan --> saveOut
    saveOut --> dataOut[data_or_data_temp]
    saveOut --> extdata[inst/extdata_csv_dta]
    dataOut --> compare[compare_vs_canonical_data_folder]
    compare --> report[report_identical_or_not]
  end

Control parameters

library(PrjThaiHFID)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(readr)
library(glue)
library(kableExtra)
#> 
#> Attaching package: 'kableExtra'
#> The following object is masked from 'package:dplyr':
#> 
#>     group_rows

bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)

verbose <- TRUE
verbose_detail <- FALSE
it_verbose_detail_nrow <- 100L
it_file_code <- 963158L
it_file_code_qc <- 54767L
st_kableformat <- "html"

# TRUE: overwrite canonical data/*.rda; FALSE: write to data-temp/ (gitignored)
bl_replace_data_output <- FALSE

# here::here() anchors on _quarto.yml under Quarto (-> vignettes/); find the
# package root explicitly so paths resolve under both Quarto and knitr.
spn_pkg_root <- rprojroot::find_root(rprojroot::has_file("DESCRIPTION"))
st_data_out <- file.path(
  spn_pkg_root,
  if (bl_replace_data_output) "data" else "data-temp",
  fsep = .Platform$file.sep
)
st_data_canonical <- file.path(spn_pkg_root, "data")
st_extdata_dir <- file.path(spn_pkg_root, "inst", "extdata")
for (st_dir in c(st_data_out, st_extdata_dir)) {
  if (!dir.exists(st_dir)) {
    dir.create(st_dir, recursive = TRUE)
  }
}

# res/ auto-save controls. spt_res is the per-vignette output folder; each entry
# of ls_save_res toggles whether that table is written (TRUE -> .tex + .csv via
# ffp_save_res_table()). Default all FALSE so a knit never writes to res/.
# Flip any subset, or `ls_save_res[] <- TRUE` to enable everything.
spt_res <- file.path(spn_pkg_root, "res", "res_gen_asset_loan")
ls_save_res <- list(
  tb_lender_formal = FALSE,
  tb_nonna = FALSE,
  tb_zero = FALSE,
  tb_nonna_zero = FALSE,
  tb_hh_span = FALSE,
  tb_funnel = FALSE,
  tb_cols_loans_amount = FALSE,
  tb_cols_asset_loan = FALSE,
  tb_cols_hh_span = FALSE,
  tb_validate = FALSE
)

ar_svr_capital_var <- c(
  "agg_BS_08",
  "agg_BS_09", "agg_BS_10", "agg_BS_11", "agg_BS_12",
  "agg_BS_13"
)
ar_svr_stock_var <- c(
  "agg_BS_01",
  "agg_BS_03",
  "agg_BS_17"
)
ar_svr_flow_var <- c(
  "agg_IS_01", "agg_IS_02", "agg_IS_03", "agg_IS_04", "agg_IS_05", "agg_IS_06",
  "agg_IS_07",
  "agg_IS_08", "agg_IS_09", "agg_IS_12", "agg_IS_13", "agg_IS_14", "agg_IS_15",
  "agg_IS_16",
  "agg_IS_20",
  "agg_IS_29",
  "agg_CF_01"
)

ffv_save_gateway_tstm <- function(obj, name, out_dir, extdata_dir = st_extdata_dir) {
  if (!dir.exists(out_dir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  if (!dir.exists(extdata_dir)) {
    dir.create(extdata_dir, recursive = TRUE)
  }

  st_rda <- file.path(out_dir, paste0(name, ".rda"))
  st_csv <- file.path(extdata_dir, paste0(name, ".csv"))
  st_dta <- file.path(extdata_dir, paste0(name, ".dta"))

  tmp_env <- new.env(parent = emptyenv())
  tmp_env[[name]] <- obj
  save(list = name, file = st_rda, envir = tmp_env)

  readr::write_csv(obj, st_csv)

  if (!requireNamespace("haven", quietly = TRUE)) {
    stop(
      "Package 'haven' is required to write .dta to inst/extdata. ",
      "Install with: install.packages('haven')",
      call. = FALSE
    )
  }
  haven::write_dta(obj, st_dta)

  invisible(list(rda = st_rda, csv = st_csv, dta = st_dta))
}

ffv_render_table <- function(df, caption, st_format = st_kableformat, name = NULL) {
  # Optionally persist a LaTeX + CSV copy to res/ when this table's control is on.
  if (!is.null(name) && isTRUE(ls_save_res[[name]])) {
    kbl_latex <- kbl(
      df,
      format = "latex",
      booktabs = TRUE,
      caption = caption,
      digits = 4
    )
    ffp_save_res_table(kbl_latex, name, spt_res, df = df, bl_save = TRUE)
  }
  kbl(
    df,
    format = st_format,
    booktabs = TRUE,
    caption = caption,
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = bs_style,
      full_width = FALSE,
      position = "left"
    )
}

ffv_compare_tstm <- function(generated, canonical_path, name,
                             sort_keys = c("id", "month")) {
  e_can <- new.env()
  load(canonical_path, envir = e_can)
  gen <- as.data.frame(generated) %>% arrange(across(all_of(sort_keys)))
  can <- as.data.frame(get(name, envir = e_can)) %>% arrange(across(all_of(sort_keys)))
  ae <- all.equal(gen, can)
  msg <- if (isTRUE(ae)) {
    "TRUE"
  } else if (is.character(ae)) {
    ae[[1]]
  } else {
    "FALSE (no detail)"
  }
  list(
    object = name,
    dims_match = identical(dim(gen), dim(can)),
    cols_match = identical(sort(names(gen)), sort(names(can))),
    identical = isTRUE(ae),
    all_equal_msg = as.character(msg)[1]
  )
}

ffv_load_tstm_rda <- function(path, name, sort_keys = c("id", "month")) {
  e <- new.env()
  load(path, envir = e)
  get(name, envir = e) %>% arrange(across(all_of(sort_keys)))
}

if (verbose) {
  print(glue::glue("f-{it_file_code}, controls"))
  print(glue::glue("  bl_replace_data_output: {bl_replace_data_output} -> {st_data_out}/"))
}
#> f-963158, controls
#>   bl_replace_data_output: FALSE -> /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/

Section 0 — Aggregate tstm_loans to household-month loan flows

Port of data-raw/tstm_loans_amount.R: formal/informal classification, positive S_Init_Amount sums by household-month, wide format with amt_all_lenders, amt_formal, amt_informal.

data(tstm_loans)

tstm_loans_fi <- tstm_loans %>%
  mutate(
    formal = case_when(
      G_LenderType == "Agri Coop" ~ FALSE,
      G_LenderType == "PCG" ~ FALSE,
      G_LenderType == "MoneyLender" ~ FALSE,
      G_LenderType == "Neighbor" ~ FALSE,
      G_LenderType == "Relatives" ~ FALSE,
      G_LenderType == "Others" ~ FALSE,
      G_LenderType == "Other Non-Indi Formal or Informal" & G_Location == "Village" ~ FALSE,
      TRUE ~ TRUE
    ),
    formal_st = if_else(formal, "amt_formal", "amt_informal")
  )

tstm_loans_fi_amt <- tstm_loans_fi %>%
  select(
    contains("Num"), surveymonth,
    formal_st, G_LenderType,
    S_Init_Amount
  ) %>%
  filter(S_Init_Amount > 0)

tstm_loans_fi_amt_jnt <- tstm_loans_fi_amt %>%
  group_by(hhid_Num, surveymonth) %>%
  summarize(new_loan_amt = sum(S_Init_Amount, na.rm = TRUE), .groups = "drop") %>%
  filter(new_loan_amt > 0) %>%
  mutate(lender_type_mix = "amt_all_lenders")

tstm_loans_fi_amt_fi <- tstm_loans_fi_amt %>%
  group_by(hhid_Num, surveymonth, formal_st) %>%
  summarize(new_loan_amt = sum(S_Init_Amount, na.rm = TRUE), .groups = "drop") %>%
  filter(new_loan_amt > 0) %>%
  rename(lender_type_mix = formal_st)

tstm_loas_amt_m <- bind_rows(
  tstm_loans_fi_amt_jnt,
  tstm_loans_fi_amt_fi
)

tstm_loans_amount <- tstm_loas_amt_m %>%
  pivot_wider(
    id_cols = c("hhid_Num", "surveymonth"),
    names_from = lender_type_mix,
    names_prefix = "",
    values_from = new_loan_amt
  )

if (verbose) {
  print(glue::glue("f-{it_file_code}, loans-amount: {nrow(tstm_loans_amount)} household-months"))
}
#> f-963158, loans-amount: 15999 household-months
Loan records by G_LenderType and formal/informal classification
G_LenderType formal n
Agri Coop FALSE 577
BAAC TRUE 3396
Commercial Bank TRUE 64
MoneyLender FALSE 625
Neighbor FALSE 998
Other Non-Indi Formal or Informal FALSE 2732
Other Non-Indi Formal or Informal TRUE 1927
Others FALSE 1349
PCG FALSE 501
Relatives FALSE 1363
Village Fund TRUE 7408

Section 1 — Load White panel and select variables

load(file.path(spn_pkg_root, "data-raw", "whitem160aggregate_wthhhkey_loanamount.rda"))
tstm_asset <- whitem160aggregate_wthhhkey_loanamount

tstm_asset_sel <- tstm_asset %>%
  select(
    id, month,
    dplyr::any_of(c(ar_svr_capital_var, ar_svr_stock_var, ar_svr_flow_var))
  )

if (verbose) {
  print(glue::glue("f-{it_file_code}, load: whitem {nrow(tstm_asset)} rows"))
  print(glue::glue("  tstm_asset_sel: {nrow(tstm_asset_sel)} rows x {ncol(tstm_asset_sel)} cols"))
}
#> f-963158, load: whitem 127351 rows
#>   tstm_asset_sel: 127351 rows x 28 cols

Section 2 — BS composite variables

Joint, agriculture, and business revenue-asset aggregates (agg_BS_101*agg_BS_302*).

tstm_asset_sel <- tstm_asset_sel %>%
  rowwise() %>%
  mutate(
    # 1. Joint group
    # All rev-assets: livestock + household + Agri + business + land
    # problem: is household asset really revenue generating?
    agg_BS_1011 = sum(agg_BS_08, agg_BS_10, agg_BS_11, agg_BS_12, agg_BS_13,
      na.rm = TRUE
    ),
    # Core 1 rev-assets: livestock + Agri + business + land
    # all are potentially revenue generating
    agg_BS_1012 = sum(agg_BS_08, agg_BS_11, agg_BS_12, agg_BS_13,
      na.rm = TRUE
    ),
    # Core 2 rev-assets: livestock + agri + business
    # Livestock assets/revenue and land might be vastly different
    agg_BS_1021 = sum(agg_BS_08, agg_BS_11, agg_BS_12, na.rm = TRUE),
    # Core 3 rev-assets: agri + business
    # Livestock assets/revenue and land might be vastly different
    agg_BS_1022 = sum(agg_BS_11, agg_BS_12, na.rm = TRUE),
    # 2025-05-15 05:56:00, Core 4 rev-assets: agri + business + land
    # land might relate to cultivation revenue, livestock capital gains loss confusing
    agg_BS_1023 = sum(agg_BS_11, agg_BS_12, agg_BS_13, na.rm = TRUE),
    # land alone
    agg_BS_1024 = sum(agg_BS_13, na.rm = TRUE),
    # 2025-05-17 09:41:12, Livestock + agri assets combined, these are the combined agri assets
    agg_BS_1025 = sum(agg_BS_08, agg_BS_11, na.rm = TRUE),
    # 2. Agri group
    # All agri-assets: livestock + agri + land
    agg_BS_2011 = sum(agg_BS_08, agg_BS_11, agg_BS_13, na.rm = TRUE),
    # core agri-assets: agri asset only
    agg_BS_2021 = sum(agg_BS_11, na.rm = TRUE),
    # 3. Business group
    # All biz-assets: household + business
    # household assets might faciliate business, a car for example.
    agg_BS_3011 = sum(agg_BS_10, agg_BS_12, na.rm = TRUE),
    # core biz-assets: business asset only
    agg_BS_3021 = sum(agg_BS_12, na.rm = TRUE),
    # core biz-assets: household asset alone
    agg_BS_3022 = sum(agg_BS_10, na.rm = TRUE)
  ) %>%
  ungroup()

if (verbose) {
  print(glue::glue("f-{it_file_code}, bs-composites: {ncol(tstm_asset_sel)} cols"))
}
#> f-963158, bs-composites: 40 cols

Section 3 — IS composite variables

Income and net-income aggregates (agg_IS_101*agg_IS_302*); drop raw capital/flow inputs used only for construction.

tstm_asset_sel <- tstm_asset_sel %>%
  rowwise() %>%
  mutate(
    # 1. All
    # All income
    agg_IS_1011 = sum(agg_IS_07, na.rm = TRUE),
    # Income minus wage
    agg_IS_1012 = sum(agg_IS_07, (-1) * agg_IS_05, na.rm = TRUE),
    # income minus wage, minus other income, don't know what other income is
    # to match up with BS_1021
    agg_IS_1013 = sum(agg_IS_07, (-1) * sum(agg_IS_05, agg_IS_06,
      na.rm = TRUE
    ), na.rm = TRUE),
    # income minus wage, minus other income, fish and livestock income
    agg_IS_1021 = sum(agg_IS_07, (-1) * sum(
      agg_IS_02, agg_IS_03, agg_IS_05, agg_IS_06,
      na.rm = TRUE
    ), na.rm = TRUE),
    # total income minus total costs
    agg_IS_1111 = sum(agg_IS_07, (-1) * agg_IS_16, na.rm = TRUE),
    # 2025-05-15 05:47:08, total income minus total costs, excluding income and costs from labor
    agg_IS_1112 = sum(
      agg_IS_07, (-1) * (agg_IS_05),
      (-1) * (agg_IS_16), (agg_IS_14),
      na.rm = TRUE),
    # 2025-05-15 05:47:15, total income minus total costs, excluding income and costs from labor and "others"
    agg_IS_1113 = sum(
      agg_IS_07, (-1) * (agg_IS_05), (-1) * (agg_IS_06),
      (-1) * (agg_IS_16), (agg_IS_14), (agg_IS_15),
      na.rm = TRUE),
    # 2025-05-15 05:49:48, Fish/shrimp + biz + cultivation only, no livestock
    agg_IS_1114 = sum(
      agg_IS_07, (-1) * (agg_IS_05), (-1) * (agg_IS_06), (-1) * (agg_IS_02),
      (-1) * (agg_IS_16), (agg_IS_14), (agg_IS_15), (agg_IS_09),
      na.rm = TRUE),
    # from 1111: subtract from gross profit also depreciation of fixed asset
    agg_IS_1115 = sum(
      agg_IS_07, (-1) * agg_IS_16, (-1) * agg_IS_20,
      na.rm = TRUE),
    # from 1113: subtract from gross profit also depreciation of fixed asset
    agg_IS_1116 = sum(
      agg_IS_07, (-1) * (agg_IS_05), (-1) * (agg_IS_06),
      (-1) * (agg_IS_16), (agg_IS_14), (agg_IS_15),
      (-1) * agg_IS_20,
      na.rm = TRUE),
    # 2. Agri group
    # cultivation + livestock + fish
    agg_IS_2011 = sum(agg_IS_01, agg_IS_02, agg_IS_03, na.rm = TRUE),
    # net for (cultivation + livestock + fish)
    agg_IS_2012 = sum(
      agg_IS_01, agg_IS_02, agg_IS_03,
      (-1) * (agg_IS_08), (-1) * (agg_IS_09), (-1) * (agg_IS_12),
      na.rm = TRUE),
    # 2025-05-15 05:59:13, cultivation + livestock
    agg_IS_2013 = sum(agg_IS_01, agg_IS_02, na.rm = TRUE),
    # net for cultivation + livestock
    agg_IS_2014 = sum(
      agg_IS_01, agg_IS_02,
      (-1) * (agg_IS_08), (-1) * (agg_IS_09),
      na.rm = TRUE),
    # 2025-05-15 05:59:22, cultivation + shrimp
    agg_IS_2015 = sum(agg_IS_01, agg_IS_03, na.rm = TRUE),
    # net for cultivation + shrimp
    agg_IS_2016 = sum(
      agg_IS_01, agg_IS_03,
      (-1) * (agg_IS_08), (-1) * (agg_IS_12),
      na.rm = TRUE),
    # cultivation only
    agg_IS_2021 = sum(agg_IS_01, na.rm = TRUE),
    # net for cultivation only
    agg_IS_2022 = sum(
      agg_IS_01,
      (-1) * (agg_IS_08),
      na.rm = TRUE),
    # 3. Business group
    # Business and labor income, labor if they pay themselves?
    agg_IS_3011 = sum(agg_IS_04, agg_IS_05, na.rm = TRUE),
    # Pure business
    agg_IS_3021 = sum(agg_IS_04, na.rm = TRUE),
    # Business revenue minus profits
    agg_IS_3022 = sum(
      agg_IS_04,
      (-1) * agg_IS_13,
      na.rm = TRUE)
  ) %>%
  select(
    id, month,
    dplyr::any_of(ar_svr_stock_var),
    matches("agg_BS_101|agg_BS_102"),
    matches("agg_BS_201|agg_BS_202"),
    matches("agg_BS_301|agg_BS_302"),
    matches("agg_IS_101|agg_IS_102"),
    matches("agg_IS_111"),
    matches("agg_IS_201|agg_IS_202"),
    matches("agg_IS_301|agg_IS_302")
  ) %>%
  ungroup()

if (verbose) {
  print(glue::glue("f-{it_file_code}, is-composites: {ncol(tstm_asset_sel)} cols"))
}
#> f-963158, is-composites: 38 cols

Section 4 — Loan variables and merge

tstm_loan_sel <- tstm_loans_amount %>%
  mutate(
    id = as.integer(hhid_Num),
    month = as.integer(surveymonth)
  ) %>%
  rename(
    agg_loanflow_all = amt_all_lenders,
    agg_loanflow_inf = amt_informal,
    agg_loanflow_for = amt_formal
  ) %>%
  select(id, month, agg_loanflow_all, agg_loanflow_inf, agg_loanflow_for)

tstm_asset_loan_prc <- tstm_asset_sel %>%
  left_join(tstm_loan_sel, by = c("id", "month"))

tstm_asset_loan_prc <- tstm_asset_loan_prc %>%
  mutate(
    agg_loanflow_all = ifelse(
      is.na(agg_loanflow_all) & !is.na(agg_BS_01), 0, agg_loanflow_all
    ),
    agg_loanflow_inf = ifelse(
      is.na(agg_loanflow_inf) & !is.na(agg_BS_01), 0, agg_loanflow_inf
    ),
    agg_loanflow_for = ifelse(
      is.na(agg_loanflow_for) & !is.na(agg_BS_01), 0, agg_loanflow_for
    )
  )

if (verbose) {
  print(glue::glue("f-{it_file_code}, merge: tstm_asset_loan_prc {nrow(tstm_asset_loan_prc)} rows"))
  summary(tstm_asset_loan_prc)
  str(tstm_asset_loan_prc)
}
#> f-963158, merge: tstm_asset_loan_prc 127351 rows
#> tibble [127,351 × 41] (S3: tbl_df/tbl/data.frame)
#>  $ id              : int [1:127351] 3078 3078 3078 3078 3078 3078 3078 3078 3078 3078 ...
#>  $ month           : num [1:127351] 0 1 2 3 4 5 6 7 8 9 ...
#>  $ agg_BS_01       : num [1:127351] 283 1186 1953 31 0 ...
#>  $ agg_BS_03       : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_17       : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_1011     : num [1:127351] 152226 152205 152183 152162 152141 ...
#>  $ agg_BS_1012     : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#>  $ agg_BS_1021     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_1022     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_1023     : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#>  $ agg_BS_1024     : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#>  $ agg_BS_1025     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_2011     : num [1:127351] 151000 151000 151000 151000 151000 151000 151000 151000 151000 151000 ...
#>  $ agg_BS_2021     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_3011     : num [1:127351] 1226 1205 1183 1162 1141 ...
#>  $ agg_BS_3021     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_BS_3022     : num [1:127351] 1226 1205 1183 1162 1141 ...
#>  $ agg_IS_1011     : num [1:127351] 0 1600 1400 1530 1500 1500 1200 1200 1220 1200 ...
#>  $ agg_IS_1012     : num [1:127351] 0 0 0 20 0 0 0 0 20 0 ...
#>  $ agg_IS_1013     : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#>  $ agg_IS_1021     : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#>  $ agg_IS_1111     : num [1:127351] 0 1600 1400 1330 1500 1500 1200 1200 1220 1200 ...
#>  $ agg_IS_1112     : num [1:127351] 0 0 0 -180 0 0 0 0 20 0 ...
#>  $ agg_IS_1113     : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#>  $ agg_IS_1114     : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#>  $ agg_IS_1115     : num [1:127351] 0 1578 1378 1309 1479 ...
#>  $ agg_IS_1116     : num [1:127351] 0 -21.9 -21.5 -201.1 -20.7 ...
#>  $ agg_IS_2011     : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#>  $ agg_IS_2012     : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#>  $ agg_IS_2013     : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#>  $ agg_IS_2014     : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#>  $ agg_IS_2015     : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#>  $ agg_IS_2016     : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#>  $ agg_IS_2021     : num [1:127351] 0 0 0 20 0 0 0 0 0 0 ...
#>  $ agg_IS_2022     : num [1:127351] 0 0 0 -180 0 0 0 0 0 0 ...
#>  $ agg_IS_3011     : num [1:127351] 0 1600 1400 1510 1500 1500 1200 1200 1200 1200 ...
#>  $ agg_IS_3021     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_IS_3022     : num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_loanflow_all: num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_loanflow_inf: num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ agg_loanflow_for: num [1:127351] 0 0 0 0 0 0 0 0 0 0 ...

Section 5 — QC diagnostics and full-info filter

tstm_asset_loan_nacheck <- tstm_asset_loan_prc %>%
  mutate(across(
    contains("agg"),
    function(x) ifelse(is.na(x), 0, 1)
  )) %>%
  mutate(
    agg_nonna_count = base::rowSums(
      dplyr::pick(contains("agg")),
      na.rm = TRUE
    )
  ) %>%
  select(id, month, agg_nonna_count)

tstm_asset_loan_sum <- tstm_asset_loan_prc %>%
  mutate(
    agg_sum = base::rowSums(
      dplyr::pick(contains("agg")),
      na.rm = TRUE
    ),
    agg_sum_is0 = if_else(agg_sum == 0, 1, 0)
  ) %>%
  select(id, month, agg_sum, agg_sum_is0)

tstm_asset_loan_nasumcheck <- tstm_asset_loan_nacheck %>%
  left_join(tstm_asset_loan_sum, by = c("id", "month"))

it_agg_nonna_count_max <- max(tstm_asset_loan_nacheck$agg_nonna_count)

tstm_asset_loan_idmonth <- tstm_asset_loan_nacheck %>%
  filter(agg_nonna_count == it_agg_nonna_count_max)

tstm_asset_loan <- tstm_asset_loan_prc %>%
  left_join(tstm_asset_loan_idmonth, by = c("id", "month")) %>%
  filter(agg_nonna_count == it_agg_nonna_count_max) %>%
  select(-agg_nonna_count)

if (verbose) {
  print(glue::glue(
    "f-{it_file_code_qc}, filter: agg_nonna_count_max={it_agg_nonna_count_max}; ",
    "tstm_asset_loan {nrow(tstm_asset_loan)} rows"
  ))
}
#> f-54767, filter: agg_nonna_count_max=39; tstm_asset_loan 112718 rows

if (verbose_detail) {
  summary(tstm_asset_loan %>% filter(id == 70201, month == 23))
  summary(tstm_asset_loan %>% filter(id == 271452, month == 13))
  print(tstm_asset_loan %>% filter(id == 271452), n = 180)
}

Section 6 — Household month span

tstm_hh_mthspan <- tstm_asset_loan %>%
  group_by(id) %>%
  arrange(month, .by_group = TRUE) %>%
  mutate(
    month_start = min(month),
    month_end = max(month),
    month_n = dplyr::n()
  ) %>%
  select(id, month_start, month_end, month_n) %>%
  slice_head(n = 1) %>%
  ungroup()

if (verbose) {
  print(glue::glue(
    "f-{it_file_code_qc}, hh-span: {nrow(tstm_hh_mthspan)} households"
  ))
}
#> f-54767, hh-span: 791 households

Tables: QC diagnostics

tb_nonna <- tstm_asset_loan_nacheck %>%
  group_by(agg_nonna_count) %>%
  tally(name = "n") %>%
  arrange(agg_nonna_count)

tb_zero <- tstm_asset_loan_sum %>%
  group_by(agg_sum_is0) %>%
  tally(name = "n") %>%
  arrange(agg_sum_is0)

tb_nonna_zero <- tstm_asset_loan_nasumcheck %>%
  group_by(agg_nonna_count, agg_sum_is0) %>%
  tally(name = "n") %>%
  arrange(agg_nonna_count, agg_sum_is0)

tb_hh_span <- tstm_hh_mthspan %>%
  group_by(month_start, month_end, month_n) %>%
  tally(name = "n_households") %>%
  arrange(month_start, month_end, month_n)

tb_funnel <- data.frame(
  stage = c(
    "tstm_loans_amount",
    "tstm_asset_loan_prc (merged)",
    "tstm_asset_loan (full-info filter)",
    "tstm_hh_mthspan (households)"
  ),
  n_rows = c(
    nrow(tstm_loans_amount),
    nrow(tstm_asset_loan_prc),
    nrow(tstm_asset_loan),
    nrow(tstm_hh_mthspan)
  )
)

ffv_render_table(tb_nonna, caption = "Distribution of agg_nonna_count (pre-filter)", name = "tb_nonna")
Distribution of agg_nonna_count (pre-filter)
agg_nonna_count n
33 14554
35 46
36 33
39 112718
ffv_render_table(tb_zero, caption = "Distribution of agg_sum_is0 (all-zero rows)", name = "tb_zero")
Distribution of agg_sum_is0 (all-zero rows)
agg_sum_is0 n
0 112797
1 14554
ffv_render_table(tb_nonna_zero, caption = "Cross-tab agg_nonna_count x agg_sum_is0", name = "tb_nonna_zero")
Cross-tab agg_nonna_count x agg_sum_is0
agg_nonna_count agg_sum_is0 n
33 1 14554
35 0 46
36 0 33
39 0 112718
ffv_render_table(tb_hh_span, caption = "Household month-span combinations", name = "tb_hh_span")
Household month-span combinations
month_start month_end month_n n_households
0 2 3 13
0 3 4 1
0 4 5 1
0 6 7 2
0 7 8 1
0 8 9 2
0 10 11 1
0 12 13 1
0 22 23 1
0 25 26 6
0 27 28 7
0 28 29 6
0 34 35 4
0 35 36 3
0 36 37 1
0 38 39 1
0 39 40 1
0 46 47 1
0 49 50 1
0 50 51 1
0 52 53 2
0 54 55 1
0 55 56 2
0 61 62 1
0 66 67 3
0 67 68 2
0 71 72 4
0 73 74 2
0 75 76 1
0 78 79 1
0 82 83 1
0 84 85 1
0 89 90 1
0 91 92 1
0 93 94 1
0 94 95 1
0 160 161 606
2 160 159 15
3 160 158 2
4 160 157 2
5 160 156 1
6 160 155 1
7 160 154 1
8 27 20 1
8 160 153 1
10 160 151 1
13 160 148 1
22 160 139 1
26 160 135 6
27 160 134 10
28 71 44 1
28 160 133 4
29 71 43 1
30 160 131 1
34 160 127 4
35 71 37 1
35 160 126 2
36 160 125 1
38 160 123 1
39 160 122 1
47 160 114 1
49 160 112 1
50 160 111 1
52 160 109 2
54 160 107 1
55 60 6 1
55 160 106 1
60 160 101 1
61 160 100 1
66 160 95 3
67 160 94 2
71 160 90 17
72 160 89 1
73 160 88 2
74 160 87 2
75 160 86 1
78 160 83 1
82 160 79 1
84 160 77 1
89 160 72 1
91 160 70 1
93 160 68 1
94 160 67 1
ffv_render_table(tb_funnel, caption = "Pipeline row counts", name = "tb_funnel")
Pipeline row counts
stage n_rows
tstm_loans_amount 15999
tstm_asset_loan_prc (merged) 127351
tstm_asset_loan (full-info filter) 112718
tstm_hh_mthspan (households) 791

Tables: column inventory

tb_cols_asset_loan <- data.frame(
  var = names(tstm_asset_loan),
  stringsAsFactors = FALSE
)
tb_cols_hh_span <- data.frame(
  var = names(tstm_hh_mthspan),
  stringsAsFactors = FALSE
)
tb_cols_loans_amount <- data.frame(
  var = names(tstm_loans_amount),
  stringsAsFactors = FALSE
)

ffv_render_table(tb_cols_loans_amount, caption = "tstm_loans_amount column names", name = "tb_cols_loans_amount")
tstm_loans_amount column names
var
hhid_Num
surveymonth
amt_all_lenders
amt_informal
amt_formal
ffv_render_table(tb_cols_asset_loan, caption = "tstm_asset_loan column names", name = "tb_cols_asset_loan")
tstm_asset_loan column names
var
id
month
agg_BS_01
agg_BS_03
agg_BS_17
agg_BS_1011
agg_BS_1012
agg_BS_1021
agg_BS_1022
agg_BS_1023
agg_BS_1024
agg_BS_1025
agg_BS_2011
agg_BS_2021
agg_BS_3011
agg_BS_3021
agg_BS_3022
agg_IS_1011
agg_IS_1012
agg_IS_1013
agg_IS_1021
agg_IS_1111
agg_IS_1112
agg_IS_1113
agg_IS_1114
agg_IS_1115
agg_IS_1116
agg_IS_2011
agg_IS_2012
agg_IS_2013
agg_IS_2014
agg_IS_2015
agg_IS_2016
agg_IS_2021
agg_IS_2022
agg_IS_3011
agg_IS_3021
agg_IS_3022
agg_loanflow_all
agg_loanflow_inf
agg_loanflow_for
ffv_render_table(tb_cols_hh_span, caption = "tstm_hh_mthspan column names", name = "tb_cols_hh_span")
tstm_hh_mthspan column names
var
id
month_start
month_end
month_n

Section 7 — Save and validate

When bl_replace_data_output is FALSE: save to data-temp/, then compare written .rda files to canonical data/.

When bl_replace_data_output is TRUE: validate in-memory objects against canonical data/ first, then save to data/. Validation never stops the knit; mismatches emit warning().

ar_outputs <- list(
  list(
    obj = tstm_loans_amount,
    name = "tstm_loans_amount",
    sort_keys = c("hhid_Num", "surveymonth")
  ),
  list(obj = tstm_asset_loan, name = "tstm_asset_loan", sort_keys = c("id", "month")),
  list(obj = tstm_hh_mthspan, name = "tstm_hh_mthspan", sort_keys = "id")
)

if (bl_replace_data_output) {
  ls_validate <- lapply(ar_outputs, function(spec) {
    ffv_compare_tstm(
      generated = spec$obj,
      canonical_path = file.path(st_data_canonical, paste0(spec$name, ".rda")),
      name = spec$name,
      sort_keys = spec$sort_keys
    )
  })

  ls_saved <- lapply(ar_outputs, function(spec) {
    ffv_save_gateway_tstm(spec$obj, spec$name, st_data_out)
  })
} else {
  ls_saved <- lapply(ar_outputs, function(spec) {
    ffv_save_gateway_tstm(spec$obj, spec$name, st_data_out)
  })

  ls_validate <- lapply(ar_outputs, function(spec) {
    ffv_compare_tstm(
      generated = ffv_load_tstm_rda(
        file.path(st_data_out, paste0(spec$name, ".rda")),
        spec$name,
        spec$sort_keys
      ),
      canonical_path = file.path(st_data_canonical, paste0(spec$name, ".rda")),
      name = spec$name,
      sort_keys = spec$sort_keys
    )
  })
}

tb_validate <- dplyr::bind_rows(lapply(ls_validate, function(x) {
  data.frame(
    object = x$object,
    dims_match = x$dims_match,
    cols_match = x$cols_match,
    identical = x$identical,
    all_equal_msg = substr(x$all_equal_msg, 1, 80),
    stringsAsFactors = FALSE
  )
}))

ffv_render_table(
  tb_validate,
  caption = paste0(
    "Validation vs canonical data/ (bl_replace_data_output = ",
    bl_replace_data_output, ")"
  ),
  name = "tb_validate"
)
Validation vs canonical data/ (bl_replace_data_output = FALSE)
object dims_match cols_match identical all_equal_msg
tstm_loans_amount TRUE TRUE TRUE TRUE
tstm_asset_loan TRUE TRUE TRUE TRUE
tstm_hh_mthspan TRUE TRUE TRUE TRUE

for (res in ls_validate) {
  print(glue::glue(
    "f-{it_file_code}, validate: {res$object} identical to data/ canonical: {res$identical}"
  ))
  if (!res$identical) {
    warning(
      res$object, " differs from canonical data/: ", res$all_equal_msg,
      call. = FALSE
    )
  }
}
#> f-963158, validate: tstm_loans_amount identical to data/ canonical: TRUE
#> f-963158, validate: tstm_asset_loan identical to data/ canonical: TRUE
#> f-963158, validate: tstm_hh_mthspan identical to data/ canonical: TRUE

if (verbose) {
  if (bl_replace_data_output) {
    print(glue::glue("f-{it_file_code}, save: replaced canonical data/*.rda"))
  } else {
    print(glue::glue(
      "f-{it_file_code}, save: wrote data-temp/*.rda (data/ unchanged)"
    ))
  }
  for (ls in ls_saved) {
    print(glue::glue("f-{it_file_code}, save: {ls$rda}"))
    print(glue::glue("f-{it_file_code}, save: {ls$csv}"))
    print(glue::glue("f-{it_file_code}, save: {ls$dta}"))
  }
}
#> f-963158, save: wrote data-temp/*.rda (data/ unchanged)
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/tstm_loans_amount.rda
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_loans_amount.csv
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_loans_amount.dta
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/tstm_asset_loan.rda
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_asset_loan.csv
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_asset_loan.dta
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/tstm_hh_mthspan.rda
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_hh_mthspan.csv
#> f-963158, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_hh_mthspan.dta

When bl_replace_data_output is FALSE and validation reports identical = TRUE for all objects, set bl_replace_data_output <- TRUE in the setup chunk and re-knit to refresh canonical data/.

Note: tstm_loans_amount validation may report a mismatch if canonical data/tstm_loans_amount.rda was built before tstm_loans used anonymized hhid_Num; tstm_asset_loan and tstm_hh_mthspan are the primary regression/gateway checks.

Outputs produced by this run

The pipeline above wrote the following artifacts (the same links appear in Outputs at a glance at the top). Local CSV/Stata copies are also written to inst/extdata/ (git-ignored, not shipped):

Session info

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kableExtra_1.4.0  glue_1.8.1        readr_2.2.0       tidyr_1.3.2      
#> [5] dplyr_1.2.1       PrjThaiHFID_0.2.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] bit_4.6.0          jsonlite_2.0.0     crayon_1.5.3       compiler_4.6.1    
#>  [5] tidyselect_1.2.1   xml2_1.6.0         stringr_1.6.0      parallel_4.6.1    
#>  [9] textshaping_1.0.5  systemfonts_1.3.2  scales_1.4.0       yaml_2.3.12       
#> [13] fastmap_1.2.0      R6_2.6.1           generics_0.1.4     knitr_1.51        
#> [17] forcats_1.0.1      tibble_3.3.1       rprojroot_2.1.1    svglite_2.2.2     
#> [21] pillar_1.11.1      RColorBrewer_1.1-3 tzdb_0.5.0         rlang_1.2.0       
#> [25] stringi_1.8.7      xfun_0.59          bit64_4.8.2        otel_0.2.0        
#> [29] viridisLite_0.4.3  cli_3.6.6          withr_3.0.3        magrittr_2.0.5    
#> [33] digest_0.6.39      vroom_1.7.1        rstudioapi_0.19.0  haven_2.5.5       
#> [37] hms_1.1.4          lifecycle_1.0.5    vctrs_0.7.3        evaluate_1.0.5    
#> [41] farver_2.1.2       rmarkdown_2.31     purrr_1.2.2        tools_4.6.1       
#> [45] pkgconfig_2.0.3    htmltools_0.5.9