Skip to contents

This vignette chains the investment–loan–bridge gateway (PrjThaiHFID-#32), forward/backward investment windows (#2), and window balance-sheet statistics (#3), then merges investloan_type_m8 labels and regression category dummies.

Unit of observation (final): household × investment spell × ivars × blnc_vars × blnc_agg_stats, with bridge linkage labels attached.

Downstream use: the saved object tstm_invest_stats_bridgechar supports regressions of income change (win_diff, forward minus backward window means) on investment size (capital_invest), optionally by loan-bridge type and survey-month subsamples. Specifications align with stata/invest_rev_diff_250524_v2.do (mean aggregation; revenue IS_1111 / profit IS_1011 loops).

Production logic is ported from R-dev/ffs_hfid_window_windiff.R.

Outputs at a glance

This vignette saves one dataset and renders a series of diagnostic and paper-aligned tables.

Object Data page Role
tstm_invest_stats_bridgechar reference Household x investment-spell x balance-outcome x ivars window changes with bridge labels

The output is documented in R/data-res.R. It is written to data/ when bl_replace_data_output <- TRUE, otherwise to gitignored data-temp/.

Tables produced (not saved as data): pipeline funnel; investloan_type_m8 by asset ivars; regression category dummies; balance dimensions grid; outcome summaries for regression; Stata-aligned preview slice; merge diagnostics; and mth_inv_start by bridge type. Each table is gated by its own ls_save_res[["<name>"]] switch (all FALSE by default); flip one on to write its LaTeX/CSV to res/res_invest_return_bridge/ via ffp_save_res_table().

What ships vs. what is generated locally

  • Shipped with the installed package: the packaged datasets in data/ (lazy-loaded inputs tstm_loans_panel, tstm_asset_loan).
  • Generated locally when this vignette runs (neither pushed to GitHub nor shipped in the package): the tstm_invest_stats_bridgechar .rda written to data/ (or data-temp/) and its inst/extdata/*.csv / *.dta export; and any tables written to res/res_invest_return_bridge/ (.tex/.csv, only when the matching ls_save_res control is TRUE). The inst/extdata/ and res/ artifacts are git-ignored, saved only for local convenience. The anonymized crosswalk data-raw/tm_key_id_hh_anony.rda and data-raw/census_vil_hh_structure_count.rda are local build inputs (not shipped).

Required packaged inputs

The gateway loads only two objects from the package data/ folder (must exist locally when knitting):

Object Role
tstm_loans_panel Monthly loan panel (Group A)
tstm_asset_loan Household-month asset and loan aggregates (Group B)
data-raw/tm_key_id_hh_anony.rda Anonymized household → province/village crosswalk (step 5)
data-raw/census_vil_hh_structure_count.rda Village N_households (step 6)

ff_hfid_invest_winstats() also reads tstm_asset_loan directly (Stata agg_* subset). All other pipeline objects are computed in this run.

Contrast with ffv_invest_loan_bridge.qmd

ffv_invest_loan_bridge.qmd This vignette
Saved objects (bl_replace_data_output <- TRUE) Eight gateway intermediates: tstm_loans_pn_nd, tstm_loans_hooks, tstm_loans_bridges_type, tstm_invdates_uniq, tstm_invest, tstm_roster_invest_loan_linked, tstm_roster_invest2loan2bridge, tstm_invest2loan2bridge_chars One object: tstm_invest_stats_bridgechar
Uses prior vignette .rda? N/A (produces gateway outputs) No — re-calls ffp_hfid_invest_loan_linked_abc_investloan_char_gateway() so every gateway parameter used to build tstm_invest_stats_bridgechar appears in this file
Primary outputs Bridge-type counts / paper Table-style tabulations Window balance changes + bridge labels for return regressions

Design intent: self-contained, parameter-transparent knit rather than load(tstm_invest2loan2bridge_chars.rda) from the loan-bridge vignette. By contrast, ffv_bridge_timing.qmd does consume saved gateway outputs from data/.

Pipeline structure

The seven-step chain from packaged inputs to the saved tstm_invest_stats_bridgechar; an ASCII version of the same flow is in R-dev/ffs_hfid_window_windiff.R.

flowchart TD
  subgraph inputs [Packaged data]
    loansPanel[tstm_loans_panel]
    assetLoan[tstm_asset_loan]
  end
  subgraph step1 [Section1_Gateway]
    gw[ffp_hfid_invest_loan_linked_abc_investloan_char_gateway]
    gw --> tstm_invest
    gw --> tstm_invdates_uniq
    gw --> tstm_invest2loan2bridge_chars
  end
  subgraph step2 [Section2_Window]
    win[ff_hfid_invest_window]
    tstm_invdates_uniq --> win
    win --> tstm_invdates_uniq_fwbw
  end
  subgraph step3 [Section3_Winstats]
    ws[ff_hfid_invest_winstats]
    assetLoan --> ws
    tstm_invest --> ws
    tstm_invdates_uniq_fwbw --> ws
    ws --> tstm_invest_stats
  end
  subgraph step4 [Section4_Merge_bridge]
    merge[left_join_bridge_chars_and_cat_dummies]
    tstm_invest_stats --> merge
    tstm_invest2loan2bridge_chars --> merge
    merge --> allInvest[tstm_invest_stats_bridgechar_all_invest]
  end
  subgraph step5 [Section5_Merge_geography]
    geo[merge_tm_key_id_hh_id_prov_village_bl_id_ne]
    allInvest --> geo
  end
  subgraph step6 [Section6_Merge_MBF_timing]
    mbf[merge_N_households_mth_gap_mbf]
    geo --> mbf
  end
  subgraph step7 [Section7_Filter_and_save]
    mbf --> filt[filter_m8_and_valid_windows]
    filt --> out[tstm_invest_stats_bridgechar]
    out --> saveRda[data_or_data_temp]
  end
  loansPanel --> gw
  assetLoan --> gw

Functions relied on

Step Function Source
1 ffp_hfid_invest_loan_linked_abc_investloan_char_gateway() R/ffp_investloan_type_wrapper.R
2 ff_hfid_invest_window() R/ffs_window.R
3 ff_hfid_invest_winstats() R/ffs_winstats.R
4–5 dplyr merge, filter, mutate this vignette

Control parameters

Gateway parameters match ffv_invest_loan_bridge.qmd. Window length 12 forward and 12 backward matches Stata fw12bw12 naming in invest_rev_diff_*.do.

library(PrjThaiHFID)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
library(glue)
library(kableExtra)
#> 
#> Attaching package: 'kableExtra'
#> The following object is masked from 'package:dplyr':
#> 
#>     group_rows

bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)

# Script control
verbose <- TRUE
verbose_detail <- FALSE
it_verbose_detail_nrow <- 100L
it_file_code <- 646572L

# TRUE: overwrite canonical data/*.rda; FALSE: write to data-temp/ (gitignored)
bl_replace_data_output <- FALSE
bl_write_latex <- FALSE
st_kableformat <- "html"

# Gateway — loan panel variable names (Group A)
svr_lender_var <- "forinfm4"
svr_principal <- "bf5klm_bm6h_joint"
svr_principal_last <- "bm6h"
svr_principal_interest_sum <- "bm6b"

# Gateway — bridge filters (Group A.3–A.4)
bl_filter_bridge_grvgr0 <- TRUE
it_ll_grv_min <- -1L
bl_filter_loan_duration_a <- FALSE
bl_filter_loan_duration_b <- FALSE
bl_filter_lender_type <- FALSE
bl_filter_bridge_informal <- FALSE
bl_filter_loan_size <- FALSE
bl_filter_loan_duration_more <- FALSE

# Gateway — investment typing and sample (Groups B–E)
fl_sd_ithres <- stats::qnorm(0.99)
it_thres_invest_mth_gap <- 2L
it_gap_LBL_IL_min <- -6L
# Investment ivars: paper set plus agg_BS_3022 (hh assets only) for stata/invest_rev_diff_250519.do tables 2 and 4
ar_st_vars_to_keep <- c(
  "agg_BS_1021", "agg_BS_1012", "agg_BS_1011", "agg_BS_3022"
)
ar_ivars_paper <- ar_st_vars_to_keep
fl_min_invest_size_cut <- 10000
it_mth_inv_start_min_cut <- 14L
it_mth_inv_start_max_cut <- 144L
bl_drop_afrombc <- TRUE
bl_drop_cfromb <- TRUE
bl_compare2baserda <- FALSE

# Window — forward/backward month spans (issue #2)
it_window_forward <- 12L
it_window_backward <- 12L
bl_forward_include_imth <- FALSE
bl_backward_include_imth <- FALSE

# Winstats — balance-sheet aggregation (issue #3)
bl_compute_sum <- TRUE
bl_compute_mean <- TRUE
ar_st_asset_loan_id_cols <- c("id", "month")
ar_st_blnc_vars_stata <- c(
  "IS_1111", "IS_1113", "IS_1115", "IS_1116",
  "IS_1011", "IS_1013"
)
ar_st_agg_cols_asset_loan <- paste0("agg_", ar_st_blnc_vars_stata)
ar_st_cols_asset_loan <- c(ar_st_asset_loan_id_cols, ar_st_agg_cols_asset_loan)

# Bridge merge — join keys and category dummies
ar_st_bridge_join_keys <- c("id", "ivars", "hh_inv_asset_ctr")
st_bridge_id_col <- "hhid_Num"
st_bridge_type_col <- "investloan_type_m8"
st_investloan_no_loan <- "1-investment-no-loan"
ar_st_investloan_loan_ab <- c("2-investment-loan-a", "3-investment-loan-hook")
ar_st_investloan_prefix_4cat <- "^(4-|5-|6-|8-)"
# here::here() anchors on the temporary _quarto.yml pkgdown writes under
# vignettes/ during a quarto render; find the package root explicitly so paths
# resolve under both Quarto and knitr.
spn_pkg_root <- rprojroot::find_root(rprojroot::has_file("DESCRIPTION"))
st_path_tm_key_hh <- file.path(spn_pkg_root, "data-raw", "tm_key_id_hh_anony.rda")
st_path_census_vil <- file.path(spn_pkg_root, "data-raw", "census_vil_hh_structure_count.rda")
# MBF funding start month (same as R-script/ffs_bridge_count/ffs_bridge_count_mbf.R)
it_mbf_start_month <- 37L

st_data_out <- file.path(
  spn_pkg_root,
  if (bl_replace_data_output) "data" else "data-temp",
  fsep = .Platform$file.sep
)
st_extdata_dir <- file.path(spn_pkg_root, "inst", "extdata")
for (st_dir in c(st_data_out, st_extdata_dir)) {
  if (!dir.exists(st_dir)) {
    dir.create(st_dir, recursive = TRUE)
  }
}

# Per-vignette results folder and per-table save switches (all FALSE by default).
spt_res <- file.path(spn_pkg_root, "res", "res_invest_return_bridge")
ls_save_res <- list(
  tb_win_bad_share = FALSE,
  df_fw_bad_mth = FALSE,
  df_bw_bad_mth = FALSE,
  tb_funnel = FALSE,
  tb_m8_counts = FALSE,
  tb_m8_shares = FALSE,
  tb_cat2 = FALSE,
  tb_cat3 = FALSE,
  tb_cat4 = FALSE,
  tb_blnc_grid = FALSE,
  tb_outcome_m8 = FALSE,
  tb_stata_m8 = FALSE,
  tb_stata_summary = FALSE,
  tb_m8_avail = FALSE,
  tb_meets_m8 = FALSE,
  tb_mth_m8 = FALSE
)

ffv_save_gateway_tstm <- function(obj, name, out_dir, extdata_dir = st_extdata_dir) {
  if (!dir.exists(out_dir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  if (!dir.exists(extdata_dir)) {
    dir.create(extdata_dir, recursive = TRUE)
  }

  st_rda <- file.path(out_dir, paste0(name, ".rda"))
  st_csv <- file.path(extdata_dir, paste0(name, ".csv"))
  st_dta <- file.path(extdata_dir, paste0(name, ".dta"))

  tmp_env <- new.env(parent = emptyenv())
  tmp_env[[name]] <- obj
  save(list = name, file = st_rda, envir = tmp_env)

  readr::write_csv(obj, st_csv)

  if (!requireNamespace("haven", quietly = TRUE)) {
    stop(
      "Package 'haven' is required to write .dta to inst/extdata. ",
      "Install with: install.packages('haven')",
      call. = FALSE
    )
  }
  haven::write_dta(obj, st_dta)

  invisible(list(rda = st_rda, csv = st_csv, dta = st_dta))
}

ffv_inv_type_by_ivars <- function(df, type_var) {
  df %>%
    dplyr::group_by(ivars, .data[[type_var]]) %>%
    dplyr::tally(name = "n") %>%
    tidyr::pivot_wider(names_from = ivars, values_from = n, values_fill = 0)
}

ffv_inv_type_share_by_ivars <- function(df, type_var) {
  df %>%
    dplyr::group_by(ivars, .data[[type_var]]) %>%
    dplyr::tally(name = "n") %>%
    dplyr::group_by(ivars) %>%
    dplyr::mutate(share = n / sum(n)) %>%
    dplyr::select(-n) %>%
    tidyr::pivot_wider(
      id_cols = dplyr::all_of(type_var),
      names_from = ivars,
      values_from = share
    )
}

ffv_render_table <- function(
    df,
    caption,
    name = NULL,
    st_format = st_kableformat) {
  # Optional latex (+ csv) save, gated by the per-table ls_save_res switch.
  if (!is.null(name) && isTRUE(ls_save_res[[name]])) {
    kbl_latex <- kbl(
      df,
      format = "latex",
      booktabs = TRUE,
      caption = caption,
      digits = 4
    )
    ffp_save_res_table(kbl_latex, name, spt_res, df = df, bl_save = TRUE)
  }
  kbl(
    df,
    format = st_format,
    booktabs = TRUE,
    caption = caption,
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = bs_style,
      full_width = FALSE,
      position = "left"
    )
}

bl_win_val_bad <- function(x) {
  is.na(x) | !is.finite(x)
}

if (verbose) {
  print(glue::glue("f-{it_file_code}, controls"))
  print(glue::glue("  bl_replace_data_output: {bl_replace_data_output} -> {st_data_out}/"))
  print(glue::glue("  windows: fw{it_window_forward} bw{it_window_backward}"))
  print(glue::glue("  fl_min_invest_size_cut: {fl_min_invest_size_cut}"))
  print(glue::glue("  mth_inv_start in [{it_mth_inv_start_min_cut}, {it_mth_inv_start_max_cut}]"))
  print(glue::glue("  ar_st_vars_to_keep: {paste(ar_st_vars_to_keep, collapse = ', ')}"))
  print(glue::glue("  ar_st_blnc_vars_stata: {paste(ar_st_blnc_vars_stata, collapse = ', ')}"))
}
#> f-646572, controls
#>   bl_replace_data_output: FALSE -> /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/
#>   windows: fw12 bw12
#>   fl_min_invest_size_cut: 10000
#>   mth_inv_start in [14, 144]
#>   ar_st_vars_to_keep: agg_BS_1021, agg_BS_1012, agg_BS_1011, agg_BS_3022
#>   ar_st_blnc_vars_stata: IS_1111, IS_1113, IS_1115, IS_1116, IS_1011, IS_1013

Run gateway (step 1)

Re-calls the gateway inline (same parameters as ffv_invest_loan_bridge.qmd). Does not load saved gateway .rda files from data/.

if (verbose) {
  print(glue::glue("f-{it_file_code}, gateway: calling ffp_hfid_invest_loan_linked_abc_investloan_char_gateway"))
}
#> f-646572, gateway: calling ffp_hfid_invest_loan_linked_abc_investloan_char_gateway

ls_gateway_result <- PrjThaiHFID::ffp_hfid_invest_loan_linked_abc_investloan_char_gateway(
  svr_lender_var = svr_lender_var,
  svr_principal = svr_principal,
  svr_principal_last = svr_principal_last,
  svr_principal_interest_sum = svr_principal_interest_sum,
  bl_filter_bridge_grvgr0 = bl_filter_bridge_grvgr0,
  it_ll_grv_min = it_ll_grv_min,
  bl_filter_loan_duration_a = bl_filter_loan_duration_a,
  bl_filter_loan_duration_b = bl_filter_loan_duration_b,
  bl_filter_lender_type = bl_filter_lender_type,
  bl_filter_bridge_informal = bl_filter_bridge_informal,
  bl_filter_loan_size = bl_filter_loan_size,
  bl_filter_loan_duration_more = bl_filter_loan_duration_more,
  fl_sd_ithres = fl_sd_ithres,
  it_thres_invest_mth_gap = it_thres_invest_mth_gap,
  it_gap_LBL_IL_min = it_gap_LBL_IL_min,
  ar_st_vars_to_keep = ar_st_vars_to_keep,
  fl_min_invest_size = fl_min_invest_size_cut,
  it_mth_inv_start_min = it_mth_inv_start_min_cut,
  it_mth_inv_start_max = it_mth_inv_start_max_cut,
  bl_drop_afrombc = bl_drop_afrombc,
  bl_drop_cfromb = bl_drop_cfromb,
  bl_compare2baserda = bl_compare2baserda,
  verbose = verbose,
  verbose_detail = verbose_detail,
  it_verbose_detail_nrow = it_verbose_detail_nrow
)
#> 
#> ── char_gateway ────────────────────────────────────────────────────────────────
#> ℹ A.1 Loan non-duplicate
#> ✔ A.1 Loan non-duplicate [17ms]
#> 
#> ✔ tstm_loans_pn_nd: 19587 rows
#> ℹ A.2 Hook pairs
#> ✔ A.2 Hook pairs [6ms]
#> 
#> ✔ tstm_loans_hooks: 47754 rows
#> ℹ A.3 Bridges from hooks
#> ✔ A.3 Bridges from hooks [6ms]
#> 
#> ✔ tstm_loans_bridges: 94907 rows
#> ℹ A.4 Bridge type
#> ✔ A.4 Bridge type [6ms]
#> 
#> ✔ tstm_loans_bridges_type: 94907 rows
#> ℹ B Investment gateway
#> ✔ B Investment gateway [6ms]
#> 
#> ℹ Group B: processing 4 asset ivar(s): agg_BS_1021, agg_BS_1012, agg_BS_1011, agg_BS_3022
#> ✔ tstm_invest: 6530 rows
#> ℹ C.1 Bridge roster
#> ✔ C.1 Bridge roster [6ms]
#> 
#> Adding missing grouping variables: `br_type`, `br_type_id`
#> ✔ tstm_roster_invest_loan_bridge: 83252 rows
#> ℹ C.2 Linker
#> ✔ C.2 Linker [6ms]
#> 
#> ✔ tstm_roster_invest_loan_linker: 4975 rows
#> ℹ C.3 Invest-loan linked
#> ✔ C.3 Invest-loan linked [6ms]
#> 
#> ✔ tstm_roster_invest_loan_linked: 166942 rows
#> ℹ D.1 abc_distinct filter
#> ✔ D.1 abc_distinct filter [6ms]
#> 
#> ✔ tstm_roster_invest2loan2bridge_clean: 13859 rows
#> ℹ D.1b NA synchronization
#> ✔ D.1b NA synchronization [6ms]
#> 
#> ℹ D.2 Bridge char
#> ✔ D.2 Bridge char [81ms]
#> 
#> ✔ tstm_roster_invest2loan2bridge: 13859 rows
#> ℹ D.3 Investment char
#> ✔ D.3 Investment char [6ms]
#> 
#> ✔ tstm_invest2loan2bridge_chars: 4232 rows
#> ℹ E Invest typing summary
#> ✔ E Invest typing summary [6ms]
#> 
#> ── Group E: invest typing summary
#> ──────────────────────────────────────────────
#> ℹ Thresholds: capital_invest >= 10000; mth_inv_start in [14, 144]
#> 
#> ── Invest rows not in bridge chars
#> Count not in chars: 2298
#> By exclusion reason (mutually exclusive):
#> # A tibble: 4 × 3
#>   st_unmatched_reason           n   share
#>   <chr>                     <int>   <dbl>
#> 1 fail_size_and_start         238 0.104  
#> 2 fail_size_only             1276 0.555  
#> 3 fail_start_only             781 0.340  
#> 4 pass_cuts_still_unmatched     3 0.00131
#> 2x2 cross-tab (bl_meets_size x bl_meets_start):
#> # A tibble: 4 × 3
#>   bl_meets_size bl_meets_start     n
#>   <lgl>         <lgl>          <int>
#> 1 FALSE         FALSE            238
#> 2 FALSE         TRUE            1276
#> 3 TRUE          FALSE            781
#> 4 TRUE          TRUE               3
#> ! 3 row(s) pass cuts on tstm_invest but are not in bridge chars.Inspect df_invest_pipeline_consistency in the return list.
#> 
#> ── Pipeline consistency (per ivars)
#> # A tibble: 4 × 11
#>   ivars   n_invest n_pass_typing n_in_chars n_not_in_linked n_pass_not_in_linked
#>   <chr>      <int>         <int>      <int>           <int>                <int>
#> 1 agg_BS…     1390           736        735               0                    0
#> 2 agg_BS…     1414           821        821               0                    0
#> 3 agg_BS…     1890          1433       1433               0                    0
#> 4 agg_BS…     1836          1245       1243               0                    0
#> # ℹ 5 more variables: n_pass_not_in_chars <int>,
#> #   n_capital_mismatch_invest_roster <int>,
#> #   n_start_mismatch_invest_roster <int>, n_ctr_mismatch_invest_roster <int>,
#> #   n_capital_mismatch_chars_roster <int>
#> ✖ GATEWAY INVEST PIPELINE CONSISTENCY: FAIL (see df_invest_pipeline_consistency)
#> Warning: Investment pipeline consistency check failed: some investments passing
#> typing thresholds are missing from tstm_invest2loan2bridge_chars, or
#> tstm_invest keys/fields disagree with the invest2loan roster in this run.
#> Inspect df_invest_pipeline_consistency in the return list. Do not mix return
#> objects with packaged data/ .rda files.

tstm_invest <- ls_gateway_result$tstm_invest
tstm_invdates_uniq <- ls_gateway_result$tstm_invdates_uniq
tstm_invest2loan2bridge_chars <- ls_gateway_result$tstm_invest2loan2bridge_chars

if (verbose) {
  print(glue::glue("f-{it_file_code}, gateway: tstm_invest {nrow(tstm_invest)} rows"))
  print(glue::glue("f-{it_file_code}, gateway: tstm_invdates_uniq {nrow(tstm_invdates_uniq)} rows"))
  print(glue::glue(
    "f-{it_file_code}, gateway: tstm_invest2loan2bridge_chars {nrow(tstm_invest2loan2bridge_chars)} rows"
  ))
  if (!isTRUE(ls_gateway_result$bl_invest_pipeline_consistency_ok)) {
    print(glue::glue("f-{it_file_code}, gateway: warning — invest pipeline consistency check failed"))
  }
}
#> f-646572, gateway: tstm_invest 6530 rows
#> f-646572, gateway: tstm_invdates_uniq 3725 rows
#> f-646572, gateway: tstm_invest2loan2bridge_chars 4232 rows
#> f-646572, gateway: warning — invest pipeline consistency check failed

Investment windows (step 2)

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, window: ff_hfid_invest_window fw{it_window_forward} bw{it_window_backward}"
  ))
}
#> f-646572, window: ff_hfid_invest_window fw12 bw12

tstm_invdates_uniq_fwbw <- PrjThaiHFID::ff_hfid_invest_window(
  df_invdates_uniq = tstm_invdates_uniq,
  it_window_forward = it_window_forward,
  it_window_backward = it_window_backward,
  bl_forward_include_imth = bl_forward_include_imth,
  bl_backward_include_imth = bl_backward_include_imth
)

if (verbose) {
  print(glue::glue("f-{it_file_code}, window: tstm_invdates_uniq_fwbw {nrow(tstm_invdates_uniq_fwbw)} rows"))
}
#> f-646572, window: tstm_invdates_uniq_fwbw 89400 rows

Window balance-sheet statistics (step 3)

Subset tstm_asset_loan to Stata-used agg_* columns only (not the full wide asset file).

ar_st_agg_missing <- setdiff(ar_st_agg_cols_asset_loan, names(PrjThaiHFID::tstm_asset_loan))
if (length(ar_st_agg_missing) > 0L) {
  stop(
    "tstm_asset_loan missing Stata agg columns: ",
    paste(ar_st_agg_missing, collapse = ", "),
    call. = FALSE
  )
}

df_asset_loan_stata <- PrjThaiHFID::tstm_asset_loan %>%
  select(all_of(ar_st_cols_asset_loan))

if (verbose) {
  print(glue::glue("f-{it_file_code}, winstats: ff_hfid_invest_winstats"))
  print(glue::glue(
    "f-{it_file_code}, winstats: df_asset_loan_stata {ncol(df_asset_loan_stata)} cols"
  ))
}
#> f-646572, winstats: ff_hfid_invest_winstats
#> f-646572, winstats: df_asset_loan_stata 8 cols

tstm_invest_stats <- PrjThaiHFID::ff_hfid_invest_winstats(
  df_asset_loan = df_asset_loan_stata,
  df_invest = tstm_invest,
  df_invwin_uniq = tstm_invdates_uniq_fwbw,
  bl_compute_sum = bl_compute_sum,
  bl_compute_mean = bl_compute_mean
)

if (verbose) {
  print(glue::glue("f-{it_file_code}, winstats: tstm_invest_stats {nrow(tstm_invest_stats)} rows"))
}
#> f-646572, winstats: tstm_invest_stats 78360 rows

Window validity diagnostics (step 3b)

win_forward / win_backward can be NA or non-finite when the window lacks enough observed balance-sheet months (calendar edge or household not in every survey month).

tb_win_bad_share <- tstm_invest_stats %>%
  dplyr::ungroup() %>%
  dplyr::summarize(
    n = dplyr::n(),
    n_fw_bad = sum(bl_win_val_bad(win_forward)),
    n_bw_bad = sum(bl_win_val_bad(win_backward)),
    share_fw_bad = round(n_fw_bad / n, 4),
    share_bw_bad = round(n_bw_bad / n, 4)
  )

df_fw_bad_mth <- tstm_invest_stats %>%
  dplyr::ungroup() %>%
  dplyr::filter(bl_win_val_bad(win_forward)) %>%
  dplyr::summarize(
    n = dplyr::n(),
    mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
    mth_inv_start_mean = mean(mth_inv_start, na.rm = TRUE),
    mth_inv_start_max = max(mth_inv_start, na.rm = TRUE)
  )

df_bw_bad_mth <- tstm_invest_stats %>%
  dplyr::ungroup() %>%
  dplyr::filter(bl_win_val_bad(win_backward)) %>%
  dplyr::summarize(
    n = dplyr::n(),
    mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
    mth_inv_start_mean = mean(mth_inv_start, na.rm = TRUE),
    mth_inv_start_max = max(mth_inv_start, na.rm = TRUE)
  )

ffv_render_table(
  tb_win_bad_share,
  caption = "Share of invest-stats rows with NA or non-finite window stats (before bridge merge)",
  name = "tb_win_bad_share"
)
Share of invest-stats rows with NA or non-finite window stats (before bridge merge)
n n_fw_bad n_bw_bad share_fw_bad share_bw_bad
78360 4944 6144 0.0631 0.0784
ffv_render_table(
  df_fw_bad_mth,
  caption = "mth_inv_start range when win_forward is NA or invalid",
  name = "df_fw_bad_mth"
)
mth_inv_start range when win_forward is NA or invalid
n mth_inv_start_min mth_inv_start_mean mth_inv_start_max
4944 1 146.551 160
ffv_render_table(
  df_bw_bad_mth,
  caption = "mth_inv_start range when win_backward is NA or invalid",
  name = "df_bw_bad_mth"
)
mth_inv_start range when win_backward is NA or invalid
n mth_inv_start_min mth_inv_start_mean mth_inv_start_max
6144 1 10.6934 104

Merge bridge characteristics (step 4)

tstm_invest2loan2bridge_chars_sel <- tstm_invest2loan2bridge_chars %>%
  dplyr::select(all_of(c(st_bridge_id_col, "ivars", "hh_inv_asset_ctr", st_bridge_type_col))) %>%
  dplyr::rename(id = dplyr::all_of(st_bridge_id_col))

tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats %>%
  dplyr::left_join(
    tstm_invest2loan2bridge_chars_sel,
    by = ar_st_bridge_join_keys
  ) %>%
  dplyr::mutate(
    bl_meets_size = capital_invest >= fl_min_invest_size_cut,
    bl_meets_start = mth_inv_start >= it_mth_inv_start_min_cut &
      mth_inv_start <= it_mth_inv_start_max_cut,
    bl_has_investloan_type_m8 = !is.na(.data[[st_bridge_type_col]]),
    investloan_type_m8_avail_2x2 = dplyr::case_when(
      bl_has_investloan_type_m8 ~ "has_investloan_type_m8",
      !bl_meets_size & !bl_meets_start ~ "absent_fail_size_and_start",
      !bl_meets_size & bl_meets_start ~ "absent_fail_size_only",
      bl_meets_size & !bl_meets_start ~ "absent_fail_start_only",
      bl_meets_size & bl_meets_start ~ "absent_pass_cuts_still_unmatched",
      TRUE ~ "absent_other"
    ),
    investloan_type_m8_cat2 = ifelse(.data[[st_bridge_type_col]] == st_investloan_no_loan, 0L, 1L),
    investloan_type_m8_cat2v2 = dplyr::case_when(
      .data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
      .data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 0L,
      TRUE ~ 1L
    ),
    investloan_type_m8_cat3 = dplyr::case_when(
      .data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
      .data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 1L,
      TRUE ~ 2L
    ),
    investloan_type_m8_cat4 = dplyr::case_when(
      .data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
      .data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 1L,
      grepl(ar_st_investloan_prefix_4cat, .data[[st_bridge_type_col]]) ~ 2L,
      TRUE ~ 3L
    )
  )

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, merge: tstm_invest_stats_bridgechar_all_invest ",
    "{nrow(tstm_invest_stats_bridgechar_all_invest)} rows x ",
    "{ncol(tstm_invest_stats_bridgechar_all_invest)} cols"
  ))
}
#> f-646572, merge: tstm_invest_stats_bridgechar_all_invest 78360 rows x 26 cols

Merge household geography (step 5)

Attach province, village, and northeast indicators from data-raw/tm_key_id_hh_anony.rda (anonymized-only crosswalk) via anonymized household ID (id = tmid_hh). id_prov (tmid_prov), id_village (tmid_vil), bl_id_ne (1 if region_name == "Northeast", else 0).

if (!file.exists(st_path_tm_key_hh)) {
  stop("Missing required input: ", st_path_tm_key_hh, call. = FALSE)
}
load(st_path_tm_key_hh)

df_hh_geo <- tm_key_id_hh_anony %>%
  dplyr::distinct(tmid_hh, tmid_prov, tmid_vil, region_name) %>%
  dplyr::mutate(
    id_prov = as.integer(tmid_prov),
    id_village = as.integer(tmid_vil),
    bl_id_ne = as.integer(region_name == "Northeast")
  ) %>%
  dplyr::select(tmid_hh, id_prov, id_village, bl_id_ne)

n_before_geo <- nrow(tstm_invest_stats_bridgechar_all_invest)

tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::left_join(df_hh_geo, by = c("id" = "tmid_hh"))

n_miss_geo <- sum(
  is.na(tstm_invest_stats_bridgechar_all_invest$id_prov) |
    is.na(tstm_invest_stats_bridgechar_all_invest$id_village)
)

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, merge-hh-geo: {n_before_geo} rows; ",
    "NA id_prov/id_village={n_miss_geo}; ",
    "bl_id_ne share={round(100 * mean(tstm_invest_stats_bridgechar_all_invest$bl_id_ne, na.rm = TRUE), 2)}%"
  ))
}
#> f-646572, merge-hh-geo: 78360 rows; NA id_prov/id_village=0; bl_id_ne share=49.48%

if (n_miss_geo > 0L) {
  warning(
    "Some rows lack geography after tm_key_id_hh join (n = ", n_miss_geo, ")",
    call. = FALSE
  )
}

Merge MBF village counts and timing (step 6)

Follows R-script/ffs_bridge_count/ffs_bridge_count_mbf.R §6: village household counts from census_vil_hh_structure_count.rda, joined on id_village (tmid_vil), plus MBF-relative month bins from mth_inv_start.

if (!file.exists(st_path_census_vil)) {
  stop("Missing required input: ", st_path_census_vil, call. = FALSE)
}
load(st_path_census_vil)

df_vil_hh <- census_vil_hh_structure_count %>%
  dplyr::select(tmid_vil, N_households)

tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::left_join(df_vil_hh, by = c("id_village" = "tmid_vil")) %>%
  dplyr::mutate(
    mth_gap_mbf = mth_inv_start - it_mbf_start_month,
    mth_gap_mbf_cat = dplyr::case_when(
      mth_gap_mbf >= -24 & mth_gap_mbf <= -13 ~ "-2yr",
      mth_gap_mbf >= -12 & mth_gap_mbf <=  -1 ~ "-1yr",
      mth_gap_mbf >=   0 & mth_gap_mbf <=  11 ~ "yr1",
      mth_gap_mbf >=  12 & mth_gap_mbf <=  23 ~ "yr2",
      mth_gap_mbf >=  24 & mth_gap_mbf <=  35 ~ "yr3",
      mth_gap_mbf >=  36 & mth_gap_mbf <=  47 ~ "yr4",
      mth_gap_mbf >=  48 & mth_gap_mbf <=  59 ~ "yr5",
      mth_gap_mbf >=  60 & mth_gap_mbf <=  71 ~ "yr6",
      mth_gap_mbf >=  72 & mth_gap_mbf <=  83 ~ "yr7",
      mth_gap_mbf >=  84 & mth_gap_mbf <=  95 ~ "yr8",
      mth_gap_mbf >=  96 & mth_gap_mbf <= 107 ~ "yr9",
      TRUE ~ NA_character_
    ),
    mth_gap_mbf_cat_int = dplyr::case_when(
      mth_gap_mbf_cat == "-2yr" ~ -2L,
      mth_gap_mbf_cat == "-1yr" ~ -1L,
      mth_gap_mbf_cat == "yr1" ~ 1L,
      mth_gap_mbf_cat == "yr2" ~ 2L,
      mth_gap_mbf_cat == "yr3" ~ 3L,
      mth_gap_mbf_cat == "yr4" ~ 4L,
      mth_gap_mbf_cat == "yr5" ~ 5L,
      mth_gap_mbf_cat == "yr6" ~ 6L,
      mth_gap_mbf_cat == "yr7" ~ 7L,
      mth_gap_mbf_cat == "yr8" ~ 8L,
      mth_gap_mbf_cat == "yr9" ~ 9L,
      TRUE ~ NA_integer_
    )
  )

n_miss_n_hh <- sum(is.na(tstm_invest_stats_bridgechar_all_invest$N_households))
n_miss_mbf_cat <- sum(is.na(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf_cat))

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, merge-mbf-vil: rows={nrow(tstm_invest_stats_bridgechar_all_invest)}; ",
    "NA N_households={n_miss_n_hh}; NA mth_gap_mbf_cat={n_miss_mbf_cat}"
  ))
  print(glue::glue(
    "  mth_gap_mbf range: [{min(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf, na.rm = TRUE)}, ",
    "{max(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf, na.rm = TRUE)}]"
  ))
  print(table(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf_cat_int, useNA = "ifany"))
}
#> f-646572, merge-mbf-vil: rows=78360; NA N_households=0; NA mth_gap_mbf_cat=11760
#>   mth_gap_mbf range: [-36, 123]
#> 
#>    -2    -1     1     2     3     4     5     6     7     8     9  <NA> 
#>  6144  6252  6732  5592  7476  6420  7296  5400  5724  4992  4572 11760

if (n_miss_n_hh > 0L) {
  warning(
    "Some rows lack N_households after census join (n = ", n_miss_n_hh, ")",
    call. = FALSE
  )
}

Analysis subset (step 7)

Primary return: tstm_invest_stats_bridgecharinvestloan_type_m8 present and finite win_forward / win_backward.

tstm_invest_stats_bridgechar <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::ungroup() %>%
  dplyr::filter(
    !is.na(.data[[st_bridge_type_col]]),
    !bl_win_val_bad(win_forward),
    !bl_win_val_bad(win_backward)
  ) %>%
  dplyr::select(-investloan_type_m8_avail_2x2)

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, filter: tstm_invest_stats_bridgechar ",
    "{nrow(tstm_invest_stats_bridgechar)} rows ",
    "({round(100 * nrow(tstm_invest_stats_bridgechar) / nrow(tstm_invest_stats_bridgechar_all_invest), 1)}% of all_invest)"
  ))
}
#> f-646572, filter: tstm_invest_stats_bridgechar 50184 rows (64% of all_invest)

Save tstm_invest_stats_bridgechar

ls_saved <- ffv_save_gateway_tstm(
  obj = tstm_invest_stats_bridgechar,
  name = "tstm_invest_stats_bridgechar",
  out_dir = st_data_out
)

if (verbose) {
  if (bl_replace_data_output) {
    print(glue::glue("f-{it_file_code}, save: replaced canonical data/tstm_invest_stats_bridgechar.rda"))
  } else {
    print(glue::glue(
      "f-{it_file_code}, save: wrote data-temp/tstm_invest_stats_bridgechar.rda (data/ unchanged)"
    ))
  }
  print(glue::glue("f-{it_file_code}, save: {ls_saved$csv}"))
  print(glue::glue("f-{it_file_code}, save: {ls_saved$dta}"))
}
#> f-646572, save: wrote data-temp/tstm_invest_stats_bridgechar.rda (data/ unchanged)
#> f-646572, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_invest_stats_bridgechar.csv
#> f-646572, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_invest_stats_bridgechar.dta

Tables: pipeline funnel

tb_funnel <- data.frame(
  stage = c(
    "tstm_invest_stats",
    "tstm_invest_stats_bridgechar_all_invest",
    "tstm_invest_stats_bridgechar"
  ),
  n_rows = c(
    nrow(tstm_invest_stats),
    nrow(tstm_invest_stats_bridgechar_all_invest),
    nrow(tstm_invest_stats_bridgechar)
  ),
  n_cols = c(
    ncol(tstm_invest_stats),
    ncol(tstm_invest_stats_bridgechar_all_invest),
    ncol(tstm_invest_stats_bridgechar)
  )
) %>%
  dplyr::mutate(
    pct_of_stats = round(100 * n_rows / n_rows[stage == "tstm_invest_stats"], 1),
    pct_of_all_invest = round(
      100 * n_rows / n_rows[stage == "tstm_invest_stats_bridgechar_all_invest"],
      1
    )
  )

ffv_render_table(
  tb_funnel,
  caption = "Row counts through the window + bridge pipeline",
  name = "tb_funnel"
)
Row counts through the window + bridge pipeline
stage n_rows n_cols pct_of_stats pct_of_all_invest
tstm_invest_stats 78360 17 100 100
tstm_invest_stats_bridgechar_all_invest 78360 33 100 100
tstm_invest_stats_bridgechar 50184 32 64 64

Tables: investloan_type_m8 by asset (ivars)

tb_m8_counts <- ffv_inv_type_by_ivars(
  tstm_invest_stats_bridgechar, "investloan_type_m8"
)
tb_m8_shares <- ffv_inv_type_share_by_ivars(
  tstm_invest_stats_bridgechar, "investloan_type_m8"
)

# Publication-style builder: bridge type x asset-definition (ivars) counts.
f_tb_m8_counts <- function(st_format) {
  ar_st_ivars <- names(tb_m8_counts)[-1]
  bk <- kbl(
    tb_m8_counts,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Analysis-sample counts by investloan_type_m8 and ivars",
    col.names = c("Bridge type (investloan_type_m8)", ar_st_ivars)
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(" " = 1, "Asset definition (ivars) — counts" = length(ar_st_ivars))) %>%
    column_spec(1, width = "16em") %>%
    column_spec(2:(length(ar_st_ivars) + 1), width = "6em")
  return(bk)
}

# Publication-style builder: bridge type x asset-definition (ivars) shares.
f_tb_m8_shares <- function(st_format) {
  ar_st_ivars <- names(tb_m8_shares)[-1]
  bk <- kbl(
    tb_m8_shares,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Analysis-sample shares by investloan_type_m8 and ivars",
    col.names = c("Bridge type (investloan_type_m8)", ar_st_ivars),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(" " = 1, "Asset definition (ivars) — within-ivars shares" = length(ar_st_ivars))) %>%
    column_spec(1, width = "16em") %>%
    column_spec(2:(length(ar_st_ivars) + 1), width = "6em")
  return(bk)
}

f_tb_m8_counts(st_kableformat)
Analysis-sample counts by investloan_type_m8 and ivars
Asset definition (ivars) — counts
Bridge type (investloan_type_m8) agg_BS_1011 agg_BS_1012 agg_BS_1021 agg_BS_3022
1-investment-no-loan 5016 2880 2424 4572
2-investment-loan-a 3348 1680 1500 2916
3-investment-loan-hook 1356 696 600 1236
4-brg-single-lender 708 300 324 636
5-brg-baac-vilfund-combo 468 372 324 444
6-brg-informal-quasifor-combo 36 12 24 36
7-brg-has-informal-bridge 3360 2076 1968 2604
8-brg-has-informal-in-bridge 2700 1716 1548 2304
ffp_save_res_table(
  f_tb_m8_counts("latex"), "tb_m8_counts", spt_res,
  df = tb_m8_counts, bl_save = ls_save_res[["tb_m8_counts"]]
)
f_tb_m8_shares(st_kableformat)
Analysis-sample shares by investloan_type_m8 and ivars
Asset definition (ivars) — within-ivars shares
Bridge type (investloan_type_m8) agg_BS_1011 agg_BS_1012 agg_BS_1021 agg_BS_3022
1-investment-no-loan 0.2952 0.2959 0.2782 0.3100
2-investment-loan-a 0.1970 0.1726 0.1722 0.1977
3-investment-loan-hook 0.0798 0.0715 0.0689 0.0838
4-brg-single-lender 0.0417 0.0308 0.0372 0.0431
5-brg-baac-vilfund-combo 0.0275 0.0382 0.0372 0.0301
6-brg-informal-quasifor-combo 0.0021 0.0012 0.0028 0.0024
7-brg-has-informal-bridge 0.1977 0.2133 0.2259 0.1766
8-brg-has-informal-in-bridge 0.1589 0.1763 0.1777 0.1562
ffp_save_res_table(
  f_tb_m8_shares("latex"), "tb_m8_shares", spt_res,
  df = tb_m8_shares, bl_save = ls_save_res[["tb_m8_shares"]]
)

Tables: regression category dummies

tb_cat2 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8_cat2, investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8_cat2, investloan_type_m8)

tb_cat3 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8_cat3, investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8_cat3, investloan_type_m8)

tb_cat4 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8_cat4, investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8_cat4, investloan_type_m8)

ffv_render_table(tb_cat2, caption = "Counts by investloan_type_m8_cat2 and investloan_type_m8", name = "tb_cat2")
Counts by investloan_type_m8_cat2 and investloan_type_m8
investloan_type_m8_cat2 investloan_type_m8 n
0 1-investment-no-loan 14892
1 2-investment-loan-a 9444
1 3-investment-loan-hook 3888
1 4-brg-single-lender 1968
1 5-brg-baac-vilfund-combo 1608
1 6-brg-informal-quasifor-combo 108
1 7-brg-has-informal-bridge 10008
1 8-brg-has-informal-in-bridge 8268
ffv_render_table(tb_cat3, caption = "Counts by investloan_type_m8_cat3 and investloan_type_m8", name = "tb_cat3")
Counts by investloan_type_m8_cat3 and investloan_type_m8
investloan_type_m8_cat3 investloan_type_m8 n
0 1-investment-no-loan 14892
1 2-investment-loan-a 9444
1 3-investment-loan-hook 3888
2 4-brg-single-lender 1968
2 5-brg-baac-vilfund-combo 1608
2 6-brg-informal-quasifor-combo 108
2 7-brg-has-informal-bridge 10008
2 8-brg-has-informal-in-bridge 8268
ffv_render_table(tb_cat4, caption = "Counts by investloan_type_m8_cat4 and investloan_type_m8", name = "tb_cat4")
Counts by investloan_type_m8_cat4 and investloan_type_m8
investloan_type_m8_cat4 investloan_type_m8 n
0 1-investment-no-loan 14892
1 2-investment-loan-a 9444
1 3-investment-loan-hook 3888
2 4-brg-single-lender 1968
2 5-brg-baac-vilfund-combo 1608
2 6-brg-informal-quasifor-combo 108
2 8-brg-has-informal-in-bridge 8268
3 7-brg-has-informal-bridge 10008

Tables: balance dimensions

tb_blnc_grid <- tstm_invest_stats_bridgechar %>%
  dplyr::count(blnc_vars, blnc_agg_stats, ivars, name = "n") %>%
  dplyr::arrange(blnc_vars, blnc_agg_stats, ivars)

ffv_render_table(
  tb_blnc_grid,
  caption = "Analysis-sample row counts by blnc_vars, blnc_agg_stats, and ivars",
  name = "tb_blnc_grid"
)
Analysis-sample row counts by blnc_vars, blnc_agg_stats, and ivars
blnc_vars blnc_agg_stats ivars n
IS_1011 mean agg_BS_1011 1416
IS_1011 mean agg_BS_1012 811
IS_1011 mean agg_BS_1021 726
IS_1011 mean agg_BS_3022 1229
IS_1011 sum agg_BS_1011 1416
IS_1011 sum agg_BS_1012 811
IS_1011 sum agg_BS_1021 726
IS_1011 sum agg_BS_3022 1229
IS_1013 mean agg_BS_1011 1416
IS_1013 mean agg_BS_1012 811
IS_1013 mean agg_BS_1021 726
IS_1013 mean agg_BS_3022 1229
IS_1013 sum agg_BS_1011 1416
IS_1013 sum agg_BS_1012 811
IS_1013 sum agg_BS_1021 726
IS_1013 sum agg_BS_3022 1229
IS_1111 mean agg_BS_1011 1416
IS_1111 mean agg_BS_1012 811
IS_1111 mean agg_BS_1021 726
IS_1111 mean agg_BS_3022 1229
IS_1111 sum agg_BS_1011 1416
IS_1111 sum agg_BS_1012 811
IS_1111 sum agg_BS_1021 726
IS_1111 sum agg_BS_3022 1229
IS_1113 mean agg_BS_1011 1416
IS_1113 mean agg_BS_1012 811
IS_1113 mean agg_BS_1021 726
IS_1113 mean agg_BS_3022 1229
IS_1113 sum agg_BS_1011 1416
IS_1113 sum agg_BS_1012 811
IS_1113 sum agg_BS_1021 726
IS_1113 sum agg_BS_3022 1229
IS_1115 mean agg_BS_1011 1416
IS_1115 mean agg_BS_1012 811
IS_1115 mean agg_BS_1021 726
IS_1115 mean agg_BS_3022 1229
IS_1115 sum agg_BS_1011 1416
IS_1115 sum agg_BS_1012 811
IS_1115 sum agg_BS_1021 726
IS_1115 sum agg_BS_3022 1229
IS_1116 mean agg_BS_1011 1416
IS_1116 mean agg_BS_1012 811
IS_1116 mean agg_BS_1021 726
IS_1116 mean agg_BS_3022 1229
IS_1116 sum agg_BS_1011 1416
IS_1116 sum agg_BS_1012 811
IS_1116 sum agg_BS_1021 726
IS_1116 sum agg_BS_3022 1229

Tables: outcome summaries for regression

tb_outcome_m8 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8) %>%
  dplyr::summarize(
    n = dplyr::n(),
    win_diff_mean = mean(win_diff, na.rm = TRUE),
    win_diff_median = median(win_diff, na.rm = TRUE),
    win_forward_mean = mean(win_forward, na.rm = TRUE),
    win_backward_mean = mean(win_backward, na.rm = TRUE),
    capital_invest_mean = mean(capital_invest, na.rm = TRUE),
    capital_invest_median = median(capital_invest, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::arrange(investloan_type_m8)

# Publication-style builder: outcome summaries by bridge type.
f_tb_outcome_m8 <- function(st_format) {
  bk <- kbl(
    tb_outcome_m8,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Mean/median win_diff and capital_invest by investloan_type_m8 (all blnc rows)",
    col.names = c(
      "Bridge type (investloan_type_m8)", "N",
      "Mean", "Median", "Fwd mean", "Bwd mean",
      "Mean", "Median"
    ),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(
      " " = 2,
      "win_diff" = 2,
      "Window means" = 2,
      "capital_invest" = 2
    )) %>%
    column_spec(1, width = "16em")
  return(bk)
}

f_tb_outcome_m8(st_kableformat)
Mean/median win_diff and capital_invest by investloan_type_m8 (all blnc rows)
win_diff
Window means
capital_invest
Bridge type (investloan_type_m8) N Mean Median Fwd mean Bwd mean Mean Median
1-investment-no-loan 14892 5190.170 20.0000 149427.92 144237.75 150142.99 39245.50
2-investment-loan-a 9444 5579.087 0.0000 112231.25 106652.16 130044.54 45690.42
3-investment-loan-hook 3888 5690.960 160.0000 135154.93 129463.97 122155.56 48170.24
4-brg-single-lender 1968 7002.587 21.9853 110317.68 103315.10 90503.63 44903.07
5-brg-baac-vilfund-combo 1608 -9828.215 671.6534 100724.27 110552.48 91099.16 50075.16
6-brg-informal-quasifor-combo 108 -38276.970 -2690.1835 112656.14 150933.11 153511.51 49107.63
7-brg-has-informal-bridge 10008 6715.753 611.3628 82867.63 76151.88 100066.38 40082.09
8-brg-has-informal-in-bridge 8268 4002.023 157.0833 115319.89 111317.86 124212.79 45874.92
ffp_save_res_table(
  f_tb_outcome_m8("latex"), "tb_outcome_m8", spt_res,
  df = tb_outcome_m8, bl_save = ls_save_res[["tb_outcome_m8"]]
)

Tables: Stata-aligned preview slice

Main revenue specification in invest_rev_diff_250524_v2.do: ivars == "agg_BS_1021", blnc_vars == "IS_1111", blnc_agg_stats == "mean".

df_stata_slice <- tstm_invest_stats_bridgechar %>%
  dplyr::filter(
    ivars == "agg_BS_1021",
    blnc_vars == "IS_1111",
    blnc_agg_stats == "mean"
  )

tb_stata_m8 <- df_stata_slice %>%
  dplyr::group_by(investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8)

tb_stata_summary <- df_stata_slice %>%
  dplyr::summarize(
    n = dplyr::n(),
    win_diff_mean = mean(win_diff, na.rm = TRUE),
    win_diff_sd = sd(win_diff, na.rm = TRUE),
    capital_invest_mean = mean(capital_invest, na.rm = TRUE),
    capital_invest_sd = sd(capital_invest, na.rm = TRUE)
  )

ffv_render_table(
  tb_stata_m8,
  caption = "Stata-aligned slice: counts by investloan_type_m8 (agg_BS_1021, IS_1111, mean)",
  name = "tb_stata_m8"
)
Stata-aligned slice: counts by investloan_type_m8 (agg_BS_1021, IS_1111, mean)
investloan_type_m8 n
1-investment-no-loan 202
2-investment-loan-a 125
3-investment-loan-hook 50
4-brg-single-lender 27
5-brg-baac-vilfund-combo 27
6-brg-informal-quasifor-combo 2
7-brg-has-informal-bridge 164
8-brg-has-informal-in-bridge 129

# Publication-style builder: pooled Stata-slice summaries.
f_tb_stata_summary <- function(st_format) {
  bk <- kbl(
    tb_stata_summary,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Stata-aligned slice: pooled win_diff and capital_invest summaries",
    col.names = c("N", "Mean", "SD", "Mean", "SD"),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(
      " " = 1,
      "win_diff" = 2,
      "capital_invest" = 2
    )) %>%
    column_spec(1, width = "6em")
  return(bk)
}

f_tb_stata_summary(st_kableformat)
Stata-aligned slice: pooled win_diff and capital_invest summaries
win_diff
capital_invest
N Mean SD Mean SD
726 614.4499 26325.72 82124.96 222630.2
ffp_save_res_table(
  f_tb_stata_summary("latex"), "tb_stata_summary", spt_res,
  df = tb_stata_summary, bl_save = ls_save_res[["tb_stata_summary"]]
)

Tables: merge diagnostics (all_invest)

tb_m8_avail <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::ungroup() %>%
  dplyr::count(investloan_type_m8_avail_2x2, name = "n") %>%
  dplyr::mutate(share = round(n / sum(n), 4)) %>%
  dplyr::arrange(dplyr::desc(n))

tb_meets_m8 <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::ungroup() %>%
  dplyr::count(bl_meets_size, bl_meets_start, bl_has_investloan_type_m8, name = "n") %>%
  dplyr::arrange(bl_meets_size, bl_meets_start, bl_has_investloan_type_m8)

ffv_render_table(
  tb_m8_avail,
  caption = "All-invest rows by investloan_type_m8_avail_2x2 (before analysis filter)",
  name = "tb_m8_avail"
)
All-invest rows by investloan_type_m8_avail_2x2 (before analysis filter)
investloan_type_m8_avail_2x2 n share
has_investloan_type_m8 50784 0.6481
absent_fail_size_only 15312 0.1954
absent_fail_start_only 9372 0.1196
absent_fail_size_and_start 2856 0.0364
absent_pass_cuts_still_unmatched 36 0.0005
ffv_render_table(
  tb_meets_m8,
  caption = "All-invest rows by size/start cuts and bridge-char match",
  name = "tb_meets_m8"
)
All-invest rows by size/start cuts and bridge-char match
bl_meets_size bl_meets_start bl_has_investloan_type_m8 n
FALSE FALSE FALSE 2856
FALSE TRUE FALSE 15312
TRUE FALSE FALSE 9372
TRUE TRUE FALSE 36
TRUE TRUE TRUE 50784

Tables: mth_inv_start by bridge type

Supports MBF before/after subsample regressions (R-script/ffv_invest_return_bridge/README.md).

tb_mth_m8 <- tstm_invest_stats_bridgechar %>%
  dplyr::filter(blnc_agg_stats == "mean", blnc_vars == "IS_1111", ivars == "agg_BS_1021") %>%
  dplyr::group_by(investloan_type_m8) %>%
  dplyr::summarize(
    n = dplyr::n(),
    mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
    mth_inv_start_mean = round(mean(mth_inv_start, na.rm = TRUE), 1),
    mth_inv_start_max = max(mth_inv_start, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::arrange(investloan_type_m8)

# Publication-style builder: month-of-start summary by bridge type.
f_tb_mth_m8 <- function(st_format) {
  bk <- kbl(
    tb_mth_m8,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Survey month of investment start by investloan_type_m8 (Stata revenue slice)",
    col.names = c("Bridge type (investloan_type_m8)", "N", "Min", "Mean", "Max"),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(
      " " = 2,
      "mth_inv_start (survey month)" = 3
    )) %>%
    column_spec(1, width = "16em")
  return(bk)
}

f_tb_mth_m8(st_kableformat)
Survey month of investment start by investloan_type_m8 (Stata revenue slice)
mth_inv_start (survey month)
Bridge type (investloan_type_m8) N Min Mean Max
1-investment-no-loan 202 14 68.9 142
2-investment-loan-a 125 14 71.3 140
3-investment-loan-hook 50 16 73.4 139
4-brg-single-lender 27 19 85.3 138
5-brg-baac-vilfund-combo 27 20 97.9 141
6-brg-informal-quasifor-combo 2 27 51.5 76
7-brg-has-informal-bridge 164 14 63.3 142
8-brg-has-informal-in-bridge 129 15 75.5 143
ffp_save_res_table(
  f_tb_mth_m8("latex"), "tb_mth_m8", spt_res,
  df = tb_mth_m8, bl_save = ls_save_res[["tb_mth_m8"]]
)

Downstream analysis

This vignette builds the analysis dataset only; it does not run regressions.

  • Dependent variable: win_diff from ff_hfid_invest_winstats() — difference between forward- and backward-window balance means (win_forward - win_backward) over it_window_forward / it_window_backward months (default 12 each).
  • Key regressor: capital_invest (investment size at the spell).
  • Subgroups: investloan_type_m8 or collapsed dummies investloan_type_m8_cat2cat4; loops over blnc_vars (revenue vs profit) and ivars (three paper asset definitions in ar_st_vars_to_keep).
  • Time heterogeneity: split samples by mth_gap_mbf, mth_gap_mbf_cat, or mth_gap_mbf_cat_int (MBF start month 37; see step 6).
  • Next step: replicate tables in stata/invest_rev_diff_250524_v2.do in R using tstm_invest_stats_bridgechar loaded from data/ or data-temp/ depending on bl_replace_data_output.

To refresh canonical data/tstm_invest_stats_bridgechar.rda, set bl_replace_data_output <- TRUE in the setup chunk and re-knit.

Session info

sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kableExtra_1.4.0  glue_1.8.1        tidyr_1.3.2       dplyr_1.2.1      
#> [5] PrjThaiHFID_0.2.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] utf8_1.2.6         generics_0.1.4     xml2_1.6.0         stringi_1.8.7     
#>  [5] hms_1.1.4          digest_0.6.39      magrittr_2.0.5     evaluate_1.0.5    
#>  [9] RColorBrewer_1.1-3 fastmap_1.2.0      rprojroot_2.1.1    jsonlite_2.0.0    
#> [13] purrr_1.2.2        viridisLite_0.4.3  scales_1.4.0       textshaping_1.0.5 
#> [17] cli_3.6.6          rlang_1.2.0        crayon_1.5.3       bit64_4.8.2       
#> [21] withr_3.0.3        yaml_2.3.12        otel_0.2.0         tools_4.6.1       
#> [25] parallel_4.6.1     tzdb_0.5.0         forcats_1.0.1      vctrs_0.7.3       
#> [29] R6_2.6.1           lifecycle_1.0.5    stringr_1.6.0      bit_4.6.0         
#> [33] vroom_1.7.1        pkgconfig_2.0.3    pillar_1.11.1      systemfonts_1.3.2 
#> [37] haven_2.5.5        xfun_0.59          tibble_3.3.1       tidyselect_1.2.1  
#> [41] rstudioapi_0.19.0  knitr_1.51         farver_2.1.2       htmltools_0.5.9   
#> [45] rmarkdown_2.31     svglite_2.2.2      readr_2.2.0        compiler_4.6.1