flowchart TD
subgraph inputs [Packaged data]
loansPanel[tstm_loans_panel]
assetLoan[tstm_asset_loan]
end
subgraph step1 [Section1_Gateway]
gw[ffp_hfid_invest_loan_linked_abc_investloan_char_gateway]
gw --> tstm_invest
gw --> tstm_invdates_uniq
gw --> tstm_invest2loan2bridge_chars
end
subgraph step2 [Section2_Window]
win[ff_hfid_invest_window]
tstm_invdates_uniq --> win
win --> tstm_invdates_uniq_fwbw
end
subgraph step3 [Section3_Winstats]
ws[ff_hfid_invest_winstats]
assetLoan --> ws
tstm_invest --> ws
tstm_invdates_uniq_fwbw --> ws
ws --> tstm_invest_stats
end
subgraph step4 [Section4_Merge_bridge]
merge[left_join_bridge_chars_and_cat_dummies]
tstm_invest_stats --> merge
tstm_invest2loan2bridge_chars --> merge
merge --> allInvest[tstm_invest_stats_bridgechar_all_invest]
end
subgraph step5 [Section5_Merge_geography]
geo[merge_tm_key_id_hh_id_prov_village_bl_id_ne]
allInvest --> geo
end
subgraph step6 [Section6_Merge_MBF_timing]
mbf[merge_N_households_mth_gap_mbf]
geo --> mbf
end
subgraph step7 [Section7_Filter_and_save]
mbf --> filt[filter_m8_and_valid_windows]
filt --> out[tstm_invest_stats_bridgechar]
out --> saveRda[data_or_data_temp]
end
loansPanel --> gw
assetLoan --> gw
This vignette chains the investment–loan–bridge gateway (PrjThaiHFID-#32), forward/backward investment windows (#2), and window balance-sheet statistics (#3), then merges investloan_type_m8 labels and regression category dummies.
Unit of observation (final): household × investment spell × ivars × blnc_vars × blnc_agg_stats, with bridge linkage labels attached.
Downstream use: the saved object tstm_invest_stats_bridgechar supports regressions of income change (win_diff, forward minus backward window means) on investment size (capital_invest), optionally by loan-bridge type and survey-month subsamples. Specifications align with stata/invest_rev_diff_250524_v2.do (mean aggregation; revenue IS_1111 / profit IS_1011 loops).
Production logic is ported from R-dev/ffs_hfid_window_windiff.R.
Outputs at a glance
This vignette saves one dataset and renders a series of diagnostic and paper-aligned tables.
| Object | Data page | Role |
|---|---|---|
tstm_invest_stats_bridgechar |
reference | Household x investment-spell x balance-outcome x ivars window changes with bridge labels |
The output is documented in R/data-res.R. It is written to data/ when bl_replace_data_output <- TRUE, otherwise to gitignored data-temp/.
Tables produced (not saved as data): pipeline funnel; investloan_type_m8 by asset ivars; regression category dummies; balance dimensions grid; outcome summaries for regression; Stata-aligned preview slice; merge diagnostics; and mth_inv_start by bridge type. Each table is gated by its own ls_save_res[["<name>"]] switch (all FALSE by default); flip one on to write its LaTeX/CSV to res/res_invest_return_bridge/ via ffp_save_res_table().
What ships vs. what is generated locally
- Shipped with the installed package: the packaged datasets in
data/(lazy-loaded inputststm_loans_panel,tstm_asset_loan).- Generated locally when this vignette runs (neither pushed to GitHub nor shipped in the package): the
tstm_invest_stats_bridgechar.rdawritten todata/(ordata-temp/) and itsinst/extdata/*.csv/*.dtaexport; and any tables written tores/res_invest_return_bridge/(.tex/.csv, only when the matchingls_save_rescontrol isTRUE). Theinst/extdata/andres/artifacts are git-ignored, saved only for local convenience. The anonymized crosswalkdata-raw/tm_key_id_hh_anony.rdaanddata-raw/census_vil_hh_structure_count.rdaare local build inputs (not shipped).
Required packaged inputs
The gateway loads only two objects from the package data/ folder (must exist locally when knitting):
| Object | Role |
|---|---|
tstm_loans_panel |
Monthly loan panel (Group A) |
tstm_asset_loan |
Household-month asset and loan aggregates (Group B) |
data-raw/tm_key_id_hh_anony.rda |
Anonymized household → province/village crosswalk (step 5) |
data-raw/census_vil_hh_structure_count.rda |
Village N_households (step 6) |
ff_hfid_invest_winstats() also reads tstm_asset_loan directly (Stata agg_* subset). All other pipeline objects are computed in this run.
Contrast with ffv_invest_loan_bridge.qmd
ffv_invest_loan_bridge.qmd |
This vignette | |
|---|---|---|
Saved objects (bl_replace_data_output <- TRUE) |
Eight gateway intermediates: tstm_loans_pn_nd, tstm_loans_hooks, tstm_loans_bridges_type, tstm_invdates_uniq, tstm_invest, tstm_roster_invest_loan_linked, tstm_roster_invest2loan2bridge, tstm_invest2loan2bridge_chars
|
One object: tstm_invest_stats_bridgechar
|
Uses prior vignette .rda? |
N/A (produces gateway outputs) |
No — re-calls ffp_hfid_invest_loan_linked_abc_investloan_char_gateway() so every gateway parameter used to build tstm_invest_stats_bridgechar appears in this file |
| Primary outputs | Bridge-type counts / paper Table-style tabulations | Window balance changes + bridge labels for return regressions |
Design intent: self-contained, parameter-transparent knit rather than load(tstm_invest2loan2bridge_chars.rda) from the loan-bridge vignette. By contrast, ffv_bridge_timing.qmd does consume saved gateway outputs from data/.
Pipeline structure
The seven-step chain from packaged inputs to the saved tstm_invest_stats_bridgechar; an ASCII version of the same flow is in R-dev/ffs_hfid_window_windiff.R.
Functions relied on
| Step | Function | Source |
|---|---|---|
| 1 | ffp_hfid_invest_loan_linked_abc_investloan_char_gateway() |
R/ffp_investloan_type_wrapper.R |
| 2 | ff_hfid_invest_window() |
R/ffs_window.R |
| 3 | ff_hfid_invest_winstats() |
R/ffs_winstats.R |
| 4–5 | dplyr merge, filter, mutate | this vignette |
Control parameters
Gateway parameters match ffv_invest_loan_bridge.qmd. Window length 12 forward and 12 backward matches Stata fw12bw12 naming in invest_rev_diff_*.do.
library(PrjThaiHFID)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(glue)
library(kableExtra)
#>
#> Attaching package: 'kableExtra'
#> The following object is masked from 'package:dplyr':
#>
#> group_rows
bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)
# Script control
verbose <- TRUE
verbose_detail <- FALSE
it_verbose_detail_nrow <- 100L
it_file_code <- 646572L
# TRUE: overwrite canonical data/*.rda; FALSE: write to data-temp/ (gitignored)
bl_replace_data_output <- FALSE
bl_write_latex <- FALSE
st_kableformat <- "html"
# Gateway — loan panel variable names (Group A)
svr_lender_var <- "forinfm4"
svr_principal <- "bf5klm_bm6h_joint"
svr_principal_last <- "bm6h"
svr_principal_interest_sum <- "bm6b"
# Gateway — bridge filters (Group A.3–A.4)
bl_filter_bridge_grvgr0 <- TRUE
it_ll_grv_min <- -1L
bl_filter_loan_duration_a <- FALSE
bl_filter_loan_duration_b <- FALSE
bl_filter_lender_type <- FALSE
bl_filter_bridge_informal <- FALSE
bl_filter_loan_size <- FALSE
bl_filter_loan_duration_more <- FALSE
# Gateway — investment typing and sample (Groups B–E)
fl_sd_ithres <- stats::qnorm(0.99)
it_thres_invest_mth_gap <- 2L
it_gap_LBL_IL_min <- -6L
# Investment ivars: paper set plus agg_BS_3022 (hh assets only) for stata/invest_rev_diff_250519.do tables 2 and 4
ar_st_vars_to_keep <- c(
"agg_BS_1021", "agg_BS_1012", "agg_BS_1011", "agg_BS_3022"
)
ar_ivars_paper <- ar_st_vars_to_keep
fl_min_invest_size_cut <- 10000
it_mth_inv_start_min_cut <- 14L
it_mth_inv_start_max_cut <- 144L
bl_drop_afrombc <- TRUE
bl_drop_cfromb <- TRUE
bl_compare2baserda <- FALSE
# Window — forward/backward month spans (issue #2)
it_window_forward <- 12L
it_window_backward <- 12L
bl_forward_include_imth <- FALSE
bl_backward_include_imth <- FALSE
# Winstats — balance-sheet aggregation (issue #3)
bl_compute_sum <- TRUE
bl_compute_mean <- TRUE
ar_st_asset_loan_id_cols <- c("id", "month")
ar_st_blnc_vars_stata <- c(
"IS_1111", "IS_1113", "IS_1115", "IS_1116",
"IS_1011", "IS_1013"
)
ar_st_agg_cols_asset_loan <- paste0("agg_", ar_st_blnc_vars_stata)
ar_st_cols_asset_loan <- c(ar_st_asset_loan_id_cols, ar_st_agg_cols_asset_loan)
# Bridge merge — join keys and category dummies
ar_st_bridge_join_keys <- c("id", "ivars", "hh_inv_asset_ctr")
st_bridge_id_col <- "hhid_Num"
st_bridge_type_col <- "investloan_type_m8"
st_investloan_no_loan <- "1-investment-no-loan"
ar_st_investloan_loan_ab <- c("2-investment-loan-a", "3-investment-loan-hook")
ar_st_investloan_prefix_4cat <- "^(4-|5-|6-|8-)"
# here::here() anchors on the temporary _quarto.yml pkgdown writes under
# vignettes/ during a quarto render; find the package root explicitly so paths
# resolve under both Quarto and knitr.
spn_pkg_root <- rprojroot::find_root(rprojroot::has_file("DESCRIPTION"))
st_path_tm_key_hh <- file.path(spn_pkg_root, "data-raw", "tm_key_id_hh_anony.rda")
st_path_census_vil <- file.path(spn_pkg_root, "data-raw", "census_vil_hh_structure_count.rda")
# MBF funding start month (same as R-script/ffs_bridge_count/ffs_bridge_count_mbf.R)
it_mbf_start_month <- 37L
st_data_out <- file.path(
spn_pkg_root,
if (bl_replace_data_output) "data" else "data-temp",
fsep = .Platform$file.sep
)
st_extdata_dir <- file.path(spn_pkg_root, "inst", "extdata")
for (st_dir in c(st_data_out, st_extdata_dir)) {
if (!dir.exists(st_dir)) {
dir.create(st_dir, recursive = TRUE)
}
}
# Per-vignette results folder and per-table save switches (all FALSE by default).
spt_res <- file.path(spn_pkg_root, "res", "res_invest_return_bridge")
ls_save_res <- list(
tb_win_bad_share = FALSE,
df_fw_bad_mth = FALSE,
df_bw_bad_mth = FALSE,
tb_funnel = FALSE,
tb_m8_counts = FALSE,
tb_m8_shares = FALSE,
tb_cat2 = FALSE,
tb_cat3 = FALSE,
tb_cat4 = FALSE,
tb_blnc_grid = FALSE,
tb_outcome_m8 = FALSE,
tb_stata_m8 = FALSE,
tb_stata_summary = FALSE,
tb_m8_avail = FALSE,
tb_meets_m8 = FALSE,
tb_mth_m8 = FALSE
)
ffv_save_gateway_tstm <- function(obj, name, out_dir, extdata_dir = st_extdata_dir) {
if (!dir.exists(out_dir)) {
dir.create(out_dir, recursive = TRUE)
}
if (!dir.exists(extdata_dir)) {
dir.create(extdata_dir, recursive = TRUE)
}
st_rda <- file.path(out_dir, paste0(name, ".rda"))
st_csv <- file.path(extdata_dir, paste0(name, ".csv"))
st_dta <- file.path(extdata_dir, paste0(name, ".dta"))
tmp_env <- new.env(parent = emptyenv())
tmp_env[[name]] <- obj
save(list = name, file = st_rda, envir = tmp_env)
readr::write_csv(obj, st_csv)
if (!requireNamespace("haven", quietly = TRUE)) {
stop(
"Package 'haven' is required to write .dta to inst/extdata. ",
"Install with: install.packages('haven')",
call. = FALSE
)
}
haven::write_dta(obj, st_dta)
invisible(list(rda = st_rda, csv = st_csv, dta = st_dta))
}
ffv_inv_type_by_ivars <- function(df, type_var) {
df %>%
dplyr::group_by(ivars, .data[[type_var]]) %>%
dplyr::tally(name = "n") %>%
tidyr::pivot_wider(names_from = ivars, values_from = n, values_fill = 0)
}
ffv_inv_type_share_by_ivars <- function(df, type_var) {
df %>%
dplyr::group_by(ivars, .data[[type_var]]) %>%
dplyr::tally(name = "n") %>%
dplyr::group_by(ivars) %>%
dplyr::mutate(share = n / sum(n)) %>%
dplyr::select(-n) %>%
tidyr::pivot_wider(
id_cols = dplyr::all_of(type_var),
names_from = ivars,
values_from = share
)
}
ffv_render_table <- function(
df,
caption,
name = NULL,
st_format = st_kableformat) {
# Optional latex (+ csv) save, gated by the per-table ls_save_res switch.
if (!is.null(name) && isTRUE(ls_save_res[[name]])) {
kbl_latex <- kbl(
df,
format = "latex",
booktabs = TRUE,
caption = caption,
digits = 4
)
ffp_save_res_table(kbl_latex, name, spt_res, df = df, bl_save = TRUE)
}
kbl(
df,
format = st_format,
booktabs = TRUE,
caption = caption,
digits = 4
) %>%
kable_styling(
bootstrap_options = bs_style,
full_width = FALSE,
position = "left"
)
}
bl_win_val_bad <- function(x) {
is.na(x) | !is.finite(x)
}
if (verbose) {
print(glue::glue("f-{it_file_code}, controls"))
print(glue::glue(" bl_replace_data_output: {bl_replace_data_output} -> {st_data_out}/"))
print(glue::glue(" windows: fw{it_window_forward} bw{it_window_backward}"))
print(glue::glue(" fl_min_invest_size_cut: {fl_min_invest_size_cut}"))
print(glue::glue(" mth_inv_start in [{it_mth_inv_start_min_cut}, {it_mth_inv_start_max_cut}]"))
print(glue::glue(" ar_st_vars_to_keep: {paste(ar_st_vars_to_keep, collapse = ', ')}"))
print(glue::glue(" ar_st_blnc_vars_stata: {paste(ar_st_blnc_vars_stata, collapse = ', ')}"))
}
#> f-646572, controls
#> bl_replace_data_output: FALSE -> /home/runner/work/PrjThaiHFID/PrjThaiHFID/data-temp/
#> windows: fw12 bw12
#> fl_min_invest_size_cut: 10000
#> mth_inv_start in [14, 144]
#> ar_st_vars_to_keep: agg_BS_1021, agg_BS_1012, agg_BS_1011, agg_BS_3022
#> ar_st_blnc_vars_stata: IS_1111, IS_1113, IS_1115, IS_1116, IS_1011, IS_1013Run gateway (step 1)
Re-calls the gateway inline (same parameters as ffv_invest_loan_bridge.qmd). Does not load saved gateway .rda files from data/.
if (verbose) {
print(glue::glue("f-{it_file_code}, gateway: calling ffp_hfid_invest_loan_linked_abc_investloan_char_gateway"))
}
#> f-646572, gateway: calling ffp_hfid_invest_loan_linked_abc_investloan_char_gateway
ls_gateway_result <- PrjThaiHFID::ffp_hfid_invest_loan_linked_abc_investloan_char_gateway(
svr_lender_var = svr_lender_var,
svr_principal = svr_principal,
svr_principal_last = svr_principal_last,
svr_principal_interest_sum = svr_principal_interest_sum,
bl_filter_bridge_grvgr0 = bl_filter_bridge_grvgr0,
it_ll_grv_min = it_ll_grv_min,
bl_filter_loan_duration_a = bl_filter_loan_duration_a,
bl_filter_loan_duration_b = bl_filter_loan_duration_b,
bl_filter_lender_type = bl_filter_lender_type,
bl_filter_bridge_informal = bl_filter_bridge_informal,
bl_filter_loan_size = bl_filter_loan_size,
bl_filter_loan_duration_more = bl_filter_loan_duration_more,
fl_sd_ithres = fl_sd_ithres,
it_thres_invest_mth_gap = it_thres_invest_mth_gap,
it_gap_LBL_IL_min = it_gap_LBL_IL_min,
ar_st_vars_to_keep = ar_st_vars_to_keep,
fl_min_invest_size = fl_min_invest_size_cut,
it_mth_inv_start_min = it_mth_inv_start_min_cut,
it_mth_inv_start_max = it_mth_inv_start_max_cut,
bl_drop_afrombc = bl_drop_afrombc,
bl_drop_cfromb = bl_drop_cfromb,
bl_compare2baserda = bl_compare2baserda,
verbose = verbose,
verbose_detail = verbose_detail,
it_verbose_detail_nrow = it_verbose_detail_nrow
)
#>
#> ── char_gateway ────────────────────────────────────────────────────────────────
#> ℹ A.1 Loan non-duplicate
#> ✔ A.1 Loan non-duplicate [17ms]
#>
#> ✔ tstm_loans_pn_nd: 19587 rows
#> ℹ A.2 Hook pairs
#> ✔ A.2 Hook pairs [6ms]
#>
#> ✔ tstm_loans_hooks: 47754 rows
#> ℹ A.3 Bridges from hooks
#> ✔ A.3 Bridges from hooks [6ms]
#>
#> ✔ tstm_loans_bridges: 94907 rows
#> ℹ A.4 Bridge type
#> ✔ A.4 Bridge type [6ms]
#>
#> ✔ tstm_loans_bridges_type: 94907 rows
#> ℹ B Investment gateway
#> ✔ B Investment gateway [6ms]
#>
#> ℹ Group B: processing 4 asset ivar(s): agg_BS_1021, agg_BS_1012, agg_BS_1011, agg_BS_3022
#> ✔ tstm_invest: 6530 rows
#> ℹ C.1 Bridge roster
#> ✔ C.1 Bridge roster [6ms]
#>
#> Adding missing grouping variables: `br_type`, `br_type_id`
#> ✔ tstm_roster_invest_loan_bridge: 83252 rows
#> ℹ C.2 Linker
#> ✔ C.2 Linker [6ms]
#>
#> ✔ tstm_roster_invest_loan_linker: 4975 rows
#> ℹ C.3 Invest-loan linked
#> ✔ C.3 Invest-loan linked [6ms]
#>
#> ✔ tstm_roster_invest_loan_linked: 166942 rows
#> ℹ D.1 abc_distinct filter
#> ✔ D.1 abc_distinct filter [6ms]
#>
#> ✔ tstm_roster_invest2loan2bridge_clean: 13859 rows
#> ℹ D.1b NA synchronization
#> ✔ D.1b NA synchronization [6ms]
#>
#> ℹ D.2 Bridge char
#> ✔ D.2 Bridge char [81ms]
#>
#> ✔ tstm_roster_invest2loan2bridge: 13859 rows
#> ℹ D.3 Investment char
#> ✔ D.3 Investment char [6ms]
#>
#> ✔ tstm_invest2loan2bridge_chars: 4232 rows
#> ℹ E Invest typing summary
#> ✔ E Invest typing summary [6ms]
#>
#> ── Group E: invest typing summary
#> ──────────────────────────────────────────────
#> ℹ Thresholds: capital_invest >= 10000; mth_inv_start in [14, 144]
#>
#> ── Invest rows not in bridge chars
#> Count not in chars: 2298
#> By exclusion reason (mutually exclusive):
#> # A tibble: 4 × 3
#> st_unmatched_reason n share
#> <chr> <int> <dbl>
#> 1 fail_size_and_start 238 0.104
#> 2 fail_size_only 1276 0.555
#> 3 fail_start_only 781 0.340
#> 4 pass_cuts_still_unmatched 3 0.00131
#> 2x2 cross-tab (bl_meets_size x bl_meets_start):
#> # A tibble: 4 × 3
#> bl_meets_size bl_meets_start n
#> <lgl> <lgl> <int>
#> 1 FALSE FALSE 238
#> 2 FALSE TRUE 1276
#> 3 TRUE FALSE 781
#> 4 TRUE TRUE 3
#> ! 3 row(s) pass cuts on tstm_invest but are not in bridge chars.Inspect df_invest_pipeline_consistency in the return list.
#>
#> ── Pipeline consistency (per ivars)
#> # A tibble: 4 × 11
#> ivars n_invest n_pass_typing n_in_chars n_not_in_linked n_pass_not_in_linked
#> <chr> <int> <int> <int> <int> <int>
#> 1 agg_BS… 1390 736 735 0 0
#> 2 agg_BS… 1414 821 821 0 0
#> 3 agg_BS… 1890 1433 1433 0 0
#> 4 agg_BS… 1836 1245 1243 0 0
#> # ℹ 5 more variables: n_pass_not_in_chars <int>,
#> # n_capital_mismatch_invest_roster <int>,
#> # n_start_mismatch_invest_roster <int>, n_ctr_mismatch_invest_roster <int>,
#> # n_capital_mismatch_chars_roster <int>
#> ✖ GATEWAY INVEST PIPELINE CONSISTENCY: FAIL (see df_invest_pipeline_consistency)
#> Warning: Investment pipeline consistency check failed: some investments passing
#> typing thresholds are missing from tstm_invest2loan2bridge_chars, or
#> tstm_invest keys/fields disagree with the invest2loan roster in this run.
#> Inspect df_invest_pipeline_consistency in the return list. Do not mix return
#> objects with packaged data/ .rda files.
tstm_invest <- ls_gateway_result$tstm_invest
tstm_invdates_uniq <- ls_gateway_result$tstm_invdates_uniq
tstm_invest2loan2bridge_chars <- ls_gateway_result$tstm_invest2loan2bridge_chars
if (verbose) {
print(glue::glue("f-{it_file_code}, gateway: tstm_invest {nrow(tstm_invest)} rows"))
print(glue::glue("f-{it_file_code}, gateway: tstm_invdates_uniq {nrow(tstm_invdates_uniq)} rows"))
print(glue::glue(
"f-{it_file_code}, gateway: tstm_invest2loan2bridge_chars {nrow(tstm_invest2loan2bridge_chars)} rows"
))
if (!isTRUE(ls_gateway_result$bl_invest_pipeline_consistency_ok)) {
print(glue::glue("f-{it_file_code}, gateway: warning — invest pipeline consistency check failed"))
}
}
#> f-646572, gateway: tstm_invest 6530 rows
#> f-646572, gateway: tstm_invdates_uniq 3725 rows
#> f-646572, gateway: tstm_invest2loan2bridge_chars 4232 rows
#> f-646572, gateway: warning — invest pipeline consistency check failedInvestment windows (step 2)
if (verbose) {
print(glue::glue(
"f-{it_file_code}, window: ff_hfid_invest_window fw{it_window_forward} bw{it_window_backward}"
))
}
#> f-646572, window: ff_hfid_invest_window fw12 bw12
tstm_invdates_uniq_fwbw <- PrjThaiHFID::ff_hfid_invest_window(
df_invdates_uniq = tstm_invdates_uniq,
it_window_forward = it_window_forward,
it_window_backward = it_window_backward,
bl_forward_include_imth = bl_forward_include_imth,
bl_backward_include_imth = bl_backward_include_imth
)
if (verbose) {
print(glue::glue("f-{it_file_code}, window: tstm_invdates_uniq_fwbw {nrow(tstm_invdates_uniq_fwbw)} rows"))
}
#> f-646572, window: tstm_invdates_uniq_fwbw 89400 rowsWindow balance-sheet statistics (step 3)
Subset tstm_asset_loan to Stata-used agg_* columns only (not the full wide asset file).
ar_st_agg_missing <- setdiff(ar_st_agg_cols_asset_loan, names(PrjThaiHFID::tstm_asset_loan))
if (length(ar_st_agg_missing) > 0L) {
stop(
"tstm_asset_loan missing Stata agg columns: ",
paste(ar_st_agg_missing, collapse = ", "),
call. = FALSE
)
}
df_asset_loan_stata <- PrjThaiHFID::tstm_asset_loan %>%
select(all_of(ar_st_cols_asset_loan))
if (verbose) {
print(glue::glue("f-{it_file_code}, winstats: ff_hfid_invest_winstats"))
print(glue::glue(
"f-{it_file_code}, winstats: df_asset_loan_stata {ncol(df_asset_loan_stata)} cols"
))
}
#> f-646572, winstats: ff_hfid_invest_winstats
#> f-646572, winstats: df_asset_loan_stata 8 cols
tstm_invest_stats <- PrjThaiHFID::ff_hfid_invest_winstats(
df_asset_loan = df_asset_loan_stata,
df_invest = tstm_invest,
df_invwin_uniq = tstm_invdates_uniq_fwbw,
bl_compute_sum = bl_compute_sum,
bl_compute_mean = bl_compute_mean
)
if (verbose) {
print(glue::glue("f-{it_file_code}, winstats: tstm_invest_stats {nrow(tstm_invest_stats)} rows"))
}
#> f-646572, winstats: tstm_invest_stats 78360 rowsWindow validity diagnostics (step 3b)
win_forward / win_backward can be NA or non-finite when the window lacks enough observed balance-sheet months (calendar edge or household not in every survey month).
tb_win_bad_share <- tstm_invest_stats %>%
dplyr::ungroup() %>%
dplyr::summarize(
n = dplyr::n(),
n_fw_bad = sum(bl_win_val_bad(win_forward)),
n_bw_bad = sum(bl_win_val_bad(win_backward)),
share_fw_bad = round(n_fw_bad / n, 4),
share_bw_bad = round(n_bw_bad / n, 4)
)
df_fw_bad_mth <- tstm_invest_stats %>%
dplyr::ungroup() %>%
dplyr::filter(bl_win_val_bad(win_forward)) %>%
dplyr::summarize(
n = dplyr::n(),
mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
mth_inv_start_mean = mean(mth_inv_start, na.rm = TRUE),
mth_inv_start_max = max(mth_inv_start, na.rm = TRUE)
)
df_bw_bad_mth <- tstm_invest_stats %>%
dplyr::ungroup() %>%
dplyr::filter(bl_win_val_bad(win_backward)) %>%
dplyr::summarize(
n = dplyr::n(),
mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
mth_inv_start_mean = mean(mth_inv_start, na.rm = TRUE),
mth_inv_start_max = max(mth_inv_start, na.rm = TRUE)
)
ffv_render_table(
tb_win_bad_share,
caption = "Share of invest-stats rows with NA or non-finite window stats (before bridge merge)",
name = "tb_win_bad_share"
)| n | n_fw_bad | n_bw_bad | share_fw_bad | share_bw_bad |
|---|---|---|---|---|
| 78360 | 4944 | 6144 | 0.0631 | 0.0784 |
ffv_render_table(
df_fw_bad_mth,
caption = "mth_inv_start range when win_forward is NA or invalid",
name = "df_fw_bad_mth"
)| n | mth_inv_start_min | mth_inv_start_mean | mth_inv_start_max |
|---|---|---|---|
| 4944 | 1 | 146.551 | 160 |
ffv_render_table(
df_bw_bad_mth,
caption = "mth_inv_start range when win_backward is NA or invalid",
name = "df_bw_bad_mth"
)| n | mth_inv_start_min | mth_inv_start_mean | mth_inv_start_max |
|---|---|---|---|
| 6144 | 1 | 10.6934 | 104 |
Merge bridge characteristics (step 4)
tstm_invest2loan2bridge_chars_sel <- tstm_invest2loan2bridge_chars %>%
dplyr::select(all_of(c(st_bridge_id_col, "ivars", "hh_inv_asset_ctr", st_bridge_type_col))) %>%
dplyr::rename(id = dplyr::all_of(st_bridge_id_col))
tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats %>%
dplyr::left_join(
tstm_invest2loan2bridge_chars_sel,
by = ar_st_bridge_join_keys
) %>%
dplyr::mutate(
bl_meets_size = capital_invest >= fl_min_invest_size_cut,
bl_meets_start = mth_inv_start >= it_mth_inv_start_min_cut &
mth_inv_start <= it_mth_inv_start_max_cut,
bl_has_investloan_type_m8 = !is.na(.data[[st_bridge_type_col]]),
investloan_type_m8_avail_2x2 = dplyr::case_when(
bl_has_investloan_type_m8 ~ "has_investloan_type_m8",
!bl_meets_size & !bl_meets_start ~ "absent_fail_size_and_start",
!bl_meets_size & bl_meets_start ~ "absent_fail_size_only",
bl_meets_size & !bl_meets_start ~ "absent_fail_start_only",
bl_meets_size & bl_meets_start ~ "absent_pass_cuts_still_unmatched",
TRUE ~ "absent_other"
),
investloan_type_m8_cat2 = ifelse(.data[[st_bridge_type_col]] == st_investloan_no_loan, 0L, 1L),
investloan_type_m8_cat2v2 = dplyr::case_when(
.data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
.data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 0L,
TRUE ~ 1L
),
investloan_type_m8_cat3 = dplyr::case_when(
.data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
.data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 1L,
TRUE ~ 2L
),
investloan_type_m8_cat4 = dplyr::case_when(
.data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
.data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 1L,
grepl(ar_st_investloan_prefix_4cat, .data[[st_bridge_type_col]]) ~ 2L,
TRUE ~ 3L
)
)
if (verbose) {
print(glue::glue(
"f-{it_file_code}, merge: tstm_invest_stats_bridgechar_all_invest ",
"{nrow(tstm_invest_stats_bridgechar_all_invest)} rows x ",
"{ncol(tstm_invest_stats_bridgechar_all_invest)} cols"
))
}
#> f-646572, merge: tstm_invest_stats_bridgechar_all_invest 78360 rows x 26 colsMerge household geography (step 5)
Attach province, village, and northeast indicators from data-raw/tm_key_id_hh_anony.rda (anonymized-only crosswalk) via anonymized household ID (id = tmid_hh). id_prov (tmid_prov), id_village (tmid_vil), bl_id_ne (1 if region_name == "Northeast", else 0).
if (!file.exists(st_path_tm_key_hh)) {
stop("Missing required input: ", st_path_tm_key_hh, call. = FALSE)
}
load(st_path_tm_key_hh)
df_hh_geo <- tm_key_id_hh_anony %>%
dplyr::distinct(tmid_hh, tmid_prov, tmid_vil, region_name) %>%
dplyr::mutate(
id_prov = as.integer(tmid_prov),
id_village = as.integer(tmid_vil),
bl_id_ne = as.integer(region_name == "Northeast")
) %>%
dplyr::select(tmid_hh, id_prov, id_village, bl_id_ne)
n_before_geo <- nrow(tstm_invest_stats_bridgechar_all_invest)
tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats_bridgechar_all_invest %>%
dplyr::left_join(df_hh_geo, by = c("id" = "tmid_hh"))
n_miss_geo <- sum(
is.na(tstm_invest_stats_bridgechar_all_invest$id_prov) |
is.na(tstm_invest_stats_bridgechar_all_invest$id_village)
)
if (verbose) {
print(glue::glue(
"f-{it_file_code}, merge-hh-geo: {n_before_geo} rows; ",
"NA id_prov/id_village={n_miss_geo}; ",
"bl_id_ne share={round(100 * mean(tstm_invest_stats_bridgechar_all_invest$bl_id_ne, na.rm = TRUE), 2)}%"
))
}
#> f-646572, merge-hh-geo: 78360 rows; NA id_prov/id_village=0; bl_id_ne share=49.48%
if (n_miss_geo > 0L) {
warning(
"Some rows lack geography after tm_key_id_hh join (n = ", n_miss_geo, ")",
call. = FALSE
)
}Merge MBF village counts and timing (step 6)
Follows R-script/ffs_bridge_count/ffs_bridge_count_mbf.R §6: village household counts from census_vil_hh_structure_count.rda, joined on id_village (tmid_vil), plus MBF-relative month bins from mth_inv_start.
if (!file.exists(st_path_census_vil)) {
stop("Missing required input: ", st_path_census_vil, call. = FALSE)
}
load(st_path_census_vil)
df_vil_hh <- census_vil_hh_structure_count %>%
dplyr::select(tmid_vil, N_households)
tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats_bridgechar_all_invest %>%
dplyr::left_join(df_vil_hh, by = c("id_village" = "tmid_vil")) %>%
dplyr::mutate(
mth_gap_mbf = mth_inv_start - it_mbf_start_month,
mth_gap_mbf_cat = dplyr::case_when(
mth_gap_mbf >= -24 & mth_gap_mbf <= -13 ~ "-2yr",
mth_gap_mbf >= -12 & mth_gap_mbf <= -1 ~ "-1yr",
mth_gap_mbf >= 0 & mth_gap_mbf <= 11 ~ "yr1",
mth_gap_mbf >= 12 & mth_gap_mbf <= 23 ~ "yr2",
mth_gap_mbf >= 24 & mth_gap_mbf <= 35 ~ "yr3",
mth_gap_mbf >= 36 & mth_gap_mbf <= 47 ~ "yr4",
mth_gap_mbf >= 48 & mth_gap_mbf <= 59 ~ "yr5",
mth_gap_mbf >= 60 & mth_gap_mbf <= 71 ~ "yr6",
mth_gap_mbf >= 72 & mth_gap_mbf <= 83 ~ "yr7",
mth_gap_mbf >= 84 & mth_gap_mbf <= 95 ~ "yr8",
mth_gap_mbf >= 96 & mth_gap_mbf <= 107 ~ "yr9",
TRUE ~ NA_character_
),
mth_gap_mbf_cat_int = dplyr::case_when(
mth_gap_mbf_cat == "-2yr" ~ -2L,
mth_gap_mbf_cat == "-1yr" ~ -1L,
mth_gap_mbf_cat == "yr1" ~ 1L,
mth_gap_mbf_cat == "yr2" ~ 2L,
mth_gap_mbf_cat == "yr3" ~ 3L,
mth_gap_mbf_cat == "yr4" ~ 4L,
mth_gap_mbf_cat == "yr5" ~ 5L,
mth_gap_mbf_cat == "yr6" ~ 6L,
mth_gap_mbf_cat == "yr7" ~ 7L,
mth_gap_mbf_cat == "yr8" ~ 8L,
mth_gap_mbf_cat == "yr9" ~ 9L,
TRUE ~ NA_integer_
)
)
n_miss_n_hh <- sum(is.na(tstm_invest_stats_bridgechar_all_invest$N_households))
n_miss_mbf_cat <- sum(is.na(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf_cat))
if (verbose) {
print(glue::glue(
"f-{it_file_code}, merge-mbf-vil: rows={nrow(tstm_invest_stats_bridgechar_all_invest)}; ",
"NA N_households={n_miss_n_hh}; NA mth_gap_mbf_cat={n_miss_mbf_cat}"
))
print(glue::glue(
" mth_gap_mbf range: [{min(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf, na.rm = TRUE)}, ",
"{max(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf, na.rm = TRUE)}]"
))
print(table(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf_cat_int, useNA = "ifany"))
}
#> f-646572, merge-mbf-vil: rows=78360; NA N_households=0; NA mth_gap_mbf_cat=11760
#> mth_gap_mbf range: [-36, 123]
#>
#> -2 -1 1 2 3 4 5 6 7 8 9 <NA>
#> 6144 6252 6732 5592 7476 6420 7296 5400 5724 4992 4572 11760
if (n_miss_n_hh > 0L) {
warning(
"Some rows lack N_households after census join (n = ", n_miss_n_hh, ")",
call. = FALSE
)
}Analysis subset (step 7)
Primary return: tstm_invest_stats_bridgechar — investloan_type_m8 present and finite win_forward / win_backward.
tstm_invest_stats_bridgechar <- tstm_invest_stats_bridgechar_all_invest %>%
dplyr::ungroup() %>%
dplyr::filter(
!is.na(.data[[st_bridge_type_col]]),
!bl_win_val_bad(win_forward),
!bl_win_val_bad(win_backward)
) %>%
dplyr::select(-investloan_type_m8_avail_2x2)
if (verbose) {
print(glue::glue(
"f-{it_file_code}, filter: tstm_invest_stats_bridgechar ",
"{nrow(tstm_invest_stats_bridgechar)} rows ",
"({round(100 * nrow(tstm_invest_stats_bridgechar) / nrow(tstm_invest_stats_bridgechar_all_invest), 1)}% of all_invest)"
))
}
#> f-646572, filter: tstm_invest_stats_bridgechar 50184 rows (64% of all_invest)Save tstm_invest_stats_bridgechar
ls_saved <- ffv_save_gateway_tstm(
obj = tstm_invest_stats_bridgechar,
name = "tstm_invest_stats_bridgechar",
out_dir = st_data_out
)
if (verbose) {
if (bl_replace_data_output) {
print(glue::glue("f-{it_file_code}, save: replaced canonical data/tstm_invest_stats_bridgechar.rda"))
} else {
print(glue::glue(
"f-{it_file_code}, save: wrote data-temp/tstm_invest_stats_bridgechar.rda (data/ unchanged)"
))
}
print(glue::glue("f-{it_file_code}, save: {ls_saved$csv}"))
print(glue::glue("f-{it_file_code}, save: {ls_saved$dta}"))
}
#> f-646572, save: wrote data-temp/tstm_invest_stats_bridgechar.rda (data/ unchanged)
#> f-646572, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_invest_stats_bridgechar.csv
#> f-646572, save: /home/runner/work/PrjThaiHFID/PrjThaiHFID/inst/extdata/tstm_invest_stats_bridgechar.dtaTables: pipeline funnel
tb_funnel <- data.frame(
stage = c(
"tstm_invest_stats",
"tstm_invest_stats_bridgechar_all_invest",
"tstm_invest_stats_bridgechar"
),
n_rows = c(
nrow(tstm_invest_stats),
nrow(tstm_invest_stats_bridgechar_all_invest),
nrow(tstm_invest_stats_bridgechar)
),
n_cols = c(
ncol(tstm_invest_stats),
ncol(tstm_invest_stats_bridgechar_all_invest),
ncol(tstm_invest_stats_bridgechar)
)
) %>%
dplyr::mutate(
pct_of_stats = round(100 * n_rows / n_rows[stage == "tstm_invest_stats"], 1),
pct_of_all_invest = round(
100 * n_rows / n_rows[stage == "tstm_invest_stats_bridgechar_all_invest"],
1
)
)
ffv_render_table(
tb_funnel,
caption = "Row counts through the window + bridge pipeline",
name = "tb_funnel"
)| stage | n_rows | n_cols | pct_of_stats | pct_of_all_invest |
|---|---|---|---|---|
| tstm_invest_stats | 78360 | 17 | 100 | 100 |
| tstm_invest_stats_bridgechar_all_invest | 78360 | 33 | 100 | 100 |
| tstm_invest_stats_bridgechar | 50184 | 32 | 64 | 64 |
Tables: investloan_type_m8 by asset (ivars)
tb_m8_counts <- ffv_inv_type_by_ivars(
tstm_invest_stats_bridgechar, "investloan_type_m8"
)
tb_m8_shares <- ffv_inv_type_share_by_ivars(
tstm_invest_stats_bridgechar, "investloan_type_m8"
)
# Publication-style builder: bridge type x asset-definition (ivars) counts.
f_tb_m8_counts <- function(st_format) {
ar_st_ivars <- names(tb_m8_counts)[-1]
bk <- kbl(
tb_m8_counts,
format = st_format,
linesep = "",
booktabs = TRUE,
align = "c",
caption = "Analysis-sample counts by investloan_type_m8 and ivars",
col.names = c("Bridge type (investloan_type_m8)", ar_st_ivars)
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE, position = "left"
) %>%
add_header_above(c(" " = 1, "Asset definition (ivars) — counts" = length(ar_st_ivars))) %>%
column_spec(1, width = "16em") %>%
column_spec(2:(length(ar_st_ivars) + 1), width = "6em")
return(bk)
}
# Publication-style builder: bridge type x asset-definition (ivars) shares.
f_tb_m8_shares <- function(st_format) {
ar_st_ivars <- names(tb_m8_shares)[-1]
bk <- kbl(
tb_m8_shares,
format = st_format,
linesep = "",
booktabs = TRUE,
align = "c",
caption = "Analysis-sample shares by investloan_type_m8 and ivars",
col.names = c("Bridge type (investloan_type_m8)", ar_st_ivars),
digits = 4
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE, position = "left"
) %>%
add_header_above(c(" " = 1, "Asset definition (ivars) — within-ivars shares" = length(ar_st_ivars))) %>%
column_spec(1, width = "16em") %>%
column_spec(2:(length(ar_st_ivars) + 1), width = "6em")
return(bk)
}
f_tb_m8_counts(st_kableformat)| Bridge type (investloan_type_m8) | agg_BS_1011 | agg_BS_1012 | agg_BS_1021 | agg_BS_3022 |
|---|---|---|---|---|
| 1-investment-no-loan | 5016 | 2880 | 2424 | 4572 |
| 2-investment-loan-a | 3348 | 1680 | 1500 | 2916 |
| 3-investment-loan-hook | 1356 | 696 | 600 | 1236 |
| 4-brg-single-lender | 708 | 300 | 324 | 636 |
| 5-brg-baac-vilfund-combo | 468 | 372 | 324 | 444 |
| 6-brg-informal-quasifor-combo | 36 | 12 | 24 | 36 |
| 7-brg-has-informal-bridge | 3360 | 2076 | 1968 | 2604 |
| 8-brg-has-informal-in-bridge | 2700 | 1716 | 1548 | 2304 |
ffp_save_res_table(
f_tb_m8_counts("latex"), "tb_m8_counts", spt_res,
df = tb_m8_counts, bl_save = ls_save_res[["tb_m8_counts"]]
)
f_tb_m8_shares(st_kableformat)| Bridge type (investloan_type_m8) | agg_BS_1011 | agg_BS_1012 | agg_BS_1021 | agg_BS_3022 |
|---|---|---|---|---|
| 1-investment-no-loan | 0.2952 | 0.2959 | 0.2782 | 0.3100 |
| 2-investment-loan-a | 0.1970 | 0.1726 | 0.1722 | 0.1977 |
| 3-investment-loan-hook | 0.0798 | 0.0715 | 0.0689 | 0.0838 |
| 4-brg-single-lender | 0.0417 | 0.0308 | 0.0372 | 0.0431 |
| 5-brg-baac-vilfund-combo | 0.0275 | 0.0382 | 0.0372 | 0.0301 |
| 6-brg-informal-quasifor-combo | 0.0021 | 0.0012 | 0.0028 | 0.0024 |
| 7-brg-has-informal-bridge | 0.1977 | 0.2133 | 0.2259 | 0.1766 |
| 8-brg-has-informal-in-bridge | 0.1589 | 0.1763 | 0.1777 | 0.1562 |
ffp_save_res_table(
f_tb_m8_shares("latex"), "tb_m8_shares", spt_res,
df = tb_m8_shares, bl_save = ls_save_res[["tb_m8_shares"]]
)Tables: regression category dummies
tb_cat2 <- tstm_invest_stats_bridgechar %>%
dplyr::group_by(investloan_type_m8_cat2, investloan_type_m8) %>%
dplyr::tally(name = "n") %>%
dplyr::arrange(investloan_type_m8_cat2, investloan_type_m8)
tb_cat3 <- tstm_invest_stats_bridgechar %>%
dplyr::group_by(investloan_type_m8_cat3, investloan_type_m8) %>%
dplyr::tally(name = "n") %>%
dplyr::arrange(investloan_type_m8_cat3, investloan_type_m8)
tb_cat4 <- tstm_invest_stats_bridgechar %>%
dplyr::group_by(investloan_type_m8_cat4, investloan_type_m8) %>%
dplyr::tally(name = "n") %>%
dplyr::arrange(investloan_type_m8_cat4, investloan_type_m8)
ffv_render_table(tb_cat2, caption = "Counts by investloan_type_m8_cat2 and investloan_type_m8", name = "tb_cat2")| investloan_type_m8_cat2 | investloan_type_m8 | n |
|---|---|---|
| 0 | 1-investment-no-loan | 14892 |
| 1 | 2-investment-loan-a | 9444 |
| 1 | 3-investment-loan-hook | 3888 |
| 1 | 4-brg-single-lender | 1968 |
| 1 | 5-brg-baac-vilfund-combo | 1608 |
| 1 | 6-brg-informal-quasifor-combo | 108 |
| 1 | 7-brg-has-informal-bridge | 10008 |
| 1 | 8-brg-has-informal-in-bridge | 8268 |
ffv_render_table(tb_cat3, caption = "Counts by investloan_type_m8_cat3 and investloan_type_m8", name = "tb_cat3")| investloan_type_m8_cat3 | investloan_type_m8 | n |
|---|---|---|
| 0 | 1-investment-no-loan | 14892 |
| 1 | 2-investment-loan-a | 9444 |
| 1 | 3-investment-loan-hook | 3888 |
| 2 | 4-brg-single-lender | 1968 |
| 2 | 5-brg-baac-vilfund-combo | 1608 |
| 2 | 6-brg-informal-quasifor-combo | 108 |
| 2 | 7-brg-has-informal-bridge | 10008 |
| 2 | 8-brg-has-informal-in-bridge | 8268 |
ffv_render_table(tb_cat4, caption = "Counts by investloan_type_m8_cat4 and investloan_type_m8", name = "tb_cat4")| investloan_type_m8_cat4 | investloan_type_m8 | n |
|---|---|---|
| 0 | 1-investment-no-loan | 14892 |
| 1 | 2-investment-loan-a | 9444 |
| 1 | 3-investment-loan-hook | 3888 |
| 2 | 4-brg-single-lender | 1968 |
| 2 | 5-brg-baac-vilfund-combo | 1608 |
| 2 | 6-brg-informal-quasifor-combo | 108 |
| 2 | 8-brg-has-informal-in-bridge | 8268 |
| 3 | 7-brg-has-informal-bridge | 10008 |
Tables: balance dimensions
tb_blnc_grid <- tstm_invest_stats_bridgechar %>%
dplyr::count(blnc_vars, blnc_agg_stats, ivars, name = "n") %>%
dplyr::arrange(blnc_vars, blnc_agg_stats, ivars)
ffv_render_table(
tb_blnc_grid,
caption = "Analysis-sample row counts by blnc_vars, blnc_agg_stats, and ivars",
name = "tb_blnc_grid"
)| blnc_vars | blnc_agg_stats | ivars | n |
|---|---|---|---|
| IS_1011 | mean | agg_BS_1011 | 1416 |
| IS_1011 | mean | agg_BS_1012 | 811 |
| IS_1011 | mean | agg_BS_1021 | 726 |
| IS_1011 | mean | agg_BS_3022 | 1229 |
| IS_1011 | sum | agg_BS_1011 | 1416 |
| IS_1011 | sum | agg_BS_1012 | 811 |
| IS_1011 | sum | agg_BS_1021 | 726 |
| IS_1011 | sum | agg_BS_3022 | 1229 |
| IS_1013 | mean | agg_BS_1011 | 1416 |
| IS_1013 | mean | agg_BS_1012 | 811 |
| IS_1013 | mean | agg_BS_1021 | 726 |
| IS_1013 | mean | agg_BS_3022 | 1229 |
| IS_1013 | sum | agg_BS_1011 | 1416 |
| IS_1013 | sum | agg_BS_1012 | 811 |
| IS_1013 | sum | agg_BS_1021 | 726 |
| IS_1013 | sum | agg_BS_3022 | 1229 |
| IS_1111 | mean | agg_BS_1011 | 1416 |
| IS_1111 | mean | agg_BS_1012 | 811 |
| IS_1111 | mean | agg_BS_1021 | 726 |
| IS_1111 | mean | agg_BS_3022 | 1229 |
| IS_1111 | sum | agg_BS_1011 | 1416 |
| IS_1111 | sum | agg_BS_1012 | 811 |
| IS_1111 | sum | agg_BS_1021 | 726 |
| IS_1111 | sum | agg_BS_3022 | 1229 |
| IS_1113 | mean | agg_BS_1011 | 1416 |
| IS_1113 | mean | agg_BS_1012 | 811 |
| IS_1113 | mean | agg_BS_1021 | 726 |
| IS_1113 | mean | agg_BS_3022 | 1229 |
| IS_1113 | sum | agg_BS_1011 | 1416 |
| IS_1113 | sum | agg_BS_1012 | 811 |
| IS_1113 | sum | agg_BS_1021 | 726 |
| IS_1113 | sum | agg_BS_3022 | 1229 |
| IS_1115 | mean | agg_BS_1011 | 1416 |
| IS_1115 | mean | agg_BS_1012 | 811 |
| IS_1115 | mean | agg_BS_1021 | 726 |
| IS_1115 | mean | agg_BS_3022 | 1229 |
| IS_1115 | sum | agg_BS_1011 | 1416 |
| IS_1115 | sum | agg_BS_1012 | 811 |
| IS_1115 | sum | agg_BS_1021 | 726 |
| IS_1115 | sum | agg_BS_3022 | 1229 |
| IS_1116 | mean | agg_BS_1011 | 1416 |
| IS_1116 | mean | agg_BS_1012 | 811 |
| IS_1116 | mean | agg_BS_1021 | 726 |
| IS_1116 | mean | agg_BS_3022 | 1229 |
| IS_1116 | sum | agg_BS_1011 | 1416 |
| IS_1116 | sum | agg_BS_1012 | 811 |
| IS_1116 | sum | agg_BS_1021 | 726 |
| IS_1116 | sum | agg_BS_3022 | 1229 |
Tables: outcome summaries for regression
tb_outcome_m8 <- tstm_invest_stats_bridgechar %>%
dplyr::group_by(investloan_type_m8) %>%
dplyr::summarize(
n = dplyr::n(),
win_diff_mean = mean(win_diff, na.rm = TRUE),
win_diff_median = median(win_diff, na.rm = TRUE),
win_forward_mean = mean(win_forward, na.rm = TRUE),
win_backward_mean = mean(win_backward, na.rm = TRUE),
capital_invest_mean = mean(capital_invest, na.rm = TRUE),
capital_invest_median = median(capital_invest, na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::arrange(investloan_type_m8)
# Publication-style builder: outcome summaries by bridge type.
f_tb_outcome_m8 <- function(st_format) {
bk <- kbl(
tb_outcome_m8,
format = st_format,
linesep = "",
booktabs = TRUE,
align = "c",
caption = "Mean/median win_diff and capital_invest by investloan_type_m8 (all blnc rows)",
col.names = c(
"Bridge type (investloan_type_m8)", "N",
"Mean", "Median", "Fwd mean", "Bwd mean",
"Mean", "Median"
),
digits = 4
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE, position = "left"
) %>%
add_header_above(c(
" " = 2,
"win_diff" = 2,
"Window means" = 2,
"capital_invest" = 2
)) %>%
column_spec(1, width = "16em")
return(bk)
}
f_tb_outcome_m8(st_kableformat)| Bridge type (investloan_type_m8) | N | Mean | Median | Fwd mean | Bwd mean | Mean | Median |
|---|---|---|---|---|---|---|---|
| 1-investment-no-loan | 14892 | 5190.170 | 20.0000 | 149427.92 | 144237.75 | 150142.99 | 39245.50 |
| 2-investment-loan-a | 9444 | 5579.087 | 0.0000 | 112231.25 | 106652.16 | 130044.54 | 45690.42 |
| 3-investment-loan-hook | 3888 | 5690.960 | 160.0000 | 135154.93 | 129463.97 | 122155.56 | 48170.24 |
| 4-brg-single-lender | 1968 | 7002.587 | 21.9853 | 110317.68 | 103315.10 | 90503.63 | 44903.07 |
| 5-brg-baac-vilfund-combo | 1608 | -9828.215 | 671.6534 | 100724.27 | 110552.48 | 91099.16 | 50075.16 |
| 6-brg-informal-quasifor-combo | 108 | -38276.970 | -2690.1835 | 112656.14 | 150933.11 | 153511.51 | 49107.63 |
| 7-brg-has-informal-bridge | 10008 | 6715.753 | 611.3628 | 82867.63 | 76151.88 | 100066.38 | 40082.09 |
| 8-brg-has-informal-in-bridge | 8268 | 4002.023 | 157.0833 | 115319.89 | 111317.86 | 124212.79 | 45874.92 |
ffp_save_res_table(
f_tb_outcome_m8("latex"), "tb_outcome_m8", spt_res,
df = tb_outcome_m8, bl_save = ls_save_res[["tb_outcome_m8"]]
)Tables: Stata-aligned preview slice
Main revenue specification in invest_rev_diff_250524_v2.do: ivars == "agg_BS_1021", blnc_vars == "IS_1111", blnc_agg_stats == "mean".
df_stata_slice <- tstm_invest_stats_bridgechar %>%
dplyr::filter(
ivars == "agg_BS_1021",
blnc_vars == "IS_1111",
blnc_agg_stats == "mean"
)
tb_stata_m8 <- df_stata_slice %>%
dplyr::group_by(investloan_type_m8) %>%
dplyr::tally(name = "n") %>%
dplyr::arrange(investloan_type_m8)
tb_stata_summary <- df_stata_slice %>%
dplyr::summarize(
n = dplyr::n(),
win_diff_mean = mean(win_diff, na.rm = TRUE),
win_diff_sd = sd(win_diff, na.rm = TRUE),
capital_invest_mean = mean(capital_invest, na.rm = TRUE),
capital_invest_sd = sd(capital_invest, na.rm = TRUE)
)
ffv_render_table(
tb_stata_m8,
caption = "Stata-aligned slice: counts by investloan_type_m8 (agg_BS_1021, IS_1111, mean)",
name = "tb_stata_m8"
)| investloan_type_m8 | n |
|---|---|
| 1-investment-no-loan | 202 |
| 2-investment-loan-a | 125 |
| 3-investment-loan-hook | 50 |
| 4-brg-single-lender | 27 |
| 5-brg-baac-vilfund-combo | 27 |
| 6-brg-informal-quasifor-combo | 2 |
| 7-brg-has-informal-bridge | 164 |
| 8-brg-has-informal-in-bridge | 129 |
# Publication-style builder: pooled Stata-slice summaries.
f_tb_stata_summary <- function(st_format) {
bk <- kbl(
tb_stata_summary,
format = st_format,
linesep = "",
booktabs = TRUE,
align = "c",
caption = "Stata-aligned slice: pooled win_diff and capital_invest summaries",
col.names = c("N", "Mean", "SD", "Mean", "SD"),
digits = 4
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE, position = "left"
) %>%
add_header_above(c(
" " = 1,
"win_diff" = 2,
"capital_invest" = 2
)) %>%
column_spec(1, width = "6em")
return(bk)
}
f_tb_stata_summary(st_kableformat)| N | Mean | SD | Mean | SD |
|---|---|---|---|---|
| 726 | 614.4499 | 26325.72 | 82124.96 | 222630.2 |
ffp_save_res_table(
f_tb_stata_summary("latex"), "tb_stata_summary", spt_res,
df = tb_stata_summary, bl_save = ls_save_res[["tb_stata_summary"]]
)Tables: merge diagnostics (all_invest)
tb_m8_avail <- tstm_invest_stats_bridgechar_all_invest %>%
dplyr::ungroup() %>%
dplyr::count(investloan_type_m8_avail_2x2, name = "n") %>%
dplyr::mutate(share = round(n / sum(n), 4)) %>%
dplyr::arrange(dplyr::desc(n))
tb_meets_m8 <- tstm_invest_stats_bridgechar_all_invest %>%
dplyr::ungroup() %>%
dplyr::count(bl_meets_size, bl_meets_start, bl_has_investloan_type_m8, name = "n") %>%
dplyr::arrange(bl_meets_size, bl_meets_start, bl_has_investloan_type_m8)
ffv_render_table(
tb_m8_avail,
caption = "All-invest rows by investloan_type_m8_avail_2x2 (before analysis filter)",
name = "tb_m8_avail"
)| investloan_type_m8_avail_2x2 | n | share |
|---|---|---|
| has_investloan_type_m8 | 50784 | 0.6481 |
| absent_fail_size_only | 15312 | 0.1954 |
| absent_fail_start_only | 9372 | 0.1196 |
| absent_fail_size_and_start | 2856 | 0.0364 |
| absent_pass_cuts_still_unmatched | 36 | 0.0005 |
ffv_render_table(
tb_meets_m8,
caption = "All-invest rows by size/start cuts and bridge-char match",
name = "tb_meets_m8"
)| bl_meets_size | bl_meets_start | bl_has_investloan_type_m8 | n |
|---|---|---|---|
| FALSE | FALSE | FALSE | 2856 |
| FALSE | TRUE | FALSE | 15312 |
| TRUE | FALSE | FALSE | 9372 |
| TRUE | TRUE | FALSE | 36 |
| TRUE | TRUE | TRUE | 50784 |
Tables: mth_inv_start by bridge type
Supports MBF before/after subsample regressions (R-script/ffv_invest_return_bridge/README.md).
tb_mth_m8 <- tstm_invest_stats_bridgechar %>%
dplyr::filter(blnc_agg_stats == "mean", blnc_vars == "IS_1111", ivars == "agg_BS_1021") %>%
dplyr::group_by(investloan_type_m8) %>%
dplyr::summarize(
n = dplyr::n(),
mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
mth_inv_start_mean = round(mean(mth_inv_start, na.rm = TRUE), 1),
mth_inv_start_max = max(mth_inv_start, na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::arrange(investloan_type_m8)
# Publication-style builder: month-of-start summary by bridge type.
f_tb_mth_m8 <- function(st_format) {
bk <- kbl(
tb_mth_m8,
format = st_format,
linesep = "",
booktabs = TRUE,
align = "c",
caption = "Survey month of investment start by investloan_type_m8 (Stata revenue slice)",
col.names = c("Bridge type (investloan_type_m8)", "N", "Min", "Mean", "Max"),
digits = 4
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE, position = "left"
) %>%
add_header_above(c(
" " = 2,
"mth_inv_start (survey month)" = 3
)) %>%
column_spec(1, width = "16em")
return(bk)
}
f_tb_mth_m8(st_kableformat)| Bridge type (investloan_type_m8) | N | Min | Mean | Max |
|---|---|---|---|---|
| 1-investment-no-loan | 202 | 14 | 68.9 | 142 |
| 2-investment-loan-a | 125 | 14 | 71.3 | 140 |
| 3-investment-loan-hook | 50 | 16 | 73.4 | 139 |
| 4-brg-single-lender | 27 | 19 | 85.3 | 138 |
| 5-brg-baac-vilfund-combo | 27 | 20 | 97.9 | 141 |
| 6-brg-informal-quasifor-combo | 2 | 27 | 51.5 | 76 |
| 7-brg-has-informal-bridge | 164 | 14 | 63.3 | 142 |
| 8-brg-has-informal-in-bridge | 129 | 15 | 75.5 | 143 |
ffp_save_res_table(
f_tb_mth_m8("latex"), "tb_mth_m8", spt_res,
df = tb_mth_m8, bl_save = ls_save_res[["tb_mth_m8"]]
)Downstream analysis
This vignette builds the analysis dataset only; it does not run regressions.
-
Dependent variable:
win_difffromff_hfid_invest_winstats()— difference between forward- and backward-window balance means (win_forward - win_backward) overit_window_forward/it_window_backwardmonths (default 12 each). -
Key regressor:
capital_invest(investment size at the spell). -
Subgroups:
investloan_type_m8or collapsed dummiesinvestloan_type_m8_cat2–cat4; loops overblnc_vars(revenue vs profit) andivars(three paper asset definitions inar_st_vars_to_keep). -
Time heterogeneity: split samples by
mth_gap_mbf,mth_gap_mbf_cat, ormth_gap_mbf_cat_int(MBF start month 37; see step 6). -
Next step: replicate tables in
stata/invest_rev_diff_250524_v2.doin R usingtstm_invest_stats_bridgecharloaded fromdata/ordata-temp/depending onbl_replace_data_output.
To refresh canonical data/tstm_invest_stats_bridgechar.rda, set bl_replace_data_output <- TRUE in the setup chunk and re-knit.
Session info
sessionInfo()
#> R version 4.6.1 (2026-06-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] kableExtra_1.4.0 glue_1.8.1 tidyr_1.3.2 dplyr_1.2.1
#> [5] PrjThaiHFID_0.2.0
#>
#> loaded via a namespace (and not attached):
#> [1] utf8_1.2.6 generics_0.1.4 xml2_1.6.0 stringi_1.8.7
#> [5] hms_1.1.4 digest_0.6.39 magrittr_2.0.5 evaluate_1.0.5
#> [9] RColorBrewer_1.1-3 fastmap_1.2.0 rprojroot_2.1.1 jsonlite_2.0.0
#> [13] purrr_1.2.2 viridisLite_0.4.3 scales_1.4.0 textshaping_1.0.5
#> [17] cli_3.6.6 rlang_1.2.0 crayon_1.5.3 bit64_4.8.2
#> [21] withr_3.0.3 yaml_2.3.12 otel_0.2.0 tools_4.6.1
#> [25] parallel_4.6.1 tzdb_0.5.0 forcats_1.0.1 vctrs_0.7.3
#> [29] R6_2.6.1 lifecycle_1.0.5 stringr_1.6.0 bit_4.6.0
#> [33] vroom_1.7.1 pkgconfig_2.0.3 pillar_1.11.1 systemfonts_1.3.2
#> [37] haven_2.5.5 xfun_0.59 tibble_3.3.1 tidyselect_1.2.1
#> [41] rstudioapi_0.19.0 knitr_1.51 farver_2.1.2 htmltools_0.5.9
#> [45] rmarkdown_2.31 svglite_2.2.2 readr_2.2.0 compiler_4.6.1