---
title: "Investment return windows and bridge categories"
description: >
  Tier 4 (investment analyses; re-runs the gateway inline). Forward/backward
  window return differences by investment size and bridge type, producing
  `tstm_invest_stats_bridgechar` (issues #32/#2/#3).
vignette: >
  %\VignetteIndexEntry{Investment return windows and bridge categories}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
---

This vignette chains the investment–loan–bridge gateway ([PrjThaiHFID-#32](https://github.com/FanWangEcon/PrjThaiHFID/issues/32)), forward/backward investment windows ([#2](https://github.com/FanWangEcon/PrjThaiHFID/issues/2)), and window balance-sheet statistics ([#3](https://github.com/FanWangEcon/PrjThaiHFID/issues/3)), then merges `investloan_type_m8` labels and regression category dummies.

**Unit of observation (final):** household × investment spell × `ivars` × `blnc_vars` × `blnc_agg_stats`, with bridge linkage labels attached.

**Downstream use:** the saved object `tstm_invest_stats_bridgechar` supports regressions of income change (`win_diff`, forward minus backward window means) on investment size (`capital_invest`), optionally by loan-bridge type and survey-month subsamples. Specifications align with `stata/invest_rev_diff_250524_v2.do` (mean aggregation; revenue `IS_1111` / profit `IS_1011` loops).

Production logic is ported from `R-dev/ffs_hfid_window_windiff.R`.

## Outputs at a glance

This vignette saves one dataset and renders a series of diagnostic and paper-aligned tables.

| Object | Data page | Role |
|--------|-----------|------|
| `tstm_invest_stats_bridgechar` | [reference](../reference/tstm_invest_stats_bridgechar.html) | Household x investment-spell x balance-outcome x `ivars` window changes with bridge labels |

The output is documented in [`R/data-res.R`](https://github.com/FanWangEcon/PrjThaiHFID/blob/main/R/data-res.R). It is written to `data/` when `bl_replace_data_output <- TRUE`, otherwise to gitignored `data-temp/`.

**Tables produced (not saved as data):** pipeline funnel; `investloan_type_m8` by asset `ivars`; regression category dummies; balance dimensions grid; outcome summaries for regression; Stata-aligned preview slice; merge diagnostics; and `mth_inv_start` by bridge type. Each table is gated by its own `ls_save_res[["<name>"]]` switch (all `FALSE` by default); flip one on to write its LaTeX/CSV to `res/res_invest_return_bridge/` via `ffp_save_res_table()`.

::: {.callout-note collapse="true"}
## What ships vs. what is generated locally

- **Shipped with the installed package:** the packaged datasets in `data/` (lazy-loaded inputs `tstm_loans_panel`, `tstm_asset_loan`).
- **Generated locally when this vignette runs** (neither pushed to GitHub nor shipped in the package): the `tstm_invest_stats_bridgechar` `.rda` written to `data/` (or `data-temp/`) and its `inst/extdata/*.csv` / `*.dta` export; and any tables written to `res/res_invest_return_bridge/` (`.tex`/`.csv`, only when the matching `ls_save_res` control is `TRUE`). The `inst/extdata/` and `res/` artifacts are git-ignored, saved only for local convenience. The anonymized crosswalk `data-raw/tm_key_id_hh_anony.rda` and `data-raw/census_vil_hh_structure_count.rda` are local build inputs (not shipped).
:::

## Required packaged inputs

The gateway loads only two objects from the package `data/` folder (must exist locally when knitting):

| Object | Role |
|--------|------|
| `tstm_loans_panel` | Monthly loan panel (Group A) |
| `tstm_asset_loan` | Household-month asset and loan aggregates (Group B) |
| `data-raw/tm_key_id_hh_anony.rda` | Anonymized household → province/village crosswalk (step 5) |
| `data-raw/census_vil_hh_structure_count.rda` | Village `N_households` (step 6) |

`ff_hfid_invest_winstats()` also reads `tstm_asset_loan` directly (Stata `agg_*` subset). All other pipeline objects are computed in this run.

## Contrast with `ffv_invest_loan_bridge.qmd`

| | [`ffv_invest_loan_bridge.qmd`](ffv_invest_loan_bridge.qmd) | This vignette |
|--|--|--|
| **Saved objects** (`bl_replace_data_output <- TRUE`) | Eight gateway intermediates: `tstm_loans_pn_nd`, `tstm_loans_hooks`, `tstm_loans_bridges_type`, `tstm_invdates_uniq`, `tstm_invest`, `tstm_roster_invest_loan_linked`, `tstm_roster_invest2loan2bridge`, `tstm_invest2loan2bridge_chars` | **One** object: `tstm_invest_stats_bridgechar` |
| **Uses prior vignette `.rda`?** | N/A (produces gateway outputs) | **No** — re-calls `ffp_hfid_invest_loan_linked_abc_investloan_char_gateway()` so every gateway parameter used to build `tstm_invest_stats_bridgechar` appears in this file |
| **Primary outputs** | Bridge-type counts / paper Table-style tabulations | Window balance changes + bridge labels for return regressions |

**Design intent:** self-contained, parameter-transparent knit rather than `load(tstm_invest2loan2bridge_chars.rda)` from the loan-bridge vignette. By contrast, [`ffv_bridge_timing.qmd`](ffv_bridge_timing.qmd) *does* consume saved gateway outputs from `data/`.

## Pipeline structure

The seven-step chain from packaged inputs to the saved `tstm_invest_stats_bridgechar`; an ASCII version of the same flow is in `R-dev/ffs_hfid_window_windiff.R`.

```{mermaid}
flowchart TD
  subgraph inputs [Packaged data]
    loansPanel[tstm_loans_panel]
    assetLoan[tstm_asset_loan]
  end
  subgraph step1 [Section1_Gateway]
    gw[ffp_hfid_invest_loan_linked_abc_investloan_char_gateway]
    gw --> tstm_invest
    gw --> tstm_invdates_uniq
    gw --> tstm_invest2loan2bridge_chars
  end
  subgraph step2 [Section2_Window]
    win[ff_hfid_invest_window]
    tstm_invdates_uniq --> win
    win --> tstm_invdates_uniq_fwbw
  end
  subgraph step3 [Section3_Winstats]
    ws[ff_hfid_invest_winstats]
    assetLoan --> ws
    tstm_invest --> ws
    tstm_invdates_uniq_fwbw --> ws
    ws --> tstm_invest_stats
  end
  subgraph step4 [Section4_Merge_bridge]
    merge[left_join_bridge_chars_and_cat_dummies]
    tstm_invest_stats --> merge
    tstm_invest2loan2bridge_chars --> merge
    merge --> allInvest[tstm_invest_stats_bridgechar_all_invest]
  end
  subgraph step5 [Section5_Merge_geography]
    geo[merge_tm_key_id_hh_id_prov_village_bl_id_ne]
    allInvest --> geo
  end
  subgraph step6 [Section6_Merge_MBF_timing]
    mbf[merge_N_households_mth_gap_mbf]
    geo --> mbf
  end
  subgraph step7 [Section7_Filter_and_save]
    mbf --> filt[filter_m8_and_valid_windows]
    filt --> out[tstm_invest_stats_bridgechar]
    out --> saveRda[data_or_data_temp]
  end
  loansPanel --> gw
  assetLoan --> gw
```

### Functions relied on

| Step | Function | Source |
|------|----------|--------|
| 1 | `ffp_hfid_invest_loan_linked_abc_investloan_char_gateway()` | `R/ffp_investloan_type_wrapper.R` |
| 2 | `ff_hfid_invest_window()` | `R/ffs_window.R` |
| 3 | `ff_hfid_invest_winstats()` | `R/ffs_winstats.R` |
| 4–5 | dplyr merge, filter, mutate | this vignette |

## Control parameters

Gateway parameters match [`ffv_invest_loan_bridge.qmd`](ffv_invest_loan_bridge.qmd). Window length `12` forward and `12` backward matches Stata `fw12bw12` naming in `invest_rev_diff_*.do`.

```{r setup}
library(PrjThaiHFID)
library(dplyr)
library(tidyr)
library(glue)
library(kableExtra)

bs_style <- c("striped", "hover", "condensed", "responsive")
options(kable_styling_bootstrap_options = bs_style)

# Script control
verbose <- TRUE
verbose_detail <- FALSE
it_verbose_detail_nrow <- 100L
it_file_code <- 646572L

# TRUE: overwrite canonical data/*.rda; FALSE: write to data-temp/ (gitignored)
bl_replace_data_output <- FALSE
bl_write_latex <- FALSE
st_kableformat <- "html"

# Gateway — loan panel variable names (Group A)
svr_lender_var <- "forinfm4"
svr_principal <- "bf5klm_bm6h_joint"
svr_principal_last <- "bm6h"
svr_principal_interest_sum <- "bm6b"

# Gateway — bridge filters (Group A.3–A.4)
bl_filter_bridge_grvgr0 <- TRUE
it_ll_grv_min <- -1L
bl_filter_loan_duration_a <- FALSE
bl_filter_loan_duration_b <- FALSE
bl_filter_lender_type <- FALSE
bl_filter_bridge_informal <- FALSE
bl_filter_loan_size <- FALSE
bl_filter_loan_duration_more <- FALSE

# Gateway — investment typing and sample (Groups B–E)
fl_sd_ithres <- stats::qnorm(0.99)
it_thres_invest_mth_gap <- 2L
it_gap_LBL_IL_min <- -6L
# Investment ivars: paper set plus agg_BS_3022 (hh assets only) for stata/invest_rev_diff_250519.do tables 2 and 4
ar_st_vars_to_keep <- c(
  "agg_BS_1021", "agg_BS_1012", "agg_BS_1011", "agg_BS_3022"
)
ar_ivars_paper <- ar_st_vars_to_keep
fl_min_invest_size_cut <- 10000
it_mth_inv_start_min_cut <- 14L
it_mth_inv_start_max_cut <- 144L
bl_drop_afrombc <- TRUE
bl_drop_cfromb <- TRUE
bl_compare2baserda <- FALSE

# Window — forward/backward month spans (issue #2)
it_window_forward <- 12L
it_window_backward <- 12L
bl_forward_include_imth <- FALSE
bl_backward_include_imth <- FALSE

# Winstats — balance-sheet aggregation (issue #3)
bl_compute_sum <- TRUE
bl_compute_mean <- TRUE
ar_st_asset_loan_id_cols <- c("id", "month")
ar_st_blnc_vars_stata <- c(
  "IS_1111", "IS_1113", "IS_1115", "IS_1116",
  "IS_1011", "IS_1013"
)
ar_st_agg_cols_asset_loan <- paste0("agg_", ar_st_blnc_vars_stata)
ar_st_cols_asset_loan <- c(ar_st_asset_loan_id_cols, ar_st_agg_cols_asset_loan)

# Bridge merge — join keys and category dummies
ar_st_bridge_join_keys <- c("id", "ivars", "hh_inv_asset_ctr")
st_bridge_id_col <- "hhid_Num"
st_bridge_type_col <- "investloan_type_m8"
st_investloan_no_loan <- "1-investment-no-loan"
ar_st_investloan_loan_ab <- c("2-investment-loan-a", "3-investment-loan-hook")
ar_st_investloan_prefix_4cat <- "^(4-|5-|6-|8-)"
# here::here() anchors on the temporary _quarto.yml pkgdown writes under
# vignettes/ during a quarto render; find the package root explicitly so paths
# resolve under both Quarto and knitr.
spn_pkg_root <- rprojroot::find_root(rprojroot::has_file("DESCRIPTION"))
st_path_tm_key_hh <- file.path(spn_pkg_root, "data-raw", "tm_key_id_hh_anony.rda")
st_path_census_vil <- file.path(spn_pkg_root, "data-raw", "census_vil_hh_structure_count.rda")
# MBF funding start month (same as R-script/ffs_bridge_count/ffs_bridge_count_mbf.R)
it_mbf_start_month <- 37L

st_data_out <- file.path(
  spn_pkg_root,
  if (bl_replace_data_output) "data" else "data-temp",
  fsep = .Platform$file.sep
)
st_extdata_dir <- file.path(spn_pkg_root, "inst", "extdata")
for (st_dir in c(st_data_out, st_extdata_dir)) {
  if (!dir.exists(st_dir)) {
    dir.create(st_dir, recursive = TRUE)
  }
}

# Per-vignette results folder and per-table save switches (all FALSE by default).
spt_res <- file.path(spn_pkg_root, "res", "res_invest_return_bridge")
ls_save_res <- list(
  tb_win_bad_share = FALSE,
  df_fw_bad_mth = FALSE,
  df_bw_bad_mth = FALSE,
  tb_funnel = FALSE,
  tb_m8_counts = FALSE,
  tb_m8_shares = FALSE,
  tb_cat2 = FALSE,
  tb_cat3 = FALSE,
  tb_cat4 = FALSE,
  tb_blnc_grid = FALSE,
  tb_outcome_m8 = FALSE,
  tb_stata_m8 = FALSE,
  tb_stata_summary = FALSE,
  tb_m8_avail = FALSE,
  tb_meets_m8 = FALSE,
  tb_mth_m8 = FALSE
)

ffv_save_gateway_tstm <- function(obj, name, out_dir, extdata_dir = st_extdata_dir) {
  if (!dir.exists(out_dir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  if (!dir.exists(extdata_dir)) {
    dir.create(extdata_dir, recursive = TRUE)
  }

  st_rda <- file.path(out_dir, paste0(name, ".rda"))
  st_csv <- file.path(extdata_dir, paste0(name, ".csv"))
  st_dta <- file.path(extdata_dir, paste0(name, ".dta"))

  tmp_env <- new.env(parent = emptyenv())
  tmp_env[[name]] <- obj
  save(list = name, file = st_rda, envir = tmp_env)

  readr::write_csv(obj, st_csv)

  if (!requireNamespace("haven", quietly = TRUE)) {
    stop(
      "Package 'haven' is required to write .dta to inst/extdata. ",
      "Install with: install.packages('haven')",
      call. = FALSE
    )
  }
  haven::write_dta(obj, st_dta)

  invisible(list(rda = st_rda, csv = st_csv, dta = st_dta))
}

ffv_inv_type_by_ivars <- function(df, type_var) {
  df %>%
    dplyr::group_by(ivars, .data[[type_var]]) %>%
    dplyr::tally(name = "n") %>%
    tidyr::pivot_wider(names_from = ivars, values_from = n, values_fill = 0)
}

ffv_inv_type_share_by_ivars <- function(df, type_var) {
  df %>%
    dplyr::group_by(ivars, .data[[type_var]]) %>%
    dplyr::tally(name = "n") %>%
    dplyr::group_by(ivars) %>%
    dplyr::mutate(share = n / sum(n)) %>%
    dplyr::select(-n) %>%
    tidyr::pivot_wider(
      id_cols = dplyr::all_of(type_var),
      names_from = ivars,
      values_from = share
    )
}

ffv_render_table <- function(
    df,
    caption,
    name = NULL,
    st_format = st_kableformat) {
  # Optional latex (+ csv) save, gated by the per-table ls_save_res switch.
  if (!is.null(name) && isTRUE(ls_save_res[[name]])) {
    kbl_latex <- kbl(
      df,
      format = "latex",
      booktabs = TRUE,
      caption = caption,
      digits = 4
    )
    ffp_save_res_table(kbl_latex, name, spt_res, df = df, bl_save = TRUE)
  }
  kbl(
    df,
    format = st_format,
    booktabs = TRUE,
    caption = caption,
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = bs_style,
      full_width = FALSE,
      position = "left"
    )
}

bl_win_val_bad <- function(x) {
  is.na(x) | !is.finite(x)
}

if (verbose) {
  print(glue::glue("f-{it_file_code}, controls"))
  print(glue::glue("  bl_replace_data_output: {bl_replace_data_output} -> {st_data_out}/"))
  print(glue::glue("  windows: fw{it_window_forward} bw{it_window_backward}"))
  print(glue::glue("  fl_min_invest_size_cut: {fl_min_invest_size_cut}"))
  print(glue::glue("  mth_inv_start in [{it_mth_inv_start_min_cut}, {it_mth_inv_start_max_cut}]"))
  print(glue::glue("  ar_st_vars_to_keep: {paste(ar_st_vars_to_keep, collapse = ', ')}"))
  print(glue::glue("  ar_st_blnc_vars_stata: {paste(ar_st_blnc_vars_stata, collapse = ', ')}"))
}
```

## Run gateway (step 1)

Re-calls the gateway inline (same parameters as `ffv_invest_loan_bridge.qmd`). Does **not** load saved gateway `.rda` files from `data/`.

```{r gateway}
if (verbose) {
  print(glue::glue("f-{it_file_code}, gateway: calling ffp_hfid_invest_loan_linked_abc_investloan_char_gateway"))
}

ls_gateway_result <- PrjThaiHFID::ffp_hfid_invest_loan_linked_abc_investloan_char_gateway(
  svr_lender_var = svr_lender_var,
  svr_principal = svr_principal,
  svr_principal_last = svr_principal_last,
  svr_principal_interest_sum = svr_principal_interest_sum,
  bl_filter_bridge_grvgr0 = bl_filter_bridge_grvgr0,
  it_ll_grv_min = it_ll_grv_min,
  bl_filter_loan_duration_a = bl_filter_loan_duration_a,
  bl_filter_loan_duration_b = bl_filter_loan_duration_b,
  bl_filter_lender_type = bl_filter_lender_type,
  bl_filter_bridge_informal = bl_filter_bridge_informal,
  bl_filter_loan_size = bl_filter_loan_size,
  bl_filter_loan_duration_more = bl_filter_loan_duration_more,
  fl_sd_ithres = fl_sd_ithres,
  it_thres_invest_mth_gap = it_thres_invest_mth_gap,
  it_gap_LBL_IL_min = it_gap_LBL_IL_min,
  ar_st_vars_to_keep = ar_st_vars_to_keep,
  fl_min_invest_size = fl_min_invest_size_cut,
  it_mth_inv_start_min = it_mth_inv_start_min_cut,
  it_mth_inv_start_max = it_mth_inv_start_max_cut,
  bl_drop_afrombc = bl_drop_afrombc,
  bl_drop_cfromb = bl_drop_cfromb,
  bl_compare2baserda = bl_compare2baserda,
  verbose = verbose,
  verbose_detail = verbose_detail,
  it_verbose_detail_nrow = it_verbose_detail_nrow
)

tstm_invest <- ls_gateway_result$tstm_invest
tstm_invdates_uniq <- ls_gateway_result$tstm_invdates_uniq
tstm_invest2loan2bridge_chars <- ls_gateway_result$tstm_invest2loan2bridge_chars

if (verbose) {
  print(glue::glue("f-{it_file_code}, gateway: tstm_invest {nrow(tstm_invest)} rows"))
  print(glue::glue("f-{it_file_code}, gateway: tstm_invdates_uniq {nrow(tstm_invdates_uniq)} rows"))
  print(glue::glue(
    "f-{it_file_code}, gateway: tstm_invest2loan2bridge_chars {nrow(tstm_invest2loan2bridge_chars)} rows"
  ))
  if (!isTRUE(ls_gateway_result$bl_invest_pipeline_consistency_ok)) {
    print(glue::glue("f-{it_file_code}, gateway: warning — invest pipeline consistency check failed"))
  }
}
```

## Investment windows (step 2)

```{r window}
if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, window: ff_hfid_invest_window fw{it_window_forward} bw{it_window_backward}"
  ))
}

tstm_invdates_uniq_fwbw <- PrjThaiHFID::ff_hfid_invest_window(
  df_invdates_uniq = tstm_invdates_uniq,
  it_window_forward = it_window_forward,
  it_window_backward = it_window_backward,
  bl_forward_include_imth = bl_forward_include_imth,
  bl_backward_include_imth = bl_backward_include_imth
)

if (verbose) {
  print(glue::glue("f-{it_file_code}, window: tstm_invdates_uniq_fwbw {nrow(tstm_invdates_uniq_fwbw)} rows"))
}
```

## Window balance-sheet statistics (step 3)

Subset `tstm_asset_loan` to Stata-used `agg_*` columns only (not the full wide asset file).

```{r winstats}
ar_st_agg_missing <- setdiff(ar_st_agg_cols_asset_loan, names(PrjThaiHFID::tstm_asset_loan))
if (length(ar_st_agg_missing) > 0L) {
  stop(
    "tstm_asset_loan missing Stata agg columns: ",
    paste(ar_st_agg_missing, collapse = ", "),
    call. = FALSE
  )
}

df_asset_loan_stata <- PrjThaiHFID::tstm_asset_loan %>%
  select(all_of(ar_st_cols_asset_loan))

if (verbose) {
  print(glue::glue("f-{it_file_code}, winstats: ff_hfid_invest_winstats"))
  print(glue::glue(
    "f-{it_file_code}, winstats: df_asset_loan_stata {ncol(df_asset_loan_stata)} cols"
  ))
}

tstm_invest_stats <- PrjThaiHFID::ff_hfid_invest_winstats(
  df_asset_loan = df_asset_loan_stata,
  df_invest = tstm_invest,
  df_invwin_uniq = tstm_invdates_uniq_fwbw,
  bl_compute_sum = bl_compute_sum,
  bl_compute_mean = bl_compute_mean
)

if (verbose) {
  print(glue::glue("f-{it_file_code}, winstats: tstm_invest_stats {nrow(tstm_invest_stats)} rows"))
}
```

### Window validity diagnostics (step 3b)

`win_forward` / `win_backward` can be NA or non-finite when the window lacks enough observed balance-sheet months (calendar edge or household not in every survey month).

```{r winstats-na}
tb_win_bad_share <- tstm_invest_stats %>%
  dplyr::ungroup() %>%
  dplyr::summarize(
    n = dplyr::n(),
    n_fw_bad = sum(bl_win_val_bad(win_forward)),
    n_bw_bad = sum(bl_win_val_bad(win_backward)),
    share_fw_bad = round(n_fw_bad / n, 4),
    share_bw_bad = round(n_bw_bad / n, 4)
  )

df_fw_bad_mth <- tstm_invest_stats %>%
  dplyr::ungroup() %>%
  dplyr::filter(bl_win_val_bad(win_forward)) %>%
  dplyr::summarize(
    n = dplyr::n(),
    mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
    mth_inv_start_mean = mean(mth_inv_start, na.rm = TRUE),
    mth_inv_start_max = max(mth_inv_start, na.rm = TRUE)
  )

df_bw_bad_mth <- tstm_invest_stats %>%
  dplyr::ungroup() %>%
  dplyr::filter(bl_win_val_bad(win_backward)) %>%
  dplyr::summarize(
    n = dplyr::n(),
    mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
    mth_inv_start_mean = mean(mth_inv_start, na.rm = TRUE),
    mth_inv_start_max = max(mth_inv_start, na.rm = TRUE)
  )

ffv_render_table(
  tb_win_bad_share,
  caption = "Share of invest-stats rows with NA or non-finite window stats (before bridge merge)",
  name = "tb_win_bad_share"
)
ffv_render_table(
  df_fw_bad_mth,
  caption = "mth_inv_start range when win_forward is NA or invalid",
  name = "df_fw_bad_mth"
)
ffv_render_table(
  df_bw_bad_mth,
  caption = "mth_inv_start range when win_backward is NA or invalid",
  name = "df_bw_bad_mth"
)
```

## Merge bridge characteristics (step 4)

```{r merge-categories}
tstm_invest2loan2bridge_chars_sel <- tstm_invest2loan2bridge_chars %>%
  dplyr::select(all_of(c(st_bridge_id_col, "ivars", "hh_inv_asset_ctr", st_bridge_type_col))) %>%
  dplyr::rename(id = dplyr::all_of(st_bridge_id_col))

tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats %>%
  dplyr::left_join(
    tstm_invest2loan2bridge_chars_sel,
    by = ar_st_bridge_join_keys
  ) %>%
  dplyr::mutate(
    bl_meets_size = capital_invest >= fl_min_invest_size_cut,
    bl_meets_start = mth_inv_start >= it_mth_inv_start_min_cut &
      mth_inv_start <= it_mth_inv_start_max_cut,
    bl_has_investloan_type_m8 = !is.na(.data[[st_bridge_type_col]]),
    investloan_type_m8_avail_2x2 = dplyr::case_when(
      bl_has_investloan_type_m8 ~ "has_investloan_type_m8",
      !bl_meets_size & !bl_meets_start ~ "absent_fail_size_and_start",
      !bl_meets_size & bl_meets_start ~ "absent_fail_size_only",
      bl_meets_size & !bl_meets_start ~ "absent_fail_start_only",
      bl_meets_size & bl_meets_start ~ "absent_pass_cuts_still_unmatched",
      TRUE ~ "absent_other"
    ),
    investloan_type_m8_cat2 = ifelse(.data[[st_bridge_type_col]] == st_investloan_no_loan, 0L, 1L),
    investloan_type_m8_cat2v2 = dplyr::case_when(
      .data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
      .data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 0L,
      TRUE ~ 1L
    ),
    investloan_type_m8_cat3 = dplyr::case_when(
      .data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
      .data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 1L,
      TRUE ~ 2L
    ),
    investloan_type_m8_cat4 = dplyr::case_when(
      .data[[st_bridge_type_col]] == st_investloan_no_loan ~ 0L,
      .data[[st_bridge_type_col]] %in% ar_st_investloan_loan_ab ~ 1L,
      grepl(ar_st_investloan_prefix_4cat, .data[[st_bridge_type_col]]) ~ 2L,
      TRUE ~ 3L
    )
  )

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, merge: tstm_invest_stats_bridgechar_all_invest ",
    "{nrow(tstm_invest_stats_bridgechar_all_invest)} rows x ",
    "{ncol(tstm_invest_stats_bridgechar_all_invest)} cols"
  ))
}
```

## Merge household geography (step 5)

Attach province, village, and northeast indicators from `data-raw/tm_key_id_hh_anony.rda` (anonymized-only crosswalk) via anonymized household ID (`id` = `tmid_hh`). `id_prov` (`tmid_prov`), `id_village` (`tmid_vil`), `bl_id_ne` (1 if `region_name == "Northeast"`, else 0).

```{r merge-hh-geo}
if (!file.exists(st_path_tm_key_hh)) {
  stop("Missing required input: ", st_path_tm_key_hh, call. = FALSE)
}
load(st_path_tm_key_hh)

df_hh_geo <- tm_key_id_hh_anony %>%
  dplyr::distinct(tmid_hh, tmid_prov, tmid_vil, region_name) %>%
  dplyr::mutate(
    id_prov = as.integer(tmid_prov),
    id_village = as.integer(tmid_vil),
    bl_id_ne = as.integer(region_name == "Northeast")
  ) %>%
  dplyr::select(tmid_hh, id_prov, id_village, bl_id_ne)

n_before_geo <- nrow(tstm_invest_stats_bridgechar_all_invest)

tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::left_join(df_hh_geo, by = c("id" = "tmid_hh"))

n_miss_geo <- sum(
  is.na(tstm_invest_stats_bridgechar_all_invest$id_prov) |
    is.na(tstm_invest_stats_bridgechar_all_invest$id_village)
)

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, merge-hh-geo: {n_before_geo} rows; ",
    "NA id_prov/id_village={n_miss_geo}; ",
    "bl_id_ne share={round(100 * mean(tstm_invest_stats_bridgechar_all_invest$bl_id_ne, na.rm = TRUE), 2)}%"
  ))
}

if (n_miss_geo > 0L) {
  warning(
    "Some rows lack geography after tm_key_id_hh join (n = ", n_miss_geo, ")",
    call. = FALSE
  )
}
```

## Merge MBF village counts and timing (step 6)

Follows [`R-script/ffs_bridge_count/ffs_bridge_count_mbf.R`](R-script/ffs_bridge_count/ffs_bridge_count_mbf.R) §6: village household counts from `census_vil_hh_structure_count.rda`, joined on `id_village` (`tmid_vil`), plus MBF-relative month bins from `mth_inv_start`.

```{r merge-mbf-vil}
if (!file.exists(st_path_census_vil)) {
  stop("Missing required input: ", st_path_census_vil, call. = FALSE)
}
load(st_path_census_vil)

df_vil_hh <- census_vil_hh_structure_count %>%
  dplyr::select(tmid_vil, N_households)

tstm_invest_stats_bridgechar_all_invest <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::left_join(df_vil_hh, by = c("id_village" = "tmid_vil")) %>%
  dplyr::mutate(
    mth_gap_mbf = mth_inv_start - it_mbf_start_month,
    mth_gap_mbf_cat = dplyr::case_when(
      mth_gap_mbf >= -24 & mth_gap_mbf <= -13 ~ "-2yr",
      mth_gap_mbf >= -12 & mth_gap_mbf <=  -1 ~ "-1yr",
      mth_gap_mbf >=   0 & mth_gap_mbf <=  11 ~ "yr1",
      mth_gap_mbf >=  12 & mth_gap_mbf <=  23 ~ "yr2",
      mth_gap_mbf >=  24 & mth_gap_mbf <=  35 ~ "yr3",
      mth_gap_mbf >=  36 & mth_gap_mbf <=  47 ~ "yr4",
      mth_gap_mbf >=  48 & mth_gap_mbf <=  59 ~ "yr5",
      mth_gap_mbf >=  60 & mth_gap_mbf <=  71 ~ "yr6",
      mth_gap_mbf >=  72 & mth_gap_mbf <=  83 ~ "yr7",
      mth_gap_mbf >=  84 & mth_gap_mbf <=  95 ~ "yr8",
      mth_gap_mbf >=  96 & mth_gap_mbf <= 107 ~ "yr9",
      TRUE ~ NA_character_
    ),
    mth_gap_mbf_cat_int = dplyr::case_when(
      mth_gap_mbf_cat == "-2yr" ~ -2L,
      mth_gap_mbf_cat == "-1yr" ~ -1L,
      mth_gap_mbf_cat == "yr1" ~ 1L,
      mth_gap_mbf_cat == "yr2" ~ 2L,
      mth_gap_mbf_cat == "yr3" ~ 3L,
      mth_gap_mbf_cat == "yr4" ~ 4L,
      mth_gap_mbf_cat == "yr5" ~ 5L,
      mth_gap_mbf_cat == "yr6" ~ 6L,
      mth_gap_mbf_cat == "yr7" ~ 7L,
      mth_gap_mbf_cat == "yr8" ~ 8L,
      mth_gap_mbf_cat == "yr9" ~ 9L,
      TRUE ~ NA_integer_
    )
  )

n_miss_n_hh <- sum(is.na(tstm_invest_stats_bridgechar_all_invest$N_households))
n_miss_mbf_cat <- sum(is.na(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf_cat))

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, merge-mbf-vil: rows={nrow(tstm_invest_stats_bridgechar_all_invest)}; ",
    "NA N_households={n_miss_n_hh}; NA mth_gap_mbf_cat={n_miss_mbf_cat}"
  ))
  print(glue::glue(
    "  mth_gap_mbf range: [{min(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf, na.rm = TRUE)}, ",
    "{max(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf, na.rm = TRUE)}]"
  ))
  print(table(tstm_invest_stats_bridgechar_all_invest$mth_gap_mbf_cat_int, useNA = "ifany"))
}

if (n_miss_n_hh > 0L) {
  warning(
    "Some rows lack N_households after census join (n = ", n_miss_n_hh, ")",
    call. = FALSE
  )
}
```

## Analysis subset (step 7)

Primary return: `tstm_invest_stats_bridgechar` — `investloan_type_m8` present and finite `win_forward` / `win_backward`.

```{r filter-analysis}
tstm_invest_stats_bridgechar <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::ungroup() %>%
  dplyr::filter(
    !is.na(.data[[st_bridge_type_col]]),
    !bl_win_val_bad(win_forward),
    !bl_win_val_bad(win_backward)
  ) %>%
  dplyr::select(-investloan_type_m8_avail_2x2)

if (verbose) {
  print(glue::glue(
    "f-{it_file_code}, filter: tstm_invest_stats_bridgechar ",
    "{nrow(tstm_invest_stats_bridgechar)} rows ",
    "({round(100 * nrow(tstm_invest_stats_bridgechar) / nrow(tstm_invest_stats_bridgechar_all_invest), 1)}% of all_invest)"
  ))
}
```

## Save `tstm_invest_stats_bridgechar`

```{r save-output}
ls_saved <- ffv_save_gateway_tstm(
  obj = tstm_invest_stats_bridgechar,
  name = "tstm_invest_stats_bridgechar",
  out_dir = st_data_out
)

if (verbose) {
  if (bl_replace_data_output) {
    print(glue::glue("f-{it_file_code}, save: replaced canonical data/tstm_invest_stats_bridgechar.rda"))
  } else {
    print(glue::glue(
      "f-{it_file_code}, save: wrote data-temp/tstm_invest_stats_bridgechar.rda (data/ unchanged)"
    ))
  }
  print(glue::glue("f-{it_file_code}, save: {ls_saved$csv}"))
  print(glue::glue("f-{it_file_code}, save: {ls_saved$dta}"))
}
```

## Tables: pipeline funnel

```{r table-funnel}
tb_funnel <- data.frame(
  stage = c(
    "tstm_invest_stats",
    "tstm_invest_stats_bridgechar_all_invest",
    "tstm_invest_stats_bridgechar"
  ),
  n_rows = c(
    nrow(tstm_invest_stats),
    nrow(tstm_invest_stats_bridgechar_all_invest),
    nrow(tstm_invest_stats_bridgechar)
  ),
  n_cols = c(
    ncol(tstm_invest_stats),
    ncol(tstm_invest_stats_bridgechar_all_invest),
    ncol(tstm_invest_stats_bridgechar)
  )
) %>%
  dplyr::mutate(
    pct_of_stats = round(100 * n_rows / n_rows[stage == "tstm_invest_stats"], 1),
    pct_of_all_invest = round(
      100 * n_rows / n_rows[stage == "tstm_invest_stats_bridgechar_all_invest"],
      1
    )
  )

ffv_render_table(
  tb_funnel,
  caption = "Row counts through the window + bridge pipeline",
  name = "tb_funnel"
)
```

## Tables: `investloan_type_m8` by asset (`ivars`)

```{r table-m8-ivars}
tb_m8_counts <- ffv_inv_type_by_ivars(
  tstm_invest_stats_bridgechar, "investloan_type_m8"
)
tb_m8_shares <- ffv_inv_type_share_by_ivars(
  tstm_invest_stats_bridgechar, "investloan_type_m8"
)

# Publication-style builder: bridge type x asset-definition (ivars) counts.
f_tb_m8_counts <- function(st_format) {
  ar_st_ivars <- names(tb_m8_counts)[-1]
  bk <- kbl(
    tb_m8_counts,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Analysis-sample counts by investloan_type_m8 and ivars",
    col.names = c("Bridge type (investloan_type_m8)", ar_st_ivars)
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(" " = 1, "Asset definition (ivars) — counts" = length(ar_st_ivars))) %>%
    column_spec(1, width = "16em") %>%
    column_spec(2:(length(ar_st_ivars) + 1), width = "6em")
  return(bk)
}

# Publication-style builder: bridge type x asset-definition (ivars) shares.
f_tb_m8_shares <- function(st_format) {
  ar_st_ivars <- names(tb_m8_shares)[-1]
  bk <- kbl(
    tb_m8_shares,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Analysis-sample shares by investloan_type_m8 and ivars",
    col.names = c("Bridge type (investloan_type_m8)", ar_st_ivars),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(" " = 1, "Asset definition (ivars) — within-ivars shares" = length(ar_st_ivars))) %>%
    column_spec(1, width = "16em") %>%
    column_spec(2:(length(ar_st_ivars) + 1), width = "6em")
  return(bk)
}

f_tb_m8_counts(st_kableformat)
ffp_save_res_table(
  f_tb_m8_counts("latex"), "tb_m8_counts", spt_res,
  df = tb_m8_counts, bl_save = ls_save_res[["tb_m8_counts"]]
)
f_tb_m8_shares(st_kableformat)
ffp_save_res_table(
  f_tb_m8_shares("latex"), "tb_m8_shares", spt_res,
  df = tb_m8_shares, bl_save = ls_save_res[["tb_m8_shares"]]
)
```

## Tables: regression category dummies

```{r table-cat-dummies}
tb_cat2 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8_cat2, investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8_cat2, investloan_type_m8)

tb_cat3 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8_cat3, investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8_cat3, investloan_type_m8)

tb_cat4 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8_cat4, investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8_cat4, investloan_type_m8)

ffv_render_table(tb_cat2, caption = "Counts by investloan_type_m8_cat2 and investloan_type_m8", name = "tb_cat2")
ffv_render_table(tb_cat3, caption = "Counts by investloan_type_m8_cat3 and investloan_type_m8", name = "tb_cat3")
ffv_render_table(tb_cat4, caption = "Counts by investloan_type_m8_cat4 and investloan_type_m8", name = "tb_cat4")
```

## Tables: balance dimensions

```{r table-blnc-grid}
tb_blnc_grid <- tstm_invest_stats_bridgechar %>%
  dplyr::count(blnc_vars, blnc_agg_stats, ivars, name = "n") %>%
  dplyr::arrange(blnc_vars, blnc_agg_stats, ivars)

ffv_render_table(
  tb_blnc_grid,
  caption = "Analysis-sample row counts by blnc_vars, blnc_agg_stats, and ivars",
  name = "tb_blnc_grid"
)
```

## Tables: outcome summaries for regression

```{r table-outcome-by-m8}
tb_outcome_m8 <- tstm_invest_stats_bridgechar %>%
  dplyr::group_by(investloan_type_m8) %>%
  dplyr::summarize(
    n = dplyr::n(),
    win_diff_mean = mean(win_diff, na.rm = TRUE),
    win_diff_median = median(win_diff, na.rm = TRUE),
    win_forward_mean = mean(win_forward, na.rm = TRUE),
    win_backward_mean = mean(win_backward, na.rm = TRUE),
    capital_invest_mean = mean(capital_invest, na.rm = TRUE),
    capital_invest_median = median(capital_invest, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::arrange(investloan_type_m8)

# Publication-style builder: outcome summaries by bridge type.
f_tb_outcome_m8 <- function(st_format) {
  bk <- kbl(
    tb_outcome_m8,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Mean/median win_diff and capital_invest by investloan_type_m8 (all blnc rows)",
    col.names = c(
      "Bridge type (investloan_type_m8)", "N",
      "Mean", "Median", "Fwd mean", "Bwd mean",
      "Mean", "Median"
    ),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(
      " " = 2,
      "win_diff" = 2,
      "Window means" = 2,
      "capital_invest" = 2
    )) %>%
    column_spec(1, width = "16em")
  return(bk)
}

f_tb_outcome_m8(st_kableformat)
ffp_save_res_table(
  f_tb_outcome_m8("latex"), "tb_outcome_m8", spt_res,
  df = tb_outcome_m8, bl_save = ls_save_res[["tb_outcome_m8"]]
)
```

## Tables: Stata-aligned preview slice

Main revenue specification in `invest_rev_diff_250524_v2.do`: `ivars == "agg_BS_1021"`, `blnc_vars == "IS_1111"`, `blnc_agg_stats == "mean"`.

```{r table-stata-slice}
df_stata_slice <- tstm_invest_stats_bridgechar %>%
  dplyr::filter(
    ivars == "agg_BS_1021",
    blnc_vars == "IS_1111",
    blnc_agg_stats == "mean"
  )

tb_stata_m8 <- df_stata_slice %>%
  dplyr::group_by(investloan_type_m8) %>%
  dplyr::tally(name = "n") %>%
  dplyr::arrange(investloan_type_m8)

tb_stata_summary <- df_stata_slice %>%
  dplyr::summarize(
    n = dplyr::n(),
    win_diff_mean = mean(win_diff, na.rm = TRUE),
    win_diff_sd = sd(win_diff, na.rm = TRUE),
    capital_invest_mean = mean(capital_invest, na.rm = TRUE),
    capital_invest_sd = sd(capital_invest, na.rm = TRUE)
  )

ffv_render_table(
  tb_stata_m8,
  caption = "Stata-aligned slice: counts by investloan_type_m8 (agg_BS_1021, IS_1111, mean)",
  name = "tb_stata_m8"
)

# Publication-style builder: pooled Stata-slice summaries.
f_tb_stata_summary <- function(st_format) {
  bk <- kbl(
    tb_stata_summary,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Stata-aligned slice: pooled win_diff and capital_invest summaries",
    col.names = c("N", "Mean", "SD", "Mean", "SD"),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(
      " " = 1,
      "win_diff" = 2,
      "capital_invest" = 2
    )) %>%
    column_spec(1, width = "6em")
  return(bk)
}

f_tb_stata_summary(st_kableformat)
ffp_save_res_table(
  f_tb_stata_summary("latex"), "tb_stata_summary", spt_res,
  df = tb_stata_summary, bl_save = ls_save_res[["tb_stata_summary"]]
)
```

## Tables: merge diagnostics (`all_invest`)

```{r table-merge-diag}
tb_m8_avail <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::ungroup() %>%
  dplyr::count(investloan_type_m8_avail_2x2, name = "n") %>%
  dplyr::mutate(share = round(n / sum(n), 4)) %>%
  dplyr::arrange(dplyr::desc(n))

tb_meets_m8 <- tstm_invest_stats_bridgechar_all_invest %>%
  dplyr::ungroup() %>%
  dplyr::count(bl_meets_size, bl_meets_start, bl_has_investloan_type_m8, name = "n") %>%
  dplyr::arrange(bl_meets_size, bl_meets_start, bl_has_investloan_type_m8)

ffv_render_table(
  tb_m8_avail,
  caption = "All-invest rows by investloan_type_m8_avail_2x2 (before analysis filter)",
  name = "tb_m8_avail"
)
ffv_render_table(
  tb_meets_m8,
  caption = "All-invest rows by size/start cuts and bridge-char match",
  name = "tb_meets_m8"
)
```

## Tables: `mth_inv_start` by bridge type

Supports MBF before/after subsample regressions (`R-script/ffv_invest_return_bridge/README.md`).

```{r table-mth-start}
tb_mth_m8 <- tstm_invest_stats_bridgechar %>%
  dplyr::filter(blnc_agg_stats == "mean", blnc_vars == "IS_1111", ivars == "agg_BS_1021") %>%
  dplyr::group_by(investloan_type_m8) %>%
  dplyr::summarize(
    n = dplyr::n(),
    mth_inv_start_min = min(mth_inv_start, na.rm = TRUE),
    mth_inv_start_mean = round(mean(mth_inv_start, na.rm = TRUE), 1),
    mth_inv_start_max = max(mth_inv_start, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  dplyr::arrange(investloan_type_m8)

# Publication-style builder: month-of-start summary by bridge type.
f_tb_mth_m8 <- function(st_format) {
  bk <- kbl(
    tb_mth_m8,
    format = st_format,
    linesep = "",
    booktabs = TRUE,
    align = "c",
    caption = "Survey month of investment start by investloan_type_m8 (Stata revenue slice)",
    col.names = c("Bridge type (investloan_type_m8)", "N", "Min", "Mean", "Max"),
    digits = 4
  ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE, position = "left"
    ) %>%
    add_header_above(c(
      " " = 2,
      "mth_inv_start (survey month)" = 3
    )) %>%
    column_spec(1, width = "16em")
  return(bk)
}

f_tb_mth_m8(st_kableformat)
ffp_save_res_table(
  f_tb_mth_m8("latex"), "tb_mth_m8", spt_res,
  df = tb_mth_m8, bl_save = ls_save_res[["tb_mth_m8"]]
)
```

## Downstream analysis

This vignette builds the analysis dataset only; it does **not** run regressions.

- **Dependent variable:** `win_diff` from `ff_hfid_invest_winstats()` — difference between forward- and backward-window balance means (`win_forward - win_backward`) over `it_window_forward` / `it_window_backward` months (default 12 each).
- **Key regressor:** `capital_invest` (investment size at the spell).
- **Subgroups:** `investloan_type_m8` or collapsed dummies `investloan_type_m8_cat2`–`cat4`; loops over `blnc_vars` (revenue vs profit) and `ivars` (three paper asset definitions in `ar_st_vars_to_keep`).
- **Time heterogeneity:** split samples by `mth_gap_mbf`, `mth_gap_mbf_cat`, or `mth_gap_mbf_cat_int` (MBF start month 37; see step 6).
- **Next step:** replicate tables in `stata/invest_rev_diff_250524_v2.do` in R using `tstm_invest_stats_bridgechar` loaded from `data/` or `data-temp/` depending on `bl_replace_data_output`.

To refresh canonical `data/tstm_invest_stats_bridgechar.rda`, set `bl_replace_data_output <- TRUE` in the setup chunk and re-knit.

## Session info

```{r session-info}
sessionInfo()
```
