
Encapsulates the full investment-loan-bridge workflow that generates linked investment, loan, hook, and bridge classifications. This function orchestrates four main stages: (1) loan deduplication, hook and bridge identification; (2) investment identification; (3) investment-to-loan and investment-to-bridge linking; (4) investment-loan-bridge type classification. Returns all intermediate and final dataframes in a named list.

Usage

ffp_hfid_invest_loan_linked_abc_investloan_char_gateway(
  svr_lender_var = "forinfm4",
  svr_principal = "bf5klm_bm6h_joint",
  svr_principal_last = "bm6h",
  svr_principal_interest_sum = "bm6b",
  bl_filter_bridge_grvgr0 = TRUE,
  it_ll_grv_min = -1,
  bl_filter_loan_duration_a = FALSE,
  bl_filter_loan_duration_b = FALSE,
  bl_filter_lender_type = FALSE,
  bl_filter_bridge_informal = FALSE,
  bl_filter_loan_size = FALSE,
  bl_filter_loan_duration_more = FALSE,
  fl_sd_ithres = stats::qnorm(0.99),
  it_thres_invest_mth_gap = 2,
  it_gap_LBL_IL_min = -6,
  ar_st_vars_to_keep = c("agg_BS_1025", "agg_BS_3021", "agg_BS_1021", "agg_BS_1012",
    "agg_BS_1011"),
  fl_min_invest_size = 10000,
  it_mth_inv_start_min = 15,
  it_mth_inv_start_max = 145,
  bl_drop_afrombc = TRUE,
  bl_drop_cfromb = TRUE,
  bl_compare2baserda = FALSE,
  verbose = TRUE,
  verbose_detail = FALSE,
  it_verbose_detail_nrow = 100
)

Arguments

svr_lender_var

Character, variable name for defining formal/informal lender. Default: "forinfm4".

svr_principal

Character, variable name for principal amount. Default: "bf5klm_bm6h_joint".

svr_principal_last

Character, variable name for principal last amount. Default: "bm6h".

svr_principal_interest_sum

Character, variable name for principal interest sum. Default: "bm6b".

bl_filter_bridge_grvgr0

Logical, apply bridge gradient filter. Default: TRUE.

it_ll_grv_min

Numeric, minimum river gradient. Default: -1.

bl_filter_loan_duration_a

Logical, filter by loan A duration. Default: FALSE.

bl_filter_loan_duration_b

Logical, filter by loan B duration. Default: FALSE.

bl_filter_lender_type

Logical, filter by lender type. Default: FALSE.

bl_filter_bridge_informal

Logical, filter by informal bridges. Default: FALSE.

bl_filter_loan_size

Logical, filter by loan size. Default: FALSE.

bl_filter_loan_duration_more

Logical, additional duration filter. Default: FALSE.

fl_sd_ithres

Numeric, statistical threshold for investment identification (standard deviations). Default: qnorm(0.99).

it_thres_invest_mth_gap

Numeric, minimum month gap for investment thresholding. Default: 2.

it_gap_LBL_IL_min

Numeric, minimum gap for investment-loan-bridge linker. Default: -6.

ar_st_vars_to_keep

Character vector, investment variable names to keep. Default: c("agg_BS_1025", "agg_BS_3021", "agg_BS_1021", "agg_BS_1012", "agg_BS_1011").

fl_min_invest_size

Numeric, minimum investment size. Default: 10000.

it_mth_inv_start_min

Numeric, minimum investment start month. Default: 15.

it_mth_inv_start_max

Numeric, maximum investment start month. Default: 145.

bl_drop_afrombc

Logical, drop set A loans in B or C. Default: TRUE.

bl_drop_cfromb

Logical, drop set C loans in B. Default: TRUE.

bl_compare2baserda

Logical, compare results to base RDA files. Default: FALSE.

verbose

Logical, when TRUE show a cli step progress bar through Groups A-E, per-stage row counts, the invest-not-in-chars 2x2 cut table, and pipeline consistency summary. Default: TRUE.

verbose_detail

Logical, print detailed verbose output. Default: FALSE.

it_verbose_detail_nrow

Numeric, number of rows for detailed verbose output. Default: 100.

Value

A list containing all intermediate and final dataframes:

tstm_loans_pn_nd

Non-duplicate loans dataframe.

tstm_loans_hooks

Hooks dataframe.

tstm_loans_bridges

Bridges dataframe.

tstm_loans_bridges_1t2

Bridges 1-to-2 dataframe.

tstm_loans_bridges_type

Bridges with type classification.

tstm_invdates_uniq

Unique investment dates dataframe.

tstm_invest

Investment dataframe.

tstm_roster_invest_loan_bridge

Roster linking investments to loans and bridges.

tstm_roster_invest_loan_linker

Roster after linker processing.

tstm_roster_invest_loan_linked

Roster with investment-loan-bridge links.

tstm_roster_invest2loan2bridge_clean

Cleaned roster after filtering by criteria.

tstm_roster_invest2loan2bridge

Bridge characteristics for investments.

tstm_invest2loan2bridge_chars

Investment characteristics with loan-bridge types.

bl_invest_pipeline_consistency_ok

Logical: internal checks confirm every investment passing typing thresholds is in tstm_invest2loan2bridge_chars, all tstm_invest keys appear in the invest2loan roster, and invest/roster fields match within this run (see Details).

df_invest_pipeline_consistency

Per-ivars summary tibble from the consistency check.

n_invest_not_in_chars

Invest rows (ar_st_vars_to_keep) absent from tstm_invest2loan2bridge_chars.

df_unmatched_by_reason

Mutually exclusive exclusion reasons among those rows.

df_unmatched_cut_crosstab

2x2 table: bl_meets_size x bl_meets_start.
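As an illustration of how the 2x2 cut table can be read, the sketch below builds a toy version in base R. The data values are made up, and the construction is only an assumption about how bl_meets_size and bl_meets_start relate to fl_min_invest_size and it_mth_inv_start_min/max; it is not the package's own code.

```r
# Toy investment rows that are absent from the final file (made-up values).
df_unmatched <- data.frame(
  capital_invest = c(5000, 20000, 8000, 30000),
  mth_inv_start  = c(20, 10, 150, 160)
)

# Gateway defaults, copied from the Usage section.
fl_min_invest_size   <- 10000
it_mth_inv_start_min <- 15
it_mth_inv_start_max <- 145

# Assumed definitions of the two cut flags.
df_unmatched$bl_meets_size <- df_unmatched$capital_invest >= fl_min_invest_size
df_unmatched$bl_meets_start <-
  df_unmatched$mth_inv_start >= it_mth_inv_start_min &
  df_unmatched$mth_inv_start <= it_mth_inv_start_max

# Cross-tabulate; fixing the factor levels keeps empty cells in the table.
tb_cut <- table(
  bl_meets_size  = factor(df_unmatched$bl_meets_size, levels = c(FALSE, TRUE)),
  bl_meets_start = factor(df_unmatched$bl_meets_start, levels = c(FALSE, TRUE))
)
print(tb_cut)
```

Under these assumed definitions, the TRUE/TRUE cell should be empty for rows whose ivars is in ar_st_vars_to_keep, because an investment that meets both cuts is expected to appear in tstm_invest2loan2bridge_chars.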

Details

The function implements the workflow described in PrjThaiHFID-issue-32.

Packaged data inputs (data/ folder)

Only two pre-built datasets are loaded directly from the package data/ folder. All other return-list objects are computed in this run from these inputs (and from each other). When bl_compare2baserda = TRUE, additional packaged benchmark objects are read, but only for dplyr::all_equal() comparisons; they never feed the pipeline and are not read at all unless that flag is set.

tstm_loans_panel (Group A.1)

Monthly loan panel (loan x month). Feeds ffp_hfid_loan_non_duplicate() and everything downstream on the loan/bridge side.

tstm_asset_loan (Group B)

Household-month asset and loan aggregates. Subset to ar_st_vars_to_keep before ffp_hfid_invest_gateway(), so jump detection, tstm_invdates_uniq, and tstm_invest use only the analysis ivars (and hh_inv_ctr spans reflect those ivars only).

Optional compare-only benchmarks when bl_compare2baserda = TRUE: tstm_loans_hooks, tstm_loans_bridges_type, tstm_invdates_uniq, tstm_invest, tstm_roster_invest_loan_linked, tstm_roster_invest2loan2bridge, tstm_invest2loan2bridge_chars.

Derived pipeline (built in-run, not loaded from data/)

Group A

tstm_loans_pn_nd → tstm_loans_hooks → tstm_loans_bridges, tstm_loans_bridges_1t2 → tstm_loans_bridges_type.

Group B

Subset tstm_asset_loan to ar_st_vars_to_keep, then tstm_invdates_uniq, tstm_invest (investment spells with hh_inv_asset_ctr and household span hh_inv_ctr pooled across kept ivars only).

Group C

tstm_roster_invest_loan_bridge → tstm_roster_invest_loan_linker → tstm_roster_invest_loan_linked.

Group D

tstm_roster_invest2loan2bridge_clean (NA sync on loan/lender IDs) → tstm_roster_invest2loan2bridge → tstm_invest2loan2bridge_chars.

ID keys used across stages

Household

hhid_Num in loan/roster objects; id in tstm_invest (same entity).

Non-duplicate loan

hh_loan_id_nd (created in Group A.1). Bridge triples use hh_loan_id_nd_1t2, hh_loan_id_nd_paired_1t2, hh_loan_id_nd_paired_2t3.

Bridge entity

bridge_id (Group A.4, on tstm_loans_bridges_type).

Investment spell

(id|hhid_Num, ivars, hh_inv_asset_ctr) — one row per asset-specific investment in tstm_invest.

Household investment span

hh_inv_ctr — shared across ivars for the same (id, mth_inv_start, mth_inv_end); used to link investments to loans in Group C.
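The difference between the asset-specific spell key and the household span key can be seen on a toy table. The sketch below uses base R and made-up rows; the way hh_inv_ctr is numbered here (sequentially within household by start month) is an assumption for illustration only.

```r
# Toy tstm_invest-like rows: one household, two assets, two distinct spans.
df_invest <- data.frame(
  id               = c(1, 1, 1),
  ivars            = c("agg_BS_1025", "agg_BS_3021", "agg_BS_1025"),
  hh_inv_asset_ctr = c(1, 1, 2),
  mth_inv_start    = c(20, 20, 40),
  mth_inv_end      = c(22, 22, 41)
)

# Household spans: unique (id, mth_inv_start, mth_inv_end) combinations.
df_spans <- unique(df_invest[, c("id", "mth_inv_start", "mth_inv_end")])
df_spans <- df_spans[order(df_spans$id, df_spans$mth_inv_start), ]

# Assumed numbering: a running counter within each household.
df_spans$hh_inv_ctr <- ave(df_spans$id, df_spans$id, FUN = seq_along)

# Left join the span counter back onto the asset-specific spells.
df_invest <- merge(
  df_invest, df_spans,
  by = c("id", "mth_inv_start", "mth_inv_end"), all.x = TRUE
)
print(df_invest)
```

The two assets that jump over the same months share one hh_inv_ctr, while (id, ivars, hh_inv_asset_ctr) still identifies each row uniquely.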

ID-based merges (21 join operations in the computational pipeline)

Counts exclude bind_rows stacks and Group E diagnostic joins. Merge type and keys are as implemented in the nested functions below.

Group A — loans, hooks, bridges (4 joins)

  1. ffp_hfid_hook_pairs() left_join: hook-pair scaffold to tstm_loans_pn_nd (self) on hhid_Num + hh_loan_id_nd_within_paired (within-household loan index) to attach paired-loan dates and lender.

  2. ffp_hfid_bridge_from_hook() left_join: hooks to main-loan attributes on hhid_Num + hh_loan_id_nd.

  3. ffp_hfid_bridge_from_hook() left_join: result to paired-loan attributes on hhid_Num + hh_loan_id_nd_paired.

  4. ffp_hfid_bridge_from_hook() left_join: tstm_loans_bridges_1t2 to tstm_loans_bridges_2t3 on hhid_Num + hh_loan_id_nd_paired_1t2 = hh_loan_id_nd_2t3 (chain A–B with B–C into A–B–C).

ffp_hfid_loan_non_duplicate() and ffp_hfid_bridge_type() use grouping only (no joins).
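The chaining in join 4 (A–B with B–C into A–B–C) can be sketched with base R merge() standing in for dplyr::left_join(). The loan IDs are made up, and the toy tables carry only the key columns named above.

```r
# Toy hook pairs: loan A (1t2) and its paired loan B.
df_1t2 <- data.frame(
  hhid_Num                 = c(1, 1, 2),
  hh_loan_id_nd_1t2        = c(1, 2, 1),
  hh_loan_id_nd_paired_1t2 = c(2, 3, 2)
)

# Toy second-leg pairs: loan B (2t3) and its paired loan C.
df_2t3 <- data.frame(
  hhid_Num                 = c(1, 1),
  hh_loan_id_nd_2t3        = c(2, 3),
  hh_loan_id_nd_paired_2t3 = c(3, 4)
)

# Left join: the paired loan of the first leg must equal the main loan
# of the second leg within the same household.
df_chain <- merge(
  df_1t2, df_2t3,
  by.x = c("hhid_Num", "hh_loan_id_nd_paired_1t2"),
  by.y = c("hhid_Num", "hh_loan_id_nd_2t3"),
  all.x = TRUE
)
print(df_chain)
```

Rows with NA in hh_loan_id_nd_paired_2t3 are pairs without a third loan; in this toy data that is household 2.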

Group B — investments (5 joins)

  1. ffp_hfid_invest_unique_dura() left_join: asset-specific investment spells to tstm_invdates_uniq on id + mth_inv_start + mth_inv_end to assign hh_inv_ctr.

  2. ffp_hfid_invest_combine() left_join: jump panel on id + ivars + mth_inv_start_prior = month for capital_prior.

  3. ffp_hfid_invest_combine() left_join: jump panel on id + ivars + mth_inv_end = month for capital_end.

  4. ffp_hfid_invest_combine() left_join: attach capital_prior on id + ivars + hh_inv_asset_ctr.

  5. ffp_hfid_invest_combine() left_join: attach capital_end on id + ivars + hh_inv_asset_ctr, then compute capital_invest.
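Joins 2 through 5 can be sketched on a single spell. This is a base R illustration with made-up values; the column name value and the formula capital_invest = capital_end - capital_prior are assumptions, since this page does not state how capital_invest is computed.

```r
# Toy household-month panel for one asset (hypothetical asset levels).
df_panel <- data.frame(
  id = 1, ivars = "agg_BS_1025",
  month = 18:23,
  value = c(100, 100, 150, 400, 400, 400)
)

# One investment spell with its prior month and end month.
df_spell <- data.frame(
  id = 1, ivars = "agg_BS_1025", hh_inv_asset_ctr = 1,
  mth_inv_start_prior = 19, mth_inv_end = 21
)

# Look up the asset level at the month before the spell starts.
df_prior <- merge(df_spell, df_panel,
  by.x = c("id", "ivars", "mth_inv_start_prior"),
  by.y = c("id", "ivars", "month"), all.x = TRUE)
names(df_prior)[names(df_prior) == "value"] <- "capital_prior"

# Look up the asset level at the month the spell ends.
df_end <- merge(df_spell, df_panel,
  by.x = c("id", "ivars", "mth_inv_end"),
  by.y = c("id", "ivars", "month"), all.x = TRUE)
names(df_end)[names(df_end) == "value"] <- "capital_end"

# Attach both levels on the spell key, then take the assumed difference.
ar_key <- c("id", "ivars", "hh_inv_asset_ctr")
df_out <- merge(df_spell, df_prior[, c(ar_key, "capital_prior")],
  by = ar_key, all.x = TRUE)
df_out <- merge(df_out, df_end[, c(ar_key, "capital_end")],
  by = ar_key, all.x = TRUE)
df_out$capital_invest <- df_out$capital_end - df_out$capital_prior
print(df_out)
```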

Group C — roster and linking (8 joins; bridge linker computed but not consumed)

  1. ffp_hfid_invest_loan_or_bridge_linker() left_join (many-to-many): investment roster to bridge roster on hhid_Num only (candidate invest–bridge pairs; filtered by date overlap). Not passed to Group C.3 in this gateway.

  2. ffp_hfid_invest_loan_or_bridge_linker() left_join (many-to-many): investment roster to loan roster on hhid_Num only → tstm_roster_invest_loan_linker.

  3. ffp_hfid_invest_loan_linked() full_join (many-to-many): tstm_invest to loan linker on hhid_Num + hh_inv_ctr (invest2loan stack).

  4. ffp_hfid_invest_loan_linked() full_join (many-to-many): tstm_loans_pn_nd to loan linker on hhid_Num + hh_loan_id_nd (loan2invest stack).

  5. ffp_hfid_invest_loan_linked() left_join: invest2loan stack to tstm_loans_pn_nd on hhid_Num + hh_loan_id_nd (loan attributes).

  6. ffp_hfid_invest_loan_linked() left_join (many-to-many): loan2invest stack to tstm_invest on hhid_Num + hh_inv_ctr (investment attributes).

  7. ffp_hfid_invest_loan_linked() left_join (many-to-many): combined roster to triply-linked tstm_loans_bridges on hhid_Num + hh_loan_id_nd = hh_loan_id_nd_1t2.

  8. ffp_hfid_invest_loan_linked() left_join (many-to-many): rows without triple bridge to tstm_loans_bridges_1t2 on hhid_Num + hh_loan_id_nd = hh_loan_id_nd_1t2 (hook-only linkage).

ffp_hfid_invest_loan_bridge_roster() only stacks the three roster sources with bind_rows (no joins).
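The many-to-many pattern in joins 1 and 2 (join on hhid_Num only, then filter candidate pairs by date) can be sketched as follows. The loan date columns mth_loan_start and mth_loan_end are hypothetical names, and the plain interval-overlap rule is an assumption; the actual linker also has a gap tolerance (it_gap_LBL_IL_min) that is not modeled here.

```r
# Toy investment spans and loans for one household (made-up months).
df_inv <- data.frame(
  hhid_Num = c(1, 1), hh_inv_ctr = c(1, 2),
  mth_inv_start = c(20, 60), mth_inv_end = c(22, 61)
)
df_loan <- data.frame(
  hhid_Num = c(1, 1), hh_loan_id_nd = c(1, 2),
  mth_loan_start = c(18, 90), mth_loan_end = c(30, 100)
)

# Many-to-many left join on household only: every investment meets every loan.
df_cand <- merge(df_inv, df_loan, by = "hhid_Num", all.x = TRUE)

# Keep candidate pairs whose month intervals overlap (assumed rule).
df_link <- df_cand[
  df_cand$mth_loan_start <= df_cand$mth_inv_end &
    df_cand$mth_loan_end >= df_cand$mth_inv_start,
]
print(df_link)
```

Joining on the household key alone makes the candidate set grow with the product of investments and loans per household, which is why the date filter is applied right after the join.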

Group D — typing (4 joins)

  1. ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: investment row to loan-set A aggregates on hhid_Num + ivars + hh_inv_asset_ctr.

  2. ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: + loan-set B.

  3. ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: + loan-set C.

  4. ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: bridge-type fields to loan aggregates (same investment key).

ffp_hfid_invest_loan_linked_abc_distinct() and ffp_hfid_invest_loan_linked_abc_bridge_char() filter/aggregate without joins. The gateway's D.1b step only synchronizes forinfm4* with hh_loan_id_nd* via mutate, not a merge.

Group E — consistency diagnostics (optional, 5 joins)

ffp_gateway_check_invest_pipeline_consistency() and ffp_gateway_summarize_invest_not_in_chars() use anti_join / inner_join / left_join on (id|hhid_Num, ivars, hh_inv_asset_ctr) to validate the run; they do not change pipeline outputs.

Investment inclusion logic (unit of observation: investment)

Each investment is identified by (hhid_Num, ivars, hh_inv_asset_ctr) in roster outputs (id in tstm_invest). The final file tstm_invest2loan2bridge_chars is built from the same run as tstm_invest; use objects from this function's return list together—do not mix with stale data/ .rda snapshots.

Pipeline stages for investments

  1. Group B — subset tstm_asset_loan to ar_st_vars_to_keep, then ffp_hfid_invest_gateway: investment spells for those ivars only.

  2. Group C → ffp_hfid_invest_loan_linked: tstm_invest is full_joined to the loan linker on hh_inv_ctr (household investment span, not hh_inv_asset_ctr). Every invest row is kept in the invest2loan stack, including spells with no matched loan (NA loan IDs).

  3. Group D.1 → ffp_hfid_invest_loan_linked_abc_distinct: keeps merge_type == "invest2loan", ivars %in% ar_st_vars_to_keep, then applies typing thresholds on roster row values: capital_invest >= fl_min_invest_size and mth_inv_start in [it_mth_inv_start_min, it_mth_inv_start_max].

  4. Group D.3 → ffp_hfid_invest_loan_linked_abc_investloan_char: one row per investment with investloan_type_* (including 1-investment-no-loan when no set-A/B/C loan linkage averages are present).
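Stage 2 above relies on a full join keeping investments that have no linked loan. A minimal base R sketch with made-up rows, using merge(all = TRUE) in place of dplyr::full_join(); the bl_no_loan flag is a hypothetical helper column, not a field of the package output.

```r
# Toy investments: household 1 has two spans, household 2 has one.
df_invest <- data.frame(
  hhid_Num = c(1, 1, 2),
  hh_inv_ctr = c(1, 2, 1),
  capital_invest = c(50000, 4000, 20000)
)

# Toy loan linker: only household 1, span 1 is linked to a loan.
df_linker <- data.frame(hhid_Num = 1, hh_inv_ctr = 1, hh_loan_id_nd = 7)

# Full join on the household span key keeps every investment row.
df_stack <- merge(df_invest, df_linker,
  by = c("hhid_Num", "hh_inv_ctr"), all = TRUE)

# Investments without a matched loan carry NA loan IDs instead of dropping out.
df_stack$bl_no_loan <- is.na(df_stack$hh_loan_id_nd)
print(df_stack)
```

Rows flagged bl_no_loan here correspond to the investments that would later be typed as 1-investment-no-loan, provided they also pass the typing thresholds.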

What is excluded from tstm_invest2loan2bridge_chars

The only investments excluded are those that fail the joint typing thresholds above or whose ivars is not in ar_st_vars_to_keep. Investments without loans are not excluded for lacking a loan; they appear as 1-investment-no-loan when they pass the thresholds.

On a single gateway run, for each ivars in ar_st_vars_to_keep, an investment passes the thresholds on tstm_invest if and only if it is present in tstm_invest2loan2bridge_chars. The function runs an internal consistency check (see bl_invest_pipeline_consistency_ok in the return list) and prints a PASS/FAIL banner when verbose = TRUE.
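The if-and-only-if statement can be checked by comparing key sets. The sketch below is a base R stand-in for the anti_join style check, on made-up data; ffn_key is a hypothetical helper, and both toy tables use id as the household column for simplicity.

```r
# Hypothetical helper: collapse the investment key into one string.
ffn_key <- function(df) {
  paste(df$id, df$ivars, df$hh_inv_asset_ctr, sep = "|")
}

# Toy tstm_invest and toy final characteristics file.
df_invest <- data.frame(
  id = c(1, 1, 2), ivars = "agg_BS_1025", hh_inv_asset_ctr = c(1, 2, 1),
  capital_invest = c(50000, 4000, 20000), mth_inv_start = c(20, 60, 30)
)
df_chars <- data.frame(
  id = c(1, 2), ivars = "agg_BS_1025", hh_inv_asset_ctr = c(1, 1)
)

# Typing thresholds at their documented defaults.
bl_pass <- df_invest$capital_invest >= 10000 &
  df_invest$mth_inv_start >= 15 & df_invest$mth_inv_start <= 145

# Both directions must hold: no passing row is missing, no extra row is present.
ar_missing <- setdiff(ffn_key(df_invest[bl_pass, ]), ffn_key(df_chars))
ar_extra   <- setdiff(ffn_key(df_chars), ffn_key(df_invest[bl_pass, ]))
bl_ok <- length(ar_missing) == 0 && length(ar_extra) == 0
print(bl_ok)
```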

NA Synchronization: After the abc_distinct step, the function synchronizes NA values between loan IDs and formal/informal indicators. When hh_loan_id_nd* is NA, the corresponding forinfm4* variable is set to NA to maintain consistency between loan identification and lender classification. This is critical for accurate bridge-type interpretation.
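A minimal sketch of this synchronization in base R follows. The suffixed column names (forinfm4_1t2, forinfm4_paired_1t2) and the lender labels are assumptions chosen to mirror the hh_loan_id_nd* / forinfm4* pattern; the gateway itself is described as doing this via mutate.

```r
# Toy roster rows: some loan IDs are NA but lender labels are still filled.
df_roster <- data.frame(
  hh_loan_id_nd_1t2        = c(3, NA, 5),
  forinfm4_1t2             = c("formal", "informal", "informal"),
  hh_loan_id_nd_paired_1t2 = c(NA, NA, 6),
  forinfm4_paired_1t2      = c("informal", "formal", "formal"),
  stringsAsFactors = FALSE
)

# For each (loan ID, lender) column pair, blank the lender where the ID is NA.
for (st_sfx in c("_1t2", "_paired_1t2")) {
  st_id_col <- paste0("hh_loan_id_nd", st_sfx)
  st_ln_col <- paste0("forinfm4", st_sfx)
  df_roster[[st_ln_col]][is.na(df_roster[[st_id_col]])] <- NA
}
print(df_roster)
```

Without this step, a row with no loan could still carry a lender label and be read as a formal or informal link.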

Parameter Groups:

  • Group A (Loans/Hooks/Bridges): svr_*, bl_filter_bridge_*, it_ll_grv_min, and loan duration/size/type filters control loan deduplication and bridge identification.

  • Group B (Investments): fl_sd_ithres and it_thres_invest_mth_gap control investment identification from asset data.

  • Group C (Linking): it_gap_LBL_IL_min controls investment-loan-bridge linker time gap tolerance.

  • Group D (Typing): ar_st_vars_to_keep, fl_min_invest_size, it_mth_inv_start_*, bl_drop_* control filtering and classification into investment-loan-bridge types.
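For reference on the Group B default, the snippet below evaluates fl_sd_ithres = stats::qnorm(0.99). The tail-share interpretation assumes the quantity being thresholded is approximately standard normal, which this page does not state.

```r
# Default threshold: the 99th percentile of the standard normal distribution.
fl_sd_ithres <- stats::qnorm(0.99)
print(round(fl_sd_ithres, 4))

# Under a standard normal, the share of draws above this threshold is 1 percent.
fl_share_above <- 1 - stats::pnorm(fl_sd_ithres)
print(fl_share_above)
```

A smaller probability, for example stats::qnorm(0.95), gives a lower threshold in standard deviations.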

See also

Used by vignette(s) ffv_invest_loan_bridge and ffv_invest_return_bridge. Related issue(s): PrjThaiHFID-#32.

Author

Fan Wang, http://fanwangecon.github.io

Examples

if (FALSE) { # \dontrun{
  # Attach dplyr for %>%, group_by(), and tally()
  library(dplyr)

  # Run with default parameters
  ls_result <- ffp_hfid_invest_loan_linked_abc_investloan_char_gateway()

  # Access final investment-bridge characteristics
  df_chars <- ls_result$tstm_invest2loan2bridge_chars

  # Check distribution by investment variable
  df_chars %>% group_by(ivars) %>% tally()
} # }