Wrapper function for investment-loan-bridge gateway workflow
Source: R/ffp_investloan_type_wrapper.R
ffp_hfid_invest_loan_linked_abc_investloan_char_gateway.Rd
Encapsulates the full investment-loan-bridge workflow that generates linked investment, loan, hook, and bridge classifications. This function orchestrates four main stages: (1) loan deduplication, hook and bridge identification; (2) investment identification; (3) investment-to-loan and investment-to-bridge linking; (4) investment-loan-bridge type classification. Returns all intermediate and final dataframes in a named list.
Usage
ffp_hfid_invest_loan_linked_abc_investloan_char_gateway(
svr_lender_var = "forinfm4",
svr_principal = "bf5klm_bm6h_joint",
svr_principal_last = "bm6h",
svr_principal_interest_sum = "bm6b",
bl_filter_bridge_grvgr0 = TRUE,
it_ll_grv_min = -1,
bl_filter_loan_duration_a = FALSE,
bl_filter_loan_duration_b = FALSE,
bl_filter_lender_type = FALSE,
bl_filter_bridge_informal = FALSE,
bl_filter_loan_size = FALSE,
bl_filter_loan_duration_more = FALSE,
fl_sd_ithres = stats::qnorm(0.99),
it_thres_invest_mth_gap = 2,
it_gap_LBL_IL_min = -6,
ar_st_vars_to_keep = c("agg_BS_1025", "agg_BS_3021", "agg_BS_1021", "agg_BS_1012",
"agg_BS_1011"),
fl_min_invest_size = 10000,
it_mth_inv_start_min = 15,
it_mth_inv_start_max = 145,
bl_drop_afrombc = TRUE,
bl_drop_cfromb = TRUE,
bl_compare2baserda = FALSE,
verbose = TRUE,
verbose_detail = FALSE,
it_verbose_detail_nrow = 100
)
Arguments
- svr_lender_var
Character, variable name for defining formal/informal lender. Default: "forinfm4".
- svr_principal
Character, variable name for principal amount. Default: "bf5klm_bm6h_joint".
- svr_principal_last
Character, variable name for principal last amount. Default: "bm6h".
- svr_principal_interest_sum
Character, variable name for principal interest sum. Default: "bm6b".
- bl_filter_bridge_grvgr0
Logical, apply bridge gradient filter. Default: TRUE.
- it_ll_grv_min
Numeric, minimum river gradient. Default: -1.
- bl_filter_loan_duration_a
Logical, filter by loan A duration. Default: FALSE.
- bl_filter_loan_duration_b
Logical, filter by loan B duration. Default: FALSE.
- bl_filter_lender_type
Logical, filter by lender type. Default: FALSE.
- bl_filter_bridge_informal
Logical, filter by informal bridges. Default: FALSE.
- bl_filter_loan_size
Logical, filter by loan size. Default: FALSE.
- bl_filter_loan_duration_more
Logical, additional duration filter. Default: FALSE.
- fl_sd_ithres
Numeric, statistical threshold for investment identification (standard deviations). Default: qnorm(0.99).
- it_thres_invest_mth_gap
Numeric, minimum month gap for investment thresholding. Default: 2.
- it_gap_LBL_IL_min
Numeric, minimum gap for investment-loan-bridge linker. Default: -6.
- ar_st_vars_to_keep
Character vector, investment variable names to keep. Default: c("agg_BS_1025", "agg_BS_3021", "agg_BS_1021", "agg_BS_1012", "agg_BS_1011").
- fl_min_invest_size
Numeric, minimum investment size. Default: 10000.
- it_mth_inv_start_min
Numeric, minimum investment start month. Default: 15.
- it_mth_inv_start_max
Numeric, maximum investment start month. Default: 145.
- bl_drop_afrombc
Logical, drop set A loans in B or C. Default: TRUE.
- bl_drop_cfromb
Logical, drop set C loans in B. Default: TRUE.
- bl_compare2baserda
Logical, compare results to base RDA files. Default: FALSE.
- verbose
Logical, when TRUE show a cli step progress bar through Groups A-E, per-stage row counts, the invest-not-in-chars 2x2 cut table, and pipeline consistency summary. Default: TRUE.
- verbose_detail
Logical, print detailed verbose output. Default: FALSE.
- it_verbose_detail_nrow
Numeric, number of rows for detailed verbose output. Default: 100.
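The default fl_sd_ithres = stats::qnorm(0.99) is the 0.99 quantile of the standard normal distribution, so the threshold is roughly 2.33 standard deviations. A quick base-R check of that value, independent of the package:

```r
# Default threshold: the 0.99 quantile of the standard normal distribution
fl_sd_ithres <- stats::qnorm(0.99)
print(round(fl_sd_ithres, 3))
# [1] 2.326

# Share of a standard normal lying above the threshold (one-sided tail)
fl_tail_share <- 1 - stats::pnorm(fl_sd_ithres)
print(round(fl_tail_share, 4))
# [1] 0.01
```

A larger quantile, for example stats::qnorm(0.995), gives a larger threshold in standard-deviation units.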
Value
A list containing all intermediate and final dataframes:
- tstm_loans_pn_nd
Non-duplicate loans dataframe.
- tstm_loans_hooks
Hooks dataframe.
- tstm_loans_bridges
Bridges dataframe.
- tstm_loans_bridges_1t2
Bridges 1-to-2 dataframe.
- tstm_loans_bridges_type
Bridges with type classification.
- tstm_invdates_uniq
Unique investment dates dataframe.
- tstm_invest
Investment dataframe.
- tstm_roster_invest_loan_bridge
Roster linking investments to loans and bridges.
- tstm_roster_invest_loan_linker
Roster after linker processing.
- tstm_roster_invest_loan_linked
Roster with investment-loan-bridge links.
- tstm_roster_invest2loan2bridge_clean
Cleaned roster after filtering by criteria.
- tstm_roster_invest2loan2bridge
Bridge characteristics for investments.
- tstm_invest2loan2bridge_chars
Investment characteristics with loan-bridge types.
- bl_invest_pipeline_consistency_ok
Logical: internal checks confirm every investment passing typing thresholds is in tstm_invest2loan2bridge_chars, all tstm_invest keys appear in the invest2loan roster, and invest/roster fields match within this run (see Details).
- df_invest_pipeline_consistency
Per-ivars summary tibble from the consistency check.
- n_invest_not_in_chars
Invest rows (ar_st_vars_to_keep) absent from tstm_invest2loan2bridge_chars.
- df_unmatched_by_reason
Mutually exclusive exclusion reasons among those rows.
- df_unmatched_cut_crosstab
2x2 table: bl_meets_size x bl_meets_start.
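One way to use the return list defensively is to read the final table only after checking the run's own consistency flag. The sketch below is a hypothetical helper, not part of the package; get_chars_if_consistent and the toy list are assumptions for illustration, and only the two list element names come from the Value section above.

```r
# Hypothetical helper (not a package function): return the final
# characteristics table only if the run's consistency flag is TRUE.
get_chars_if_consistent <- function(ls_result) {
  stopifnot(is.list(ls_result))
  if (!isTRUE(ls_result$bl_invest_pipeline_consistency_ok)) {
    stop("Consistency check did not pass; inspect df_unmatched_by_reason.")
  }
  ls_result$tstm_invest2loan2bridge_chars
}

# Toy stand-in for a gateway return list (illustrative values only)
ls_toy <- list(
  bl_invest_pipeline_consistency_ok = TRUE,
  tstm_invest2loan2bridge_chars = data.frame(
    ivars = c("agg_BS_1025", "agg_BS_1021"),
    stringsAsFactors = FALSE
  )
)

df_chars <- get_chars_if_consistent(ls_toy)
print(nrow(df_chars))
```

With a real run, ls_toy would be replaced by the list returned from the gateway function.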
Details
The function implements the workflow described in PrjThaiHFID-issue-32.
Packaged data inputs (data/ folder)
Only two pre-built datasets are loaded directly from the package data/ folder.
All other return-list objects are computed in this run from these inputs (and from
each other). When bl_compare2baserda = TRUE, additional packaged benchmark
objects are read for dplyr::all_equal() checks only; they do not feed the
pipeline, and without that flag they are not used.
- tstm_loans_panel (Group A.1)
Monthly loan panel (loan x month). Feeds ffp_hfid_loan_non_duplicate() and everything downstream on the loan/bridge side.
- tstm_asset_loan (Group B)
Household-month asset and loan aggregates. Subset to ar_st_vars_to_keep before ffp_hfid_invest_gateway(), so jump detection, tstm_invdates_uniq, and tstm_invest use only the analysis ivars (and hh_inv_ctr spans reflect those ivars only).
Optional compare-only benchmarks when bl_compare2baserda = TRUE:
tstm_loans_hooks, tstm_loans_bridges_type, tstm_invdates_uniq, tstm_invest,
tstm_roster_invest_loan_linked, tstm_roster_invest2loan2bridge,
tstm_invest2loan2bridge_chars.
Derived pipeline (built in-run, not loaded from data/)
- Group A
tstm_loans_pn_nd → tstm_loans_hooks → tstm_loans_bridges, tstm_loans_bridges_1t2 → tstm_loans_bridges_type.
- Group B
Subset tstm_asset_loan to ar_st_vars_to_keep, then tstm_invdates_uniq, tstm_invest (investment spells with hh_inv_asset_ctr and household span hh_inv_ctr pooled across kept ivars only).
- Group C
tstm_roster_invest_loan_bridge → tstm_roster_invest_loan_linker → tstm_roster_invest_loan_linked.
- Group D
tstm_roster_invest2loan2bridge_clean (NA sync on loan/lender IDs) → tstm_roster_invest2loan2bridge → tstm_invest2loan2bridge_chars.
ID keys used across stages
- Household
hhid_Num in loan/roster objects; id in tstm_invest (same entity).
- Non-duplicate loan
hh_loan_id_nd (created in Group A.1). Bridge triples use hh_loan_id_nd_1t2, hh_loan_id_nd_paired_1t2, hh_loan_id_nd_paired_2t3.
- Bridge entity
bridge_id (Group A.4, on tstm_loans_bridges_type).
- Investment spell
(id|hhid_Num, ivars, hh_inv_asset_ctr): one row per asset-specific investment in tstm_invest.
- Household investment span
hh_inv_ctr: shared across ivars for the same (id, mth_inv_start, mth_inv_end); used to link investments to loans in Group C.
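The difference between the asset-specific counter and the household span counter can be seen on a toy table. This is a base-R sketch under stated assumptions: the values are made up, both counters are assumed to be simple running indices ordered by start month, and merge() stands in for the package's dplyr join.

```r
# Toy asset-specific investment spells (illustrative values only)
df_spells <- data.frame(
  id = c(1, 1, 1, 2),
  ivars = c("agg_BS_1025", "agg_BS_1021", "agg_BS_1025", "agg_BS_1025"),
  mth_inv_start = c(20, 20, 40, 33),
  mth_inv_end = c(22, 22, 41, 35),
  stringsAsFactors = FALSE
)

# Asset-specific counter: running index within (id, ivars).
# Assumption: spells are ordered by start month.
df_spells <- df_spells[order(df_spells$id, df_spells$ivars,
                             df_spells$mth_inv_start), ]
df_spells$hh_inv_asset_ctr <- ave(
  seq_len(nrow(df_spells)), df_spells$id, df_spells$ivars, FUN = seq_along
)

# Household span counter: one value per distinct (id, start, end)
df_uniq <- unique(df_spells[, c("id", "mth_inv_start", "mth_inv_end")])
df_uniq <- df_uniq[order(df_uniq$id, df_uniq$mth_inv_start,
                         df_uniq$mth_inv_end), ]
df_uniq$hh_inv_ctr <- ave(seq_len(nrow(df_uniq)), df_uniq$id, FUN = seq_along)

# Attach hh_inv_ctr to every asset-specific spell on the three span keys
df_invest <- merge(df_spells, df_uniq,
                   by = c("id", "mth_inv_start", "mth_inv_end"), all.x = TRUE)
print(df_invest)
```

In the toy output, household 1's two spells that start in month 20 have different ivars but the same hh_inv_ctr, which is what lets one household span link to loans regardless of asset type.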
ID-based merges (21 join operations in the computational pipeline)
Counts exclude bind_rows stacks and Group E diagnostic joins. Merge type and keys
are as implemented in the nested functions below.
Group A — loans, hooks, bridges (4 joins)
- ffp_hfid_hook_pairs() left_join: hook-pair scaffold to tstm_loans_pn_nd (self) on hhid_Num + hh_loan_id_nd_within_paired (within-household loan index) to attach paired-loan dates and lender.
- ffp_hfid_bridge_from_hook() left_join: hooks to main-loan attributes on hhid_Num + hh_loan_id_nd.
- ffp_hfid_bridge_from_hook() left_join: result to paired-loan attributes on hhid_Num + hh_loan_id_nd_paired.
- ffp_hfid_bridge_from_hook() left_join: tstm_loans_bridges_1t2 to tstm_loans_bridges_2t3 on hhid_Num + hh_loan_id_nd_paired_1t2 = hh_loan_id_nd_2t3 (chain A–B with B–C into A–B–C).
ffp_hfid_loan_non_duplicate() and ffp_hfid_bridge_type() use grouping only (no joins).
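The last Group A join, which chains an A–B pair with a B–C pair into an A–B–C triple, can be sketched on toy data. The column names follow the keys listed above; the loan IDs are made up, and base-R merge() with all.x = TRUE is used here as a stand-in for the left_join in the package.

```r
# Toy hook pairs in household 1 (illustrative IDs only)
# 1t2 pairs: loan 1 is paired with loan 2; loan 4 is paired with loan 5
df_bridges_1t2 <- data.frame(
  hhid_Num = c(1, 1),
  hh_loan_id_nd_1t2 = c(1, 4),
  hh_loan_id_nd_paired_1t2 = c(2, 5)
)
# 2t3 pairs: loan 2 is paired with loan 3
df_bridges_2t3 <- data.frame(
  hhid_Num = 1,
  hh_loan_id_nd_2t3 = 2,
  hh_loan_id_nd_paired_2t3 = 3
)

# Left join: the paired loan of the 1t2 pair must equal the main loan
# of the 2t3 pair within the same household
df_chain <- merge(
  df_bridges_1t2, df_bridges_2t3,
  by.x = c("hhid_Num", "hh_loan_id_nd_paired_1t2"),
  by.y = c("hhid_Num", "hh_loan_id_nd_2t3"),
  all.x = TRUE
)
print(df_chain)
```

The pair (1, 2) picks up loan 3 and becomes a full triple, while the pair (4, 5) keeps NA in hh_loan_id_nd_paired_2t3 because no third loan chains onto it.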
Group B — investments (5 joins)
- ffp_hfid_invest_unique_dura() left_join: asset-specific investment spells to tstm_invdates_uniq on id + mth_inv_start + mth_inv_end to assign hh_inv_ctr.
- ffp_hfid_invest_combine() left_join: jump panel on id + ivars + mth_inv_start_prior = month for capital_prior.
- ffp_hfid_invest_combine() left_join: jump panel on id + ivars + mth_inv_end = month for capital_end.
- ffp_hfid_invest_combine() left_join: attach capital_prior on id + ivars + hh_inv_asset_ctr.
- ffp_hfid_invest_combine() left_join: attach capital_end on id + ivars + hh_inv_asset_ctr, then compute capital_invest.
Group C — roster and linking (8 joins; bridge linker computed but not consumed)
- ffp_hfid_invest_loan_or_bridge_linker() left_join (many-to-many): investment roster to bridge roster on hhid_Num only (candidate invest–bridge pairs; filtered by date overlap). Not passed to Group C.3 in this gateway.
- ffp_hfid_invest_loan_or_bridge_linker() left_join (many-to-many): investment roster to loan roster on hhid_Num only → tstm_roster_invest_loan_linker.
- ffp_hfid_invest_loan_linked() full_join (many-to-many): tstm_invest to loan linker on hhid_Num + hh_inv_ctr (invest2loan stack).
- ffp_hfid_invest_loan_linked() full_join (many-to-many): tstm_loans_pn_nd to loan linker on hhid_Num + hh_loan_id_nd (loan2invest stack).
- ffp_hfid_invest_loan_linked() left_join: invest2loan stack to tstm_loans_pn_nd on hhid_Num + hh_loan_id_nd (loan attributes).
- ffp_hfid_invest_loan_linked() left_join (many-to-many): loan2invest stack to tstm_invest on hhid_Num + hh_inv_ctr (investment attributes).
- ffp_hfid_invest_loan_linked() left_join (many-to-many): combined roster to triply-linked tstm_loans_bridges on hhid_Num + hh_loan_id_nd = hh_loan_id_nd_1t2.
- ffp_hfid_invest_loan_linked() left_join (many-to-many): rows without triple bridge to tstm_loans_bridges_1t2 on hhid_Num + hh_loan_id_nd = hh_loan_id_nd_1t2 (hook-only linkage).
ffp_hfid_invest_loan_bridge_roster() only stacks the three roster sources with bind_rows (no joins).
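The linker joins on hhid_Num only, so every investment is first paired with every loan in the same household, and the candidate pairs are then filtered by timing. The sketch below illustrates that pattern with base-R merge(). The loan date columns mth_loan_start and mth_loan_end are hypothetical names, and the plain interval-overlap rule is an assumption for illustration; the package's own rule (including the it_gap_LBL_IL_min tolerance) is not reproduced here.

```r
# Toy investment and loan rosters (illustrative values only)
df_invest <- data.frame(
  hhid_Num = c(1, 1, 2),
  hh_inv_ctr = c(1, 2, 1),
  mth_inv_start = c(20, 60, 30),
  mth_inv_end = c(22, 61, 31)
)
df_loans <- data.frame(
  hhid_Num = c(1, 1, 2),
  hh_loan_id_nd = c(1, 2, 1),
  mth_loan_start = c(18, 70, 50),  # hypothetical column name
  mth_loan_end = c(24, 80, 55)     # hypothetical column name
)

# Many-to-many left join on household only: all invest x loan pairs
df_cand <- merge(df_invest, df_loans, by = "hhid_Num", all.x = TRUE)

# Assumed filter: keep pairs whose month intervals overlap
bl_overlap <- df_cand$mth_loan_start <= df_cand$mth_inv_end &
  df_cand$mth_loan_end >= df_cand$mth_inv_start
df_linked <- df_cand[bl_overlap, ]

print(nrow(df_cand))
print(df_linked)
```

Household 1 contributes 2 x 2 candidate pairs and household 2 contributes one, so the candidate table has five rows before the timing filter is applied.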
Group D — typing (4 joins)
- ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: investment row to loan-set A aggregates on hhid_Num + ivars + hh_inv_asset_ctr.
- ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: + loan-set B.
- ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: + loan-set C.
- ffp_hfid_invest_loan_linked_abc_investloan_char() left_join: bridge-type fields to loan aggregates (same investment key).
ffp_hfid_invest_loan_linked_abc_distinct() and
ffp_hfid_invest_loan_linked_abc_bridge_char() filter/aggregate without joins.
The gateway's D.1b step only synchronizes forinfm4* with hh_loan_id_nd* via
mutate, not a merge.
Group E — consistency diagnostics (optional, 5 joins)
ffp_gateway_check_invest_pipeline_consistency() and
ffp_gateway_summarize_invest_not_in_chars() use anti_join / inner_join /
left_join on (id|hhid_Num, ivars, hh_inv_asset_ctr) to validate the run; they do
not change pipeline outputs.
Investment inclusion logic (unit of observation: investment)
Each investment is identified by (hhid_Num, ivars, hh_inv_asset_ctr) in roster
outputs (id in tstm_invest). The final file tstm_invest2loan2bridge_chars is
built from the same run as tstm_invest; use objects from this function's return
list together—do not mix with stale data/ .rda snapshots.
Pipeline stages for investments
- Group B, ffp_hfid_invest_gateway: tstm_asset_loan is first subset to ar_st_vars_to_keep, so investment spells are built for those ivars only.
- Group C, ffp_hfid_invest_loan_linked: tstm_invest is full_joined to the loan linker on hh_inv_ctr (household investment span, not hh_inv_asset_ctr). Every invest row is kept in the invest2loan stack, including spells with no matched loan (NA loan IDs).
- Group D.1, ffp_hfid_invest_loan_linked_abc_distinct: keeps merge_type == "invest2loan", ivars %in% ar_st_vars_to_keep, then applies typing thresholds on roster row values: capital_invest >= fl_min_invest_size and mth_inv_start in [it_mth_inv_start_min, it_mth_inv_start_max].
- Group D.3, ffp_hfid_invest_loan_linked_abc_investloan_char: one row per investment with investloan_type_* (including 1-investment-no-loan when no set-A/B/C loan linkage averages are present).
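The Group D.1 typing thresholds, and the 2x2 cut table reported as df_unmatched_cut_crosstab, can be illustrated with the default parameter values. This is a base-R sketch on made-up rows, not the package code; the flag names bl_meets_size and bl_meets_start are taken from the Value section.

```r
# Default typing thresholds from the Usage section
fl_min_invest_size <- 10000
it_mth_inv_start_min <- 15
it_mth_inv_start_max <- 145

# Toy investments (illustrative values only), one per threshold outcome
df_invest <- data.frame(
  id = 1:4,
  ivars = "agg_BS_1025",
  hh_inv_asset_ctr = 1,
  capital_invest = c(25000, 5000, 40000, 8000),
  mth_inv_start = c(30, 60, 150, 10),
  stringsAsFactors = FALSE
)

# Size cut and start-month cut, evaluated separately
df_invest$bl_meets_size <- df_invest$capital_invest >= fl_min_invest_size
df_invest$bl_meets_start <-
  df_invest$mth_inv_start >= it_mth_inv_start_min &
  df_invest$mth_inv_start <= it_mth_inv_start_max

# Joint threshold: only rows passing both cuts are typed
df_pass <- df_invest[df_invest$bl_meets_size & df_invest$bl_meets_start, ]

# 2x2 cut table: size cut by start-month cut
tb_cut <- table(
  bl_meets_size = df_invest$bl_meets_size,
  bl_meets_start = df_invest$bl_meets_start
)
print(tb_cut)
```

Only the first toy investment passes both cuts; the other three each land in a different cell of the table, which shows why the exclusion reasons are reported as a cross-tabulation rather than a single count.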
What is excluded from tstm_invest2loan2bridge_chars
Only investments that fail the joint typing thresholds above, or whose ivars
is not in ar_st_vars_to_keep, are excluded. Investments without loans are not
excluded for lacking a loan; they appear as 1-investment-no-loan when they pass
the thresholds.
On a single gateway run, for each ivars in ar_st_vars_to_keep, an investment in
tstm_invest passes the thresholds if and only if it is present in
tstm_invest2loan2bridge_chars. The function runs an internal consistency check
(see bl_invest_pipeline_consistency_ok in the return list) and prints a PASS/FAIL
banner when verbose = TRUE.
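The if-and-only-if property can also be checked by hand with two set differences on the investment key. The sketch below is not the package's internal check: it uses toy tables, a precomputed bl_pass flag, and base-R setdiff() as a stand-in for the anti_join calls. Note that the household key is id on the invest side and hhid_Num on the chars side.

```r
# Build a composite investment key from household, asset, and spell counter
make_key <- function(hh, ivars, ctr) paste(hh, ivars, ctr, sep = "|")

# Toy invest table with a precomputed threshold flag (illustrative values)
df_invest <- data.frame(
  id = c(1, 1, 2),
  ivars = c("agg_BS_1025", "agg_BS_1021", "agg_BS_1025"),
  hh_inv_asset_ctr = c(1, 1, 1),
  bl_pass = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
# Toy final table keyed by hhid_Num
df_chars <- data.frame(
  hhid_Num = c(1, 2),
  ivars = c("agg_BS_1025", "agg_BS_1025"),
  hh_inv_asset_ctr = c(1, 1),
  stringsAsFactors = FALSE
)

df_pass <- df_invest[df_invest$bl_pass, ]
ar_key_pass <- make_key(df_pass$id, df_pass$ivars, df_pass$hh_inv_asset_ctr)
ar_key_chars <- make_key(df_chars$hhid_Num, df_chars$ivars,
                         df_chars$hh_inv_asset_ctr)

# Both directions must be empty for "pass if and only if present"
ar_pass_not_in_chars <- setdiff(ar_key_pass, ar_key_chars)
ar_chars_not_pass <- setdiff(ar_key_chars, ar_key_pass)
bl_ok <- length(ar_pass_not_in_chars) == 0 && length(ar_chars_not_pass) == 0
print(bl_ok)
```

Checking both directions matters: the first difference catches passing investments that were dropped, and the second catches rows in the final table that should have been filtered out.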
NA Synchronization: After the abc_distinct step, the function synchronizes
NA values between loan IDs and formal/informal indicators. When hh_loan_id_nd*
is NA, the corresponding forinfm4* variable is set to NA to maintain consistency
between loan identification and lender classification. This is critical for
accurate bridge-type interpretation.
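The synchronization rule can be sketched as a loop over loan-ID columns that blanks the matching lender column wherever the loan ID is missing. This is an illustration in base R, not the gateway's mutate code; the column suffixes _a and _b are hypothetical stand-ins for whatever suffixes hh_loan_id_nd* and forinfm4* carry in the roster.

```r
# Toy roster with hypothetical suffixes _a and _b (illustrative values)
df_roster <- data.frame(
  hh_loan_id_nd_a = c(1, NA),
  forinfm4_a = c("formal", "informal"),
  hh_loan_id_nd_b = c(NA, 2),
  forinfm4_b = c("informal", "formal"),
  stringsAsFactors = FALSE
)

# For each loan-ID column, find the lender column with the same suffix
ar_st_id_cols <- grep("^hh_loan_id_nd", names(df_roster), value = TRUE)
for (st_id in ar_st_id_cols) {
  st_lender <- sub("^hh_loan_id_nd", "forinfm4", st_id)
  if (st_lender %in% names(df_roster)) {
    # No loan ID means no lender classification for that slot
    df_roster[[st_lender]][is.na(df_roster[[st_id]])] <- NA
  }
}
print(df_roster)
```

After the loop, a lender label survives only in rows where the corresponding loan ID is present, so a stale "informal" or "formal" value cannot be read as evidence of a loan.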
Parameter Groups:
- Group A (Loans/Hooks/Bridges): svr_*, bl_filter_bridge_*, it_ll_grv_min, and loan duration/size/type filters control loan deduplication and bridge identification.
- Group B (Investments): fl_sd_ithres and it_thres_invest_mth_gap control investment identification from asset data.
- Group C (Linking): it_gap_LBL_IL_min controls investment-loan-bridge linker time gap tolerance.
- Group D (Typing): ar_st_vars_to_keep, fl_min_invest_size, it_mth_inv_start_*, bl_drop_* control filtering and classification into investment-loan-bridge types.
See also
Used by vignette(s) ffv_invest_loan_bridge and ffv_invest_return_bridge.
Related issue(s):
PrjThaiHFID-#32.
Author
Fan Wang, http://fanwangecon.github.io
Examples
if (FALSE) { # \dontrun{
# Run with default parameters
ls_result <- ffp_hfid_invest_loan_linked_abc_investloan_char_gateway()
# Access final investment-bridge characteristics
df_chars <- ls_result$tstm_invest2loan2bridge_chars
# Check distribution by investment variable
# (dplyr provides %>%, group_by(), and tally())
library(dplyr)
df_chars %>% group_by(ivars) %>% tally()
} # }