Loan-level file (one row per loan) with loan terms, dates, lender categorization, etc.
Format
A data frame with 20940 rows (one per loan) and 36 columns of loan attributes
- loan_id
Unique identifier for each loan record; mean=10473.179, sd=6046.384, min=1, max=20944, Q1=5236.75, median=10474.5, Q3=15709.25; total non-NA count = 20940
- S_region
Region identifier; unique values: NE (n=12112), CENTRAL (n=8828); total non-NA count = 20940
- provid_Num
Province identifier number; unique values: Buriram(NE) (n=6861), Lopburi(Central) (n=6315), Sisaket(NE) (n=5251), Chacheongsao(Central) (n=2513); total non-NA count = 20940
- vilid_Num
Village anonymized identifier number; mean=673.251, sd=244.643, min=118, max=985, Q1=439, median=716, Q3=870; total non-NA count = 20940
- hhid_Num
Household anonymized identifier number; mean=5424.94, sd=2609.484, min=1003, max=9996, Q1=3107, median=5364, Q3=7668; total non-NA count = 20940
- surveymonth
Survey month when data was collected; mean=78.241, sd=44.416, min=0, max=160, Q1=45, median=78, Q3=113; total non-NA count = 20940
- institutional
Logical indicator for institutional lender; unique values: TRUE (n=16605), FALSE (n=4335); total non-NA count = 20940
- forinfm3
Three-category lender classification (Formal/Quasi-formal/Informal); unique values: Formal (n=10868), Quasi-formal (n=5737), Informal (n=4335); total non-NA count = 20940
- formal
Logical indicator for formal lender (quasi-formal treated as informal); unique values: TRUE (n=12795), FALSE (n=8145); total non-NA count = 20940
- formalqf
Logical indicator for formal lender (quasi-formal treated as formal); unique values: TRUE (n=13873), FALSE (n=7067); total non-NA count = 20940
- G_LenderType
Type of lender providing the loan; unique values: Village Fund (n=7408), Other Non-Indi Formal or Informal (n=4659), BAAC (n=3396), Relatives (n=1363), Others (n=1349), Neighbor (n=998), MoneyLender (n=625), Agri Coop (n=577), PCG (n=501), Commercial Bank (n=64); total non-NA count = 20940
- calenderyear
Calendar year, 1998 to 2011 (the column name is spelled calenderyear); unique values: 2005 (n=2013), 2004 (n=2000), 2006 (n=1829), 2003 (n=1784), 2002 (n=1680), 2001 (n=1551), 2007 (n=1488), 2008 (n=1470), 1998 (n=1438), 2009 (n=1381), 2010 (n=1311), 2011 (n=1237), 2000 (n=933), 1999 (n=825); total non-NA count = 20940
- calendermonth
Calendar year and month as a string of the form 1998m8; 161 unique values, the 15 most frequent being: 1998m8 (n=1098), 2005m10 (n=352), 2004m10 (n=323), 2008m10 (n=273), 2006m1 (n=239), 2003m10 (n=238), 2004m1 (n=238), 2002m10 (n=234), 2001m9 (n=232), 2003m9 (n=231), 2007m10 (n=227), 2009m1 (n=223), 2003m12 (n=222), 2002m9 (n=220), 2004m9 (n=220); total non-NA count = 20940
- S_Init_Interest_Has
Variable: whether the loan carries interest; unique values: Has Interest (n=17332), No interest (n=3595); total non-NA count = 20927
- S_Init_Interest_Rate
Variable: interest rate value (percentage points); mean=8.151, sd=11.711, min=0, max=200, Q1=3, median=6, Q3=8; total non-NA count = 16915
- S_Init_Interest_MthAnnWk
Variable: period of the quoted interest rate (yearly, monthly, etc.); unique values: Yearly (n=13110), Monthly (n=3229), Other (n=758), Weekly (n=165); total non-NA count = 17262
- S_Init_Amount
Variable: initial loan amount in baht; mean=28701.324, sd=93815.657, min=63, max=7e+06, Q1=5000, median=13950, Q3=20000; total non-NA count = 20928
- S_Init_Interest_IntMthlyRat
Variable: interest rate converted to monthly fraction rate; mean=0.013, sd=0.03, min=0, max=1.17, Q1=0.005, median=0.006, Q3=0.01; total non-NA count = 16158
- G_Location
Variable: G_Location; unique values: Village (n=12585), Tambon or Amphoe (n=5795), Changwat and Out (n=2528); total non-NA count = 20908
- G_Loan_Init_Length
Variable: loan length/duration; mean=12.649, sd=13.477, min=0, max=360, Q1=7, median=12, Q3=13; total non-NA count = 18493
- G_Loan_Init_IntTotRat
Variable: total expected interest payment to principal ratio; mean=0.098, sd=0.35, min=0, max=18.6, Q1=0.01, median=0.06, Q3=0.08; total non-NA count = 20911
- G_Loan_Init_IntMthlyRat
Variable: G_Loan_Init_IntMthlyRat; mean=0.01, sd=0.076, min=0, max=9.0874996, Q1=0.003, median=0.005, Q3=0.008; total non-NA count = 17874
- G_Loan_Repaid_Length
Variable: G_Loan_Repaid_Length; mean=13.767, sd=17.153, min=1, max=160, Q1=6, median=12, Q3=13; total non-NA count = 20940
- G_Loan_Repaid_Status
Variable: G_Loan_Repaid_Status; unique values: Repaid On Time (n=14924), No Repay Date (n=2380), Not Due (n=1818), Repaid Eventually (n=1241), Past Due (n=559); total non-NA count = 20922
- G_Loan_Repaid_Amount
Variable: G_Loan_Repaid_Amount; mean=27615.408, sd=121757.406, min=0, max=13160000, Q1=3210, median=11660, Q3=21400; total non-NA count = 20940
- G_Loan_Repaid_Ratio
Variable: G_Loan_Repaid_Ratio; mean=1.022, sd=0.531, min=0, max=42.490707, Q1=1, median=1.055, Q3=1.07; total non-NA count = 20928
- G_Loan_Repaid_IntTotRat
Variable: G_Loan_Repaid_IntTotRat; mean=0.022, sd=0.531, min=-1, max=41.490707, Q1=0, median=0.055, Q3=0.07; total non-NA count = 20928
- G_Loan_Repaid_IntMthlyRat
Variable: G_Loan_Repaid_IntMthlyRat; mean=-0.008, sd=0.108, min=-1, max=3, Q1=0, median=0.005, Q3=0.008; total non-NA count = 20928
- A_Asset_Total
Variable: A_Asset_Total; mean=2360473.747, sd=9334882.36, min=5436.0918, max=144555968, Q1=422567.84, median=840814.345, Q3=1821477.1; total non-NA count = 20628
- A_Asset_Physical
Variable: A_Asset_Physical; mean=1693239.677, sd=8979261.076, min=0, max=141565648, Q1=143135.14, median=427096.13, Q3=991508.205; total non-NA count = 20628
- A_Asset_Land
Variable: A_Asset_Land; mean=1549614.621, sd=8945401.055, min=0, max=141015200, Q1=76372.977, median=333278.375, Q3=833670; total non-NA count = 20628
- DMh_popAtHome
Variable: number of individuals at home; mean=4.306, sd=1.702, min=1, max=15, Q1=3, median=4, Q3=5; total non-NA count = 18495
- EDh_gradeAtt_head
Variable: educational attainment of household head; unique values (15 most frequent shown): Primary 4 (n=11868), primary 6 (n=1496), no school (n=1228), LowSec 3 (n=849), Primary 2 (n=477), Primary 3 (n=421), primary 7 (n=354), LowSec 2 (n=143), HighSec 3 (n=140), Uni 4 (n=77), HighSec 2 (n=74), HighTech T2 L2 (n=53), HighSec 1 (n=44), HighTech T1 L2 (n=43), LowTech 3 (n=41); total non-NA count = 17400
- CPI_COUNTRY_Rural_MoC
Variable: CPI_COUNTRY_Rural_MoC; mean=74.337, sd=10.473, min=63.299999, max=96.699997, Q1=65, median=70.3, Q3=82.1; total non-NA count = 19703
- CPI_NORTHEAST_Rural_MoC
Variable: CPI_NORTHEAST_Rural_MoC; mean=72.414, sd=10.953, min=61.099998, max=96.800003, Q1=63.1, median=67.3, Q3=81.7; total non-NA count = 19703
- calenderyear_grp
Calendar year grouped into periods; unique values: 2002-2006 (n=9306), 2007-2011 (n=6887), 1998-2001 (n=4747); total non-NA count = 20940
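The rate columns listed above are related by simple arithmetic, which the following sketch illustrates on made-up rows. It assumes that a quoted rate in percentage points is turned into a monthly fraction by plain proportional scaling (no compounding), and it only handles the Yearly and Monthly periods; how Weekly and Other are treated in the packaged data is not documented here, so they are left as NA. The summary statistics above are also consistent with G_Loan_Repaid_IntTotRat being G_Loan_Repaid_Ratio minus 1 (for example, max 41.490707 versus 42.490707), and the sketch reproduces that relation. The helper to_monthly_fraction and the *_sketch columns are hypothetical names, not part of the package.

```r
# Toy rows using the documented column names; the values are made up.
loans <- data.frame(
  S_Init_Interest_Rate     = c(6, 1, 12, NA),
  S_Init_Interest_MthAnnWk = c("Yearly", "Monthly", "Yearly", NA),
  G_Loan_Repaid_Ratio      = c(1.06, 1.00, 0.50, 1.07),
  stringsAsFactors = FALSE
)

# ASSUMPTION: simple proportional conversion, no compounding.
# Periods without an entry here ("Weekly", "Other") yield NA on purpose.
months_per_period <- c(Yearly = 12, Monthly = 1)

# Hypothetical helper: percentage points per period -> fraction per month.
to_monthly_fraction <- function(rate_pct, period) {
  (rate_pct / 100) / unname(months_per_period[period])
}

loans$monthly_rate_sketch <- to_monthly_fraction(
  loans$S_Init_Interest_Rate,
  loans$S_Init_Interest_MthAnnWk
)

# Interest paid relative to principal, implied by the repaid ratio.
# A negative value means less than the principal was repaid.
loans$repaid_int_ratio_sketch <- loans$G_Loan_Repaid_Ratio - 1

print(loans)
```

Under this assumption a 6 percent yearly rate maps to 0.005 per month, which is in line with the quartiles reported for S_Init_Interest_IntMthlyRat. Compare any such reconstruction against the packaged columns before relying on it.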
Details
The packaged object stores anonymized household and geography identifiers
(column names are unchanged). Values in hhid_Num and vilid_Num are derived
from tmid_hh and related codes in data-raw/tm_key_id.rda via
data-raw/id_anonymize/ (02d_anonymize_tstm_loans_raw.R), not the raw
composite survey codes. hhid_Num is the same entity as id in
tstm_asset_loan and hhid_Num in tstm_loans_panel. True IDs are backed up
in data-raw/id_anonymize/tstm_loans_raw_true_id.csv (see that folder's README
to restore).
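Because hhid_Num identifies the same household as id in tstm_asset_loan, the two tables can be linked on that key. The sketch below shows the idea with two made-up data frames in base R. Only the key names hhid_Num and id come from the text above; the toy asset table has one row per household and a hypothetical column asset_total, whereas the real tstm_asset_loan may need additional key columns (its full structure is not described here).

```r
# Toy loan-level rows; values are made up, column names follow tstm_loans.
loans <- data.frame(
  loan_id       = 1:4,
  hhid_Num      = c(1003L, 1003L, 2050L, 3107L),
  S_Init_Amount = c(5000, 20000, 13950, 8000)
)

# HYPOTHETICAL household-level table: only the key name `id` is documented,
# `asset_total` is an illustrative column.
assets <- data.frame(
  id          = c(1003L, 2050L),
  asset_total = c(422568, 840814)
)

# Left join: keep every loan, attach household information where it exists.
merged <- merge(loans, assets,
                by.x = "hhid_Num", by.y = "id",
                all.x = TRUE)
merged <- merged[order(merged$loan_id), ]

print(merged)
```

Using all.x = TRUE keeps loans whose household has no match, so unmatched households show up as NA rather than silently dropping rows.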
Pipeline. Raw input built in data-raw/. Used by vignettes
ffv_gen_asset_loan (issue #5),
ffv_loan_terms_dist / ffv_loan_terms_dist_comm (issue #14), and
ffv_loan_overlap (issue #36).
Examples
data(tstm_loans)
ffp_preview_dataset(tstm_loans)
#>
#> ── tstm_loans ──────────────────────────────────────────────────────────────────
#> Dimensions: 20940 rows × 36 columns(5.3 Mb)
#>
#> ── Column names (36) ──
#>
#> • 1. loan_id
#> • 2. S_region
#> • 3. provid_Num
#> • 4. vilid_Num
#> • 5. hhid_Num
#> • 6. surveymonth
#> • 7. institutional
#> • 8. forinfm3
#> • 9. formal
#> • 10. formalqf
#> • 11. G_LenderType
#> • 12. calenderyear
#> • 13. calendermonth
#> • 14. S_Init_Interest_Has
#> • 15. S_Init_Interest_Rate
#> • 16. S_Init_Interest_MthAnnWk
#> • 17. S_Init_Amount
#> • 18. S_Init_Interest_IntMthlyRat
#> • 19. G_Location
#> • 20. G_Loan_Init_Length
#> • 21. G_Loan_Init_IntTotRat
#> • 22. G_Loan_Init_IntMthlyRat
#> • 23. G_Loan_Repaid_Length
#> • 24. G_Loan_Repaid_Status
#> • 25. G_Loan_Repaid_Amount
#> • 26. G_Loan_Repaid_Ratio
#> • 27. G_Loan_Repaid_IntTotRat
#> • 28. G_Loan_Repaid_IntMthlyRat
#> • 29. A_Asset_Total
#> • 30. A_Asset_Physical
#> • 31. A_Asset_Land
#> • 32. DMh_popAtHome
#> • 33. EDh_gradeAtt_head
#> • 34. CPI_COUNTRY_Rural_MoC
#> • 35. CPI_NORTHEAST_Rural_MoC
#> • 36. calenderyear_grp
#>
#> ── Summary statistics (all variables) ──
#>
#> loan_id S_region provid_Num vilid_Num
#> Min. : 1 Length :20940 Length :20940 Min. :118.0
#> 1st Qu.: 5237 N.unique : 2 N.unique : 4 1st Qu.:439.0
#> Median :10474 N.blank : 0 N.blank : 0 Median :716.0
#> Mean :10473 Min.nchar: 2 Min.nchar: 11 Mean :673.3
#> 3rd Qu.:15709 Max.nchar: 7 Max.nchar: 21 3rd Qu.:870.0
#> Max. :20944 Max. :985.0
#>
#> hhid_Num surveymonth institutional forinfm3
#> Min. :1003 Min. : 0.00 Mode :logical Length :20940
#> 1st Qu.:3107 1st Qu.: 45.00 FALSE:4335 N.unique : 3
#> Median :5364 Median : 78.00 TRUE :16605 N.blank : 0
#> Mean :5425 Mean : 78.24 Min.nchar: 6
#> 3rd Qu.:7668 3rd Qu.:113.00 Max.nchar: 12
#> Max. :9996 Max. :160.00
#>
#> formal formalqf G_LenderType calenderyear
#> Mode :logical Mode :logical Length :20940 Min. :1998
#> FALSE:8145 FALSE:7067 N.unique : 10 1st Qu.:2002
#> TRUE :12795 TRUE :13873 N.blank : 0 Median :2005
#> Min.nchar: 3 Mean :2005
#> Max.nchar: 33 3rd Qu.:2008
#> Max. :2011
#>
#> calendermonth S_Init_Interest_Has S_Init_Interest_Rate
#> Length :20940 Length :20940 Min. : 0.000
#> N.unique : 161 N.unique : 2 1st Qu.: 3.000
#> N.blank : 0 N.blank : 0 Median : 6.000
#> Min.nchar: 6 Min.nchar: 11 Mean : 8.151
#> Max.nchar: 7 Max.nchar: 12 3rd Qu.: 8.000
#> NAs : 13 Max. :200.000
#> NAs :4025
#> S_Init_Interest_MthAnnWk S_Init_Amount S_Init_Interest_IntMthlyRat
#> Length :20940 Min. : 63 Min. :0.000000
#> N.unique : 4 1st Qu.: 5000 1st Qu.:0.005000
#> N.blank : 0 Median : 13950 Median :0.005833
#> Min.nchar: 5 Mean : 28701 Mean :0.013263
#> Max.nchar: 7 3rd Qu.: 20000 3rd Qu.:0.010000
#> NAs : 3678 Max. :7000000 Max. :1.170000
#> NAs :12 NAs :4782
#> G_Location G_Loan_Init_Length G_Loan_Init_IntTotRat
#> Length :20940 Min. : 0.00 Min. : 0.00000
#> N.unique : 3 1st Qu.: 7.00 1st Qu.: 0.01004
#> N.blank : 0 Median : 12.00 Median : 0.06000
#> Min.nchar: 7 Mean : 12.65 Mean : 0.09823
#> Max.nchar: 16 3rd Qu.: 13.00 3rd Qu.: 0.08000
#> NAs : 32 Max. :360.00 Max. :18.60000
#> NAs :2447 NAs :29
#> G_Loan_Init_IntMthlyRat G_Loan_Repaid_Length G_Loan_Repaid_Status
#> Min. :0.000000 Min. : 1.00 Length :20940
#> 1st Qu.:0.003417 1st Qu.: 6.00 N.unique : 5
#> Median :0.005000 Median : 12.00 N.blank : 0
#> Mean :0.010099 Mean : 13.77 Min.nchar: 7
#> 3rd Qu.:0.008333 3rd Qu.: 13.00 Max.nchar: 17
#> Max. :9.087500 Max. :160.00 NAs : 18
#> NAs :3066
#> G_Loan_Repaid_Amount G_Loan_Repaid_Ratio G_Loan_Repaid_IntTotRat
#> Min. : 0 Min. : 0.000 Min. :-1.00000
#> 1st Qu.: 3210 1st Qu.: 1.000 1st Qu.: 0.00000
#> Median : 11660 Median : 1.055 Median : 0.05500
#> Mean : 27615 Mean : 1.022 Mean : 0.02198
#> 3rd Qu.: 21400 3rd Qu.: 1.070 3rd Qu.: 0.07000
#> Max. :13160000 Max. :42.491 Max. :41.49071
#> NAs :12 NAs :12
#> G_Loan_Repaid_IntMthlyRat A_Asset_Total A_Asset_Physical
#> Min. :-1.000000 Min. : 5436 Min. : 0
#> 1st Qu.: 0.000000 1st Qu.: 422568 1st Qu.: 143135
#> Median : 0.004615 Median : 840814 Median : 427096
#> Mean :-0.007743 Mean : 2360474 Mean : 1693240
#> 3rd Qu.: 0.007768 3rd Qu.: 1821477 3rd Qu.: 991508
#> Max. : 3.000000 Max. :144555968 Max. :141565648
#> NAs :12 NAs :312 NAs :312
#> A_Asset_Land DMh_popAtHome EDh_gradeAtt_head CPI_COUNTRY_Rural_MoC
#> Min. : 0 Min. : 1.000 Length :20940 Min. :63.30
#> 1st Qu.: 76373 1st Qu.: 3.000 N.unique : 19 1st Qu.:65.00
#> Median : 333278 Median : 4.000 N.blank : 0 Median :70.30
#> Mean : 1549615 Mean : 4.306 Min.nchar: 5 Mean :74.34
#> 3rd Qu.: 833670 3rd Qu.: 5.000 Max.nchar: 14 3rd Qu.:82.10
#> Max. :141015200 Max. :15.000 NAs : 3540 Max. :96.70
#> NAs :312 NAs :2445 NAs :1237
#> CPI_NORTHEAST_Rural_MoC calenderyear_grp
#> Min. :61.10 Length :20940
#> 1st Qu.:63.10 N.unique : 3
#> Median :67.30 N.blank : 0
#> Mean :72.41 Min.nchar: 9
#> 3rd Qu.:81.70 Max.nchar: 9
#> Max. :96.80
#> NAs :1237
#> ── Sample rows (first 6) ──
#>
#> # A tibble: 6 × 36
#> loan_id S_region provid_Num vilid_Num hhid_Num surveymonth institutional
#> <int> <chr> <chr> <int> <int> <dbl> <lgl>
#> 1 1 NE Buriram(NE) 716 1003 0 TRUE
#> 2 2 NE Buriram(NE) 716 1003 0 FALSE
#> 3 3 NE Buriram(NE) 716 1003 1 FALSE
#> 4 4 NE Buriram(NE) 716 1003 2 FALSE
#> 5 5 NE Buriram(NE) 716 1003 3 FALSE
#> 6 6 NE Buriram(NE) 716 1003 9 FALSE
#> # ℹ 29 more variables: forinfm3 <chr>, formal <lgl>, formalqf <lgl>,
#> # G_LenderType <chr>, calenderyear <dbl>, calendermonth <chr>,
#> # S_Init_Interest_Has <chr>, S_Init_Interest_Rate <dbl>,
#> # S_Init_Interest_MthAnnWk <chr>, S_Init_Amount <dbl>,
#> # S_Init_Interest_IntMthlyRat <dbl>, G_Location <chr>,
#> # G_Loan_Init_Length <dbl>, G_Loan_Init_IntTotRat <dbl>,
#> # G_Loan_Init_IntMthlyRat <dbl>, G_Loan_Repaid_Length <dbl>, …
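
After previewing the data, a common next step is to tabulate loans by lender class and period and to summarize loan size by lender class. The following is a minimal base R sketch on made-up rows that reuse the documented column names; it does not load the package, and the numbers are not taken from tstm_loans.

```r
# Toy rows; values are made up, column names follow tstm_loans.
loans <- data.frame(
  forinfm3 = c("Formal", "Formal", "Quasi-formal",
               "Informal", "Informal", "Formal"),
  calenderyear_grp = c("1998-2001", "2002-2006", "2002-2006",
                       "1998-2001", "2007-2011", "2007-2011"),
  S_Init_Amount = c(20000, 15000, 10000, 3000, NA, 50000),
  stringsAsFactors = FALSE
)

# Loan counts by lender class and period.
counts <- table(loans$forinfm3, loans$calenderyear_grp)

# Median initial amount by lender class. The formula interface of
# aggregate() drops rows with NA in S_Init_Amount (na.action = na.omit).
med_amount <- aggregate(S_Init_Amount ~ forinfm3, data = loans, FUN = median)

print(counts)
print(med_amount)
```

Medians are used here because the amount columns are heavily right-skewed (for S_Init_Amount the mean is about twice the median in the summary above).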