Stata Recode a Discrete Variable with Alternative Labels and Values Subgroups: recode, inrange, inlist (DO, more see: Fan and Stata4Econ)

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      name:  stata_recode_discrete_subset
       log:  C:\Users\fan/Stata4Econ//gen/replace/fs_recode.smcl
  log type:  smcl
 opened on:  17 Apr 2020, 21:53:30


. log on $st_logname (log already on)

. . ///-- Site Link: Fan's Project Reusable Stata Codes Table of Content > di "https://fanwangecon.github.io/" https://fanwangecon.github.io/

. di "https://fanwangecon.github.io/Stata4Econ/" https://fanwangecon.github.io/Stata4Econ/

. . ///-- File Title > global filetitle "Stata Recode a Discrete Variable with Alternative Labels and Values Subgroups: recode, inrange, inlist"

. . ///--- Load Data > set more off

. sysuse auto, clear (1978 Automobile Data)

. . /////////////////////////////////////////////////////////////////////////////// > ///--- Recode Method 1 > /////////////////////////////////////////////////////////////////////////////// > . ///--- Recode Method 1a: recode > capture drop turn_m5

. recode turn /// > (min/35 = 1 "Turn <35") /// > (36 = 2 "Turn = 36") /// > (37 = 3 "Turn = 37") /// > (38/45 = 4 "Turn 38 to 45") /// > (46/max = 5 "Turn > 45") /// > (else =. ) /// > , gen(turn_m5) (74 differences between turn and turn_m5)

. tab turn_m5

RECODE of | turn (Turn | Circle (ft.) | ) | Freq. Percent Cum. --------------+----------------------------------- Turn <35 | 16 21.62 21.62 Turn = 36 | 9 12.16 33.78 Turn = 37 | 4 5.41 39.19 Turn 38 to 45 | 39 52.70 91.89 Turn > 45 | 6 8.11 100.00 --------------+----------------------------------- Total | 74 100.00

. . ///--- Recode Method 1b: egen cut > capture drop turn_m5_cut

. egen turn_m5_cut = cut(turn), at(31, 36, 37, 38, 46, 51) label (1 missing value generated)

. tab turn_m5_cut

turn_m5_cut | Freq. Percent Cum. ------------+----------------------------------- 31- | 16 21.92 21.92 36- | 9 12.33 34.25 37- | 4 5.48 39.73 38- | 39 53.42 93.15 46- | 5 6.85 100.00 ------------+----------------------------------- Total | 73 100.00

. . capture drop turn_m7_cut

. egen turn_m7_cut = cut(turn), at(31(3)52) label

. tab turn_m7_cut

turn_m7_cut | Freq. Percent Cum. ------------+----------------------------------- 31- | 4 5.41 5.41 34- | 21 28.38 33.78 37- | 8 10.81 44.59 40- | 17 22.97 67.57 43- | 18 24.32 91.89 46- | 5 6.76 98.65 49- | 1 1.35 100.00 ------------+----------------------------------- Total | 74 100.00

. . ///--- Recode Method 1c: inrange and inlist > capture drop turn_m5_alt

. clonevar turn_m5_alt = turn

. label variable turn_m5_alt "Recode using inlist and inrange"

. replace turn_m5_alt = 1 if inrange(turn, 31, 35) (16 real changes made)

. replace turn_m5_alt = 2 if inlist(turn, 36) (9 real changes made)

. replace turn_m5_alt = 3 if inlist(turn, 37) (4 real changes made)

. replace turn_m5_alt = 4 if inrange(turn, 38, 45) (39 real changes made)

. replace turn_m5_alt = 5 if inlist(turn, 46, 48, 51) (6 real changes made)

. label define turn_m5_alt 1 "Turn <35" 2 "Turn = 36" 3 "Turn = 37" 4 "Turn 38 to 45" 5 "Turn > 45", modify

. label values turn_m5_alt turn_m5_alt

. tab turn_m5_alt

Recode using | inlist and | inrange | Freq. Percent Cum. --------------+----------------------------------- Turn <35 | 16 21.62 21.62 Turn = 36 | 9 12.16 33.78 Turn = 37 | 4 5.41 39.19 Turn 38 to 45 | 39 52.70 91.89 Turn > 45 | 6 8.11 100.00 --------------+----------------------------------- Total | 74 100.00

. . ///--- compare > tab turn_m5 turn_m5_cut

RECODE of | turn (Turn | Circle (ft.) | turn_m5_cut ) | 31- 36- 37- 38- 46- | Total --------------+-------------------------------------------------------+---------- Turn <35 | 16 0 0 0 0 | 16 Turn = 36 | 0 9 0 0 0 | 9 Turn = 37 | 0 0 4 0 0 | 4 Turn 38 to 45 | 0 0 0 39 0 | 39 Turn > 45 | 0 0 0 0 5 | 5 --------------+-------------------------------------------------------+---------- Total | 16 9 4 39 5 | 73



. tab turn_m5 turn_m5_alt

RECODE of | turn (Turn | Circle (ft.) | Recode using inlist and inrange ) | Turn <35 Turn = 36 Turn = 37 Turn 38 t Turn > 45 | Total --------------+-------------------------------------------------------+---------- Turn <35 | 16 0 0 0 0 | 16 Turn = 36 | 0 9 0 0 0 | 9 Turn = 37 | 0 0 4 0 0 | 4 Turn 38 to 45 | 0 0 0 39 0 | 39 Turn > 45 | 0 0 0 0 6 | 6 --------------+-------------------------------------------------------+---------- Total | 16 9 4 39 6 | 74



. tab turn_m5 turn_m7_cut

RECODE of | turn (Turn | Circle (ft.) | turn_m7_cut ) | 31- 34- 37- 40- 43- 46- 49- | Total --------------+-----------------------------------------------------------------------------+---------- Turn <35 | 4 12 0 0 0 0 0 | 16 Turn = 36 | 0 9 0 0 0 0 0 | 9 Turn = 37 | 0 0 4 0 0 0 0 | 4 Turn 38 to 45 | 0 0 4 17 18 0 0 | 39 Turn > 45 | 0 0 0 0 0 5 1 | 6 --------------+-----------------------------------------------------------------------------+---------- Total | 4 21 8 17 18 5 1 | 74



. . /////////////////////////////////////////////////////////////////////////////// > ///--- Recode Method 2a: Recode based on single variable, > /// slightly less typing, compose ingredients together > /////////////////////////////////////////////////////////////////////////////// > /* > Define string using local strings to avoid some retyping. > try to make variable label not longer than width limit. > */ . . //-- Set Variable Strings . global svr_newv "trunk_new"

. global svr_oldv "trunk"

. global slb_labl "this is the new version of the trunk variable"

. global slb_note "we reset this variable be grouping values 5 to 10, 11 to 13, 14 "

. global slb_note "$slb_note to 18, 20 to 22, and 23 into subgroups. We did this "

. global slb_note "$slb_note test things out for reseting variables"

. . //-- value resetting . #delimit; delimiter now ; . global slb_valv " > (min/4 = 1 "trunk <5") > (5/10 = 2 "Turn = 36") > (11/13 = 3 "Turn = 37") > (14/18 = 4 "Turn 38 to 45") > (20/22 = 5 "Turn > 45") > (23 = 5 "Turn > 45") > (else =. ) > ";

. #delimit cr delimiter now cr . . //-- recode . * generate . capture drop $svr_newv

. recode $svr_oldv $slb_valv, gen($svr_newv) (74 differences between trunk and trunk_new)

. label variable $svr_newv "$slb_labl"

. notes $svr_newv: $slb_note

. * summ . d $svr_oldv $svr_newv, f

storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- trunk int %8.0g Trunk space (cu. ft.) trunk_new int %13.0g trunk_new * this is the new version of the trunk variable

. notes $svr_oldv $svr_newv

trunk_new: 1. we reset this variable be grouping values 5 to 10, 11 to 13, 14 to 18, 20 to 22, and 23 into subgroups. We did this test things out for reseting variables

. summ $svr_oldv $svr_newv

Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- trunk | 74 13.75676 4.277404 5 23 trunk_new | 74 3.418919 1.020432 2 5

. tab $svr_oldv $svr_newv

Trunk | this is the new version of the trunk space (cu. | variable ft.) | Turn = 36 Turn = 37 Turn 38 t Turn > 45 | Total -----------+--------------------------------------------+---------- 5 | 1 0 0 0 | 1 6 | 1 0 0 0 | 1 7 | 3 0 0 0 | 3 8 | 5 0 0 0 | 5 9 | 4 0 0 0 | 4 10 | 5 0 0 0 | 5 11 | 0 8 0 0 | 8 12 | 0 3 0 0 | 3 13 | 0 4 0 0 | 4 14 | 0 0 4 0 | 4 15 | 0 0 5 0 | 5 16 | 0 0 12 0 | 12 17 | 0 0 8 0 | 8 18 | 0 0 1 0 | 1 20 | 0 0 0 6 | 6 21 | 0 0 0 2 | 2 22 | 0 0 0 1 | 1 23 | 0 0 0 1 | 1 -----------+--------------------------------------------+---------- Total | 19 15 30 10 | 74



. tab $svr_newv

this is the | new version | of the trunk | variable | Freq. Percent Cum. --------------+----------------------------------- Turn = 36 | 19 25.68 25.68 Turn = 37 | 15 20.27 45.95 Turn 38 to 45 | 30 40.54 86.49 Turn > 45 | 10 13.51 100.00 --------------+----------------------------------- Total | 74 100.00

. . . /////////////////////////////////////////////////////////////////////////////// > ///--- Recode Method 2b: same as method 2a, but do it for multiple variables loop loop > /////////////////////////////////////////////////////////////////////////////// > /* > 1. Define string using local strings to avoid some retyping. > 2. Summarize outputs iteration by iteration, verbose or not > 3. Summarize outputs at the end overall > 4. if new and old variables have the same name, understand we want to use the > same name, will relabel generate a new variable with the same variable name > and keep old variable as old_abc, where abc is the current var name > */ . global svr_newv_all ""

. foreach it_var of numlist 1 2 3 { 2. . //-- Variable by Variable Naming Settings . if (`it_var' == 1) { 3. //-- Set Variable Strings . global svr_newv "price_2m" 4. global svr_oldv "price" 5. global slb_labl "price discretized 2 levels" 6. global slb_note "reset the price variable into two groups, original variable has" 7. global slb_note "$slb_note 74 observations with 74 unique values. " 8. . //-- value resetting . #delimit; delimiter now ; . global slb_valv " > (min/6000 = 1 "price <= 6000") > (6001/max = 2 "price > 6000") > (else =. ) > "; 9. #delimit cr delimiter now cr . . //-- states verbose show or not . global bl_verbose_print = 0 10. } 11. if (`it_var' == 2) { 12. //-- Set Variable Strings . global svr_newv "price_3m" 13. global svr_oldv "price" 14. global slb_labl "price discretized 3 levels" 15. global slb_note "reset the price variable into two groups, original variable has" 16. global slb_note "$slb_note 74 observations with 74 unique values. " 17. . //-- value resetting . #delimit; delimiter now ; . global slb_valv " > (min/5500 = 1 "price <= 5500") > (5501/8500 = 2 "5501 <= price <= 8500") > (8501/max = 3 "8501 <= price") > (else =. ) > "; 18. #delimit cr delimiter now cr . . //-- states verbose show or not . global bl_verbose_print = 0 19. } 20. if (`it_var' == 3) { 21. //-- Set Variable Strings . * this is an example where I relabel and revalue names, but keep variable name . * auto keep an old version . global svr_newv "foreign" 22. global svr_oldv "foreign" 23. global slb_labl "is car domestic (relabled, previous 1 is foreign now 0)" 24. global slb_note "reseting the foreign variable previously 1 is foreign 0" 25. global slb_note "$slb_note is domestic, now 1 is domestic 0 is foreign" 26. . //-- value resetting . #delimit; delimiter now ; . global slb_valv " > (1 = 0 "foreign car") > (0 = 1 "domestic car") > (else =. ) > "; 27. #delimit cr delimiter now cr . . //-- states verbose show or not . global bl_verbose_print = 1 28. } 29. . //-- recode . di "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" 30. di "Generate the `it_var'th variable: Generates $svr_newv based on $svr_oldv" 31. di "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" 32. * generate . global svr_oldv_use "${svr_oldv}" 33. if ("$svr_newv" == "$svr_oldv") { 34. * allows for relabeling the same variable keeping name . global svr_oldv_use "_prev_${svr_oldv}" 35. clonevar _prev_${svr_oldv} = $svr_oldv 36. notes $svr_oldv_use: "this variable $svr_oldv_use is replaced by $svr_newv" 37. } 38. capture drop $svr_newv 39. recode $svr_oldv_use $slb_valv, gen($svr_newv) 40. label variable $svr_newv "$slb_labl" 41. notes $svr_newv: $slb_note 42. . //-- summarize . d $svr_newv, f 43. summ $svr_oldv_use $svr_newv 44. tab $svr_newv 45. pwcorr $svr_oldv_use $svr_newv, sig 46. if ($bl_verbose_print) { 47. d $svr_oldv_use $svr_newv, f 48. notes $svr_oldv_use $svr_newv 49. tab $svr_oldv_use $svr_newv 50. label list $svr_newv 51. } 52. . //-- Store all strings for easier later retrieval . global svr_newv_all `"$svr_newv_all $svr_newv"' 53. . } xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Generate the 1th variable: Generates price_2m based on price xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (74 differences between price and price_2m)

storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- price_2m int %13.0g price_2m * price discretized 2 levels

Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- price | 74 6165.257 2949.496 3291 15906 price_2m | 74 1.310811 .4659848 1 2

price | discretized 2 | levels | Freq. Percent Cum. --------------+----------------------------------- price <= 6000 | 51 68.92 68.92 price > 6000 | 23 31.08 100.00 --------------+----------------------------------- Total | 74 100.00

| price price_2m -------------+------------------ price | 1.0000 | | price_2m | 0.8001 1.0000 | 0.0000 | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Generate the 2th variable: Generates price_3m based on price xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (74 differences between price and price_3m)

storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- price_3m int %21.0g price_3m * price discretized 3 levels

Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- price | 74 6165.257 2949.496 3291 15906 price_3m | 74 1.581081 .7764824 1 3

price discretized 3 | levels | Freq. Percent Cum. ----------------------+----------------------------------- price <= 5500 | 44 59.46 59.46 5501 <= price <= 8500 | 17 22.97 82.43 8501 <= price | 13 17.57 100.00 ----------------------+----------------------------------- Total | 74 100.00

| price price_3m -------------+------------------ price | 1.0000 | | price_3m | 0.9085 1.0000 | 0.0000 | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Generate the 3th variable: Generates foreign based on foreign xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (74 differences between _prev_foreign and foreign)

storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- foreign byte %12.0g foreign * is car domestic (relabled, previous 1 is foreign now 0)

Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- _prev_fore~n | 74 .2972973 .4601885 0 1 foreign | 74 .7027027 .4601885 0 1

is car | domestic | (relabled, | previous 1 | is foreign | now 0) | Freq. Percent Cum. -------------+----------------------------------- foreign car | 22 29.73 29.73 domestic car | 52 70.27 100.00 -------------+----------------------------------- Total | 74 100.00

| _prev_~n foreign -------------+------------------ _prev_fore~n | 1.0000 | | foreign | -1.0000 1.0000 | 1.0000 |

storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- _prev_foreign byte %8.0g origin * Car type foreign byte %12.0g foreign * is car domestic (relabled, previous 1 is foreign now 0)

_prev_foreign: 1. "this variable _prev_foreign is replaced by foreign"

foreign: 1. reseting the foreign variable previously 1 is foreign 0 is domestic, now 1 is domestic 0 is foreign

| is car domestic | (relabled, previous 1 | is foreign now 0) Car type | foreign c domestic | Total -----------+----------------------+---------- Domestic | 0 52 | 52 Foreign | 22 0 | 22 -----------+----------------------+---------- Total | 22 52 | 74

foreign: 0 foreign car 1 domestic car

. //-- recode . di "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

. di "We just finished Generating `it_var' Variables, here is their joint summary" We just finished Generating Variables, here is their joint summary

. di "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

. d $svr_newv_all, f

storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- price_2m int %13.0g price_2m * price discretized 2 levels price_3m int %21.0g price_3m * price discretized 3 levels foreign byte %12.0g foreign * is car domestic (relabled, previous 1 is foreign now 0)

. summ $svr_newv_all

Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- price_2m | 74 1.310811 .4659848 1 2 price_3m | 74 1.581081 .7764824 1 3 foreign | 74 .7027027 .4601885 0 1

. pwcorr $svr_newv_all, sig

| price_2m price_3m foreign -------------+--------------------------- price_2m | 1.0000 | | price_3m | 0.8570 1.0000 | 0.0000 | foreign | -0.1381 -0.1233 1.0000 | 0.2406 0.2953 |

. . . ///--- End Log and to HTML > log close _all name: stata_recode_discrete_subset log: C:\Users\fan/Stata4Econ//gen/replace/fs_recode.smcl log type: smcl closed on: 17 Apr 2020, 21:53:30 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------