This is a work-in-progress website consisting of R panel data and optimization examples for Statistics/Econometrics/Economic Analysis.
Materials are gathered from various projects in which R code is used. Files are from the R4Econ repository. This is not an R package, but a list of examples in PDF/HTML/Rmd formats. REconTools is a package that can be installed, containing tools used in projects involving R.
Bullet points show which base R, tidyverse or other functions/commands are used to achieve various objectives. An effort is made to use only base R and tidyverse packages whenever possible to reduce dependencies. The goal of this repository is to make it easier to find and re-use code produced for various projects.
From other repositories: for research support toolboxes, see MATLAB toolbox, R toolbox, and Python toolbox; for code examples, see MATLAB examples, Stata examples, R examples, Python examples, and LaTeX examples; for a packaging example, see pkgtestr for developing R packages; for teaching, see intro mathematics for economists and intro statistics for undergraduates. See here for all of Fan’s public repositories.
Please contact FanWangEcon for issues or problems.
1 Array, Matrix, Dataframe
1.1 List
- Multi-dimensional Named Lists: rmd | r | pdf | html
- Initiate Empty List. Named one and two dimensional lists. List of Dataframes.
- Collapse named and unnamed lists to strings and print input code.
- r: deparse(substitute()) + vector(mode = "list", length = it_N) + names(list) <- paste0('e', seq()) + dimnames(ls2d)[[1]] <- paste0('r', seq()) + dimnames(ls2d)[[2]] <- paste0('c', seq())
- tidyr: unnest()
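A minimal base R sketch of the list operations above. The names `ls_1d`, `ls_2d`, `f_arg_name` and `ls_par` are hypothetical. The two-dimensional list is built as a matrix whose cells are list elements, with row and column names set through `dimnames`, and `deparse(substitute())` recovers the code passed as an argument.

```r
# Empty one-dimensional list named e1, e2, e3
it_N <- 3
ls_1d <- vector(mode = "list", length = it_N)
names(ls_1d) <- paste0("e", seq(it_N))

# Two-dimensional list: a 2 x 3 matrix of list cells named r1..r2 by c1..c3
it_R <- 2
it_C <- 3
ls_2d <- matrix(vector(mode = "list", length = it_R * it_C),
                nrow = it_R, ncol = it_C,
                dimnames = list(paste0("r", seq(it_R)), paste0("c", seq(it_C))))
ls_2d[[1, 2]] <- c(1, 2)   # a single cell can hold a whole vector

# Return the code used for the argument, here the variable name
f_arg_name <- function(x) deparse(substitute(x))
st_name <- f_arg_name(ls_1d)

# Collapse a named list into one string
ls_par <- list(alpha = 0.5, beta = 2)
st_par <- paste(names(ls_par), unlist(ls_par), sep = "=", collapse = ", ")
```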
1.2 Array
- Basic Arrays Operations in R: rmd | r | pdf | html
- Generate N-dimensional array of NA values, label dimension elements.
- Basic array operations in R: rep, head, tail, NA handling, etc.
- E notation.
- Get N cuts from M points.
- r: sum() + prod() + rep() + array(NA, dim=c(3, 3)) + array(NA, dim=c(3, 3, 3)) + dimnames(mn)[[3]] = paste0('k=', 0:4) + head() + tail() + na_if() + Re()
- purrr: reduce()
- Generate Special Arrays: rmd | r | pdf | html
- Generate equidistant and special log-spaced arrays.
- Generate probability mass function with non-unique and non-sorted value and probability arrays.
- Generate a set of integer sequences, with gaps in between, e.g., (1,2,3), (5), (10,11).
- r: seq() + sort() + runif() + ceiling() + sample() + apply() + do.call()
- stats: aggregate()
- String Operations: rmd | r | pdf | html
- Split, concatenate, subset, replace, and substring strings.
- Convert number to string without decimal and negative sign.
- Concatenate numeric and string arrays as a single string.
- Regular expression
- r: paste0() + paste0(round(runif(3),3), collapse=',') + sub() + gsub() + grepl() + sprintf()
- Meshgrid Matrices, Arrays and Scalars: rmd | r | pdf | html
- Mesh matrices, arrays and scalars to form a dataframe of all combinations.
- tidyr: expand_grid()
- r: expand.grid()
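The array, string and meshgrid items above can be sketched in base R as follows. All object names and values are hypothetical; `aggregate()` collapses the non-unique PMF values, `mapply()` builds the gapped integer sequences, and `expand.grid()` stands in for `tidyr::expand_grid()`. The "n"/"p" codes for the minus sign and decimal point are an assumed naming convention, not necessarily the one used in the original files.

```r
# N-dimensional NA array with labeled elements along the third dimension
mn <- array(NA, dim = c(2, 2, 5))
dimnames(mn) <- list(NULL, NULL, paste0("k=", 0:4))
mn[1, 1, "k=0"] <- 1

# Basic vector operations; 1.5e3 is E notation for 1500
ar_x <- c(2, NA, 3, 4)
fl_sum <- sum(ar_x, na.rm = TRUE)     # 9
fl_prod <- prod(ar_x, na.rm = TRUE)   # 24
ar_last2 <- tail(ar_x, 2)             # 3 4
fl_big <- 1.5e3

# Equidistant and log-spaced grids
ar_lin <- seq(0, 1, length.out = 5)
ar_log <- exp(seq(log(1), log(100), length.out = 3))   # 1, 10, 100 up to rounding

# PMF from non-unique, unsorted values: add up probabilities of equal values
df_pmf <- aggregate(prb ~ val, FUN = sum,
                    data = data.frame(val = c(3, 1, 3, 2, 1),
                                      prb = c(0.1, 0.2, 0.3, 0.25, 0.15)))

# Integer sequences with gaps: (1,2,3), (5), (10,11)
ar_int <- do.call(c, mapply(seq, c(1, 5, 10), c(3, 5, 11), SIMPLIFY = FALSE))

# Strings: collapse numbers, replace "." by "p" and "-" by "n" (hypothetical codes), format
st_draws <- paste0(round(c(0.12345, 0.6789), 3), collapse = ",")   # "0.123,0.679"
st_num <- gsub("-", "n", gsub(".", "p", as.character(-1.5), fixed = TRUE), fixed = TRUE)
st_fmt <- sprintf("%.2f", 3.14159)                                  # "3.14"

# All combinations of two arrays, plus a scalar as a constant column
df_grid <- expand.grid(k = c(1, 2), z = c(0.5, 1, 1.5))
df_grid$r <- 0.05
```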
1.3 Matrix
- Matrix Basics: rmd | r | pdf | html
- Generate and combine NA, fixed and random matrices. Name columns and rows.
- Sort all rows and all columns of a matrix.
- Replace values outside min and max in matrix by NA values.
- R: rep() + rbind() + matrix(NA) + matrix(NA_real_) + matrix(NA_integer_) + colnames() + rownames() + t(apply(mt, 1, sort)) + apply(mt, 2, sort) + colMeans + rowMeans + which()
- Linear Algebra Operations: rmd | r | pdf | html
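A short base R sketch of the matrix basics listed above, using a small hypothetical 2 x 3 matrix. Note that `apply(mt, 1, sort)` returns each sorted row as a column, hence the `t()`; `unname()` is used because `sort()` keeps and reorders element names.

```r
# Fixed 2 x 3 matrix with named rows and columns
mt <- matrix(c(9, 1, 2,
               3, 7, 8), nrow = 2, byrow = TRUE)
rownames(mt) <- paste0("r", 1:2)
colnames(mt) <- paste0("c", 1:3)

# Stack the fixed matrix, an NA matrix, and one row of uniform draws
mt_na <- matrix(NA_real_, nrow = 2, ncol = 3)
mt_all <- rbind(mt, mt_na, matrix(runif(3), nrow = 1))

# Sort within rows (transpose back) and within columns
mt_row_sorted <- t(apply(unname(mt), 1, sort))
mt_col_sorted <- apply(unname(mt), 2, sort)

# Replace values outside [2, 8] by NA
mt_bd <- mt
mt_bd[mt_bd < 2 | mt_bd > 8] <- NA

# Column means: 6, 4, 5
ar_colmean <- colMeans(mt)
```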
1.4 Regular Expression, Date, etc.
- R String Regular Expression (Regex): rmd | r | pdf | html
- Regular expression.
- Find strings that do or do not contain certain strings, numbers, and symbols.
- r: grepl()
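A few `grepl()` patterns for the "contain / do not contain" selections above. The example strings are hypothetical; `[[:punct:]]` is the POSIX class for symbols such as `_` and `-`.

```r
ar_st <- c("apple_1", "banana", "cherry-2", "date")

ar_digit <- ar_st[grepl("[0-9]", ar_st)]          # contains a digit
ar_no_an <- ar_st[!grepl("an", ar_st)]            # does not contain "an"
ar_punct <- ar_st[grepl("[[:punct:]]", ar_st)]    # contains a symbol
ar_abc <- ar_st[grepl("^[a-c]", ar_st)]           # starts with a, b or c
```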
2 Manipulate and Summarize Dataframes
2.1 Variables in Dataframes
- Generate Tibble Dataframes from Matrix and List: rmd | r | pdf | html
- Generate tibble data from two dimensional named lists, unlist for exporting.
- Generate tibble dataframe, rename tibble variables, generate tibble row and column names.
- Export tibble table to csv file with date and time stamp in file name.
- Rename numeric sequential columns with string prefix and suffix.
- base: Sys.time() + format() + sample(LETTERS, 5, replace = TRUE) + is.list
- dplyr: as_tibble(mt) + rename_all(~c(ar_names)) + rename_at(vars(starts_with("xx")), funs(str_replace(., "yy", "yyyy"))) + rename_at(vars(num_range('', ar_it)), funs(paste0(st, .))) + rowid_to_column() + row_number() + min_rank() + dense_rank() + mutate_if()
- base: colnames + rownames
- Interact and Cut Variables to Generate Categorical Variables: rmd | r | pdf | html
- Convert row names to a variable.
- Generate categorical variable from a continuous variable.
- Convert numeric variables to factor variables, generate interaction variables (joint factors), and label factors with descriptive words.
- Graph MPG and quarter-mile time (qsec) from the mtcars dataset over joint shift-type (am) and engine-type (vs) categories.
- r: cut(breaks = ar, labels = ar, right = FALSE)
- tibble: rownames_to_column()
- forcats: as_factor() + fct_recode() + fct_cross()
- Randomly Draw Subsets of Rows from Matrix: rmd | r | pdf | html
- Given a matrix, randomly sample rows, or select rows whose random draw is below a threshold.
- r: rnorm() + sample() + df[sample(dim(df)[1], it_M, replace=FALSE),]
- dplyr: case_when() + mutate(var = case_when(rnorm(n(),mean=0,sd=1) < 0 ~ 1, TRUE ~ 0)) %>% filter(var == 1)
- Generate Variables Conditional on Other Variables, Categorical from Continuous: rmd | r | pdf | html
- Use case_when to generate elseif conditional variables: NA, approximate difference, etc.
- Generate Categorical Variables from Continuous Variables.
- dplyr: case_when() + na_if() + mutate(var = na_if(case_when(rnorm(n())< 0 ~ -99, TRUE ~ mpg), -99))
- r: e-notation + all.equal() + isTRUE(all.equal(a,b,tol)) + is.na() + NA_real_ + NA_character_ + NA_integer_
- R Tibble Dataframe String Manipulations: rmd | r | pdf | html
- There are multiple CEV files, each with the same file structure but simulated with different parameters; gather a subset of columns from the different files, and assign the correct attributes based on the CSV file names.
- r: cbind(ls_st, ls_st) + as_tibble(mt_st)
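A base R sketch of several variable operations from this section: renaming sequential columns, exporting with a time stamp, `cut()` with `right = FALSE`, labeled and interacted factors on mtcars, and random row selection. Base `interaction()` stands in for `forcats::fct_cross()`, and the file name prefix, break points and labels are hypothetical choices.

```r
# Matrix to dataframe; rename sequential columns with a prefix and a suffix
df_mt <- as.data.frame(matrix(1:6, nrow = 2))           # columns V1, V2, V3
colnames(df_mt) <- paste0("var", seq_len(ncol(df_mt)), "_lvl")

# Export with a date-time stamp in the file name (to a temporary folder here)
st_file <- paste0("df_", format(Sys.time(), "%Y%m%d_%H%M%S"), ".csv")
spth_file <- file.path(tempdir(), st_file)
write.csv(df_mt, spth_file, row.names = FALSE)

# Row names to a variable; continuous to categorical: [0,20), [20,25), [25,Inf)
df_car <- mtcars
df_car$car <- rownames(df_car)
df_car$mpg_cat <- cut(df_car$mpg, breaks = c(0, 20, 25, Inf), right = FALSE,
                      labels = c("low", "mid", "high"))

# Labeled factors and their interaction (joint shift-type and engine-type)
df_car$am_f <- factor(df_car$am, levels = c(0, 1), labels = c("automatic", "manual"))
df_car$vs_f <- factor(df_car$vs, levels = c(0, 1), labels = c("v-shaped", "straight"))
df_car$am_vs <- interaction(df_car$am_f, df_car$vs_f, sep = " & ")
df_mean <- aggregate(cbind(mpg, qsec) ~ am_vs, data = df_car, FUN = mean)

# Draw M = 3 rows without replacement, or keep rows whose own draw is below 0
set.seed(123)
mt_rand <- matrix(rnorm(20), nrow = 10)
mt_sub <- mt_rand[sample(nrow(mt_rand), 3, replace = FALSE), , drop = FALSE]
mt_keep <- mt_rand[rnorm(nrow(mt_rand)) < 0, , drop = FALSE]
```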
2.2 Counting Observations
- R Example Counting, Tabulation, and Cross Tabulation: rmd | r | pdf | html
- Uncount to generate panel skeleton from years in survey
- dplyr: tally() + spread() + distinct() + uncount(yr_n) + group_by() + mutate(yr = row_number() + start_yr)
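A base R analogue of `uncount()` for building a panel skeleton, with hypothetical survey data. Each row is repeated `yr_n` times and a within-id counter is turned into calendar years. Here the first year equals `start_yr`; the dplyr line above (`row_number() + start_yr`) would instead start one year later.

```r
df_svy <- data.frame(id = c(1, 2), start_yr = c(2000, 2003), yr_n = c(3, 2))

# Repeat each row yr_n times
df_panel <- df_svy[rep(seq_len(nrow(df_svy)), df_svy$yr_n), c("id", "start_yr")]
rownames(df_panel) <- NULL

# Within-id counter 1, 2, ... converted to years starting at start_yr
df_panel$yr <- df_panel$start_yr + ave(df_panel$id, df_panel$id, FUN = seq_along) - 1

# Tabulate observations per id
tb_count <- table(df_panel$id)
```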
2.3 Sorting, Indexing, Slicing
- Sorted Index, Interval Index and Expand Value from One Row: rmd | r | pdf | html
- Sort and generate index for rows
- Generate negative and positive index based on deviations
- Populate Values from one row to other rows
- dplyr: arrange() + row_number() + mutate(lowest = min(Sepal.Length)) + case_when(row_number()==x ~ Sepal.Length) + mutate(Sepal.New = Sepal.Length[Sepal.Index == 1])
- R Within-group Ascending and Descending Sort, Selection, and Differencing: rmd | r | pdf | html
- Sort a dataframe by multiple variables, some in descending order.
- Select observations with the highest M values from within N groups (top scoring students from each class).
- dplyr: arrange(a, b, desc(c)) + group_by() + lag() + lead() + slice_head(n=1)
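The within-group sort, lag and top-M selection above can be sketched in base R, with a negated numeric key standing in for `desc()` and `ave()` for grouped operations. The class and score data are hypothetical.

```r
df <- data.frame(class = c("a", "a", "a", "b", "b", "b"),
                 student = 1:6,
                 score = c(70, 95, 88, 60, 99, 85))

# Sort by class ascending, then score descending
df_sorted <- df[order(df$class, -df$score), ]

# Within-class rank and lagged score
df_sorted$rank <- ave(df_sorted$score, df_sorted$class, FUN = seq_along)
df_sorted$score_lag <- ave(df_sorted$score, df_sorted$class,
                           FUN = function(x) c(NA, head(x, -1)))

# Top M = 2 students in each class
df_top <- df_sorted[df_sorted$rank <= 2, ]
```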
2.4 Advanced Group Aggregation
- Cummean Test, Cumulative Mean within Group: rmd | r | pdf | html
- There is a dataframe with a grouping variable and a statistic sorted by another within-group variable; calculate the cumulative mean of that statistic.
- dplyr: cummean() + group_by(id, isna = is.na(val)) + mutate(val_cummean = ifelse(isna, NA, cummean(val)))
- Count Unique Groups and Mean within Groups: rmd | r | pdf | html
- Unique groups defined by multiple values and count obs within group.
- Mean, sd, observation count for non-NA within unique groups.
- dplyr: group_by() + summarise(n()) + summarise_if(is.numeric, funs(mean = mean(., na.rm = TRUE), n = sum(is.na(.)==0)))
- By Groups, One Variable All Statistics: rmd | r | pdf | html
- Pick stats, overall, and by multiple groups, stats as matrix or wide row with name=(ctsvar + catevar + catelabel).
- tidyr: group_by() + summarize_at(, funs()) + rename(!!var := !!sym(var)) + mutate(!!var := paste0(var, 'str', !!!syms(vars))) + gather() + unite() + spread(varcates, value)
- By within Individual Groups Variables, Averages: rmd | r | pdf | html
- By Multiple within Individual Groups Variables.
- Averages for all numeric variables within all groups of all group variables. Long to Wide to very Wide.
- tidyr: gather() + group_by() + summarise_if(is.numeric, funs(mean(., na.rm = TRUE))) + mutate(all_m_cate = paste0(variable, '_c', value)) + unite() + spread()
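A base R sketch of two aggregations from this section, on hypothetical data: a within-group cumulative mean that skips NA rows (mirroring the `group_by(id, isna)` trick above), and counts, means, standard deviations and non-NA counts within unique groups defined by two variables.

```r
df_cm <- data.frame(id = c(1, 1, 1, 2, 2), val = c(2, NA, 4, 1, 3))

# Within-id cumulative mean over non-NA values; NA rows stay NA
df_cm$val_cummean <- NA_real_
ar_bl_ok <- !is.na(df_cm$val)
df_cm$val_cummean[ar_bl_ok] <- ave(df_cm$val[ar_bl_ok], df_cm$id[ar_bl_ok],
                                   FUN = function(x) cumsum(x) / seq_along(x))

# Unique groups from two variables: row counts, then mean, sd and non-NA count of x
df_g <- data.frame(g1 = c("a", "a", "b", "b", "b"),
                   g2 = c(1, 1, 1, 2, 2),
                   x  = c(1, NA, 3, 4, 6))
df_n <- aggregate(x ~ g1 + g2, data = df_g, FUN = length, na.action = na.pass)
df_stats <- aggregate(x ~ g1 + g2, data = df_g, na.action = na.pass,
                      FUN = function(v) c(mean = mean(v, na.rm = TRUE),
                                          sd = sd(v, na.rm = TRUE),
                                          n = sum(!is.na(v))))
```

In `df_stats`, the column `x` is a matrix with columns `mean`, `sd` and `n`.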
2.5 Distributional Statistics
- Tibble Basics: rmd | r | pdf | html
- input multiple variables with comma separated text strings
- quantitative/continuous and categorical/discrete variables
- histogram and summary statistics
- tibble: ar_one <- c(107.72,101.28) + ar_two <- c(101.72,101.28) + mt_data <- cbind(ar_one, ar_two) + as_tibble(mt_data)
2.6 Summarize Multiple Variables
- Apply the Same Function over Columns and Row Groups: rmd | r | pdf | html
- Compute row-specific quantiles, based on values across columns within each row.
- Sum values within-row across multiple columns, ignoring NA.
- Sum values within-group across multiple rows for matched columns, ignoring NA.
- Replace NA values in selected columns by alternative values.
- r: rowSums() + cumsum() + gsub() + mutate_at(vars(matches()), .funs = list(gs = ~sum(.))) + mutate_at(vars(contains()), .funs = list(cumu = ~cumsum(.))) + rename_at(vars(contains()), list(~gsub("M", "", .)))
- dplyr: group_by(across(one_of(ar_st_vars))) + mutate(across(matches(), func)) + rename_at() + mutate_at() + rename_at(vars(starts_with()), funs(str_replace(., "v", "var"))) + mutate_at(vars(one_of()), list(~replace_na(., 99)))
- purrr: reduce()
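A base R sketch of the row and group operations above, on hypothetical data: within-row sums and medians across selected columns ignoring NA, within-group sums of matched columns via `rowsum()`, and replacement of NA by 99 in the selected columns.

```r
df <- data.frame(id = c(1, 1, 2), v1 = c(1, NA, 3), v2 = c(4, 5, NA))
ar_vars <- c("v1", "v2")

# Within-row sum across columns, ignoring NA
df$v_sum <- rowSums(df[, ar_vars], na.rm = TRUE)

# Row-specific median across the selected columns
df$v_med <- apply(df[, ar_vars], 1, median, na.rm = TRUE)

# Within-group sums of matched columns, ignoring NA (one row per id)
mt_grp_sum <- as.matrix(rowsum(df[, ar_vars], df$id, na.rm = TRUE))

# Replace NA values in the selected columns by 99
df[ar_vars][is.na(df[ar_vars])] <- 99
```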
3 Functions
3.1 Dataframe Mutate
- Nonlinear Function of Scalars and Arrays over Rows: rmd | r | pdf | html
- Five methods to evaluate scalar nonlinear function over matrix.
- Evaluate non-linear function with scalar from rows and arrays as constants.
- r: .$fl_A + fl_A = `$`(., 'fl_A') + .[[svr_fl_A]]
- dplyr: rowwise() + mutate(out = funct(inputs))
- Evaluate Functions over Rows of Meshes Matrices: rmd | r | pdf | html
- Mesh states and choices together and evaluate many matrices rowwise.
- Cumulative sum over multiple variables.
- Rename various variables with a common prefix and suffix appended.
- r: ffi <- function(fl_A, ar_B)
- tidyr: expand_grid() + rowwise() + df %>% rowwise() %>% mutate(var = ffi(fl_A, ar_B))
- ggplot2: geom_line() + facet_wrap() + geom_hline() + facet_wrap(. ~ var_id, scales = 'free') + geom_hline(yintercept=0, linetype="dashed", color="red", size=1)
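A base R sketch of evaluating a nonlinear function row by row, where `mapply()` with `MoreArgs` passes an array constant whole to every row, followed by a state-choice mesh evaluated rowwise with the best choice kept per state. The functions `ffi` (with a different signature from the one listed above) and `f_util` are hypothetical.

```r
# Row scalars fl_A, fl_B and an array constant ar_C shared by all rows
df_row <- data.frame(fl_A = c(1, 2, 3), fl_B = c(1, exp(1), exp(2)))
ar_C <- c(0.25, 0.75)
ffi <- function(fl_A, fl_B, ar_C) fl_A * log(fl_B) + sum(ar_C^2)
df_row$out <- mapply(ffi, df_row$fl_A, df_row$fl_B, MoreArgs = list(ar_C = ar_C))

# Mesh states and choices, evaluate each row, keep the best choice per state
df_mesh <- expand.grid(state = c(1, 2), choice = c(0.1, 0.5, 0.9))
f_util <- function(state, choice) log(state * choice) + log(state * (1 - choice))
df_mesh$util <- mapply(f_util, df_mesh$state, df_mesh$choice)
df_best <- df_mesh[df_mesh$util == ave(df_mesh$util, df_mesh$state, FUN = max), ]
```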
3.2 Dataframe Do Anything
- Dataframe Row to Array (Mx1 by N) to (MxQ by N+1): rmd | r | pdf | html
- Generate row-value-specific arrays of varying length, and stack the expanded dataframe.
- Given row-specific information, generate row-specific arrays that expand matrix.
- dplyr: do() + unnest() + left_join() + df %>% group_by(ID) %>% do(inc = rnorm(.$Q, mean=.$mean, sd=.$sd)) %>% unnest(c(inc))
- Simulate country-specific wage draws and compute country wage GINIs: Dataframe (Mx1 by N) to (MxQ by N+1) to (Mx1 by N+1): rmd | r | pdf | html
- Define attributes for M groups across N variables, simulate up to Q observations for each of the M groups, then compute M-specific statistics based on the sample of observations within each M.
- Start with a matrix that is (Mx1 by N); expand this to (MxQ by N+1), where the additional column contains the MxQ-specific variable; compute statistics for each M based on the Q observations within M, and then present an (Mx1 by N+1) dataframe.
- dplyr: group_by(ID) + do(inc = rnorm(.$N, mean=.$mn, sd=.$sd)) + unnest(c(inc)) + left_join(df, by="ID")
- Dataframe Subset to Dataframe (MxP by N) to (MxQ by N+Z-1): rmd | r | pdf | html
- Group by, using the grouped mini dataframes as function inputs. Stack the output dataframes with group id.
- dplyr: group_by() + do() + unnest()
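A base R sketch of the (Mx1 by N) to (MxQ by N+1) to (Mx1 by N+1) pattern: each row's parameters generate Q normal draws, the blocks are stacked, and a group statistic (here the mean) is merged back. The IDs, Q values and distribution parameters are hypothetical; `lapply()` plus `do.call(rbind, ...)` replaces `do()` and `unnest()`.

```r
set.seed(8)
df <- data.frame(ID = c("A", "B"), Q = c(50, 80), mn = c(10, 20), sd = c(1, 5))

# (Mx1 by N) to (MxQ by N+1): one block of Q draws per row
ls_df <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(ID = df$ID[i], inc = rnorm(df$Q[i], mean = df$mn[i], sd = df$sd[i]))
})
df_long <- do.call(rbind, ls_df)

# Back to one row per ID, adding a statistic computed from the draws
df_stat <- aggregate(inc ~ ID, data = df_long, FUN = mean)
df_out <- merge(df, df_stat, by = "ID")
```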
3.3 Apply and pmap
- Apply and Sapply function over arrays and rows: rmd | r | pdf | html
- Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
- Get same results using apply and sapply with defined and anonymous functions.
- Convert list of list to table.
- r: do.call() + as_tibble(do.call(rbind,ls)) + apply(mt, 1, func) + sapply(ls_ar, func, ar1, ar2)
- Mutate rowwise, mutate pmap, and rowwise do unnest: rmd | r | pdf | html
- Evaluate function f(x_i,y_i,c), where c is a constant and x and y vary over each row of a matrix, with index i indicating rows.
- Get same results using various types of mutate rowwise, mutate pmap and rowwise do unnest.
- dplyr: rowwise() + do() + unnest()
- purrr: pmap(func)
- r: unlist()
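A base R sketch of evaluating f(x_i, y_i, c) over the rows of a matrix with `apply()` and with `sapply()` over row indices, which give the same result, followed by converting a list of lists into a table with `do.call(rbind, ...)`. The function `f_xyc` and the data are hypothetical.

```r
mt <- cbind(x = c(1, 2, 3), y = c(4, 5, 6))
fl_c <- 10
f_xyc <- function(x, y, c) x * y + c

# apply over rows with an anonymous function
ar_apply <- apply(mt, 1, function(ar_row) f_xyc(ar_row[["x"]], ar_row[["y"]], fl_c))

# sapply over row indices gives the same values: 14, 20, 28
ar_sapply <- sapply(seq_len(nrow(mt)), function(i) f_xyc(mt[i, "x"], mt[i, "y"], fl_c))

# List of lists to a table, one row per inner list
ls_ls <- list(list(a = 1, b = 2), list(a = 3, b = 4))
mt_tab <- do.call(rbind, lapply(ls_ls, unlist))
```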
4 Multi-dimensional Data Structures
4.1 Generate, Gather, Bind and Join
- R dplyr Group by Index and Generate Panel Data Structure: rmd | r | pdf | html
- Build skeleton panel frame with N observations and T periods with gender and height.
- Generate group Index based on a list of grouping variables.
- r: runif() + rnorm() + rbinom(n(), 1, 0.5) + cumsum()
- dplyr: group_by() + row_number() + ungroup() + one_of() + mutate(var = (row_number()==1)*1)
- tidyr: uncount()
- R DPLYR Join Multiple Dataframes Together: rmd | r | pdf | html
- Join dataframes together with one or multiple keys. Stack dataframes together.
- dplyr: filter() + rename(!!sym(vsta) := !!sym(vstb)) + mutate(var = rnorm(n())) + left_join(df, by=(c('id'='id', 'vt'='vt'))) + left_join(df, by=setNames(c('id', 'vt'), c('id', 'vt'))) + bind_rows()
- R Gather Data Columns from Multiple CSV Files: rmd | r | pdf | html
- There are multiple CEV files, each with the same file structure but simulated with different parameters; gather a subset of columns from the different files, and assign the correct attributes based on the CSV file names.
- Separate the numeric and string components of a string variable's values.
- r: file() + writeLines() + readLines() + close() + gsub() + read.csv() + do.call(bind_rows, ls_df) + apply()
- tidyr: separate()
- regex: (?<=[A-Za-z])(?=[-0-9])
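A base R sketch of multi-key joins and of gathering columns from several CSV files. `merge(..., all.x = TRUE)` plays the role of `left_join()`, and `by.x`/`by.y` handle keys named differently in the two tables. The `sim_rho*.csv` files are hypothetical stand-ins written to a temporary folder, and the parameter value is recovered from each file name with `sub()`.

```r
df_a <- data.frame(id = c(1, 1, 2), vt = c(1, 2, 1), x = c(10, 11, 20))
df_b <- data.frame(id = c(1, 2, 2), vt = c(1, 1, 2), y = c("p", "q", "r"))

# Left join on two keys; unmatched rows get NA
df_left <- merge(df_a, df_b, by = c("id", "vt"), all.x = TRUE)

# Keys with different names in the two tables
df_c <- data.frame(person = c(1, 2), age = c(30, 40))
df_age <- merge(df_a, df_c, by.x = "id", by.y = "person", all.x = TRUE)

# Hypothetical simulation output, one CSV file per parameter value
spth_dir <- file.path(tempdir(), "sim_csv")
dir.create(spth_dir, showWarnings = FALSE)
for (fl_rho in c(0.5, 1.5)) {
  df_sim <- data.frame(a = 1:2, b = fl_rho * (1:2), c = 0)
  write.csv(df_sim, file.path(spth_dir, paste0("sim_rho", fl_rho, ".csv")), row.names = FALSE)
}

# Read a subset of columns from each file, tag rows with the parameter, stack
ar_spth <- list.files(spth_dir, pattern = "^sim_rho.*\\.csv$", full.names = TRUE)
ls_df <- lapply(ar_spth, function(spth) {
  df <- read.csv(spth)[, c("a", "b")]
  df$rho <- as.numeric(sub("^sim_rho(.*)\\.csv$", "\\1", basename(spth)))
  df
})
df_all <- do.call(rbind, ls_df)
```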
4.2 Wide and Long
- Convert Table from Long to Wide with dplyr: rmd | r | pdf | html
- Long attendance roster to wide roster and calculate cumulative attendance by each day for students.
- Convert long roster with attendance and test-scores to wide.
- tidyr: pivot_wider(id_cols = c(v1), names_from = v2, names_prefix = "id", names_sep = "_", values_from = c(v3, v4))
- dplyr: mutate(var = case_when(rnorm(n()) < 0 ~ 1, TRUE ~ 0)) + rename_at(vars(num_range('', ar_it)), list(~paste0(st_prefix, . , ''))) + mutate_at(vars(contains(str)), list(~replace_na(., 0))) + mutate_at(vars(contains(str)), list(~cumsum(.)))
- Convert Table from Wide to Long with dplyr: rmd | r | pdf | html
- Given a matrix of values with row and column labels, create a table where the unit of observation is the row-column category pair, and the values in the matrix are stored in a single variable.
- Reshape two sets of variables from wide to long, with two categorical variables added to the wide table.
- tidyr: pivot_longer(cols = starts_with('zi'), names_to = c('zi'), names_pattern = paste0("zi(.)"), values_to = "ev") + pivot_longer(cols = matches('a line b'), names_to = c('va', 'vb'), names_pattern = paste0("(.)_(.)"), values_to = "ev")
- dplyr: left_join()
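A base R sketch of both reshaping directions, on hypothetical data. `xtabs()` turns a long attendance roster into a student-by-day table (absent days become 0), cumulative attendance follows from `cumsum()` along each row, and `as.data.frame(as.table(...))` turns a labeled matrix into one row per row-column cell. This is an alternative to `pivot_wider()`/`pivot_longer()`, not a reproduction of them.

```r
# Long to wide: one row per student, one column per day
df_att <- data.frame(student = c("s1", "s1", "s2", "s2", "s2"),
                     day = c(1, 3, 1, 2, 3),
                     present = 1)
mt_wide <- unclass(xtabs(present ~ student + day, data = df_att))

# Cumulative attendance by each day
mt_cum <- t(apply(mt_wide, 1, cumsum))

# Wide to long: labeled matrix to one observation per (row, col) cell
mt_lab <- matrix(1:6, nrow = 2,
                 dimnames = list(row = c("r1", "r2"), col = c("c1", "c2", "c3")))
df_long <- as.data.frame(as.table(mt_lab), responseName = "value")
```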
4.3 Within Panel Comparisons and Statistics
- Find Closest Values Along Grids: rmd | r | pdf | html
- Given an array (matrix) of values, find the index of the values closest to another value.
- r: do.call(bind_rows, ls_df)
- dplyr: left_join(tb, by=(c('vr_a'='vr_a', 'vr_b'='vr_b')))
- Cross-group Within-time and Cross-time Within-group Statistics: rmd | r | pdf | html
- Compute relative values across countries at each time, and relative values within country across time.
- dplyr: arrange(v1, v2) %>% group_by(v1) %>% mutate(stats := v3/first(v3))
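A base R sketch of the two comparisons in this section, on hypothetical data: the index of the closest grid point for each value via `which.min(abs(...))`, and within-country values relative to each country's first year, as well as within-year values relative to the cross-country mean, via `ave()`.

```r
# Closest grid point for each value
ar_grid <- seq(0, 1, by = 0.25)     # 0, 0.25, 0.5, 0.75, 1
ar_val <- c(0.1, 0.6, 0.9)
ar_idx <- sapply(ar_val, function(fl_v) which.min(abs(ar_grid - fl_v)))
ar_closest <- ar_grid[ar_idx]

# Relative values within country across time, and across countries within year
df_gdp <- data.frame(country = c("A", "A", "B", "B"),
                     year = c(2001, 2000, 2000, 2001),
                     gdp = c(110, 100, 50, 60))
df_gdp <- df_gdp[order(df_gdp$country, df_gdp$year), ]
df_gdp$gdp_rel_first <- ave(df_gdp$gdp, df_gdp$country, FUN = function(x) x / x[1])
df_gdp$gdp_rel_mean <- df_gdp$gdp / ave(df_gdp$gdp, df_gdp$year, FUN = mean)
```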
4.4 Join and Merge Files Together by Keys
- Mesh join: rmd | r | pdf | html
- Full join: expand each of multiple dataframe rows with the same set of expansion rows and columns.
- dplyr: full_join()
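In base R, a mesh join (every row of one table paired with every row of another) can be sketched with `merge(..., by = NULL)`, which returns the Cartesian product. The tables below are hypothetical.

```r
df_id <- data.frame(id = c(1, 2), wealth = c(10, 20))
df_exp <- data.frame(t = 1:3, weight = c(0.2, 0.3, 0.5))

# Each id row is expanded with the same three expansion rows
df_mesh <- merge(df_id, df_exp, by = NULL)
df_mesh <- df_mesh[order(df_mesh$id, df_mesh$t), ]
```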
5 Linear Regression
5.1 Linear and Polynomial Fitting
- Find Best Fit of Curves Through Points: rmd | r | pdf | html
- There are three x and y points, find the quadratic curve that fits through them exactly.
- There are N sets of x and y points, find the Mth order polynomial fit by regressing y on poly(x, M).
- stats: lm(y ~ poly(x, 2), data=df) + summary.lm(rs) + predict(rs)
- Fit a Time Series with Polynomial and Analytical Expressions for Coefficients: rmd | r | pdf | html
- Given a time series of data points from a polynomial data generating process, solve for the polynomial coefficients.
- Mth derivative of Mth order polynomial is time invariant, use functions of differences of differences of differences to identify polynomial coefficients analytically.
- R: matrix multiplication
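A sketch of the exact quadratic fit through three points, using hypothetical points on y = x^2 + 1. It is done both with `lm()` and by solving the 3 x 3 linear system, followed by the difference idea for time series: the second difference of a quadratic is constant and equals 2a. `I(x^2)` is used so that the coefficients are on the raw scale; `poly(x, 2)` would report orthogonal-polynomial coefficients instead, with identical fitted values.

```r
# Three points on y = x^2 + 1
df <- data.frame(x = c(1, 2, 3), y = c(2, 5, 10))

# Exact quadratic through the points via regression
rs <- lm(y ~ x + I(x^2), data = df)
ar_coef_lm <- unname(coef(rs))                          # approximately 1, 0, 1
fl_pred <- predict(rs, newdata = data.frame(x = 4))     # approximately 17

# Same coefficients by solving the linear system directly
ar_coef_solve <- solve(cbind(1, df$x, df$x^2), df$y)

# Quadratic time series y_t = 3t^2 + 2t + 1: second differences all equal 2 * 3
ar_t <- 1:6
ar_y <- 3 * ar_t^2 + 2 * ar_t + 1
ar_d2 <- diff(ar_y, differences = 2)
```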
5.2 OLS and IV
- IV/OLS Regression: rmd | r | pdf | html
- R Instrumental Variables and Ordinary Least Square Regression store all Coefficients and Diagnostics as Dataframe Row.
- AER: library(AER) + summary(ivreg(as.formula()), diagnostics = TRUE)
- M Outcomes and N RHS Alternatives: rmd | r | pdf | html
- There are M outcome variables and N alternative explanatory variables. Regress all M outcome variables on N endogenous/independent right hand side variables one by one, with controls and/or IVs, collect coefficients.
- dplyr: bind_rows(lapply(listx, function(x) bind_rows(lapply(listy, regf.iv)))) + starts_with() + ends_with() + reduce(full_join)
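An OLS-only sketch of the M outcomes by N alternative regressors loop, using mtcars. The outcome, regressor and control choices are hypothetical, and IV estimation (the AER line above) is left out. `reformulate()` builds each formula from strings, and the regressor's coefficient and standard error are collected into one dataframe row per regression.

```r
ar_st_y <- c("mpg", "qsec")   # M outcomes
ar_st_x <- c("hp", "wt")      # N alternative right-hand-side variables
ar_st_ctrl <- c("am")         # control included in every regression

df_coef <- do.call(rbind, lapply(ar_st_y, function(st_y) {
  do.call(rbind, lapply(ar_st_x, function(st_x) {
    rs <- lm(reformulate(c(st_x, ar_st_ctrl), response = st_y), data = mtcars)
    data.frame(y = st_y, x = st_x,
               coef = unname(coef(rs)[st_x]),
               se = unname(sqrt(diag(vcov(rs)))[st_x]))
  }))
}))
```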
5.3 Decomposition
- Regression Decomposition: rmd | r | pdf | html
- Post multiple regressions, fraction of outcome variables’ variances explained by multiple subsets of right hand side variables.
- dplyr: gather() + group_by(var) + mutate_at(vars, funs(mean = mean(.))) + rowSums(matmat) + mutate_if(is.numeric, funs(frac = (./value_var)))
6 Nonlinear and Other Regressions
6.1 Logit Regression
- Logit Regression: rmd | r | pdf | html
- Logit regression testing and prediction.
- stats: glm(as.formula(), data, family='binomial') + predict(rs, newdata, type = "response")
- Estimate Logistic Choice Model with Aggregate Shares: rmd | r | pdf | html
- Aggregate share logistic OLS with K worker types, T time periods and M occupations.
- Estimate logistic choice model with aggregate shares, allowing for occupation-specific wages and occupation-specific intercepts.
- Estimate allowing for K and M specific intercepts, K and M specific coefficients, and homogeneous coefficients.
- Create input matrix data structures for logistic aggregate share estimation.
- stats: lm(y ~ . -1)
- Fit Prices Given Quantities Logistic Choice with Aggregate Data: rmd | r | pdf | html
- A multinomial logistic choice problem generates choice probabilities across alternatives, find the prices that explain aggregate shares.
- stats: lm(y ~ . -1)
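Two sketches for this section. The first fits a logit with `glm()` on mtcars and checks that `type = "response"` equals the logistic CDF of the linear index. The second is a simplified aggregate-share logit, assuming one worker type, M = 2 occupations plus an outside option with utility 0, occupation-specific intercepts and one common wage coefficient. Shares are generated without noise, so `lm(y ~ . - 1)` on log share ratios, log(s_m / s_0) = alpha_m + beta * w_m, recovers the hypothetical parameters exactly.

```r
# Logit: probability of a manual transmission given horsepower and weight
rs_logit <- glm(am ~ hp + wt, data = mtcars, family = binomial)
df_new <- data.frame(hp = c(100, 200), wt = c(2.5, 3.5))
ar_prob <- predict(rs_logit, newdata = df_new, type = "response")
ar_prob_check <- plogis(predict(rs_logit, newdata = df_new, type = "link"))

# Aggregate shares: hypothetical parameters, T periods by M = 2 occupations
set.seed(7)
it_T <- 4
ar_alpha <- c(0.5, -0.2)
fl_beta <- 0.8
mt_w <- matrix(runif(it_T * 2), nrow = it_T)
mt_v <- sweep(fl_beta * mt_w, 2, ar_alpha, "+")     # utilities, T x M
mt_s <- exp(mt_v) / (1 + rowSums(exp(mt_v)))        # logit shares
ar_s0 <- 1 - rowSums(mt_s)                           # outside-option share

# Log share ratios are linear in occupation dummies and wages
df_est <- data.frame(y = as.vector(log(mt_s / ar_s0)),
                     occ1 = rep(c(1, 0), each = it_T),
                     occ2 = rep(c(0, 1), each = it_T),
                     w = as.vector(mt_w))
rs <- lm(y ~ . - 1, data = df_est)
ar_est <- unname(coef(rs))                            # approximately 0.5, -0.2, 0.8
```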
6.2 Quantile Regression
- Quantile Regressions with Quantreg: rmd | r | pdf | html
- Quantile regression with continuous outcomes. Estimates and tests quantile coefficients.
- quantreg: rq(mpg ~ disp + hp + factor(am), tau = c(0.25, 0.50, 0.75), data = mtcars) + anova(rq(), test = "Wald", joint=TRUE) + anova(rq(), test = "Wald", joint=FALSE)
7 Optimization
7.1 Grid Based Optimization
- Find the Maximizing or Minimizing Point Given Some Objective Function: rmd | r | pdf | html
- Find the maximizing or minimizing point given some objective function.
- base: while + min + which.min + sapply
- Concurrent Bisection over Dataframe Rows: rmd | r | pdf | html
- Solve for row-specific roots concurrently across dataframe rows using bisection.
- tidyr: pivot_longer(cols = starts_with('abc'), names_to = c('a', 'b'), names_pattern = paste0('prefix', "(.)_(.)"), values_to = val) + pivot_wider(names_from = !!sym(name), values_from = val) + mutate(!!sym(abc) := case_when(efg < 0 ~ !!sym(opq), TRUE ~ iso))
- ggplot2: geom_line() + facet_wrap() + geom_hline()
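A base R sketch of grid search and of concurrent bisection. The objective (x - 0.3)^2 and the row-specific equations x^2 - a = 0 are hypothetical. In the bisection, all rows are updated together in each iteration: since each function is increasing in x, a positive value at the midpoint moves that row's upper bound down, otherwise its lower bound moves up.

```r
# Grid search: minimize f(x) = (x - 0.3)^2 over 101 points on [0, 1]
f_obj <- function(x) (x - 0.3)^2
ar_grid <- seq(0, 1, length.out = 101)
fl_argmin <- ar_grid[which.min(sapply(ar_grid, f_obj))]

# Concurrent bisection: root of x^2 - a = 0 for every row at once
df_bis <- data.frame(a = c(2, 9, 0.25))
df_bis$lo <- 0
df_bis$hi <- pmax(1, df_bis$a)          # f(lo) <= 0 <= f(hi) in every row
f_root <- function(x, a) x^2 - a
for (it_iter in 1:60) {
  ar_mid <- (df_bis$lo + df_bis$hi) / 2
  ar_pos <- f_root(ar_mid, df_bis$a) > 0   # root lies below the midpoint
  df_bis$hi[ar_pos] <- ar_mid[ar_pos]
  df_bis$lo[!ar_pos] <- ar_mid[!ar_pos]
}
df_bis$root <- (df_bis$lo + df_bis$hi) / 2   # approximately sqrt(a)
```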
8 Mathematics
8.1 Basics
- Analytical Formula Fit Curves Through Points: rmd | r | pdf | html
- There are three pairs of points, formulas for the exact quadratic curve that fits through the points.
- There are three pairs of points, we observe only differences in y values, formulas for the linear and quadratic parameters.
- There are three pairs of points, formulas for the linear best fit line through the points.
- stats: lm(y ~ x + I(x^2), data=df) + lm(y ~ poly(x, 2), data=df) + summary.lm(rs) + predict(rs)
- Quadratic and Ratio Rescaling of Parameters with Fixed Min and Max: rmd | r | pdf | html
- For 0<theta<1, generate 0 < thetaHat(theta, lambda) < 1, where lambda is between positive and negative infinity, used to rescale theta.
- Fit a quadratic function for three points, where the starting and ending points are along the 45 degree line.
- r: sort(unique()) + sapply(ar, func, param=val)
- ggplot2: geom_line() + geom_vline() + labs(title, subtitle, x, y, caption) + scale_y_continuous(breaks, limits)
- Rescaling Bounded Parameter to be Unbounded and Positive and Negative Exponents with Different Bases: rmd | r | pdf | html
- Log of alternative bases, bases that are not e, 10 or 2.
- A parameter is constrained between 1 and negative infinity, use exponentials of different bases to scale the bounded parameter to an unbounded parameter.
- Positive exponentials are strictly increasing. Negative exponentials are strictly decreasing.
- A positive number below 1 raised to a negative exponent is above 1, and a positive number above 1 raised to a negative exponent is below 1.
- graphics: plot(x, y) + title() + legend()
- Find the Closest Point Along a Line to Another Point: rmd | r | pdf | html
- A line crosses through the origin, what is the closest point along this line to another point.
- Graph several functions jointly with points and axis.
- graphics: par(mfrow = c(1, 1)) + curve(fc) + points(x, y) + abline(v=0, h=0)
- linear solve x with f(x) = 0: rmd | r | pdf | html
- Evaluate and solve statistically relevant problems with one equation and one unknown that permit analytical solutions.
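Two small sketches for this section. The first uses a logarithm with base 3 and the change-of-base identity, and then a hypothetical map from an unbounded lambda to a parameter rho < 1, rho = 1 - b^(-lambda) with base b > 1, together with its inverse; this particular transform is an illustration, not necessarily the one in the original file. The second finds the closest point on a line through the origin as the orthogonal projection (p . d / d . d) d.

```r
# Logarithm with base 3 and the change-of-base identity
fl_log3 <- log(81, base = 3)           # 4
fl_log3_alt <- log(81) / log(3)        # same value

# Hypothetical bounded transform: lambda in (-Inf, Inf) maps to rho in (-Inf, 1)
f_rho <- function(lambda, b = 2) 1 - b^(-lambda)
f_lambda <- function(rho, b = 2) -log(1 - rho, base = b)   # inverse map
ar_lambda <- c(-5, 0, 5)
ar_rho <- f_rho(ar_lambda)              # -31, 0, 0.96875

# Closest point on the line through the origin with direction ar_d to the point ar_p
ar_d <- c(1, 2)
ar_p <- c(3, 1)
ar_closest <- sum(ar_p * ar_d) / sum(ar_d * ar_d) * ar_d   # (1, 2)
fl_dot <- sum((ar_p - ar_closest) * ar_d)                  # gap is perpendicular: 0
```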
8.2 Production Functions
- Nested Constant Elasticity of Substitution Production Function: rmd | r | pdf | html
- A nested-CES production function with nest-specific elasticities.
- Re-state the nested-CES problem as several sub-problems.
- Marginal products and their relationship to prices in expenditure minimization.
- Latent Dynamic Health Production Function: rmd | r | pdf | html
- A model of latent health given lagged latent health and health inputs.
- Find individual-specific production function coefficient given self-rated discrete health status probabilities.
- Persistence of latent health status given observed discrete current and lagged outcomes.
8.3 Inequality Models
- GINI for Discrete Samples or Discrete Random Variable: rmd | r | pdf | html
- Given a sample of discrete data points, compute the approximate GINI coefficient.
- Given a discrete random variable, compute the GINI coefficient.
- r: sort() + cumsum() + sum()
- CES and Atkinson Inequality Index: rmd | r | pdf | html
- Analyze how changing individual outcomes shift utility given inequality preference parameters.
- Discretize a continuous normal random variable with a binomial discrete random variable.
- Draw Cobb-Douglas, Utilitarian and Leontief indifference curves.
- r: apply(mt, 1, function(x){}) + do.call(rbind, ls_mt)
- tidyr: expand_grid()
- ggplot2: geom_line() + facet_wrap()
- econ: Atkinson (JET, 1970)
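A sketch of the inequality measures in this section. The sample Gini uses G = sum_i (2i - n - 1) x_(i) / (n sum x) with x sorted ascending. For a discrete random variable it uses G = 1 - sum_k p_k (L_{k-1} + L_k), where L_k are Lorenz ordinates; with equal probabilities both formulas coincide. The Atkinson (1970) index is A = 1 - y_ede / mean(y), where the equally distributed equivalent y_ede is the generalized mean with exponent 1 - eps (the geometric mean when eps = 1). Function names are hypothetical.

```r
# Sample Gini with sorted values
f_gini_sample <- function(ar_x) {
  ar_x <- sort(ar_x)
  it_n <- length(ar_x)
  sum((2 * seq_len(it_n) - it_n - 1) * ar_x) / (it_n * sum(ar_x))
}

# Gini of a discrete random variable with values ar_x and probabilities ar_p
f_gini_rv <- function(ar_x, ar_p) {
  ar_it_ord <- order(ar_x)
  ar_x <- ar_x[ar_it_ord]
  ar_p <- ar_p[ar_it_ord]
  ar_lorenz <- cumsum(ar_p * ar_x) / sum(ar_p * ar_x)
  1 - sum(ar_p * (c(0, head(ar_lorenz, -1)) + ar_lorenz))
}

# Atkinson index for inequality aversion fl_eps >= 0
f_atkinson <- function(ar_y, fl_eps) {
  if (fl_eps == 1) {
    fl_ede <- exp(mean(log(ar_y)))
  } else {
    fl_ede <- mean(ar_y^(1 - fl_eps))^(1 / (1 - fl_eps))
  }
  1 - fl_ede / mean(ar_y)
}

fl_g_sample <- f_gini_sample(c(0, 0, 0, 1))                  # 0.75
fl_g_rv <- f_gini_rv(c(1, 0, 0, 0), rep(0.25, 4))            # 0.75
ar_atk <- sapply(c(0, 1, 2), function(fl_eps) f_atkinson(c(1, 4), fl_eps))   # 0, 0.2, 0.36
```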
9 Statistics
9.1 Random Draws
- Randomly Perturb Some Parameter Value with Varying Magnitudes: rmd | r | pdf | html
- Given some existing parameter value, with an intensity value between 0 and 1, decide how to perturb the value.
- r: matrix
- stats: qlnorm()
- graphics: par() + hist() + abline()
9.2 Distributions
- Integrate Normal Shocks: rmd | r | pdf | html
- Random Sampling (Monte Carlo) integrate shocks.
- Trapezoidal rule (symmetric rectangles) integrate normal shock.
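A sketch comparing the two integration approaches on a case with a known answer: E[exp(e)] for e ~ N(0, sd^2) equals exp(sd^2 / 2). The sd value is hypothetical. The deterministic version is a normalized sum over an equally spaced symmetric grid on [-6 sd, 6 sd] with normal-density weights, which is close in spirit to, but not literally, the trapezoidal rule.

```r
fl_sd <- 0.5
fl_true <- exp(fl_sd^2 / 2)

# Monte Carlo: average over random draws
set.seed(1)
fl_mc <- mean(exp(rnorm(1e5, mean = 0, sd = fl_sd)))

# Deterministic grid: density weights normalized to sum to one
ar_e <- seq(-6 * fl_sd, 6 * fl_sd, length.out = 1001)
ar_w <- dnorm(ar_e, mean = 0, sd = fl_sd)
ar_w <- ar_w / sum(ar_w)
fl_grid <- sum(ar_w * exp(ar_e))
```

The grid result is accurate to many decimal places, while the Monte Carlo error shrinks only at rate 1/sqrt(number of draws).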
9.3 Discrete Random Variable
- Binomial Approximation of Normal: rmd | r | pdf | html
- Approximate a continuous normal random variable with a discrete binomial random variable.
- r: hist() + plot()
- stats: dbinom() + rnorm()
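A sketch of one way to approximate N(mu, sigma^2) with a binomial discrete random variable: take the support points k = 0..n of Binomial(n, p), standardize them, and rescale to the target mean and standard deviation, keeping the `dbinom()` probabilities. The values of mu, sigma, n and p are hypothetical. By construction, the discrete variable matches the normal's mean and variance exactly.

```r
fl_mu <- 1
fl_sigma <- 2
it_n <- 20
fl_p <- 0.5

ar_k <- 0:it_n
ar_prob <- dbinom(ar_k, size = it_n, prob = fl_p)

# Standardize k, then rescale to the target mean and standard deviation
fl_sd_k <- sqrt(it_n * fl_p * (1 - fl_p))
ar_x <- fl_mu + fl_sigma * (ar_k - it_n * fl_p) / fl_sd_k

fl_mean <- sum(ar_prob * ar_x)                    # equals fl_mu
fl_var <- sum(ar_prob * (ar_x - fl_mean)^2)       # equals fl_sigma^2
```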
10 Tables and Graphs
10.1 R Base Plots
- R Base Plot Line with Curves and Scatter: rmd | r | pdf | html
- Plot scatter points, line plot and functional curve graphs together.
- Set margins for legend to be outside of graph area, change line, point, label and legend sizes.
- Generate additional lines for plots successively, record successively, and plot all steps, or initial steps results.
- r: plot() + curve() + legend() + title() + axis() + par() + recordPlot()
10.2 ggplot Line Related Plots
- ggplot2 Basic Line Plot for Multiple Time Series: rmd | r | pdf | html
- Given three time series, present them in levels, in log levels, and as ratios.
- ggplot: ggplot() + geom_line()
- ggplot Line Plot Multiple Categorical Variables With Continuous Variable: rmd | r | pdf | html
- One category is subplot, one category is line-color, one category is line-type.
- One category is subplot, one category is differentiated by line-color, line-type and scatter-shapes.
- One category are separate plots, two categories are subplots rows and columns, one category is differentiated by line-color, line-type and scatter-shapes.
- ggplot: ggplot() + facet_wrap() + facet_grid() + geom_line() + geom_point() + geom_smooth() + geom_hline() + scale_colour_manual() + scale_shape_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_x_continuous() + scale_y_continuous() + theme_bw() + theme() + guides() + theme() + ggsave()
- dplyr: filter(vara %in% c(1, 2) & varb == "val") + mutate_if() + !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
- Time Series with Shaded Regions, plot GDP with recessions: rmd | r | pdf | html
- Plot several time series with multiple shaded windows.
- Plot GDP with shaded recession window, and differentially shaded pre- and post-recession windows.
- r: sample + pmin + diff + which
- ggplot: ggplot() + geom_line() + geom_rect(aes(xmin, xmax, ymin, ymax)) + theme_light() + scale_colour_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_fill_manual()
10.3 ggplot Scatter Related Plots
- ggplot Scatter Plot Grouped or Unique Patterns and Colors: rmd | r | pdf | html
- Scatter Plot Three Continuous Variables and Multiple Categorical Variables
- Two continuous variables for the x-axis and the y-axis, another continuous variable for size of scatter, other categorical variables for scatter shape and size.
- Scatter plot with unique pattern and color for each scatter point.
- Y and X label axis with two layers of text in levels and deviation from some mid-point values.
- tibble: rownames_to_column()
- ggplot: ggplot() + geom_jitter() + geom_smooth() + geom_point(size=1, stroke=1) + scale_colour_manual() + scale_shape_discrete() + scale_linetype_manual() + scale_x_continuous() + scale_y_continuous() + theme_bw() + theme()
- ggplot Multiple Scatter-Lines and Facet Wrap Over Categories: rmd | r | pdf | html
- ggplot multiple lines with scatter as points and connecting lines.
- Facet wrap to generate subfigures for sub-categories.
- Generate separate plots from data saved separately.
- r: apply
- ggplot: facet_wrap() + geom_smooth() + geom_point() + facet_wrap() + scale_colour_manual() + scale_shape_manual() + scale_linetype_manual()
10.4 Write and Read Plots
- Base R Save Images At Different Sizes: rmd | r | pdf | html
- Base R store image core, add legends/titles/labels/axis of different sizes to save figures of different sizes.
- r: png() + setEPS() + postscript() + dev.off()
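A base R sketch of drawing the same figure (scatter points, a line, a function curve and a legend) and saving it at two sizes, with point and text sizes scaled to the width. It writes PDF files to a temporary folder through `pdf()`, used here instead of `png()` or `postscript()`, because the PDF device is available without a display. The helper `f_draw` and the file names are hypothetical.

```r
# Draw one figure: scatter, line, curve and legend
f_draw <- function(fl_cex) {
  ar_x <- seq(0, 2 * pi, length.out = 20)
  plot(ar_x, sin(ar_x), pch = 16, cex = fl_cex, xlab = "x", ylab = "y")
  lines(ar_x, cos(ar_x), lty = 2)
  curve(0.5 * x - 1, from = 0, to = 2 * pi, add = TRUE, col = "red")
  legend("bottomleft", legend = c("sin", "cos", "0.5x - 1"),
         pch = c(16, NA, NA), lty = c(NA, 2, 1),
         col = c("black", "black", "red"), cex = fl_cex)
}

# Save the same figure at two widths (in inches)
ar_spth <- character(0)
for (fl_width in c(4, 8)) {
  spth_pdf <- file.path(tempdir(), paste0("fig_w", fl_width, ".pdf"))
  pdf(spth_pdf, width = fl_width, height = fl_width * 0.75)
  f_draw(fl_cex = fl_width / 6)
  dev.off()
  ar_spth <- c(ar_spth, spth_pdf)
}
```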
11 Get Data
11.1 Environmental Data
- CDS ECMWF Global Environmental Data Download: rmd | r | pdf | html
- Use the Python API to get ECMWF ERA5 data.
- Dynamically modify a python API file, run python inside a Conda virtual environment with R-reticulate.
- r: file() + writeLines() + unzip() + list.files() + unlink()
- r-reticulate: use_python() + Sys.setenv(RETICULATE_PYTHON = spth_conda_env)
12 Coding and Development
12.1 Installation and Packages
- R, RTools, Rstudio Installation and Update with VSCode: rmd | r | pdf | html
- Install and update R, RTools, and Rstudio.
- Set-up R inside VSCode.
- installr: updateR()
- Handling R Packages: rmd | r | pdf | html
- Resolve conflicts between two packages with identically named functions.
- tidyverse: tidyverse_conflicts
- dplyr: filter
- stats: filter
- conflicted: conflict_prefer()
12.2 Files In and Out
- Decompose File Paths to Get Folder and Files Names: rmd | r | pdf | html
- Decompose file path and get file path folder names and file name.
- Check if file name exists.
- r: .Platform$file.sep + tail() + strsplit() + basename() + dirname() + substring() + dir.exists() + file.exists()
- Save Text to File, Read Text from File, Replace Text in File: rmd | r | pdf | html
- Save data to file, read text from file, replace text in file.
- r: kable() + file() + writeLines() + readLines() + close() + gsub()
- Convert R Markdown File to R, PDF and HTML: rmd | r | pdf | html
- Find all files in a folder with a particular suffix, with exclusion.
- Convert R Markdown files to R, PDF and HTML.
- Modify markdown pounds hierarchy.
- r: file() + writeLines() + readLines() + close() + gsub()
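A base R sketch of the file operations in this section. It decomposes a hypothetical path (the file need not exist, since `basename()` and `dirname()` only manipulate the string). It then writes a small markdown file to a temporary location, demotes every heading by one level with `gsub()`, and lists files with a given suffix while excluding names that start with an underscore (the exclusion rule is an assumed example).

```r
# Decompose a hypothetical path
spth_rmd <- file.path("home", "user", "R4Econ", "array", "fs_array.Rmd")
st_file <- basename(spth_rmd)                                    # "fs_array.Rmd"
spth_folder <- dirname(spth_rmd)                                 # "home/user/R4Econ/array"
ar_st_parts <- strsplit(spth_rmd, .Platform$file.sep, fixed = TRUE)[[1]]
st_parent <- tail(ar_st_parts, 2)[1]                             # "array"
st_stem <- tools::file_path_sans_ext(st_file)                   # "fs_array"

# Write text, read it back, demote headings ("#" followed by a space), save
spth_txt <- tempfile(fileext = ".Rmd")
writeLines(c("# Title", "## Section", "Text with a #tag"), spth_txt)
ar_st_lines <- readLines(spth_txt)
ar_st_lines <- gsub("^(#+) ", "#\\1 ", ar_st_lines)
writeLines(ar_st_lines, spth_txt)

# Files with a given suffix, excluding names that start with an underscore
ar_st_rmd <- list.files(dirname(spth_txt), pattern = "\\.Rmd$")
ar_st_rmd <- ar_st_rmd[!grepl("^_", ar_st_rmd)]
```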
12.3 Python with R
- Python in R with Reticulate: rmd | r | pdf | html
- Use Python in R with Reticulate
- reticulate: py_config() + use_condaenv() + py_run_string() + Sys.which('python')
12.4 Command Line
- System and Shell Commands in R: rmd | r | pdf | html
- Run system executable and shell commands.
- Activate conda environment with shell script.
- r: system() + shell()
12.5 Run Code in Parallel in R
- Run Code in Parallel in R: rmd | r | pdf | html
- Running parallel code in R
- parallel: detectCores() + makeCluster()
- doParallel: registerDoParallel()
- foreach: foreach() + %dopar%