Chapter 5 Linear Regression

5.1 Linear and Polynomial Fitting

5.1.1 Fit Curves Through Points

Go back to fan’s REconTools research support package, R4Econ examples page, PkgTestR packaging guide, or Stat4Econ course page.

5.1.1.1 Polynomial Fit with N Sets of Points

There are three points defined by their x and y coordinates. We draw these points randomly, and we find a curve that fits through them. The fit is exact. If there are more than three sets of points, we will not be able to fit exactly. In the illustration before, first, we draw 3 sets of x and y points, then we draw 4 and 5 sets, and we compare the prediction results.

Note that we are not generating data from a particular set of quadratic (polynomial) parameters, we are just drawing random values. When we draw exactly three pairs of random x and y values, we can find some polynomial that provides an exact fit through the three points in 2d space.

We define the function here.

#' Test polynomial fit to random draw x and y points
#'
#' @description Draw random sets of x and y points, fit a polynomial curve through
#' and compare predictions of y and actual y values.
#'
#' @param it_xy_pairs The number of x and y pair points
#' @param it_seed The random seed for drawing values
#' @param it_poly_fit An integer value of the order of polynomial
#' @param fl_mean The mean of the the random normal draw
#' @param fl_sd The standard deviation of the random normal draw
#' @returns
#' \itemize{
#'   \item rs_lm_poly - polynomial fit estimation results
#'   \item df_data_predict - N by 3 where N = \code{it_xy_pairs} and
#' columns are x, y, y-predict, and residual
#'   \item td_table_return - Display version of df_data_predict with title
#' }
#' @import stats, tibble, dplyr
#' @author Fan Wang, \url{http://fanwangecon.github.io}
ffi_lm_quad_fit <- function(it_xy_pairs = 3, it_seed = 123,
                            it_poly_fit = 2, fl_mean = 1, fl_sd = 1,
                            verbose = FALSE) {

  # 1. Generate three pairs of random numbers
  set.seed(it_seed)
  mt_rnorm <- matrix(
    rnorm(it_xy_pairs * 2, mean = fl_mean, sd = fl_sd),
    nrow = it_xy_pairs, ncol = 2
  )
  colnames(mt_rnorm) <- c("x", "y")
  rownames(mt_rnorm) <- paste0("p", seq(1, it_xy_pairs))
  df_rnorm <- as_tibble(mt_rnorm)

  # 2. Quadratic fit using ORTHOGONAL POLYNOMIAL
  # For predictions, lm(y ~ x + I(x^2)) and lm(y ~ poly(x, 2)) are the same,
  # but they have different parameters because x is transformed by poly().
  rs_lm_quad <- stats::lm(y ~ poly(x, it_poly_fit), data = df_rnorm)
  if (verbose) print(stats::summary.lm(rs_lm_quad))

  # 3. Fit prediction
  ar_y_predict <- stats::predict(rs_lm_quad)
  df_data_predict <- cbind(df_rnorm, ar_y_predict) %>%
    mutate(res = ar_y_predict - y)
  if (verbose) print(df_data_predict)

  # 4. show values
  st_poly_order <- "Quadratic"
  if (it_poly_fit != 2) {
    st_poly_order <- paste0(it_poly_fit, "th order")
  }
  td_table_return <- kable(df_data_predict,
    caption = paste0(
      st_poly_order, " Fit of ", it_xy_pairs, " Sets of Random (X,Y) Points"
    )
  ) %>%
    kable_styling_fc()

  return(list(
    rs_lm_quad = rs_lm_quad,
    df_data_predict = df_data_predict,
    td_table_return = td_table_return
  ))
}

In the first example below, we simulate 3 set of points and estimate quadratic exact fit.

ls_ffi_lm_quad_fit <-
  ffi_lm_quad_fit(
    it_xy_pairs = 3, it_seed = 123,
    it_poly_fit = 2, fl_mean = 1, fl_sd = 1
  )
ls_ffi_lm_quad_fit$td_table_return

Table 5.1: Quadratic Fit of 3 Sets of Random (X,Y) Points
x	y	ar_y_predict
0.4395244	1.070508	1.070508
0.7698225	1.129288	1.129288
2.5587083	2.715065	2.715065

In the second example below, we simulate 4 set of points and estimate a quadratic non-exact fit.

ls_ffi_lm_quad_fit <-
  ffi_lm_quad_fit(
    it_xy_pairs = 4, it_seed = 345,
    it_poly_fit = 2, fl_mean = 1, fl_sd = 1
  )
ls_ffi_lm_quad_fit$td_table_return

Table 5.2: Quadratic Fit of 4 Sets of Random (X,Y) Points
x	y	ar_y_predict	res
0.2150918	0.9324684	0.9373687	0.0049003
0.7204856	0.3664796	1.5207575	1.1542779
0.8385421	0.0722760	-0.0080226	-0.0802987
0.7094034	2.7107710	1.6318915	-1.0788795

In the third example below, we simulate the same 4 sets of points as in the prior example, but now use a cubic polynomial to fit the data exactly.

ls_ffi_lm_cubic_fit <-
  ffi_lm_quad_fit(
    it_xy_pairs = 4, it_seed = 345,
    it_poly_fit = 3, fl_mean = 1, fl_sd = 1
  )
ls_ffi_lm_cubic_fit$td_table_return

Table 5.3: 3th order Fit of 4 Sets of Random (X,Y) Points
x	y	ar_y_predict
0.2150918	0.9324684	0.9324684
0.7204856	0.3664796	0.3664796
0.8385421	0.0722760	0.0722760
0.7094034	2.7107710	2.7107710

5.1.2 Polynomial Time Series

Go back to fan’s REconTools research support package, R4Econ examples page, PkgTestR packaging guide, or Stat4Econ course page.

5.1.2.1 Analytical Solution for Time-Series Polynomial Coefficients

This file is developed in support of Bhalotra, Fernandez, and Wang (2021), where we use polynomials to model share parameter changes over time in the context of a nested-CES problem.

5.1.2.1.1 Mth Order Polynomial and its Derivative

We have a polynomial of Mth order:

\[ y\left(t\right) = a_0 + a_1 \cdot t + a_2 \cdot t^2 + ... + a_M \cdot t^M \]

Taking derivative of of \(y\) with respect to \(t\), we have:

\[ \begin{aligned} \frac{d^{M-0}y}{dt^{M-0}} &= a_{M-0} \cdot \frac{\left(M-0\right)!}{0!} \cdot t^{-0+0} \\ \frac{d^{M-1}y}{dt^{M-1}} &= a_{M-0} \cdot \frac{\left(M-0\right)!}{1!} \cdot t^{-0+1} + a_{M-1} \cdot \frac{\left(M-1\right)!}{0!} \cdot t^{-1+1} \\ \frac{d^{M-2}y}{dt^{M-2}} &= a_{M-0} \cdot \frac{\left(M-0\right)!}{2!} \cdot t^{-0+2} + a_{M-1} \cdot \frac{\left(M-1\right)!}{1!} \cdot t^{-1+2} + a_{M-2} \cdot \frac{\left(M-2\right)!}{0!} \cdot t^{-2+2} \\ \frac{d^{M-3}y}{dt^{M-3}} &= a_{M-0} \cdot \frac{\left(M-0\right)!}{3!} \cdot t^{-0+3} + a_{M-1} \cdot \frac{\left(M-1\right)!}{2!} \cdot t^{-1+3} + a_{M-2} \cdot \frac{\left(M-2\right)!}{1!} \cdot t^{-2+3} + a_{M-3} \cdot \frac{\left(M-3\right)!}{0!} \cdot t^{-3+3} \end{aligned} \]

Given the structure above, we have the following formulation for polynomial derivative, where we have a polynomial of \(M^{th}\) order and we are interested in the \(N^{th}\) derivative:

\[ \begin{aligned} \frac{d^{N}y(M)}{dt^{N}} &= \sum_{i=0}^{M-N} \left( a_{M-i} \cdot \frac{\left(M-i\right)!}{\left(M-N-i\right)!} \cdot t^{M-N-i} \right) \end{aligned} \]

Note that the \(M^{th}\) derivative of a \(M^{th}\) order polynomial is equal to \(\left(a_M \cdot M!\right)\)

5.1.2.1.2 Repeatedly Taking Differences of Differences

Suppose the data generating process is a \(M^{th}\) order polynomial, then differencing of time-series observables can be used to identify polynomial coefficients.

In the simplest case of a 1st order polynomial with \(Y_t = a_0 + a_1 \cdot t\), given 2 periods of data, \(t=0,t=1\), we have \(a_1 = Y_2 - Y_1\), and \(a_0 = Y_0\). The rate of change in \(Y\) over \(t\) captures coefficient \(a_1\).

For a second order polynomial, while the first derivative is time-varying, the second derivative, acceleration, is invariant over \(t\). Similarly, for any \(M^{th}\) order polynomial, the \(M^{th}\) derivative is time-invariant.

Because of this time-invariance of the \(M^{th}\) derivative, the differencing idea can be used to identify \(a_M\), the coefficient for the highest polynomial term for the \(M^{th}\) order polynomial.

Now we difference observable \(y_t\) overtime. We difference the differences, difference the differences of the differences, and difference the differences of the differences of the differences, etc. It turns out that the difference is a summation over all observations \(\left\{y_t\right\}_{t=1}^{T}\), with the number of times each \(y_t\) term appearing following Pascal’s Triangle.

Specifically:

1st difference is \(y_1 - y_0\)
2nd difference is \((y_2-y_1) - (y_1-y_0) = y_2 - 2y_1 + y_0\)
3rd difference is \(((y_3-y_2) - (y_2-y_1)) - ((y_2-y_1) - (y_1-y_0)) = y_3 - 3y_2 + 3y_1 - y_0\)
4th difference is \((((y_4 - y_3) - (y_3 - y_2)) - ((y_3-y_2) - (y_2-y_1))) - (((y_3-y_2) - (y_2-y_1)) - ((y_2-y_1) - (y_1-y_0))) = y_4 - 4y_3 + 6y_2 - 4y_1 + y_0\)
5th difference is \(((((y_5-y_4) - (y_4-y_3)) - ((y_4-y_3) - (y_3-y_2))) - (((y_4-y_3) - (y_3-y_2)) - ((y_3-y_2) - (y_2-y_1)))) - ((((y_4-y_3) - (y_3-y_2)) - ((y_3-y_2) - (y_2-y_1))) - (((y_3-y_2) - (y_2-y_1)) - ((y_2-y_1) - (y_1-y_0)))) = y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0\)

Note that the pattern has alternating signs, and the coefficients are binomial. We have the following formula:

\[ \begin{aligned} \Delta^M &= \sum_{i=0}^{M} \left(\left(-1\right)^i\cdot\frac{M!}{\left(M-i\right)!i!}\right) \cdot y_{\left(M-i\right)} \\ \end{aligned} \]

When there are \(T\) periods of data, and we are interested in the \(T-1\) difference, the differencing formula is unique. However, for less than \(T-1\) difference, we can use alternative consecutive data segments. Specifically, given \(T\) periods of data from \(t=1\) to \(t=T\), we have the notation \(\Delta^M_{\tau}\) where \(\tau\) is the starting time. We have, for \(M \le T-1\):

\[ \begin{aligned} \Delta^M_{\tau}\left(\left\{y_t\right\}_{t=1}^{T}\right) &= \sum_{i=0}^{M} \left(\left(-1\right)^i\cdot\frac{M!}{\left(M-i\right)!i!}\right) \cdot y_{\left(\tau + \left(M-i\right)\right)} \thinspace\thinspace\text{for}\thinspace\thinspace 1 \le \tau \le T-M \\ \end{aligned} \]

5.1.2.1.3 Solutions for Polynomial Coefficients with Differences of Differences

Intuitively, for a \(M^{th}\) order polynomial, the coefficient on the highest polynomial term is a function of the \((M-1)^{th}\) difference. Coefficients of lower polynomial terms, \(m<M\), are function of the \((m-1)^{th}\) difference along with higher order polynomial coefficients already computed.

Formally, we have, for a \(M^{th}\) order polynomial, a vector of \(\left\{a_m\right\}_{m=0}^{M}\) \(M+1\) polynomial coefficients. For formula for the coefficient for the largest polynomial is:

\[ \begin{aligned} a_M &= \sum_{i=0}^{M} \left( \left(-1\right)^i \left( \left(M-i\right)! i! \right)^{-1} \right) % \cdot y_{\left(M-i\right)} \end{aligned} \]

Given this, we have also, given \(T\) periods of data from \(t=1\) to \(t=T\): \[ \begin{aligned} a_{M-1} &= \sum_{i=0}^{M-1} \left( \left(-1\right)^i \left( \left(M-1-i\right)! i! \right)^{-1} \right) \cdot \left( y_{\left(\tau + \left(M-1-i\right)\right)} - a_M \cdot t^M \right) \thinspace\thinspace \text{for} \thinspace\thinspace 1 \le \tau \le T-M-1 \end{aligned} \]

Using one formula, given \(a_{m+1}\), we have:

\[ \begin{aligned} a_{m} &= \sum_{i=0}^{m} \left( \left(-1\right)^i \left( \left(m-i\right)! i! \right)^{-1} \right) \cdot \left( y_{\left(\tau + \left(m-i\right)\right)} - \sum_{j=0}^{M-m-1} a_{M-j} \cdot t^{M-j} \right) \thinspace\thinspace \text{for} \thinspace\thinspace 1 \le \tau \le T-m \end{aligned} \]

5.1.2.1.4 Identifying Polynomial Coefficients with Differences for Third Order Polynomial

To illustrate, we test the formulas with a 3rd order polynomial, and derive some 3rd-order specific formulas.

For data from a 3rd order polynomial data generating process, we can use the 3rd difference to identify the coefficient in front of \(x^3\). With this, we can iteratively to lower polynomials and identify all relevant coefficients.

Specifically, using equations from the two sections above, we have:

\[ \begin{aligned} \frac{d^{3}y(3)}{dt^{3}} &= y_3 - 3y_2 + 3y_1 - y_0 \\ \frac{d^{3}y(3)}{dt^{3}} &= 3! \cdot a_3 \end{aligned} \]

Combining the two equations we have, that \(a_3\) is the 3rd difference divided by \(6\):

\[ \begin{aligned} 3! \cdot a_3 &= y_3 - 3y_2 + 3y_1 - y_0\\ a_3 &= \frac{y_3 - 3y_2 + 3y_1 - y_0 }{3\cdot2} \end{aligned} \]

For the linear and cubic terms, we have:

\[ \begin{aligned} \frac{d^{2}y(M=3)}{dt^{2}} &= 3 \cdot a_2 + 6 \cdot a_3 \cdot t \\ \frac{d^{1}y(M=3)}{dt^{1}} &= a_1 + 2 \cdot a_2 + 3 \cdot a_3 \cdot t^2 \end{aligned} \]

Note that the 3rd derivative of a 3rd order polynomial is a constant, but the 2nd derivative of a 3rd order polynomial is not. This means that to use the second difference to identify \(a_2\) parameter, we first have to difference out from \(y_t\) the impact of the 3rd polynomial term, which we can because we know \(a_3\) now.

Differencing out the 3rd term, we have now the 2nd derivative of a 2nd order polynomial: \[ \begin{aligned} \frac{d^{2}\left(y(M=3) - a_3\cdot t^3\right)}{dt^{2}} &= \frac{d^{2}\left(\widehat{y}(M=2)\right)}{dt^{2}} \thinspace\thinspace, \\ \end{aligned} \] where \(\widehat{y}\left(M,2\right)=y\left(M\right)-a_3\cdot t^3\).

So this means we have:

\[ \begin{aligned} \frac{d^{2}\widehat{y}(3,2)}{dt^{2}} &= \left(1 \cdot y_2 - 2 \cdot y_1 + 1 \cdot y_0\right) - a_3\cdot\left(1\cdot 2^3 - 2 \cdot 1^3 + 1 \cdot 0^3 \right) \\ &= \left(1 \cdot y_2 - 2 \cdot y_1 + 1 \cdot y_0\right) - a_3\cdot \left(2^3 - 2\right) \\ \frac{d^{2}\widehat{y}(3,2)}{dt^{2}} &= 2! \cdot a_2 \end{aligned} \]

Given the value for \(a_3\), we have:

\[ \begin{aligned} 2! \cdot a_2 &= \left(1 \cdot y_2 - 2 \cdot y_1 + 1 \cdot y_0\right) - a_3\cdot 6 \\ 2! \cdot a_2 &= \left(1 \cdot y_2 - 2 \cdot y_1 + 1 \cdot y_0\right) - \frac{y_3 - 3y_2 + 3y_1 - y_0 }{3\cdot2} \cdot \left(2^3 - 2\right) \\ 2! \cdot a_2 &= \left(y_2 - 2y_1 + y_0\right) - \left(y_3 - 3y_2 + 3y_1 - y_0\right) \\ 2! \cdot a_2 & = - y_3 + 4y_2 - 5y_1 + 2y_0 \\ a_2 & = \frac{- y_3 + 4y_2 - 5y_1 + 2y_0}{2} \end{aligned} \]

Following the same strategy, we can also find \(a_1\). Let \(\widehat{y}\left(M,1\right)=y\left(M\right)-a_3\cdot t^3-a_2\cdot t^2\)

\[ \begin{aligned} \frac{d^{2}\widehat{y}(3,1)}{dt^{2}} &= \left(y_1 - y_0\right) - a_3\cdot\left(1^3 - 0^3 \right) - a_2\cdot\left(1^2 - 0^2 \right) \\ &= \left(y_1 - y_0\right) - a_3 - a_2 \\ \frac{d^{2}\widehat{y}(3,1)}{dt^{2}} &= 1! \cdot a_1 \end{aligned} \]

Hence:

\[ \begin{aligned} a_1 &= \left(y_1 - y_0\right) - a_3\cdot\left(1^3 - 0^3 \right) - a_2\cdot\left(1^2 - 0^2 \right) \\ &= \left(y_1 - y_0\right) - a_3 - a_2 \\ &= \left(y_1 - y_0\right) - \frac{y_3 - 3y_2 + 3y_1 - y_0 }{3\cdot2} - \frac{- y_3 + 4y_2 - 5y_1 + 2y_0}{2} \\ &= \frac{6y_1 - 6y_0}{6} - \frac{y_3 - 3y_2 + 3y_1 - y_0 }{6} - \frac{-3y_3 + 12y_2 - 15y_1 + 6y_0}{6} \\ &= \frac{6y_1 - 6y_0 -y_3 + 3y_2 - 3y_1 + y_0 +3y_3 - 12y_2 + 15y_1 - 6y_0}{6} \\ &= \frac{2y_3 - 9y_2 + 18y_1 - 11y_0}{6} \end{aligned} \]

Finally, we know that \(a_0 = y_0\). We have now analytical expressions for each of the 4 polynomial coefficients for a 3rd order polynomial. Given data from the data-generating process, these would back out the underlying parameters of the data generating process using data from four periods at \(t=0,1,2,3\).

5.1.2.1.5 Third Order Polynomial Simulation and Solving for Parameters

Now we generated a time-series of values and solve back for the underlying polynomial coefficients.

# polynomial coefficients
set.seed(123)
ar_coef_poly <- rnorm(4)
# time right hand side matrix
ar_t <- 0:3
ar_power <- 0:3
mt_t_data <- do.call(rbind, lapply(ar_power, function(power) {
  ar_t^power
}))
# Final matrix, each row is an observation, or time.
mt_t_data <- t(mt_t_data)
# General model prediction
ar_y <- mt_t_data %*% matrix(ar_coef_poly, ncol = 1, nrow = 4)
# Prediction and Input time matrix
mt_all_data <- cbind(ar_y, mt_t_data)
st_cap <- paste0(
  "C1=Y, each row is time, t=0, incremental by 1, ",
  "each column a polynomial term from 0th to higher."
)
kable(mt_all_data, caption = st_cap) %>% kable_styling_fc()

Table 5.4: C1=Y, each row is time, t=0, incremental by 1, each column a polynomial term from 0th to higher.
-0.5604756	1	0	0	0
0.8385636	1	1	1	1
5.7780698	1	2	4	8
14.6810933	1	3	9	27

Backing out coefficients using the formulas for 3rd order polynomial derived above, we have:

# The constant term
alpha_0 <- ar_y[1]
# The cubic term
alpha_3 <- as.numeric((t(ar_y) %*% c(-1, +3, -3, +1))/(3*2))
# The quadratic term, difference cubic out, alpha_2_1t3 = alpha_2_2t4
ar_y_hat <- ar_y - alpha_3*ar_t^3
alpha_2_1t3 <- as.numeric((t(ar_y_hat[1:3]) %*% c(1, -2, +1))/(2))
alpha_2_2t4 <- as.numeric((t(ar_y_hat[2:4]) %*% c(1, -2, +1))/(2))
alpha_2 <- alpha_2_1t3
# The linear term, difference cubic out and quadratic
ar_y_hat <- ar_y - alpha_3*ar_t^3 - alpha_2*ar_t^2
alpha_1_1t2 <- as.numeric((t(ar_y_hat[1:2]) %*% c(-1, +1))/(1))
alpha_1_2t3 <- as.numeric((t(ar_y_hat[2:3]) %*% c(-1, +1))/(1))
alpha_1_3t4 <- as.numeric((t(ar_y_hat[3:4]) %*% c(-1, +1))/(1))
alpha_1 <- alpha_1_1t2
# Collect results
ar_names <- c("Constant", "Linear", "Quadratic", "Cubic")
ar_alpha_solved <- c(alpha_0, alpha_1, alpha_2, alpha_3)
mt_alpha <- cbind(ar_names, ar_alpha_solved, ar_coef_poly)
# Display
ar_st_varnames <- c('Coefficient Counter', 'Polynomial Terms', 'Solved Coefficient Given Y', 'Actual DGP Coefficient')
tb_alpha <- as_tibble(mt_alpha) %>%
  rowid_to_column(var = "polynomial_term_coef") %>%
  rename_all(~c(ar_st_varnames))
# Display
st_cap = paste0('Solving for polynomial coefficients.')
kable(tb_alpha, caption = st_cap) %>% kable_styling_fc()

Table 5.5: Solving for polynomial coefficients.
Coefficient Counter	Polynomial Terms	Solved Coefficient Given Y	Actual DGP Coefficient
1	Constant	-0.560475646552213	-0.560475646552213
2	Linear	-0.230177489483281	-0.23017748948328
3	Quadratic	1.55870831414913	1.55870831414912
4	Cubic	0.0705083914245757	0.070508391424576

Note that given that the data is exact output from DGP, and we have the same number of data and parameters, parameters are exactly identified. However, this is only really true for the \(a_3\) parameter, which requires all four periods of data. - For \(a_2\), it is over-identified, we can arrived at it, given \(a_3\), either with difference of differences using data from \(t=1,2,3\) or using data from \(t=2,3,4\). - For \(a_1\), it is also over-identified, given \(a_3\) and \(a_2\). The difference of \(t=1,2\), \(t=2,3\) or \(t=3,4\) can all identify \(a_1\).

Note also that the solution above can be found by running a linear regression as well. The point of doing the exercise here and showing analytically how layers of differences of differences identify each polynomial coefficient is to show what in the underlying variation of the data is identifying each of the polynomial term.

In effect, all identification is based on the fact that the \(M^{th}\) order polynomial’s \(M^{th}\) derivative is a constant, it is invariant over \(t\). This is the core assumption, or restriction of the otherwise highly flexible polynomial functional form. With this core invariance at the max degree derivative condition, all other parameters are obtained through the simple act of differencing.

5.2 OLS and IV

Back to Fan’s R4Econ Homepage Table of Content

5.2.1 OLS and IV Regression

Go back to fan’s REconTools research support package, R4Econ examples page, PkgTestR packaging guide, or Stat4Econ course page.

IV regression using AER package. Option to store all results in dataframe row for combining results from other estimations together. Produce Row Statistics.

5.2.1.1 Construct Program

# IV regression function
# The code below uses the AER library's regresison function
# All results are stored in a single row as data_frame
# This functoin could work with dplyr do
# var.y is single outcome, vars.x, vars.c and vars.z are vectors of endogenous variables, controls and instruments.
regf.iv <- function(var.y, vars.x, 
                    vars.c, vars.z, df, transpose=TRUE) {
  
  # A. Set-Up Equation
  str.vars.x <- paste(vars.x, collapse='+')
  str.vars.c <- paste(vars.c, collapse='+')
  
  df <- df %>% 
    select(one_of(var.y, vars.x, vars.c, vars.z)) %>% 
    drop_na() %>% filter_all(all_vars(!is.infinite(.)))
  
  if (length(vars.z) >= 1) {
    #     library(AER)
    str.vars.z <- paste(vars.z, collapse='+')
    equa.iv <- paste(var.y,
                     paste(paste(str.vars.x, str.vars.c, sep='+'),
                           paste(str.vars.z, str.vars.c, sep='+'),
                           sep='|'),
                     sep='~')
    #     print(equa.iv)
    
    # B. IV Regression
    ivreg.summ <- summary(ivreg(as.formula(equa.iv), data=df),
                          vcov = sandwich, df = Inf, diagnostics = TRUE)
    
    # C. Statistics from IV Regression
    #     ivreg.summ$coef
    #     ivreg.summ$diagnostics
    
    # D. Combine Regression Results into a Matrix
    df.results <- suppressWarnings(suppressMessages(
      as_tibble(ivreg.summ$coef, rownames='rownames') %>%
        full_join(as_tibble(ivreg.summ$diagnostics, rownames='rownames')) %>%
        full_join(tibble(rownames=c('vars'),
                         var.y=var.y,
                         vars.x=str.vars.x,
                         vars.z=str.vars.z,
                         vars.c=str.vars.c))))
  } else {
    
    # OLS regression
    equa.ols <- paste(var.y,
                      paste(paste(vars.x, collapse='+'),
                            paste(vars.c, collapse='+'), sep='+'),
                      sep='~')
    
    lmreg.summ <- summary(lm(as.formula(equa.ols), data=df))
    
    lm.diagnostics <- as_tibble(
      list(df1=lmreg.summ$df[[1]],
           df2=lmreg.summ$df[[2]],
           df3=lmreg.summ$df[[3]],
           sigma=lmreg.summ$sigma,
           r.squared=lmreg.summ$r.squared,
           adj.r.squared=lmreg.summ$adj.r.squared)) %>%
      gather(variable, value) %>%
      rename(rownames = variable) %>%
      rename(v = value)
    
    df.results <- suppressWarnings(suppressMessages(
      as_tibble(lmreg.summ$coef, rownames='rownames') %>%
        full_join(lm.diagnostics) %>%
        full_join(tibble(rownames=c('vars'),
                         var.y=var.y,
                         vars.x=str.vars.x,
                         vars.c=str.vars.c))))
  }
  
  # E. Flatten Matrix, All IV results as a single tibble 
  # row to be combined with other IV results
  df.row.results <- df.results %>%
    gather(variable, value, -rownames) %>%
    drop_na() %>%
    unite(esti.val, rownames, variable) %>%
    mutate(esti.val = gsub(' ', '', esti.val))
  
  if (transpose) {
    df.row.results <- df.row.results %>% spread(esti.val, value)
  }
  
  # F. Return
  return(data.frame(df.row.results))
}

5.2.1.2 Program Testing

Load Data

# Library
library(tidyverse)
library(AER)

# Load Sample Data
setwd('C:/Users/fan/R4Econ/_data/')
df <- read_csv('height_weight.csv')

5.2.1.2.1 Example No Instrument, OLS

# One Instrucments
var.y <- c('hgt')
vars.x <- c('prot')
vars.z <- NULL
vars.c <- c('sex', 'hgt0', 'wgt0')
# Regression
regf.iv(var.y, vars.x, vars.c, vars.z, df, transpose=FALSE) %>%
  kable() %>%
  kable_styling_fc()

esti.val	value
(Intercept)_Estimate	52.1186286658655
prot_Estimate	0.37447238635789
sexMale_Estimate	0.611043720578337
hgt0_Estimate	0.148513781160842
wgt0_Estimate	0.00150560230505629
(Intercept)_Std.Error	1.57770483608693
prot_Std.Error	0.004181211911338
sexMale_Std.Error	0.118396259120663
hgt0_Std.Error	0.0393807494783184
wgt0_Std.Error	0.000187123663624396
(Intercept)_tvalue	33.0344608660336
prot_tvalue	89.5607288744324
sexMale_tvalue	5.16100529794268
hgt0_tvalue	3.77122790013451
wgt0_tvalue	8.04602836377986
(Intercept)_Pr(>\|t\|)	9.92126150965951e-233
prot_Pr(>\|t\|)	0
sexMale_Pr(>\|t\|)	2.48105505495376e-07
hgt0_Pr(>\|t\|)	0.000162939618371172
wgt0_Pr(>\|t\|)	9.05257561534482e-16
df1_v	5
df2_v	18958
df3_v	5
sigma_v	8.06197784622979
r.squared_v	0.319078711001326
adj.r.squared_v	0.318935041565942
vars_var.y	hgt
vars_vars.x	prot
vars_vars.c	sex+hgt0+wgt0

5.2.1.2.2 Example 1 Insturment

# One Instrucments
var.y <- c('hgt')
vars.x <- c('prot')
vars.z <- c('momEdu')
vars.c <- c('sex', 'hgt0', 'wgt0')
# Regression
regf.iv(var.y, vars.x, vars.c, vars.z, df, transpose=FALSE) %>%
  kable() %>%
  kable_styling_fc()

esti.val	value
(Intercept)_Estimate	43.4301969117894
prot_Estimate	0.130833343849431
sexMale_Estimate	0.868121847262594
hgt0_Estimate	0.412093881816612
wgt0_Estimate	0.000858630042618959
(Intercept)_Std.Error	1.82489550970971
prot_Std.Error	0.0192036220809207
sexMale_Std.Error	0.133730167005418
hgt0_Std.Error	0.0459431912925973
wgt0_Std.Error	0.000226910577025037
(Intercept)_zvalue	23.7987307660689
prot_zvalue	6.81295139521715
sexMale_zvalue	6.49159323361512
hgt0_zvalue	8.96963990141912
wgt0_zvalue	3.78400184723085
(Intercept)_Pr(>\|z\|)	3.44237661591474e-125
prot_Pr(>\|z\|)	9.56164541652958e-12
sexMale_Pr(>\|z\|)	8.49333228164569e-11
hgt0_Pr(>\|z\|)	2.97485394504032e-19
wgt0_Pr(>\|z\|)	0.000154326676599558
Weakinstruments_df1	1
Wu-Hausman_df1	1
Sargan_df1	0
Weakinstruments_df2	16394
Wu-Hausman_df2	16393
Weakinstruments_statistic	935.817456612075
Wu-Hausman_statistic	123.595856606734
Weakinstruments_p-value	6.3971492917806e-200
Wu-Hausman_p-value	1.30703637796418e-28
vars_var.y	hgt
vars_vars.x	prot
vars_vars.z	momEdu
vars_vars.c	sex+hgt0+wgt0

5.2.1.2.3 Example Multiple Instrucments

# Multiple Instrucments
var.y <- c('hgt')
vars.x <- c('prot')
vars.z <- c('momEdu', 'wealthIdx', 'p.A.prot', 'p.A.nProt')
vars.c <- c('sex', 'hgt0', 'wgt0')
# Regression
regf.iv(var.y, vars.x, vars.c, vars.z, df, transpose=FALSE) %>%
  kable() %>%
  kable_styling_fc()

esti.val	value
(Intercept)_Estimate	42.243761355532
prot_Estimate	0.266999451947032
sexMale_Estimate	0.695548488813011
hgt0_Estimate	0.424954881262903
wgt0_Estimate	0.00048695142032974
(Intercept)_Std.Error	1.8535668678938
prot_Std.Error	0.0154939347964086
sexMale_Std.Error	0.133157977814372
hgt0_Std.Error	0.0463195803786077
wgt0_Std.Error	0.000224867994873511
(Intercept)_zvalue	22.7905246297013
prot_zvalue	17.2325142357589
sexMale_zvalue	5.22348341593647
hgt0_zvalue	9.17441129192881
wgt0_zvalue	2.16549901022443
(Intercept)_Pr(>\|z\|)	5.6929407426237e-115
prot_Pr(>\|z\|)	1.51424021933765e-66
sexMale_Pr(>\|z\|)	1.75588197501936e-07
hgt0_Pr(>\|z\|)	4.54048595586446e-20
wgt0_Pr(>\|z\|)	0.0303494911144483
Weakinstruments_df1	4
Wu-Hausman_df1	1
Sargan_df1	3
Weakinstruments_df2	14914
Wu-Hausman_df2	14916
Weakinstruments_statistic	274.147084958342
Wu-Hausman_statistic	17.7562545747099
Sargan_statistic	463.729664547247
Weakinstruments_p-value	8.6173195623464e-228
Wu-Hausman_p-value	2.52567249124201e-05
Sargan_p-value	3.45452874915773e-100
vars_var.y	hgt
vars_vars.x	prot
vars_vars.z	momEdu+wealthIdx+p.A.prot+p.A.nProt
vars_vars.c	sex+hgt0+wgt0

5.2.1.2.4 Example Multiple Endogenous Variables

# Multiple Instrucments
var.y <- c('hgt')
vars.x <- c('prot', 'cal')
vars.z <- c('momEdu', 'wealthIdx', 'p.A.prot', 'p.A.nProt')
vars.c <- c('sex', 'hgt0', 'wgt0')
# Regression
regf.iv(var.y, vars.x, vars.c, vars.z, df, transpose=FALSE) %>%
  kable() %>%
  kable_styling_fc()

esti.val	value
(Intercept)_Estimate	44.0243196254372
prot_Estimate	-1.40256232471058
cal_Estimate	0.0651048957501503
sexMale_Estimate	0.120832787571903
hgt0_Estimate	0.286525437984394
wgt0_Estimate	0.000850481389651284
(Intercept)_Std.Error	2.75354847245259
prot_Std.Error	0.198640060272442
cal_Std.Error	0.00758881298876464
sexMale_Std.Error	0.209984580636194
hgt0_Std.Error	0.0707828182891216
wgt0_Std.Error	0.000337112104445047
(Intercept)_zvalue	15.9882130515846
prot_zvalue	-7.06082309271814
cal_zvalue	8.57906181724851
sexMale_zvalue	0.575436478268133
hgt0_zvalue	4.04795181810993
wgt0_zvalue	2.52284441417891
(Intercept)_Pr(>\|z\|)	1.54396598289425e-57
prot_Pr(>\|z\|)	1.65519210798224e-12
cal_Pr(>\|z\|)	9.56500647777971e-18
sexMale_Pr(>\|z\|)	0.564996139463126
hgt0_Pr(>\|z\|)	5.16677787150118e-05
wgt0_Pr(>\|z\|)	0.011640989283946
Weakinstruments(prot)_df1	4
Weakinstruments(cal)_df1	4
Wu-Hausman_df1	2
Sargan_df1	2
Weakinstruments(prot)_df2	14914
Weakinstruments(cal)_df2	14914
Wu-Hausman_df2	14914
Weakinstruments(prot)_statistic	274.147084958342
Weakinstruments(cal)_statistic	315.036848606229
Wu-Hausman_statistic	94.7020085425631
Sargan_statistic	122.0819796289
Weakinstruments(prot)_p-value	8.6173195623464e-228
Weakinstruments(cal)_p-value	1.18918641221312e-260
Wu-Hausman_p-value	1.35024050402095e-41
Sargan_p-value	3.09196773720141e-27
vars_var.y	hgt
vars_vars.x	prot+cal
vars_vars.z	momEdu+wealthIdx+p.A.prot+p.A.nProt
vars_vars.c	sex+hgt0+wgt0

5.2.1.2.5 Examples Line by Line

The examples are just to test the code with different types of variables.

# Selecting Variables
var.y <- c('hgt')
vars.x <- c('prot', 'cal')
vars.z <- c('momEdu', 'wealthIdx', 'p.A.prot', 'p.A.nProt')
vars.c <- c('sex', 'hgt0', 'wgt0')

# A. create Equation
str.vars.x <- paste(vars.x, collapse='+')
str.vars.c <- paste(vars.c, collapse='+')
str.vars.z <- paste(vars.z, collapse='+')
print(str.vars.x)

## [1] "prot+cal"

print(str.vars.c)

## [1] "sex+hgt0+wgt0"

print(str.vars.z)

## [1] "momEdu+wealthIdx+p.A.prot+p.A.nProt"

equa.iv <- paste(var.y,
                 paste(paste(str.vars.x, str.vars.c, sep='+'),
                       paste(str.vars.z, str.vars.c, sep='+'),
                       sep='|'),
                 sep='~')
print(equa.iv)

## [1] "hgt~prot+cal+sex+hgt0+wgt0|momEdu+wealthIdx+p.A.prot+p.A.nProt+sex+hgt0+wgt0"

# B. regression
res.ivreg <- ivreg(as.formula(equa.iv), data=df)
coef(res.ivreg)

##   (Intercept)          prot           cal       sexMale          hgt0          wgt0 
## 44.0243196254 -1.4025623247  0.0651048958  0.1208327876  0.2865254380  0.0008504814

# C. Regression Summary
ivreg.summ <- summary(res.ivreg, vcov = sandwich, df = Inf, diagnostics = TRUE)

ivreg.summ$coef

##                  Estimate   Std. Error    z value     Pr(>|z|)
## (Intercept) 44.0243196254 2.7535484725 15.9882131 1.543966e-57
## prot        -1.4025623247 0.1986400603 -7.0608231 1.655192e-12
## cal          0.0651048958 0.0075888130  8.5790618 9.565006e-18
## sexMale      0.1208327876 0.2099845806  0.5754365 5.649961e-01
## hgt0         0.2865254380 0.0707828183  4.0479518 5.166778e-05
## wgt0         0.0008504814 0.0003371121  2.5228444 1.164099e-02
## attr(,"df")
## [1] 0
## attr(,"nobs")
## [1] 14922

ivreg.summ$diagnostics

##                         df1   df2 statistic       p-value
## Weak instruments (prot)   4 14914 274.14708 8.617320e-228
## Weak instruments (cal)    4 14914 315.03685 1.189186e-260
## Wu-Hausman                2 14914  94.70201  1.350241e-41
## Sargan                    2    NA 122.08198  3.091968e-27

# D. Combine Regression Results into a Matrix
df.results <- suppressMessages(as_tibble(ivreg.summ$coef, rownames='rownames') %>%
    full_join(as_tibble(ivreg.summ$diagnostics, rownames='rownames')) %>%
    full_join(tibble(rownames=c('vars'),
                     var.y=var.y,
                     vars.x=str.vars.x,
                     vars.z=str.vars.z,
                     vars.c=str.vars.c)))
# E. Flatten Matrix, All IV results as a single tibble row to be combined with other IV results
df.row.results <- df.results %>%
    gather(variable, value, -rownames) %>%
    drop_na() %>%
    unite(esti.val, rownames, variable) %>%
    mutate(esti.val = gsub(' ', '', esti.val))

# F. Results as Single Colum
# df.row.results

# G. Results as Single Row
# df.row.results

# t(df.row.results %>% spread(esti.val, value)) %>%
#   kable() %>%
#   kable_styling_fc_wide()

5.2.2 IV Loop over RHS

Go back to fan’s REconTools research support package, R4Econ examples page, PkgTestR packaging guide, or Stat4Econ course page.

Regression with a Variety of Outcome Variables and Right Hand Side Variables. There are M outcome variables, and there are N alternative right hand side variables. Regress each M outcome variable and each N alternative right hand side variable, with some common sets of controls and perhaps shared instruments. The output file is a M by N matrix of coefficients, with proper variable names and row names. The matrix stores coefficients for this key endogenous variable.

Dependency: R4Econ/linreg/ivreg/ivregdfrow.R

5.2.2.1 Construct Program

The program relies on double lapply. lapply is used for convenience, not speed.

ff_reg_mbyn <- function(list.vars.y, list.vars.x,
                        vars.c, vars.z, df,
                        return_all = FALSE,
                        stats_ends = 'value', time = FALSE) {

  # regf.iv() function is from C:\Users\fan\R4Econ\linreg\ivreg\ivregdfrow.R
  if (time) {
    start_time <- Sys.time()
  }

  if (return_all) {
    df.reg.out.all <-
      bind_rows(lapply(list.vars.x,
                       function(x) (
                         bind_rows(
                           lapply(list.vars.y, regf.iv,
                                  vars.x=x, vars.c=vars.c, vars.z=vars.z, df=df))
                       )))

  } else {
    df.reg.out.all <-
      (lapply(list.vars.x,
              function(x) (
                bind_rows(
                  lapply(list.vars.y, regf.iv,
                         vars.x=x, vars.c=vars.c, vars.z=vars.z, df=df)) %>%
                  select(vars_var.y, starts_with(x)) %>%
                  select(vars_var.y, ends_with(stats_ends))
              ))) %>% reduce(full_join)
  }

  if (time) {
    end_time <- Sys.time()
    print(paste0('Estimation for all ys and xs took (seconds):',
                 end_time - start_time))
  }

  return(df.reg.out.all)
}

5.2.2.2 Prepare Data

# Library
library(tidyverse)
library(AER)

# Load Sample Data
setwd('C:/Users/fan/R4Econ/_data/')
df <- read_csv('height_weight.csv')

# Source Dependency
source('C:/Users/fan/R4Econ/linreg/ivreg/ivregdfrow.R')

# Setting
options(repr.matrix.max.rows=50, repr.matrix.max.cols=50)

Parameters.

var.y1 <- c('hgt')
var.y2 <- c('wgt')
var.y3 <- c('vil.id')
list.vars.y <- c(var.y1, var.y2, var.y3)

var.x1 <- c('prot')
var.x2 <- c('cal')
var.x3 <- c('wealthIdx')
var.x4 <- c('p.A.prot')
var.x5 <- c('p.A.nProt')
list.vars.x <- c(var.x1, var.x2, var.x3, var.x4, var.x5)

vars.z <- c('indi.id')
vars.c <- c('sex', 'wgt0', 'hgt0', 'svymthRound')

5.2.2.3 Program Testing

5.2.2.3.1 Test Program OLS Z-Stat

vars.z <- NULL
suppressWarnings(suppressMessages(
  ff_reg_mbyn(list.vars.y, list.vars.x,
              vars.c, vars.z, df,
              return_all = FALSE,
              stats_ends = 'value'))) %>%
  kable() %>%
  kable_styling_fc_wide()

vars_var.y	prot_tvalue	cal_tvalue	wealthIdx_tvalue	p.A.prot_tvalue	p.A.nProt_tvalue
hgt	18.8756010031766	23.4421863484656	13.5088996182178	3.83682180045522	32.5448257554849
wgt	16.359112505607	17.368603130933	14.1390521528125	1.36958319982295	12.0961557911473
vil.id	-14.9385580468905	-19.6150110809449	34.0972558327354	8.45943342783161	17.7801422421398

5.2.2.3.2 Test Program IV T-stat

vars.z <- c('indi.id')
suppressWarnings(suppressMessages(
  ff_reg_mbyn(list.vars.y, list.vars.x,
              vars.c, vars.z, df,
              return_all = FALSE,
              stats_ends = 'value'))) %>%
  kable() %>%
  kable_styling_fc_wide()

vars_var.y	prot_zvalue	cal_zvalue	wealthIdx_zvalue	p.A.prot_zvalue	p.A.nProt_zvalue
hgt	8.87674929306359	12.0739764946734	4.62589553677888	26.6373587567245	32.1162192385715
wgt	5.60385871757577	6.12251870088371	5.17869536991513	11.9295584469951	12.3509307017258
vil.id	-9.22106223346427	-13.0586007975956	-51.5866689219473	-29.9627476577358	-38.3528894620791

5.2.2.3.3 Test Program OLS Coefficient

vars.z <- NULL
suppressWarnings(suppressMessages(
  ff_reg_mbyn(list.vars.y, list.vars.x,
              vars.c, vars.z, df,
              return_all = FALSE,
              stats_ends = 'Estimate'))) %>%
  kable() %>%
  kable_styling_fc_wide()

vars_var.y	prot_Estimate	cal_Estimate	wealthIdx_Estimate	p.A.prot_Estimate	p.A.nProt_Estimate
hgt	0.0494310938067476	0.00243408846205617	0.210456554881893	3.86952250259533e-05	0.00542428867316432
wgt	16.5557424523601	0.69907250036464	106.678721085982	0.00521731297924599	0.77951423205071
vil.id	-0.0758835879205561	-0.00395676177098467	0.451733304543376	0.000149388430455129	0.00526237555580908

5.2.2.3.4 Test Program IV coefficient

vars.z <- c('indi.id')
suppressWarnings(suppressMessages(
  ff_reg_mbyn(list.vars.y, list.vars.x,
              vars.c, vars.z, df,
              return_all = FALSE,
              stats_ends = 'Estimate'))) %>%
  kable() %>%
  kable_styling_fc_wide()

vars_var.y	prot_Estimate	cal_Estimate	wealthIdx_Estimate	p.A.prot_Estimate	p.A.nProt_Estimate
hgt	0.859205733631884	0.0238724384575233	0.144503490136916	0.00148073028434634	0.0141317656200728
wgt	98.9428234201429	2.71948246216963	69.1816142882735	0.221916473012471	2.1185694049434
vil.id	-6.0245137913613	-0.168054407187466	-1.91414470908346	-0.00520794333267228	-0.0494468877742103

5.2.2.3.5 Test Program OLS Return All

vars.z <- NULL
t(suppressWarnings(suppressMessages(
  ff_reg_mbyn(list.vars.y, list.vars.x,
            vars.c, vars.z, df,
            return_all = TRUE,
            stats_ends = 'Estimate')))) %>%
  kable() %>%
  kable_styling_fc_wide()

X.Intercept._Estimate	27.3528514188611	99.8738847289486	31.4646660224049	27.9038445914726	219.626705179383	30.510398789855	35.7840188807902	-2662.74787734015	29.2381039651118	23.9948407749741	-547.959546430028	22.3367814226239	24.4904444950826	-476.703973630535	22.7781908464511
X.Intercept._Pr…t..	5.68247182207379e-231	0.755297055538094	6.78164655341251e-84	8.24252674004467e-242	0.493216914827213	1.62608789535448e-79	2.26726906490865e-145	7.13318862989451e-05	1.53578035270423e-124	2.11912344055752e-165	0.0941551350855863	3.04337266226398e-49	2.34941965808017e-181	0.143844033032196	9.58029450711073e-52
X.Intercept._Std.Error	0.831272666092281	320.450650378664	1.61328519718754	0.828072565159453	320.522532223671	1.60831193651105	1.38461348429899	670.301542938553	1.22602177264147	0.866581042166723	327.343126852911	1.5098937308759	0.843371070670839	326.132837036935	1.50045265589571
X.Intercept._tvalue	32.904788686778	0.311666974652514	19.5034740771549	33.6973421962113	0.685214557790027	18.9704485163755	25.8440491058103	-3.9724627003943	23.8479483950095	27.6890903532572	-1.6739607509042	14.7936116071336	29.0387533397396	-1.46168652614562	15.1808794212527
adj.r.squared_v	0.814249026159781	0.60716936506893	0.037324751268097	0.81608922805658	0.607863678511207	0.045349871107604	0.935014931990565	0.92193683733695	0.0595431228127756	0.814690803458616	0.617300597776144	0.026113107419984	0.824542352656376	0.620250730454724	0.0385437355117912
df1_v	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6
df2_v	18957	18962	18999	18957	18962	18999	25092	25102	30013	18587	18591	18845	18587	18591	18845
df3_v	6	6	6	6	6	6	6	6	6	6	6	6	6	6	6
hgt0_Estimate	0.603918173406165	56.3852027199177	-0.296844389234445	0.589847843438402	52.9707041800702	-0.273219210757899	0.439374451256045	47.1769696647486	-0.359081639820446	0.687269209411872	72.1055606233589	-0.108789161111504	0.622395388389211	62.7336220289252	-0.157811627494689
hgt0_Pr…t..	1.14533314567228e-183	1.52417506966924e-12	1.40290395213742e-13	7.79174951113908e-177	3.05720143843438e-11	8.49149153665201e-12	2.71000479248473e-36	0.00520266507060087	2.41020063625353e-31	1.31914432912238e-220	4.78613024243947e-19	0.0034801146146179	1.11511327164595e-190	8.38546282719406e-15	2.13723119924767e-05
hgt0_Std.Error	0.0206657538633712	7.96735224000553	0.0401060913799595	0.0205836398278422	7.96822145797113	0.0399777363511634	0.0348701896610764	16.8823489375742	0.0307984635553862	0.0213841849324283	8.0774490640068	0.0372288594891345	0.0208846437570216	8.07589192978209	0.0371223237183417
hgt0_tvalue	29.2231378249682	7.07703149319762	-7.40147890309685	28.6561486875879	6.64774497790597	-6.83428417151857	12.6002885423504	2.79445531182863	-11.6590764073245	32.1391351404586	8.92677379355594	-2.92217281443326	29.8015803204667	7.76801157994421	-4.25112470577148
prot_Estimate	0.0494310938067476	16.5557424523601	-0.0758835879205561	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
prot_Pr…t..	9.54769322340913e-79	9.61203373208916e-60	3.56396093563316e-50	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
prot_Std.Error	0.00261878251179546	1.01201959743756	0.00507971302734612	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
prot_tvalue	18.8756010031766	16.359112505607	-14.9385580468905	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
r.squared_v	0.814298005954592	0.607272921412825	0.0375780335372857	0.816137722617266	0.60796705182314	0.0456010419476621	0.93502787877066	0.921952383432195	0.0596997716363459	0.814740639193486	0.617403496088207	0.0263714328556816	0.824589538985803	0.620352835549783	0.0387987636986582
sexMale_Estimate	0.935177182449465	415.163616765342	-0.254089999175327	0.893484662055601	405.534891838043	-0.18138948961095	1.80682463132053	999.926876716704	-0.334367777515325	0.932686930233099	397.14194867534	-0.445232370682025	0.964669805007069	401.590563681033	-0.423829627017541
sexMale_Pr…t..	2.36432111723075e-51	2.48252880292341e-67	0.0343768259467531	2.0876593533594e-47	2.51355675685191e-64	0.129768754080749	1.26527362034417e-66	2.64630894140163e-86	0.000311174554786564	7.9048902058919e-47	6.1944974268004e-59	7.93666802281656e-05	1.24556615237397e-52	1.18469030740813e-60	0.000156446936361652
sexMale_Std.Error	0.0618482294097284	23.8518341439671	0.120093045309629	0.0616078355613521	23.856750758352	0.11972270545355	0.104475287357897	50.5879876531385	0.0927193334338761	0.0647209948973254	24.4473730956476	0.112797805327956	0.0629827627260292	24.3549086073391	0.112083516545939
sexMale_tvalue	15.1205166481672	17.4059409544549	-2.11577613441495	14.5027763743757	16.9987478993161	-1.51508010885476	17.2942776901007	19.7660931597596	-3.60623577771709	14.4108867873977	16.2447698213451	-3.94717228218692	15.3164098120516	16.4891016491032	-3.78137339083065
sigma_v	4.21029844914315	1623.77111076428	8.18491760066961	4.18939119979502	1622.33549880859	8.15073036560541	8.18607049768594	3964.45339913597	7.93450742809862	4.35662621773428	1645.77655955938	7.6435668370875	4.23923961592693	1639.42085007515	7.59462918474114
svymthRound_Estimate	0.871665891005652	189.042906883819	-0.015475958799392	0.851989049736821	185.318286001897	0.0201471237605437	0.432815253441721	189.87799479506	0.00215144302579708	0.919614676961391	205.597385664746	-0.0509574460702801	0.921894094780682	205.945143306004	-0.0557204455206458
svymthRound_Pr…t..	0	0	0.0397984032097092	0	0	0.0117151185126457	0	0	0.000447277200167213	0	0	1.3713938980249e-18	0	0	7.79141497752014e-23
svymthRound_Std.Error	0.00387681209575622	1.4955473831309	0.00752730297891319	0.00411253488213796	1.59266949679222	0.00799217807522279	0.000728323735328996	0.352701518968251	0.000612792699568233	0.00331108017589108	1.25083486490652	0.00578476859618168	0.00317113547025635	1.22639878616071	0.00565696328562864
svymthRound_tvalue	224.840892330022	126.403823119306	-2.05597660181156	207.168832400007	116.357025971267	2.52085521254881	594.262183761196	538.353209678557	3.51088227277016	277.738571133786	164.368128386085	-8.8088996513906	290.714194782148	167.926734460268	-9.84988636256525
vars_var.y	hgt	wgt	vil.id	hgt	wgt	vil.id	hgt	wgt	vil.id	hgt	wgt	vil.id	hgt	wgt	vil.id
vars_vars.c	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound
vars_vars.x	prot	prot	prot	cal	cal	cal	wealthIdx	wealthIdx	wealthIdx	p.A.prot	p.A.prot	p.A.prot	p.A.nProt	p.A.nProt	p.A.nProt
wgt0_Estimate	-0.000146104685986974	0.637023553461056	-0.00090339059153386	-0.000116898230009977	0.64939400361476	-0.000941137072743922	0.00122231975126217	1.32870822160237	-0.000845938526704835	-0.000489534836079631	0.580023505722661	-0.0015619691115606	3.23596154259002e-05	0.655512063046742	-0.00115432723977404
wgt0_Pr…t..	0.13601158349758	2.96480083692537e-63	2.05763549729284e-06	0.230228828648909	7.43034302413299e-66	6.66901196231758e-07	1.22269348058886e-13	6.75367630220942e-62	4.32675510884396e-09	7.77000489086255e-07	7.42419220782874e-54	1.40362012201848e-19	0.740027016459631	4.09082062948253e-67	2.7547278172825e-11
wgt0_Std.Error	9.79994437486569e-05	0.0378027371614794	0.00019022150316743	9.74307633896925e-05	0.037739875283113	0.000189270503626622	0.000164767846917987	0.0798131859486413	0.000144040382619523	9.90410500454322e-05	0.0374185042114355	0.000172365145002826	9.75208524392676e-05	0.0377202854835201	0.000173241059789274
wgt0_tvalue	-1.49087260496799	16.851254731633	-4.7491507347553	-1.19980821193426	17.2071051836606	-4.97244448929307	7.41843614592217	16.6477281392748	-5.87292612891309	-4.94274682926999	15.5009805428139	-9.06197776548728	0.33182252427554	17.3782370584956	-6.66312732777168
cal_Estimate	NA	NA	NA	0.00243408846205617	0.69907250036464	-0.00395676177098467	NA	NA	NA	NA	NA	NA	NA	NA	NA
cal_Pr…t..	NA	NA	NA	8.01672708887592e-120	4.71331900886782e-67	7.94646124034726e-85	NA	NA	NA	NA	NA	NA	NA	NA	NA
cal_Std.Error	NA	NA	NA	0.000103833679413418	0.0402492068645181	0.000201721108117471	NA	NA	NA	NA	NA	NA	NA	NA	NA
cal_tvalue	NA	NA	NA	23.4421863484656	17.368603130933	-19.6150110809449	NA	NA	NA	NA	NA	NA	NA	NA	NA
wealthIdx_Estimate	NA	NA	NA	NA	NA	NA	0.210456554881893	106.678721085982	0.451733304543376	NA	NA	NA	NA	NA	NA
wealthIdx_Pr…t..	NA	NA	NA	NA	NA	NA	1.93494257269637e-41	3.25483455344948e-45	4.82890644809744e-250	NA	NA	NA	NA	NA	NA
wealthIdx_Std.Error	NA	NA	NA	NA	NA	NA	0.0155791042075756	7.54496977117115	0.0132483771350797	NA	NA	NA	NA	NA	NA
wealthIdx_tvalue	NA	NA	NA	NA	NA	NA	13.5088996182178	14.1390521528125	34.0972558327354	NA	NA	NA	NA	NA	NA
p.A.prot_Estimate	NA	NA	NA	NA	NA	NA	NA	NA	NA	3.86952250259533e-05	0.00521731297924599	0.000149388430455129	NA	NA	NA
p.A.prot_Pr…t..	NA	NA	NA	NA	NA	NA	NA	NA	NA	0.000125048896903774	0.170833589209346	2.88060045452304e-17	NA	NA	NA
p.A.prot_Std.Error	NA	NA	NA	NA	NA	NA	NA	NA	NA	1.00852286184785e-05	0.00380941660201473	1.76593895713677e-05	NA	NA	NA
p.A.prot_tvalue	NA	NA	NA	NA	NA	NA	NA	NA	NA	3.83682180045522	1.36958319982295	8.45943342783161	NA	NA	NA
p.A.nProt_Estimate	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0.00542428867316432	0.77951423205071	0.00526237555580908
p.A.nProt_Pr…t..	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	5.25341325087684e-226	1.4795093994274e-33	3.76857802825063e-70
p.A.nProt_Std.Error	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0.000166671307872962	0.0644431375975831	0.000295969260770984
p.A.nProt_tvalue	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	32.5448257554849	12.0961557911473	17.7801422421398

5.2.2.3.6 Test Program IV Return All

vars.z <- c('indi.id')
t(suppressWarnings(suppressMessages(
  ff_reg_mbyn(list.vars.y, list.vars.x,
            vars.c, vars.z, df,
            return_all = TRUE,
            stats_ends = 'Estimate')))) %>%
  kable() %>%
  kable_styling_fc_wide()

X.Intercept._Estimate	40.2173991882485	1408.16266370338	-64.4906360678745	39.6732302989797	1325.54736576349	-59.8304089440739	35.5561817357076	-2791.22153490887	21.8005242861703	24.300926170769	-499.067024090765	21.4632286881799	25.2992097396208	-352.278518335053	17.9359211845123
X.Intercept._Pr…z..	3.69748205151063e-59	0.00217397545492022	0.000109756271652438	1.300302408639e-103	0.00138952700446946	3.75547414435365e-07	2.01357089607565e-142	1.95034793045609e-05	1.17899313915906e-34	1.979686075101e-84	0.155922992161587	1.84405333725821e-09	1.2938856648278e-157	0.287184942020645	1.13855583545194e-12
X.Intercept._Std.Error	2.47963650425926	459.377029871511	16.6730992506847	1.83545587850906	414.64590052724	11.7754321199163	1.39936229104615	653.605248808668	1.77547715238971	1.24813311286047	351.723712331823	3.57067054655076	0.945826571483205	330.990098562007	2.52170174724026
X.Intercept._zvalue	16.2190704642262	3.06537456628437	-3.86794531108104	21.6149190855006	3.1968177282785	-5.08095230262336	25.4088465604759	-4.27050048939548	12.2786847788088	19.4698193008246	-1.41891776582847	6.01097984492387	26.7482543865813	-1.06431739156408	7.11262590992028
hgt0_Estimate	0.403139725682047	35.5765914326644	1.20995060148709	0.357976348181521	31.017270649736	1.50374470896815	0.460434521499931	59.1545587745054	0.412512139030997	0.515794899568958	46.2591615803456	0.520812513246497	0.510868687340372	45.5654716961748	0.534362107843996
hgt0_Pr…z..	1.25009876505916e-13	0.00044580263634977	0.000971126494011876	2.8214126518022e-17	0.00131003033160538	3.70002169493517e-08	2.98739737306075e-37	0.000542570320025762	3.02226358149701e-20	8.57492957072496e-59	2.85614887356539e-07	1.10039023739786e-08	3.24936431571418e-102	6.34545453001148e-08	3.42500501257647e-17
hgt0_Std.Error	0.054394831296424	10.1318250571452	0.366789440586502	0.0423453726228281	9.65135595902108	0.273179527952845	0.0361031059207924	17.1025823111652	0.0447499166719814	0.0319035514862777	9.01263684091064	0.0911390672918101	0.0237991645880177	8.4243486539678	0.0633800587736384
hgt0_zvalue	7.41136089723565	3.51137048182399	3.29876072646029	8.45373003019788	3.21377335800617	5.50460248700526	12.7533216258482	3.45880859967487	9.2181655231836	16.1673191710588	5.13270005181653	5.7144814921021	21.4658243759356	5.40878275197143	8.43107624359371
prot_Estimate	0.859205733631884	98.9428234201429	-6.0245137913613	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
prot_Pr…z..	6.88427337868624e-19	2.09631602338248e-08	2.94171378765984e-20	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
prot_Std.Error	0.0967928354474626	17.656195205247	0.653342710289674	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
prot_zvalue	8.87674929306359	5.60385871757577	-9.22106223346427	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Sargan_df1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
sexMale_Estimate	0.154043421788434	333.799680049256	5.41175429817601	0.106307556058088	330.452608866756	5.831189427888	1.80283907885784	997.747599807137	-0.452827875182574	1.02741625216021	411.365911332902	-0.789122421167278	1.02009164592613	409.820707458849	-0.746032636368026
sexMale_Pr…z..	0.388078129326844	5.06413216619956e-24	5.80077629935398e-06	0.423490075743443	2.52735690942035e-27	6.12283824653399e-12	1.16893284800756e-65	2.02347084790526e-89	0.000647195788039458	1.69796551008425e-27	2.05327249428077e-54	0.00428270841485319	1.70848440094532e-51	2.36314216739175e-62	6.57521045475854e-05
sexMale_Std.Error	0.178475271469509	33.0216035385256	1.19371921154444	0.132821186086593	30.5174257712039	0.847955715223007	0.105343525210948	49.7632792630674	0.132754263303729	0.0945646985181949	26.482231353221	0.276250047248342	0.0675715533063684	24.5920104216274	0.186921458372094
sexMale_zvalue	0.863107928174524	10.1085242471589	4.5335236677428	0.800381017443866	10.8283251459096	6.87676174970345	17.1139049623383	20.049876426605	-3.41102322376305	10.8646912458832	15.5336574870179	-2.85655126226232	15.0964658352761	16.6647907361991	-3.99115565898775
svymthRound_Estimate	0.209901650858444	121.789859431719	4.84745570027421	0.32289383712906	135.494858749213	4.07024693316579	0.433164820953122	190.077351395411	0.013743826466697	1.00582859923507	218.549980922773	-0.369567838754919	0.929266902426876	207.078222946319	-0.0985678389223871
svymthRound_Pr…z..	0.00846239710330183	5.96047652726604e-17	2.07373887990734e-19	9.66146445970411e-11	4.48931446156578e-34	5.64723572075311e-36	0	0	1.57416908709045e-66	0	0	2.42696379710431e-102	0	0	1.8456989795407e-27
svymthRound_Std.Error	0.0797183179466224	14.5577085129173	0.538050140686242	0.0498896912189897	11.133488331491	0.325043349284404	0.00120472816008757	0.739269879490012	0.000797655931686455	0.00746867714609322	1.93157117819081	0.0172056989832521	0.00539330635998891	1.46167854745869	0.00907867488118111
svymthRound_zvalue	2.63304164293819	8.3660048093182	9.00930105527276	6.47215545415433	12.1700274626388	12.5221664806451	359.553993426731	257.114967982378	17.2302692435809	134.672925279842	113.146221785871	-21.4793853545069	172.300040161039	141.671520941694	-10.857073329799
vars_var.y	hgt	wgt	vil.id	hgt	wgt	vil.id	hgt	wgt	vil.id	hgt	wgt	vil.id	hgt	wgt	vil.id
vars_vars.c	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound	sex+wgt0+hgt0+svymthRound
vars_vars.x	prot	prot	prot	cal	cal	cal	wealthIdx	wealthIdx	wealthIdx	p.A.prot	p.A.prot	p.A.prot	p.A.nProt	p.A.nProt	p.A.nProt
vars_vars.z	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id	indi.id
Weakinstruments_df1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
Weakinstruments_df2	18957	18962	18999	18957	18962	18999	25092	25102	30013	18587	18591	18845	18587	18591	18845
Weakinstruments_p.value	1.42153759924051e-19	4.45734829676535e-19	5.72345606958224e-20	1.77770827184557e-37	4.03760292921249e-37	5.4744773509349e-38	0	0	0	0	0	0	0	0	0
Weakinstruments_statistic	82.0931934821259	79.8251182827394	83.8989817367576	164.392129625297	162.747072038427	166.752606654978	7029.47383089423	7038.38467113138	12942.6315513376	1710.98122418593	1715.15052113401	1725.71954882909	5097.88462603714	5110.7741807339	5136.55662964893
wgt0_Estimate	-0.00163274724538053	0.492582112313714	0.00999798623641609	-0.000658938519303158	0.601258436431589	0.00326074237566447	0.00112485055604174	1.27282038539716	-0.00512158791392231	0.000716628918445026	0.761704518610409	-0.00601345031606022	0.00092210011725941	0.792700893714032	-0.00668277875606412
wgt0_Pr…z..	4.88365163658792e-08	2.3313655523177e-20	7.95432753715951e-07	0.000328431498069385	2.09211347307227e-48	0.00667886646010296	2.26123807436414e-11	6.67525280115101e-56	6.51923754467955e-127	2.43477572087317e-06	8.22014792868139e-69	5.19751747054247e-44	1.68237436783295e-15	4.81415543482617e-82	2.5484884022112e-105
wgt0_Std.Error	0.000299284876595226	0.0532753838702927	0.00202532507408109	0.000183457551985467	0.0411255751282266	0.0012021409416413	0.000168187467853392	0.0808047514011814	0.000213715312589844	0.000152036990659248	0.0434474820358988	0.000432182413699013	0.00011580150512101	0.0413159097814226	0.000306609919183132
wgt0_zvalue	-5.45549532590907	9.24596082710511	4.93648469787117	-3.59177647456756	14.6200614716489	2.71244598924693	6.68807593335234	15.7518012657181	-23.964534182684	4.71351685755978	17.5316147891159	-13.9141485758099	7.96276452793799	19.186335189222	-21.7957030674948
Wu.Hausman_df1	1	1	1	1	1	1	1	1	1	1	1	1	1	1	1
Wu.Hausman_df2	18956	18961	18998	18956	18961	18998	25091	25101	30012	18586	18590	18844	18586	18590	18844
Wu.Hausman_p.value	1.53929570350174e-118	3.1341589140336e-08	0	2.88592506947287e-108	7.64959440843259e-07	0	0.0221987672063031	0.00993600230368616	0	1.80909125272665e-238	2.14946499922086e-35	0	3.15182965429387e-108	1.76811257415626e-17	0
Wu.Hausman_statistic	543.467268879861	30.6481856102737	5652.51924793073	494.955883488803	24.4605456761215	5583.56513052695	5.23078768861662	6.64734699528169	25949.7118056005	1119.87022468742	154.793296861585	4826.92242730011	494.903094649185	72.5307870103482	7607.83405438101
cal_Estimate	NA	NA	NA	0.0238724384575233	2.71948246216963	-0.168054407187466	NA	NA	NA	NA	NA	NA	NA	NA	NA
cal_Pr…z..	NA	NA	NA	1.4495661654094e-33	9.21076021353428e-10	5.6761450167746e-39	NA	NA	NA	NA	NA	NA	NA	NA	NA
cal_Std.Error	NA	NA	NA	0.00197718112736554	0.444177077283097	0.0128692506794762	NA	NA	NA	NA	NA	NA	NA	NA	NA
cal_zvalue	NA	NA	NA	12.0739764946734	6.12251870088371	-13.0586007975956	NA	NA	NA	NA	NA	NA	NA	NA	NA
wealthIdx_Estimate	NA	NA	NA	NA	NA	NA	0.144503490136916	69.1816142882735	-1.91414470908346	NA	NA	NA	NA	NA	NA
wealthIdx_Pr…z..	NA	NA	NA	NA	NA	NA	3.72983264927879e-06	2.23442991283626e-07	0	NA	NA	NA	NA	NA	NA
wealthIdx_Std.Error	NA	NA	NA	NA	NA	NA	0.0312379492766362	13.3588885513857	0.0371054140359332	NA	NA	NA	NA	NA	NA
wealthIdx_zvalue	NA	NA	NA	NA	NA	NA	4.62589553677888	5.17869536991513	-51.5866689219473	NA	NA	NA	NA	NA	NA
p.A.prot_Estimate	NA	NA	NA	NA	NA	NA	NA	NA	NA	0.00148073028434634	0.221916473012471	-0.00520794333267228	NA	NA	NA
p.A.prot_Pr…z..	NA	NA	NA	NA	NA	NA	NA	NA	NA	2.50759287111904e-156	8.30126393445594e-33	3.00201193980421e-197	NA	NA	NA
p.A.prot_Std.Error	NA	NA	NA	NA	NA	NA	NA	NA	NA	5.55884799941939e-05	0.0186022369560852	0.000173813943639702	NA	NA	NA
p.A.prot_zvalue	NA	NA	NA	NA	NA	NA	NA	NA	NA	26.6373587567245	11.9295584469951	-29.9627476577358	NA	NA	NA
p.A.nProt_Estimate	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0.0141317656200728	2.1185694049434	-0.0494468877742103
p.A.nProt_Pr…z..	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	2.61782083799059e-226	4.81511329045864e-35	0
p.A.nProt_Std.Error	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	0.000440019589949139	0.17153115470459	0.00128926108222172
p.A.nProt_zvalue	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA	32.1162192385715	12.3509307017258	-38.3528894620791

5.2.2.4 Program Line by Line

Set Up Parameters

vars.z <- c('indi.id')
vars.z <- NULL
vars.c <- c('sex', 'wgt0', 'hgt0', 'svymthRound')

5.2.2.4.1 Lapply

df.reg.out <- as_tibble(
  bind_rows(lapply(list.vars.y, regf.iv,
                   vars.x=var.x1, vars.c=vars.c, vars.z=vars.z, df=df)))

5.2.2.4.2 Nested Lapply Test

lapply(list.vars.y, function(y) (mean(df[[var.x1]], na.rm=TRUE) +
                                   mean(df[[y]], na.rm=TRUE)))

## [[1]]
## [1] 98.3272
## 
## [[2]]
## [1] 13626.51
## 
## [[3]]
## [1] 26.11226

lapplytwice <- lapply(
  list.vars.x, function(x) (
    lapply(list.vars.y, function(y) (mean(df[[x]], na.rm=TRUE) +
                                       mean(df[[y]], na.rm=TRUE)))))
# lapplytwice

5.2.2.4.3 Nested Lapply All

df.reg.out.all <- bind_rows(
  lapply(list.vars.x,
         function(x) (
           bind_rows(
             lapply(list.vars.y, regf.iv,
                    vars.x=x, vars.c=vars.c, vars.z=vars.z, df=df))
         )))

# df.reg.out.all %>%
#   kable() %>%
#   kable_styling_fc_wide()

5.2.2.4.4 Nested Lapply Select

df.reg.out.all <-
  (lapply(list.vars.x,
          function(x) (
            bind_rows(lapply(list.vars.y, regf.iv,
                             vars.x=x, vars.c=vars.c, vars.z=vars.z, df=df)) %>%
              select(vars_var.y, starts_with(x)) %>%
              select(vars_var.y, ends_with('value'))
          ))) %>% reduce(full_join)

df.reg.out.all %>%
  kable() %>%
  kable_styling_fc_wide()

vars_var.y	prot_tvalue	cal_tvalue	wealthIdx_tvalue	p.A.prot_tvalue	p.A.nProt_tvalue
hgt	18.8756010031766	23.4421863484656	13.5088996182178	3.83682180045522	32.5448257554849
wgt	16.359112505607	17.368603130933	14.1390521528125	1.36958319982295	12.0961557911473
vil.id	-14.9385580468905	-19.6150110809449	34.0972558327354	8.45943342783161	17.7801422421398

5.3 Decomposition

5.3.1 Decompose RHS

Go back to fan’s REconTools research support package, R4Econ examples page, PkgTestR packaging guide, or Stat4Econ course page.

One runs a number of regressions. With different outcomes, and various right hand side variables.

What is the remaining variation in the left hand side variable if right hand side variable one by one is set to the average of the observed values.

Dependency: R4Econ/linreg/ivreg/ivregdfrow.R

The code below does not work with categorical variables (except for dummies). Dummy variable inputs need to be converted to zero/one first. The examples are just to test the code with different types of variables.

# Library
library(tidyverse)
library(AER)

# Load Sample Data
setwd('C:/Users/fan/R4Econ/_data/')
df <- read_csv('height_weight.csv')

# Source Dependency
source('C:/Users/fan/R4Econ/linreg/ivreg/ivregdfrow.R')

Data Cleaning.

# Convert Variable for Sex which is categorical to Numeric
df <- df
df$male <- (as.numeric(factor(df$sex)) - 1)
summary(factor(df$sex))

## Female   Male 
##  16446  18619

summary(df$male)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   1.000   0.531   1.000   1.000

df.use <- df %>% filter(S.country == 'Guatemala') %>%
  filter(svymthRound %in% c(12, 18, 24))
dim(df.use)

## [1] 2022   16

Setting Up Parameters.

# Define Left Hand Side Variab les
var.y1 <- c('hgt')
var.y2 <- c('wgt')
vars.y <- c(var.y1, var.y2)
# Define Right Hand Side Variables
vars.x <- c('prot')
vars.c <- c('male', 'wgt0', 'hgt0', 'svymthRound')
# vars.z <- c('p.A.prot')
vars.z <- c('vil.id')
# vars.z <- NULL
vars.xc <- c(vars.x, vars.c)

# Other variables to keep
vars.other.keep <- c('S.country', 'vil.id', 'indi.id', 'svymthRound')

# Decompose sequence
vars.tomean.first <- c('male', 'hgt0')
var.tomean.first.name.suffix <- '_mh02m'
vars.tomean.second <- c(vars.tomean.first, 'hgt0', 'wgt0')
var.tomean.second.name.suffix <- '_mh0me2m'
vars.tomean.third <- c(vars.tomean.second, 'prot')
var.tomean.third.name.suffix <- '_mh0mep2m'
vars.tomean.fourth <- c(vars.tomean.third, 'svymthRound')
var.tomean.fourth.name.suffix <- '_mh0mepm2m'
list.vars.tomean = list(
#                         vars.tomean.first,
                        vars.tomean.second,
                        vars.tomean.third,
                        vars.tomean.fourth
                        )
list.vars.tomean.name.suffix <- list(
#                                     var.tomean.first.name.suffix,
                                     var.tomean.second.name.suffix,
                                     var.tomean.third.name.suffix,
                                     var.tomean.fourth.name.suffix
                                    )

5.3.1.1 Obtain Regression Coefficients from somewhere

# Regressions
# regf.iv from C:\Users\fan\R4Econ\linreg\ivreg\ivregdfrow.R
df.reg.out <- as_tibble(
  bind_rows(lapply(vars.y, regf.iv,
                   vars.x=vars.x, vars.c=vars.c, vars.z=vars.z, df=df)))
# Regressions
# reg1 <- regf.iv(var.y = var.y1, vars.x, vars.c, vars.z, df.use)
# reg2 <- regf.iv(var.y = var.y2, vars.x, vars.c, vars.z, df.use)
# df.reg.out <- as_tibble(bind_rows(reg1, reg2))

# df.reg.out

# Select Variables
str.esti.suffix <- '_Estimate'
arr.esti.name <- paste0(vars.xc, str.esti.suffix)
str.outcome.name <- 'vars_var.y'
arr.columns2select <- c(arr.esti.name, str.outcome.name)
arr.columns2select

## [1] "prot_Estimate"        "male_Estimate"        "wgt0_Estimate"       
## [4] "hgt0_Estimate"        "svymthRound_Estimate" "vars_var.y"

# Generate dataframe for coefficients
df.coef <- df.reg.out[,c(arr.columns2select)] %>%
  mutate_at(vars(arr.esti.name), as.numeric) %>% column_to_rownames(str.outcome.name)
df.coef %>%
  kable() %>%
  kable_styling_fc()

	prot_Estimate	male_Estimate	wgt0_Estimate	hgt0_Estimate	svymthRound_Estimate
hgt	-0.2714772	1.244735	0.0004430	0.6834853	1.133919
wgt	-59.0727542	489.852902	0.7696158	75.4867897	250.778883

str(df.coef)

## 'data.frame':    2 obs. of  5 variables:
##  $ prot_Estimate       : num  -0.271 -59.073
##  $ male_Estimate       : num  1.24 489.85
##  $ wgt0_Estimate       : num  0.000443 0.769616
##  $ hgt0_Estimate       : num  0.683 75.487
##  $ svymthRound_Estimate: num  1.13 250.78

5.3.1.2 Decomposition Step 1

# Decomposition Step 1: gather
df.decompose_step1 <- df.use %>%
                        filter(svymthRound %in% c(12, 18, 24)) %>%
                        select(one_of(c(vars.other.keep, vars.xc, vars.y))) %>%
                        drop_na() %>%
                        gather(variable, value, -one_of(c(vars.other.keep, vars.xc)))
options(repr.matrix.max.rows=20, repr.matrix.max.cols=20)
dim(df.decompose_step1)

## [1] 1382   10

head(df.decompose_step1, 10) %>%
  kable() %>%
  kable_styling_fc()

S.country	vil.id	indi.id	svymthRound	prot	male	wgt0	hgt0	variable	value
Guatemala	3	1352	18	13.3	1	2545.2	47.4	hgt	70.2
Guatemala	3	1352	24	46.3	1	2545.2	47.4	hgt	75.8
Guatemala	3	1354	12	1.0	1	3634.3	51.2	hgt	66.3
Guatemala	3	1354	18	9.8	1	3634.3	51.2	hgt	69.2
Guatemala	3	1354	24	15.4	1	3634.3	51.2	hgt	75.3
Guatemala	3	1356	12	8.6	1	3911.8	51.9	hgt	68.1
Guatemala	3	1356	18	17.8	1	3911.8	51.9	hgt	74.1
Guatemala	3	1356	24	30.5	1	3911.8	51.9	hgt	77.1
Guatemala	3	1357	12	1.0	1	3791.4	52.6	hgt	71.5
Guatemala	3	1357	18	12.7	1	3791.4	52.6	hgt	77.8

5.3.1.3 Decomposition Step 2

# Decomposition Step 2: mutate_at(vars, funs(mean = mean(.)))
# the xc averaging could have taken place earlier, no difference in mean across variables
df.decompose_step2 <- df.decompose_step1 %>%
                        group_by(variable) %>%
                        mutate_at(vars(c(vars.xc, 'value')), funs(mean = mean(.))) %>%
                        ungroup()

options(repr.matrix.max.rows=20, repr.matrix.max.cols=20)
dim(df.decompose_step2)

## [1] 1382   16

head(df.decompose_step2,10) %>%
  kable() %>%
  kable_styling_fc_wide()

S.country	vil.id	indi.id	svymthRound	prot	male	wgt0	hgt0	variable	value	prot_mean	male_mean	wgt0_mean	hgt0_mean	svymthRound_mean	value_mean
Guatemala	3	1352	18	13.3	1	2545.2	47.4	hgt	70.2	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1352	24	46.3	1	2545.2	47.4	hgt	75.8	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1354	12	1.0	1	3634.3	51.2	hgt	66.3	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1354	18	9.8	1	3634.3	51.2	hgt	69.2	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1354	24	15.4	1	3634.3	51.2	hgt	75.3	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1356	12	8.6	1	3911.8	51.9	hgt	68.1	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1356	18	17.8	1	3911.8	51.9	hgt	74.1	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1356	24	30.5	1	3911.8	51.9	hgt	77.1	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1357	12	1.0	1	3791.4	52.6	hgt	71.5	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216
Guatemala	3	1357	18	12.7	1	3791.4	52.6	hgt	77.8	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216

5.3.1.4 Decomposition Step 3 Non-Loop

ff_lr_decompose_valadj <- function(df, df.coef, vars.tomean, str.esti.suffix) {
  new_value <- (df$value +
                  rowSums((df[paste0(vars.tomean, '_mean')] - df[vars.tomean])
                          *df.coef[df$variable, paste0(vars.tomean, str.esti.suffix)]))
  return(new_value)
}

5.3.1.5 Decomposition Step 3 With Loop

df.decompose_step3 <- df.decompose_step2
for (i in 1:length(list.vars.tomean)) {
    var.decomp.cur <- (paste0('value', list.vars.tomean.name.suffix[[i]]))
    vars.tomean <- list.vars.tomean[[i]]
    var.decomp.cur
    df.decompose_step3 <- df.decompose_step3 %>%
      mutate((!!var.decomp.cur) :=
               ff_lr_decompose_valadj(., df.coef, vars.tomean, str.esti.suffix))

}

dim(df.decompose_step3)

## [1] 1382   19

head(df.decompose_step3, 10) %>%
  kable() %>%
  kable_styling_fc_wide()

S.country	vil.id	indi.id	svymthRound	prot	male	wgt0	hgt0	variable	value	prot_mean	male_mean	wgt0_mean	hgt0_mean	svymthRound_mean	value_mean	value_mh0me2m	value_mh0mep2m	value_mh0mepm2m
Guatemala	3	1352	18	13.3	1	2545.2	47.4	hgt	70.2	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	73.19390	71.19903	71.68148
Guatemala	3	1352	24	46.3	1	2545.2	47.4	hgt	75.8	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	78.79390	85.75778	79.43671
Guatemala	3	1354	12	1.0	1	3634.3	51.2	hgt	66.3	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	63.61689	58.28285	65.56882
Guatemala	3	1354	18	9.8	1	3634.3	51.2	hgt	69.2	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	66.51689	63.57185	64.05430
Guatemala	3	1354	24	15.4	1	3634.3	51.2	hgt	75.3	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	72.61689	71.19213	64.87106
Guatemala	3	1356	12	8.6	1	3911.8	51.9	hgt	68.1	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	64.33707	61.06626	68.35222
Guatemala	3	1356	18	17.8	1	3911.8	51.9	hgt	74.1	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	70.33707	69.56385	70.04630
Guatemala	3	1356	24	30.5	1	3911.8	51.9	hgt	77.1	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	73.33707	76.01161	69.69055
Guatemala	3	1357	12	1.0	1	3791.4	52.6	hgt	71.5	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	66.83353	61.49949	68.78545
Guatemala	3	1357	18	12.7	1	3791.4	52.6	hgt	77.8	20.64819	0.5499276	3312.297	49.75137	18.42547	73.41216	73.13353	70.97578	71.45823

5.3.1.6 Decomposition Step 4 Variance

df.decompose_step3 %>%
        select(variable, contains('value')) %>%
        group_by(variable) %>%
        summarize_all(funs(mean = mean, var = var)) %>%
        select(matches('value')) %>% select(ends_with("_var")) %>%
        mutate_if(is.numeric, funs( frac = (./value_var))) %>%
        mutate_if(is.numeric, round, 3) %>%
  kable() %>%
  kable_styling_fc_wide()

value_var	value_mean_var	value_mh0me2m_var	value_mh0mep2m_var	value_mh0mepm2m_var	value_var_frac	value_mean_var_frac	value_mh0me2m_var_frac	value_mh0mep2m_var_frac	value_mh0mepm2m_var_frac
21.864	NA	25.35	49.047	23.06	1	NA	1.159	2.243	1.055
2965693.245	NA	2949187.64	4192769.518	3147506.60	1	NA	0.994	1.414	1.061

5.3.1.7 Graphical Results

Graphically, difficult to pick up exact differences in variance, a 50 percent reduction in variance visually does not look like 50 percent. Intuitively, we are kind of seeing standard deviation, not variance on the graph if we think abou the x-scale.

head(df.decompose_step3 %>%
    select(variable, contains('value'), -value_mean), 10) %>%
  kable() %>%
  kable_styling_fc()

variable	value	value_mh0me2m	value_mh0mep2m	value_mh0mepm2m
hgt	70.2	73.19390	71.19903	71.68148
hgt	75.8	78.79390	85.75778	79.43671
hgt	66.3	63.61689	58.28285	65.56882
hgt	69.2	66.51689	63.57185	64.05430
hgt	75.3	72.61689	71.19213	64.87106
hgt	68.1	64.33707	61.06626	68.35222
hgt	74.1	70.33707	69.56385	70.04630
hgt	77.1	73.33707	76.01161	69.69055
hgt	71.5	66.83353	61.49949	68.78545
hgt	77.8	73.13353	70.97578	71.45823

df.decompose_step3 %>%
    select(variable, contains('value'), -value_mean) %>%
    rename(outcome = variable) %>%
    gather(variable, value, -outcome) %>%
    ggplot(aes(x=value, color = variable, fill = variable)) +
        geom_line(stat = "density") +
        facet_wrap(~ outcome, scales='free', nrow=2)

5.3.1.8 Additional Decomposition Testings

head(df.decompose_step2[vars.tomean.first],3)
head(df.decompose_step2[paste0(vars.tomean.first, '_mean')], 3)
head(df.coef[df.decompose_step2$variable,
             paste0(vars.tomean.first, str.esti.suffix)], 3)
df.decompose.tomean.first <- df.decompose_step2 %>%
    mutate(pred_new = df.decompose_step2$value +
        rowSums((df.decompose_step2[paste0(vars.tomean.first, '_mean')]
                 - df.decompose_step2[vars.tomean.first])
            *df.coef[df.decompose_step2$variable,
                     paste0(vars.tomean.first, str.esti.suffix)])) %>%
        select(variable, value, pred_new)
head(df.decompose.tomean.first, 10)
df.decompose.tomean.first %>%
        group_by(variable) %>%
        summarize_all(funs(mean = mean, sd = sd))  %>%
  kable() %>%
  kable_styling_fc()

variable	value_mean	pred_new_mean	value_sd	pred_new_sd
hgt	73.41216	73.41216	4.675867	4.534947
wgt	8807.87656	8807.87656	1722.118824	1695.221845

Note the r-square from regression above matches up with the 1 - ratio below. This is the proper decomposition method that is equivalent to r2.

df.decompose_step2 %>%
    mutate(pred_new = df.decompose_step2$value +
        rowSums((df.decompose_step2[paste0(vars.tomean.second, '_mean')]
                 - df.decompose_step2[vars.tomean.second])
            *df.coef[df.decompose_step2$variable,
                     paste0(vars.tomean.second, str.esti.suffix)])) %>%
        select(variable, value, pred_new) %>%
        group_by(variable) %>%
        summarize_all(funs(mean = mean, var = var)) %>%
        mutate(ratio = (pred_new_var/value_var))  %>%
  kable() %>%
  kable_styling_fc()

variable	value_mean	pred_new_mean	value_var	pred_new_var	ratio
hgt	73.41216	73.41216	2.186374e+01	25.3504	1.1594724
wgt	8807.87656	8807.87656	2.965693e+06	2949187.6357	0.9944345