The various simulations on this page are done in support of Same Environment, Stratified Impacts? Air Pollution, Extreme Temperatures, and Birth Weight in South China.
Go to the RMD, R, PDF, or HTML version of this file. Go back to fan’s REconTools Package, R Code Examples Repository (bookdown site), or Intro Stats with R Repository (bookdown site).
Suppose the number of weeks that an individual is pregnant follows the binomial distributions. According to data from the Right from the Start study from West Virginia, published in Hoffman et al. (2008), median gestational age at birth was \(276/7\) weeks and the standard deviation was around \(14/7\) weeks. We use binomial to approximate normal rules to generate the relevant binomial parameters.
On each date after conception, there is some chance to give birth, due to various purely random factors, denote this chance as \(p_{\tau,i}\), where \(\tau\) is discrete time since conception, \(p_{\tau, i}\) follows the just defined binomial distribution. The distribution might or might not be individual specific. They would be individual-specific if individual experiences during the course of pregnancy, such as exposures to extreme temeprature, matter for gestational age at birth. The probability that an individual \(i\) gives birth during the week of conception is \(p_{0,i}\). Conditional on proceeding to the day after conception, the probability of giving birth is \(\frac{p_{1,i}}{1-p_{0,i}}\). Similarly for future dates. The conditional birth probability is the Hazard rate in this scenario.
Below, we formulate the gestational age at birth distribution function ffi_gestation_age_at_birth_dist().
# This function generates the distribution of gestational weeks at birth
ffi_gestation_age_at_birth_dist <- function(
mu_gabirth_days = 276,
sd_gabirth_days = 14
){
# # Parameters
# # gabirth = gestational age at birth
# mu_gabirth_days <- 276
# sd_gabirth_days <- 14
# from https://fanwangecon.github.io/R4Econ/statistics/discrandvar/htmlpdfr/fs_disc_approx_cts.html
it_binom_n <- round((mu_gabirth_days / 7)^2 / (mu_gabirth_days / 7 - (sd_gabirth_days / 7)^2))
fl_binom_p <- 1 - (sd_gabirth_days / 7)^2 / (mu_gabirth_days / 7)
# Same graphing code as from: https://fanwangecon.github.io/Stat4Econ/probability_discrete/htmlpdfr/binomial.html#24_binomial_example:_wwii_german_soldier
# Generate Data
ar_grid_gabirth <- 0:it_binom_n
ar_pdf_gabirth <- dbinom(ar_grid_gabirth, it_binom_n, fl_binom_p)
ar_cdf_gabirth <- pbinom(ar_grid_gabirth, it_binom_n, fl_binom_p)
df_dist_gabirth <- tibble(gabirth = (ar_grid_gabirth), prob = ar_pdf_gabirth, cum_prob = ar_cdf_gabirth)
# Two axis colors
axis_sec_ratio <- max(ar_cdf_gabirth) / max(ar_pdf_gabirth)
right_axis_color <- "blue"
left_axis_color <- "red"
# Probabilities
plt_dist_gabirth <- df_dist_gabirth %>%
ggplot(aes(x = gabirth)) +
geom_bar(aes(y = prob),
stat = "identity", alpha = 0.5, width = 0.5, fill = left_axis_color
)
# Cumulative Probabilities
plt_dist_gabirth <- plt_dist_gabirth +
geom_line(aes(y = cum_prob / axis_sec_ratio),
alpha = 0.75, size = 1, color = right_axis_color
)
# Titles Strings etc
graph_title <- paste0("Gestational age at birth (weeks)\n",
"Prob mass (Left) and cumulative prob (Right)")
graph_caption <- paste0("Assuming the binomial properties apply\n",
"fl_binom_p = ", fl_binom_p, ", it_binom_n = ", it_binom_n)
graph_title_x <- paste0("Gestational age at birth (weeks)\n",
"mean gestational age (days) at birth = ", mu_gabirth_days, "\n",
"standard deviation of g.a. (days) at birth = ", sd_gabirth_days)
graph_title_y_axisleft <- "Prob x weeks of gestation"
graph_title_y_axisright <- "Prob at most x weeks of gestation"
# Titles etc
plt_dist_gabirth <- plt_dist_gabirth +
labs(
title = graph_title,
x = graph_title_x,
y = graph_title_y_axisleft,
caption = graph_caption
) +
scale_y_continuous(
sec.axis =
sec_axis(~ . * axis_sec_ratio, name = graph_title_y_axisright)
) +
scale_x_continuous(
labels = ar_grid_gabirth[floor(seq(1, it_binom_n, length.out = 10))],
breaks = ar_grid_gabirth[floor(seq(1, it_binom_n, length.out = 10))]
) +
theme(
axis.text.y = element_text(face = "bold"),
axis.text.y.right = element_text(color = right_axis_color),
axis.text.y.left = element_text(color = left_axis_color)
)
# Print
return(list(
df_dist_gabirth=df_dist_gabirth,
plt_dist_gabirth=plt_dist_gabirth
))
}
# Test the function
ls_gsbirth <- ffi_gestation_age_at_birth_dist(mu_gabirth_days = 276, sd_gabirth_days = 14)
# Figure
print(ls_gsbirth$plt_dist_gabirth)
# Table
df_dist_gabirth <- ls_gsbirth$df_dist_gabirth
kable(df_dist_gabirth %>% filter(prob >= 0.01)) %>% kable_styling_fc()
gabirth | prob | cum_prob |
---|---|---|
35 | 0.0190913 | 0.0303855 |
36 | 0.0422736 | 0.0726591 |
37 | 0.0809563 | 0.1536154 |
38 | 0.1320866 | 0.2857020 |
39 | 0.1799862 | 0.4656882 |
40 | 0.1992704 | 0.6649586 |
41 | 0.1721918 | 0.8371504 |
42 | 0.1089377 | 0.9460881 |
43 | 0.0448780 | 0.9909661 |
Some research on the distribution of gestational ages at birth:
Denote week of the year with \(w\). There are potentially 52 possible weeks of the year, starting at week 1, \(w=1\), and ending with \(w=52\). Conception can happen at any month.
Suppose the conception week of the year distribution is bimodal, we will pick two max conception count weeks, out of 52 weeks. Suppose week 15 and week 40 are two peak conception months, with week 15 more heavily weighted. What is the distribution of conception weeks? We will mix two binomial distribution with a largely-uniform distribution together. We will assume later that the day (of the week) of conception is uniformly distributed.
Below, we formulate the conception date distribution function ffi_concept_distribution_year().
# This function generates the distribution of conception weeks across
# the year, with two peak months, and some randomness
ffi_concept_distribution_year <- function(it_max_weeks = 52,
it_peak_wk_1st = 15,
it_peak_wk_2nd = 40,
fl_binom_1st_wgt = 0.150,
fl_binom_2nd_wgt = 0.075,
it_runif_seed = 123,
df_dist_conception_exo = NULL) {
# # Peak (local) months and weights
# it_max_weeks <- 52
# it_peak_wk_1st <- 15
# it_peak_wk_2nd <- 40
# # Weights for the two binomial and the remaining weight is for an uniform distribution
# fl_binom_1st_wgt <- 0.25
# fl_binom_2nd_wgt <- 0.10
if (is.null(df_dist_conception_exo)) {
# Discrete random variables
ar_fl_binom_1st <- dbinom(
0:(it_max_weeks - 1), (it_max_weeks - 1),
(it_peak_wk_1st - 1) / (it_max_weeks - 1)
)
ar_fl_binom_2nd <- dbinom(
0:(it_max_weeks - 1), (it_max_weeks - 1),
(it_peak_wk_2nd - 1) / (it_max_weeks - 1)
)
set.seed(it_runif_seed)
ar_random_base <- runif(it_max_weeks, min = 0.5, max = 1)
ar_random_base <- ar_random_base / sum(ar_random_base)
# Mix two binomials and a uniform
ar_fl_p_concept_week <- ar_fl_binom_1st * fl_binom_1st_wgt +
ar_fl_binom_2nd * fl_binom_2nd_wgt +
ar_random_base * (1 - fl_binom_1st_wgt - fl_binom_2nd_wgt)
# Dataframe
df_dist_conception <- tibble(conception_calendar_week = 1:it_max_weeks,
conception_prob = ar_fl_p_concept_week)
} else {
df_dist_conception <- df_dist_conception_exo
}
# Line plot
# Title
st_title <- paste0(
"Distribution of conception week of birth\n",
"over weeks of one specific year, seed=", it_runif_seed
)
# Display
plt_concept_week_of_year <- df_dist_conception %>%
ggplot(aes(x = conception_calendar_week, y= conception_prob)) +
geom_line() +
labs(
title = st_title,
x = 'Weeks of year',
y = 'Share of conception this week'
) +
scale_x_continuous(n.breaks = 12) +
scale_y_continuous(n.breaks = 10) +
theme(
axis.text.x = element_text(angle = 45, vjust = 0.1, hjust = 0.1)
)
# Return
return(list(
df_dist_conception = df_dist_conception,
plt_concept_week_of_year = plt_concept_week_of_year
))
}
# Call function with defaults
ls_concept <- ffi_concept_distribution_year(it_max_weeks = 52,
it_peak_wk_1st = 15,
it_peak_wk_2nd = 40,
it_runif_seed = 123)
ls_concept$plt_concept_week_of_year
df_dist_conception <- ls_concept$df_dist_conception
kable(df_dist_conception) %>% kable_styling_fc()
conception_calendar_week | conception_prob |
---|---|
1 | 0.0127130 |
2 | 0.0176571 |
3 | 0.0139137 |
4 | 0.0186053 |
5 | 0.0192192 |
6 | 0.0105366 |
7 | 0.0157063 |
8 | 0.0201894 |
9 | 0.0184492 |
10 | 0.0200423 |
11 | 0.0283162 |
12 | 0.0270358 |
13 | 0.0325643 |
14 | 0.0336904 |
15 | 0.0295436 |
16 | 0.0361683 |
17 | 0.0271255 |
18 | 0.0218356 |
19 | 0.0213639 |
20 | 0.0247214 |
21 | 0.0219399 |
22 | 0.0185480 |
23 | 0.0171439 |
24 | 0.0201420 |
25 | 0.0165473 |
26 | 0.0169517 |
27 | 0.0152796 |
28 | 0.0157608 |
29 | 0.0127605 |
30 | 0.0114004 |
31 | 0.0195570 |
32 | 0.0191672 |
33 | 0.0174747 |
34 | 0.0191898 |
35 | 0.0126323 |
36 | 0.0185624 |
37 | 0.0230986 |
38 | 0.0195683 |
39 | 0.0220649 |
40 | 0.0219644 |
41 | 0.0208423 |
42 | 0.0223014 |
43 | 0.0204080 |
44 | 0.0179026 |
45 | 0.0139712 |
46 | 0.0125547 |
47 | 0.0127300 |
48 | 0.0146663 |
49 | 0.0125517 |
50 | 0.0183537 |
51 | 0.0103274 |
52 | 0.0142397 |
Let \(t\) index day of survey. We model temperature in Fahrenheit as \(F_{t}\) using an AR(1) persistent process and a sine function. According to Wikipedia, which reports Climate data for Guangzhou, China, daily mean temperature varies between 57F (Jan) and 84F (June). Record high temperature was reported in July at 102.4F. Record low temperature was reported in December at 32.0F. We will mark out these temperature levels visually in graphs below.
Specifically, temperature in Fahrenheit is modeled as: \[ F_{t} = \exp\left( \mu + \epsilon_t + \gamma \sin \left( \frac{W_t}{52}\cdot 2\cdot\pi + \frac{3-\frac{1}{6}}{2}\cdot\pi \right) \right) \] where \(\gamma\) scales the sine curve, and the shock process \(\epsilon\) follows a AR(1) process: \[ \epsilon_{t} = \rho \cdot \epsilon_{t-1} + \nu_t \] with \(0<\rho<1\), and \(\nu \sim N\left(0, \sigma_\nu\right)\). Using the function above, the coldest month will be somewhere around january.
Below, we formulate the temperature simulation function ffi_daily_temp_simulation().
ffi_daily_temp_simulation <- function(
fl_mthly_mean_lowest = 57,
fl_mthly_mean_highest = 84,
fl_record_lowest = 32,
fl_record_highest = 102.4,
it_weeks_in_year = 52,
it_days_in_week = 7,
it_years = 6,
fl_mu = 4.15,
fl_sin_scaler = 0.20,
fl_sigma_nv = 0.15,
fl_rho_persist = 0.40,
it_rand_seed = 123,
st_extreme_cold_percentile = "p05",
st_extreme_heat_percentile = "p95"
){
# # Guangzhou temp info
# fl_mthly_mean_lowest <- 57
# fl_mthly_mean_highest <- 84
# fl_record_lowest <- 32
# fl_record_highest <- 102.4
# # Total number of periods (over three years)
# it_weeks_in_year <- 52
# it_days_in_week <- 7
# it_years <- 6
it_max_days <- it_weeks_in_year*it_days_in_week
T <- it_max_days * it_years
# # Mean temp
# fl_mu <- 4.15
# fl_sin_scaler <- 0.20
# # AR 1 parameter
# fl_sigma_nv <- 0.15
# fl_rho_persist <- 0.40
# Generate a vector of shocks
set.seed(it_rand_seed)
ar_nv_draws <- rnorm(T, mean = 0, sd = fl_sigma_nv)
# Generate a vector of epsilons
ar_epsilon_ar1 <- vector("double", length=T)
ar_epsilon_ar1[1] <- ar_nv_draws[1]
for (it_t in 2:T) {
ar_epsilon_ar1[it_t] <- fl_rho_persist*ar_epsilon_ar1[it_t-1] + ar_nv_draws[it_t]
}
# Generate week by week sin curve values
ar_day_at_t <- rep(1:it_max_days, it_years)
ar_year_at_t <- as.vector(t(matrix(data=rep(1:it_years, it_max_days),
nrow=it_years, ncol=it_max_days)))
ar_base_temp <- sin((ar_day_at_t/it_max_days)*2*pi + ((3-1/6)/2)*pi)
# Generate overall temperature in Fahrenheit
ar_fahrenheit_city_over_t <-
exp(fl_mu + ar_epsilon_ar1 + fl_sin_scaler*ar_base_temp)
# Dataframe with Temperatures
mt_fahrenheit_info <- cbind(ar_day_at_t, ar_year_at_t, ceiling(ar_day_at_t/it_days_in_week),
ar_fahrenheit_city_over_t,
exp(ar_base_temp), ar_epsilon_ar1, ar_nv_draws)
ar_st_varnames <- c('survey_t','day_of_year', 'year', 'week_of_year',
'Fahrenheit', 'FnoShock', 'AR1Shock', 'RandomDraws')
# Combine to tibble, add name col1, col2, etc.
df_fahrenheit <- as_tibble(mt_fahrenheit_info) %>%
rowid_to_column(var = "t") %>%
rename_all(~c(ar_st_varnames))
# Generate extreme temperatures
df_stats_fahrenheit <- REconTools::ff_summ_percentiles(df_fahrenheit, FALSE)
# Add Extreme Thresholds
fl_lowF_threshold <- df_stats_fahrenheit %>% filter(var == "Fahrenheit") %>% pull(st_extreme_cold_percentile)
fl_highF_threshold <- df_stats_fahrenheit %>% filter(var == "Fahrenheit") %>% pull(st_extreme_heat_percentile)
df_fahrenheit <- df_fahrenheit %>%
mutate(extreme_cold = case_when(Fahrenheit <= fl_lowF_threshold ~ 1, TRUE ~ 0)) %>%
mutate(extreme_hot = case_when(Fahrenheit >= fl_highF_threshold ~ 1, TRUE ~ 0))
# REconTools::ff_summ_percentiles(df_fahrenheit, FALSE)
# Title
st_title <- paste0('Simulated Temperature for Guangzhou (Sine Wave + AR(1))\n',
'Each subplot is a different year\n',
'RED = Guangzhou Temp 1971–2000 lowest and highest monthly averages\n',
'BLUE = Guangzhou Temp 1961–2000 record lows and highs')
# Display
plt_fahrenheit <- df_fahrenheit %>%
ggplot(aes(x = ar_day_at_t, y=Fahrenheit)) +
geom_line() +
geom_hline(yintercept = fl_mthly_mean_lowest, linetype = "solid", colour = "red", size = 1) +
geom_hline(yintercept = fl_mthly_mean_highest, linetype = "solid", colour = "red", size = 1) +
geom_hline(yintercept = fl_record_lowest, linetype = "dashed", colour = "blue", size = 1) +
geom_hline(yintercept = fl_record_highest, linetype = "dashed", colour = "blue", size = 1) +
facet_wrap(~ year) +
labs(
title = st_title,
x = 'Calendar day in year',
y = 'Temperature in Fahrenheit'
) +
scale_x_continuous(n.breaks = 12) +
scale_y_continuous(n.breaks = 10) +
theme(
axis.text.x = element_text(angle = 45, vjust = 0.1, hjust = 0.1)
)
# Return
return(list(
df_fahrenheit = df_fahrenheit,
plt_fahrenheit = plt_fahrenheit
))
}
Having constructed the temperature simulation function, first we call it with AR1 shock + sin curve, persistence is high at \(\rho=0.7\).
# Test 1: Call function with AR1 + Since Curve
ls_fahrenheit <- ffi_daily_temp_simulation(
fl_mthly_mean_lowest = 57,
fl_mthly_mean_highest = 84,
fl_record_lowest = 32,
fl_record_highest = 102.4,
it_weeks_in_year = 52,
it_days_in_week = 7,
it_years = 2,
fl_mu = 4.15,
fl_sin_scaler = 0.25,
fl_sigma_nv = 0.15,
fl_rho_persist = 0.70,
it_rand_seed = 123)
print(ls_fahrenheit$plt_fahrenheit)
# df_fahrenheit <- ls_fahrenheit$df_fahrenheit
# kable(df_fahrenheit) %>% kable_styling_fc()
Second, we call the same temperature function, but we reduce shock persistence to 0.
# Test 2: Call function with defaults
ls_fahrenheit <- ffi_daily_temp_simulation(
fl_mthly_mean_lowest = 57,
fl_mthly_mean_highest = 84,
fl_record_lowest = 32,
fl_record_highest = 102.4,
it_weeks_in_year = 52,
it_days_in_week = 7,
it_years = 2,
fl_mu = 4.15,
fl_sin_scaler = 0.25,
fl_sigma_nv = 0.15,
fl_rho_persist = 0.0,
it_rand_seed = 123)
# Show
print(ls_fahrenheit$plt_fahrenheit)
# df_fahrenheit <- ls_fahrenheit$df_fahrenheit
# kable(df_fahrenheit) %>% kable_styling_fc()
Third, we call the same temperature function, but we reduce shock persistence to 0 and shock variance to almost zero.
# Test 2: Call function with defaults
ls_fahrenheit <- ffi_daily_temp_simulation(
fl_mthly_mean_lowest = 57,
fl_mthly_mean_highest = 84,
fl_record_lowest = 32,
fl_record_highest = 102.4,
it_weeks_in_year = 52,
it_days_in_week = 7,
it_years = 2,
fl_mu = 4.15,
fl_sin_scaler = 0.25,
fl_sigma_nv = 0.025,
fl_rho_persist = 0.0,
it_rand_seed = 123)
# Show
print(ls_fahrenheit$plt_fahrenheit)
# df_fahrenheit <- ls_fahrenheit$df_fahrenheit
# kable(df_fahrenheit) %>% kable_styling_fc()
We will simulate several datasets on birth and relate to temperature distributions. Below we do that, we go through several key points that will be confirmed/demonstrated by the simulations.
There are three relevant distributional concepts here:
Pre-term birth or not is determined jointly by the gestational age at birth and conception date distributions, which determine the conception and birth dates. We establish a common threshold below which a birth is considered pre-term.
Extreme temperature and whether a child is pre-term or not can be related.
Under scenario (A), conception dates are uniformly distribution across the year, and gestational age at birth is unrelated to conception time. Given this, extreme temperature or not has nothing to do with gestational age at birth.
Under scenario (B), we can spuriously have pre-term experiencing less extreme-cold/heat when \(P(C_i)\) is non-uniform/random, meaning \(P(C_i) \neq \frac{1}{\text{CountTotalDatesOfConception}}\). This could go in several directions:
Additionally, pre-term could experience more extreme-cold under scenario (C.1), where extreme temperature has a causal impact on gestational-age, causally leading to more pre-term. This is the research hypothesis of interest.
But pre-term could also experience more extreme-cold under scenario (C.2) for another reason. Suppose the distribution of high-pregnancy-risk parents/mothers across calendar month in each year is non-random: if all high risk mother get pregnant in August, and low risk mother get pregnant in March, we might end up with full-term births not experiencing extreme temperature, but pre-term births experiencing a lot of extreme-cold days. The pre-term births (likely with lower birthweight) is correlated with cold exposures, but this correlation due to the conception timing of high-risk pregnancies. In this situation, cold might not real effects on whether a birth is pre-term or not.
Under the various scenarios, the relationship between the number of days and percentage of conception days to pre-term birth of not will be different. They are:
The share/percentage of days exposed to exposure cold is the appropriate measure to use. As seen in the scenarios above, it does not generate “spurious relationships” due to shorter time spans of pre-term births.
In the exercises below, we only simulate Scenarios (A) and (B) for now.
We generate a dataframe with \(N\) individuals, each with conception day \(C_i\), birth day \(B_i\), and whether in the birth was full-term or not \(F_i\). For simplicity, assume that there are 7 days per week, 4 weeks per month, and 13 months per year (to reach 52 weeks per year). This means there are 336 days of possible birth in a year.
Our gestational age distribution function can take any mean and standard deviation, we rescale the information mentioned prior from Right from the Start under the assumption of 336 days per year. (Plug in mean and sd as if there are 365 days into the function below, it will rescale that based on the actual number of days in the simulated year).
First, we create a function for generating birth conception and gestational distributions.
ffi_pop_concept_birth_simu <- function(
it_pop_n = 50000,
it_days_in_week = 7,
it_weeks_in_month = 4,
it_months_in_year = 12,
it_years = 3,
it_rng_seed = 123,
fl_pre_term_ratio = 0.84,
fl_peak_concept_frac_of_year_1st = 0.3,
fl_peak_concept_frac_of_year_2nd = 0.9,
fl_binom_1st_wgt = 0.15,
fl_binom_2nd_wgt = 0.05,
fl_mu_gabirth_days_365 = 276,
fl_sd_gabirth_days_365 = 14,
df_dist_conception_exo = NULL
) {
# # 1. Define parameters
# # 1.a Number of individuals of interest
# it_pop_n <- 50000
# # 1.b Number of days per week, week per month
# # for simplicity, 7 days per week, 4 weeks per month, 12 months
# it_days_in_week <- 7
# it_weeks_in_month <- 4
# it_months_in_year <- 12
# it_years <- 3
# # 1.c random draw seed
# it_rng_seed <- 123
# # 1.d pre-term threshold
# # what fraction of maximum pregnancy length is pre-term?
# # Max length 44 weeks for example, 44*0.85 is about 37 months.
# fl_pre_term_ratio <- 0.84
# 1.e Other dates
it_weeks_in_year <- it_weeks_in_month*it_months_in_year
it_days_in_month <- it_days_in_week*it_weeks_in_month
it_days_in_year <- it_days_in_week*it_weeks_in_month*it_months_in_year
# 1.f Month of conception distribution
it_peak_wk_1st <- round(it_weeks_in_year*fl_peak_concept_frac_of_year_1st)
it_peak_wk_2nd <- round(it_weeks_in_year*fl_peak_concept_frac_of_year_2nd)
# fl_binom_1st_wgt <- 0.15
# fl_binom_2nd_wgt <- 0.05
# # 1.g Gestational age at birth distribution parameters
mu_gabirth_days <- round((fl_mu_gabirth_days_365/365)*it_days_in_year)
sd_gabirth_days <- round((fl_sd_gabirth_days_365/365)*it_days_in_year)
# 2. Date of conception random draws
# 2.a Week of conception distribution
ls_concept_fc <- ffi_concept_distribution_year(
it_max_weeks = it_weeks_in_year,
it_peak_wk_1st = it_peak_wk_1st, it_peak_wk_2nd = it_peak_wk_2nd,
fl_binom_1st_wgt = fl_binom_1st_wgt, fl_binom_2nd_wgt = fl_binom_2nd_wgt,
it_runif_seed = it_rng_seed*210,
df_dist_conception_exo = df_dist_conception_exo)
df_dist_conception_week <- ls_concept_fc$df_dist_conception
# 2.b.1 Randomly (uniformly) drawing the day of birth
set.seed(it_rng_seed*221)
ar_draws_conception_day_of_week <- sample(
it_days_in_week, it_pop_n, replace=TRUE)
# 2.b.2 Week of conception draws
set.seed(it_rng_seed*222)
ar_draws_conception_week <- sample(
df_dist_conception_week$conception_calendar_week,
it_pop_n,
prob=df_dist_conception_week$conception_prob,
replace=TRUE)
# 2.b.3 Randomly (uniformly) drawing the year of conception
set.seed(it_rng_seed*223)
ar_draws_conception_year <- sample(
it_years, it_pop_n, replace=TRUE)
# 2.c Date of birth
ar_draws_concept_date <- (ar_draws_conception_year-1)*it_days_in_year +
(ar_draws_conception_week-1)*it_days_in_week +
ar_draws_conception_day_of_week
ar_draws_conception_day_of_year <- (ar_draws_conception_week-1)*it_days_in_week +
ar_draws_conception_day_of_week
# 3. Gestational age at birth distribution simulation
# 3.a Gestational age distribution
ls_gsbirth_fc <- ffi_gestation_age_at_birth_dist(
mu_gabirth_days = mu_gabirth_days, sd_gabirth_days = sd_gabirth_days)
df_dist_gabirth <- ls_gsbirth_fc$df_dist_gabirth
# 3.b.1 Gestational day of week draws (random)
set.seed(it_rng_seed*321)
ar_draws_gsbirth_day_of_week <- sample(
it_days_in_week, it_pop_n, replace=TRUE)
# 3.b.2 Gestational week draws
set.seed(it_rng_seed*322)
ar_draws_gsbirth_week <- sample(
df_dist_gabirth$gabirth,
it_pop_n,
prob=df_dist_gabirth$prob,
replace=TRUE)
# 3.c Gestational days at birth
ar_draws_gsbirth_day <- ar_draws_gsbirth_week*it_days_in_week + ar_draws_gsbirth_day_of_week
ar_draws_birth_date <- ar_draws_concept_date + ar_draws_gsbirth_day
# 4. Create dataframe
# 4.a Variables and labels
mt_birth_data <- cbind(
ar_draws_concept_date, ar_draws_birth_date, ar_draws_gsbirth_day,
ar_draws_conception_year, ar_draws_conception_week,
ar_draws_conception_day_of_week, ar_draws_conception_day_of_year,
ar_draws_gsbirth_week, ar_draws_gsbirth_day_of_week)
ar_st_varnames <- c('id',
'survey_date_conception', 'survey_date_birth', 'gestation_length_in_days',
'conception_year', 'conception_week',
'conception_day_of_week', 'concept_day_of_year',
'gestational_week_at_birth', 'gestational_day_of_week_at_birth')
# 4.b tibble with conception and birth data
df_birth_data <- as_tibble(mt_birth_data) %>%
rowid_to_column(var = "id") %>%
rename_all(~c(ar_st_varnames)) %>%
arrange(survey_date_conception, survey_date_birth)
# 4.c generate cut-off for preterm
it_pre_term_threshold <- round(fl_pre_term_ratio*length(df_dist_gabirth$gabirth)*it_days_in_week)
df_birth_data <- df_birth_data %>% mutate(
preterm = case_when(it_pre_term_threshold >= gestation_length_in_days ~ 1,
TRUE ~ 0) )
# 5. Display data
# Display
st_title <- paste0('Day of year of conception and gestational age at birth\n',
'pop=', it_pop_n, ', days-in-year=', it_days_in_year, ', seed=', it_rng_seed, '\n',
'mean-ga-at-birth-in-month=', round(mu_gabirth_days/it_days_in_month, 3),
', sd-ga-at-birth-in-month=', round(sd_gabirth_days/it_days_in_month, 3))
plt_concept_birth <- df_birth_data %>%
mutate(preterm = factor(preterm)) %>%
ggplot(aes(x = concept_day_of_year, y=gestation_length_in_days, color=preterm)) +
facet_wrap(~ conception_year) +
geom_point() +
labs(
title = st_title,
x = 'Calendar day in year',
y = 'Gestational age in days at birth'
) +
scale_x_continuous(n.breaks = 12) +
scale_y_continuous(n.breaks = 10) +
theme(
axis.text.x = element_text(angle = 45, vjust = 0.1, hjust = 0.1)
)
# print(plt_concept_birth)
# kable(df_birth_data) %>% kable_styling_fc_wide()
# plot(df_birth_data$survey_date_conception, df_birth_data$gestation_length_in_days)
# Return
return(list(
df_birth_data = df_birth_data,
plt_concept_birth = plt_concept_birth,
ls_concept_fc = ls_concept_fc,
ls_gsbirth_fc = ls_gsbirth_fc
))
}
We generate a dataframe where the conception months are randomly/uniformly distributed, and there is no correlation between conception and gestation. This is Scenario (A).
# Define some dates
it_days_in_week <- 7
it_weeks_in_month <- 4
it_months_in_year <- 13
it_years <- 3
# Simulate
ls_concept_birth <- ffi_pop_concept_birth_simu(
it_pop_n = 10000,
it_days_in_week = it_days_in_week,
it_weeks_in_month = it_weeks_in_month,
it_months_in_year = it_months_in_year,
it_years = it_years,
it_rng_seed = 999,
fl_pre_term_ratio = 0.84,
fl_peak_concept_frac_of_year_1st = 0.3,
fl_peak_concept_frac_of_year_2nd = 0.9,
fl_binom_1st_wgt = 0.00,
fl_binom_2nd_wgt = 0.00,
fl_mu_gabirth_days_365 = 276,
fl_sd_gabirth_days_365 = 14
)
# Get dataframe and print distribution
df_birth_data_rand_cor0 <- ls_concept_birth$df_birth_data
print(ls_concept_birth$ls_concept_fc$plt_concept_week_of_year)
print(ls_concept_birth$ls_gsbirth_fc$plt_dist_gabirth)
print(ls_concept_birth$plt_concept_birth)
We generate a dataframe where the conception distribution has a high peak around Feb, and there is no correlation between conception and gestation. This is Scenario (B.1).
# Simulate
ls_concept_birth <- ffi_pop_concept_birth_simu(
it_pop_n = 10000,
it_days_in_week = it_days_in_week,
it_weeks_in_month = it_weeks_in_month,
it_months_in_year = it_months_in_year,
it_years = it_years,
it_rng_seed = 999,
fl_pre_term_ratio = 0.84,
fl_peak_concept_frac_of_year_1st = 0.2,
fl_peak_concept_frac_of_year_2nd = 0.9,
fl_binom_1st_wgt = 0.95,
fl_binom_2nd_wgt = 0.00,
fl_mu_gabirth_days_365 = 276,
fl_sd_gabirth_days_365 = 14
)
# Get dataframe and print distribution
df_birth_data_CFeb_cor0 <- ls_concept_birth$df_birth_data
print(ls_concept_birth$ls_concept_fc$plt_concept_week_of_year)
print(ls_concept_birth$ls_gsbirth_fc$plt_dist_gabirth)
print(ls_concept_birth$plt_concept_birth)
We generate a dataframe where the conception distribution has a high peak around October, and there is no correlation between conception and gestation. This is Scenario (B.2).
# Simulate
ls_concept_birth <- ffi_pop_concept_birth_simu(
it_pop_n = 10000,
it_days_in_week = it_days_in_week,
it_weeks_in_month = it_weeks_in_month,
it_months_in_year = it_months_in_year,
it_years = it_years,
it_rng_seed = 999,
fl_pre_term_ratio = 0.84,
fl_peak_concept_frac_of_year_1st = 0.2,
fl_peak_concept_frac_of_year_2nd = 0.8,
fl_binom_1st_wgt = 0.0,
fl_binom_2nd_wgt = 0.95,
fl_mu_gabirth_days_365 = 276,
fl_sd_gabirth_days_365 = 14
)
# Get dataframe and print distribution
df_birth_data_COct_cor0 <- ls_concept_birth$df_birth_data
print(ls_concept_birth$ls_concept_fc$plt_concept_week_of_year)
print(ls_concept_birth$ls_gsbirth_fc$plt_dist_gabirth)
print(ls_concept_birth$plt_concept_birth)
We generate a dataframe based on the actual conception distribution in Guangzhou, China from a particular survey year. Is this more similar to Scenarios A, B.1 or B.2? The data is from Same Environment, Stratified Impacts? Air Pollution, Extreme Temperatures, and Birth Weight in South China.
st_dist_conception_lbhwz_actual <- "conception_calendar_week, conception_prob
1, 0.0237
2, 0.0221
3, 0.0217
4, 0.0202
5, 0.0226
6, 0.0232
7, 0.0223
8, 0.0214
9, 0.0161
10, 0.0183
11, 0.0213
12, 0.0191
13, 0.0191
14, 0.0201
15, 0.0196
16, 0.0200
17, 0.0184
18, 0.0184
19, 0.0173
20, 0.0186
21, 0.0204
22, 0.0219
23, 0.0180
24, 0.0181
25, 0.0171
26, 0.0160
27, 0.0167
28, 0.0172
29, 0.0165
30, 0.0160
31, 0.0182
32, 0.0180
33, 0.0184
34, 0.0160
35, 0.0187
36, 0.0170
37, 0.0187
38, 0.0180
39, 0.0190
40, 0.0180
41, 0.0186
42, 0.0198
43, 0.0172
44, 0.0195
45, 0.0203
46, 0.0197
47, 0.0184
48, 0.0180
49, 0.0215
50, 0.0195
51, 0.0222
52, 0.0242"
# Raw probability data to table
df_dist_conception_lbhwz_actual = read.csv(text=st_dist_conception_lbhwz_actual, header=TRUE)
ar_st_varnames <- c('conception_calendar_week',
'conception_prob')
tb_dist_conception_lbhwz_actual <- as_tibble(df_dist_conception_lbhwz_actual) %>%
rename_all(~c(ar_st_varnames)) %>%
mutate(conception_prob = conception_prob/sum(conception_prob))
# Summarize
summary(tb_dist_conception_lbhwz_actual)
## conception_calendar_week conception_prob
## Min. : 1.00 Min. :0.01600
## 1st Qu.:13.75 1st Qu.:0.01799
## Median :26.50 Median :0.01869
## Mean :26.50 Mean :0.01923
## 3rd Qu.:39.25 3rd Qu.:0.02032
## Max. :52.00 Max. :0.02419
# Check Probability sums to 1
sum(tb_dist_conception_lbhwz_actual$conception_prob)
## [1] 1
# Simulate
ls_concept_birth <- ffi_pop_concept_birth_simu(
it_pop_n = 10000,
it_days_in_week = it_days_in_week,
it_weeks_in_month = it_weeks_in_month,
it_months_in_year = it_months_in_year,
it_years = it_years,
it_rng_seed = 999,
fl_pre_term_ratio = 0.84,
fl_peak_concept_frac_of_year_1st = 0.2,
fl_peak_concept_frac_of_year_2nd = 0.9,
fl_binom_1st_wgt = 0.95,
fl_binom_2nd_wgt = 0.00,
fl_mu_gabirth_days_365 = 276,
fl_sd_gabirth_days_365 = 14,
df_dist_conception_exo = tb_dist_conception_lbhwz_actual
)
# Get dataframe and print distribution
df_birth_data_Clbhwz_cor0 <- ls_concept_birth$df_birth_data
print(ls_concept_birth$ls_concept_fc$plt_concept_week_of_year)
print(ls_concept_birth$ls_gsbirth_fc$plt_dist_gabirth)
print(ls_concept_birth$plt_concept_birth)
we simulate temperature corresponding to the span of data of interest that covers all individuals’ potential course of pregnancy. We loop over each individual, and calculate the individual-specific extreme temperature exposure. Specifically, we store the percentage of days during pregnancy exposed to extreme cold or extreme hot. We use as threshold 5 percent temperature extreme tails. We also store the number of days as well as the percent of gestational days that are exposed to extreme cold or hot.
First, we generate the temperature distribution along with extreme cold and hot days.
# Simulate the temperature distribution using just define parameters
ls_fahrenheit <- ffi_daily_temp_simulation(
fl_mthly_mean_lowest = 57,
fl_mthly_mean_highest = 84,
fl_record_lowest = 32,
fl_record_highest = 102.4,
it_weeks_in_year = it_months_in_year*it_weeks_in_month,
it_days_in_week = it_days_in_week,
it_years = it_years+1,
it_rand_seed = 999,
st_extreme_cold_percentile = "p05",
st_extreme_heat_percentile = "p95")
print(ls_fahrenheit$plt_fahrenheit)
df_fahrenheit <- ls_fahrenheit$df_fahrenheit
summary(df_fahrenheit$Fahrenheit)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.46 54.57 63.42 65.07 74.42 115.24
df_stats_fahrenheit <- REconTools::ff_summ_percentiles(df_fahrenheit, FALSE)
Second, for one individual, we test computing the number of days exposed to extreme cold.
# Define the extreme cold function
ffi_extreme_cold_percent_gestation <- function(df_fahrenheit, it_date_conception, it_date_birth){
# get extreme cold
ar_extreme_cold <- df_fahrenheit %>%
filter(survey_t >= it_date_conception & survey_t <= it_date_birth) %>% pull(extreme_cold)
# extreme cold days
it_extreme_cold_days <- sum(ar_extreme_cold)
return(it_extreme_cold_days)
}
# Test the function
it_extreme_cold_days <- ffi_extreme_cold_percent_gestation(df_fahrenheit, 11, 200)
print(it_extreme_cold_days)
## [1] 9
Third, we create a function that takes the birth dataframe and temperature dataframe as inputs and generate extreme cold days exposure for each pregancy in the birth dateaframe using apply with an anonymous function.
# Given two dataframes with birth data and temperature data, find cold exposure
ffi_birth_extreme_exposure <- function(df_birth_data, df_fahrenheit){
# apply row by row, anonymous function
# see: https://fanwangecon.github.io/R4Econ/function/noloop/htmlpdfr/fs_apply.html#122_anonymous_function
mt_birth_cold <- apply(df_birth_data, 1, function(row) {
id <- row[1]
it_date_conception <- row[2]
it_date_birth <- row[3]
it_week_of_year_conception <- row[6]
it_preterm <- row[11]
it_extreme_cold_days <- ffi_extreme_cold_percent_gestation(
df_fahrenheit, it_date_conception, it_date_birth)
mt_all_res <- cbind(id, it_date_conception, it_week_of_year_conception,
it_date_birth,
it_preterm, it_extreme_cold_days)
return(mt_all_res)
})
# Column Names
ar_st_varnames <- c('id',
'survey_date_conception', 'week_of_year_conception',
'survey_date_birth',
'preterm', 'days_extreme_cold')
# Combine to tibble, add name col1, col2, etc.
tb_birth_cold <- as_tibble(t(mt_birth_cold)) %>%
rename_all(~c(ar_st_varnames)) %>%
mutate(days_extreme_cold_percent = days_extreme_cold/(survey_date_birth-survey_date_conception)) %>%
mutate(month_of_year_conception = round(week_of_year_conception/it_weeks_in_month))
# Show Results
# kable(tb_birth_cold[1:20,]) %>% kable_styling_fc()
return(tb_birth_cold)
}
Fourth, we generate cold exposures for the datasets we generated using the same temperature dataframe by different birth/conception dataframes created under different assumptions. We use the ffi_birth_extreme_exposure function just created.
# Scenario (A)
tb_birth_cold_rand_cor0 <- ffi_birth_extreme_exposure(df_birth_data_rand_cor0, df_fahrenheit)
# Scenario (B.1)
tb_birth_cold_CFeb_cor0 <- ffi_birth_extreme_exposure(df_birth_data_CFeb_cor0, df_fahrenheit)
# Scenario (B.2)
tb_birth_cold_COct_cor0 <- ffi_birth_extreme_exposure(df_birth_data_COct_cor0, df_fahrenheit)
# Scenario (B.3)
tb_birth_cold_Clbhwz_cor0 <- ffi_birth_extreme_exposure(df_birth_data_Clbhwz_cor0, df_fahrenheit)
In the four datasets that we have generated, each under a different set of conception date distribution assumptions, we have information on the binary prer-term outcome and extreme cold exposure for each individual. We now regress pre-term on extreme cold exposures.
In the regression under Scenarior (A), percent of extreme cold days is unrelated to pre-term.
# Scenarior (A) regression based on percent of days, with month of year conception fixed effects
re_esti_cold_rand_cor0 <- lm(preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
data=tb_birth_cold_rand_cor0)
summary(re_esti_cold_rand_cor0)
##
## Call:
## lm(formula = preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
## data = tb_birth_cold_rand_cor0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.1833 -0.1525 -0.1451 -0.1375 0.8941
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.151680 0.021404 7.086 1.47e-12 ***
## days_extreme_cold_percent 0.082741 0.315434 0.262 0.7931
## factor(month_of_year_conception)1 -0.007843 0.022386 -0.350 0.7261
## factor(month_of_year_conception)2 0.001869 0.022307 0.084 0.9332
## factor(month_of_year_conception)3 0.026794 0.023217 1.154 0.2485
## factor(month_of_year_conception)4 -0.009478 0.021501 -0.441 0.6593
## factor(month_of_year_conception)5 -0.003390 0.023813 -0.142 0.8868
## factor(month_of_year_conception)6 -0.010144 0.021580 -0.470 0.6383
## factor(month_of_year_conception)7 -0.002499 0.024223 -0.103 0.9178
## factor(month_of_year_conception)8 -0.013379 0.021352 -0.627 0.5309
## factor(month_of_year_conception)9 -0.003886 0.023437 -0.166 0.8683
## factor(month_of_year_conception)10 -0.012146 0.021819 -0.557 0.5778
## factor(month_of_year_conception)11 -0.018703 0.024094 -0.776 0.4376
## factor(month_of_year_conception)12 -0.049246 0.021375 -2.304 0.0212 *
## factor(month_of_year_conception)13 -0.011213 0.024470 -0.458 0.6468
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3528 on 9985 degrees of freedom
## Multiple R-squared: 0.001927, Adjusted R-squared: 0.0005276
## F-statistic: 1.377 on 14 and 9985 DF, p-value: 0.1548
In the regression under Scenarior (B.1), even controlling for conception month fixed effects, those experiencing extreme cold are associated with full-term birth. Since the outcome is equal to 1 if preterm happens, the coefficient for extreme-code below is negative. More extreme cold seems to make pre-term less likely, this is a spurious relationship due the conception been concentrated around Feburary. (Spurious because gestational age at birth is i.i.d. draw for all individual birth.)
# Scenarior (B.1) regression based on percent of days, with month of year conception fixed effects
re_esti_cold_CFeb_cor0 <- lm(preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
data=tb_birth_cold_CFeb_cor0)
summary(re_esti_cold_CFeb_cor0)
##
## Call:
## lm(formula = preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
## data = tb_birth_cold_CFeb_cor0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.47445 -0.17960 -0.13325 -0.02082 1.33148
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.67722 0.07570 8.946 < 2e-16 ***
## days_extreme_cold_percent -12.66697 0.36116 -35.073 < 2e-16 ***
## factor(month_of_year_conception)1 -0.23975 0.07639 -3.139 0.001702 **
## factor(month_of_year_conception)2 -0.31858 0.07498 -4.249 2.17e-05 ***
## factor(month_of_year_conception)3 -0.25788 0.07496 -3.440 0.000583 ***
## factor(month_of_year_conception)4 -0.10998 0.07521 -1.462 0.143698
## factor(month_of_year_conception)5 0.17635 0.09135 1.931 0.053561 .
## factor(month_of_year_conception)6 0.18984 0.09010 2.107 0.035152 *
## factor(month_of_year_conception)7 0.06726 0.09927 0.678 0.498076
## factor(month_of_year_conception)8 0.12532 0.08690 1.442 0.149310
## factor(month_of_year_conception)9 0.28153 0.09647 2.918 0.003529 **
## factor(month_of_year_conception)10 0.26670 0.08831 3.020 0.002533 **
## factor(month_of_year_conception)11 0.17190 0.10026 1.715 0.086462 .
## factor(month_of_year_conception)12 0.15399 0.08724 1.765 0.077585 .
## factor(month_of_year_conception)13 -0.03690 0.10290 -0.359 0.719887
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.333 on 9985 degrees of freedom
## Multiple R-squared: 0.1108, Adjusted R-squared: 0.1095
## F-statistic: 88.83 on 14 and 9985 DF, p-value: < 2.2e-16
In the regression under Scenarior (B.2), even controlling for conception month fixed effects, those experiencing extreme cold are associated with pre-term birth. Since the outcome is equal to 1 if preterm happens, the coefficient for extreme-code below is positive. More extreme cold seems to make pre-term more likely, this is a spurious relationship due the conception been concentrated around October. (Spurious because gestational age at birth is i.i.d. draw for all individual birth.)
# Scenarior (B.2) regression based on percent of days, with month of year conception fixed effects
re_esti_cold_COct_cor0 <- lm(preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
data=tb_birth_cold_COct_cor0)
summary(re_esti_cold_COct_cor0)
##
## Call:
## lm(formula = preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
## data = tb_birth_cold_COct_cor0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.2987 -0.1992 -0.1118 -0.0833 1.0466
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1840200 0.0827025 -2.225 0.02610 *
## days_extreme_cold_percent 4.8908282 0.2854113 17.136 < 2e-16 ***
## factor(month_of_year_conception)1 0.1729513 0.1000501 1.729 0.08390 .
## factor(month_of_year_conception)2 0.2536674 0.0951003 2.667 0.00766 **
## factor(month_of_year_conception)3 0.1208876 0.1009776 1.197 0.23127
## factor(month_of_year_conception)4 0.1373929 0.0973410 1.411 0.15814
## factor(month_of_year_conception)5 0.0009896 0.1051860 0.009 0.99249
## factor(month_of_year_conception)6 -0.0128848 0.0988770 -0.130 0.89632
## factor(month_of_year_conception)7 0.0900028 0.1053556 0.854 0.39297
## factor(month_of_year_conception)8 0.0368224 0.0894734 0.412 0.68068
## factor(month_of_year_conception)9 0.0395859 0.0835103 0.474 0.63549
## factor(month_of_year_conception)10 0.0287383 0.0824476 0.349 0.72742
## factor(month_of_year_conception)11 0.0280835 0.0825082 0.340 0.73358
## factor(month_of_year_conception)12 0.0538927 0.0828729 0.650 0.51551
## factor(month_of_year_conception)13 0.1479429 0.1176737 1.257 0.20870
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3479 on 9985 degrees of freedom
## Multiple R-squared: 0.02959, Adjusted R-squared: 0.02823
## F-statistic: 21.75 on 14 and 9985 DF, p-value: < 2.2e-16
In the regression under Scenarior (B.3), we have a very weak negative and insignificant correlation between preterm and extreme-cold exposure in percentage of days. Given our empirical conception distribution, our result is more similar to Scenarior A and not close to Scenarior B.1 and Scenarior B.2. This means that if indeed gestational age at birth are i.i.d. draws and have nothing to do with cold exposures, our empirical conception distribution will not generate a spurious correlation between extreme-cold exposure and pre-term birth (and lower birth-weight).
# Scenarior (B.3) regression based on percent of days, with month of year conception fixed effects
re_esti_cold_Clbhwz_cor0 <- lm(preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
data=tb_birth_cold_Clbhwz_cor0)
summary(re_esti_cold_Clbhwz_cor0)
##
## Call:
## lm(formula = preterm ~ days_extreme_cold_percent + factor(month_of_year_conception),
## data = tb_birth_cold_Clbhwz_cor0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.1764 -0.1524 -0.1435 -0.1331 0.8863
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.156453 0.020274 7.717 1.31e-14 ***
## days_extreme_cold_percent -0.559244 0.312876 -1.787 0.0739 .
## factor(month_of_year_conception)1 0.007591 0.021583 0.352 0.7251
## factor(month_of_year_conception)2 -0.003988 0.020844 -0.191 0.8483
## factor(month_of_year_conception)3 -0.007072 0.022648 -0.312 0.7549
## factor(month_of_year_conception)4 0.024037 0.019699 1.220 0.2224
## factor(month_of_year_conception)5 0.016194 0.022347 0.725 0.4687
## factor(month_of_year_conception)6 0.008069 0.020596 0.392 0.6952
## factor(month_of_year_conception)7 0.037003 0.023068 1.604 0.1087
## factor(month_of_year_conception)8 0.008074 0.021107 0.383 0.7021
## factor(month_of_year_conception)9 0.028895 0.023264 1.242 0.2142
## factor(month_of_year_conception)10 0.024185 0.020911 1.157 0.2475
## factor(month_of_year_conception)11 0.036997 0.022806 1.622 0.1048
## factor(month_of_year_conception)12 0.029644 0.020268 1.463 0.1436
## factor(month_of_year_conception)13 0.007813 0.023647 0.330 0.7411
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.353 on 9985 degrees of freedom
## Multiple R-squared: 0.001145, Adjusted R-squared: -0.0002554
## F-statistic: 0.8176 on 14 and 9985 DF, p-value: 0.6506
Using the data generated prior, we generate distribution of days exposed to extreme cold temperature.
First, create a function for generating statistics and visualization for the relationship between extreme cold days and pre-term or not.
# Create a function.
ffi_cold_days_preterm_analyze <- function(
tb_birth_cold,
st_title = paste0('Scenario (A), Extreme cold DAYS Distribution\n',
'Uniform Conception\n',
'Conception and Birth uncorrelated')){
# summarize
str_stats_group <- 'allperc'
ar_perc <- c(0.05, 0.25, 0.5, 0.75, 0.95)
# For tb_birth_cold
ls_summ_by_group <- REconTools::ff_summ_bygroup(
tb_birth_cold, c('preterm'),
'days_extreme_cold', str_stats_group, ar_perc)
df_table_grp_stats_rand_cor0 <- ls_summ_by_group$df_table_grp_stats
print(df_table_grp_stats_rand_cor0)
# Visualize
plt_rand_cor0_level <- tb_birth_cold %>%
mutate(preterm = factor(preterm)) %>%
group_by(preterm) %>% mutate(days_extreme_cold_mean = mean(days_extreme_cold)) %>% ungroup() %>%
ggplot(aes(x=days_extreme_cold, color=preterm)) +
geom_density() +
geom_vline(aes(xintercept=days_extreme_cold_mean, color=preterm), linetype="dashed") +
labs(
title = st_title,
x = 'Days exposed to extreme cold',
y = 'Density'
) +
scale_x_continuous(n.breaks = 10) +
scale_y_continuous(n.breaks = 10) +
theme(
axis.text.x = element_text(angle = 45, vjust = 0.1, hjust = 0.1)
)
return(list(
df_temp_preterm_stats = df_table_grp_stats_rand_cor0,
plt_temp_preterm = plt_rand_cor0_level))
}
Second, create a function for generating statistics and visualization for the relationship between extreme cold percent of gestational days and pre-term or not.
ffi_cold_percent_days_preterm_analyze <- function(
tb_birth_cold,
st_title = paste0('Scenario (A), Extreme cold PERCENT of DAYS Distribution\n',
'Uniform Conception\n',
'Conception and Birth uncorrelated')){
# summarize
str_stats_group <- 'allperc'
ar_perc <- c(0.05, 0.25, 0.5, 0.75, 0.95)
# For tb_birth_cold_rand_cor0
ls_summ_by_group <- REconTools::ff_summ_bygroup(
tb_birth_cold, c('preterm'),
'days_extreme_cold_percent', str_stats_group, ar_perc)
df_table_grp_stats_rand_cor0 <- ls_summ_by_group$df_table_grp_stats
# Visualize
plt_rand_cor0_level <- tb_birth_cold %>%
mutate(preterm = factor(preterm)) %>%
group_by(preterm) %>% mutate(days_extreme_cold_percent_mean = mean(days_extreme_cold_percent)) %>% ungroup() %>%
ggplot(aes(x=days_extreme_cold_percent, color=preterm)) +
geom_density() +
geom_vline(aes(xintercept=days_extreme_cold_percent_mean, color=preterm), linetype="dashed") +
labs(
title = st_title,
x = 'Percent of gestational days exposed to extreme cold',
y = 'Density'
) +
scale_x_continuous(n.breaks = 10) +
scale_y_continuous(n.breaks = 10) +
theme(
axis.text.x = element_text(angle = 45, vjust = 0.1, hjust = 0.1)
)
return(list(
df_temp_preterm_stats = df_table_grp_stats_rand_cor0,
plt_temp_preterm = plt_rand_cor0_level))
}
Scenario A, number of days exposed to extreme cold distribution:
# For tb_birth_cold_rand_cor0
st_title = paste0('Scenario (A), Extreme cold DAYS Distribution (dashed lines are means)\n',
'Uniform Conception\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_days_preterm_analyze(tb_birth_cold_rand_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Scenario B.1, number of days exposed to extreme cold distribution:
# For tb_birth_cold_CFeb_cor0
st_title = paste0('Scenario (B.1), Extreme cold DAYS Distribution (dashed lines are means)\n',
'Conception Concentrated around Feb\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_days_preterm_analyze(tb_birth_cold_CFeb_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Scenario B.2, number of days exposed to extreme cold distribution (average lines largely overlap):
# For tb_birth_cold_COct_cor0
st_title = paste0('Scenario (B.2), Extreme cold DAYS Distribution (dashed lines are means)\n',
'Conception Concentrated around Oct\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_days_preterm_analyze(tb_birth_cold_COct_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Scenario B.3, number of days exposed to extreme cold distribution (average lines largely overlap), note this is the result based on the empirical conception distribution from Guangzhou:
# For tb_birth_cold_Clbhwz_cor0
st_title = paste0('Scenario (B.3), Extreme cold DAYS Distribution (dashed lines are means)\n',
'Conception Empirical Guangzhou Distribution\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_days_preterm_analyze(tb_birth_cold_Clbhwz_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Scenario A, percent of days exposed to extreme cold distribution:
# For tb_birth_cold_rand_cor0
st_title = paste0('Scenario (A), Extreme cold PERCENT of DAYS Distribution (dashed = means)\n',
'Uniform Conception\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_percent_days_preterm_analyze(tb_birth_cold_rand_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Scenario B.1, percent of days exposed to extreme cold distribution:
# For tb_birth_cold_CFeb_cor0
st_title = paste0('Scenario (B.1), Extreme cold PERCENT of DAYS Distribution (dashed = means)\n',
'Conception Concentrated around Feb\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_percent_days_preterm_analyze(tb_birth_cold_CFeb_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Fourth, Scenario B.2, percent of days exposed to extreme cold distribution:
# For tb_birth_cold_COct_cor0
st_title = paste0('Scenario (B.2), Extreme cold PERCENT of DAYS Distribution (dashed = means)\n',
'Conception Concentrated around Oct\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_percent_days_preterm_analyze(tb_birth_cold_COct_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)
Scenario B.3, percent of days exposed to extreme cold distribution (average lines largely overlap), note this is the result based on the empirical conception distribution from Guangzhou:
# For tb_birth_cold_Clbhwz_cor0
st_title = paste0('Scenario (B.3), Extreme cold PERCENT of DAYS Distribution (dashed = means)\n',
'Conception Empirical Guangzhou Distribution\n',
'Conception and Birth uncorrelated')
ls_coldexp_preterm <- ffi_cold_percent_days_preterm_analyze(tb_birth_cold_Clbhwz_cor0, st_title)
print(ls_coldexp_preterm$plt_temp_preterm)