Introductory Statistics with R tidyverse

A Index and Code Links

A.1 Survey links

  1. An In-class Survey: rmd | r | pdf | html
    • Create a tibble dataset.
    • Draw 10 random students from a class of 50 and build a survey.
    • r: factor() + ifelse()
    • dplyr: group_by() + mutate() + summarise()
    • tibble: add_row()
    • readr: write_csv()
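
The sampling step on this page can be sketched in base R. This is a minimal illustration, not the book's code. The 50-student roster, its `id` column and the simulated 0/1 `female` indicator are hypothetical, and a plain `data.frame()` stands in for the tibble the page builds.

```r
# Hypothetical class roster: 50 students with ids 1..50 (not the book's data)
set.seed(123)
class_roster <- data.frame(
  id = 1:50,
  female = rbinom(50, size = 1, prob = 0.5)  # simulated 0/1 indicator
)

# Draw 10 distinct students (sampling without replacement)
survey_ids <- sample(class_roster$id, size = 10, replace = FALSE)
survey <- class_roster[class_roster$id %in% survey_ids, ]

# Recode the 0/1 indicator into a labeled categorical variable
survey$gender <- factor(ifelse(survey$female == 1, "female", "male"),
                        levels = c("female", "male"))

# Frequency of each gender among the sampled students
print(table(survey$gender))
```

Sampling ids rather than rows keeps the draw readable, and `replace = FALSE` guarantees that no student is surveyed twice.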

A.2 Dataset, Tables and Graphs links

  1. Opening a Dataset: rmd | r | pdf | html
    • Opening a Dataset.
    • r: setwd()
    • readr: write_csv()
  2. One Variable Graphs and Tables: rmd | r | pdf | html
    • Frequency table, bar chart and histogram.
    • Writing an R function and applying it with lapply() to generate graphs/tables for different variables.
    • r: c('word1', 'word2') + function() + for (ctr in c(1,2)) {} + lapply()
    • dplyr: group_by() + summarize() + n()
    • ggplot: geom_bar() + geom_histogram() + labs(title = 'title', caption = 'caption')
  3. Multiple Variables Graphs and Tables: rmd | r | pdf | html
    • Two-way frequency table, stacked bar chart and scatter plot.
    • r: interaction()
    • dplyr: group_by(var) + summarize(freq = n())
    • tidyr: spread(gender, freq)
    • ggplot: aes(x, y, fill) + geom_bar(stat='identity', fun.y='mean', position='dodge') + geom_point(size) + geom_text(size, hjust, vjust) + geom_smooth(method=lm) + labs(title, x, y, caption)
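
The counts these pages compute with group_by() + summarize(n()) and reshape with spread() have a rough base-R counterpart in `table()`. The sketch below assumes a small hypothetical survey data frame with `gender` and `year` columns; it illustrates the counting logic only and is not the book's code.

```r
# Hypothetical survey responses (illustrative values only)
df <- data.frame(
  gender = c("female", "male", "female", "male", "female", "male"),
  year   = c("junior", "junior", "senior", "senior", "senior", "junior")
)

# One-variable frequency tables for several variables, built with lapply()
vars <- c("gender", "year")
one_way <- lapply(vars, function(var) table(df[[var]]))
names(one_way) <- vars

# Two-way frequency table: rows are year, columns are gender
two_way <- table(df$year, df$gender)
print(two_way)

# interaction() combines two factors into one with "year.gender" levels
combo <- interaction(df$year, df$gender)
print(table(combo))
```

The two-way `table()` already has the wide layout that `spread(gender, freq)` produces from long counts.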

A.3 Summarizing Data links

  1. Mean and Standard Deviation: rmd | r | pdf | html
    • Mean and standard deviation from a dataset with city-month temperatures.
    • r: dim() + min() + ceiling() + lapply() + vector(mode="character", length) + substring(var, first, last) + func <- function() return(list())
    • dplyr: mutate() + select() + filter()
    • tidyr: gather(vara, val, -varb)
    • rlang: !!sym(str_var_name)
    • ggplot: aes(x, y, colour, linetype, shape) + facet_wrap(~var, scales='free_y') + geom_line() + geom_point() + geom_jitter(size, width) + scale_x_continuous(labels, breaks)
  2. Rescaling Standard Deviation and Covariance: rmd | r | pdf | html
    • Scatter-plot of a dataset with state-level wage and education data.
    • Coefficient of variation and standard deviation, correlation and covariance.
    • r: mean() + sd() + var() + cov() + cor()
    • ggplot: geom_point(size) + geom_text() + geom_smooth()
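
The relationships listed for the wage and education page can be checked numerically. The coefficient of variation is the standard deviation divided by the mean, and the correlation is the covariance divided by the product of the two standard deviations. The five education and wage values below are made up for illustration and are not the book's state-level data.

```r
# Hypothetical state-level values (illustrative, not the book's dataset)
educ <- c(12.1, 13.4, 14.0, 12.8, 15.2)   # years of schooling
wage <- c(18.5, 21.0, 23.2, 19.9, 26.1)   # hourly wage

# Mean, standard deviation and variance (var() equals sd()^2)
m_wage  <- mean(wage)
sd_wage <- sd(wage)
v_wage  <- var(wage)

# Coefficient of variation: SD rescaled by the mean, so it is unit-free
cv_wage <- sd_wage / m_wage
cv_educ <- sd(educ) / mean(educ)

# Correlation is covariance rescaled by both standard deviations
cov_ew <- cov(educ, wage)
cor_ew <- cor(educ, wage)

cat(sprintf("CV(wage) = %0.3f, CV(educ) = %0.3f, cov = %0.3f, cor = %0.3f\n",
            cv_wage, cv_educ, cov_ew, cor_ew))
```

Because both statistics are rescaled, multiplying wages by a constant (for example, converting units) changes the standard deviation and covariance but leaves the coefficient of variation and the correlation unchanged.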

A.4 Basics of Probability links

  1. Sample Space, Experimental Outcomes, Events, Probabilities: rmd | r | pdf | html
    • Sample Space, Experimental Outcomes, Events and Probability.
    • Union, intersection and complements
    • Conditional probability.
  2. Examples of Sample Space and Probabilities: rmd | r | pdf | html
    • Throwing a quarter, an election with four candidates, an unfair six-sided die, two basketball games.
    • r: sample(size, replace, prob)
  3. Law of Large Numbers, Unfair Dice: rmd | r | pdf | html
    • Throw an unfair die many times to illustrate the law of large numbers.
    • r: head() + tail() + factor() + sample() + as.numeric() + paste0('dice=', var) + sprintf('%0.3f', 1.1234) + sprintf("P(S=1)=%0.3f, P(S=2)=%0.3f", 1.1, 1.2345)
    • stringr: str_extract() + as.numeric(str_extract(variable, "[^.n]+$"))
    • dplyr: mutate(!!str_mean_var := as.numeric(sprintf('%0.5f', freq / sum(freq))))
    • ggplot: geom_line() + scale_x_continuous(trans='log10', labels=c('n=100', 'n=1000'), breaks=c(100, 1000))
  4. Multiple-Step Experiment: Playing the Lottery Three Times: rmd | r | pdf | html
    • Paths after 1, 2 and 3 plays.
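
The unfair-dice simulation can be sketched with sample(size, replace, prob). As the number of throws grows, the relative frequency of each face settles near its probability. The face probabilities below (face 6 three times as likely as each other face) and the win/lose lottery outcomes are assumptions for illustration; the book's own die and lottery may use different values.

```r
# Hypothetical unfair six-sided die: face 6 is three times as likely as any other face
faces <- 1:6
prob <- c(1, 1, 1, 1, 1, 3) / 8

# Relative frequency of each face after n throws
rel_freq <- function(n) {
  throws <- sample(faces, size = n, replace = TRUE, prob = prob)
  # factor(..., levels = faces) keeps faces that never came up as zero counts
  as.numeric(table(factor(throws, levels = faces))) / n
}

set.seed(2020)
# As n grows, the relative frequencies approach the true probabilities
for (n in c(100, 1000, 100000)) {
  f <- rel_freq(n)
  cat(sprintf("n=%d: share of 6s = %0.3f (true %0.3f), max abs error = %0.4f\n",
              n, f[6], prob[6], max(abs(f - prob))))
}

# Multi-step experiment: three lottery plays, each a hypothetical win (W) or loss (L)
paths <- expand.grid(play1 = c("W", "L"), play2 = c("W", "L"), play3 = c("W", "L"))
print(nrow(paths))  # 2^3 = 8 possible paths after three plays
```

Printing the maximum absolute error for each n is one way to see the law of large numbers at work. For a probability p, the typical error shrinks roughly like sqrt(p * (1 - p) / n).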

A.5 Discrete Probability Distribution links

  1. Discrete Random Variable and Binomial Experiment: rmd | r | pdf | html
    • Discrete Random Variable, expected value and variance.
    • Binomial properties, with examples using the USA larceny clearance rate and the survival rate of WWII German soldiers.
    • r: dbinom() + pbinom() + sprintf(paste0('abc\n', 'efg = %s'), 'opq') + round(1.123, 2) + lapply()
    • ggplot: df %>% ggplot(aes(x)) + geom_bar(aes(y=prob), stat='identity', alpha=0.5, width=0.5, fill) + geom_text(aes(y=prob, label=paste0(sprintf('%2.1f', p), '%')), vjust, size, color, fontface) + labs(title, x, y, caption) + scale_y_continuous(sec.axis, name) + scale_x_continuous(labels, breaks) + theme(axis.text.y, axis.text.y.right, axis.text.y.left)
  2. Poisson Probability Distribution: rmd | r | pdf | html
    • Poisson properties, with Ladislaus Bortkiewicz's data on Prussian army deaths from horse kicks.
    • r: dpois() + ppois()
    • ggplot: geom_bar() + geom_text() + geom_line() + geom_point() + geom_text() + labs() + scale_y_continuous() + scale_x_continuous() + theme()
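
The dbinom()/pbinom() and dpois()/ppois() pairs return point and cumulative probabilities. This sketch uses illustrative parameters (n = 10, p = 0.2, lambda = 0.7) rather than the larceny, WWII or horse-kick figures from the linked pages. It computes the binomial expected value and variance directly from the probability mass function so they can be compared with the closed forms n * p and n * p * (1 - p).

```r
# Binomial: n trials with success probability p (illustrative values only)
n <- 10
p <- 0.2
x <- 0:n
pmf_binom <- dbinom(x, size = n, prob = p)

# Expected value and variance computed from the pmf of a discrete random variable
ev_binom  <- sum(x * pmf_binom)
var_binom <- sum((x - ev_binom)^2 * pmf_binom)
cat(sprintf("E[X] = %0.2f (n*p = %0.2f), Var[X] = %0.2f (n*p*(1-p) = %0.2f)\n",
            ev_binom, n * p, var_binom, n * p * (1 - p)))

# pbinom() is the cumulative sum of dbinom(): here P(X <= 2)
p_le_2 <- pbinom(2, size = n, prob = p)

# Poisson with an illustrative rate lambda (expected events per unit of exposure)
lambda <- 0.7
k <- 0:4
pmf_pois <- dpois(k, lambda = lambda)
# Probability of more than 4 events, via the complement of ppois()
p_gt_4 <- 1 - ppois(4, lambda = lambda)
print(round(pmf_pois, 4))
```

Working from the full pmf rather than only the formulas mirrors the general definitions of expected value and variance for any discrete random variable, so the same two lines can be reused with dpois() or a hand-made probability vector.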
