This is a work-in-progress course website for Introductory Statistics for Undergraduate Students, produced by Fan. The course covers a limited subset of topics from Statistics for Business and Economics (Anderson, Sweeney, Williams, Camm, and Cochran, 12e). Files are from Fan’s Stat4Econ repository.
The course uses R. Tidyverse packages are used, including tibble for framing data, tidyr and dplyr for reshaping data and aggregating statistics, ggplot2 for graphing, and readr for reading and writing files. Materials are presented as R, RMD, PDF and HTML files. To obtain all code and raw files, see here for GitHub setup. For HTML files, click on the links below.
From Fan’s other repositories: for dynamic borrowing and savings problems, see Dynamic Asset Repository; for code examples, see also R Example Code, Matlab Example Code, and Stata Example Code; for intro econ with Matlab, see Intro Mathematics for Economists; and for intro stat with R, see Intro Statistics for Undergraduates. See here for all of Fan’s public repositories.
Please contact FanWangEcon for issues or problems.
1 Survey
- An In-class Survey: rmd | r | pdf | html
- create a tibble dataset
- draw 10 random students from 50 and build a survey
- r: factor() + ifelse()
- dplyr: group_by() + mutate() + summarise()
- tibble: add_row()
- readr: write_csv()
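The survey steps listed above can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the seed, the column names (`year`, `hours_study`, `upper_class`) and the survey answers are all invented for the example.

```r
library(tibble)
library(dplyr)

set.seed(123)  # hypothetical seed so the random draw is reproducible

# Draw 10 of the 50 student IDs without replacement
student_id <- sample(1:50, size = 10, replace = FALSE)

# Build a small survey tibble; the answers are made up for illustration
df_survey <- tibble(
  id = student_id,
  year = sample(1:4, size = 10, replace = TRUE),
  hours_study = sample(0:20, size = 10, replace = TRUE)
) %>%
  # factor() + ifelse(): turn a numeric year into a two-level category
  mutate(upper_class = factor(ifelse(year >= 3, "upper", "lower"),
                              levels = c("lower", "upper")))

# group_by() + summarise(): count students and average hours per category
df_summary <- df_survey %>%
  group_by(upper_class) %>%
  summarise(n_students = n(), mean_hours = mean(hours_study))

print(df_summary)
```

Sampling without replacement guarantees that no student is surveyed twice, which is why the 10 IDs are always distinct.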
2 Dataset, Tables and Graphs
- Opening a Dataset: rmd | r | pdf | html
- Opening a Dataset.
- r: setwd()
- readr: write_csv()
- One Variable Graphs and Tables: rmd | r | pdf | html
- Frequency table, bar chart and histogram.
- R function and lapply to generate graphs/tables for different variables.
- r: c('word1','word2') + function() + for (ctr in c(1,2)) {} + lapply()
- dplyr: group_by() + summarize() + n()
- ggplot: geom_bar() + geom_histogram() + labs(title = 'title', caption = 'caption')
- Multiple Variables Graphs and Tables: rmd | r | pdf | html
- Two-way frequency table, stacked bar chart and scatter-plot
- r: interaction()
- dplyr and tidyr: group_by(var) + summarize(freq = n()) + spread(gender, freq)
- ggplot: aes(x,y,fill) + geom_bar(stat='identity', fun.y='mean', position='dodge') + geom_point(size) + geom_text(size,hjust,vjust) + geom_smooth(method=lm) + labs(title,x,y,caption)
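As a rough sketch of the one-way and two-way frequency tables mentioned in this section, the example below counts observations with group_by() and summarize(), then uses spread() from tidyr to move one variable into columns. The data frame and its `gender` and `major` columns are hypothetical, invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data, invented for illustration
df <- tibble::tibble(
  gender = c("female", "male", "female", "male", "female", "female"),
  major  = c("econ", "econ", "math", "math", "econ", "math")
)

# One-way frequency table: number of students per major
df_freq <- df %>%
  group_by(major) %>%
  summarize(freq = n())

# Two-way frequency table: count by major and gender,
# then spread gender into columns (one row per major)
df_two_way <- df %>%
  group_by(major, gender) %>%
  summarize(freq = n(), .groups = "drop") %>%
  spread(gender, freq)

print(df_freq)
print(df_two_way)
```

In recent tidyr versions spread() is superseded by pivot_wider(), but it is kept here to match the function named in the list above.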
3 Summarizing Data
- Mean and Standard Deviation: rmd | r | pdf | html
- Mean and standard deviation from a dataset with city-month temperatures.
- r: dim() + min() + ceiling() + lapply() + vector(mode="character",length) + substring(var, first, last) + func <- function(return(list))
- dplyr: mutate() + select() + filter()
- tidyr: gather(vara, val, -varb)
- rlang: !!sym(str_var_name)
- ggplot: aes(x, y, colour, linetype, shape) + facet_wrap(~var, scales='free_y') + geom_line() + geom_point() + geom_jitter(size, width) + scale_x_continuous(labels, breaks)
- Rescaling Standard Deviation and Covariance: rmd | r | pdf | html
- Scatter-plot of a dataset with state-level wage and education data.
- Coefficient of variation and standard deviation, correlation and covariance.
- r: mean() + sd() + var() + cov() + cor()
- ggplot: geom_point(size) + geom_text() + geom_smooth()
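The rescaling idea in this section can be illustrated with a short sketch: changing the units of a variable changes its standard deviation and its covariance with other variables, but leaves the coefficient of variation and the correlation unchanged. The wage and education numbers below are invented for illustration and are not the course's state-level dataset.

```r
# Hypothetical data, invented for illustration
wage <- c(15, 18, 22, 25, 30)   # hourly wage in dollars
educ <- c(11, 12, 13, 14, 16)   # years of education

# Same wages measured in cents instead of dollars
wage_cents <- wage * 100

# Standard deviation scales with the unit of measurement
sd_dollars <- sd(wage)
sd_cents   <- sd(wage_cents)

# Coefficient of variation (sd / mean) is unit-free
cv_dollars <- sd(wage) / mean(wage)
cv_cents   <- sd(wage_cents) / mean(wage_cents)

# Covariance scales with the unit; correlation does not
cov_dollars <- cov(wage, educ)
cov_cents   <- cov(wage_cents, educ)
cor_dollars <- cor(wage, educ)
cor_cents   <- cor(wage_cents, educ)

# Correlation is covariance rescaled by both standard deviations
cor_by_hand <- cov_dollars / (sd(wage) * sd(educ))

print(c(sd_dollars = sd_dollars, sd_cents = sd_cents))
print(c(cv_dollars = cv_dollars, cv_cents = cv_cents))
print(c(cov_dollars = cov_dollars, cov_cents = cov_cents))
print(c(cor_dollars = cor_dollars, cor_cents = cor_cents))
```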
4 Basics of Probability
- Sample Space, Experimental Outcomes, Events, Probabilities: rmd | r | pdf | html
- Sample Space, Experimental Outcomes, Events and Probability.
- Union, intersection and complements
- Conditional probability
- Examples of Sample Space and Probabilities: rmd | r | pdf | html
- Throwing a quarter, four candidates for election, six-sided unfair dice, two basketball games
- r: sample(size, replace, prob)
- Law of Large Number Unfair Dice: rmd | r | pdf | html
- Throw an unfair die many times to illustrate the law of large numbers.
- r: head() + tail() + factor() + sample() + as.numeric() + paste0('dice=', var) + sprintf('%0.3f', 1.1234) + sprintf("P(S=1)=%0.3f, P(S=2)=%0.3f", 1.1, 1.2345)
- stringr: str_extract() + as.numeric(str_extract(variable, "[^.n]+$"))
- dplyr: mutate(!!str_mean_var := as.numeric(sprintf('%0.5f', freq / sum(freq))))
- ggplot: geom_line() + scale_x_continuous(trans='log10', labels=c('n=100', 'n=1000'), breaks=c(100, 1000))
- Multiple-Step Experiment: Playing the Lottery Three times: rmd | r | pdf | html
- Paths after 1, 2 and 3 plays.
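A minimal base R sketch of the unfair-dice experiment is shown below: sample() draws from the six sides with unequal probabilities, and the empirical frequency of each side approaches its true probability as the number of throws grows. The probability vector, the seed and the sample sizes are assumptions made for this example.

```r
set.seed(1)  # hypothetical seed so the draws are reproducible

# Hypothetical unfair die: side 6 comes up half of the time
dice_prob <- c(0.1, 0.1, 0.1, 0.1, 0.1, 0.5)
n_draws <- 10000

# sample(size, replace, prob): throw the die n_draws times
draws <- sample(1:6, size = n_draws, replace = TRUE, prob = dice_prob)

# Empirical frequency of each side over all throws
freq <- as.numeric(table(factor(draws, levels = 1:6))) / n_draws

# Share of sixes among the first n throws, for increasing n
n_grid <- c(100, 1000, 10000)
share_six <- sapply(n_grid, function(n) mean(draws[1:n] == 6))

for (i in seq_along(n_grid)) {
  print(sprintf("P(S=6) after n=%d throws: %0.3f", n_grid[i], share_six[i]))
}
```

With only 100 throws the share of sixes can sit noticeably away from 0.5; by 10,000 throws it is typically very close.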
5 Discrete Probability Distribution
- Discrete Random Variable and Binomial Experiment: rmd | r | pdf | html
- Discrete Random Variable, expected value and variance.
- Binomial Properties, examples using USA larceny clearance rate, WWII German soldier survival rate
- r: dbinom() + pbinom() + sprintf(paste0('abc\n', 'efg = %s'), 'opq') + round(1.123, 2) + lapply()
- ggplot: df %>% ggplot(aes(x)) + geom_bar(aes(y=prob), stat='identity', alpha=0.5, width=0.5, fill) + geom_text(aes(y=prob, label=paste0(sprintf('%2.1f', p), '%')), vjust, size, color, fontface) + labs(title, x, y, caption) + scale_y_continuous(sec.axis, name) + scale_x_continuous(labels, breaks) + theme(axis.text.y, axis.text.y.right, axis.text.y.left)
- Poisson Probability Distribution: rmd | r | pdf | html
- Poisson Properties, Ladislaus Bortkiewicz and Prussian army horse-kick deaths.
- r: dpois() + ppois()
- ggplot: geom_bar() + geom_text() + geom_line() + geom_point() + geom_text() + labs() + scale_y_continuous() + scale_x_continuous() + theme()
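The distribution functions named in this section can be tried out with a short base R sketch. It computes the expected value and variance of a binomial random variable directly from its probability mass function, and checks that the cumulative functions pbinom() and ppois() are sums of dbinom() and dpois(). The parameter values (`n_trials`, `p_success`, `lambda`) are hypothetical and are not taken from the course examples.

```r
# Hypothetical binomial experiment: 10 trials, success probability 0.2
n_trials <- 10
p_success <- 0.2
x <- 0:n_trials

# Probability mass function: P(X = x) for each possible count
prob <- dbinom(x, size = n_trials, prob = p_success)

# Expected value and variance computed from the definition
exp_val <- sum(x * prob)                   # equals n * p
var_val <- sum((x - exp_val)^2 * prob)     # equals n * p * (1 - p)

# Cumulative probability P(X <= 2), two equivalent ways
cum_direct <- pbinom(2, size = n_trials, prob = p_success)
cum_summed <- sum(dbinom(0:2, size = n_trials, prob = p_success))

# Hypothetical Poisson rate: 0.7 events per period on average
lambda <- 0.7
pois_zero    <- dpois(0, lambda)           # P(X = 0) = exp(-lambda)
pois_at_most <- ppois(1, lambda)           # P(X <= 1)

print(sprintf("E[X] = %0.2f, Var[X] = %0.2f", exp_val, var_val))
print(sprintf("P(X <= 2) = %0.4f", cum_direct))
print(sprintf("Poisson: P(X = 0) = %0.4f, P(X <= 1) = %0.4f",
              pois_zero, pois_at_most))
```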