1 Run Simulations with R

Simulate random draws from a population:

sample(population, size (the number of items to choose), replace = FALSE)

Repeat the simulation with:

replicate(M, function)

1.1 Roll dices

We can combine sample() and replicate() to simulate the rolling of dice.

#Simulate rolling a dice: 
sample(1:6, 1)

#Check whether a single roll gets a 4
sample(1:6, 1) == 4

#Roll two die and check if we get a 4 and 6

sample(1:6, 1) == 4 & sample(1:6, 1) == 4
#Now we will repeatedly roll two dices for 10000 times, check wether we get 4 and 6,
#save it into an object called rep10k

rep10k <- replicate(10000, sample(1:6, 1) == 4 & sample(1:6, 1) == 6)

#Calculate probability by summing up our
#TRUE responses and then dividing by the 
#number of observations (10000)

sum(rep10k, TRUE)/10000
## [1] 0.0246

1.2 Verify Central Limit Theorem with Simulations

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Assume a uniform distribution of 100 numbers from 1 to 100 
population <- 1:100

## randomly sample 10 with replacement and calculate their sample mean
sample(population, 10, replace=TRUE) %>% mean
## [1] 43
## repeatedly do a sample of 10 for 1000 times
clt_rep1k <- replicate(1000, sample(population, 10, replace=TRUE) %>% mean)

## Let's plot the sample means for the 1000 samples
qplot(clt_rep1k)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Verify by plotting the normal curve of the sampling distribution on top of the histogram
ggplot(data.frame(x=clt_rep1k), aes(x)) + 
  geom_histogram(aes(y=..density..)) + 
  stat_function(fun=dnorm, args=list(mean=50, sd=sd(1:100)/sqrt(10)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

2 Confidence Intervals

Data File - californiatod.csv

#install.packages(readr)
library(readr)
library(dplyr)
californiatod <- read_csv("californiatod.csv")
## Parsed with column specification:
## cols(
##   name = col_character(),
##   region = col_character(),
##   transit = col_double(),
##   density = col_double(),
##   houseval = col_double(),
##   railtype = col_character()
## )

2.1 Manual Calculations

First, compute the 95% confidence intervals for density by hand and looking up the z value from the t table, using R as a calculator.

xbar <- mean(californiatod$density)
s <- sd(californiatod$density)
n <- nrow(californiatod)
alpha <- 1 - 0.95
tstar <- qt(1 - alpha/2, df = n-1)
SE <- s/sqrt(n)
MOE <- tstar*SE
xbar + c(-1,1) * MOE
## [1] 3.933656 6.786344

2.2 CI from t.test

You can get the confidence interval when you run a single sample t-test in R (to be covered next week) using the command

t.test(californiatod$density, alternative='two.sided', mu=0, conf.level=.95)

Verify whether your results are the same using these 2 difference methods.