Simulate random draws from a population:
sample(population, size (the number of items to choose), replace = FALSE)
Repeat the simulation with:
replicate(M, function)
We can combine sample() and replicate() to simulate the rolling of dice.
#Simulate rolling a dice:
sample(1:6, 1)
#Check whether a single roll gets a 4
sample(1:6, 1) == 4
#Roll two die and check if we get a 4 and 6
sample(1:6, 1) == 4 & sample(1:6, 1) == 4
#Now we will repeatedly roll two dices for 10000 times, check wether we get 4 and 6,
#save it into an object called rep10k
rep10k <- replicate(10000, sample(1:6, 1) == 4 & sample(1:6, 1) == 6)
#Calculate probability by summing up our
#TRUE responses and then dividing by the
#number of observations (10000)
sum(rep10k, TRUE)/10000
## [1] 0.0246
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Assume a uniform distribution of 100 numbers from 1 to 100
population <- 1:100
## randomly sample 10 with replacement and calculate their sample mean
sample(population, 10, replace=TRUE) %>% mean
## [1] 43
## repeatedly do a sample of 10 for 1000 times
clt_rep1k <- replicate(1000, sample(population, 10, replace=TRUE) %>% mean)
## Let's plot the sample means for the 1000 samples
qplot(clt_rep1k)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Verify by plotting the normal curve of the sampling distribution on top of the histogram
ggplot(data.frame(x=clt_rep1k), aes(x)) +
geom_histogram(aes(y=..density..)) +
stat_function(fun=dnorm, args=list(mean=50, sd=sd(1:100)/sqrt(10)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Data File - californiatod.csv
#install.packages(readr)
library(readr)
library(dplyr)
californiatod <- read_csv("californiatod.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## region = col_character(),
## transit = col_double(),
## density = col_double(),
## houseval = col_double(),
## railtype = col_character()
## )
First, compute the 95% confidence intervals for density by hand and looking up the z value from the t
table, using R as a calculator.
xbar <- mean(californiatod$density)
s <- sd(californiatod$density)
n <- nrow(californiatod)
alpha <- 1 - 0.95
tstar <- qt(1 - alpha/2, df = n-1)
SE <- s/sqrt(n)
MOE <- tstar*SE
xbar + c(-1,1) * MOE
## [1] 3.933656 6.786344
You can get the confidence interval when you run a single sample t-test in R (to be covered next week) using the command
t.test(californiatod$density, alternative='two.sided', mu=0, conf.level=.95)
Verify whether your results are the same using these 2 difference methods.