In this problem, you are going to look at Stat 100’s Survey 1 data from Spring 2017. The CSV data file can be downloaded here. Put the file Stat100_2017spring_survey01.csv in your R working directory and load it with the commands
library(tidyverse)
survey <- read_csv("Stat100_2017spring_survey01.csv")
The column variables are explained on this webpage.
The speed column is the maximum speed (in mph) students claimed they had ever driven. What are the mean and sample standard deviation of speed? Use the summarize() function to get the answer:
summarize(survey, mean(speed), sd(speed))
# A tibble: 1 x 2
`mean(speed)` `sd(speed)`
<dbl> <dbl>
1 80.9076 36.16879
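The base-R sketch below shows exactly what sd() computes. It works on a small made-up vector: toy_speed is hypothetical and invented for illustration, not taken from the survey. It computes the sample mean by hand. It then computes the sample standard deviation by dividing the sum of squared deviations by n - 1 rather than n. Both results are checked against mean() and sd().

```r
# Hypothetical toy speeds in mph (invented, NOT the survey data)
toy_speed <- c(0, 45, 70, 85, 100, 120)

n <- length(toy_speed)
xbar <- sum(toy_speed) / n                      # sample mean
s <- sqrt(sum((toy_speed - xbar)^2) / (n - 1))  # sample SD divides by n - 1

# The hand computations should agree with R's built-in functions
stopifnot(isTRUE(all.equal(xbar, mean(toy_speed))),
          isTRUE(all.equal(s, sd(toy_speed))))
c(mean = xbar, sd = s)
```

For these values the mean is 70. The squared deviations sum to 9150, so s = sqrt(9150 / 5) = sqrt(1830), which matches sd().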
ggplot(survey) + geom_histogram(aes(speed, after_stat(density)), bins=16, fill="white", color="black")
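The density mapping in aes() (..density.., written after_stat(density) in newer ggplot2) puts the histogram on a density scale. Each bar's height is count / (n * bin width), so the bar areas add up to 1. This is what makes the histogram comparable with a density curve.

The base-R sketch below checks this on made-up values (toy_speed is hypothetical). It uses hist(plot = FALSE), which returns the breaks, counts and densities without drawing anything. hist() chooses its own breaks here, so the bins differ from bins=16 above.

```r
# Hypothetical toy speeds in mph (invented, NOT the survey data)
toy_speed <- c(0, 0, 55, 60, 70, 75, 80, 90, 95, 110, 120, 140)

# Compute the histogram without drawing it
h <- hist(toy_speed, plot = FALSE)
widths <- diff(h$breaks)

# Density = count / (n * bin width)
stopifnot(isTRUE(all.equal(h$density, h$counts / (length(toy_speed) * widths))))

# ...so the total bar area is 1
sum(h$density * widths)
```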
Use the filter() function to subset the tibble:
non_drivers <- filter(survey, speed==0)
The number of students who had never driven a car is…
nrow(non_drivers)
[1] 153
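In base R, the same count can be obtained without building a subset. The expression speed == 0 is a logical vector, and sum() counts its TRUE values. The sketch uses a small made-up data frame (toy is hypothetical).

If speed contained missing values, you would need sum(..., na.rm = TRUE) to match filter(). That is because filter() drops rows where the condition is NA.

```r
# Hypothetical toy data frame (invented, NOT the survey data)
toy <- data.frame(
  speed  = c(0, 80, 0, 95, 0, 120, 65),
  gender = c("Female", "Male", "Female", "Female", "Male", "Male", "Female")
)

# speed == 0 is TRUE/FALSE per row; sum() counts the TRUEs
n_never <- sum(toy$speed == 0)
n_never

# Same answer as subsetting the rows and counting them
stopifnot(n_never == nrow(toy[toy$speed == 0, ]))
```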
To break the number down by gender, we can use group_by() and then summarize():
non_drivers %>% group_by(gender) %>% summarize(n())
# A tibble: 2 x 2
gender `n()`
<chr> <int>
1 Female 114
2 Male 39
Alternatively, use the table() function:
table(non_drivers$gender)
Female Male
114 39
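table() also accepts two classifying vectors, so a single call can cross-tabulate gender against a never-drove indicator. The TRUE column then holds the per-gender counts of non-drivers. The sketch below uses a made-up data frame; toy and never_drove are hypothetical names.

```r
# Hypothetical toy data frame (invented, NOT the survey data)
toy <- data.frame(
  speed  = c(0, 80, 0, 95, 0, 120, 65),
  gender = c("Female", "Male", "Female", "Female", "Male", "Male", "Female")
)

# Rows: gender; columns: whether the student never drove (speed == 0)
tab <- table(gender = toy$gender, never_drove = toy$speed == 0)
tab

# The TRUE column matches table() on the filtered subset
stopifnot(all(tab[, "TRUE"] == table(toy$gender[toy$speed == 0])))
```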
Note that n() is a dplyr function that counts the number of observations in the current group. It can only be used inside dplyr verbs such as summarise(), mutate() and filter().
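As an example of n() outside summarise(): after group_by(gender), mutate(group_n = n()) attaches each row's group size to that row. The base-R sketch below reproduces this with ave(..., FUN = length), which returns, for every row, the size of the group that row belongs to. The data frame toy and the column group_n are hypothetical.

```r
# Hypothetical toy data frame (invented, NOT the survey data)
toy <- data.frame(
  speed  = c(0, 80, 0, 95, 0, 120, 65),
  gender = c("Female", "Male", "Female", "Female", "Male", "Male", "Female")
)

# Group size attached to every row, analogous to
# toy %>% group_by(gender) %>% mutate(group_n = n())
toy$group_n <- ave(toy$speed, toy$gender, FUN = length)
toy
```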
speed column for regular drivers and then calculate the mean and sample standard deviation. Use filter() to subset the data and then summarize() to calculate the statistics.
regular <- filter(survey, speed > 30)
(stats <- summarize(regular, mean=mean(speed), sd=sd(speed)))
# A tibble: 1 x 2
mean sd
<dbl> <dbl>
1 93 20.19292
ggplot(regular) +
  geom_histogram(aes(speed, after_stat(density)), bins=16, fill="white", color="black") +
  stat_function(fun=dnorm, args=list(mean=stats$mean, sd=stats$sd), color="red")
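The red curve is the normal density with the regular drivers' sample mean and standard deviation plugged in. stat_function() evaluates dnorm() across the x range of the plot.

The sketch below checks dnorm() against the normal density formula f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). It uses mu = 93 and sigma = 20.19292 from the output above; the grid of x values is arbitrary.

```r
mu <- 93           # sample mean of regular drivers (from the output above)
sigma <- 20.19292  # sample SD of regular drivers (from the output above)

x <- seq(40, 150, by = 10)  # an arbitrary grid of speeds in mph

# Normal density written out by hand
f_manual <- exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))

# Should agree with R's dnorm(), which stat_function() calls
stopifnot(isTRUE(all.equal(f_manual, dnorm(x, mean = mu, sd = sigma))))
round(f_manual, 5)
```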