# load tidyverse
library(tidyverse)
The number of hours worked per week is in the column workHr, and the percent of tuition is in the column tuition.
# load data
survey <- read_csv("stat100_2017fall_survey02.csv")
Parsed with column specification:
cols(
  .default = col_integer(),
  gender = col_character(),
  genderID = col_character(),
  greek = col_character(),
  homeTown = col_character(),
  ethnicity = col_character(),
  religion = col_character(),
  calculus = col_character(),
  GPA = col_double(),
  expectedIncome = col_double(),
  president = col_character(),
  politicalParty = col_character(),
  section = col_character()
)
See spec(...) for full column specifications.
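The column-specification message can be avoided by declaring the column types up front. The following is a minimal sketch, assuming a small made-up CSV written to a temporary file; the file contents and the four columns used here are illustrative and are not the real survey file.

```r
library(readr)

# Hypothetical toy file standing in for the survey CSV (made-up values)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "ethnicity,workHr,tuition,GPA",
  "White,5,60,3.4",
  "Black,10,40,3.1",
  "East Asian,0,90,3.7"
), path)

# Declare column types explicitly, in the same style as the printed
# specification: integer by default, with exceptions listed by name
toy <- read_csv(
  path,
  col_types = cols(
    .default  = col_integer(),
    ethnicity = col_character(),
    GPA       = col_double()
  )
)

spec(toy)  # prints the full column specification that was used
```

Supplying col_types also guards against a column being guessed as the wrong type when the file changes.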
Calculate the average workHr and tuition for each ethnic group (given in the column ethnicity).
Use group_by() and summarize():
(Avg <- group_by(survey, ethnicity) %>%
   summarize(workHr_g = mean(workHr), tuition_g = mean(tuition)))
# A tibble: 6 x 3
  ethnicity   workHr_g tuition_g
  <chr>          <dbl>     <dbl>
1 Black       8.031496  41.57480
2 East Asian  3.321033  83.69004
3 Hispanic    7.260638  39.30851
4 Other       7.785714  63.42857
5 South Asian 3.958333  76.11111
6 White       5.247440  63.39590
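When summarizing by group, it can help to report the group sizes as well, since an average based on a handful of respondents is less reliable. Note also that mean() returns NA if any value in the group is missing. The survey output above contains no NA, so these two columns evidently have no missing values. The sketch below uses made-up data (not the survey) to show n() and the na.rm argument.

```r
library(dplyr)

# Made-up data for illustration only (not the survey)
toy <- tibble(
  ethnicity = c("A", "A", "A", "B", "B"),
  workHr    = c(10, NA, 6, 2, 4),
  tuition   = c(40, 50, 60, 80, 90)
)

toy_avg <- toy %>%
  group_by(ethnicity) %>%
  summarize(
    n         = n(),                         # number of rows in the group
    workHr_g  = mean(workHr, na.rm = TRUE),  # drop missing values first
    tuition_g = mean(tuition)
  )
toy_avg
```

Without na.rm = TRUE, workHr_g for group A would be NA because one of its values is missing.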
Which group has the highest average workHr? What is the highest average workHr? Which group has the lowest average workHr? What is the lowest average workHr?
Use arrange() to sort the rows of Avg by average workHr:
(Avg <- arrange(Avg, workHr_g))
# A tibble: 6 x 3
  ethnicity   workHr_g tuition_g
  <chr>          <dbl>     <dbl>
1 East Asian  3.321033  83.69004
2 South Asian 3.958333  76.11111
3 White       5.247440  63.39590
4 Hispanic    7.260638  39.30851
5 Other       7.785714  63.42857
6 Black       8.031496  41.57480
We see that Black students have the highest average workHr, 8.03 hours/week, and East Asian students have the lowest, 3.32 hours/week.
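Sorting the whole table is one way to find the extremes. As an alternative sketch, slice_max() and slice_min() pull out the extreme rows directly. This assumes dplyr 1.0.0 or later, where these functions are available. The tibble avg below is typed in by hand from the group averages in the table above.

```r
library(dplyr)

# Group averages copied by hand from the table above
avg <- tibble(
  ethnicity = c("Black", "East Asian", "Hispanic",
                "Other", "South Asian", "White"),
  workHr_g  = c(8.031496, 3.321033, 7.260638,
                7.785714, 3.958333, 5.247440)
)

highest <- slice_max(avg, workHr_g, n = 1)  # row with the largest workHr_g
lowest  <- slice_min(avg, workHr_g, n = 1)  # row with the smallest workHr_g

highest
lowest
```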
By default, arrange() sorts the data in ascending order. We can use the function desc() to sort the data in descending order:
arrange(Avg, desc(workHr_g))
# A tibble: 6 x 3
  ethnicity   workHr_g tuition_g
  <chr>          <dbl>     <dbl>
1 Black       8.031496  41.57480
2 Other       7.785714  63.42857
3 Hispanic    7.260638  39.30851
4 White       5.247440  63.39590
5 South Asian 3.958333  76.11111
6 East Asian  3.321033  83.69004
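arrange() also accepts several sort keys, and desc() can be applied to each key separately. The sketch below uses made-up group averages (the names toy and sorted are illustrative) that contain a tie in workHr_g, so that the second key decides the order of the tied rows.

```r
library(dplyr)

# Made-up group averages with a tie in workHr_g
toy <- tibble(
  group     = c("A", "B", "C"),
  workHr_g  = c(5, 8, 5),
  tuition_g = c(60, 40, 70)
)

# Descending by workHr_g; ties are broken by ascending tuition_g
sorted <- arrange(toy, desc(workHr_g), tuition_g)
sorted
```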
Calculate the correlation between the group means of workHr and the group means of tuition. This is known as the ecological correlation. Compare the ecological correlation and the correlation between workHr and tuition.
# correlation between workHr and tuition
cor(survey$workHr, survey$tuition)
[1] -0.1851278
# ecological correlation
cor(Avg$workHr_g, Avg$tuition_g)
[1] -0.8536816
We see that the ecological correlation (-0.85) is more negative than the individual-level correlation (-0.19). It is generally true that the magnitude of the ecological correlation is larger than the magnitude of the correlation between individuals, as you’ve learned in Stat 100.
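One way to see why this tends to happen is that averaging within groups removes most of the individual-to-individual variation, leaving only the trend across groups. The simulation below is a sketch under stated assumptions, not a proof: the data are simulated (not the survey), the five groups have true means lying on a straight line, and the within-group noise is deliberately large.

```r
library(dplyr)

set.seed(1)  # make the simulation reproducible

# Simulated data: 5 groups whose true means lie on a line,
# plus large individual-level noise within each group
n_per_group <- 200
sim <- tibble(
  group = rep(1:5, each = n_per_group),
  x     = group + rnorm(5 * n_per_group, sd = 3),
  y     = 10 - 2 * group + rnorm(5 * n_per_group, sd = 3)
)

# Group means, computed the same way as Avg in the text
sim_avg <- sim %>%
  group_by(group) %>%
  summarize(x_g = mean(x), y_g = mean(y))

r_individual <- cor(sim$x, sim$y)              # noise dilutes the relationship
r_ecological <- cor(sim_avg$x_g, sim_avg$y_g)  # noise mostly averages out

c(individual = r_individual, ecological = r_ecological)
```

In this setup the ecological correlation is much closer to -1 than the individual-level one. The ordering is typical rather than guaranteed, which is why the statement above says "generally".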