Introduction to Tidyverse



So far in this course, you have been learning the traditional core R syntax, known as base R. However, base R is pretty old, and some things that were useful 10 or 20 years ago may now get in the way. It is difficult to change base R without breaking existing code, so most modern developments are put into new R packages.

In the past few years, the tidyverse has been gaining popularity in the R community. The tidyverse is a collection of R packages that share an underlying design philosophy, grammar, and data structures. Many companies are adopting R and the tidyverse, so it is useful for you to know about it. In fact, some people argue that it is easier for beginners to learn the tidyverse than base R (see, e.g., this blog post), especially for people with little or no programming experience. Many modern R courses for beginners have switched to a "tidyverse first, base R second" approach. You can learn more about the tidyverse by reading the book "R for Data Science" by Hadley Wickham and Garrett Grolemund. This is one of the textbooks used in Stat 385, and it is freely available online. Yup, it's free! This is one of the nice things about open-source software: pretty much everything is free!

In the following set of Lon Capa exercises, you will be exposed to some tidyverse packages. The main focus will be on the dplyr package, since it's very useful for data manipulation. You will find it less useful in this course and in many statistics courses using R (except Stat 385), because the data you'll be analyzing are mostly clean. However, real-world data are messy, and many data analysts working in companies use dplyr every day. To get the most out of these exercises, you will need to spend a few hours learning a couple of tidyverse packages. The ggplot2 package is also part of the tidyverse, and you've seen some of it in the past weeks if you've paid attention to the brown text in some of the Lon Capa problems. There will be no exercises on ggplot2, since it's hard to auto-grade this type of question on Lon Capa. However, if you want to practice ggplot2, you can do what I did: use ggplot2 to re-create every plot in the notes and Lon Capa problems.
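To give a feel for what "data manipulation" with dplyr looks like, here is a minimal sketch. It uses R's built-in mtcars data set purely as an illustrative stand-in (it is not one of the course data sets) and chains a few common dplyr verbs with the %>% pipe: filter() keeps rows, mutate() adds a column, group_by() and summarise() compute per-group summaries, and arrange() sorts the result.

```r
# A minimal dplyr sketch on R's built-in mtcars data set
# (mtcars is an illustrative choice, not a course data set).
library(dplyr)

mpg_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                    # keep cars with more than 100 horsepower
  mutate(kpl = mpg * 0.425144) %>%        # convert miles per US gallon to km per litre
  group_by(cyl) %>%                       # one group per number of cylinders
  summarise(n = n(),                      # number of cars in each group
            mean_kpl = mean(kpl),         # average fuel economy in each group
            .groups = "drop") %>%         # return an ungrouped result
  arrange(desc(mean_kpl))                 # most fuel-efficient group first

print(mpg_by_cyl)
```

Each verb takes a data frame (or tibble) and returns a new one, which is what makes this step-by-step chaining possible; the same steps in base R would typically need subset(), a new column assignment, and aggregate() or tapply().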

To get started, install the tidyverse packages using the command

install.packages("tidyverse")
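Installing only needs to be done once per computer, but the packages have to be loaded again in each new R session. A minimal sketch of that step:

```r
# Attach the core tidyverse packages (ggplot2, dplyr, tibble, and others)
# for the current R session.
library(tidyverse)

# List every package that ships with the tidyverse, core or not.
tidyverse_packages()
```

Note that library(tidyverse) attaches only the core packages; the others listed by tidyverse_packages() are installed but need their own library() call if you want to use them.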

After installing the packages, read the following material before attempting the problems.

  1. A brief introduction to tibbles
  2. tibble vignette
  3. Chapter 13 of Peng's textbook (this is the most important reading material)
  4. Study the following examples that use dplyr to solve the four Lon Capa problems from Weeks 2, 3, 5, and 8:
    1. Week 2's Optimization problem
    2. Week 3's Maximum Speed problem
    3. Week 5's Stock Market Price problem
    4. Week 8's Ecological Correlation problem
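As a small warm-up for the tibble readings, the sketch below builds a tibble from scratch and converts an ordinary data frame into one. The column names and the use of mtcars are illustrative choices, not taken from the readings.

```r
library(tibble)

# Build a tibble column by column; later columns may refer to earlier ones.
tb <- tibble(x = 1:3, y = x^2, label = c("a", "b", "c"))

# Convert an ordinary data frame. Tibbles do not use row names,
# so the car names are kept by moving them into a "model" column.
cars_tb <- as_tibble(mtcars, rownames = "model")

print(tb)
print(cars_tb, n = 5)   # show only the first 5 rows
```

Compared with data.frame(), tibble() never converts strings to factors and does not alter column names, and tibbles print a compact preview with column types instead of the whole table.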


Lon Capa Exercises

Tibbles

Useful dplyr Commands

Relational Data