Tibbles and readr

A tibble is a modern version of R’s traditional data frame. It works much like a data frame, with a few exceptions. Here we give a brief introduction to tibbles. To get started, we first need to load the tibble package.

# load the tibble package
library(tibble)

We will also need the readr package later.

library(readr)

As with a data frame, there are several ways to create a tibble. The easiest is the tibble() function, whose syntax is similar to that of base R’s data.frame() function. For example,

tibble(x=1:5, y=x^2)

# A tibble: 5 x 2
      x     y
  <int> <dbl>
1     1     1
2     2     4
3     3     9
4     4    16
5     5    25

When a tibble is printed, the output shows the tibble’s dimensions (5 x 2) and, under each column name, the column’s type (x is an integer vector; y is a double-precision, i.e. real-number, vector). This information is not shown when a traditional data frame is printed. Note also that in the example above, the y column is created from the x column. This is not possible with data.frame():

data.frame(x=1:5, y=x^2)

Error in data.frame(x = 1:5, y = x^2): object 'x' not found
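A short sketch of the two points above, assuming only the tibble package: the header information can also be retrieved with base functions such as dim() and typeof(), and with data.frame() a derived column has to be added in a second step. The object names tb and df_base are illustrative.

```r
library(tibble)

# tibble() builds columns in order, so z can use both x and y
tb <- tibble(x = 1:5, y = x^2, z = x + y)

# The dimensions and column types shown in the printed header
print(dim(tb))
print(sapply(tb, typeof))

# A tibble is still a data frame underneath
print(inherits(tb, "data.frame"))

# With data.frame(), create x first, then derive y from it
df_base <- data.frame(x = 1:5)
df_base$y <- df_base$x^2
```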

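A tibble can also be created with tribble() (short for “transposed tibble”), which lets you lay out the data row by row. The column names go in the first row, each preceded by a tilde (~):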

tribble(
  ~name,     ~gender,  ~height,
  "Collier",   "male",    63,   
  "Fireman",   "female",  68,   
  "Flanagan",  "male",    73
)

# A tibble: 3 x 3
      name gender height
     <chr>  <chr>  <dbl>
1  Collier   male     63
2  Fireman female     68
3 Flanagan   male     73
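To see why tribble() is called a transposed tibble, the sketch below builds the same table column by column with tibble() and compares the two with all.equal(), which returns TRUE when it finds no differences. The object names by_row and by_col are illustrative.

```r
library(tibble)

# Row-wise layout: column names first, then one line per observation
by_row <- tribble(
  ~name,      ~gender,  ~height,
  "Collier",  "male",   63,
  "Fireman",  "female", 68,
  "Flanagan", "male",   73
)

# The same table written column-wise with tibble()
by_col <- tibble(
  name   = c("Collier", "Fireman", "Flanagan"),
  gender = c("male", "female", "male"),
  height = c(63, 68, 73)
)

print(all.equal(by_row, by_col))
```

tribble() is usually the more readable choice for small tables typed by hand, since each line of code corresponds to one row of data.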

The function as_tibble() is used to convert lists and data frames to tibbles. For example, airquality is a data frame that comes with base R. You encountered it in two Lon Capa problems in Weeks 2 and 3. We can convert it to a tibble using the command

as_tibble(airquality)

# A tibble: 153 x 6
   Ozone Solar.R  Wind  Temp Month   Day
   <int>   <int> <dbl> <int> <int> <int>
 1    41     190   7.4    67     5     1
 2    36     118   8.0    72     5     2
 3    12     149  12.6    74     5     3
 4    18     313  11.5    62     5     4
 5    NA      NA  14.3    56     5     5
 6    28      NA  14.9    66     5     6
 7    23     299   8.6    65     5     7
 8    19      99  13.8    59     5     8
 9     8      19  20.1    61     5     9
10    NA     194   8.6    69     5    10
# ... with 143 more rows
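A sketch of the other conversions mentioned above, assuming tibble 2.0 or later: as_tibble() on a named list whose elements have equal lengths, the rownames argument for data frames that store labels as row names (as_tibble() drops them by default; the built-in mtcars data is used here), and as.data.frame() to convert back. The object names are illustrative.

```r
library(tibble)

# A named list with equal-length elements becomes one column per element
lst <- list(id = 1:3, letter = c("a", "b", "c"))
lst_tb <- as_tibble(lst)
print(lst_tb)

# Row names are dropped by default; rownames = "model" keeps them as a column
cars_tb <- as_tibble(mtcars, rownames = "model")
print(cars_tb, n = 3)

# as.data.frame() converts a tibble back to a plain data frame
air_df <- as.data.frame(as_tibble(airquality))
print(class(air_df))
```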

By default, when a tibble has more than 20 rows, only the first 10 rows are printed. There are several ways to change this behavior. One is to call print() with the argument n set to the number of rows to display:

print(as_tibble(airquality), n=20)

# A tibble: 153 x 6
   Ozone Solar.R  Wind  Temp Month   Day
   <int>   <int> <dbl> <int> <int> <int>
 1    41     190   7.4    67     5     1
 2    36     118   8.0    72     5     2
 3    12     149  12.6    74     5     3
 4    18     313  11.5    62     5     4
 5    NA      NA  14.3    56     5     5
 6    28      NA  14.9    66     5     6
 7    23     299   8.6    65     5     7
 8    19      99  13.8    59     5     8
 9     8      19  20.1    61     5     9
10    NA     194   8.6    69     5    10
11     7      NA   6.9    74     5    11
12    16     256   9.7    69     5    12
13    11     290   9.2    66     5    13
14    14     274  10.9    68     5    14
15    18      65  13.2    58     5    15
16    14     334  11.5    64     5    16
17    34     307  12.0    66     5    17
18     6      78  18.4    57     5    18
19    30     322  11.5    68     5    19
20    11      44   9.7    62     5    20
# ... with 133 more rows
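Two other ways to control how many rows are shown are sketched below, assuming a tibble version that honors the tibble.print_max and tibble.print_min options: n = Inf prints every row, and the options change the default for the rest of the session. A tibble with more than tibble.print_max rows is shortened to its first tibble.print_min rows.

```r
library(tibble)

air <- as_tibble(airquality)

# n = Inf prints every row
print(air, n = Inf)

# Session-wide alternative: raise both thresholds, print, then restore
old <- options(tibble.print_max = 30, tibble.print_min = 15)
print(air)     # 153 rows exceed print_max, so print_min rows are shown
options(old)   # restore the previous settings
```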

Finally, a tibble can be created by importing a data file. In base R, we use read.table() and its related functions to import data, and the result is a data frame. The readr package, part of the tidyverse, provides similar import functions that return tibbles. For example, read_table() and read_csv() are the analogues of read.table() and read.csv(), and they work in essentially the same way as the base R functions. For example,

df <- read_csv("Stat100_Survey2_Fall2015.csv")

Parsed with column specification:
cols(
  .default = col_integer(),
  gender = col_character(),
  ethnicity = col_character(),
  religion = col_character(),
  GPA = col_double()
)

See spec(...) for full column specifications.

read_csv() reports how it parsed the columns: the most common column type (here integer) is listed as .default, followed by the columns whose types differ from it. When we print the tibble, we see

df

# A tibble: 1,138 x 23
   gender genderID greek homeTown ethnicity       religion religious   ACT
    <chr>    <int> <int>    <int>     <chr>          <chr>     <int> <int>
 1 female        1     0        1     Asian Other Religion         7    27
 2 female        1     1        2     White      Christian         6    25
 3 female        1     0        2     White      Christian         6    27
 4   male        0     1        2     White      Christian         5    29
 5 female        1     0        0     White      Christian         7    25
 6 female        1     0        3  Hispanic Other Religion         7    27
 7   male        0     0        2     White       Agnostic         5    35
 8 female        1     0        2     White      Christian         7    30
 9 female        1     0        3  Hispanic      Christian         5    23
10   male        0     0        1     White       Agnostic         3    27
# ... with 1,128 more rows, and 15 more variables: GPA <dbl>,
#   partyHr <int>, drinks <int>, sexPartners <int>, relationships <int>,
#   firstKissAge <int>, favPeriod <int>, hoursCallParents <int>,
#   socialMedia <int>, texts <int>, good_well <int>,
#   parentRelationship <int>, workHr <int>, percentTuition <int>,
#   career <int>
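The survey file itself is not reproduced here, so the sketch below feeds read_csv() a few values copied from the rows printed above as literal text; wrapping the string in I() marks it as data rather than a file name, which assumes readr 2.0 or later. Supplying col_types declares each column’s type up front, so readr does not guess the types and does not print the column-specification message, and spec() shows the specification that was used. read.csv() on the same text returns a plain data frame. The names csv_text, survey, and survey_df are illustrative.

```r
library(readr)

# A small excerpt of the survey, typed in as CSV text
csv_text <- "gender,religious,ACT,GPA
female,7,27,3.7
female,6,25,3.4
male,5,29,3.0"

# Declaring the column types skips the guessing step and its message
survey <- read_csv(
  I(csv_text),
  col_types = cols(
    gender    = col_character(),
    religious = col_integer(),
    ACT       = col_integer(),
    GPA       = col_double()
  )
)
print(survey)
print(spec(survey))   # the column specification that was used

# The base R equivalent returns a plain data frame instead of a tibble
survey_df <- read.csv(text = csv_text)
print(class(survey_df))
```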

When a tibble has too many columns to fit on the screen, only the columns that fit are printed, and the remaining ones are listed by name and type below the rows, as in the output above. We can change this behavior by calling print() with the width argument. For example, to print all columns, we set width to infinity:

print(df, width=Inf)

# A tibble: 1,138 x 23
   gender genderID greek homeTown ethnicity       religion religious   ACT   GPA partyHr drinks sexPartners relationships firstKissAge favPeriod hoursCallParents socialMedia texts good_well parentRelationship workHr percentTuition career
    <chr>    <int> <int>    <int>     <chr>          <chr>     <int> <int> <dbl>   <int>  <int>       <int>         <int>        <int>     <int>            <int>       <int> <int>     <int>              <int>  <int>          <int>  <int>
 1 female        1     0        1     Asian Other Religion         7    27   3.7       1      5           0             0           17         4                7           2     5         5                  7      0            100      4
 2 female        1     1        2     White      Christian         6    25   3.4       4     10           4             2           13         4                3           9     5         5                 10      0             70      1
 3 female        1     0        2     White      Christian         6    27   3.6       5      2           0             0           17         3                7           3     4         7                 10      0            100      1
 4   male        0     1        2     White      Christian         5    29   3.0      25     15           1             0           18         4                2           3     8         4                  7      0             80      1
 5 female        1     0        0     White      Christian         7    25   4.0       6      2           1             1           16         3                2           3    10         4                 10      0            100      2
 6 female        1     0        3  Hispanic Other Religion         7    27   3.2       0      1           2             2           14         3                2           6     3         6                  8      0             30      4
 7   male        0     0        2     White       Agnostic         5    35   1.7       3      3           1             1           13         4                0           0     4         2                  7     15             90      2
 8 female        1     0        2     White      Christian         7    30   4.0      15     15           2             1           14         3                6           6     5         7                  9      0             80      4
 9 female        1     0        3  Hispanic      Christian         5    23   2.7       3      1           0             3           15         2               14           2     3         5                 10      0              0      4
10   male        0     0        1     White       Agnostic         3    27   2.7       3      7           1             2           14         0                5           4     4         3                  4     15              0      2
# ... with 1,128 more rows
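Printing every column side by side produces very long lines, which wrap in most consoles. A common alternative is glimpse(), which the tibble package provides: it shows one column per line, with the column’s type and as many leading values as fit. The sketch below uses the built-in mtcars data as a stand-in for the survey tibble; the name wide is illustrative.

```r
library(tibble)

wide <- as_tibble(mtcars)

# One line per column: name, type, and the first few values
glimpse(wide)
```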

You will explore tibbles and readr in an optional Lon Capa problem. Before doing that problem, read the vignette provided by the tibble package; browseVignettes("tibble") lists the vignettes installed with the package.