A tibble is a modern version of R’s traditional data frame. It works much like a data frame, with a few exceptions. Here we give a brief introduction to tibbles. To get started, we first need to load the tibble package.
# load the tibble package
library(tibble)
We will also need the readr package later.
library(readr)
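library() stops with an error if a package has not been installed. One common pattern, sketched here under the assumption that your machine has internet access and a CRAN mirror configured, installs each package only when requireNamespace() reports that it is missing:

```r
# Install each package only if it is not already available
# (install.packages() needs an internet connection)
for (pkg in c("tibble", "readr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Attach both packages for this session
library(tibble)
library(readr)
```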
As with a data frame, there are several ways to create a tibble. The easiest is the tibble() function, whose syntax is similar to that of base R’s data.frame() function. For example,
tibble(x=1:5, y=x^2)
# A tibble: 5 x 2
      x     y
  <int> <dbl>
1     1     1
2     2     4
3     3     9
4     4    16
5     5    25
When a tibble is printed, the output shows its dimensions (5 x 2) and, under each column name, the column’s type (x is an integer vector; y is a double-precision, i.e. real-number, vector). A printed traditional data frame does not show this information. Note also that in the example above, we created the y column from the x column. This is not possible with data.frame():
data.frame(x=1:5, y=x^2)
Error in data.frame(x = 1:5, y = x^2): object 'x' not found
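To make the difference concrete, the sketch below builds the same two columns both ways: tibble() can refer to x while creating y, whereas with data.frame() one workaround is to create x first and then add y in a separate step. The names tb and df are only illustrative.

```r
library(tibble)

# tibble() evaluates its arguments in order, so y can use x
tb <- tibble(x = 1:5, y = x^2)

# With data.frame(), create x first, then add y as a new column
df <- data.frame(x = 1:5)
df$y <- df$x^2

# Both approaches give the same columns
identical(tb$x, df$x)   # TRUE
identical(tb$y, df$y)   # TRUE
```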
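A tibble can also be created using tribble() (short for “transposed tibble”), which lets you enter the data row by row; each column name is written with a leading ~: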
tribble(
~name, ~gender, ~height,
"Collier", "male", 63,
"Fireman", "female", 68,
"Flanagan", "male", 73
)
# A tibble: 3 x 3
  name     gender height
  <chr>    <chr>   <dbl>
1 Collier  male       63
2 Fireman  female     68
3 Flanagan male       73
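The tribble() call above is a row-by-row way of writing what tibble() writes column by column. A minimal check, using two of the same columns, is to build both versions and compare them column by column:

```r
library(tibble)

# Row by row: each ~name sets a column heading, each line is a record
by_row <- tribble(
  ~name,      ~height,
  "Collier",  63,
  "Fireman",  68,
  "Flanagan", 73
)

# Column by column: the same data written for tibble()
by_col <- tibble(
  name   = c("Collier", "Fireman", "Flanagan"),
  height = c(63, 68, 73)
)

identical(by_row$name, by_col$name)       # TRUE
identical(by_row$height, by_col$height)   # TRUE
```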
The function as_tibble() converts lists and data frames to tibbles. For example, airquality is a data frame that comes with base R; you encountered it in two Lon Capa problems in Weeks 2 and 3. We can convert it to a tibble with the command
as_tibble(airquality)
# A tibble: 153 x 6
   Ozone Solar.R  Wind  Temp Month   Day
   <int>   <int> <dbl> <int> <int> <int>
 1    41     190   7.4    67     5     1
 2    36     118   8.0    72     5     2
 3    12     149  12.6    74     5     3
 4    18     313  11.5    62     5     4
 5    NA      NA  14.3    56     5     5
 6    28      NA  14.9    66     5     6
 7    23     299   8.6    65     5     7
 8    19      99  13.8    59     5     8
 9     8      19  20.1    61     5     9
10    NA     194   8.6    69     5    10
# ... with 143 more rows
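as_tibble() also accepts a named list whose elements all have the same length. The sketch below uses a small made-up list (scores is a hypothetical name, not part of any data set in this course) and converts the result back to a traditional data frame with base R’s as.data.frame():

```r
library(tibble)

# A hypothetical named list whose elements all have the same length
scores <- list(student = c("A", "B", "C"), score = c(90, 85, 77))
tb <- as_tibble(scores)
class(tb)    # "tbl_df" "tbl" "data.frame"

# as.data.frame() turns a tibble back into a traditional data frame
df <- as.data.frame(tb)
class(df)    # "data.frame"

# The converted airquality keeps all of its rows and columns
dim(as_tibble(airquality))   # 153 6
```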
By default, a tibble with more than 20 rows is printed with only its first 10 rows. There are several ways to change this behavior. One is to call print() and set the parameter n to the number of rows to print:
print(as_tibble(airquality), n=20)
# A tibble: 153 x 6
   Ozone Solar.R  Wind  Temp Month   Day
   <int>   <int> <dbl> <int> <int> <int>
 1    41     190   7.4    67     5     1
 2    36     118   8.0    72     5     2
 3    12     149  12.6    74     5     3
 4    18     313  11.5    62     5     4
 5    NA      NA  14.3    56     5     5
 6    28      NA  14.9    66     5     6
 7    23     299   8.6    65     5     7
 8    19      99  13.8    59     5     8
 9     8      19  20.1    61     5     9
10    NA     194   8.6    69     5    10
11     7      NA   6.9    74     5    11
12    16     256   9.7    69     5    12
13    11     290   9.2    66     5    13
14    14     274  10.9    68     5    14
15    18      65  13.2    58     5    15
16    14     334  11.5    64     5    16
17    34     307  12.0    66     5    17
18     6      78  18.4    57     5    18
19    30     322  11.5    68     5    19
20    11      44   9.7    62     5    20
# ... with 133 more rows
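Two other options, shown here as a sketch: passing n = Inf to print() shows every row, and base R’s head() returns a shorter tibble. Because a tibble with 20 or fewer rows is printed in full by default, printing the result of head(aq, 15) shows all 15 rows.

```r
library(tibble)

aq <- as_tibble(airquality)

# n = Inf prints every row (all 153 here)
print(aq, n = Inf)

# head() keeps only the first rows and still returns a tibble
first15 <- head(aq, 15)
nrow(first15)   # 15

# 20 rows or fewer, so all 15 rows are printed
first15
```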
Finally, a tibble can be created by importing a data file. In base R, we use read.table() and its related functions to import data, and the result is a data frame. The readr package, which is part of the tidyverse, provides similar functions that return tibbles instead. For example, read_table() and read_csv() are the analogues of read.table() and read.csv(), and they work in essentially the same way as the base R functions. For instance,
df <- read_csv("Stat100_Survey2_Fall2015.csv")
Parsed with column specification:
cols(
  .default = col_integer(),
  gender = col_character(),
  ethnicity = col_character(),
  religion = col_character(),
  GPA = col_double()
)
See spec(...) for full column specifications.
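Because we did not specify the column types, the function prints the specification it guessed: the default column type (here, integer) and the columns whose types differ from that default. When we print the tibble, we see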
df
# A tibble: 1,138 x 23
   gender genderID greek homeTown ethnicity religion       religious   ACT
   <chr>     <int> <int>    <int> <chr>     <chr>              <int> <int>
 1 female        1     0        1 Asian     Other Religion         7    27
 2 female        1     1        2 White     Christian              6    25
 3 female        1     0        2 White     Christian              6    27
 4 male          0     1        2 White     Christian              5    29
 5 female        1     0        0 White     Christian              7    25
 6 female        1     0        3 Hispanic  Other Religion         7    27
 7 male          0     0        2 White     Agnostic               5    35
 8 female        1     0        2 White     Christian              7    30
 9 female        1     0        3 Hispanic  Christian              5    23
10 male          0     0        1 White     Agnostic               3    27
# ... with 1,128 more rows, and 15 more variables: GPA <dbl>,
# partyHr <int>, drinks <int>, sexPartners <int>, relationships <int>,
# firstKissAge <int>, favPeriod <int>, hoursCallParents <int>,
# socialMedia <int>, texts <int>, good_well <int>,
# parentRelationship <int>, workHr <int>, percentTuition <int>,
# career <int>
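The survey file used above is not included here, so the sketch below writes a tiny CSV file to a temporary location (two rows and three columns copied from the output above) and reads it with read_csv(). Supplying col_types fixes each column’s type in advance, so read_csv() does not have to guess and does not print the column specification message:

```r
library(readr)

# A toy file: two rows, three columns copied from the survey output
path <- tempfile(fileext = ".csv")
writeLines(c("gender,ACT,GPA",
             "female,27,3.7",
             "male,29,3.0"), path)

# Spelling out col_types sets each column's type explicitly
df <- read_csv(path, col_types = cols(
  gender = col_character(),
  ACT    = col_integer(),
  GPA    = col_double()
))
df

# The base R analogue returns a plain data frame instead
df_base <- read.csv(path)
class(df_base)   # "data.frame"
```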
When a tibble has too many columns to fit on the screen, only the columns that fit are printed, and the remaining columns are listed by name and type below the rows. We can change this behavior by calling print() with the width parameter. For example, to print all columns, we set width to infinity (Inf):
print(df, width=Inf)
# A tibble: 1,138 x 23
   gender genderID greek homeTown ethnicity religion       religious   ACT   GPA partyHr drinks sexPartners relationships firstKissAge favPeriod hoursCallParents socialMedia texts good_well parentRelationship workHr percentTuition career
   <chr>     <int> <int>    <int> <chr>     <chr>              <int> <int> <dbl>   <int>  <int>       <int>         <int>        <int>     <int>            <int>       <int> <int>     <int>              <int>  <int>          <int>  <int>
 1 female        1     0        1 Asian     Other Religion         7    27   3.7       1      5           0             0           17         4                7           2     5         5                  7      0            100      4
 2 female        1     1        2 White     Christian              6    25   3.4       4     10           4             2           13         4                3           9     5         5                 10      0             70      1
 3 female        1     0        2 White     Christian              6    27   3.6       5      2           0             0           17         3                7           3     4         7                 10      0            100      1
 4 male          0     1        2 White     Christian              5    29   3.0      25     15           1             0           18         4                2           3     8         4                  7      0             80      1
 5 female        1     0        0 White     Christian              7    25   4.0       6      2           1             1           16         3                2           3    10         4                 10      0            100      2
 6 female        1     0        3 Hispanic  Other Religion         7    27   3.2       0      1           2             2           14         3                2           6     3         6                  8      0             30      4
 7 male          0     0        2 White     Agnostic               5    35   1.7       3      3           1             1           13         4                0           0     4         2                  7     15             90      2
 8 female        1     0        2 White     Christian              7    30   4.0      15     15           2             1           14         3                6           6     5         7                  9      0             80      4
 9 female        1     0        3 Hispanic  Christian              5    23   2.7       3      1           0             3           15         2               14           2     3         5                 10      0              0      4
10 male          0     0        1 White     Agnostic               3    27   2.7       3      7           1             2           14         0                5           4     4         3                  4     15              0      2
# ... with 1,128 more rows
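The n and width parameters can be combined in one print() call. As an alternative for wide data, the tibble package also provides glimpse(), which prints one line per column, so every column appears no matter how many there are. A short sketch using airquality, since the survey file is not included here:

```r
library(tibble)

aq <- as_tibble(airquality)

# First 5 rows, every column
print(aq, n = 5, width = Inf)

# One line per column: name, type, and the first few values
glimpse(aq)
```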
You will explore tibbles and readr in an optional Lon Capa problem. Before doing that problem, read this vignette provided by the tibble package.