No new commands/functions are introducted in Week 12, except the quantile() function mentioned in this week’s footnote. Type ?quantile to see the usage. You will encounter this function again next week.

Basic Commands

Command Purpose Example
? pull up a help page ?seq
: colon operator(generate regular sequence) 1.5:4
[ ] subset a vector x[c(2,6,1,3)]
[[ ]] extract an element in a list x[[“bar”]]
or extract a column in a data frame data_frame[[“age”]]
$ extract an element in a list x$bar
or extract a column in a data frame data_frame$age
<- assignment x <- 5 + 7
c concatenate c(1.1, 9, 3.14)
seq, seq_along sequence generation seq(1,10,0.5)
length return the length of a vector length(pi:100)
rep replicate elements of vectors rep(c(1,2,3),5)
print prints its argument print(5:10)
# comment character # ignore the rest of line
class return the class of object class(pi)
as.numeric, as.integer,… explicit coercion as.integer(pi)
matrix create a matrix matrix(1:6, nrow=2, ncol=3)
dim return the dimension attribute of object dim( matrix(1:6, nrow=2, ncol=3) )
cbind, rbind create matrix by column-/row- binding vectors cbind(1:3,4:6)
list create a list list(name=“John”, age=20)
factor create a factor factor(“blue”,“green”,levels=c(“red”,“green”,“blue”))
levels print all levels of a factor levels(x)
is.na, is.nan check if the argument is NA (or NaN) is.na(as.numeric(“abc”))
data.frame create a data frame data.frame(foo = 1:4, bar = c(T, T, F, F))
nrow, ncol return the number of rows/columns of object nrow(data.frame(foo = 1:4, bar = c(T, T, F, F)))
names names of elements of a vector/list names(x)
colnames, rownames column/row names of a matrix/data frame colnames(m)
paste concatenate strings paste(“aa”,“bb”,“cc”,sep=“:”)
sum summation of elements in a vector sum(c(1,5,12))
prod product of elements in a vector prod(c(3,15,-2))
max, min maximum/minimum value in a vector max(c(23,12,12,3))
mean, median mean/median of a vector mean(c(1,5,1,2,7))
sort sorting sort(c(1,5,1,pi,2,7))
var, sd sample variance/standard deviation of a vector sd(c(1,5,1,3,2))
cor correlation cor(c(1,4,6,1), c(2,5,1,9))
which.min, which.max first index of minimum/maximum value which.min(c(4,1,3,4,1))
set.seed set a seed for random number generation set.seed(13218)
identical check if two objects are identical identical(2>1, 5>2)
which which indices are TRUE? which(100:2 > 36)
any any component in a vector is TRUE? any(10:0 < 0)
all all components in a vector are TRUE? all(1:100 > 0)
getwd display working directory getwd()
setwd set working directory setwd(“~/Documents”)
ls list all objects in working directory ls()
rm delete variables from working space rm(x,y), rm(list=ls())
head preview the top of a dataset head(airquality,10)
tail preview the bottom of a dataset  tail(airquality,15)
summary summary of an R object summary(x)
table generate contingency table table(x), table(x,y)
str structure of an R object str(airquality)
args print arguments of a function args(rnorm)
jitter add jitters to data jitter(1:10)
unique remove duplicate elements unique(c(1,2,3,4,3,2,1,0))
gl generate factor levels gl(2,100)
sample take random samples sample(LETTERS,10)
sample.int take samples of integers sample.int(1000,6)
with access variables inside an environment with(cars, cor(speed,dist))
cumprod cumulative product cumprod(1:10)
%o% calculate the outer product of two vectors x %o% y


Logical Operators

Operator Name Example
>, < greater/less than 1:5 > seq(0,8,2)
== equality operator 1:5 == seq(0,8,2)
>=, <= greater or equal to/less than or equal to 1:5 <= seq(0,8,2)
!= not equal to 1:5 != seq(0,8,2)
! NOT !(1 > 2)
& AND (3:5 > 5:7) & (4:6 == 4:6)
| OR (3:5 > 5:7) | (4:6 == seq(2,6,2))
&& non-vectorized AND TRUE && c(TRUE,FALSE)
|| non-vectorized OR TRUE || c(TRUE,FALSE)
xor exclusive or xor(5==6, FALSE)
isTRUE Is the argument TRUE? isTRUE(6>4)


Control Structures

Examples can be found on pp. 63-70 of the text.

Command Purpose
if-else testing conditions and acting on it
for execute a loop for a fixed number of times
while execute a loop while the conditions are satisfied
repeat execute a loop indefinitely, must use break to exit the loop
break exit a loop
next skip the rest of the commands and jump to the next iteration


Loop Functions

Command Purpose Example
lapply loop over a list and evaluate function on each element lapply(1:10,sqrt)
sapply same as lapply but try to simplify the result sapply(1:10,sqrt)
vapply same as lapply but specify the output type vapply(1:10,sqrt,numeric(1))
apply apply a function over the margins of an array apply(matrix(1:10,2,5), 2, mean)
mapply multivariate version of lapply mapply(rep,1:4,4:1)
tapply apply a function over a subset of a vector tapply(InsectSprays$count,InsectSprays$spray,mean)
rowSums = apply(x,1,sum) rowSums(matrix(1:10,2,5))
rowMeans = apply(x,1,mean) rowMeans(matrix(1:10,2,5))
colSums = apply(x,2,sum) colSums(matrix(1:10,2,5))
colMeans = apply(x,2,mean) colMeans(matrix(1:10,2,5))
replicate repeated evaluation of an expression replicate(20,rnorm(10))


Mathematical Functions

Function Name Example
abs absolute value abs(3-6) = 3
sqrt square root sqrt(16) = 4
^ exponentiation 3^10 = \(3^{10}\) = 59049
exp exponential function exp(1.7) = \(e^{1.7}\) = 5.473947
log log function (base e) log(10) = 2.302585
log10 base 10 log (\(\log_{10}\)) log10(100) = 2
pi mathematical constant \(\pi\) pi = 3.141593
sin, cos, tan trigonometric functions (argument in radians) sin(pi/2) = 1
asin, acos, atan inverse trigonometric functions acos(1) = 0
sinh, cosh, tanh hyperbolic functions cosh(0) = 1
asinh, acosh, atanh inverse hyperbolic functions atanh(tanh(12)) = 12
round(x,n) round x to n decimal places round(pi,2) = 3.14
floor rounds down floor(14.7) = 14
ceiling rounds up ceiling(14.7) = 15


Probability Distributions

Every distribution has four functions. There is a root name, for example, the root name for the normal distribution is norm. This root is prefixed by one of the letters

Distribution Functions
Normal dnorm     pnorm     qnorm     rnorm
\(\chi^2\) dchisq     pchisq     qchisq     rchisq
Student t dt     pt     qt     rt
F df     pf     qf     rf


Reading and Writing Files

Command Purpose Example
read.table, read.csv, … read data from a text file data <- read.table(“foo.txt”)
write.table, write.csv, … write data to a text file write.csv(data,“foo.csv”)
readLines read text file line by line data <- readLines(“foo.txt”)
writeLines write text to file writeLines(“write something…”, “foo.txt”)
save write R objects to a file in binary format save(x,y,z,file=“out.rda”)
load reload datasets written with the function save() load(“out.rda”)
dput write R objects to a file in ASCII text dput(x,“out.R”)
dget load R objects written with dput() x2 <- dget(“out.R”)


Plotting (Base Graphics)

Command Purpose Example
plot plot data plot(dist ~ speed, data=cars)
hist plot a histogram hist(rnorm(1e5),freq=FALSE, breaks=100)
barplot create a barplot barplot(table(mtcars$cyl))
boxplot create boxplot boxplot(mpg~cyl,data=mtcars)
curve plot a function curve(x^2, xlim=x(-1,1))
legend add legend to a plot legend(1,2,c(“text1”,“text2”),col=1:2,lty=1:2)
abline add straight lines to a plot abline(1,2)
density make a density plot plot(density(rnorm(1e5)))
smoothScatter make a smooth scatter plot smoothScatter(x,y)
pairs create a matrix of scatterplots pairs(airquality)
lines add a line to an existing plot lines(x,y)
points add points to an existing plot points(x,y)


Plotting (Lattice Graphics)

The lattice library has to be loaded to R using library(lattice) before these commands can be used.

Command Purpose Example
xyplot make scatter plots xyplot(y ~ x | factor, data=data_frame)
histogram make histograms histogram(~ x | factor, data=data_frame)


Linear Regression

Command Purpose Example
lm perform linear regression lm(y ~ x, data=data_frame)
summary summarize the regression result summary(fit)
confint calculate confidence interval confint(fit, levels=0.9)
abline plot regression line (for simple regression) abline(fit,col=“red”)
predict predict new data predict(fit, newdata=data_frame)
resid get the residuals resid(fit)


Model Formulae

Here are some examples of formulae that can be used in linear regressions and the mdoels they represent.

Formula Model
y ~ x \(y=\beta_0+\beta_1 x\)
y ~ x-1 or y ~ 0+x \(y=\beta_0 x\)
y ~ x1 + x2 \(y=\beta_0 + \beta_1 x1 + \beta_2 x2\)
y ~ x1 + x1:x2 \(y=\beta_0 + \beta_1 x1 + \beta_2 x1\cdot x2\)
y ~ x1*x2 \(y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x1\cdot x2\)
y ~ x1*x2*x3 \(y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x3 + \beta_4 x_1\cdot x_2\)
\(+ \beta_5 x_1\cdot x_3 + \beta_6 x_2 \cdot x_3 + \beta_7 x_1\cdot x_2 \cdot x_3\)
y ~ I(x1+x2) \(y=\beta_0 + \beta_1 (x1+x2)\)
y ~ I(x1 + 4*x1*x2) + I(x2^3) - 1 \(y=\beta_1 (x1+4\cdot x2) + \beta_2 x2^3\)


Date and Time

Command Purpose Example
Sys.Date() get the current date Sys.Date()
Sys.time() get the current time Sys.time()
as.Date convert string to the Date class as.Date(“2016-09-23”)
weekdays returns the day of week weekdays(Sys.Date())
months returns the month months(Sys.Date())
quarters returns the quarter of the year quarters(Sys.Date())
system.time computes the time needed for evaluating an expression system.time( sum(1/(1:1e7)) )


ANOVA Tests

The syntax is summarized in the summary of Week 11’s notes.

Command Purpose
chisq.test \(\chi^2\) goodness-of-fit test and \(\chi^2\) independent test
t.test 2-sample t-test
aov F-test
pairwise.t.test Pairwise t-test