Commands and Functions

No new commands or functions are introduced in Week 12, except the quantile() function mentioned in this week’s footnote. Type ?quantile to see its usage. You will encounter this function again next week.
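
As a quick illustration of quantile(), the sketch below uses a small made-up vector (the values are hypothetical, not taken from the course notes). It shows the default call and the probs argument.

```r
# Hypothetical data, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Default call: minimum, quartiles, and maximum
q_default <- quantile(x)

# probs selects specific quantiles, here the 90th percentile
q_90 <- quantile(x, probs = 0.9)

print(q_default)
print(q_90)
```

The 50% entry of the default result agrees with median(x).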

Basic Commands

Command	Purpose	Example
?	pull up a help page	?seq
:	colon operator (generate regular sequence)	1.5:4
[ ]	subset a vector	x[c(2,6,1,3)]
[[ ]]	extract an element in a list	x[["bar"]]
	or extract a column in a data frame	data_frame[["age"]]
$	extract an element in a list	x$bar
	or extract a column in a data frame	data_frame$age
<-	assignment	x <- 5 + 7
c	concatenate	c(1.1, 9, 3.14)
seq, seq_along	sequence generation	seq(1,10,0.5)
length	return the length of a vector	length(pi:100)
rep	replicate elements of vectors	rep(c(1,2,3),5)
print	prints its argument	print(5:10)
#	comment character	# ignore the rest of line
class	return the class of object	class(pi)
as.numeric, as.integer,…	explicit coercion	as.integer(pi)
matrix	create a matrix	matrix(1:6, nrow=2, ncol=3)
dim	return the dimension attribute of object	dim( matrix(1:6, nrow=2, ncol=3) )
cbind, rbind	create matrix by column-/row- binding vectors	cbind(1:3,4:6)
list	create a list	list(name="John", age=20)
factor	create a factor	factor(c("blue","green"),levels=c("red","green","blue"))
levels	print all levels of a factor	levels(x)
is.na, is.nan	check if the argument is NA (or NaN)	is.na(as.numeric("abc"))
data.frame	create a data frame	data.frame(foo = 1:4, bar = c(T, T, F, F))
nrow, ncol	return the number of rows/columns of object	nrow(data.frame(foo = 1:4, bar = c(T, T, F, F)))
names	names of elements of a vector/list	names(x)
colnames, rownames	column/row names of a matrix/data frame	colnames(m)
paste	concatenate strings	paste("aa","bb","cc",sep=":")
sum	summation of elements in a vector	sum(c(1,5,12))
prod	product of elements in a vector	prod(c(3,15,-2))
max, min	maximum/minimum value in a vector	max(c(23,12,12,3))
mean, median	mean/median of a vector	mean(c(1,5,1,2,7))
sort	sorting	sort(c(1,5,1,pi,2,7))
var, sd	sample variance/standard deviation of a vector	sd(c(1,5,1,3,2))
cor	correlation	cor(c(1,4,6,1), c(2,5,1,9))
which.min, which.max	first index of minimum/maximum value	which.min(c(4,1,3,4,1))
set.seed	set a seed for random number generation	set.seed(13218)
identical	check if two objects are identical	identical(2>1, 5>2)
which	which indices are TRUE?	which(100:2 > 36)
any	any component in a vector is TRUE?	any(10:0 < 0)
all	all components in a vector are TRUE?	all(1:100 > 0)
getwd	display working directory	getwd()
setwd	set working directory	setwd("~/Documents")
ls	list all objects in the workspace	ls()
rm	delete variables from working space	rm(x,y), rm(list=ls())
head	preview the top of a dataset	head(airquality,10)
tail	preview the bottom of a dataset	tail(airquality,15)
summary	summary of an R object	summary(x)
table	generate contingency table	table(x), table(x,y)
str	structure of an R object	str(airquality)
args	print arguments of a function	args(rnorm)
jitter	add a small amount of random noise to data	jitter(1:10)
unique	remove duplicate elements	unique(c(1,2,3,4,3,2,1,0))
gl	generate factor levels	gl(2,100)
sample	take random samples	sample(LETTERS,10)
sample.int	take samples of integers	sample.int(1000,6)
with	access variables inside an environment	with(cars, cor(speed,dist))
cumprod	cumulative product	cumprod(1:10)
%o%	calculate the outer product of two vectors	x %o% y
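
The three extraction operators in the table are easy to confuse, so here is a minimal sketch of how they differ. The objects v, x, and df are hypothetical examples created only for this illustration.

```r
# Hypothetical objects for illustration
v  <- c(10, 20, 30, 40, 50, 60)
x  <- list(foo = 1:3, bar = "hello")
df <- data.frame(age = c(20, 35, 41), name = c("A", "B", "C"))

# [ ] subsets a vector; the result follows the order of the indices
v_sub <- v[c(2, 6, 1, 3)]

# [[ ]] and $ both extract a single element of a list
bar1 <- x[["bar"]]
bar2 <- x$bar

# The same two operators extract a column of a data frame
age1 <- df[["age"]]
age2 <- df$age

# Single brackets on a list keep the list wrapper
sub_list <- x["bar"]
```

Here bar1 and bar2 are the same character string, while sub_list is still a list of length one.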

Logical Operators

Operator	Name	Example
>, <	greater/less than	1:5 > seq(0,8,2)
==	equality operator	1:5 == seq(0,8,2)
>=, <=	greater or equal to/less than or equal to	1:5 <= seq(0,8,2)
!=	not equal to	1:5 != seq(0,8,2)
!	NOT	!(1 > 2)
&	AND	(3:5 > 5:7) & (4:6 == 4:6)
|	OR	(3:5 > 5:7) | (4:6 == seq(2,6,2))
&&	non-vectorized AND (operands of length one)	(1 > 0) && (2 > 3)
||	non-vectorized OR (operands of length one)	TRUE || FALSE
xor	exclusive or	xor(5==6, FALSE)
isTRUE	Is the argument TRUE?	isTRUE(6>4)
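
The difference between the vectorized and non-vectorized operators can be sketched as follows. The vectors a and b reuse the values from the table; the other variable names are chosen only for illustration.

```r
a <- 1:5
b <- seq(0, 8, 2)                    # 0 2 4 6 8

# & works element by element and returns a logical vector of length 5
elementwise <- (a > b) & (a != 3)

# && expects single TRUE/FALSE values, so reduce vectors first with any() or all()
scalar <- (length(a) == 5) && all(b >= 0)

# which() returns the positions where the condition is TRUE
idx <- which(a <= b)

print(elementwise)
print(scalar)
print(idx)
```

In an if condition, && and || are usually the appropriate choice because the condition must be a single logical value.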

Control Structures

Examples can be found on pp. 63-70 of the text.

Command	Purpose
if-else	test a condition and act on it
for	execute a loop for a fixed number of times
while	execute a loop while the conditions are satisfied
repeat	execute a loop indefinitely, must use `break` to exit the loop
break	exit a loop
next	skip the rest of the commands and jump to the next iteration
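
The following sketch exercises each control structure in the table once. It is not one of the textbook examples; the variable names and numbers are made up for illustration.

```r
# for, if, and next: sum the odd numbers from 1 to 9
total <- 0
for (i in 1:9) {
  if (i %% 2 == 0) {
    next                 # skip even numbers
  }
  total <- total + i
}

# while: keep doubling as long as the value is at most 100
n <- 1
while (n <= 100) {
  n <- n * 2
}

# repeat: loops indefinitely until break is reached
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) {
    break
  }
}

print(c(total, n, count))
```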

Loop Functions

Command	Purpose	Example
lapply	loop over a list and evaluate function on each element	lapply(1:10,sqrt)
sapply	same as lapply but try to simplify the result	sapply(1:10,sqrt)
vapply	same as lapply but specify the output type	vapply(1:10,sqrt,numeric(1))
apply	apply a function over the margins of an array	apply(matrix(1:10,2,5), 2, mean)
mapply	multivariate version of lapply	mapply(rep,1:4,4:1)
tapply	apply a function over subsets of a vector	tapply(InsectSprays$count,InsectSprays$spray,mean)
rowSums	= apply(x,1,sum)	rowSums(matrix(1:10,2,5))
rowMeans	= apply(x,1,mean)	rowMeans(matrix(1:10,2,5))
colSums	= apply(x,2,sum)	colSums(matrix(1:10,2,5))
colMeans	= apply(x,2,mean)	colMeans(matrix(1:10,2,5))
replicate	repeated evaluation of an expression	replicate(20,rnorm(10))
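
To compare the loop functions side by side, the sketch below applies the same squaring function with lapply, sapply, and vapply, and checks apply against colMeans. The anonymous function is an illustrative choice; InsectSprays is a built-in dataset.

```r
m <- matrix(1:10, 2, 5)

# apply over columns (margin 2) gives the same numbers as colMeans
col_means_apply <- apply(m, 2, mean)
col_means_fast  <- colMeans(m)

# lapply returns a list; sapply simplifies it to a vector;
# vapply does the same but checks each result against numeric(1)
squares_list    <- lapply(1:3, function(i) i^2)
squares_vec     <- sapply(1:3, function(i) i^2)
squares_checked <- vapply(1:3, function(i) i^2, numeric(1))

# tapply: mean insect count within each spray group
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)
```

vapply is often the safer choice inside functions, since it fails loudly if a result does not have the declared type and length.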

Mathematical Functions

Function	Name	Example
abs	absolute value	abs(3-6) = 3
sqrt	square root	sqrt(16) = 4
^	exponentiation	3^10 = $3^{10}$ = 59049
exp	exponential function	exp(1.7) = $e^{1.7}$ = 5.473947
log	log function (base e)	log(10) = 2.302585
log10	base 10 log ($\log_{10}$)	log10(100) = 2
pi	mathematical constant $\pi$	pi = 3.141593
sin, cos, tan	trigonometric functions (argument in radians)	sin(pi/2) = 1
asin, acos, atan	inverse trigonometric functions	acos(1) = 0
sinh, cosh, tanh	hyperbolic functions	cosh(0) = 1
asinh, acosh, atanh	inverse hyperbolic functions	atanh(tanh(12)) = 12
round(x,n)	round x to n decimal places	round(pi,2) = 3.14
floor	rounds down	floor(14.7) = 14
ceiling	rounds up	ceiling(14.7) = 15

Probability Distributions

Every distribution has four functions, each formed by attaching a one-letter prefix to a root name. For example, the root name for the normal distribution is norm. The prefixes are

p for “probability”, the cumulative distribution function (cdf)
q for “quantile”, the inverse cdf
d for “density”, the probability density function (pdf)
r for “random”, a random variable having the specified distribution

Distribution	Functions
Normal	dnorm pnorm qnorm rnorm
$\chi^2$	dchisq pchisq qchisq rchisq
Student t	dt pt qt rt
F	df pf qf rf
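
As a small sketch of the prefix convention, the four functions for the standard normal distribution can be called as follows. The seed value is arbitrary and the variable names are illustrative.

```r
d0 <- dnorm(0)        # density at 0, equal to 1/sqrt(2*pi)
p  <- pnorm(1.96)     # P(Z <= 1.96), close to 0.975
q  <- qnorm(0.975)    # inverse of pnorm: the 97.5% quantile

set.seed(13218)       # arbitrary seed, for reproducibility
r  <- rnorm(5)        # five random draws

print(c(d0, p, q))
print(r)
```

Because q functions invert p functions, pnorm(qnorm(0.975)) returns 0.975 up to rounding error.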

Reading and Writing Files

Command	Purpose	Example
read.table, read.csv, …	read data from a text file	data <- read.table("foo.txt")
write.table, write.csv, …	write data to a text file	write.csv(data,"foo.csv")
readLines	read text file line by line	data <- readLines("foo.txt")
writeLines	write text to file	writeLines("write something…", "foo.txt")
save	write R objects to a file in binary format	save(x,y,z,file="out.rda")
load	reload datasets written with the function save()	load("out.rda")
dput	write R objects to a file in ASCII text	dput(x,"out.R")
dget	load R objects written with dput()	x2 <- dget("out.R")
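
The read/write pairs above can be tried without touching your working directory. This sketch writes to temporary files created with tempfile(); the data frame df is a made-up example.

```r
# Hypothetical data frame for the round trips
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Text round trip: write.csv / read.csv
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_back <- read.csv(csv_path)

# Binary round trip: save / load restores the object under its original name
rda_path <- tempfile(fileext = ".rda")
save(df, file = rda_path)
rm(df)
load(rda_path)

# ASCII round trip: dput / dget returns the object so it can be renamed
r_path <- tempfile(fileext = ".R")
dput(df, r_path)
df_copy <- dget(r_path)
```

Note the difference in style: load() recreates df by name, whereas dget() returns a value that you assign yourself.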

Plotting (Base Graphics)

Command	Purpose	Example
plot	plot data	plot(dist ~ speed, data=cars)
hist	plot a histogram	hist(rnorm(1e5),freq=FALSE, breaks=100)
barplot	create a barplot	barplot(table(mtcars$cyl))
boxplot	create boxplot	boxplot(mpg~cyl,data=mtcars)
curve	plot a function	curve(x^2, xlim=c(-1,1))
legend	add legend to a plot	legend(1,2,c("text1","text2"),col=1:2,lty=1:2)
abline	add straight lines to a plot	abline(1,2)
density	make a density plot	plot(density(rnorm(1e5)))
smoothScatter	make a smooth scatter plot	smoothScatter(x,y)
pairs	create a matrix of scatterplots	pairs(airquality)
lines	add a line to an existing plot	lines(x,y)
points	add points to an existing plot	points(x,y)

Plotting (Lattice Graphics)

The lattice library has to be loaded to R using library(lattice) before these commands can be used.

Command	Purpose	Example
xyplot	make scatter plots	xyplot(y ~ x | factor, data=data_frame)
histogram	make histograms	histogram(~ x | factor, data=data_frame)

Linear Regression

Command	Purpose	Example
lm	perform linear regression	lm(y ~ x, data=data_frame)
summary	summarize the regression result	summary(fit)
confint	calculate confidence interval	confint(fit, level=0.9)
abline	plot regression line (for simple regression)	abline(fit,col="red")
predict	predict new data	predict(fit, newdata=data_frame)
resid	get the residuals	resid(fit)
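
A minimal end-to-end sketch of these commands, using the built-in cars dataset. The speeds in new_data are arbitrary values picked for illustration.

```r
# Fit stopping distance as a linear function of speed
fit <- lm(dist ~ speed, data = cars)

coefs <- coef(fit)                   # intercept and slope
ci    <- confint(fit, level = 0.9)   # 90% confidence intervals

# Predict at new speeds; the column name must match the predictor
new_data <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_data)

res <- resid(fit)                    # one residual per observation

print(summary(fit))
```

With an intercept in the model, the residuals sum to zero up to rounding error.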

Model Formulae

Here are some examples of formulae that can be used in linear regressions and the models they represent.

Formula	Model
y ~ x	$y=\beta_0+\beta_1 x$
y ~ x-1 or y ~ 0+x	$y=\beta_1 x$
y ~ x1 + x2	$y=\beta_0 + \beta_1 x1 + \beta_2 x2$
y ~ x1 + x1:x2	$y=\beta_0 + \beta_1 x1 + \beta_2 x1\cdot x2$
y ~ x1*x2	$y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x1\cdot x2$
y ~ x1*x2*x3	$y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x3 + \beta_4 x1\cdot x2$
	$+ \beta_5 x1\cdot x3 + \beta_6 x2 \cdot x3 + \beta_7 x1\cdot x2 \cdot x3$
y ~ I(x1+x2)	$y=\beta_0 + \beta_1 (x1+x2)$
y ~ I(x1 + 4*x1*x2) + I(x2^3) - 1	$y=\beta_1 (x1+4\cdot x1\cdot x2) + \beta_2 x2^3$
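
One way to check which terms a formula produces is to look at the column names of its design matrix. The sketch below does this with model.matrix() on a small made-up data frame d; only the right-hand side of each formula is needed.

```r
# Hypothetical toy data
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3))

# Each column of the design matrix corresponds to one coefficient
cols_main  <- colnames(model.matrix(~ x1 + x2, data = d))
cols_star  <- colnames(model.matrix(~ x1 * x2, data = d))
cols_noint <- colnames(model.matrix(~ x1 - 1, data = d))
cols_I     <- colnames(model.matrix(~ I(x1 + x2), data = d))

print(cols_main)
print(cols_star)
print(cols_noint)
print(cols_I)
```

The * operator expands to both main effects plus the x1:x2 interaction, -1 drops the intercept column, and I() protects arithmetic so that x1 + x2 becomes a single predictor.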

Date and Time

Command	Purpose	Example
Sys.Date()	get the current date	Sys.Date()
Sys.time()	get the current time	Sys.time()
as.Date	convert string to the Date class	as.Date("2016-09-23")
weekdays	returns the day of week	weekdays(Sys.Date())
months	returns the month	months(Sys.Date())
quarters	returns the quarter of the year	quarters(Sys.Date())
system.time	computes the time needed for evaluating an expression	system.time( sum(1/(1:1e7)) )
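
A short sketch combining the date functions on a fixed date, so the results do not depend on when the code is run. The second date is an arbitrary choice for illustration; weekday and month names depend on your locale.

```r
d <- as.Date("2016-09-23")

wd <- weekdays(d)     # day of week, as text in the current locale
mo <- months(d)       # month name, as text in the current locale
qt <- quarters(d)     # quarter of the year

# Subtracting two Date objects gives the number of days between them
days_between <- as.numeric(as.Date("2016-12-25") - d)

# system.time reports user, system, and elapsed time in seconds
elapsed <- system.time(sum(1 / (1:1e6)))

print(c(wd, mo, qt))
print(days_between)
print(elapsed)
```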

ANOVA Tests

The syntax is summarized in the summary of Week 11’s notes.

Command	Purpose
chisq.test	$\chi^2$ goodness-of-fit test and $\chi^2$ test of independence
t.test	2-sample t-test
aov	F-test
pairwise.t.test	Pairwise t-test
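
For a quick reminder of how these tests are called, here is a sketch on built-in datasets. It is not a copy of the Week 11 summary; the category counts passed to chisq.test are made up, while sleep and InsectSprays ship with R.

```r
# Goodness-of-fit: are three hypothetical category counts equally likely?
gof <- chisq.test(c(20, 30, 50))

# Two-sample t-test: extra sleep by drug group
tt <- t.test(extra ~ group, data = sleep)

# One-way ANOVA F-test: insect counts by spray type
fit_aov   <- aov(count ~ spray, data = InsectSprays)
aov_table <- summary(fit_aov)

# Pairwise t-tests between sprays (p-values adjusted by Holm's method by default)
pw <- pairwise.t.test(InsectSprays$count, InsectSprays$spray)

print(gof)
print(tt)
print(aov_table)
print(pw)
```

For a test of independence, pass a two-way table to chisq.test instead, for example one produced by table(x, y).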

Commands and Functions - Weeks 1-12