No new commands or functions are introduced in Week 12, except the quantile()
function mentioned in this week’s footnote. Type ?quantile
to see the usage. You will encounter this function again next week.
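
As a minimal sketch of what ?quantile describes, the following computes sample quantiles of a small vector. The values in x are made up purely for illustration; the probs argument selects which quantiles are returned.

```r
# Illustrative data (not from the course material)
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Default call: minimum, quartiles, and maximum
five_numbers <- quantile(x)
print(five_numbers)

# Ask for specific quantiles with probs
q90   <- quantile(x, probs = 0.9)
q_mid <- quantile(x, probs = c(0.25, 0.5, 0.75))

# The 50% quantile agrees with median()
print(q_mid[["50%"]] == median(x))
```

The result is a named vector, so individual quantiles can be extracted by name with [[ ]].
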
Command | Purpose | Example |
---|---|---|
? | pull up a help page | ?seq |
: | colon operator (generates a regular sequence) | 1.5:4 |
[ ] | subset a vector | x[c(2,6,1,3)] |
[[ ]] | extract an element in a list, or a column in a data frame | x[["bar"]], data_frame[["age"]] |
$ | extract an element in a list, or a column in a data frame | x$bar, data_frame$age |
<- | assignment | x <- 5 + 7 |
c | concatenate | c(1.1, 9, 3.14) |
seq, seq_along | sequence generation | seq(1,10,0.5) |
length | return the length of a vector | length(pi:100) |
rep | replicate elements of vectors | rep(c(1,2,3),5) |
print | print its argument | print(5:10) |
# | comment character | # ignore the rest of line |
class | return the class of object | class(pi) |
as.numeric, as.integer,… | explicit coercion | as.integer(pi) |
matrix | create a matrix | matrix(1:6, nrow=2, ncol=3) |
dim | return the dimension attribute of object | dim( matrix(1:6, nrow=2, ncol=3) ) |
cbind, rbind | create matrix by column-/row- binding vectors | cbind(1:3,4:6) |
list | create a list | list(name="John", age=20) |
factor | create a factor | factor(c("blue","green"),levels=c("red","green","blue")) |
levels | print all levels of a factor | levels(x) |
is.na, is.nan | check if the argument is NA (or NaN) | is.na(as.numeric("abc")) |
data.frame | create a data frame | data.frame(foo = 1:4, bar = c(T, T, F, F)) |
nrow, ncol | return the number of rows/columns of object | nrow(data.frame(foo = 1:4, bar = c(T, T, F, F))) |
names | names of elements of a vector/list | names(x) |
colnames, rownames | column/row names of a matrix/data frame | colnames(m) |
paste | concatenate strings | paste("aa","bb","cc",sep=":") |
sum | summation of elements in a vector | sum(c(1,5,12)) |
prod | product of elements in a vector | prod(c(3,15,-2)) |
max, min | maximum/minimum value in a vector | max(c(23,12,12,3)) |
mean, median | mean/median of a vector | mean(c(1,5,1,2,7)) |
sort | sorting | sort(c(1,5,1,pi,2,7)) |
var, sd | sample variance/standard deviation of a vector | sd(c(1,5,1,3,2)) |
cor | correlation | cor(c(1,4,6,1), c(2,5,1,9)) |
which.min, which.max | first index of minimum/maximum value | which.min(c(4,1,3,4,1)) |
set.seed | set a seed for random number generation | set.seed(13218) |
identical | check if two objects are identical | identical(2>1, 5>2) |
which | which indices are TRUE? | which(100:2 > 36) |
any | any component in a vector is TRUE? | any(10:0 < 0) |
all | all components in a vector are TRUE? | all(1:100 > 0) |
getwd | display working directory | getwd() |
setwd | set working directory | setwd("~/Documents") |
ls | list all objects in the workspace | ls() |
rm | delete variables from the workspace | rm(x,y), rm(list=ls()) |
head | preview the top of a dataset | head(airquality,10) |
tail | preview the bottom of a dataset | tail(airquality,15) |
summary | summary of an R object | summary(x) |
table | generate contingency table | table(x), table(x,y) |
str | structure of an R object | str(airquality) |
args | print arguments of a function | args(rnorm) |
jitter | add a small amount of random noise to data | jitter(1:10) |
unique | remove duplicate elements | unique(c(1,2,3,4,3,2,1,0)) |
gl | generate factor levels | gl(2,100) |
sample | take random samples | sample(LETTERS,10) |
sample.int | take samples of integers | sample.int(1000,6) |
with | access variables inside an environment | with(cars, cor(speed,dist)) |
cumprod | cumulative product | cumprod(1:10) |
%o% | calculate the outer product of two vectors | x %o% y |
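
The subsetting operators in the table above are easy to mix up, so here is a small sketch that uses each of them once. The objects v, x, and df are hypothetical and exist only for this illustration.

```r
# Subset a vector by position with [ ]
v <- c(10, 20, 30, 40, 50, 60)
picked <- v[c(2, 6, 1, 3)]   # elements 2, 6, 1, 3 in that order

# Extract one element of a list with [[ ]] or $
x <- list(foo = 1:3, bar = "hello")
bar_by_name   <- x[["bar"]]
bar_by_dollar <- x$bar

# The same two operators extract a column of a data frame
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(20, 35, 28))
ages <- df[["age"]]
mean_age <- mean(df$age)

print(picked)
print(identical(bar_by_name, bar_by_dollar))
print(mean_age)
```
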
Operator | Name | Example |
---|---|---|
>, < | greater/less than | 1:5 > seq(0,8,2) |
== | equality operator | 1:5 == seq(0,8,2) |
>=, <= | greater or equal to/less than or equal to | 1:5 <= seq(0,8,2) |
!= | not equal to | 1:5 != seq(0,8,2) |
! | NOT | !(1 > 2) |
& | AND | (3:5 > 5:7) & (4:6 == 4:6) |
| | OR | (3:5 > 5:7) | (4:6 == seq(2,6,2)) |
&& | non-vectorized AND (each operand should have length 1) | (1 < 2) && (3 > 4) |
|| | non-vectorized OR (each operand should have length 1) | (1 < 2) || (3 > 4) |
xor | exclusive or | xor(5==6, FALSE) |
isTRUE | Is the argument TRUE? | isTRUE(6>4) |
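
The difference between the vectorized and non-vectorized operators can be sketched as follows. The vectors a and b and the scalar x are made-up illustration values.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# & and | work element by element and return a vector
both   <- a & b
either <- a | b

# any() and all() collapse a logical vector to a single value
some_both <- any(both)
all_either <- all(either)

# && and || are meant for single TRUE/FALSE values, e.g. in if conditions
x <- 5
in_range <- (x > 0) && (x < 10)

# && stops as soon as the answer is known: the right side is never evaluated here
short_circuit <- FALSE && stop("this is never run")

print(both)
print(in_range)
print(short_circuit)
```
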
Examples can be found on pp. 63-70 of the text.
Command | Purpose |
---|---|
if-else | test a condition and act on the result |
for | execute a loop for a fixed number of times |
while | execute a loop while the conditions are satisfied |
repeat | execute a loop indefinitely, must use break to exit the loop |
break | exit a loop |
next | skip the rest of the commands and jump to the next iteration |
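
The control structures above can be sketched with three small loops. These are illustrative examples only, not the ones from the text; the modulo operator %% used in the first loop is base R but does not appear in the tables.

```r
# for + next: add up the odd numbers from 1 to 10
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  total <- total + i
}

# while: halve a value until it drops below 1, counting the steps
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

# repeat + break: find the smallest n whose square exceeds 50
n <- 0
repeat {
  n <- n + 1
  if (n^2 > 50) break
}

print(total)   # 25
print(steps)   # 7
print(n)       # 8
```
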
Command | Purpose | Example |
---|---|---|
lapply | loop over a list and evaluate function on each element | lapply(1:10,sqrt) |
sapply | same as lapply but try to simplify the result | sapply(1:10,sqrt) |
vapply | same as lapply but specify the output type | vapply(1:10,sqrt,numeric(1)) |
apply | apply a function over the margins of an array | apply(matrix(1:10,2,5), 2, mean) |
mapply | multivariate version of lapply | mapply(rep,1:4,4:1) |
tapply | apply a function over a subset of a vector | tapply(InsectSprays$count,InsectSprays$spray,mean) |
rowSums | = apply(x,1,sum) | rowSums(matrix(1:10,2,5)) |
rowMeans | = apply(x,1,mean) | rowMeans(matrix(1:10,2,5)) |
colSums | = apply(x,2,sum) | colSums(matrix(1:10,2,5)) |
colMeans | = apply(x,2,mean) | colMeans(matrix(1:10,2,5)) |
replicate | repeated evaluation of an expression | replicate(20,rnorm(10)) |
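
To show how the return types of these loop functions differ, here is a minimal sketch. It uses the built-in InsectSprays dataset from the table above; the matrix m is the same one used in the examples.

```r
m <- matrix(1:10, 2, 5)

# lapply always returns a list; sapply simplifies to a vector when it can
roots_list <- lapply(1:4, sqrt)
roots_vec  <- sapply(1:4, sqrt)

# vapply behaves like sapply but checks every result against a template
roots_checked <- vapply(1:4, sqrt, numeric(1))

# apply over rows (margin 1) gives the same totals as rowSums
row_totals <- apply(m, 1, sum)
same_totals <- all(row_totals == rowSums(m))

# tapply: mean insect count within each spray group
spray_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

print(class(roots_list))
print(row_totals)   # 25 30
print(spray_means)
```

vapply is usually the safer choice inside functions, because it stops with an error if a result does not match the declared type and length.
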
Function | Name | Example |
---|---|---|
abs | absolute value | abs(3-6) = 3 |
sqrt | square root | sqrt(16) = 4 |
^ | exponentiation | 3^10 = \(3^{10}\) = 59049 |
exp | exponential function | exp(1.7) = \(e^{1.7}\) = 5.473947 |
log | log function (base e) | log(10) = 2.302585 |
log10 | base 10 log (\(\log_{10}\)) | log10(100) = 2 |
pi | mathematical constant \(\pi\) | pi = 3.141593 |
sin, cos, tan | trigonometric functions (argument in radians) | sin(pi/2) = 1 |
asin, acos, atan | inverse trigonometric functions | acos(1) = 0 |
sinh, cosh, tanh | hyperbolic functions | cosh(0) = 1 |
asinh, acosh, atanh | inverse hyperbolic functions | atanh(tanh(12)) = 12 |
round(x,n) | round x to n decimal places | round(pi,2) = 3.14 |
floor | rounds down | floor(14.7) = 14 |
ceiling | rounds up | ceiling(14.7) = 15 |
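Every distribution has four functions. Each distribution has a root name; for example, the root name for the normal distribution is norm. This root is prefixed by one of the letters d (density function), p (cumulative distribution function), q (quantile function), or r (random number generation).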
Distribution | Functions |
---|---|
Normal | dnorm pnorm qnorm rnorm |
\(\chi^2\) | dchisq pchisq qchisq rchisq |
Student t | dt pt qt rt |
F | df pf qf rf |
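
A short sketch of the four function types, using the normal distribution as the example. The seed and the evaluation points are arbitrary choices for illustration.

```r
# d: density of the standard normal at 0, equal to 1/sqrt(2*pi)
density_at_zero <- dnorm(0)

# p: cumulative probability P(Z <= 1.96), roughly 0.975
prob_below <- pnorm(1.96)

# q: quantile function, the inverse of p
z_975 <- qnorm(0.975)
round_trip <- qnorm(pnorm(1.3))   # recovers 1.3 up to rounding error

# r: random draws; set.seed makes them reproducible
set.seed(13218)
draws <- rnorm(5, mean = 10, sd = 2)

print(density_at_zero)
print(prob_below)
print(z_975)
print(draws)
```

The other distributions follow the same pattern, with their own parameters (for example, df for the t and chi-squared distributions).
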
Command | Purpose | Example |
---|---|---|
read.table, read.csv, … | read data from a text file | data <- read.table("foo.txt") |
write.table, write.csv, … | write data to a text file | write.csv(data,"foo.csv") |
readLines | read text file line by line | data <- readLines("foo.txt") |
writeLines | write text to file | writeLines("write something…", "foo.txt") |
save | write R objects to a file in binary format | save(x,y,z,file="out.rda") |
load | reload datasets written with the function save() | load("out.rda") |
dput | write R objects to a file in ASCII text | dput(x,"out.R") |
dget | load R objects written with dput() | x2 <- dget("out.R") |
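
The read and write functions come in matching pairs. The sketch below writes a small data frame and reads it back three ways. It uses tempfile() for the file paths so that nothing in the working directory is overwritten; that choice, the row.names = FALSE argument, and the object names are assumptions made for this illustration.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Text format: write.csv / read.csv
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)   # omit the row-name column
df_csv <- read.csv(csv_path)

# ASCII R code: dput / dget
r_path <- tempfile(fileext = ".R")
dput(df, r_path)
df_dget <- dget(r_path)

# Binary format: save / load (load restores objects under their saved names)
rda_path <- tempfile(fileext = ".rda")
save(df, file = rda_path)
restored <- new.env()
load(rda_path, envir = restored)

print(all.equal(df, df_csv))
print(all.equal(df, df_dget))
print(ls(restored))
```

Note that load() returns the object names rather than the objects themselves, which is why dget() needs an assignment and load() does not.
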
Command | Purpose | Example |
---|---|---|
plot | plot data | plot(dist ~ speed, data=cars) |
hist | plot a histogram | hist(rnorm(1e5),freq=FALSE, breaks=100) |
barplot | create a barplot | barplot(table(mtcars$cyl)) |
boxplot | create boxplot | boxplot(mpg~cyl,data=mtcars) |
curve | plot a function | curve(x^2, xlim=c(-1,1)) |
legend | add legend to a plot | legend(1,2,c("text1","text2"),col=1:2,lty=1:2) |
abline | add straight lines to a plot | abline(1,2) |
density | make a density plot | plot(density(rnorm(1e5))) |
smoothScatter | make a smooth scatter plot | smoothScatter(x,y) |
pairs | create a matrix of scatterplots | pairs(airquality) |
lines | add a line to an existing plot | lines(x,y) |
points | add points to an existing plot | points(x,y) |
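
Base graphics are built up in layers: one high-level call such as plot() creates the figure, and functions such as abline(), points(), and legend() add to it. The sketch below illustrates this with the built-in cars dataset. Writing to a temporary PDF with pdf() and dev.off() is an assumption made so that the example also runs without a display; in an interactive session those two calls can be dropped.

```r
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)   # open a PDF graphics device

# High-level call: creates the scatter plot
plot(dist ~ speed, data = cars, main = "Stopping distance vs. speed")

# Low-level calls: add to the existing plot
abline(lm(dist ~ speed, data = cars), col = "red")
points(mean(cars$speed), mean(cars$dist), col = "blue", pch = 19)
legend("topleft",
       legend = c("regression line", "mean point"),
       col = c("red", "blue"),
       lty = c(1, NA),    # line for the first entry only
       pch = c(NA, 19))   # point symbol for the second entry only

dev.off()       # close the device so the file is written
print(file.exists(pdf_path))
```
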
The lattice package must be loaded with library(lattice)
before these commands can be used.
Command | Purpose | Example |
---|---|---|
xyplot | make scatter plots | xyplot(y ~ x | factor, data=data_frame) |
histogram | make histograms | histogram(~ x | factor, data=data_frame) |
Command | Purpose | Example |
---|---|---|
lm | perform linear regression | lm(y ~ x, data=data_frame) |
summary | summarize the regression result | summary(fit) |
confint | calculate confidence interval | confint(fit, level=0.9) |
abline | plot regression line (for simple regression) | abline(fit,col="red") |
predict | predict new data | predict(fit, newdata=data_frame) |
resid | get the residuals | resid(fit) |
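
The regression commands are typically used together on one fitted object. A minimal sketch, assuming the built-in cars dataset as the data and a hypothetical data frame new_speeds for prediction:

```r
# Fit a simple linear regression of stopping distance on speed
fit <- lm(dist ~ speed, data = cars)

# Coefficient table, R-squared, and tests
print(summary(fit))

# 90% confidence intervals for the intercept and slope
ci90 <- confint(fit, level = 0.9)
print(ci90)

# Predictions at new speeds; the column name must match the predictor
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(fit, newdata = new_speeds)
print(predicted)

# With an intercept in the model, the residuals sum to (numerically) zero
print(sum(resid(fit)))
```
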
Here are some examples of formulae that can be used in linear regressions and the models they represent.
Formula | Model |
---|---|
y ~ x | \(y=\beta_0+\beta_1 x\) |
y ~ x-1 or y ~ 0+x | \(y=\beta_1 x\) |
y ~ x1 + x2 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x2\) |
y ~ x1 + x1:x2 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x1\cdot x2\) |
y ~ x1*x2 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x1\cdot x2\) |
y ~ x1*x2*x3 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x3 + \beta_4 x1\cdot x2 + \beta_5 x1\cdot x3 + \beta_6 x2 \cdot x3 + \beta_7 x1\cdot x2 \cdot x3\) |
y ~ I(x1+x2) | \(y=\beta_0 + \beta_1 (x1+x2)\) |
y ~ I(x1 + 4*x1*x2) + I(x2^3) - 1 | \(y=\beta_1 (x1+4\cdot x1\cdot x2) + \beta_2 x2^3\) |
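
One way to check which terms a formula expands to is to inspect the column names of its design matrix with model.matrix(), a base R function not listed in the tables. The data frame d below is made up for illustration; only the column names matter here.

```r
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(0, 1, 0, 1))

# Each column of the design matrix corresponds to one coefficient
terms_full  <- colnames(model.matrix(~ x1 * x2, data = d))
terms_inter <- colnames(model.matrix(~ x1 + x1:x2, data = d))
terms_noint <- colnames(model.matrix(~ x1 - 1, data = d))
terms_sum   <- colnames(model.matrix(~ I(x1 + x2), data = d))

print(terms_full)    # "(Intercept)" "x1" "x2" "x1:x2"
print(terms_inter)   # "(Intercept)" "x1" "x1:x2"
print(terms_noint)   # "x1"
print(terms_sum)     # "(Intercept)" "I(x1 + x2)"
```

Without I(), the + inside a formula adds a term to the model; inside I() it is ordinary arithmetic, which is why the last formula yields a single predictor.
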
Command | Purpose | Example |
---|---|---|
Sys.Date() | get the current date | Sys.Date() |
Sys.time() | get the current time | Sys.time() |
as.Date | convert string to the Date class | as.Date("2016-09-23") |
weekdays | returns the day of week | weekdays(Sys.Date()) |
months | returns the month | months(Sys.Date()) |
quarters | returns the quarter of the year | quarters(Sys.Date()) |
system.time | computes the time needed for evaluating an expression | system.time( sum(1/(1:1e7)) ) |
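
Date objects support arithmetic, which the following sketch illustrates with the date from the table. The second date is an arbitrary choice for the example. The output of weekdays() and months() depends on the locale, so no particular spelling is assumed for them.

```r
d <- as.Date("2016-09-23")

# Components of the date (names of days and months follow the locale)
print(weekdays(d))
print(months(d))
print(quarters(d))   # "Q3"

# Adding a number shifts the date by that many days
later <- d + 30
print(later)         # "2016-10-23"

# Subtracting two dates gives a time difference in days
days_to_xmas <- as.numeric(as.Date("2016-12-25") - d)
print(days_to_xmas)  # 93

# system.time reports how long an expression takes to evaluate
timing <- system.time(sum(1 / (1:1e6)))
print(timing[["elapsed"]])
```
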
The syntax is summarized in the summary of Week 11’s notes.
Command | Purpose |
---|---|
chisq.test | \(\chi^2\) goodness-of-fit test and \(\chi^2\) test of independence |
t.test | 2-sample t-test |
aov | F-test |
pairwise.t.test | Pairwise t-test |
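
For quick reference, here is a hedged sketch of one typical call for each test. It is not a substitute for the Week 11 summary: the counts are made up, and the choice of the built-in mtcars and InsectSprays datasets is an assumption for illustration.

```r
# Goodness of fit: are three categories equally likely? (made-up counts)
observed <- c(18, 22, 20)
gof <- chisq.test(observed)

# Test of independence on a 2 x 2 table of made-up counts
tab <- matrix(c(30, 20, 25, 25), nrow = 2)
indep <- chisq.test(tab)

# Two-sample t-test: fuel economy of automatic vs. manual cars
tt <- t.test(mpg ~ am, data = mtcars)

# One-way ANOVA F-test: do mean insect counts differ across sprays?
anova_fit <- aov(count ~ spray, data = InsectSprays)
print(summary(anova_fit))

# Pairwise t-tests between spray groups, with adjusted p-values
pw <- pairwise.t.test(InsectSprays$count, InsectSprays$spray)

print(gof$statistic)   # 0.4 for these counts
print(indep$p.value)
print(tt$p.value)
print(pw$p.value)
```

Each test returns a list-like object, so quantities such as the test statistic and p-value can be extracted with $.
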