No new commands or functions are introduced in Week 12, except the quantile() function mentioned in this week’s footnote. Type ?quantile to see its usage. You will encounter this function again next week.
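
As a quick illustration of quantile(), here is a minimal sketch on a made-up vector (the data and the variable names `x` and `q` are assumptions chosen for this example, not taken from the course notes). It relies on R's default quantile method.

```r
# Quartiles of a small illustrative vector (default method, type = 7)
x <- 1:9
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(q)
# 25% 50% 75%
#   3   5   7
```
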
| Command | Purpose | Example |
|---|---|---|
| ? | pull up a help page | ?seq |
| : | colon operator (generate a regular sequence) | 1.5:4 |
| [ ] | subset a vector | x[c(2,6,1,3)] |
| [[ ]] | extract an element in a list | x[["bar"]] |
| | or extract a column in a data frame | data_frame[["age"]] |
| $ | extract an element in a list | x$bar |
| | or extract a column in a data frame | data_frame$age |
| <- | assignment | x <- 5 + 7 |
| c | concatenate | c(1.1, 9, 3.14) |
| seq, seq_along | sequence generation | seq(1,10,0.5) |
| length | return the length of a vector | length(pi:100) |
| rep | replicate elements of vectors | rep(c(1,2,3),5) |
| print | print its argument | print(5:10) |
| # | comment character | # ignore the rest of line |
| class | return the class of object | class(pi) |
| as.numeric, as.integer,… | explicit coercion | as.integer(pi) |
| matrix | create a matrix | matrix(1:6, nrow=2, ncol=3) |
| dim | return the dimension attribute of object | dim( matrix(1:6, nrow=2, ncol=3) ) |
| cbind, rbind | create matrix by column-/row- binding vectors | cbind(1:3,4:6) |
| list | create a list | list(name="John", age=20) |
| factor | create a factor | factor(c("blue","green"), levels=c("red","green","blue")) |
| levels | print all levels of a factor | levels(x) |
| is.na, is.nan | check if the argument is NA (or NaN) | is.na(as.numeric("abc")) |
| data.frame | create a data frame | data.frame(foo = 1:4, bar = c(T, T, F, F)) |
| nrow, ncol | return the number of rows/columns of object | nrow(data.frame(foo = 1:4, bar = c(T, T, F, F))) |
| names | names of elements of a vector/list | names(x) |
| colnames, rownames | column/row names of a matrix/data frame | colnames(m) |
| paste | concatenate strings | paste("aa","bb","cc",sep=":") |
| sum | summation of elements in a vector | sum(c(1,5,12)) |
| prod | product of elements in a vector | prod(c(3,15,-2)) |
| max, min | maximum/minimum value in a vector | max(c(23,12,12,3)) |
| mean, median | mean/median of a vector | mean(c(1,5,1,2,7)) |
| sort | sorting | sort(c(1,5,1,pi,2,7)) |
| var, sd | sample variance/standard deviation of a vector | sd(c(1,5,1,3,2)) |
| cor | correlation | cor(c(1,4,6,1), c(2,5,1,9)) |
| which.min, which.max | first index of minimum/maximum value | which.min(c(4,1,3,4,1)) |
| set.seed | set a seed for random number generation | set.seed(13218) |
| identical | check if two objects are identical | identical(2>1, 5>2) |
| which | which indices are TRUE? | which(100:2 > 36) |
| any | any component in a vector is TRUE? | any(10:0 < 0) |
| all | all components in a vector are TRUE? | all(1:100 > 0) |
| getwd | display working directory | getwd() |
| setwd | set working directory | setwd("~/Documents") |
| ls | list all objects in the workspace | ls() |
| rm | delete variables from the workspace | rm(x,y), rm(list=ls()) |
| head | preview the top of a dataset | head(airquality,10) |
| tail | preview the bottom of a dataset | tail(airquality,15) |
| summary | summary of an R object | summary(x) |
| table | generate contingency table | table(x), table(x,y) |
| str | structure of an R object | str(airquality) |
| args | print arguments of a function | args(rnorm) |
| jitter | add a small amount of random noise to data | jitter(1:10) |
| unique | remove duplicate elements | unique(c(1,2,3,4,3,2,1,0)) |
| gl | generate factor levels | gl(2,100) |
| sample | take random samples | sample(LETTERS,10) |
| sample.int | take samples of integers | sample.int(1000,6) |
| with | access variables inside an environment | with(cars, cor(speed,dist)) |
| cumprod | cumulative product | cumprod(1:10) |
| %o% | calculate the outer product of two vectors | x %o% y |
| Operator | Name | Example |
|---|---|---|
| >, < | greater/less than | 1:5 > seq(0,8,2) |
| == | equality operator | 1:5 == seq(0,8,2) |
| >=, <= | greater or equal to/less than or equal to | 1:5 <= seq(0,8,2) |
| != | not equal to | 1:5 != seq(0,8,2) |
| ! | NOT | !(1 > 2) |
| & | AND | (3:5 > 5:7) & (4:6 == 4:6) |
| \| | OR | (3:5 > 5:7) \| (4:6 == seq(2,6,2)) |
| && | scalar (non-vectorized) AND | (1 > 2) && (3 > 2) |
| \|\| | scalar (non-vectorized) OR | (1 > 2) \|\| (3 > 2) |
| xor | exclusive or | xor(5==6, FALSE) |
| isTRUE | Is the argument TRUE? | isTRUE(6>4) |
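
The difference between the element-wise operators (&, |) and the scalar operators (&&, ||) is easiest to see side by side. The following is a minimal sketch; the vectors `a` and `b` and the other variable names are made up for illustration.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

and_ab <- a & b       # element-wise AND: TRUE FALSE FALSE
or_ab  <- a | b       # element-wise OR:  TRUE TRUE TRUE
xor_ab <- xor(a, b)   # exclusive OR:     FALSE TRUE TRUE

# && and || work on single logical values and evaluate left to right,
# stopping as soon as the result is known (short-circuiting).
x <- 5
in_range <- (x > 0) && (x < 10)                 # TRUE
skipped  <- FALSE && stop("never evaluated")    # FALSE; stop() is not run

# isTRUE() is TRUE only for a single TRUE value
one_true  <- isTRUE(x > 4)             # TRUE
many_true <- isTRUE(c(TRUE, TRUE))     # FALSE: length is 2, not 1
```

Because && and || expect length-one conditions, they are the natural choice inside if statements, while & and | are used to combine logical vectors for subsetting.
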
Examples can be found on pp. 63-70 of the text.
| Command | Purpose |
|---|---|
| if-else | testing conditions and acting on it |
| for | execute a loop for a fixed number of times |
| while | execute a loop while the conditions are satisfied |
| repeat | execute a loop indefinitely, must use break to exit the loop |
| break | exit a loop |
| next | skip the rest of the commands and jump to the next iteration |
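
The control structures above can be sketched in a few lines. This is an illustrative example only; the variable names (`total`, `steps`, `k`) are made up and are not taken from the text.

```r
# for + if + next: sum the odd numbers between 1 and 10
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  total <- total + i
}
total   # 25

# while: count how many halvings bring 100 to 1 or below
n <- 100
steps <- 0
while (n > 1) {
  n <- n / 2
  steps <- steps + 1
}
steps   # 7

# repeat + break: double k until it exceeds 50
k <- 1
repeat {
  k <- k * 2
  if (k > 50) break       # without break, repeat would never stop
}
k       # 64

# if-else: act on a condition
size <- if (k > 100) "large" else "small"
size    # "small"
```
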
| Command | Purpose | Example |
|---|---|---|
| lapply | loop over a list and evaluate function on each element | lapply(1:10,sqrt) |
| sapply | same as lapply but try to simplify the result | sapply(1:10,sqrt) |
| vapply | same as lapply but specify the output type | vapply(1:10,sqrt,numeric(1)) |
| apply | apply a function over the margins of an array | apply(matrix(1:10,2,5), 2, mean) |
| mapply | multivariate version of lapply | mapply(rep,1:4,4:1) |
| tapply | apply a function over subsets of a vector defined by a grouping factor | tapply(InsectSprays$count,InsectSprays$spray,mean) |
| rowSums | = apply(x,1,sum) | rowSums(matrix(1:10,2,5)) |
| rowMeans | = apply(x,1,mean) | rowMeans(matrix(1:10,2,5)) |
| colSums | = apply(x,2,sum) | colSums(matrix(1:10,2,5)) |
| colMeans | = apply(x,2,mean) | colMeans(matrix(1:10,2,5)) |
| replicate | repeated evaluation of an expression | replicate(20,rnorm(10)) |
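
To compare what the loop functions return, here is a small sketch on made-up inputs (all variable names are assumptions for illustration). The main difference between lapply, sapply, and vapply is the shape of the result.

```r
m <- matrix(1:10, 2, 5)
col_means <- apply(m, 2, mean)     # same values as colMeans(m)
col_means                          # 1.5 3.5 5.5 7.5 9.5

sq_list <- lapply(1:3, function(i) i^2)               # a list: 1, 4, 9
sq_vec  <- sapply(1:3, function(i) i^2)               # simplified to a vector
sq_safe <- vapply(1:3, function(i) i^2, numeric(1))   # vector, type checked

# mapply passes the i-th elements of each argument together:
# rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)

# tapply: sum the values within each group
group_sums <- tapply(c(1, 2, 3, 4), c("a", "b", "a", "b"), sum)
group_sums                         # a = 4, b = 6
```

vapply is usually the safer choice inside functions, since it fails with an error if an element does not match the declared output type instead of silently returning a differently shaped result.
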
| Function | Name | Example |
|---|---|---|
| abs | absolute value | abs(3-6) = 3 |
| sqrt | square root | sqrt(16) = 4 |
| ^ | exponentiation | 3^10 = \(3^{10}\) = 59049 |
| exp | exponential function | exp(1.7) = \(e^{1.7}\) = 5.473947 |
| log | log function (base e) | log(10) = 2.302585 |
| log10 | base 10 log (\(\log_{10}\)) | log10(100) = 2 |
| pi | mathematical constant \(\pi\) | pi = 3.141593 |
| sin, cos, tan | trigonometric functions (argument in radians) | sin(pi/2) = 1 |
| asin, acos, atan | inverse trigonometric functions | acos(1) = 0 |
| sinh, cosh, tanh | hyperbolic functions | cosh(0) = 1 |
| asinh, acosh, atanh | inverse hyperbolic functions | atanh(tanh(12)) = 12 |
| round(x,n) | round x to n decimal places | round(pi,2) = 3.14 |
| floor | rounds down | floor(14.7) = 14 |
| ceiling | rounds up | ceiling(14.7) = 15 |
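Every distribution has four functions. Each distribution has a root name; for example, the root name for the normal distribution is norm. This root is prefixed by one of the letters d (density), p (cumulative distribution function), q (quantile function), or r (random number generation).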
| Distribution | Functions |
|---|---|
| Normal | dnorm pnorm qnorm rnorm |
| \(\chi^2\) | dchisq pchisq qchisq rchisq |
| Student t | dt pt qt rt |
| F | df pf qf rf |
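
The four prefixes can be illustrated with the normal distribution. This is a minimal sketch; the seed and the variable names are arbitrary choices for the example.

```r
d0 <- dnorm(0)         # density at 0: 1/sqrt(2*pi), about 0.3989
p  <- pnorm(1.96)      # P(Z <= 1.96), about 0.975
q  <- qnorm(0.975)     # inverse of pnorm, about 1.96

set.seed(1)            # arbitrary seed, for reproducibility only
z  <- rnorm(5)         # five random draws from N(0, 1)

# The same pattern applies to the other distributions, with extra
# parameters where needed, e.g. degrees of freedom for Student t:
t_crit <- qt(0.975, df = 10)   # about 2.23
```

Note that p and q functions are inverses of each other, so qnorm(pnorm(1.96)) returns 1.96 up to rounding error.
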
| Command | Purpose | Example |
|---|---|---|
| read.table, read.csv, … | read data from a text file | data <- read.table("foo.txt") |
| write.table, write.csv, … | write data to a text file | write.csv(data,"foo.csv") |
| readLines | read text file line by line | data <- readLines("foo.txt") |
| writeLines | write text to file | writeLines("write something…", "foo.txt") |
| save | write R objects to a file in binary format | save(x,y,z,file="out.rda") |
| load | reload datasets written with the function save() | load("out.rda") |
| dput | write R objects to a file in ASCII text | dput(x,"out.R") |
| dget | load R objects written with dput() | x2 <- dget("out.R") |
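
A round trip through a file is a simple way to check that the read and write functions pair up as expected. The sketch below uses temporary files and a made-up data frame (`df`, `csv_path`, and `obj_path` are hypothetical names for this example).

```r
df <- data.frame(id = 1:3, score = c(90.5, 72, 88))

# Text round trip: write.csv / read.csv
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)   # omit the row-name column
df_csv <- read.csv(csv_path)

# R-object round trip: dput / dget
obj_path <- tempfile(fileext = ".R")
dput(df, obj_path)
df_obj <- dget(obj_path)

all.equal(df, df_csv)   # TRUE
all.equal(df, df_obj)   # TRUE
```

Passing row.names = FALSE to write.csv avoids an extra unnamed column appearing when the file is read back.
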
| Command | Purpose | Example |
|---|---|---|
| plot | plot data | plot(dist ~ speed, data=cars) |
| hist | plot a histogram | hist(rnorm(1e5),freq=FALSE, breaks=100) |
| barplot | create a barplot | barplot(table(mtcars$cyl)) |
| boxplot | create boxplot | boxplot(mpg~cyl,data=mtcars) |
| curve | plot a function | curve(x^2, xlim=c(-1,1)) |
| legend | add legend to a plot | legend(1,2,c("text1","text2"),col=1:2,lty=1:2) |
| abline | add straight lines to a plot | abline(1,2) |
| density | make a density plot | plot(density(rnorm(1e5))) |
| smoothScatter | make a smooth scatter plot | smoothScatter(x,y) |
| pairs | create a matrix of scatterplots | pairs(airquality) |
| lines | add a line to an existing plot | lines(x,y) |
| points | add points to an existing plot | points(x,y) |
The lattice package must be loaded with library(lattice) before these commands can be used.
| Command | Purpose | Example |
|---|---|---|
| xyplot | make scatter plots | xyplot(y ~ x \| factor, data=data_frame) |
| histogram | make histograms | histogram(~ x \| factor, data=data_frame) |
| Command | Purpose | Example |
|---|---|---|
| lm | perform linear regression | lm(y ~ x, data=data_frame) |
| summary | summarize the regression result | summary(fit) |
| confint | calculate confidence interval | confint(fit, level=0.9) |
| abline | plot regression line (for simple regression) | abline(fit,col="red") |
| predict | predict new data | predict(fit, newdata=data_frame) |
| resid | get the residuals | resid(fit) |
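
The regression commands are typically used together. Here is a minimal sketch using the built-in cars data set; the object names (`fit`, `ci`, `new_speeds`, `pred`, `res`) and the new speed values are made up for illustration.

```r
fit <- lm(dist ~ speed, data = cars)    # simple linear regression
coef(fit)                               # intercept about -17.58, slope about 3.93
summary(fit)                            # coefficients, standard errors, R-squared

ci <- confint(fit, level = 0.9)         # 90% confidence intervals

# predict() needs a data frame whose column names match the predictors
new_speeds <- data.frame(speed = c(10, 20))
pred <- predict(fit, newdata = new_speeds)

res <- resid(fit)                       # one residual per observation
```
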
Here are some examples of formulae that can be used in linear regressions and the models they represent.
| Formula | Model |
|---|---|
| y ~ x | \(y=\beta_0+\beta_1 x\) |
| y ~ x-1 or y ~ 0+x | \(y=\beta_1 x\) |
| y ~ x1 + x2 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x2\) |
| y ~ x1 + x1:x2 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x1\cdot x2\) |
| y ~ x1*x2 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x1\cdot x2\) |
| y ~ x1*x2*x3 | \(y=\beta_0 + \beta_1 x1 + \beta_2 x2 + \beta_3 x3 + \beta_4 x1\cdot x2 + \beta_5 x1\cdot x3 + \beta_6 x2 \cdot x3 + \beta_7 x1\cdot x2 \cdot x3\) |
| y ~ I(x1+x2) | \(y=\beta_0 + \beta_1 (x1+x2)\) |
| y ~ I(x1 + 4*x1*x2) + I(x2^3) - 1 | \(y=\beta_1 (x1+4\cdot x1\cdot x2) + \beta_2 x2^3\) |
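
One way to check which terms a formula expands to is to inspect the column names of its design matrix with model.matrix(). The sketch below uses a small made-up data frame `d`; the values are arbitrary and only the column names matter.

```r
d <- data.frame(x1 = c(1, 2, 3, 4),
                x2 = c(2, 1, 4, 3),
                x3 = c(1, 0, 1, 0))

colnames(model.matrix(~ x1 * x2, data = d))
# "(Intercept)" "x1" "x2" "x1:x2"

colnames(model.matrix(~ x1 + x1:x2, data = d))
# "(Intercept)" "x1" "x1:x2"

colnames(model.matrix(~ 0 + x1, data = d))
# "x1"   (no intercept column)

colnames(model.matrix(~ I(x1 + x2), data = d))
# "(Intercept)" "I(x1 + x2)"   (the sum is treated as a single predictor)

ncol(model.matrix(~ x1 * x2 * x3, data = d))
# 8: intercept, 3 main effects, 3 two-way and 1 three-way interaction
```

Each column of the design matrix corresponds to one coefficient in the model, so the column names match the terms listed in the table above.
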
| Command | Purpose | Example |
|---|---|---|
| Sys.Date() | get the current date | Sys.Date() |
| Sys.time() | get the current time | Sys.time() |
| as.Date | convert string to the Date class | as.Date("2016-09-23") |
| weekdays | returns the day of week | weekdays(Sys.Date()) |
| months | returns the month | months(Sys.Date()) |
| quarters | returns the quarter of the year | quarters(Sys.Date()) |
| system.time | computes the time needed for evaluating an expression | system.time( sum(1/(1:1e7)) ) |
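
A short sketch of working with Date objects follows. The second date and the variable names are made up for illustration. Note that weekdays() and months() return names in the current locale, so their output is not shown here.

```r
d <- as.Date("2016-09-23")

weekdays(d)            # day name in the current locale (a Friday)
months(d)              # month name in the current locale
q <- quarters(d)       # "Q3"
yr <- format(d, "%Y")  # "2016"

# Dates support arithmetic in units of days
later <- d + 30                                     # "2016-10-23"
gap   <- as.numeric(as.Date("2016-12-25") - d)      # 93 days

# system.time() reports the time taken to evaluate an expression
timing <- system.time(sum(1 / (1:1e6)))
timing[["elapsed"]]    # elapsed seconds; varies by machine
```
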
The syntax is summarized in the summary of Week 11’s notes.
| Command | Purpose |
|---|---|
| chisq.test | \(\chi^2\) goodness-of-fit test and \(\chi^2\) test of independence |
| t.test | 2-sample t-test |
| aov | F-test |
| pairwise.t.test | Pairwise t-test |
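
For orientation, here is a hedged sketch of how these four functions can be called on built-in data sets (sleep and InsectSprays). The observed counts passed to chisq.test are made up for illustration, and the calls shown are one possible usage, not necessarily the form used in the Week 11 notes.

```r
# Goodness-of-fit test: made-up counts against hypothesised proportions
gof <- chisq.test(c(18, 32, 50), p = c(0.2, 0.3, 0.5))
gof$statistic    # X-squared = 1/3 (expected counts are 20, 30, 50)

# Two-sample t-test (Welch by default) on the built-in sleep data
t_res <- t.test(extra ~ group, data = sleep)
t_res$p.value

# One-way ANOVA F-test on the built-in InsectSprays data
fit_aov <- aov(count ~ spray, data = InsectSprays)
summary(fit_aov)

# Pairwise t-tests between all spray types, with p-value adjustment
pw <- pairwise.t.test(InsectSprays$count, InsectSprays$spray)
pw$p.value       # lower-triangular matrix of adjusted p-values
```
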