You should have completed lessons 1 and 3 of R programming in swirl before reading this article.
One of the first things you notice when you open an R console is that you can use it as a calculator. In addition to the basic arithmetic operations of addition (+), subtraction (-), multiplication (*), and division (/), R has the standard mathematical functions built in. The following table lists some of them.
Function or symbol | Description | Example |
---|---|---|
abs | absolute value | abs(3-6) = 3 |
sqrt | square root | sqrt(16) = 4 |
^ | exponentiation | 3^10 = \(3^{10}\) = 59049 |
exp | exponential function | exp(1.7) = \(e^{1.7}\) = 5.473947 |
log | log function (base e) | log(10) = 2.302585 |
log10 | base 10 log (\(\log_{10}\)) | log10(100) = 2 |
pi | mathematical constant \(\pi\) | pi = 3.141593 |
sin, cos, tan | trigonometric functions (argument in radians) | sin(pi/2) = 1 |
asin, acos, atan | inverse trigonometric functions | acos(1) = 0 |
sinh, cosh, tanh | hyperbolic functions | cosh(0) = 1 |
asinh, acosh, atanh | inverse hyperbolic functions | atanh(tanh(12)) = 12 |
round(x,n) | round x to n decimal places | round(pi,2) = 3.14 |
floor | rounds down | floor(14.7) = 14 |
ceiling | rounds up | ceiling(14.7) = 15 |
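Some entries in the table are exact only up to rounding error, because R stores numbers with finite precision. The following is a minimal sketch of what that looks like in practice; the variable names s_pi, near_zero, and identity_ok are made up for illustration.

```r
# sin(pi) is mathematically 0, but pi is stored with finite precision,
# so the computed value is a tiny nonzero number on typical systems.
s_pi <- sin(pi)
s_pi

# Compare with a tolerance instead of testing for exact equality.
near_zero <- isTRUE(all.equal(s_pi, 0))
near_zero

# The identity sin^2 + cos^2 = 1 holds up to the same kind of rounding error.
identity_ok <- isTRUE(all.equal(sin(0.3)^2 + cos(0.3)^2, 1))
identity_ok
```

all.equal() compares two numbers up to a small tolerance, which is usually what you want when checking calculator-style results.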
There are also useful statistical functions that we will talk about later. Being able to use R as a calculator allows us to analyze data interactively, as we will demonstrate later in the course. One distinct advantage of R over a conventional scientific calculator is its ability to perform vectorized operations, as demonstrated below.
As explained in lesson 3 of R programming in swirl, we can generate a vector of consecutive integers using the colon operator (:). For example, 1:100 generates a vector of length 100 with values 1, 2, 3, …, 100. Here “vector” simply means a sequence of numbers, characters, or other objects of the same class. We can also store the integer vector in a variable using the assignment operator <- as follows:
x <- 1:100
Many other programming languages use = as the assignment operator. In R, you can use = for assignment too. For example, x = 1:100 is equivalent to x <- 1:100 in this context. However, as you will learn later, the = operator has other uses in R. Assignments can also be made in the other direction by reversing the arrow. For example, 1:100 -> x is equivalent to x <- 1:100.
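To check that the three forms of assignment really do the same thing, here is a small sketch. The variable names a, b, and d are chosen only for illustration.

```r
a <- 1:100     # leftward arrow
b = 1:100      # equals sign used for assignment
1:100 -> d     # rightward arrow

# identical() returns TRUE only if the two objects are exactly the same.
identical(a, b)
identical(a, d)
```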
Suppose now I type
x <- 2*x - 1
What happens is that the original integer vector (1, 2, 3, …, 100) is replaced by (1, 3, 5, 7, …, 199): each element is multiplied by 2, and then 1 is subtracted from it. You can confirm this by typing x to auto-print its contents:
x
[1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33
[18] 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67
[35] 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101
[52] 103 105 107 109 111 113 115 117 119 121 123 125 127 129 131 133 135
[69] 137 139 141 143 145 147 149 151 153 155 157 159 161 163 165 167 169
[86] 171 173 175 177 179 181 183 185 187 189 191 193 195 197 199
What happens if I type the following?
x <- c(1,-1)*x
As explained in lesson 1 of R programming in swirl, since c(1,-1) is a vector of length 2 and x is a vector of length 100, R “recycles” the c(1,-1) vector 50 times to carry out the multiplication. The result is that the first element of the original vector is multiplied by 1, the second by -1, the third by 1, and so on. So the content of x becomes:
x
[1] 1 -3 5 -7 9 -11 13 -15 17 -19 21 -23 25 -27
[15] 29 -31 33 -35 37 -39 41 -43 45 -47 49 -51 53 -55
[29] 57 -59 61 -63 65 -67 69 -71 73 -75 77 -79 81 -83
[43] 85 -87 89 -91 93 -95 97 -99 101 -103 105 -107 109 -111
[57] 113 -115 117 -119 121 -123 125 -127 129 -131 133 -135 137 -139
[71] 141 -143 145 -147 149 -151 153 -155 157 -159 161 -163 165 -167
[85] 169 -171 173 -175 177 -179 181 -183 185 -187 189 -191 193 -195
[99] 197 -199
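Recycling is easier to see on a short vector. The sketch below uses a hypothetical vector y of length 6 and writes the recycling out explicitly with rep() for comparison.

```r
y <- 1:6

# R recycles c(1, -1) three times to match the length of y.
recycled <- c(1, -1) * y
recycled

# The same computation with the recycling written out by hand.
explicit <- rep(c(1, -1), times = 3) * y
explicit

identical(recycled, explicit)
```

If the longer length is not a multiple of the shorter one, for example c(1, -1) * (1:5), R still recycles but issues a warning.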
Now let’s do the following:
x <- sum(1/x)
sum() is a built-in R function that returns the sum of the elements of a vector. Thus, x is now just a single number (a numeric vector of length 1):
x
[1] 0.7828982
It is the result of the sum \[ 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7}+\cdots - \frac{1}{199}=\sum_{n=1}^{100} \frac{(-1)^{n-1}}{2n-1} \] The same calculation can be compressed into a one-line expression:
sum( c(1,-1)/(2*(1:100)-1) )
[1] 0.7828982
or this one-line expression
sum(c(1,-1)/seq(1,199,2))
[1] 0.7828982
The seq() function is introduced in lesson 3 of R programming in swirl. If you forget how it is used, type ?seq in the R console to pull up its help page.
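As a quick check that the two one-line expressions use the same denominators, the following sketch builds the odd numbers both ways. The names odds_seq, odds_colon, and same are illustrative.

```r
# Odd numbers 1, 3, ..., 199 built with seq() and with the colon operator.
odds_seq   <- seq(1, 199, by = 2)
odds_colon <- 2 * (1:100) - 1

length(odds_seq)   # number of elements
head(odds_seq)     # first few elements

# all.equal() compares the values up to a small numerical tolerance.
same <- isTRUE(all.equal(odds_seq, odds_colon))
same
```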
We see that a lengthy calculation can be carried out by just a one-line expression. If you are not impressed yet, try this:
s <- 4*sum( c(1,-1)/(2*(1:1e6)-1) )
By typing the above single-line expression, we have just told R to carry out a sum over one million terms and then multiply the result by 4. The number 1e6 means 1000000 (1 followed by 6 zeros, or \(10^6\)), which is one million. The value stored in the variable s is equal to \[4\left( 1-\frac{1}{3}+\frac{1}{5}-\frac{1}{7}+\cdots - \frac{1}{1999999}\right) = 4\sum_{n=1}^{10^6} \frac{(-1)^{n-1}}{2n-1} \] The value is
s
[1] 3.141592
If this number seems familiar to you, it’s because it’s close to the number \(\pi= 3.141592653589...\). In R, the built-in variable pi stores this number:
pi
[1] 3.141593
By default, R displays floating-point numbers to 7 significant figures, even though it uses 8 bytes to store a floating-point number, corresponding to about 16 significant figures. You can change this default with the command options(digits=n), where n is the number of significant figures you want R to display. For example,
options(digits=15)
sets the default printout to 15 significant figures:
c(s,pi)
[1] 3.14159165358979 3.14159265358979
So we see that s and pi agree only in the first 6 digits. In fact, it is well known to mathematicians that the infinite series
\[ 4 \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{2n-1} \] converges to \(\pi\), but the convergence is very slow: the error after \(N\) terms is roughly \(1/N\), which is why one million terms give only about 6 correct digits.
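To get a feel for how slow the convergence is, the sketch below repeats the calculation for several numbers of terms and compares each result with pi. The function name leibniz_pi and the variable names are made up for illustration.

```r
# Partial sum of the series with N terms, multiplied by 4.
leibniz_pi <- function(N) 4 * sum(c(1, -1) / (2 * (1:N) - 1))

N      <- 10^(1:6)                 # 10, 100, ..., one million terms
approx <- sapply(N, leibniz_pi)    # apply the function to each N
err    <- abs(approx - pi)         # distance from pi

# The last column stays close to 1, so the error shrinks only like 1/N.
data.frame(N = N, error = err, N_times_error = N * err)
```

Each additional correct digit costs about ten times as many terms.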
Now you can tell your friends that you have just learned how to quickly compute the sum of a long series. You can show them that you can sum the two-million-term series \[ 1+\frac{1}{2^2}+\frac{1}{3^2}+\cdots + \frac{1}{(2\times 10^6)^2}\] by just typing
sum(1/(1:2e6)^2)
[1] 1.64493356684835
Be aware, though, that if one of your friends is knowledgeable in math, they will laugh and point out that \[ \sum_{n=1}^{\infty} \frac{1}{n^2}=\frac{\pi^2}{6} \ ,\] which was first figured out by the famous mathematician Leonhard Euler in 1735. You will be amazed and type
pi^2/6
[1] 1.64493406684823
to confirm that your sum is indeed close to this number.
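How close is "close"? As a rough sketch, we can compute the gap between the partial sum and \(\pi^2/6\) and compare it with \(1/N\), where \(N\) is the number of terms. The variable names below are illustrative.

```r
N       <- 2e6
partial <- sum(1 / (1:N)^2)    # the two-million-term sum from above
gap     <- pi^2 / 6 - partial  # what the remaining terms would add

# The gap is very nearly 1/N.
c(gap = gap, one_over_N = 1 / N)
```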
The sum() function returns the sum of the elements of a vector. Similarly, the prod() function returns their product. For example, 3:5 is an integer vector consisting of 3, 4, and 5, so prod(3:5) returns \(3\times 4\times 5\), or 60:
prod(3:5)
[1] 60
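Another simple use of prod() is computing a factorial, since \(n! = 1\times 2\times \cdots \times n\). The sketch below compares it with R's built-in factorial() function; the variable name f10 is illustrative.

```r
# 10! as the product of the integers 1 to 10.
f10 <- prod(1:10)
f10

# R's built-in factorial() gives the same value.
factorial(10)
```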
Like the sum() function, prod() can be useful in some problems. Let’s consider the birthday problem encountered in Stat 100 to further illustrate its use.
What is the probability that at least two people share the same birthday in a set of \(n\) randomly chosen people? This is the famous birthday problem. If you forget how to solve it, look up your Stat 100 notes or visit this website. Ignoring leap days and assuming that all 365 birthdays are equally likely, the answer is given by the expression \[P = 1 - \frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdots \frac{366-n}{365}\] In a class of 100 students, the probability that at least two students share the same birthday is \[ P = 1 - \frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdots \frac{266}{365} \] What is the numerical value of this \(P\)? Your Stat 100 instructor told you that it’s very close to 1. Do you believe it? Have you checked the calculation? When I first read about the birthday problem in a book, I was suspicious of the claim. I used a calculator to carry out the calculation and confirmed the result. It took me a few minutes to finish. With R, it can be done with the following one-line expression:
1-prod((365:266)/365)
[1] 0.999999692751072
We see that it is indeed very close to 1. Let’s break down the expression to see why it gives the desired result. The expression 365:266 is an integer vector containing 365, 364, 363, …, 266. The expression (365:266)/365 divides each element of 365:266 by 365, so it’s a vector containing \[ \frac{365}{365}, \quad \frac{364}{365}, \quad \frac{363}{365}, \quad \cdots, \quad \frac{266}{365}\] Finally, prod((365:266)/365) returns the product of the elements of (365:266)/365. That is, \[ \texttt{prod((365:266)/365)} = \frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdots \frac{266}{365}\] which is the probability that all of the 100 students have different birthdays. This is a very small number:
prod((365:266)/365)
[1] 3.07248927851577e-07
Finally, 1-prod((365:266)/365) is the probability that at least two students share the same birthday. Since prod((365:266)/365) is very small (\(\approx 3 \times 10^{-7}\)), 1-prod((365:266)/365) \(\approx 1\), as claimed.
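The same one-line expression works for any class size once 266 is replaced by 366 - n. The sketch below wraps it in a function; the name p_shared is made up for illustration, and the function is only meant for n between 1 and 365.

```r
# Probability that at least two of n people share a birthday
# (365 equally likely birthdays, leap days ignored).
p_shared <- function(n) 1 - prod((365:(366 - n)) / 365)

p_shared(23)     # about 0.507: already better than even odds
p_shared(100)    # the class of 100 students from above

# Evaluate several class sizes at once.
sapply(c(10, 23, 50, 100), p_shared)
```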
Interestingly, R has a built-in function for the birthday problem, documented under the title “Probability of coincidences”. The function pbirthday(n) returns the probability that at least 2 people share the same birthday among \(n\) randomly chosen people. For \(n=100\), the function gives
pbirthday(100)
[1] 0.999999692751072
exactly the same value as calculated above. pbirthday() has other optional parameters you can specify for generalized birthday problems. There is also an associated function, qbirthday(). Type ?pbirthday for more information.
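As a brief illustration of the associated function, qbirthday() answers the reverse question: how many people are needed for the probability of a shared birthday to reach a given level? The sketch below uses the default settings (365 equally likely birthdays, at least 2 people sharing); the variable names are illustrative.

```r
# Smallest group size for which a shared birthday has probability >= 0.5.
n_half <- qbirthday(0.5)
n_half

# Check with pbirthday(): the probability for that group size is about 0.507.
p_half <- pbirthday(n_half)
p_half
```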