The lm() Function
Linear regression is one of the most important classes of models in statistics. In R, the built-in function lm()
is the workhorse, with many bells and whistles, that handles linear models. Here we use the Stat 100 Survey 2, Fall 2015 (combined) data we worked on previously to demonstrate the use of lm()
to perform a simple linear regression. We will further explore the lm()
function in future lessons.
Since we have saved the data previously, we can reload it using the command
load('Stat100_Survey2_Fall2015.RData')
Using the ls()
command, we see that the data frame survey
has been loaded into the workspace. To remind ourselves of the variables in the data, we type
names(survey)
[1] "gender" "genderID" "greek"
[4] "homeTown" "ethnicity" "religion"
[7] "religious" "ACT" "GPA"
[10] "partyHr" "drinks" "sexPartners"
[13] "relationships" "firstKissAge" "favPeriod"
[16] "hoursCallParents" "socialMedia" "texts"
[19] "good_well" "parentRelationship" "workHr"
[22] "percentTuition" "career"
Recall that we plotted the number of drinks per week versus party hours per week. Here, let’s reverse the plot:
plot(partyHr ~ drinks, data=survey, pch=19, xlab="Drinks/week", ylab="Party hours/week")
It is apparent that as the number of drinks increases, the party hours tend to increase as well. To fit a linear model predicting party hours from drinks, we use the following R command
fit <- lm(partyHr ~ drinks, data=survey)
If you type the command class(fit)
, you will see that the data type is ‘lm’, something we haven’t seen before. If we type fit
, it auto-prints the following
fit
Call:
lm(formula = partyHr ~ drinks, data = survey)
Coefficients:
(Intercept) drinks
2.3151 0.5459
It first tells us what is being fit in the model. Then it gives the coefficients of the fit. In our case, the intercept is 2.3151 and the slope of “drinks” is 0.5459. In other words, the linear model is \[ {\rm party\ hours} = 2.3151 + 0.5459 \times {\rm drinks}\] We can also look at the coefficients by typing
fit$coefficients
(Intercept) drinks
2.3150513 0.5458819
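As a side note, the generic extractor function coef() pulls the same named vector out of the fitted object; this one-liner is simply an equivalent way to get the coefficients:
coef(fit)    # same named vector as fit$coefficients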
The variable fit$coefficients
is a numeric vector of length two, storing the values of the intercept and slope. The summary()
function provides more information about the fitted model:
summary(fit)
Call:
lm(formula = partyHr ~ drinks, data = survey)
Residuals:
Min 1Q Median 3Q Max
-22.609 -2.315 -0.774 1.685 47.685
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.31505 0.16409 14.11 <2e-16 ***
drinks 0.54588 0.01397 39.08 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.492 on 1135 degrees of freedom
Multiple R-squared: 0.5737, Adjusted R-squared: 0.5733
F-statistic: 1527 on 1 and 1135 DF, p-value: < 2.2e-16
In addition to the values of the intercept and slope, summary(fit)
also displays their estimated standard errors, the \(t\) values and the two-sided p-values Pr(>|t|). The two-sided p-values are calculated by assuming that the coefficient estimates follow the \(t\) distribution with \(n-2\) degrees of freedom. Beside each p-value we see several stars. The number of stars indicates the significance of the coefficient being nonzero. As the ‘Signif. codes’ underneath the coefficients show, 3 stars ‘***’ means the p-value is between 0 and 0.001; 2 stars ‘**’ means the p-value is between 0.001 and 0.01; one star ‘*’ means the p-value is between 0.01 and 0.05; ‘.’ means the p-value is between 0.05 and 0.1; and a space ‘ ’ means the p-value is between 0.1 and 1. The summary()
function also provides information about the residuals of the fit, \(R^2\) and residual standard error.
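If you want to work with these numbers programmatically instead of reading them off the printout, the whole coefficient table can be extracted as a numeric matrix. A minimal sketch using the coef() accessor on the summary object:
coef_table <- coef(summary(fit))     # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table["drinks", "Std. Error"]   # standard error of the slope
coef_table["drinks", "Pr(>|t|)"]     # two-sided p-value of the slope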
We can now use the abline()
function to add the regression line to the plot. Here we have the intercept a=fit$coefficients[1]
and slope b=fit$coefficients[2]
. Actually, we can simply use abline(fit)
to add the regression line:
plot(partyHr ~ drinks, data=survey, pch=19, xlab="Drinks/week", ylab="Party hours/week")
abline(fit, col="red")
Before we continue, let’s check that the lm()
function gives the results we expect. You have learned from Stat 100 and Stat 200 that for a simple linear model \(y=\beta_0+\beta_1 x\), the slope and intercept are given by \[\beta_1 = r \frac{SD_y}{SD_x} \ \ , \ \ \beta_0 = \bar{y}-\beta_1 \bar{x} ,\] where \(SD_x\) and \(SD_y\) are the standard deviations of \(x\) and \(y\), \(\bar{x}\) and \(\bar{y}\) are the means of \(x\) and \(y\), and \(r\) is the correlation coefficient between \(x\) and \(y\). Let’s confirm the results of lm()
:
r <- cor(survey$drinks,survey$partyHr)
SDy_div_SDx <- sd(survey$partyHr)/sd(survey$drinks)
slope_drinks <- r*SDy_div_SDx
intercept <- mean(survey$partyHr) - slope_drinks*mean(survey$drinks)
c(intercept,slope_drinks)
[1] 2.3150513 0.5458819
These are exactly the values returned by the lm()
function. We know that \(R^2\) in simple regression is the same as the square of the correlation coefficient. Let’s check it:
r^2
[1] 0.5736767
It again agrees with the \(R^2\) returned by summary(fit)
. You also learned that the standard error for the slope is given by \[SE^+_{\rm slope}=\sqrt{\frac{1-r^2}{n-2}}\, \frac{SD_y}{SD_x}\] Let’s check it:
n <- nrow(survey)
(SE_slope <- sqrt((1-r^2)/(n-2))*SDy_div_SDx)
[1] 0.01396808
We again see that it agrees with the value returned by summary(fit)
. As we mentioned in last week’s notes, wrapping the assignment statement SE_slope <- sqrt((1-r^2)/(n-2))*SDy_div_SDx
in parentheses ()
not only assigns the value to SE_slope
but also prints it on the screen. The \(t\) value for the slope is
(t_slope <- slope_drinks/SE_slope)
[1] 39.08067
and the two-sided p-value can be computed by the pt()
function (with df=n-2):
2*pt(-t_slope,n-2)
[1] 2.353119e-212
These are all consistent with the values returned by summary(fit)
.
Having obtained the coefficients and standard errors, we can calculate the confidence interval (CI) for these coefficients. The assumption is that these coefficients follow the \(t\) distribution with \(n-2\) degrees of freedom.
As an example, let’s calculate the 95% CI for the slope. We first need to know the t-value, \(t^*\), corresponding to the 95% CI.
As shown in the graph above (see footnote 1), if the middle area under the t-curve is 95%, the area of each tail is (100%-95%)/2, or 2.5%. The value of \(t^*\) can be calculated using the qt()
function:
(tstar <- qt(0.025,n-2,lower.tail=FALSE))
[1] 1.962056
This is also equal to -qt(0.025,n-2)
since the t-curve is symmetric. The 95% CI of the slope is given by \({\rm slope} \pm t^* SE^+_{\rm slope}\). We can use slope_drinks
and SE_slope
we computed above to calculate the CI:
(slope_95CI <- c(slope_drinks-tstar*SE_slope, slope_drinks+tstar*SE_slope))
[1] 0.5184757 0.5732880
In fact, we don’t need to do this calculation. R already has a built-in function confint()
that does this calculation for us. However, it is useful to go through this exercise to understand what the function does. The confint()
function takes three main arguments: confint(object, parm, level=0.95, ...)
. For our purpose here, we will ignore the … arguments. The object
can be the object returned by the lm()
function, e.g. our fit
. parm
specifies the parameters for which the CIs are to be calculated, and level
is the confidence level; the default is 0.95 (95%). To find the 95% CI for the slope, we can type
confint(fit,'drinks')
2.5 % 97.5 %
drinks 0.5184757 0.573288
We can also use confint(fit,2)
since the slope for ‘drinks’ is the second coefficient. We see that these numbers are the same as those we calculated above.
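For completeness, here is that equivalent call with the numeric index in place of the name; it should return the same interval as confint(fit,'drinks') above:
confint(fit, 2)   # 2 refers to the second coefficient, the slope of 'drinks'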
If the parm
option is omitted in confint()
, the CIs for all parameters will be computed:
confint(fit)
2.5 % 97.5 %
(Intercept) 1.9930998 2.637003
drinks 0.5184757 0.573288
We can change the confidence level by adjusting the level
parameter. For example, to compute the 80% CI, we type
confint(fit,level=0.8)
10 % 90 %
(Intercept) 2.1046405 2.5254620
drinks 0.5279706 0.5637931
Having obtained the intercept and slope for the regression, we can use them to predict the outcome. For example, we construct the following function to take an input vector drinks
and predict the party hours based on the regression coefficients beta
:
predict_partyHrfromDrinks <- function(drinks, beta) {
beta[1] + beta[2]*drinks
}
Here the input parameter beta
is assumed to be a numeric vector of length 2, with beta[1]
being the intercept and beta[2]
being the slope. As an example, we can predict the party hours for the first 10 students in the survey data by the following command:
predict_partyHrfromDrinks(survey$drinks[1:10],fit$coefficients)
[1] 5.044461 7.773870 3.406815 10.503279 3.406815 2.860933 3.952697
[8] 10.503279 2.860933 6.136224
Actually, there is an easier way to get the answer. The predicted values have already been calculated and are stored in fit$fitted.values
. To see the first 10 fitted values, we simply type
fit$fitted.values[1:10]
1 2 3 4 5 6 7
5.044461 7.773870 3.406815 10.503279 3.406815 2.860933 3.952697
8 9 10
10.503279 2.860933 6.136224
and we see that they are the same values as calculated above, with names given by rownames(survey)
, which are simply the row numbers.
We can use our function predict_partyHrfromDrinks
to predict the party hours from new data. For example,
drinks <- c(8,5.5,0.2,2)
predict_partyHrfromDrinks(drinks,fit$coefficients)
[1] 6.682106 5.317401 2.424228 3.406815
It turns out that the predict()
function together with fit
can also be used for new data, but the new data must be put into a data frame, and this data frame must contain the independent variable drinks
. Then we use the option newdata=newdataframe
to get the predicted outcome. For example,
new_drinks <- data.frame(drinks = c(8,5.5,0.2,2))
rownames(new_drinks) <- c("John","Mary","Jill","Patrick")
predict(fit, newdata=new_drinks)
John Mary Jill Patrick
6.682106 5.317401 2.424228 3.406815
Note that when we add the row names to the data frame, the vector returned by predict()
also carries these names. The new data frame can contain columns other than drinks
and predict()
will ignore those other columns:
new_data <- data.frame(gender=c("male","female","female","male"),
drinks = c(8,5.5,0.2,2), partyHr = c(1,2,3,4))
rownames(new_data) <- c("John","Mary","Jill","Patrick")
predict(fit, newdata=new_data)
John Mary Jill Patrick
6.682106 5.317401 2.424228 3.406815
Common problems with the predict() function
In past semesters, students ran into problems with the predict()
function. One common problem arises from using a clumsy formula when fitting the linear model. For example, instead of using
fit <- lm(partyHr ~ drinks, data=survey)
they used
fit_bad <- lm(survey$partyHr ~ survey$drinks)
This fitting formula works in the sense that it gives the correct regression result. However, you run into problems if you try to use the predict()
function to predict new data. For example,
predict(fit_bad, newdata=new_data)
doesn’t give you the correct result. This is because predict()
needs to know which variable in new_data
is survey$partyHr
and which is survey$drinks
(the names used in fit_bad
) and can’t find any name match. There are partyHr
and drinks
in new_data
but no survey$partyHr
nor survey$drinks
.
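In practice (the exact behavior may depend on your R version), predict() in this situation typically ignores new_data, returns the fitted values for all 1137 original observations, and issues a warning that the numbers of rows do not match. The remedy is simply to fit the model with bare column names and the data argument, as we did for fit; a minimal sketch:
fit_good <- lm(partyHr ~ drinks, data=survey)  # variable names are now 'partyHr' and 'drinks'
predict(fit_good, newdata=new_data)            # names match, so prediction works as intended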
The fit
variable generated by the lm()
function also contains the residuals of the fitted data. The vector fit$residuals
stores the residuals of all the data in the data frame. To see the first 10 residuals, we can type
fit$residuals[1:10]
1 2 3 4 5 6
-4.0444605 -3.7738698 1.5931850 14.4967209 2.5931850 -2.8609331
7 8 9 10
-0.9526968 4.4967209 0.1390669 -3.1362243
Another way to get the residuals is to use the function resid()
:
res <- resid(fit)
res[1:10]
1 2 3 4 5 6
-4.0444605 -3.7738698 1.5931850 14.4967209 2.5931850 -2.8609331
7 8 9 10
-0.9526968 4.4967209 0.1390669 -3.1362243
Recall that a residual is defined as the difference between the actual value and the predicted value. We can compute our own residuals and compare with those returned by fit$residuals
:
my_residuals <- survey$partyHr - predict_partyHrfromDrinks(survey$drinks,fit$coefficients)
max( abs(my_residuals - fit$residuals) )
[1] 2.564171e-12
So we see that my_residuals
and fit$residuals
are practically the same. The small difference is likely due to numerical roundoff error.
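If you prefer a single yes/no answer to this comparison, all.equal() checks equality up to a small numerical tolerance. Here is a quick check (we drop the names from fit$residuals so that only the values are compared):
all.equal(unname(fit$residuals), my_residuals)   # should return TRUE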
The residuals have two important properties: they sum to zero, and the correlation coefficient between them and the independent variable \(x\) is exactly 0. These properties follow from the fact that the regression coefficients \(\beta_0\) and \(\beta_1\) are constructed to minimize the sum of the squares of the residuals (SSE). This is called the least-squares prescription. The mathematical proof requires calculus. If you are interested and have the math background, you can read these pdf notes.
Here we skip the math and simply use R to check these assertions using our fitted model fit:
sum(fit$residuals)
[1] -4.94535e-13
cor(fit$residuals,survey$drinks)
[1] 5.779052e-18
We see that the claims are indeed supported by our numerical calculations. Since there are \(n=\) 1137 observations in the data frame survey
, there are also \(n\) residuals. However, these residuals are not independent: they must sum to 0 and have zero correlation with drinks
. As a result, there are only \(n-2=\) 1135 independent residuals. If you tell me any \(n-2\) of the residuals, I can figure out the remaining two from the requirements that they must sum to 0 and have zero correlation with drinks
. For this reason, we say that the residuals have \(n-2\) degrees of freedom. The residual degrees of freedom of our fitted model is stored in fit$df.residual
:
fit$df.residual
[1] 1135
Note that the residuals sum to 0 and have zero correlation with \(x\) only if the regression coefficients are given by the correct formulae \(\beta_1=r SD_y/SD_x\) and \(\beta_0=\bar{y}-\beta_1 \bar{x}\). Other values of \(\beta_0\) and \(\beta_1\) won’t have these properties. To demonstrate this, we construct a function that calculates the sum of the residuals and the correlation for a general set of \(\beta_0\) and \(\beta_1\):
sumRes_cor <- function(beta,x,y) {
residuals <- y - (beta[1] + beta[2]*x)
output <- c( sum(residuals), cor(x,residuals) )
names(output) <- c("sum(residuals)","cor(x,residuals)")
output
}
In this function, y
is the dependent variable, x
is the independent variable, beta[1]
is the intercept and beta[2]
is the slope. Note that we assign names to the output vector to make the output easier to read. Let’s first check that it gives the expected results 0 and 0 when correct values of beta
are given:
sumRes_cor(fit$coefficients, survey$drinks, survey$partyHr)
sum(residuals) cor(x,residuals)
-4.090950e-12 1.709145e-14
Now let’s try a few other values of \(\beta_0\) and \(\beta_1\) to see what happens:
sumRes_cor(c(0,0),survey$drinks, survey$partyHr)
sum(residuals) cor(x,residuals)
6889.0000000 0.7574145
sumRes_cor(c(2,0.5),survey$drinks, survey$partyHr)
sum(residuals) cor(x,residuals)
716.00000000 0.09704024
sumRes_cor(c(0.5,1),survey$drinks, survey$partyHr)
sum(residuals) cor(x,residuals)
-1477.5000000 -0.6944076
As expected, they are not zero. You can try more values if you want.
Given a set of data points \((x_1,y_1), (x_2,y_2), (x_3,y_3),\cdots (x_n,y_n)\), and a linear model \[\hat{y}_i = \beta_0 + \beta_1 x_i\] we define the quantities SST, SSM and SSE as \[SST = \sum_{i=1}^n (y_i-\bar{y})^2\] \[SSM = \sum_{i=1}^n (\hat{y}_i-\bar{y})^2\] \[SSE = \sum_{i=1}^n (y_i-\hat{y}_i)^2\]
Note that SST is independent of the model parameters \(\beta_0\) and \(\beta_1\) and is given by \(SST=nSD_y^2\) from the definition of the standard deviation. The other two quantities SSM and SSE depend on the parameters. You are told that if \(\beta_0\) and \(\beta_1\) are given by the regression formulae, then SST=SSM+SSE, \(SSM=nr^2SD_y^2\) and \(SSE=n(1-r^2)SD_y^2\). The mathematical proof can be found in these notes for mathematically inclined students. Here we will confirm these claims using R.
Below is a simple function that calculates SSE, SSM and SST-SSE-SSM.
computeSS <- function(beta,x,y) {
n <- length(x)
SDy2 <- var(y)*(n-1)/n
SST <- n*SDy2
yhat <- beta[1] + beta[2]*x
SSM <- sum((yhat-mean(y))^2)
SSE <- sum((y-yhat)^2)
output <- c(SSE,SSM,SST-SSE-SSM)
names(output) <- c("SSE","SSM","SST-SSE-SSM")
output
}
Let’s try it on our survey data and the fit
coefficients and see if it works:
(fitSS <- computeSS(fit$coefficients, survey$drinks, survey$partyHr))
SSE SSM SST-SSE-SSM
2.290339e+04 3.081966e+04 9.094947e-10
We see that the equality SST=SSM+SSE holds. Let’s now confirm that the values of SSM and SSE are given by \(nr^2SD_y^2\) and \(n(1-r^2)SD_y^2\):
SDy2 <- var(survey$partyHr)*(n-1)/n
c(fitSS["SSM"] - n*r^2*SDy2, fitSS["SSE"] - n*(1-r^2)*SDy2)
SSM SSE
-8.985808e-10 -7.275958e-12
This confirms the claims.
The residual standard error (RSE) is defined as the square root of SSE divided by the residual degrees of freedom \(n-2\): \[RSE = \sqrt{SSE/(n-2)}\] Having calculated SSE, we can easily compute the RSE:
RSE <- sqrt( fitSS["SSE"]/(n-2))
RSE
SSE
4.492126
This is the same as the “residual standard error” returned by summary(fit)
(see the output above). It estimates the standard deviation of the residuals. (The label ‘SSE’ in the printout above is just the name carried over from fitSS["SSE"]; the value itself is the RSE.)
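As a cross-check, the RSE is also stored in the summary object, so you do not have to compute it by hand:
summary(fit)$sigma   # the residual standard error reported by summary(fit)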
Note that if \(\beta_0\) and \(\beta_1\) take values other than those given by the regression formulae, the equality SST=SSM+SSE no longer holds. We can verify that by trying a few other values:
computeSS(c(1,2), survey$drinks, survey$partyHr)
SSE SSM SST-SSE-SSM
326822.0 498932.9 -772031.8
computeSS(c(-1,3), survey$drinks, survey$partyHr)
SSE SSM SST-SSE-SSM
853526 1138554 -1938357
computeSS(c(5,1), survey$drinks, survey$partyHr)
SSE SSM SST-SSE-SSM
82474.0 141667.9 -170418.9
Residual plots are useful for assessing the validity of a linear model. The basic assumption of the linear model is that the data \((x,y)\) can be approximated by \(y=\beta_0 +\beta_1 x + \epsilon\), with the error \(\epsilon\) random and independent of \(x\). If the residual \(\epsilon\) as a function of \(x\) shows a pattern, it suggests that the linear model is not an accurate description of the data and the relationship between \(x\) and \(y\) may be nonlinear. As an example, consider the following simple data:
set.seed(367134)
x <- seq(0,2,0.01)
y <- (x-0.5)^2 + 0.05*rnorm(length(x))
plot(x,y, pch=20)
The relationship between \(x\) and \(y\) is clearly nonlinear. If we fit a linear model and plot the residuals vs x, we see a pattern:
fit1 <- lm(y~x)
plot(x, fit1$residuals, pch=20, ylab="residuals")
abline(h=0)
This shows that the residual function \(\epsilon(x)\) is not completely random, indicating that the linear model is not an accurate description of the data.
Finally, we should caution that the estimated standard errors, p-values, and confidence intervals are based on the assumptions that (1) the model is linear, (2) the data points are independent, (3) the variance of the residuals is independent of the values of the independent variable \(x\), and (4) \(\epsilon\) follows a normal distribution. If these assumptions do not hold, the estimated values may be used for reference but cannot be taken seriously. The last assumption can be relaxed if the sample size is sufficiently large because of the central limit theorem. The third assumption, known as homoscedasticity, is essential. The absence of homoscedasticity is called heteroscedasticity.
We can get a sense of how well the homoscedasticity assumption holds from residual plots. For example, in the linear model predicting party hours from number of drinks, the residual plot is
plot(survey$drinks, fit$residuals, pch=20, xlab="Drinks/week", ylab="Residuals", las=1)
abline(h=0)
This plot shows that the homoscedasticity assumption does not hold well, although the deviation is not too big. Another useful plot is residuals versus the fitted values (\(\hat{y}\)):
plot(fit$fitted.values, fit$residuals, pch=20, xlab="Predict", ylab="residual", las=1)
abline(h=0)
This plot is essentially the same as the previous one. This is expected because \(\hat{y}=\beta_0+\beta_1 x\), so we have simply rescaled the horizontal axis. You will see that this plot is useful when we are dealing with regression with multiple variables. Instead of making plots of the residuals versus each of the independent variables, we usually just plot the residuals versus the predicted values, which are a linear combination of all the independent variables.
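R can also produce this diagnostic directly: calling plot() on an ‘lm’ object generates a set of standard diagnostic plots, and the which argument selects individual ones. A minimal sketch:
plot(fit, which=1)   # built-in residuals vs fitted values plot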
From the plots, we conclude that the homoscedasticity assumption does not hold very well. However, the deviation from the assumption does not seem large enough for us to completely discard the diagnostics returned by the linear model. Therefore, for this data set the standard errors, p-values, and confidence intervals can be used as reference numbers but should not be taken too seriously.
Finally, we want to remind you of the trick in R called “partial matching” mentioned in Section 10.6 of Peng’s textbook. The idea is that when there is no ambiguity, R allows you to abbreviate the name of a variable. For example, instead of typing fit$coefficients
to get the coefficients of the regression, we can simply type fit$coef
or fit$coe
:
fit$coef
(Intercept) drinks
2.3150513 0.5458819
fit$coe
(Intercept) drinks
2.3150513 0.5458819
There are no vectors named fit$coe
or fit$coef
. When you type fit$coe
, for example, R tries to match an existing object whose name begins with the name you type. In this case, R finds that fit$coe
partially matches the existing object fit$coefficients
and so it prints the content of fit$coefficients
.
As another example, instead of typing fit$residuals
to get the residuals, we can simply type fit$res
:
fit$res[1:10]
1 2 3 4 5 6
-4.0444605 -3.7738698 1.5931850 14.4967209 2.5931850 -2.8609331
7 8 9 10
-0.9526968 4.4967209 0.1390669 -3.1362243
Partial matching can also be used in function arguments. For example, instead of typing pnorm(1.65, lower.tail=FALSE)
, we can type pnorm(1.65, low=F)
.
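To convince yourself that the abbreviated argument is interpreted the same way, you can compare the two calls; they should return the same upper-tail probability:
pnorm(1.65, lower.tail=FALSE)
pnorm(1.65, low=F)   # 'low' partially matches 'lower.tail'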
This partial matching trick is particularly useful in interactive work, but its excessive use in programming is discouraged because abbreviations scattered all over the place can make the code hard to understand.
We have introduced a few commands related to the lm()
function. We did not just tell you how to use them; we also included a number of side calculations to explain where the numbers come from. Here we summarize all the commands and ideas we have covered.
The lm()
function: fitting a linear model, e.g.
fit <- lm(partyHr ~ drinks, data=survey)
The object returned by lm()
(e.g. fit
in the example above) is of the ‘lm’ class. When you auto-print it, it shows the call and the fitted coefficients.
The summary(object)
function provides more information on an object of the ‘lm’ class. In addition to the call and coefficients, it also summarizes the residuals of the fit, the standard errors, t values and p-values of the coefficients, the residual standard error (RSE), the degrees of freedom, \(R^2\) and so on.
We used fit as an example of an ‘lm’ class object. A few components of fit we mentioned here include:
fit$coefficients : a named vector of the coefficients
fit$residuals : the residuals
fit$fitted.values : the fitted values
fit$df.residual : the residual degrees of freedom
The residuals can also be extracted with the resid() function. For example, resid(fit) returns the same vector as fit$residuals.
confint(object, parm, level=0.95): returns the confidence intervals of the parameters specified by parm. If parm is omitted, confidence intervals for all parameters in object will be returned. The confidence level can be changed from the default value of 95% by modifying the level parameter.
When the abline()
function is applied to an ‘lm’ class object, the fitted straight line in the object will be added to an existing plot, e.g.
plot(partyHr ~ drinks, data=survey, pch=19, xlab="Drinks/week", ylab="Party hours/week")
abline(fit, col="red")
The function predict(object, newdata=dataframe)
returns the predicted values of the outcome for the data in dataframe
, e.g.
predict(fit, newdata=data.frame(drinks=c(3,4,8,2,16)))
1. In case you want to know how the graph is generated, here is the code:
p <- -qt(0.025,n-2)
xmax <- 4
npoly <- 100
xr <- seq(p,xmax,length.out=npoly)
yr <- dt(xr,n-2)
xmid <- seq(-p,p,length.out=npoly)
ymid <- dt(xmid,n-2)
curve(dt(x,n-2), xlim=c(-xmax,xmax), xlab="",ylab="", xaxt="n", yaxt="n")
polygon(c(p,xr,xmax),c(0,yr,0),col="red")
polygon(c(-p,-xr,-xmax),c(0,yr,0),col="red")
polygon(c(-p,xmid,p),c(0,ymid,0),col="skyblue")
text(0,0.2,"95%")
text(2.7,0.035,"2.5%")
text(-2.7,0.035,"2.5%")
mtext(expression(-t^"*"), side=1, at=-p, line=1, cex=1.3)
mtext(expression(t^"*"), side=1, at=p, line=1, cex=1.3)
The polygon()
function is used to shade the areas under the curve; text()
is used to add text inside the plot; mtext()
is used to add text in a margin of the plot; expression()
is used to write mathematical symbols.