Utility companies, which must plan the operation and expansion of electricity generation, are vitally interested in predicting customer demand over both short and long periods of time. A short-term study was conducted to investigate the effect of each month’s mean daily temperature (MDT) and of cost per kilowatt-hour, PKWH, on the mean daily consumption (in kWh), MDC, per household. The company officials expected the consumption of electricity to rise in cold weather (due to heating), fall when the weather was moderate, and rise again when the temperature rose and there was a need for air conditioning. They expected consumption to decrease as the cost per kilowatt-hour increased, reflecting greater attention to conservation. Data were collected for 2 years, a period during which the cost per kilowatt-hour PKWH increased due to the increasing costs of fuel.

The data set has only 24 observations. Instead of asking you to download it and then load it to R, it is easier to just enter the data using the following code:

column1 <- rep(c(8,10), each=12)
column2 <- c(31, 34, 39, 42, 47, 56, 62, 66, 68, 71, 75, 78, 32, 36, 39, 42, 48, 56, 62, 66, 68, 72, 75, 79)
column3 <- c(55, 50, 46, 45, 40, 43, 41, 46, 44, 51, 62, 73, 50, 44, 42, 42, 38, 40, 39, 42, 42, 44, 50, 55)
utility <- data.frame(PKWH=column1, MDT=column2, MDC=column3)
rm(list=paste0('column',1:3))

This is again a very small data set and you can view the whole data by typing name of the data frame utility. The first column, PKWH, is the cost per kilowatt-hour in cents. The second column is MDT, the mean daily temperature in °F. The third column is MDC, the mean daily consumption in kilowatt-hour. In the two-year period, PKWH took only two values: 8 cents in the first year and 10 cents in the second year.

a. (3 points)

Make a scatter plot of MDC versus MDT. Plot the data in different colors for the two different values of PKWH. Explain your color code (i.e. which color represents PKWH = 8 cents and which represents PKWH = 10 cents) in either a figure legend (preferred) or a figure caption.

Here’s the plot:

plot(MDC ~ MDT, data=utility, pch=16, col=PKWH/2-3, las=1, 
     xlab=expression(paste("mean daily temperature (",degree,"F)")), 
     ylab="mean daily consumption (kWh)")
legend(40,70,c("8 cents","10 cents"), pch=16, col=1:2)

Note that we use col=PKWH/2-3 to assign two different colors to the two PKWH values: col=8/2-3=1 (black) for PKWH=8 and col=10/2-3=2 (red) for PKWH=10.

b. (3 points)

Fit a model predicting MDC from MDT with a quadratic function MDC = β0 + β1MDT + β2MDT2.

fit1 <- lm(MDC ~ MDT + I(MDT^2), data=utility)
summary(fit1)

Call:
lm(formula = MDC ~ MDT + I(MDT^2), data = utility)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.8940 -2.1736  0.1361  1.6713 12.9347 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 131.429509  14.339808   9.165 8.71e-09 ***
MDT          -3.562033   0.558075  -6.383 2.51e-06 ***
I(MDT^2)      0.033937   0.005067   6.698 1.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.537 on 21 degrees of freedom
Multiple R-squared:  0.7117,    Adjusted R-squared:  0.6843 
F-statistic: 25.92 on 2 and 21 DF,  p-value: 2.128e-06

So the fitting equation is

MDC = 131.4 - 3.562 MDT + 0.03394 MDT2

c. (6 points)

The model in (b) ignores the cost per kilowatt-hour PKWH. If you look at your plot of MDC versus MDT with color-coded PKWH in (a), you will notice that the values of MDC are smaller for PKWH = 10 cents at similar MDT. This suggests that the fitting formula should include PKWH. Even though PKWH has only two values in the data set, we will treat it as a continuous variable in the regression. Fit a multiple regression model predicting MDC from MDT and PKWH with a quadratic function in MDT and with interaction terms:

MDC = β0 + β1MDT + β2MDT2 + PKWH (β3 + β4MDT + β5MDT2)

Include interaction terms with PKWH:

fit2 <- lm(MDC ~ PKWH*MDT + PKWH*I(MDT^2), data=utility)
summary(fit2)

Call:
lm(formula = MDC ~ PKWH * MDT + PKWH * I(MDT^2), data = utility)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5193 -1.3595  0.0094  1.0129  4.9988 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)   334.387157  73.567368   4.545 0.000251 ***
PKWH          -22.517584   8.169702  -2.756 0.012998 *  
MDT           -11.760259   2.868325  -4.100 0.000672 ***
I(MDT^2)        0.117026   0.026081   4.487 0.000285 ***
PKWH:MDT        0.908281   0.317832   2.858 0.010456 *  
PKWH:I(MDT^2)  -0.009198   0.002885  -3.188 0.005091 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.575 on 18 degrees of freedom
Multiple R-squared:  0.9204,    Adjusted R-squared:  0.8983 
F-statistic: 41.62 on 5 and 18 DF,  p-value: 2.836e-09

So the fitting equation is

MDC = 334.4 - 11.76 MDT + 0.117 MDT2 + PKWH (-22.52 + 0.9083 MDT - 0.009198 MDT2)

d. (4 points)

Split the equation in (c) into two equations, one for PKWH = 8 cents and one for PKWH = 10 cents. Your equations should look like the following:

PKWH = 8 cents: MDC = (some number) + (some number) MDT + (some number) MDT2

PKWH = 10 cents: MDC = (some number) + (some number) MDT + (some number) MDT2

Note: Use the method you learned in Stat 200 to split the multiple regression equation (see, e.g., Example 1 of Ch. 17 in the Stat 200 nootbook). You don’t need to use any R command.

The 8 cents equation is obtained by setting PKWH=8 in the equation in (c). The result is

PKWH = 8 cents: MDC = 334.4 - 11.76 MDT + 0.117 MDT2 + 8×(-22.52 + 0.9083 MDT - 0.009198 MDT2),

which is simplified to

PKWH = 8 cents: MDC = 154.24 - 4.4936 MDT + 0.043416 MDT2

The 10 cents equation is obtained by setting PKWH=10 in the equation in (c). The result is

PKWH = 10 cents: MDC = 334.4 - 11.76 MDT + 0.117 MDT2 + 10×(-22.52 + 0.9083 MDT - 0.009198 MDT2),

which is simplified to

PKWH = 10 cents: MDC = 109.2 - 2.677 MDT + 0.02502 MDT2

e. (6 points)

Plot the fitted curves in (d) together with the data for the two values of PKWH. Make sure to color code the data and curves for each of the two PKWH values. Explain your colors and curves clearly, either in a figure legend (preferred) or in a figure caption.

As you’ve seen in Week 12’s notes, there are at least two methods to create the plot.

Method 1: Use predict()

The following is a plot summarizing the data and fitted curves. The range of MDT in the data is from 31 to 79. So we create a vector x of length 100 with points uniformly spaced between 31 and 79. We then use the predict() function to calculate the fitted values for each of the two PKWH values. Data points are first plotted with the plot() function. Then the fitted values are added by the lines() function.

x <- seq(31,79, length=100) # Range of MDT in the data is 31 - 79
fitted_8cents <- predict(fit2, newdata=data.frame(MDT=x, PKWH=8))
fitted_10cents <- predict(fit2, newdata=data.frame(MDT=x, PKWH=10))
plot(MDC ~ MDT, data=utility, pch=PKWH/2+12, col=PKWH/2-3, 
     las=1, ylab="mean daily consumption (kWh)", 
     xlab=expression(paste("mean daily temperature (",degree,"F)")))
lines(x, fitted_8cents, lwd=2)
lines(x, fitted_10cents, col="red", lwd=2, lty=2)
legend("top",c("data: PKWH = 8 cents","data: PKWH = 10 cents", 
               "fitted curve: PKWH = 8 cents", "fitted curve: PKWH = 10 cents"), 
       pch=c(16,17,-1,-1), lty=c(0,0,1,2), lwd=2, col=c(1,2,1,2))

Note that in data.frame(MDT=x, PKWH=8), x is a vector of length 100 and 8 is a vector of length 1, but data.frame(MDT=x, PKWH=8) still returns a data frame with 100 rows and 2 columns because 8 is recycled 99 times to match the length of x.

Method 2: Use curve()

The equations for the two curves are given in part (d) or you can simply recalculate them. Then you can use the curve() function to add the two curves to the data.

plot(MDC ~ MDT, data=utility, pch=PKWH/2+12, col=PKWH/2-3, 
     las=1, ylab="mean daily consumption (kWh)", 
     xlab=expression(paste("mean daily temperature (",degree,"F)")))
curve(fit2$coef[1]+fit2$coef[3]*x +fit2$coef[4]*x^2 + 
        8*(fit2$coef[2]+fit2$coef[5]*x+fit2$coef[6]*x^2), lwd=2, add=TRUE)
curve(fit2$coef[1]+fit2$coef[3]*x +fit2$coef[4]*x^2 + 
        10*(fit2$coef[2]+fit2$coef[5]*x+fit2$coef[6]*x^2), 
      lwd=2, lty=2, col="red", add=TRUE)
legend("top",c("data: PKWH = 8 cents","data: PKWH = 10 cents", 
               "fitted curve: PKWH = 8 cents", "fitted curve: PKWH = 10 cents"), 
       pch=c(16,17,-1,-1), lty=c(0,0,1,2), lwd=2, col=c(1,2,1,2))

f. (3 points)

Suppose the weather forecast says the mean daily temperature on a particular day is 75°F. What is the expected consumption of electricity according to the model in (c) if the cost per kilowatt-hour is 9 cents? Give your answer to the nearest integer. Don’t forget to include units.

Use the predict() function to do the calculation.

predict(fit2, newdata=data.frame(MDT=75, PKWH=9))
       1 
55.40747 

Rounded to the nearest integer, the predicted consumption of electricity is 55 kWh.