Chapter 3 Estimating SLR Parameters

In Example 2.8, we saw how to interpret the values of estimated slopes and intercepts. But how do we obtain estimates of β0, β1, and σ2? Ideally, the line should be as close as possible to the data. But for any real-world dataset, the values will not lie directly in a straight line. So how do we define “close enough”?

3.1 Ordinary Least Squares Estimation of Parameters

The standard way is to choose values for β0 and β1 to minimize the sum of squared vertical distances between the observations yi and the regression line. We will denote these chosen values as ˆβ0 and ˆβ1. Minimizing the distance between yi and the fitted line ˆβ0+ˆβ1xi is equivalent to minimizing the sum of squared residuals. Confusingly, this is often called the residual sum of squares and it is defined as4 SSRes=SSRes(ˆβ0,ˆβ1)=ni=1e2i=ni=1(yi(ˆβ0+ˆβ1xi))2 Estimating ˆβ0 and ˆβ1 by minimizing SSRes is called ordinary least squares (OLS) estimation of the SLR parameters. In the rest of this subsection we will explain how to minimize this function mathematically. But in practice this is done by software–so if you just want to know that, skip ahead to Section 3.3.

To find the minimum of the value in (3.1), we can use standard calculus steps for finding a local extrema:

  1. Find the derivatives of SSRes(ˆβ0,ˆβ1) with respect to ˆβ0 and ˆβ1.
  2. Set the derivatives equal to zero.
  3. Solve for the values of ˆβ0 and ˆβ1

This procedures yields the “critical point” of SSRes(ˆβ0,ˆβ1) (in “β-space”), which we know to be a minimum because SSRes(ˆβ0,ˆβ1) is a convex function.

Step 1: Find the two partial derivatives of SSRes(ˆβ0,ˆβ1):

SSRes(ˆβ0,ˆβ1)=ni=1(y2i2ˆβ0yi2ˆβ1yixi+ˆβ20+2ˆβ0ˆβ1xi+ˆβ21x2i)SSResˆβ0=ni=1(2yi+2ˆβ0+2ˆβ1xi)=2ni=1(yiˆβ0ˆβ1xi)SSResˆβ1=ni=1(2yixi+2ˆβ0xi+2ˆβ1x2i)=2ni=1(yiˆβ0ˆβ1xi)xi

Step 2: Set them equal to zero: 2ni=1(yiˆβ0ˆβ1xi)=02ni=1(yiˆβ0ˆβ1xi)xi=0

and simplify: nˆβ0+ˆβ1ni=1xi=ni=1yiˆβ0ni=1xi+ˆβ1ni=1x2i=ni=1xiyi

Step 3: Solve for ˆβ0 in (3.2): nˆβ0+ˆβ1ni=1xi=ni=1yiˆβ0=1nni=1yiˆβ11nni=1xi=¯y¯xˆβ1

Plug ^β0 into (3.3) and solve for ˆβ1: (1nni=1yiˆβ11nni=1xi)ni=1xi+ˆβ1ni=1x2i=ni=1xiyi(ni=1x2i1n(ni=1xi)2)ˆβ1=ni=1xiyi1nni=1yini=1xi

ˆβ1=ni=1xiyi1nni=1yini=1xini=1x2i1n(ni=1xi)2=1nni=1xiyi¯y¯x1nni=1x2i¯x2

Equations (3.4) and (3.5) together give us the formulas for computing both regression parameters estimates. ˆβ0 and ˆβ1 are the (ordinary) least squares estimators of β0 and β1

3.2 Understanding ˆβ1 with correlation

A common alternative to the form of ˆβ1 in (3.5) is to define the following sums-of-squares:

  • S2x=ni=1x2i1n(ni=1xi)2
  • Sxy=ni=1xiyi1nni=1yini=1xi
  • S2y=ni=1y2i1n(ni=1yi)2

The sums-of-squares S2x and S2y represent the total variation in the x’s and y’s, respectively. The value of Sxy is the co-variation between the x’s and the y’s and takes the same sign as their correlation. In fact, we can show that the Pearson correlation between x and y is equal to:

rxy=SxySxSy

Using these quantities, an equivalent formula for ˆβ1 is ˆβ1=SxyS2x=rxySySx In this form, we can see that the slope of the regression line depends on the correlation betwee x and y and the amount variation in each. Importantly, notice that the sign of ˆβ1 (i.e., whether it is positive or negative) is the same as the sign of the correlation rxy. Thus, if two variables are positively correlated, their SLR slope will be positive. And if they are negatively correlated, their slope will be negative. We will return to this idea in Section 4.1.

3.3 Fitting SLR models in R

The task of computing ˆβ0 and ˆβ1 is commonly called “fitting” the model. Although we could explicitly compute the values using equations (3.4) and (3.5), it is much simpler to let R do the computations for us.

3.3.1 OLS estimation in R

The lm() command, which stands for linear model, is used to fit a simple linear regression model in R. (As we will see in Section 8, the same function is used for multiple linear regression, too.) For simple linear regression, there are two key arguments for lm():

  • The first argument is an R formula of the form y ~ x, where x and y are variable names for the predictor and outcome variables, respectively.
  • The other key argument is the data= argument. It should be given a data.frame or tibble that contains the x and y variables as columns. Although not strictly required, the data argument is highly recommended. If data is not provided, the values of x and y are taken from the global environment; this often leads to unintended errors.

Example 3.1 To compute the OLS estimates of β0 and β1 for the penguin data, with flipper length as the predictor and body mass as the outcome, we can use the following code:

library(palmerpenguins)
penguin_lm <- lm(body_mass_g ~ flipper_length_mm,
   data=penguins)

Here, we have first loaded the palmerpenguins package so that the penguins data are available in the workspace. In the call to lm(), we have directly supplied the variable names (body_mass_g and flipper_length_mm) in the formula argument, which is unnamed. For the data argument, we have supplied the name of the tibble containing the data. It is good practice to save the output from lm() as an object, which facilitates retrieving additional information from the model.

To see the point estimates, we can simply call the fitted object and it will print the model that was fit and the calculated coefficients:

penguin_lm
## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = penguins)
## 
## Coefficients:
##       (Intercept)  flipper_length_mm  
##          -5780.83              49.69

To see more information about the model, we can use the summary() command:

summary(penguin_lm)
## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = penguins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1058.80  -259.27   -26.88   247.33  1288.69 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5780.831    305.815  -18.90   <2e-16 ***
## flipper_length_mm    49.686      1.518   32.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 394.3 on 340 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.759,  Adjusted R-squared:  0.7583 
## F-statistic:  1071 on 1 and 340 DF,  p-value: < 2.2e-16

Over the following chapters, we will take a closer look at what these others quantities mean and how they can be used.

Sometimes, we might need to store the model coefficients for later use or pass them as input to another function. Rather than copy them by hand, we can use the coef() command to extract them from the fitted model:

coef(penguin_lm)
##       (Intercept) flipper_length_mm 
##       -5780.83136          49.68557

Although the summary() command provides lots of detail for a linear model, sometimes it is more convenient to have the information presented as a tibble. The tidy() command, from the broom package, produces a tibble with the parameter estimates.

library(broom)
tidy(penguin_lm)
## # A tibble: 2 × 5
##   term              estimate std.error statistic   p.value
##   <chr>                <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)        -5781.     306.       -18.9 5.59e- 55
## 2 flipper_length_mm     49.7      1.52      32.7 4.37e-107

The estimates can then be extracted from the estimate column.

tidy(penguin_lm)$estimate
## [1] -5780.83136    49.68557

3.3.2 Plotting SLR Fit

To plot the SLR fit, we can add the line as a layer to a scatterplot of the data. This can be done using the geom_abline() function, which takes slope= and intercept= arguments inside aes().

Example 3.2 To plot the simple linear regression line of Example 3.1, we first create the scatterplot of values:

g_penguin <- ggplot() + theme_bw() + 
  geom_point(aes(x=flipper_length_mm,
                 y=body_mass_g),
             data=penguins) + 
    xlab("Flipper Length (mm)") +
    ylab("Body Mass (g)")
g_penguin

And then can add the regression line to it:

g_penguin_lm <- g_penguin +
    geom_abline(aes(slope=coef(penguin_lm)[2],
                  intercept=coef(penguin_lm)[1]))
g_penguin_lm

It’s important to remember to not type the numbers in directly. Not only does that lead to a loss of precision, it’s also greatly amplifies the chances of a typo. Instead, in the code above, we have used the coef() command to extract the slope and intercept estimates.

An alternative approach, which doesn’t actually require directly fitting the model, is to use the geom_smooth() command. To use this approach, provide that function

  • an x and y variable inside the aes(),
  • the dataset via the data argument
  • set method="lm" (Note the quotes), which tells geom_smooth() that you want the linear regression fit
  • optionally set se=FALSE, which will hide the shaded uncertainty. We will later see how these are calculated in Section 5.

For example:

g_penguin_lm2 <-  g_penguin +
  geom_smooth(aes(x=flipper_length_mm,
                  y=body_mass_g),
              data=penguins,
              method="lm",
              se=FALSE)
g_penguin_lm2

What happens if you forget method="lm"? In that case, the geom_smooth() function uses splines:

g_penguin +
  geom_smooth(aes(x=flipper_length_mm,
                  y=body_mass_g),
              data=penguins)

This an extremely useful method for visualizing the trend in a dataset. Just note that it is not the regression line.

Example 3.3 Consider the bike sharing data from Section 1.3.4 and Figure 1.9. After fitting an SLR model with temperature as the predictor variable and number of registered users as the outcome, we obtain the line:

ˆy=1370.9+112.4x We can interpret these coefficients as:

  • The estimated average number of registered users active in the bike sharing program on a day with high temperature 0 degrees Celsius is 1,371.
  • A difference of one degree Celsius in daily high temperature is associated with an average of 112 more registered users actively renting a bike.

3.3.3 OLS estimation in R with binary x

If the predictor variable is a binary variable such as sex, then we need to create a corresponding 0-1 variable for fitting the model (recall Section 2.3.7). Conveniently, R will automatically create a 0-1 indicator from a binary categorical variable if that variable is included as a predictor in lm().

Example 3.4 (Continuation of Example 2.7.) Compute the OLS estimates of β0 and β1 for a linear regression model with penguin body mass as the outcome and penguin sex as the predictor variable.

We first restrict the dataset to penguins with known sex:

penguins_nomissingsex <- penguins %>%
  filter(!is.na(sex))

This ensures that our predictor can only take on two values (we will look at predictors with 3+ categories in Section 11).

We then call lm() using this dataset:

penguin_lm2 <- lm(body_mass_g~sex,
                  data=penguins_nomissingsex)
coef(penguin_lm2)
## (Intercept)     sexmale 
##   3862.2727    683.4118

Note that R is providing an estimate labeled sexmale. The 0-1 indicator created by R automatically created the variable: sexmale={0 if sex is "female"1 if sex is "male". In equation (3.6), female penguins were the reference category and received a value of 0. R chose female as the reference category (instead of male) simply because of alphabetical order.

We can write the following three interpretations of ˆβ0, ˆβ0+ˆβ1, and ˆβ1, respectively:

  • The estimated average body mass for female penguins is 3,862 grams.
  • The estimated average body mass for males penguins is 4,546 grams.
  • The estimated difference in average body mass of penguins, comparing males to females, is 683 grams, with males having larger average mass.

3.4 Properties of Least Squares Estimators

3.4.1 Why OLS?

The reason OLS estimators ˆβ0 and ˆβ1 are so widely used is not only because they can be easily computed from the data, but also because they have good properties. Since ˆβ0 and ˆβ1 are random variables, they have a corresponding distribution.

We can compute the mean of ˆβ1: E[ˆβ1]=E[SxyS2x]=1S2xE[Sxy](S2x is not random)=1S2xE[ni=1(xi¯x)yi]=1S2xni=1(xi¯x)E[yi]=1S2xni=1(xi¯x)(β0+β1xi)=β0ni=1(xi¯x)S2x+β1ni=1(xi¯x)xiS2x=β0n¯xn¯xS2x+β1ni=1x2i1nni=1xinj=1xjS2x=0+β1S2xS2x=β1 This means that ˆβ1 is an unbiased estimator of β1. This is important since it means that, on average, using the OLS estimator ˆβ1 will give us the “right” answer. Similarly, we can show that ˆβ0 is unbiased: E[ˆβ0]=E[1nni=1yiˆβ11nni=1xi]=1nni=1E[yi]E[ˆβ1]1nni=1xi=1nni=1(β0+β1xi)β11nni=1xi=1nni=1β0+β1(1nni=1xi1ni=1xi)=β0

However, it’s important to note that just because E[ˆβ1]=β1 does not mean that ˆβ1=β1. For a single data set, there is no guarantee that ˆβ1 will be near β1. One way to measure how close ˆβ1 is going to be to β1 is to compute its variance. The smaller the variance, the more likely that ˆβ1 will be close to β1.

We can compute the variance of ˆβ1 explicitly: Var[ˆβ1]=Var[SxyS2x]=1(S2x)2Var[ni=1(xi¯x)yi]=1(S2x)2ni=1Var[(xi¯x)yi](Uncorrelated)=1(S2x)2ni=1(xi¯x)2Var[yi]=1(S2x)2ni=1(xi¯x)2σ2=1S2xσ2

It can also be shown that Var[ˆβ0]=σ2(1n+¯x2S2x). Both ˆβ0 and ˆβ1 are linear estimators, because they are linear combinations of the data yi: ˆβ1=SxyS2x=ni=1(xi¯x)S2xyi=ni=1ciyi

The Gauss-Markov Theorem states that the OLS estimator has the smallest variance among linear unbiased estimators. It is the BLUE – Best Linear Unbiased Estimator.

In essence, the Gauss-Markov Theorem tells us that the OLS estimators are best we can do, at least as long as we only consider unbiased estimators. (We’ll see an alternative to this in Chapter 15).

3.5 Estimating σ2

3.5.1 Not Regression: Estimating Sample Variance

Before discussing how to estimate σ2 in SLR, it’s helpful to first recall how we estimate the variance from a sample of a single random variable.

Suppose y1,y2,,yn are iid samples from a distribution with mean μ and variance σ2. The estimated mean is: ¯y=1nni=1yi
and the estimated variance is: s2=1n1ni=1(yi¯y)2

In s2, why do we divide by n1? To understand this, first consider what would happen if we divided by n instead: E[1nni=1(Yi¯Y)2]=E[1nni=1(Yiμ+μ¯Y)2]=E[1nni=1((Yiμ)2+2(Yiμ)(μ¯Y)+(μ¯Y)2)]=1nni=1E[(Yiμ)2]2nE[ni=1(Yiμ)(¯Yμ)]+1nE[(¯Yμ)2]=1nni=1σ22nE[ni=1(Yiμ)1nnj=1(Yjμ)]+1nni=1σ2n=σ22n2ni=1nj=1E[(Yiμ)(Yjμ)]+σ2/n=σ22n2ni=1E[(Yiμ)2]2n2ni=1nj=1,jiE[(Yiμ)(Yjμ)]+σ2/n=σ22nσ2+1nσ2=σ2(11n)=n1nσ2 This estimator is biased by a factor of n1n. If we instead divide by n1, then we obtain an unbiased estimator: E[1n1ni=1(Yi¯Y)2]=nn1E[1nni=1(Yi¯Y)2]=nn1n1nσ2=σ2

That explains the mathematical reason for dividing by n1. But what is the conceptual rationale? The answer is degrees of freedom. The full sample contains n values. Computing the estimator s2 requires first computing the sample mean ¯y. This “uses up” one degree of freedom, meaning that only n1 degrees of freedom are left when computing s2.

This is perhaps easier to see with an example. Suppose n=5. If ¯y=7 and y1=3,y2=1,y3=8,y4=10. What is y5? It must be that y5=10. Knowing ¯y determines ni=1yi. In this way, computing ¯y puts a constraint on what the possible values of the data can be.

3.5.2 Estimating σ2 in SLR

Now, back to SLR. We want to estimate σ2. Because Var(ϵi)=σ2 and E[ϵi]=0, it follows that E[ϵ2i]=Var(ϵi)+E[ϵi]2=σ2+0=σ2. Thus, an estimate of E[ϵ2i] is an estimate of σ2. A natural estimator of E[ϵ2i] is ˆσ2=1n2ni=1e2i Why divide by n2 here? Since ei=yi(ˆβ0+ˆβ1xi), there are 2 degrees of freedom used up in computing the residuals ei.

  • ni=1e2i=SSres is called the “sum of squared residuals” or “residual sum of squares”

  • ˆσ2=MSres is called “mean squared residuals”

3.5.3 Estimating σ2 in R

When summary() is called on an lm object, the sigma slot provides ˆσ for you:

summary(penguin_lm)$sigma
## [1] 394.2782
summary(penguin_lm)$sigma^2
## [1] 155455.3

The value of ˆσ is also printed in the summary output:

summary(penguin_lm)
## 
## Call:
## lm(formula = body_mass_g ~ flipper_length_mm, data = penguins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1058.80  -259.27   -26.88   247.33  1288.69 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5780.831    305.815  -18.90   <2e-16 ***
## flipper_length_mm    49.686      1.518   32.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 394.3 on 340 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.759,  Adjusted R-squared:  0.7583 
## F-statistic:  1071 on 1 and 340 DF,  p-value: < 2.2e-16

It’s also possible to compute this value “by hand” in R:

sig2hat <- sum(residuals(penguin_lm)^2)/(nobs(penguin_lm)-2)
sig2hat
## [1] 155455.3
sqrt(sig2hat)
## [1] 394.2782

3.6 Exercises

Exercise 3.1 Fit an SLR model to the bill size data among Gentoo penguins, using bill length as the predictor and bill depth as the outcome. Find ˆβ0 and ˆβ1 using the formulas and then compare to the value calculated by lm().

Exercise 3.2 Using the mpg data (see Section 1.3.5), use lm() to find the estimates of ˆβ0 and ˆβ1 for a SLR model with city highway miles per gallon as the outcome variable and engine displacement as the predictor variable. Provide a one-sentence interpreation of each estimate.

Exercise 3.3 Compute ˆσ2 for the model in Exercise 3.2 using the formulas. Compare to the value calculated from the lm object.

Exercise 3.4 Consider the housing price data introduced in Section 1.3.3, with housing price as the outcome and median income as the predictor variable. Find the OLS estimates for a simple linear regression analysis of this data and provide a 1-sentence interpretation for each.


  1. In most cases, we suppress the dependence on ˆβ0 and ˆβ1 and just write SSRes. But the dependence is included here to make the optimization explicit.↩︎