
Recall that if we use a sample of data to model the true relationship between two quantitative variables, then the intercept and slope estimates, \(b_0\) and \(b_1\), are random variables.

Different samples produce different estimates, for example:

\(\hat{y} = 1994.39 + 46.68 x\)

\(\hat{y} = 1191.82 + 62.98 x\)
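To see why \(b_0\) and \(b_1\) are random variables, it can help to simulate repeated sampling. The sketch below is an illustration only: it assumes that R's built-in state.x77 data (all 50 states) can stand in for the population, which may not match the sample used in these slides, and fit_sample() is a hypothetical helper name.

```r
# Assumption: treat all 50 states in the built-in state.x77 matrix as the
# "population". This is a stand-in, not the state_30 file from the slides.
states <- data.frame(
  Income = state.x77[, "Income"],
  HSGrad = state.x77[, "HS Grad"]
)

set.seed(19)  # arbitrary seed so the random draws are reproducible

# Hypothetical helper: draw n states at random and return the LSRL estimates
fit_sample <- function(n = 30) {
  rows <- sample(nrow(states), n)
  coef(lm(Income ~ HSGrad, data = states[rows, ]))
}

# Each column holds one sample's (b0, b1)
estimates <- replicate(1000, fit_sample())
rownames(estimates) <- c("b0", "b1")

estimates[, 1:2]         # two individual samples give two different lines
apply(estimates, 1, sd)  # spread of b0 and b1 across the 1000 samples
```

The standard deviations in the last line are simulation-based approximations of how much the intercept and slope estimates vary from sample to sample.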
The least-squares regression line (LSRL) is based on sampled data, so \[\hat{y} = b_0 + b_1x\] is an estimate of the true population regression equation \[y = \beta_0 + \beta_1 x + \varepsilon\]
\(b_0\) is the point estimate for \(\beta_0\)
\(b_1\) is the point estimate for \(\beta_1\)
\(\varepsilon\) is the error (variability around the regression line)
Our focus from here on out will be on inference about the slope parameter, \(\beta_1\).
Consider the LSRL for the relationship between the percentage of a state’s population with a high school diploma and per capita income.
\(\hat{y} = 1994.39 + 46.68 x\)

The slope in the LSRL, \(46.68\), is an estimate of the slope of the true population regression line, \(\beta_1\).
We might wonder: do these data provide strong evidence that the percentage of HS graduates is a useful predictor of a state’s per capita income?
Frame this question of interest into a hypothesis test:
\(H_0:\) \(\beta_1 = 0\) The true linear model has slope zero.
\(H_A:\) \(\beta_1 \neq 0\) The true linear model has a slope different from zero. The explanatory variable is a useful predictor of the response.
R Code:
Call:
lm(formula = Income ~ HSGrad, data = state_30)

Residuals:
     Min       1Q   Median       3Q      Max
-1113.99  -314.05    36.65   389.53   863.22

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1994.39     634.62   3.143 0.003935 **
HSGrad         46.68      12.18   3.832 0.000658 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 537.3 on 28 degrees of freedom
Multiple R-squared:  0.344,  Adjusted R-squared:  0.3206
F-statistic: 14.68 on 1 and 28 DF,  p-value: 0.0006581

Perform a hypothesis test on the slope parameter, \(\beta_1\), using a significance level of \(\alpha = 0.05\).
The p-value for the hypothesis test on the slope parameter is \(0.000658\).
Since \(0.000658 < \alpha = 0.05\), we reject the null hypothesis.
There is convincing evidence that the percentage of high school graduates is a useful predictor of a state’s per capita income in the 1970s.
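The t value and p-value reported for HSGrad can be reproduced by hand. This is a minimal sketch, assuming the slope estimate and its standard error are typed in directly (with a few extra decimal places) rather than extracted from the fitted model, and that \(n = 30\) states were sampled (28 residual degrees of freedom plus 2).

```r
# Values for the HSGrad row of the coefficient table, typed in by hand
b1    <- 46.68049  # slope estimate
se_b1 <- 12.18172  # standard error of the slope
n     <- 30        # sample size: 28 residual degrees of freedom + 2

# Test statistic under H0: beta_1 = 0
t_stat <- (b1 - 0) / se_b1

# Two-sided p-value from the t-distribution with n - 2 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

t_stat          # should agree with the "t value" column
p_value         # should agree with the "Pr(>|t|)" column
p_value < 0.05  # TRUE means we reject H0 at alpha = 0.05
```

Doubling the upper-tail area is what makes the test two-sided, matching \(H_A: \beta_1 \neq 0\).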
In addition to performing a hypothesis test about the slope parameter, we can provide a measure of uncertainty about the estimate for \(\beta_1\) by constructing a confidence interval for the parameter.
\[b_1 \pm t^* \times SE_{b_1}\]
The standardized estimate for \(\beta_1\), \(\frac{b_1}{SE_{b_1}}\), follows a t-distribution with \(n-2\) degrees of freedom. We will use this distribution to determine \(t^*\).
qt(p, df)
where p is the area under the t-distribution curve to the left of the critical value and df is \(n-2\). For a 95% confidence interval, p = 0.975, because 2.5% of the area lies in each tail.
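As a quick sketch of how qt() might be called for a 95% interval, assuming \(n = 30\) as in the state example (the variable names are placeholders chosen for illustration):

```r
n          <- 30    # sample size in the state example
conf_level <- 0.95  # desired confidence level

# Area to the left of the critical value: the middle 95% plus the lower 2.5%
p <- 1 - (1 - conf_level) / 2

# Critical value from the t-distribution with n - 2 degrees of freedom
t_star <- qt(p, df = n - 2)
t_star
```

Changing conf_level to 0.90 or 0.99 gives the critical value for a 90% or 99% interval in the same way.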
Construct the 95% confidence interval for the slope parameter relating high school graduation rate to per capita income.
For \(b_1\) and \(SE_{b_1}\), use the model output in R:
              Estimate Std. Error  t value     Pr(>|t|)
(Intercept) 1994.38858  634.62049 3.142648 0.0039352579
HSGrad        46.68049   12.18172 3.832013 0.0006580844
Compute the lower and upper bounds of the interval:
\[46.68049 \pm 2.048407 \times 12.18172 = (21.73,\ 71.63)\]
We are 95% confident that for each additional percentage point of the population with a high school diploma, per capita income is expected to increase by between $21.73 and $71.63, with a point estimate for the increase of $46.68.
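The interval bounds can be computed in a few lines. This is a sketch under the assumption that the estimate and standard error are typed in from the coefficient table; with the fitted model object available, confint() in base R should give the same interval directly.

```r
# Values from the HSGrad row of the coefficient table
b1    <- 46.68049  # slope estimate
se_b1 <- 12.18172  # standard error of the slope
n     <- 30        # sample size

# Critical value for a 95% interval, t-distribution with n - 2 df
t_star <- qt(0.975, df = n - 2)

# b1 plus or minus t* times SE
margin <- t_star * se_b1
lower  <- b1 - margin
upper  <- b1 + margin

round(c(lower = lower, upper = upper), 2)
```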
# Load the tidyverse packages
library(tidyverse)
# Import the dataset (download the data from Canvas first)
state_30 <- read_csv(file.choose())
# Create a scatterplot of the HS Grad and Income variables
state_30 |>
  ggplot(aes(x = `HS Grad`, y = Income)) +
  geom_point(color = viridis::viridis(6)[5], size = 3) +
  labs(y = "Per Capita Income",
       x = "Percentage of Population with High School Diploma",
       title = "High School Graduation Rates vs. Income",
       subtitle = "for 30 US States in 1970") +
  # Apply theme_bw() first so it does not reset the axis title size
  theme_bw() +
  theme(axis.title = element_text(size = 18)) +
  stat_smooth(method = "lm",
              formula = y ~ x,
              geom = "smooth",
              se = FALSE,
              color = viridis::viridis(6)[5])
# Rename the HS Grad variable
state_30 <- state_30 |>
  rename(HSGrad = `HS Grad`)
# Estimate intercept and slope for LSRL
LSRL <- lm(Income ~ HSGrad, data = state_30)
summary(LSRL)

Please complete the Class 19 Activity - SLR in Canvas.