
Recall that if we’re using a sample of data to model the true relationship between two quantitative variables, then the estimates of the intercept, \(b_0\), and slope, \(b_1\), are random variables: a different sample would give different estimates.

\(\hat{y} = 1994.39 + 46.68 x\)

\(\hat{y} = 1191.82 + 62.98 x\)
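To see why the two fitted lines above differ, here is a minimal simulation sketch. It does not use the state data: the population, its true intercept (1500), true slope (55), and noise level are all made-up values chosen only for illustration, and `fit_one_sample` is a hypothetical helper name.

```r
set.seed(19)  # arbitrary seed so the sketch is reproducible

# Hypothetical population (assumed values, not the state data):
# true intercept 1500, true slope 55, error standard deviation 500
n_pop <- 10000
x <- runif(n_pop, min = 35, max = 70)
y <- 1500 + 55 * x + rnorm(n_pop, mean = 0, sd = 500)
population <- data.frame(x = x, y = y)

# Draw a random sample of n rows and return the fitted (b0, b1)
fit_one_sample <- function(n = 30) {
  rows <- sample(nrow(population), size = n)
  coef(lm(y ~ x, data = population[rows, ]))
}

# Repeat the sampling 1000 times; each column holds one sample's estimates
estimates <- replicate(1000, fit_one_sample())

# Two individual samples give two different fitted lines
estimates[, 1:2]

# The slope estimates vary from sample to sample around the true slope
mean(estimates["x", ])
sd(estimates["x", ])
```

Each column of `estimates` plays the role of one of the fitted equations above: same population, different sample, different \(b_0\) and \(b_1\).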
The LSRL is based on sampled data, so \[\hat{y} = b_0 + b_1x\] is the estimate for the true population regression equation \[y = \beta_0 + \beta_1 x + \varepsilon\]
\(b_0\) is the point estimate for \(\beta_0\)
\(b_1\) is the point estimate for \(\beta_1\)
\(\varepsilon\) is the error (variability around the regression line)
Our focus from here on out will be on inference about the slope parameter, \(\beta_1\).
Consider the LSRL for the relationship between the percentage of a state’s population with a high school diploma and per capita income.
\(\hat{y} = 1994.39 + 46.68 x\)

The slope in the LSRL, \(46.68\), is an estimate of \(\beta_1\), the slope of the true population regression line.
We might wonder, do these data provide strong evidence that the percentage of HS graduates is a useful predictor of a state’s per capita income?
Frame this question of interest into a hypothesis test:
\(H_0:\) \(\beta_1 = 0\) The true linear model has slope zero.
\(H_A:\) \(\beta_1 \neq 0\) The true linear model has a slope different from zero. The explanatory variable is a useful predictor of the response.
R Output:
Call:
lm(formula = Income ~ HSGrad, data = state_30)

Residuals:
     Min       1Q   Median       3Q      Max
-1113.99  -314.05    36.65   389.53   863.22

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1994.39     634.62   3.143 0.003935 **
HSGrad         46.68      12.18   3.832 0.000658 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 537.3 on 28 degrees of freedom
Multiple R-squared:  0.344,  Adjusted R-squared:  0.3206
F-statistic: 14.68 on 1 and 28 DF,  p-value: 0.0006581
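The slope row of a coefficient table like this one can also be pulled out by name instead of read off the printout. The sketch below is only an illustration: because the 30-state file is not reproduced here, it uses the built-in `state.x77` data for all 50 states as a stand-in, so its numbers will not match the output above. The object names `states`, `fit`, and `coef_table` are assumptions.

```r
# state.x77 ships with base R; it is a stand-in for the 30-state file,
# so the estimates here differ from the ones shown in the slides.
states <- data.frame(
  Income = state.x77[, "Income"],
  HSGrad = state.x77[, "HS Grad"]
)

fit <- lm(Income ~ HSGrad, data = states)

# Coefficient table as a numeric matrix: one row per term in the model
coef_table <- coef(summary(fit))
coef_table

# Pull the slope row's entries by row and column name
slope_est <- coef_table["HSGrad", "Estimate"]
slope_se  <- coef_table["HSGrad", "Std. Error"]
slope_p   <- coef_table["HSGrad", "Pr(>|t|)"]
```

The `HSGrad` row is the one that matters for inference about \(\beta_1\); the `(Intercept)` row tests \(\beta_0 = 0\), which is usually not the question of interest.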

Perform a hypothesis test on the slope parameter, \(\beta_1\), using a significance level of \(\alpha = 0.05\).
The p-value for the hypothesis test on the slope parameter is \(0.000658\).
Since \(0.000658 < \alpha = 0.05\), we will reject the null hypothesis.
There is convincing evidence that the percentage of high school graduates is a useful predictor of a state’s per capita income in the 1970s.
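As a check on where the p-value comes from, the sketch below recomputes the test statistic and two-sided p-value from the rounded slope estimate, its standard error, and the 28 residual degrees of freedom in the R output. Because the inputs are rounded, the results should agree with the printed values only approximately. The variable names are chosen for illustration.

```r
b1    <- 46.68   # slope estimate from the summary output
se_b1 <- 12.18   # standard error of the slope
df    <- 28      # residual degrees of freedom (n - 2 = 30 - 2)
alpha <- 0.05    # significance level

# Test statistic: (estimate - null value) / standard error, with null value 0
t_stat <- (b1 - 0) / se_b1

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Decision: reject H0 when the p-value is below alpha
reject_null <- p_value < alpha

t_stat
p_value
reject_null
```

This is the same calculation `summary()` performs for the `HSGrad` row: the `t value` column is `Estimate` divided by `Std. Error`, and `Pr(>|t|)` is the two-sided tail area.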
# Load the tidyverse packages
library(tidyverse)
# Import the dataset (download the data from Canvas first)
state_30 <- read_csv(file.choose())
# Create a scatterplot of the HS Grad and Income variables
state_30 |>
  ggplot(aes(x = `HS Grad`, y = Income)) +
  geom_point(color = viridis::viridis(6)[5], size = 3) +
  labs(y = "Per Capita Income",
       x = "Percentage of Population with High School Diploma",
       title = "High School Graduation Rates vs. Income",
       subtitle = "for 30 US States in 1970") +
  # Apply the complete theme first so it does not reset the axis title size
  theme_bw() +
  theme(axis.title = element_text(size = 18)) +
  stat_smooth(method = "lm",
              formula = y ~ x,
              geom = "smooth",
              se = FALSE,
              color = viridis::viridis(6)[5])
# Rename the HS Grad variable so its name has no space
state_30 <- state_30 |>
  rename(HSGrad = `HS Grad`)
# Estimate the intercept and slope for the LSRL
LSRL <- lm(Income ~ HSGrad, data = state_30)
summary(LSRL)

Please complete the Class 19 Activity in Top Hat.