Topic 7: Inference for the Difference in Means

Difference of Two Means

Motivating Example: A study is done by a community group to determine if the students at two different universities (A and B) graduate having taken a different number of math courses, on average.

The data: To answer this question, the group randomly samples students from each university.

University A: 46 graduates are sampled. They took an average of 4 math classes, with a standard deviation of 1.5 math classes.
University B: 32 graduates are sampled. They took an average of 3.5 math classes, with a standard deviation of 1 math class.

Estimation

A Confidence Interval for the Difference in Means, \(\mu_1-\mu_2\)

Calculate the point estimate, \(\overline{x}_1 - \overline{x}_2\)
Identify the confidence level, \(CL\), and the error associated with this confidence level, \(100-CL\).
Determine the critical value, \(t^*\), by finding the \(CL + \frac{100-CL}{2}\) percentile on the t distribution with \(\nu\) degrees of freedom (df).
- \(\nu\) represents the Satterthwaite degrees of freedom (see next slide)
Calculate the standard error estimate from the observed sample:

\[\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\]
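As a rough sketch, the point estimate and the standard error estimate for the motivating example can be computed in R from the summary statistics. The variable names (`xbar_A`, `s_A`, `n_A`, and so on) are chosen here for illustration only.

```r
# Summary statistics from the motivating example
xbar_A <- 4;   s_A <- 1.5; n_A <- 46   # University A
xbar_B <- 3.5; s_B <- 1;   n_B <- 32   # University B

# Point estimate of mu_A - mu_B
point_estimate <- xbar_A - xbar_B

# Standard error estimate: sqrt(s_1^2 / n_1 + s_2^2 / n_2)
standard_error <- sqrt(s_A^2 / n_A + s_B^2 / n_B)

point_estimate            # 0.5
round(standard_error, 3)  # 0.283
```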

Satterthwaite Degrees of Freedom, \(\nu\)

A link to a calculator for \(\nu\) and an R script alternative are available in the Week 7 Canvas module.
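For reference, the Satterthwaite approximation is commonly written as

\[\nu = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1-1}+\frac{\left(s_2^2/n_2\right)^2}{n_2-1}}\]

The sketch below is one way to compute it in R. The function name `satterthwaite_df` is made up for this example, and the code is not necessarily the script posted on Canvas.

```r
# Satterthwaite degrees of freedom from two sample SDs and sample sizes.
# (Hypothetical helper written for illustration.)
satterthwaite_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1   # estimated variance of the first sample mean
  v2 <- s2^2 / n2   # estimated variance of the second sample mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Motivating example: University A (s = 1.5, n = 46), University B (s = 1, n = 32)
nu <- satterthwaite_df(s1 = 1.5, n1 = 46, s2 = 1, n2 = 32)
round(nu, 3)  # 75.897
```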

Finding t Critical Values using `qt()`

qt(p, df) calculates the value on a t distribution curve with df degrees of freedom that has an area of p to the left of it.

Example: To find the critical value needed to construct a 99% confidence interval for the difference in means when \(s_1 = 1.5\), \(n_1 = 46\), \(s_2 = 1\), \(n_2 = 32\):

qt(0.995, 75.897) = 2.642
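The 0.995 passed to `qt()` comes from the percentile rule \(CL + \frac{100-CL}{2}\), converted to a proportion. A small sketch, with illustrative variable names:

```r
CL <- 99                                    # confidence level, in percent
percentile <- (CL + (100 - CL) / 2) / 100   # 99.5th percentile, as a proportion
df <- 75.897                                # Satterthwaite df for the example

t_star <- qt(percentile, df)
percentile        # 0.995
round(t_star, 3)  # about 2.642
```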

Confidence Interval Construction for \(\mu_1-\mu_2\)

\[(\overline{x}_1 - \overline{x}_2) \pm t^*_{\nu} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}\]

Motivating Example:

\[(4-3.5) \pm 2.642 \sqrt{\frac{1.5^2}{46}+\frac{1^2}{32}} = (-0.248, 1.248)\]
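The same interval can be reproduced in R. This is a minimal sketch that takes the Satterthwaite degrees of freedom (75.897) as given; the variable names are illustrative.

```r
# Summary statistics from the motivating example
xbar_A <- 4;   s_A <- 1.5; n_A <- 46
xbar_B <- 3.5; s_B <- 1;   n_B <- 32
df <- 75.897                        # Satterthwaite df reported in these notes

t_star <- qt(0.995, df)             # critical value for a 99% interval
se <- sqrt(s_A^2 / n_A + s_B^2 / n_B)

# point estimate -/+ critical value * standard error
ci <- (xbar_A - xbar_B) + c(-1, 1) * t_star * se
round(ci, 3)  # approximately -0.248 and 1.248
```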

Practice! 🐓

Answer questions 1-6 on the Class 15 Activity - Difference in Means activity on Canvas.

Leave the activity open. We’ll come back to it.


Hypothesis Testing

Testing for a Difference in Means

We can perform a formal hypothesis test to answer questions concerning a difference in two population means.
We’ll use the same steps introduced in our Introduction to Hypothesis Testing notes, but some of the details will differ in this new scenario.

1. State the question of interest

Motivating Example:

Is there a difference in the average number of math courses taken between graduates at University A and University B?

2. Identify the parameter of interest

Motivating Example:

The difference in the average number of math courses taken, \(\mu_A - \mu_B\)

3. State the null and alternative hypotheses

Null hypothesis

\[H_0: \mu_1 = \mu_2\]

Alternative hypothesis:

This depends on the question of interest.

Lower one-sided

Question of interest: Is the mean of population 1 less than the mean of population 2?

\(H_A: \mu_1 <\mu_2\)

Upper one-sided

Question of interest: Is the mean of population 1 greater than the mean of population 2?

\(H_A: \mu_1 >\mu_2\)

Two-sided

Question of interest: Is the mean of population 1 different from the mean of population 2?

\(H_A: \mu_1 \neq\mu_2\)

3. State the null and alternative hypotheses

Motivating Example:

\(H_0: \mu_A = \mu_B\)

\(H_A:\mu_A \neq \mu_B\)

Practice! 🐓

Answer questions 7-8 on the Class 15 Activity - Difference in Means activity on Canvas.

Leave the activity open. We’ll come back to it.


4. Using the sampled data and the alternative hypothesis, determine what values would be considered “as or more extreme” than the observed sampled statistic.

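Lower one-sided: \(H_A: \mu_1<\mu_2\). Differences in sample means at or below the observed difference are as or more extreme.

Upper one-sided: \(H_A: \mu_1 >\mu_2\). Differences in sample means at or above the observed difference are as or more extreme.

Two-sided: \(H_A: \mu_1 \neq \mu_2\). Differences in sample means at least as far from 0 as the observed difference, in either direction, are as or more extreme.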

4. Using the sampled data and the alternative hypothesis, determine what values would be considered “as or more extreme” than the observed sampled statistic.

Motivating Example:

\(\overline{x}_A - \overline{x}_B = 4 - 3.5 = 0.5\)

Any difference in sample means of 0.5 or greater in magnitude (that is, at most \(-0.5\) or at least \(0.5\)) would be considered as or more extreme than the observed sampled statistic.

5. Determine the Null Distribution

If the sample sizes are sufficiently large, under the null hypothesis, the distribution of the test statistic used in testing the difference between two population means is a t distribution with \(\nu\) degrees of freedom.

\(\nu\) represents the Satterthwaite degrees of freedom.

How large do the samples need to be?

If both \(n_1\geq 30\) and \(n_2 \geq 30\), we can move forward.
If either \(n_1 < 30\) or \(n_2 < 30\), we need to look at the sampled distribution(s) of the small sample(s). If there are no clear outliers or strong skewness in the sampled data, we can move forward. If either sampled distribution suggests skewness or outliers, we should not proceed.

5. Determine the Null Distribution

Motivating Example:

Are the sample sizes sufficiently large?

Yes, both \(n_1\geq 30\) and \(n_2\geq 30\).

So the null distribution is a t distribution with 75.897 degrees of freedom.

6. Calculate the test statistic

When testing the difference in population means, \(\mu_1\) vs. \(\mu_2\), the test statistic is

\[t = \frac{\overline{x}_1-\overline{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Participation Question 📊

Compute the test statistic for our motivating example problem.

Answer the question at PollEv.com/erinhowardstats


6. Calculate the test statistic

Motivating Example:

\[t = \frac{4-3.5}{\sqrt{\frac{1.5^2}{46} + \frac{1^2}{32}}} = 1.766\]
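As a quick check of this arithmetic, the test statistic can be computed in R. The function name `two_sample_t` is a hypothetical helper written for this sketch.

```r
# Test statistic for the difference in means, from summary statistics.
# (Hypothetical helper written for illustration.)
two_sample_t <- function(xbar1, xbar2, s1, s2, n1, n2) {
  (xbar1 - xbar2) / sqrt(s1^2 / n1 + s2^2 / n2)
}

t_stat <- two_sample_t(xbar1 = 4, xbar2 = 3.5, s1 = 1.5, s2 = 1, n1 = 46, n2 = 32)
round(t_stat, 3)  # 1.766
```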

7. Calculate the p-value using the test statistic and null distribution.

Lower one-sided: \(H_A: \mu_1<\mu_2\)

R code:

pt(t, df)

Upper one-sided: \(H_A: \mu_1>\mu_2\)

R code:

1-pt(t, df)

Two-sided: \(H_A: \mu_1\neq \mu_2\)

R code:

2*(1-pt(abs(t), df))

where t is the test statistic and df is the Satterthwaite degrees of freedom, \(\nu\)
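One way to keep the three cases straight is to wrap them in a single function. The sketch below is illustrative only: `p_value_t` is a made-up name, and the values of `t` and `df` in the usage lines are arbitrary numbers rather than results from the motivating example.

```r
# p-value for a t test statistic under each form of the alternative hypothesis.
# (Hypothetical helper written for illustration.)
p_value_t <- function(t, df, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    less      = pt(t, df),                 # area to the left of t
    greater   = 1 - pt(t, df),             # area to the right of t
    two.sided = 2 * (1 - pt(abs(t), df))   # area in both tails
  )
}

# Arbitrary inputs, chosen only to show the calls
p_less    <- p_value_t(-1.2, df = 40, alternative = "less")
p_greater <- p_value_t(1.2,  df = 40, alternative = "greater")
p_two     <- p_value_t(1.2,  df = 40, alternative = "two.sided")
```

Because the t distribution is symmetric about 0, `p_less` and `p_greater` above are equal, and `p_two` is twice either one.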

Participation Question 📊

Answer the question at PollEv.com/erinhowardstats


7. Calculate the p-value using the test statistic and null distribution.

Motivating Example:

2*(1-pt(abs(1.766), 75.897)) = 0.0814

8. Make a conclusion

Write a 4-part conclusion. The conclusion should be written in the context of the problem and contain the following components:

A statement of the strength of evidence in favor of the alternative hypothesis.
Whether to reject or fail to reject the null hypothesis.
The point estimate for the parameter of interest.
A \((1-\alpha)100\%\) confidence interval estimate for the parameter of interest.

8. Make a conclusion

A statement in terms of the alternative hypothesis

Using terms like “reject” and “fail to reject the null” may be confusing to novice readers.
We’ll provide a more complete conclusion by providing a statement of evidence in terms of the alternative hypothesis that reflects the question of interest.

8. Make a conclusion

Motivating Example: Write a 4-part conclusion with an \(\alpha=0.01\) significance level.

1. There is slightly suggestive evidence that the graduates at University A and University B take a different number of math courses, on average.

2. At the \(\alpha=0.01\) significance level, we fail to reject the null hypothesis.

3. and 4. The estimated difference in the average number of math courses is 0.5. We are 99% confident that graduates at University A take between 0.248 fewer and 1.248 more math courses, on average, than graduates at University B.
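When the raw observations are available rather than only summary statistics, R's built-in `t.test()` carries out the same procedure, since its default (`var.equal = FALSE`) uses the Welch approximation to the degrees of freedom. The sketch below uses simulated data, which is an assumption made purely for illustration and is not the study's data; the vector names are hypothetical.

```r
set.seed(1)  # simulated data for illustration only; NOT the study's data
courses_A <- rnorm(46, mean = 4,   sd = 1.5)
courses_B <- rnorm(32, mean = 3.5, sd = 1)

# Two-sided test of H0: mu_A = mu_B, with a 99% confidence interval
result <- t.test(courses_A, courses_B,
                 alternative = "two.sided", conf.level = 0.99)

result$statistic   # test statistic t
result$parameter   # Satterthwaite degrees of freedom
result$p.value     # two-sided p-value
result$conf.int    # 99% confidence interval for mu_A - mu_B
```

The numbers printed here will differ from the motivating example because the data are simulated; the point is only that the pieces of the output line up with steps 5 through 8.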
