Topic 8: Analysis of Variance (ANOVA)

Motivating Exercises

In each of the side-by-side boxplots below, you’ll see data sampled from three different populations. The red dot in each plot corresponds to the sample mean.

Example 1

🎩 In Top Hat, comment on whether you think the means of the populations from which the above three samples came are the same, similar, or significantly different from one another.

Example 2

🎩 Again, in Top Hat, comment on whether you think the means of the populations from which the above three samples came are the same, similar, or significantly different from one another.

Analysis of Variance (ANOVA)

An ANOVA is a holistic procedure used to test whether there is evidence that at least one pair of populations has different means when comparing more than two populations.
Why is it called an Analysis of Variance if the goal is to compare means?
- In an ANOVA, we’ll compare the variability across groups to the variability within groups.
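
To see why variability matters, here is a small R sketch with two made-up datasets that have identical group means (5, 6, and 7) but very different spreads within each group. The data values and the helper `f_stat()` are hypothetical, written only for this illustration. The helper leans on R's built-in `anova()` and `lm()` to return the ratio of between-group to within-group variability.

```r
# Hypothetical helper: ratio of between-group to within-group variability
f_stat <- function(y, group) anova(lm(y ~ group))[["F value"]][1]

group <- gl(3, 3)  # three groups with three observations each

# Same group means (5, 6, 7) in both datasets; only the within-group spread differs
tight <- c(4.9, 5, 5.1,  5.9, 6, 6.1,  6.9, 7, 7.1)
wide  <- c(2, 5, 8,      3, 6, 9,      4, 7, 10)

f_tight <- f_stat(tight, group)  # large ratio: group differences stand out
f_wide  <- f_stat(wide, group)   # small ratio: group differences are buried in noise
```

The differences between sample means are identical in both datasets, yet only the tight data make those differences look convincing. That is the comparison an ANOVA formalizes.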

Notation

\(k=\) number of groups (i.e. number of populations of interest)
\(n=\) overall sample size (i.e. size of all samples combined)
\(\overline{x}=\) overall sample mean (i.e. mean of all observations ignoring groups)
\(n_i=\) sample size of the \(i\)th group
\(\mu_i=\) population mean of the \(i\)th group
\(\overline{x}_i=\) sample mean of the \(i\)th group
\(s_i=\) sample standard deviation of the \(i\)th group
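
To make the notation concrete, the following R sketch computes each sample quantity from a small made-up dataset. The data frame `dat`, its columns `group` and `y`, and all of the values are hypothetical. The population means \(\mu_i\) do not appear because they are unknown parameters, not something we can compute from a sample.

```r
# Hypothetical data: three groups with unequal sample sizes (values are made up)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(5, 7, 6, 6,  9, 8, 10, 9, 9,  4, 5, 3)
)

k     <- nlevels(dat$group)  # number of groups
n     <- nrow(dat)           # overall sample size
x_bar <- mean(dat$y)         # overall sample mean, ignoring groups

n_i     <- tapply(dat$y, dat$group, length)  # sample size of each group
x_bar_i <- tapply(dat$y, dat$group, mean)    # sample mean of each group
s_i     <- tapply(dat$y, dat$group, sd)      # sample standard deviation of each group
```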

ANOVA Procedure Steps

1. Question of interest

Are there differences in the means of the multiple populations?

2. Parameters of interest

\(\mu_1, \mu_2, ..., \mu_k\), the population means of the \(k\) groups

3. Null and Alternative Hypotheses

\(H_0: \mu_1 = \mu_2 = ... = \mu_k\)

\(H_A:\) At least one mean is different

4. The \(F\) test statistic

In order to answer our question of interest, we must compare the variability between groups to the variability within groups.

Mean Square Between Groups (MSG)

Measures the average variability between groups

\[MSG = \frac{1}{k-1}\sum \limits_{i=1}^k n_i (\overline{x}_i - \overline{x})^2\]

Mean Square Error (MSE)

Measures the average variability within groups

\[MSE = \frac{1}{n-k}\sum \limits_{i=1}^k (n_i-1)s_i^2 \]

The test statistic is the ratio of the average between-group variability to the average within-group variability:

\[F = \frac{MSG}{MSE}\]
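
The formulas above can be sketched directly in R. This example uses a small made-up dataset (the data frame `dat` and the variable names `msg`, `mse`, and `f_stat` are assumptions for illustration) and then compares the hand-computed statistic with the one reported by R's built-in `anova()`.

```r
# Hypothetical data: three groups with unequal sample sizes (values are made up)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(5, 7, 6, 6,  9, 8, 10, 9, 9,  4, 5, 3)
)

k       <- nlevels(dat$group)
n       <- nrow(dat)
x_bar   <- mean(dat$y)
n_i     <- tapply(dat$y, dat$group, length)
x_bar_i <- tapply(dat$y, dat$group, mean)
s_i     <- tapply(dat$y, dat$group, sd)

# Mean square between groups: weighted squared distances of group means from the overall mean
msg <- sum(n_i * (x_bar_i - x_bar)^2) / (k - 1)

# Mean square error: pooled within-group variance
mse <- sum((n_i - 1) * s_i^2) / (n - k)

f_stat <- msg / mse  # 25.125 / (2/3) = 37.6875 for these data

# Cross-check against R's built-in ANOVA table
f_builtin <- anova(lm(y ~ group, data = dat))[["F value"]][1]
```

Both routes should agree up to rounding error, which is a handy check when computing the pieces by hand.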

5. Determine the Null Distribution

There are two conditions that need to be met in order to assume the upcoming null distribution:

The sample sizes must be sufficiently large.
- If \(n_i\geq 30\) for every group, we can move forward.
- If any of the sample sizes are less than 30, we need to look at the distribution of the data in each small sample. If there are no clear outliers or strong skewness in the sampled data, we can move forward.

We need to be able to assume constant variance across the groups.
- To assess this condition, we should compare the sampled distributions and make a judgement as to whether the sample standard deviations, \(s_i\), are similar across groups.
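
A minimal R sketch of these two checks, using a small made-up dataset (the data frame `dat` and the names `small_groups` and `sd_ratio` are assumptions for illustration). The ratio of the largest to the smallest standard deviation is only one convenient summary; how close to 1 is close enough remains a judgement call.

```r
# Hypothetical data: three groups with unequal sample sizes (values are made up)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(5, 7, 6, 6,  9, 8, 10, 9, 9,  4, 5, 3)
)

n_i <- tapply(dat$y, dat$group, length)
s_i <- tapply(dat$y, dat$group, sd)

# Condition 1: groups below the n_i >= 30 guideline need a look at their distributions
small_groups <- names(n_i)[n_i < 30]

# Condition 2: a ratio near 1 suggests similar spread across groups
sd_ratio <- max(s_i) / min(s_i)
```

For the visual part of the check, `boxplot(y ~ group, data = dat)` draws side-by-side boxplots in which outliers and strong skewness are usually easy to spot.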

If the above conditions are met, under the null hypothesis, the test statistic, \(F = \frac{MSG}{MSE}\), follows an F distribution with \(k-1\) and \(n-k\) degrees of freedom.

The \(F\) distribution is right-skewed, with support \((0, \infty)\).
Its shape is defined by two values:
- The numerator degrees of freedom, \(k-1\)
- The denominator degrees of freedom, \(n-k\)
Denoted: \(F_{k-1, n-k}\)
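
The following R sketch explores this distribution for hypothetical values \(k = 3\) and \(n = 30\), chosen only for illustration. It checks the right skew by comparing the median with the mean, then simulates datasets in which the null hypothesis is true to see whether the resulting \(F\) statistics behave like draws from \(F_{k-1, n-k}\). The simulation assumes normally distributed data with equal group sizes.

```r
k   <- 3   # hypothetical number of groups
n   <- 30  # hypothetical overall sample size
df1 <- k - 1
df2 <- n - k

# Right skew: the median sits below the mean, which is df2 / (df2 - 2) when df2 > 2
f_median <- qf(0.5, df1, df2)
f_mean   <- df2 / (df2 - 2)

# Simulate the null: every group is drawn from the same normal population
set.seed(1)
group <- gl(k, n / k)  # k equal-sized groups
sim_f <- replicate(2000, {
  y <- rnorm(n)
  anova(lm(y ~ group))[["F value"]][1]
})

# If the null distribution is right, about 5% of simulated statistics
# should exceed the 95th percentile of F with df1 and df2 degrees of freedom
crit      <- qf(0.95, df1, df2)
prop_over <- mean(sim_f > crit)
```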

6. Using the sampled data and the alternative hypothesis, determine what values would be considered “as or more extreme” than the observed sampled statistic.

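Any ratio of \(MSG\) to \(MSE\) greater than or equal to the calculated \(F\) test statistic would be considered “as or more extreme” than our observed data. Larger ratios mean more variability between groups relative to the variability within groups, so only the right tail counts as evidence for the alternative.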

7. Calculate the p-value

Recall that the p-value is the probability, computed assuming the null hypothesis is true, of observing data as or more extreme than our current dataset in the direction of the alternative hypothesis.

In an ANOVA F test, the p-value is always the area under the null distribution curve to the right of the F test statistic.

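R code: 1 - pf(f_stat, k - 1, n - k), where f_stat holds the observed \(F\) test statistic (in R, a bare F is shorthand for FALSE, so a separate variable name is safer)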
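
As a sketch of the full calculation, the R code below fits a one-way ANOVA to a small made-up dataset and computes the right-tail area by hand. The data frame `dat` and the variable names are assumptions for illustration. `pf(..., lower.tail = FALSE)` returns the same area as `1 - pf(...)`.

```r
# Hypothetical data: three groups with unequal sample sizes (values are made up)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), times = c(4, 5, 3))),
  y     = c(5, 7, 6, 6,  9, 8, 10, 9, 9,  4, 5, 3)
)

tab <- anova(lm(y ~ group, data = dat))  # one-way ANOVA table

f_stat <- tab[["F value"]][1]  # observed F test statistic
df1    <- tab[["Df"]][1]       # numerator degrees of freedom, k - 1
df2    <- tab[["Df"]][2]       # denominator degrees of freedom, n - k

# Area under the F curve to the right of the observed statistic
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# The ANOVA table reports the same quantity in its "Pr(>F)" column
p_table <- tab[["Pr(>F)"]][1]
```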

8. Make a conclusion

Write a 2-part conclusion (we’ll exclude a single point estimate and confidence interval). The conclusion should be written in the context of the problem and contain the following components:

A statement of the strength of evidence in favor of the alternative hypothesis.
Whether to reject or fail to reject the null hypothesis.

Practice!

Complete the Class 17 Practice - ANOVA assignment in Top Hat.