In each of the side-by-side boxplots below, you’ll see data sampled from three different populations. The red dot in each plot corresponds to the sample mean.
Example 1
🎩 In Top Hat, comment on whether you think the means of the populations from which the above three samples came are the same, similar, or significantly different from one another.
Example 2
🎩 Again, in Top Hat, comment on whether you think the means of the populations from which the above three samples came are the same, similar, or significantly different from one another.
An ANOVA (analysis of variance) is a single, overall procedure used to test whether there is evidence that at least one pair of populations has different means when more than two populations are being compared.
Why is it called an Analysis of Variance if the goal is to compare means?
\(k=\) number of groups (i.e. number of populations of interest)
\(n=\) overall sample size (i.e. size of all samples combined)
\(\overline{x}=\) overall sample mean (i.e. mean of all observations ignoring groups)
\(n_i=\) sample size of the \(i\)th group
\(\mu_i=\) population mean of the \(i\)th group
\(\overline{x}_i=\) sample mean of the \(i\)th group
\(s_i=\) sample standard deviation of the \(i\)th group
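As a rough illustration of the notation above, each quantity can be computed in R. This is only a sketch: it assumes R's built-in PlantGrowth dataset (30 plant weights in 3 groups), which is not part of this lecture, and the variable names are chosen to mirror the symbols.

```r
# Illustrative data (an assumption for this example, not from the lecture):
# R's built-in PlantGrowth dataset, with a numeric weight and a 3-level group.
y <- PlantGrowth$weight
g <- PlantGrowth$group
groups <- split(y, g)            # list of observations, one element per group

k      <- length(groups)         # k: number of groups
n      <- length(y)              # n: overall sample size
xbar   <- mean(y)                # overall sample mean, ignoring groups
n_i    <- sapply(groups, length) # sample size of each group
xbar_i <- sapply(groups, mean)   # sample mean of each group
s_i    <- sapply(groups, sd)     # sample standard deviation of each group

# One row per group
data.frame(n_i, xbar_i, s_i)
```

The population means \(\mu_i\) are unknown parameters, so they have no counterpart in the code; the \(\overline{x}_i\) are their sample estimates.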
Are there differences in the means of the multiple populations?
\(\mu_1, \mu_2, ..., \mu_k\), the population means of the \(k\) groups
\(H_0: \mu_1 = \mu_2 = ... = \mu_k\)
\(H_A:\) At least one mean is different
In order to answer our question of interest, we must compare the variability between groups to the variability within groups.
Mean Square Between Groups (MSG)
\[MSG = \frac{1}{k-1}\sum \limits_{i=1}^k n_i (\overline{x}_i - \overline{x})^2\]
Mean Square Error (MSE)
\[MSE = \frac{1}{n-k}\sum \limits_{i=1}^k (n_i-1)s_i^2 \]
The test statistic is the ratio of the average between-group variability to the average within-group variability: \[F = \frac{MSG}{MSE}\]
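The formulas for \(MSG\), \(MSE\), and \(F\) can be checked by hand in R. The sketch below again assumes the built-in PlantGrowth dataset (not part of this lecture) and compares the hand calculation with the table that R's anova() produces for the same data.

```r
# Illustrative data (an assumption): R's built-in PlantGrowth dataset.
y <- PlantGrowth$weight
groups <- split(y, PlantGrowth$group)

k      <- length(groups)
n      <- length(y)
xbar   <- mean(y)
n_i    <- sapply(groups, length)
xbar_i <- sapply(groups, mean)
s_i    <- sapply(groups, sd)

# Mean square between groups: weighted squared deviations of the group means
MSG <- sum(n_i * (xbar_i - xbar)^2) / (k - 1)

# Mean square error: pooled within-group variance
MSE <- sum((n_i - 1) * s_i^2) / (n - k)

# F test statistic (named F_stat because F is shorthand for FALSE in R)
F_stat <- MSG / MSE

# Cross-check against R's ANOVA table for the same data
fit <- anova(lm(weight ~ group, data = PlantGrowth))
c(by_hand = F_stat, from_anova = fit[["F value"]][1])
```

If the two values agree, the hand formulas match what the software reports in its "Mean Sq" and "F value" columns.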
There are two conditions that need to be met in order to assume the upcoming null distribution:
If every group has \(n_i\geq 30\), we can move forward.
If any of the sample sizes are less than 30, we need to look at the sampled distribution(s) of the small sample(s). If there are no clear outliers or strong skewness in the sampled data, we can move forward.
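A quick way to screen for the sample size and outlier checks above is sketched below. It assumes the built-in PlantGrowth dataset (not part of this lecture) and uses the boxplot rule (points more than 1.5 IQRs beyond the quartiles) as one possible working definition of a "clear outlier"; that rule is an assumption, not a requirement of the test.

```r
# Illustrative data (an assumption): R's built-in PlantGrowth dataset.
groups <- split(PlantGrowth$weight, PlantGrowth$group)

# Sample size of each group, and whether it falls below 30
n_i   <- sapply(groups, length)
small <- n_i < 30

# Number of points the boxplot rule flags as outliers in each group
n_outliers <- sapply(groups, function(x) length(boxplot.stats(x)$out))

# Groups with small = TRUE need a closer look at their sampled distribution
data.frame(n_i, small, n_outliers)
```

For the groups flagged as small, a side-by-side boxplot such as boxplot(weight ~ group, data = PlantGrowth) gives a visual check for strong skewness.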
If the above conditions are met, under the null hypothesis, the test statistic, \(F = \frac{MSG}{MSE}\), follows an F distribution with \(k-1\) and \(n-k\) degrees of freedom.
The \(F\) distribution is right-skewed, with support \((0, \infty)\).
Its shape is defined by two values:
The numerator degrees of freedom, \(k-1\)
The denominator degrees of freedom, \(n-k\)
Denoted: \(F_{k-1, n-k}\)
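To get a feel for the shape of an \(F\) distribution, its density and quantiles can be evaluated in R. The degrees of freedom below are hypothetical (they correspond to \(k=3\) groups and \(n=30\) observations), and comparing the mean with the median is just one simple way to see the right skew.

```r
# Hypothetical degrees of freedom: k = 3 groups, n = 30 observations
d1 <- 3 - 1    # numerator degrees of freedom, k - 1
d2 <- 30 - 3   # denominator degrees of freedom, n - k

# Density of F_{d1, d2} at a few positive values
x <- c(0.5, 1, 2, 4)
df(x, d1, d2)

# Median from the quantile function; the mean is d2 / (d2 - 2) when d2 > 2
median_F <- qf(0.5, d1, d2)
mean_F   <- d2 / (d2 - 2)

# A mean larger than the median is consistent with a right-skewed shape
c(median = median_F, mean = mean_F)
```

Calling curve(df(x, d1, d2), from = 0, to = 6) would draw the density for these degrees of freedom.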
Any ratio of \(MSG\) to \(MSE\) greater than the calculated \(F\) test statistic would be considered “as or more extreme” than our observed data.
Recall that the p-value represents the probability of observing data as or more extreme than our current dataset, in the direction of the alternative hypothesis, if the null hypothesis were true.
In an ANOVA F test, the p-value is always the area under the null distribution curve to the right of the F test statistic.
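R code: 1 - pf(F_stat, k - 1, n - k), where F_stat holds the observed F test statistic. (Avoid naming the variable F: in R, F is shorthand for FALSE.)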
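The p-value calculation can be sketched end to end as follows. The example assumes the built-in PlantGrowth dataset (not part of this lecture) and takes the F statistic from R's ANOVA table; the lower.tail = FALSE form is an equivalent way to request the right-tail area directly.

```r
# Illustrative data (an assumption): R's built-in PlantGrowth dataset.
fit <- anova(lm(weight ~ group, data = PlantGrowth))

F_stat <- fit[["F value"]][1]        # observed F test statistic
k <- nlevels(PlantGrowth$group)      # number of groups
n <- nrow(PlantGrowth)               # overall sample size

# Area to the right of F_stat under the F_{k-1, n-k} null distribution
p_complement <- 1 - pf(F_stat, k - 1, n - k)
p_upper_tail <- pf(F_stat, k - 1, n - k, lower.tail = FALSE)

# Both should match the p-value reported in the ANOVA table
c(complement = p_complement,
  upper_tail = p_upper_tail,
  from_anova = fit[["Pr(>F)"]][1])
```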
Write a 2-part conclusion (we’ll exclude a single point estimate and confidence interval). The conclusion should be written in the context of the problem and contain the following components:
A statement of the strength of evidence in favor of the alternative hypothesis.
Whether to reject or fail to reject the null hypothesis.
Complete the Class 17 Practice - ANOVA assignment in Top Hat.