Topic 4 - The Normal Distribution and Sampling Variability

The Normal Distribution

Normal Distribution

  • When to use:

Useful when modeling a continuous random variable that has a bell-shaped distribution.

  • Parameters of the distribution:

\(\mu\): mean - determines the center of the distribution

\(\sigma\): standard deviation - determines the spread of the distribution

  • Probability Density Function:

\[f(x) = \frac{1}{\sqrt{2\pi \sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\] for \(x\) in \((-\infty, \infty)\)

  • Expectation: \(E(X) = \mu\)

  • Variance: \(Var(X) = \sigma^2\)

Standard Normal Distribution

The Standard Normal Distribution is a Normal distribution with mean \(\mu=0\) and standard deviation \(\sigma=1\).

\[Z \sim N(0, 1)\]

Standardizing Any Normal Random Variable

For a Normal random variable, \(X\), a z-score represents the number of standard deviations an observation \(x\) lies above (positive \(z\)) or below (negative \(z\)) the mean.

\[z = \frac{x-\mu}{\sigma}\]

Example

For a particular bridge, recorded vehicle speeds are normally distributed with a mean of 58 mph and a standard deviation of 10 mph. Suppose a randomly chosen vehicle is going 40 miles per hour.

How many standard deviations away from the mean is 40 mph?

Calculate the z-score!

\(z = \frac{x-\mu}{\sigma} = \frac{40-58}{10} = -1.8\)

The randomly chosen vehicle is traveling 1.8 standard deviations slower than the average vehicle on the bridge.
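The z-score arithmetic can be checked in a couple of lines. This is a minimal Python sketch (the course itself uses R); the helper name `z_score` is hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Bridge example: mean 58 mph, SD 10 mph, observed speed 40 mph
z = z_score(40, mu=58, sigma=10)
print(round(z, 2))  # -1.8
```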

Example

For a particular bridge, recorded vehicle speeds are normally distributed with a mean of 58 mph and a standard deviation of 10 mph.

What is the probability of randomly selecting a vehicle going less than 40 mph?

\[P(X < 40) = \int \limits_{-\infty}^{40} \frac{1}{\sqrt{2\pi (10^2)}}e^{-\frac{(x-58)^2}{2(10^2)}} dx\]

We cannot solve this integral analytically (the Normal density has no closed-form antiderivative), so we evaluate it numerically using R!
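As a sketch of what that numerical evaluation produces, Python's standard-library `statistics.NormalDist` (used here as a stand-in for R's pnorm) computes the same lower-tail area. The object name `speeds` is illustrative only.

```python
from statistics import NormalDist

# Speeds on the bridge: Normal with mean 58 mph and SD 10 mph
speeds = NormalDist(mu=58, sigma=10)

# P(X < 40): the CDF evaluates the integral numerically
p_below_40 = speeds.cdf(40)
print(round(p_below_40, 4))

# Same answer via the standard Normal and the z-score: P(Z < -1.8)
print(round(NormalDist().cdf(-1.8), 4))
```

Both lines should print the same value (roughly 0.036), since standardizing does not change the probability.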

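R Demonstration

Normal Distribution

Cumulative probability, \(F(x) = P(X \leq x)\) (set lower.tail = FALSE to get \(P(X > x)\)):

pnorm(q, mean, sd, lower.tail = TRUE)

\(p^{th}\) percentile, the value \(x\) such that \(P(X \leq x) = p\):
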
qnorm(p, mean, sd, lower.tail = TRUE)
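For readers working outside R, the two functions above can be mimicked with Python's `statistics.NormalDist`. This is a hypothetical sketch: the Python helpers `pnorm` and `qnorm` are our own wrappers, not an existing Python API, and R's `lower.tail` argument is spelled `lower_tail` because Python names cannot contain dots.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    """Sketch of R's pnorm: P(X <= q), or P(X > q) when lower_tail is False."""
    p = NormalDist(mean, sd).cdf(q)
    return p if lower_tail else 1.0 - p

def qnorm(p, mean=0.0, sd=1.0, lower_tail=True):
    """Sketch of R's qnorm: the value x with P(X <= x) = p, for 0 < p < 1."""
    if not lower_tail:
        p = 1.0 - p  # an upper-tail probability p matches a lower-tail 1 - p
    return NormalDist(mean, sd).inv_cdf(p)

# Bridge example (mean 58 mph, SD 10 mph)
print(round(pnorm(40, 58, 10), 4))                    # P(X <= 40)
print(round(pnorm(70, 58, 10, lower_tail=False), 4))  # P(X > 70)
print(round(qnorm(0.90, 58, 10), 2))                  # 90th percentile of speeds
```

Note that pnorm and qnorm are inverses of each other: feeding a percentile back into pnorm returns the original probability.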

Class 7 Activity

Please complete the short Class 7 Activity in Top Hat.

Sampling Distributions

Sampling Distributions Simulation

Please do the following:

Answer the single question in the Google Form, which can be accessed in any of the following ways:

  • Type the following URL into your browser (the URL is case sensitive): https://beav.es/GaJ

  • Find the Week 4 Survey link on Canvas under the Week 4 module

  • Scan the QR code

Key Concepts

Inferential Statistics

  • Recall that inferential statistics use information from a sample to estimate or test characteristics of a population of interest.

  • Typically, we calculate a point estimate from the sample as our best guess of the parameter of interest.

    • Naturally, our best guess for the population mean, \(\mu\), from a sample is the sample mean, \(\overline{x}\).

    • Our best guess for the population proportion, \(p\), is the sample proportion, \(\hat{p}\).
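Computing the two point estimates is simple arithmetic. The data below are made up purely for illustration (hypothetical commute times, and hypothetical yes/no answers coded as 1/0).

```python
from statistics import mean

# Hypothetical sample of 8 commute times (minutes); values are invented
commute_times = [12, 25, 18, 30, 22, 15, 27, 19]
x_bar = mean(commute_times)  # point estimate of the population mean, mu

# Hypothetical yes/no responses (1 = yes, 0 = no) from 10 sampled students
responses = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
p_hat = sum(responses) / len(responses)  # point estimate of the population proportion, p

print(x_bar, p_hat)
```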

Sampling Variability

Even when robust sampling schemes are used, different samples will yield different point estimates.

(Figure: a population from which Sample 1, Sample 2, and Sample 3 are drawn, giving point estimates \(\hat{\theta}_1\), \(\hat{\theta}_2\), and \(\hat{\theta}_3\).)

Here \(\hat{\theta}\) represents a generic point estimate.
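A small simulation makes sampling variability concrete. This sketch assumes a hypothetical population of 10,000 speeds generated from a Normal(58, 10) model and draws three simple random samples of size 25; each sample mean plays the role of one \(\hat{\theta}_i\).

```python
import random
from statistics import mean

random.seed(314)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 speeds drawn from a Normal(58, 10) model
population = [random.gauss(58, 10) for _ in range(10_000)]

# Three independent simple random samples (without replacement) of size n = 25
n = 25
estimates = [mean(random.sample(population, n)) for _ in range(3)]

for i, theta_hat in enumerate(estimates, start=1):
    print(f"Sample {i}: x-bar = {theta_hat:.2f}")
```

Same population, same sampling scheme, yet three different point estimates.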

Distributions of Inference

Population Distribution

Distribution of a variable across the entire population of interest.

SamplED Distribution

Distribution of \(n\) observations obtained from a single sample.

SamplING Distribution

Distribution of a sample statistic, such as \(\overline{x}\) or \(\hat{p}\), from repeated samples of size \(n\) from the population.
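The sampling distribution can be approximated by repeating the sampling many times. This sketch assumes a hypothetical population proportion p = 0.30, samples of size n = 50 with independent draws (as if the population were very large), and 10,000 repetitions; the helper `sample_proportion` is our own.

```python
import random
from statistics import mean, stdev

random.seed(2024)  # fixed seed for reproducibility

p = 0.30       # hypothetical population proportion
n = 50         # sample size
reps = 10_000  # number of repeated samples

def sample_proportion(p, n):
    """Draw n independent yes/no outcomes with success probability p; return p-hat."""
    return sum(random.random() < p for _ in range(n)) / n

# The collection of p-hats approximates the sampling distribution of p-hat
p_hats = [sample_proportion(p, n) for _ in range(reps)]

print(f"mean of p-hats: {mean(p_hats):.3f}")
print(f"SD of p-hats:   {stdev(p_hats):.3f}")
```

The mean of the simulated values should sit close to p, and their SD close to \(\sqrt{p(1-p)/n} \approx 0.065\).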

Sampling Distributions

If we can’t observe the sampling distribution in real-world applications, why do we care about it?

Understanding the sampling distribution of commonly used statistics, such as \(\overline{x}\) and \(\hat{p}\), allows us to quantify the uncertainty in our point estimates.

Unbiased Estimators

Recall that because of sampling variability, a statistic from a sample is a random variable.

A statistic is called unbiased if its expectation is equal to the corresponding population parameter.

\(\overline{x}\), \(\hat{p}\), and \(s^2\) (computed with divisor \(n-1\)) are unbiased:

\(E(\overline{x}) = \mu\)

\(E(\hat{p}) = p\)

\(E(s^2) = \sigma^2\)
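Unbiasedness is a statement about the average over many samples, so it can be illustrated by simulation. This sketch assumes samples of size n = 5 from a hypothetical Normal population with sigma = 10 (so sigma squared = 100), and compares the usual divisor n - 1 (`statistics.variance`) with divisor n (`statistics.pvariance`).

```python
import random
from statistics import mean, variance, pvariance

random.seed(1)  # fixed seed for reproducibility

mu, sigma = 58.0, 10.0  # hypothetical population mean and SD
n = 5                   # small samples make the bias of divisor n easy to see
reps = 10_000           # number of repeated samples

s2_values = []  # sample variances with divisor n - 1 (the usual s^2)
divisor_n = []  # same samples, divisor n, for comparison

for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2_values.append(variance(sample))   # divisor n - 1
    divisor_n.append(pvariance(sample))  # divisor n, centered at the sample mean

print(f"average s^2 (divisor n - 1): {mean(s2_values):.1f}")
print(f"average with divisor n:      {mean(divisor_n):.1f}")
print(f"sigma^2:                     {sigma**2:.1f}")
```

The divisor n - 1 version averages close to 100, while divisor n settles near \((n-1)\sigma^2/n = 80\), which is why \(s^2\) uses \(n-1\).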

Consistent Estimators & The Law of Large Numbers

A point estimate is called consistent if it converges in probability to its corresponding population parameter.

Under the Law of Large Numbers, as the sample size, \(n\), increases, the point estimate approaches the population parameter (with probability approaching 1).

\(\overline{x}\), \(\hat{p}\), and \(s^2\) are consistent.

Therefore, as \(n\) increases,

\(\overline{x} \rightarrow \mu\)

\(\hat{p} \rightarrow p\)

\(s^2 \rightarrow \sigma^2\)
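The Law of Large Numbers can be watched in action by drawing ever-larger samples. This sketch assumes a hypothetical Normal population with mean 58 and SD 10; the distance between \(\overline{x}\) and \(\mu\) tends to shrink as n grows, although any single run can wiggle.

```python
import random

random.seed(7)  # fixed seed for reproducibility

mu, sigma = 58.0, 10.0  # hypothetical population mean and SD
errors = {}             # sample size -> |x-bar - mu|

for n in [10, 100, 1_000, 10_000, 100_000]:
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    errors[n] = abs(x_bar - mu)
    print(f"n = {n:>7,}: x-bar = {x_bar:.3f}, |x-bar - mu| = {errors[n]:.3f}")
```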

Sample Size & Sampling Variability

  • The variability of a point estimate from sample to sample is measured by its standard error.

  • The standard error is the standard deviation of the sampling distribution.

  • As \(n\) increases, the standard error of the point estimate decreases.
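The standard-error formulas \(\sigma/\sqrt{n}\) for \(\overline{x}\) and \(\sqrt{p(1-p)/n}\) for \(\hat{p}\) make this concrete: because n sits under a square root, quadrupling n halves the standard error. The sketch below uses sigma = 10 from the bridge example and a hypothetical p = 0.5; the helper names `se_mean` and `se_prop` are our own.

```python
import math

def se_mean(sigma: float, n: int) -> float:
    """Standard error of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_prop(p: float, n: int) -> float:
    """Standard error of p-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

for n in [25, 100, 400]:
    print(f"n = {n:>3}: SE(x-bar) = {se_mean(10, n):.2f}, SE(p-hat) = {se_prop(0.5, n):.3f}")
```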

Central Limit Theorem

When observations are independent and the sample size, \(n\), is sufficiently large, the central limit theorem states that the sampling distributions of \(\hat{p}\) and \(\overline{x}\) are approximately Normal.

The sample size conditions (“sufficiently large”) and the details of these normal distributions differ for \(\hat{p}\) and \(\overline{x}\).

Sample Proportion, \(\hat{p}\)

\[\hat{p}\sim N\bigg(p, \sqrt{\frac{p(1-p)}{n}}\bigg)\] where \(p\) represents the population proportion

Sample Mean, \(\overline{x}\)

\[\overline{x}\sim N\bigg(\mu, \frac{\sigma}{\sqrt{n}}\bigg)\] where \(\mu\) and \(\sigma\) represent the population mean and standard deviation, respectively.
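Once the CLT applies, probabilities about a statistic come from a Normal model whose SD is the standard error. This sketch uses the bridge example with a hypothetical sample of n = 25 vehicles, plus a hypothetical population proportion p = 0.4 with n = 100; `statistics.NormalDist` again stands in for R's pnorm.

```python
import math
from statistics import NormalDist

# x-bar: mu = 58 mph, sigma = 10 mph, hypothetical n = 25 vehicles
mu, sigma, n = 58, 10, 25
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))  # approximately N(58, 2)
p_xbar_below_55 = xbar_dist.cdf(55)               # P(x-bar < 55)

# p-hat: hypothetical p = 0.4 with n = 100 (np = 40 and n(1 - p) = 60)
p, n_p = 0.4, 100
phat_dist = NormalDist(p, math.sqrt(p * (1 - p) / n_p))
p_phat_above_half = 1 - phat_dist.cdf(0.5)        # P(p-hat > 0.5)

print(round(p_xbar_below_55, 4), round(p_phat_above_half, 4))
```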

Central Limit Theorem

Sampling Distribution of Sample Proportion, \(\hat{p}\)

Central Limit Theorem

Sampling Distribution of Sample Mean, \(\overline{x}\)

CLT - Sample Size Conditions

The sample size conditions needed to apply the Central Limit Theorem differ depending on the statistic.

Sample proportion, \(\hat{p}\)

For the CLT to apply to the distribution of the sample proportion, we need the following sample size conditions to be met:

  • \(np \geq 10\)

  • \(n(1-p) \geq 10\)
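The two conditions for \(\hat{p}\) translate directly into a check. A sketch only: the function name is hypothetical, and p is whatever value of the population proportion is being used, since the conditions are stated in terms of p.

```python
def clt_ok_for_proportion(n: int, p: float) -> bool:
    """Return True when both np >= 10 and n(1 - p) >= 10 hold."""
    return n * p >= 10 and n * (1 - p) >= 10

print(clt_ok_for_proportion(50, 0.30))  # np = 15, n(1 - p) = 35: both conditions met
print(clt_ok_for_proportion(50, 0.10))  # np = 5: first condition fails
```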

Sample mean, \(\overline{x}\)

Use the sample size and the shape of the sampled distribution to determine whether the sample size is sufficiently large:

  • If \(n\geq 30\), we can typically assume the sampling distribution of \(\overline{x}\) is approximately Normal and the CLT applies.

  • If \(n < 30\), we need to look at the sampled distribution. If there are no clear outliers or strong skewness in the sampled data, we can assume the sampling distribution of \(\overline{x}\) is approximately Normal and the CLT applies.

If the sample size conditions aren’t met, we cannot apply the results of the CLT.

Example from last class

Question of interest: What proportion of students in ST 314 have attended a career fair this year?

Parameter of interest:

Practice!

Please complete the Class 8 Activity in Top Hat (found under the Assigned tab). Collaboration is encouraged! When you are finished with the activity, you are free to go.