02:30
If you’d like to export this presentation to a PDF, do the following
This feature has been confirmed to work in Google Chrome and Firefox.
The complete collection of subjects we are interested in learning or making inference about.
Example:
A characteristic about the population, typically unknown or unobservable.
Example:
An observed subset of the population
Example:
A characteristic about the observed sample
Example:
The process of using known sampled information to form a conclusion about unknown population characteristics.
Primarily concerned with understanding and quantifying the uncertainty of parameter estimates (Weeks 4-10).
A study that observes and collects information on units but does not attempt to change or influence the units.
OpenIntro: Guided Practice 1.12 (pg. 25)
The tendency to systematically favor certain parts of a population over others.
How can we reduce biases when designing an observational study?
Use a random mechanism when sampling from the population.
A study in which the observed units are randomly assigned to treatments.
Example:
Four principles of a well-designed randomized experient:
A very large college class has 600 students. The students are divided into 25 groups, each of 24 students, for lab sections administered by different teaching assistants. The instructor wants to conduct a survey about how satisfied the students are with the course, and she believes that the lab section a student is in might affect the student’s overall satisfaction with the course.
Using one of the sampling schemes discussed in this week’s assigned reading, in a few sentences, propose a strategy to sample 100 students from the class so that you have a representative sample of the entire population of interest.
Sampling Schemes:
02:30
Please answer the question currently open at
Find one or two other students nearby to do this part with.
Introduce yourself if you have not already done so.
One group member should start by reading their study design aloud to the group.
The other group members’ task is to determine what sampling scheme was used.
Make sure each group member has a chance to share their sampling designs.
03:00
A pharmaceutical company is interested in assessing whether taking daily aspirin reduces the risk of heart attack. 1,500 individuals over the age of 55 have agreed to participate in the company’s study. Of the 1,500 participants, 550 report being at-risk for heart disease based on family medical history. The remaining 950 participants report no predisposition to heart disease.
In a few sentences, briefly outline an experimental design that may allow the researchers to answer the question of interest: “Does taking daily aspirin reduce the risk of heart attack?”
Your design should include the four elements of experimental design: controlling, randomization, replication, and blocking.
02:30
Return to your small group
In your group, take turns sharing the experimental designs.
After each member shares, identify how controlling, randomization, and replication were implemented in the study.
03:00
\[ \text{Sample mean: } \overline{x} = \frac{\sum \limits_{i = 1}^n x_i}{n}\]
\(n\): number of observations
\(x_i\): \(i\)th observation in the dataset
Sample Median, \(M\): middle value of the ordered data
If \(n\) is odd, \(M\), is the middle value in the ordered set of values.
If \(n\) is even, \(M\) is the midpoint (average) of the \(\frac{n}{2}\)th and \(\frac{n}{2}+1\)th observation.
Please answer the question currently open at
Variance
Standard deviation: the typical deviation of observations from the mean
\[s = \sqrt{s^2} = \sqrt{\frac{\sum \limits_{i=1}^n(x_i-\overline{x})^2}{n-1}}\]
Standard deviation is often used instead of variance to describe the spread of a distribution since it is expressed in the same units as the variable of interest.
Interquartile Range (IQR): describes the range of the middle 50% of the data
\[IQR = Q_3 - Q_1\]
\(Q_1\) is the first quartile: 25th percentile, the value such that 25% of data fall below this value
\(Q_3\) is the third quartile: 75th percentile, the value such that 75% of data fall below this value
An observation are considered outliers if it is
less than \(Q_1 - 1.5\times IQR\)
greater than \(Q_3 + 1.5 \times IQR\)
Is the distribution symmetric, left-skewed, or right-skewed?
How many peaks does the distribution have? Unimodal, bimodal, or multimodal?
The table displays the number of Nobel laureates in physics, chemistry, medicine, and economy per country from 1969-2020.
Country | Count | Proportion | Percentage |
---|---|---|---|
France | 15 | 15/442 = 0.034 | 3.4% |
Germany | 20 | 20/442 = 0.045 | 4.5% |
Japan | 15 | 15/442 = 0.034 | 3.4% |
Sweden | 8 | 8/442 = 0.018 | 1.8% |
Switzerland | 15 | 15/442 = 0.034 | 3.4% |
United Kingdom | 45 | 45/442 = 0.102 | 10.2% |
United States | 281 | 281/442 = 0.636 | 63.6% |
Other | 43 | 43/442 = 0.097 | 9.7% |
Total | 442 | 1.000 | 100.0% |
Source: https://www.statista.com/chart/19646/science-nobel-prizes-by-country-and-immigrant-share/