
2025-10-06
While you’re coming into the room, please take 1 card. Then:
On the unlined side, write down “berry” if you pronounce Middlebury as Middle-“berry”, and “bury” if you pronounce it as Middle-“burry”.
On the lined side, write down the average number of hours of sleep you get per night
Then bring these to Prof. Tang
Problem Set 4 due tonight!
Midterm 1 updates
Today’s content is NOT on the midterm
We are shifting focus from EDA and beginning to enter the world of statistical inference and modeling!
Want to answer questions about a population, but must rely on a sample
Collect data from sample → calculate statistics
What can we say about the statistics?
Data are random! So how sure are we about our conclusions?
Statistics starts here!
Statistical inference is the process of using sample data to make conclusions about the underlying population the sample came from
Estimation: using the sample to estimate plausible values for the unknown parameter
Testing: evaluating whether our observed sample provides evidence for or against some claim about the population
Examples:
What proportion of Middlebury students pronounce the college’s name as Middle-“berry”?
What is the average number of hours of sleep Middlebury students get a night?
Questions here are about a population parameter
What proportion of Middlebury STAT 201A students pronounce Middlebury as Middle-“berry”?
Target population:
Sampling method:
Population parameter:
Are we able to compute the value of the parameter, or do we need to calculate a statistic?
What proportion of Middlebury students pronounce Middlebury as Middle-“berry”?
Target population:
Sampling method:
Population parameter:
Are we able to compute the value of the parameter, or do we need to calculate a statistic?
We are often interested in estimating a population mean or proportion. Let’s make sure we feel comfortable telling the difference.
For each of the following situations, state whether the parameter of interest is a mean or a proportion.
Sample proportion \(\hat{p}\) is a very sensible estimate for true proportion \(p\)
\(\hat{p}\) is an example of a point estimate: a single number used to estimate a true but unknown population parameter
i.e. a point estimate is a statistic with a specific purpose
Other examples include the sample mean \(\bar{x}\) for the true mean \(\mu\), and the sample standard deviation \(s\) for the true standard deviation \(\sigma\)
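To make the idea of a point estimate concrete, here is a minimal Python sketch that computes \(\hat{p}\), \(\bar{x}\), and \(s\) from one sample. The responses in `pronunciations` and `sleep_hours` are invented for illustration; they are not the data collected on the cards in class.

```python
import statistics

# Hypothetical card responses (made up for illustration, not real class data)
pronunciations = ["berry", "bury", "berry", "berry", "bury", "berry", "berry", "bury"]
sleep_hours = [7.0, 6.5, 8.0, 5.5, 7.5, 6.0, 7.0, 8.5]

# Sample proportion p-hat: fraction of responses equal to "berry"
p_hat = sum(r == "berry" for r in pronunciations) / len(pronunciations)

# Sample mean x-bar and sample standard deviation s (n - 1 denominator)
x_bar = statistics.mean(sleep_hours)
s = statistics.stdev(sleep_hours)

print(f"p-hat = {p_hat:.3f}")
print(f"x-bar = {x_bar:.2f} hours, s = {s:.2f} hours")
```

Each printed value is a single number computed from the sample, standing in for a population parameter we cannot observe directly.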
What might be a desirable characteristic of a “good” point estimate?
Two datasets collected under identical sampling procedures will almost always differ because of sampling variability.
As a result, values of the point estimate/sample statistic that we calculate from the different samples will also exhibit variability
Sampling distribution of the statistic: how the statistic behaves under repeated random samples obtained via the same sampling procedure
The variability of the sampling distribution, measured by its standard deviation, is called the standard error of the statistic
This is in contrast to the standard deviation, which describes variability in the individual data points and not the statistic
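The contrast between standard deviation and standard error can be seen in a small simulation. The sketch below assumes a made-up population of sleep hours (simulated from a normal distribution with mean 7 and standard deviation 1.2, both arbitrary choices) and a hypothetical helper `sample_mean`. It draws many samples of the same size and compares the spread of individual values with the spread of the sample means.

```python
import random
import statistics

random.seed(201)  # fixed seed so the sketch is reproducible

# Hypothetical population of nightly sleep hours (simulated, not real data)
population = [random.gauss(7, 1.2) for _ in range(10_000)]

def sample_mean(n):
    """Draw a simple random sample of size n and return its mean."""
    return statistics.mean(random.sample(population, n))

n = 25
sample_means = [sample_mean(n) for _ in range(2_000)]

# Standard deviation: spread of individual values in the population
sd_individuals = statistics.pstdev(population)
# Standard error: spread of the statistic across repeated samples
se_of_mean = statistics.stdev(sample_means)

print(f"SD of individual sleep hours: {sd_individuals:.2f}")
print(f"SE of the sample mean (n={n}): {se_of_mean:.2f}")
```

Under these assumptions the sample means vary much less than the individual values do, which is why the two quantities get different names.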
Population distribution: distribution of the variable of interest for everyone in the population
Sample distribution: distribution of the data from a single sample
Sampling distribution: distribution of sample statistics calculated from the data obtained from multiple samples
At the beginning of the semester, I passed around a bag of candy, and everyone took out 5 pieces at random and measured their average weight.
What was the parameter of interest? What sample statistic did you calculate?

The histogram visualizes your sample mean weights.
Does this histogram visualize the population distribution, the sample distribution, or the sampling distribution of a statistic?
Each one of the values in the histogram is a sample mean \(\bar{x}\) (i.e. a sample statistic)
Thus, the histogram visualizes the sampling distribution of the sample mean
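The candy activity can be imitated in code. This is only a sketch: the bag contents, the number of students (30), and the bin width are all invented, and each student's draw is taken from the full bag as if the pieces had been returned. Each star in the text histogram is one student's sample mean, so the picture is an approximation to the sampling distribution of \(\bar{x}\).

```python
import random
import statistics

random.seed(5)  # fixed seed so the sketch is reproducible

# Hypothetical bag: 200 candies with weights in grams (invented values)
bag = [round(random.uniform(2.0, 20.0), 1) for _ in range(200)]

# Each of 30 hypothetical students draws 5 pieces and records the mean weight.
# Every draw comes from the full bag, as if the pieces were returned afterwards.
class_means = [statistics.mean(random.sample(bag, 5)) for _ in range(30)]

# Crude text histogram of the sample means, using 2-gram-wide bins
counts = {}
for m in class_means:
    left = int(m // 2) * 2  # left edge of the bin containing m
    counts[left] = counts.get(left, 0) + 1

for left in sorted(counts):
    print(f"{left:>2}-{left + 2:<2} g | {'*' * counts[left]}")

print(f"mean weight in the bag:      {statistics.mean(bag):.2f} g")
print(f"mean of the 30 sample means: {statistics.mean(class_means):.2f} g")
```

In the simulation we can print the mean weight of the whole bag only because we built the bag ourselves; in practice that population value is unknown.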
Do we typically get to observe the sampling distribution?
What are the differences between a population distribution, a sample distribution, and a sampling distribution? What measure of variability is associated with each?