Welcome!

2025-09-08

Welcome to the course!

Important course information

  • Professor Becky Tang

    • Office: Warner 214
    • Email: btang@middlebury.edu
    • Office hours: Mondays 2-3pm, Fridays 11-12pm
  • Course website: https://midd-stat201a-fall25.github.io/

    • Please bookmark this page and visit frequently!

Who is Professor Tang?

  • Undergraduate at Swarthmore College, majors in Mathematics and Computer Science

  • PhD in Statistical Science from Duke University

  • Joined Middlebury in Fall 2022

  • Hobbies: mushroom foraging, cooking for friends, running

  • Disclaimer: I talk EXTREMELY fast so please don’t hesitate to tell me to slow down/repeat myself

What is this course about?

  • What is statistics? What is data science?

  • By the end of this course, you will:

    • Produce and interpret graphical displays and numerical summaries of data
    • Have developed confidence and some proficiency in coding in R (and in particular, the tidyverse syntax)
    • Better understand the central role of randomness in designing studies and making conclusions
    • Hopefully want to pursue another Statistics or Mathematics course!
    • And much more…

Necessary background

  • We assume zero background in statistics and data science

  • There is a large computing component, though not as much as in STAT 118

    • Comfort in typing on keyboard, navigating your file system
  • MATH 121 (Calculus 1) pre-req

Recommendations

  • Come to class as awake as possible!

  • Takes notes! Each day’s slides will be made available on the course website by 10pm the night before. I recommend either:

    • Printing out the PDF version of slides to write notes on during class

      • I recommend 4 or 6 slides per page (demo)
    • Downloading PDF of slides to iPad/tablet/laptop and write notes on then using device

    • Taking supplemental notes on paper/device

  • Re–visit notes within 24 hours of class

Recommendations (cont.)

  • We will frequently make use of our laptops. Please bring one with enough charge to last the entire class each day we meet!

    • Please let me know as soon as possible if you do not have access to a laptop
  • Try to resist the temptation to do other tasks (e.g. check email, online shop, watch shows) when your laptop is open

    • This can be distracting to those around you
  • Keep an open mind and don’t be afraid to ask for assistance or tell me to slow down!

  • Resist the temptation of using ChatGPT or other generative AI tools

First assignment + submission

  • Problem Set 0: answers to a brief questionnaire are due before next class. See the course Schedule page for details!

  • All coding practice and homework assignments will be submitted to Gradescope via Canvas. You should all be able to access your Gradescope account using your Middlebury College credentials.

    • On Canvas, assignments can be found in both the Assignment tab and the Gradescope tab

    • Submissions can be resubmitted as many times as you like up until the due date

    • We will walk through how to submit an assignment together, but instructions + video can also be found here

Example

Counties of the U.S. within the bottom 10% of death rates for kidney cancer for white males, 1980-1989.

What do you notice? What might be the explanation?

Example (cont.)

Counties of the U.S. within the top 10% of death rates for kidney cancer for white males, 1980-1989.

Example (cont.)

What’s going on? Let’s do some digging…

Key terminology

Population and samples

  • Data do not come from thin air! Data have to be collected in some way.

  • This usually takes the form of sampling a subset of individuals from a target group of interest

    • The target group of interest is called the population
    • The subset of individuals from whom we actually collect data is the sample
  • A case is a fancy term for saying one observational unit

    • In the kidney cancer example, the cases are counties
  • What are the target populations and cases in the following?

    1. What is the average height of trees on Middlebury College campus?

    2. What proportion of current Middlebury professors attended a liberal arts college?

    3. Over the last five years, what is the average time to complete a degree for Middlebury students?

Population and samples (cont.)

  • Typically, the size of the sample is way smaller than the population. Why?
    • In the lucky event that we are able to collect data for every individual in the population, the sample is referred to as a census
  • Example: the U.S. Census Bureau is responsible for producing data about the American people and economy. They collect data with different schemes and frequency:

Parameters and statistics

  • Often times, answering a research question simplifies to understanding a numerical summary.

    • (Population) parameter: a single numerical summary calculated from (or considered for calculation from) the entire population

    • (Sample) statistic: a single numerical summary calculated from a sample

  • Why do we differentiate? It’s always good to remember that we are trying to answer questions about the population!

  • For every population parameter, there is an associated sample statistic

  • In kidney cancer data, we might want to know a particular county’s true kidney cancer rate (for white males, 1980-1989)

    • Is the quantity of interest a population parameter or a sample statistic?

    • Suppose the reported rate is 10%. Is 10% the value of a population parameter or a sample statistic?