Visualizations and contingency tables
2025-09-22
Recall that a variable is either numerical or categorical
Categorical variables are variables that can take one of a limited (usually fixed) number of possible values, known as levels
Two types:
Ordinal: the levels have a special ordering
Nominal: the levels don’t have an ordering
Examples:
Blood type (A, B, AB, O)
Education level (high school, college, graduate degree, other)
If we are interested in understanding the distribution of a single categorical variable, it is common to:
Display a frequency table, which is a table of counts of each level
# A tibble: 2 × 2
smoker n
<chr> <int>
1 no 155
2 yes 45
Create a bar plot, where different levels are displayed on one axis and the counts are portrayed on the other
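Both steps can be sketched in a few lines. This is only an illustration: the data frame name `insurance` is hypothetical, and its rows are rebuilt from the counts in the frequency table above (155 non-smokers, 45 smokers).

```r
library(dplyr)
library(ggplot2)

# Hypothetical data frame rebuilt from the counts above: one row per person
insurance <- data.frame(smoker = rep(c("no", "yes"), times = c(155, 45)))

# Frequency table: one row per level, with its count in column n
freq <- count(insurance, smoker)
freq

# Bar plot: geom_bar() counts the rows in each level for us
p <- ggplot(insurance, aes(x = smoker)) +
  geom_bar()
p
```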

Perhaps we are interested in examining the distribution of two categorical variables at the same time
Summarize the distribution using a two-way table known as a contingency table:
Each value in the table counts the number of times a particular combination of variable 1 and variable 2 levels occurred in the data
| smoker | female | male |
|---|---|---|
| no | 87 | 68 |
| yes | 17 | 28 |
How can we use the contingency table to obtain the distribution of just one of the variables?
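One way to explore this question is to sum across the table. The sketch below rebuilds the data from the four cell counts above; the data frame name `insurance` and the column name `sex` are assumptions, since the table only shows the levels female and male.

```r
# Hypothetical data rebuilt from the four cell counts in the table above
insurance <- data.frame(
  smoker = rep(c("no", "no", "yes", "yes"), times = c(87, 68, 17, 28)),
  sex    = rep(c("female", "male", "female", "male"), times = c(87, 68, 17, 28))
)

# Contingency table: rows are smoker levels, columns are sex levels
tab <- table(insurance$smoker, insurance$sex)
tab

# Summing each row recovers the distribution of smoker alone;
# summing each column recovers the distribution of sex alone
smoker_totals <- rowSums(tab)
sex_totals <- colSums(tab)
smoker_totals
sex_totals
```

The row sums reproduce the single-variable frequency table for smoker (155 and 45).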
The dodged bar plot directly converts the contingency table to a visualization.
| smoker | female | male |
|---|---|---|
| no | 87 | 68 |
| yes | 17 | 28 |
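A dodged bar plot for this table might be drawn as follows. The data frame `insurance` and the column name `sex` are hypothetical, rebuilt from the cell counts above.

```r
library(ggplot2)

# Hypothetical data rebuilt from the four cell counts in the table above
insurance <- data.frame(
  smoker = rep(c("no", "no", "yes", "yes"), times = c(87, 68, 17, 28)),
  sex    = rep(c("female", "male", "female", "male"), times = c(87, 68, 17, 28))
)

# position = "dodge" places one bar per cell of the table, side by side
p <- ggplot(insurance, aes(x = smoker, fill = sex)) +
  geom_bar(position = "dodge")
p
```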

The stacked bar plot stacks the counts either row-wise or column-wise: one bar per level of one variable, split into segments by the levels of the other.
| smoker | female | male |
|---|---|---|
| no | 87 | 68 |
| yes | 17 | 28 |
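A sketch of both orientations, again using a hypothetical `insurance` data frame (with an assumed `sex` column) rebuilt from the cell counts above:

```r
library(ggplot2)

# Hypothetical data rebuilt from the four cell counts in the table above
insurance <- data.frame(
  smoker = rep(c("no", "no", "yes", "yes"), times = c(87, 68, 17, 28)),
  sex    = rep(c("female", "male", "female", "male"), times = c(87, 68, 17, 28))
)

# Row-wise: one bar per smoker level, segments colored by sex
# (stacking is the default position for geom_bar())
p_rows <- ggplot(insurance, aes(x = smoker, fill = sex)) +
  geom_bar()

# Column-wise: one bar per sex level, segments colored by smoker
p_cols <- ggplot(insurance, aes(x = sex, fill = smoker)) +
  geom_bar()

p_rows
p_cols
```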


We can convert the contingency table to proportions row-wise or column-wise to obtain the fractional breakdown of one variable within each level of the other.
| smoker | female | male |
|---|---|---|
| no | 87 | 68 |
| yes | 17 | 28 |

Row-wise proportions (each row sums to 1):
| smoker | female | male |
|---|---|---|
| no | 0.561 | 0.439 |
| yes | 0.378 | 0.622 |
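The row-wise proportions above can be reproduced by dividing each count by its row total. A minimal sketch in base R, entering the counts directly as a matrix (the object names are illustrative):

```r
# Contingency table of counts, entered by hand from the table above
tab <- matrix(
  c(87, 68,
    17, 28),
  nrow = 2, byrow = TRUE,
  dimnames = list(smoker = c("no", "yes"), sex = c("female", "male"))
)

# margin = 1 divides every cell by its row total, so each row sums to 1
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)
```

For example, the "no" row is 87/155 and 68/155, which round to 0.561 and 0.439.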
Set up how to find the column-wise proportions using our contingency table
| smoker | female | male |
|---|---|---|
| no | 87 | 68 |
| yes | 17 | 28 |
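One possible set-up: divide each count by its column total instead of its row total (104 for female, 96 for male). The sketch below enters the counts by hand; the object names are illustrative.

```r
# Contingency table of counts, entered by hand from the table above
tab <- matrix(
  c(87, 68,
    17, 28),
  nrow = 2, byrow = TRUE,
  dimnames = list(smoker = c("no", "yes"), sex = c("female", "male"))
)

# Column totals: 87 + 17 for female, 68 + 28 for male
col_totals <- colSums(tab)
col_totals

# margin = 2 divides every cell by its column total, so each column sums to 1
# female column: 87/104 and 17/104; male column: 68/96 and 28/96
col_props <- prop.table(tab, margin = 2)
round(col_props, 3)
```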
The standardized bar plot visualizes these row-wise or column-wise proportions.
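A sketch of a standardized bar plot, assuming a hypothetical `insurance` data frame (with an assumed `sex` column) rebuilt from the contingency table counts:

```r
library(ggplot2)

# Hypothetical data rebuilt from the four cell counts in the contingency table
insurance <- data.frame(
  smoker = rep(c("no", "no", "yes", "yes"), times = c(87, 68, 17, 28)),
  sex    = rep(c("female", "male", "female", "male"), times = c(87, 68, 17, 28))
)

# position = "fill" rescales every bar to height 1, so the segments show
# the proportion of each sex within each smoker level (row-wise proportions)
p <- ggplot(insurance, aes(x = smoker, fill = sex)) +
  geom_bar(position = "fill") +
  labs(y = "proportion")
p
```

Swapping the roles of `smoker` and `sex` in `aes()` would show the column-wise proportions instead.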
Note: if your data are already in the form of a frequency table, you should use geom_col() instead of geom_bar()!
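To illustrate the difference, here is a sketch that starts from the frequency table itself rather than from one row per person. The object name `freq` is illustrative.

```r
library(ggplot2)

# Data already summarized: one row per level, counts stored in column n
freq <- data.frame(smoker = c("no", "yes"), n = c(155, 45))

# geom_col() uses the y values as bar heights directly;
# geom_bar() would instead count rows (here, 1 per level)
p <- ggplot(freq, aes(x = smoker, y = n)) +
  geom_col()
p
```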
How might we make the bars horizontal instead of vertical?
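Two possible approaches, sketched on a hypothetical `insurance` data frame rebuilt from the smoker counts (mapping a variable to `y` in `geom_bar()` assumes ggplot2 version 3.3.0 or later):

```r
library(ggplot2)

# Hypothetical data rebuilt from the smoker frequency table (155 no, 45 yes)
insurance <- data.frame(smoker = rep(c("no", "yes"), times = c(155, 45)))

# Option 1: map the categorical variable to y instead of x
p_y <- ggplot(insurance, aes(y = smoker)) +
  geom_bar()

# Option 2: build the vertical plot, then flip the coordinates
p_flip <- ggplot(insurance, aes(x = smoker)) +
  geom_bar() +
  coord_flip()

p_y
p_flip
```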
What do you notice about the legend for color compared to the legend for color from last week?
Faceting is used when we want to split a particular visualization by the values of another (categorical) variable
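As a small sketch, the smoker bar plot can be split into one panel per sex. The data frame `insurance` and the column name `sex` are hypothetical, rebuilt from the contingency table counts.

```r
library(ggplot2)

# Hypothetical data rebuilt from the four cell counts in the contingency table
insurance <- data.frame(
  smoker = rep(c("no", "no", "yes", "yes"), times = c(87, 68, 17, 28)),
  sex    = rep(c("female", "male", "female", "male"), times = c(87, 68, 17, 28))
)

# facet_wrap(~ sex) draws the same bar plot once per level of sex
p <- ggplot(insurance, aes(x = smoker)) +
  geom_bar() +
  facet_wrap(~ sex)
p
```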
Side-by-side box plots: like faceting, but only for box plots. Really good for comparing a numerical variable across the levels of a categorical one!
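A sketch of box plots comparing a numerical variable across a categorical one. The choice of data is an assumption made for illustration: it uses the `mpg` dataset that ships with ggplot2, with highway mileage (`hwy`) as the numerical variable and vehicle `class` as the categorical one.

```r
library(ggplot2)

# One box per level of the categorical variable (class),
# each summarizing the numerical variable (hwy)
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot()
p
```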
Change the background of plots by adding on any one of the following:
theme_bw(), theme_minimal(), theme_gray(), theme_void() and a few more (see all options by checking the help file for any one of these)
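For instance, a theme is added to a plot like any other layer. The sketch below uses a hypothetical `insurance` data frame rebuilt from the smoker counts.

```r
library(ggplot2)

# Hypothetical data rebuilt from the smoker frequency table (155 no, 45 yes)
insurance <- data.frame(smoker = rep(c("no", "yes"), times = c(155, 45)))

p <- ggplot(insurance, aes(x = smoker)) +
  geom_bar()

# Add exactly one complete theme to change the overall look of the plot
p_bw <- p + theme_bw()
p_min <- p + theme_minimal()

p_bw
p_min
```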