2025-09-24
Recall: data frames are objects in R that store tabular data in tidy form
The dplyr package (included in the tidyverse package) provides functions that act as verbs for manipulating data frames.
filter(): pick rows matching criteria
mutate(): add new variables as columns
summarise(): reduce variables to quantitative values
group_by(): for grouped operations based on a variable
distinct(): filter for unique rows
select(): pick columns by name
slice(): pick rows using indices
In programming, a pipe is a technique for passing information from one process to another
In R, the pipe is coded as |> (i.e. a vertical bar followed by a greater-than sign). It is built into base R (version 4.1 and later) and works with dplyr functions.
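As a minimal sketch of how several of these verbs chain together with the pipe, the code below uses the built-in mtcars data as a stand-in (an assumption; it is not the course dataset). It also assumes that dplyr is installed and that R is version 4.1 or later. The column names mean_mpg and n_cars are made up for illustration.

```r
library(dplyr)

# mtcars ships with R; it is used here only as a stand-in dataset
result <- mtcars |>
  filter(mpg > 20) |>               # pick rows matching a criterion
  select(mpg, cyl, wt) |>           # pick columns by name
  group_by(cyl) |>                  # later steps operate within each cyl group
  summarise(
    mean_mpg = mean(mpg),           # reduce each group to a single value
    n_cars   = n()                  # number of rows in each group
  )

result
```

Each line takes the data frame produced by the line above it, so the code reads in the same order as the steps are carried out.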
+ is used to add layers in ggplot
We can think about pipes as following a sequence of actions, which provides a more natural and easier-to-read structure
For example: suppose that in order to get to work, I need to find my car keys, start my car, drive to work, and then park my car
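The commute example can be sketched in code. The four functions below are hypothetical helpers invented purely for illustration (they are not part of R or dplyr); each one records a completed step. The sketch assumes R 4.1 or later so that |> is available.

```r
# Hypothetical helpers: each takes the steps completed so far
# (a character vector) and returns them with one more step appended.
find_keys <- function(steps) c(steps, "found keys")
start_car <- function(steps) c(steps, "started car")
drive_to  <- function(steps, where) c(steps, paste("drove to", where))
park_car  <- function(steps) c(steps, "parked car")

# Nested calls: must be read from the inside out
nested <- park_car(drive_to(start_car(find_keys(character())), "work"))

# Piped calls: read top to bottom, in the order the steps happen
piped <- character() |>
  find_keys() |>
  start_car() |>
  drive_to("work") |>
  park_car()

# Both versions produce the same result; only the readability differs
identical(nested, piped)
```

The pipe places whatever is on its left into the first argument of the function on its right, which is why drive_to("work") only needs the destination.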
It is common to compare two quantities using logical operators. All of these operators will return a logical TRUE or FALSE. List of some common operators:
<: less than
<=: less than or equal to
>: greater than
>=: greater than or equal to
==: (exactly) equal to
!=: not equal to
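A short sketch of these operators in base R follows. The values and the variable names x, y, and ages are arbitrary choices for illustration.

```r
x <- 5
y <- 7

x < y     # TRUE
x <= 5    # TRUE
x > y     # FALSE
x >= y    # FALSE
x == y    # FALSE
x != y    # TRUE

# The operators work element by element on vectors
ages <- c(12, 18, 25)
is_adult <- ages >= 18    # FALSE TRUE TRUE

# "Exactly" equal matters for decimals: tiny rounding differences count
0.1 + 0.2 == 0.3          # FALSE
```

For decimal values, a comparison with a tolerance, such as isTRUE(all.equal(0.1 + 0.2, 0.3)), is usually a safer check than ==.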
We might also want to know if a certain quantity “behaves” a certain way. The following also return logical outputs:
is.na(x): test if x is NA
x %in% y: test if x is in y
!x: not x
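These tests can be tried on a small vector in base R. The vector x and the lookup values below are made up for illustration.

```r
x <- c(3, NA, 8)

missing  <- is.na(x)         # FALSE TRUE FALSE
in_set   <- x %in% c(8, 10)  # FALSE FALSE TRUE
observed <- !is.na(x)        # TRUE FALSE TRUE

# A logical vector can be used to keep only the elements that are TRUE
x[observed]                  # 3 8
```

Unlike ==, which returns NA when compared against a missing value, %in% always returns TRUE or FALSE.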
Output when the code is executed in Source:

Output when the code is executed in Console:

Tibble (i.e. data frame) with 12 observations and 13 variables
For variables shown, their names and types
Variables not displayed. In Source, you can click to see other variables.
Source will display at most 10 observations, but you can click to see more.
Data from Amazon: we have data about several books available for purchase from Amazon. I took a random sample from the 325 cases in the original dataset.
Copy and paste the following line of code into a new code chunk in your live code! We will load in the data together and take a quick look at it before diving into data wrangling