dplyr 1: Verbs

Author: Prof Amanda Luby

Affiliation: Carleton College, Stat 220 - Spring 2025


Warm up

Identify the verb (function) that does the following:

  • Picks rows by their values
  • Reorders the rows
  • Picks variables by their names
  • Creates new variables with functions of existing variables

Wrangling the nycflights23 data

Part 1

  • Find all flights that had an arrival delay of two or more hours.
  • Find all flights to MSP.
  • Find all flights that arrived more than two hours late, but left less than one hour late.
  • (if time) Find all flights that were delayed by at least an hour, but made up over 30 minutes in flight.
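The filtering patterns these tasks call for can be sketched on a small made-up data frame. This is only an illustration: toy_flights and its values are hypothetical, the column names are assumed to match flights (with delays in minutes), and "delayed by at least an hour" is read here as a departure delay of 60 minutes or more.

```r
library(dplyr)

# Hypothetical toy data; column names mirror flights, values are made up.
toy_flights <- tibble(
  flight    = c(101, 102, 103, 104),
  dest      = c("MSP", "ORD", "MSP", "LAX"),
  dep_delay = c(5, 130, 50, 70),    # minutes (assumed)
  arr_delay = c(125, 140, 10, 35)   # minutes (assumed)
)

# Rows where the arrival delay is at least two hours (120 minutes)
late_arrivals <- toy_flights |> filter(arr_delay >= 120)

# Rows matching a character value
to_msp <- toy_flights |> filter(dest == "MSP")

# Comma-separated conditions are combined with AND
late_but_prompt <- toy_flights |> filter(arr_delay > 120, dep_delay < 60)

# Conditions can use arithmetic on columns: left at least 60 minutes late,
# but the arrival delay is more than 30 minutes smaller than the departure delay
made_up_time <- toy_flights |>
  filter(dep_delay >= 60, dep_delay - arr_delay > 30)
```

The same calls should carry over to flights by swapping in the real data frame, provided the column names and units match the assumptions above.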

Part 2

Use arrange() to answer the following questions:

  • Which flights traveled the farthest?
  • Which traveled the shortest?
  • Which flights lasted the longest?
  • Which lasted the shortest?
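As a rough sketch of how arrange() orders rows, here is a made-up example. toy_flights is hypothetical; the distance and air_time column names are assumed to match flights.

```r
library(dplyr)

# Hypothetical toy data; values are made up.
toy_flights <- tibble(
  flight   = c(101, 102, 103),
  distance = c(1020, 187, 2475),  # miles (assumed)
  air_time = c(150, 40, 330)      # minutes (assumed)
)

# Ascending order is the default: shortest distance first
shortest_first <- toy_flights |> arrange(distance)

# desc() reverses the order: longest air time first
longest_first <- toy_flights |> arrange(desc(air_time))
```

Reading off the first few rows of each result (for example with head()) answers the "farthest" and "shortest" style of question.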

Part 3

Create a new column in flights giving the average speed of each flight while it was in the air. What are the units of this variable? Convert it to miles per hour.
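One way the unit conversion might look, sketched on made-up data. It assumes distance is recorded in miles and air_time in minutes, so distance / air_time alone would be miles per minute; the name avg_speed is taken from Part 6, and toy_flights is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: distance in miles, air_time in minutes (assumed units).
toy_flights <- tibble(
  distance = c(1000, 200),
  air_time = c(120, 30)
)

# Dividing air_time by 60 converts minutes to hours,
# so the ratio comes out in miles per hour.
with_speed <- toy_flights |>
  mutate(avg_speed = distance / (air_time / 60))
```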

Part 5

Suppose that you don’t think the FAA gives enough information in their definition of a delayed flight, so you come up with the following delay categories:

  • dep_delay <= 0 -> none
  • dep_delay between 1 and 15 minutes -> minimal
  • dep_delay between 16 and 30 minutes -> delayed
  • dep_delay between 31 and 60 minutes -> major
  • dep_delay over 60 minutes -> extreme

Use mutate() and case_when() to create a delay_category variable in the flights data frame.
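A minimal sketch of the case_when() pattern on made-up delays, using the category boundaries listed above. toy_flights is hypothetical. case_when() checks its conditions in order and uses the first one that is TRUE, so each later condition only needs an upper bound.

```r
library(dplyr)

# Hypothetical toy data covering each category, plus a missing delay.
toy_flights <- tibble(dep_delay = c(-3, 0, 10, 16, 45, 61, NA))

categorized <- toy_flights |>
  mutate(
    delay_category = case_when(
      dep_delay <= 0  ~ "none",
      dep_delay <= 15 ~ "minimal",
      dep_delay <= 30 ~ "delayed",
      dep_delay <= 60 ~ "major",
      dep_delay > 60  ~ "extreme"
      # rows matching no condition (here, the NA delay) get NA
    )
  )
```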

We can then check the result with select().
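For instance, a check along these lines keeps only the two columns being compared. The data frame is a hypothetical stand-in for flights after delay_category has been added.

```r
library(dplyr)

# Hypothetical toy data standing in for flights with delay_category added.
toy_flights <- tibble(
  flight         = c(101, 102),
  dest           = c("MSP", "ORD"),
  dep_delay      = c(-2, 75),
  delay_category = c("none", "extreme")
)

# Keep only the columns needed to compare the raw delay with its category
checked <- toy_flights |> select(dep_delay, delay_category)
```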

Part 6

Chain the previous two parts together, so that the resulting dataset contains both avg_speed and delay_category. Pipe this new dataset into ggplot() to answer the question “is there a relationship between average speed and how long a flight is delayed?” You should use the delay_category variable you created to answer this question.

If you have time, create a new graph which only contains flights to MSP. Does your conclusion change?
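The overall shape of such a chain might look like the sketch below, run on made-up data. toy_flights is hypothetical, the units (miles, minutes) are assumed, and the boxplot is just one possible choice of geom for comparing a numeric variable across categories.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; column names mirror flights, values are made up.
toy_flights <- tibble(
  dest      = c("MSP", "MSP", "ORD", "LAX", "MSP", "ORD"),
  distance  = c(1020, 1020, 740, 2475, 1020, 740),  # miles (assumed)
  air_time  = c(150, 160, 110, 330, 145, 120),      # minutes (assumed)
  dep_delay = c(-5, 20, 70, 10, 45, 0)              # minutes (assumed)
)

# mutate() can create both variables in one call; the result is piped
# straight into ggplot(), after which layers are added with +
speed_plot <- toy_flights |>
  mutate(
    avg_speed = distance / (air_time / 60),
    delay_category = case_when(
      dep_delay <= 0  ~ "none",
      dep_delay <= 15 ~ "minimal",
      dep_delay <= 30 ~ "delayed",
      dep_delay <= 60 ~ "major",
      dep_delay > 60  ~ "extreme"
    )
  ) |>
  ggplot(aes(x = delay_category, y = avg_speed)) +
  geom_boxplot()
```

Adding filter(dest == "MSP") as a step before ggplot() would restrict the plot to one destination. Because delay_category is a character column here, the categories appear in alphabetical order on the axis unless it is converted to a factor with explicit levels.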

Adapted from Adam Loy’s materials