26-sql

Author
Affiliation

Prof Amanda Luby

Carleton College
Stat 220 - Spring 2025

library(tidyverse)
library(mdsr) # for connecting to database
library(DBI) # database tools

Warm Up

Using nycflights23::flights, answer the following using {dplyr} code:

  • How many flights are in this dataset?
  • How many flights flew to MSP from JFK in 2023?
  • What was the average flight delay for MSP flights in Sept 2023?
nycflights23::flights
# A tibble: 435,352 × 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2023     1     1        1           2038       203      328              3
 2  2023     1     1       18           2300        78      228            135
 3  2023     1     1       31           2344        47      500            426
 4  2023     1     1       33           2140       173      238           2352
 5  2023     1     1       36           2048       228      223           2252
 6  2023     1     1      503            500         3      808            815
 7  2023     1     1      520            510        10      948            949
 8  2023     1     1      524            530        -6      645            710
 9  2023     1     1      537            520        17      926            818
10  2023     1     1      547            545         2      845            852
# ℹ 435,342 more rows
# ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#   hour <dbl>, minute <dbl>, time_hour <dttm>

Intro to SQL

Your turn: Set up database connection

db <- dbConnect_scidb(dbname = "imdb")
db %>%
  dbGetQuery("SELECT * FROM kind_type
             LIMIT 0,10;")

What kind_type corresponds to movies?

Your turn: simple queries

  1. Find your favorite movie in the title table
  2. Find Viola Davis in the name table
  3. Make note of Viola Davis’s person_id

Your turn: finding data info

List the fields in the name, cast_info, and char_name tables. Do they have fields in common? How do you think these tables fit together?

dbListTables(db)
dbListFields(db, "title")

Your turn: JOINing Viola Davis

  1. Find all the rows in cast_info that correspond to Viola Davis as an actress

  2. Add the names of the characters she played by JOINing the char_name table

Your turn: full filmography

Find Viola Davis’s full filmography, in chronological order. Include each movie’s title, production_year, and the name of the character that she played.