Day 05
Carleton College
Stat 220 - Spring 2025
ggplot2
hw.Rmd
) to a safe place.
In 1854, a cholera outbreak killed 127 people in 3 days in a London neighborhood, resulting in a mass exodus of local residents. At the time, people thought that cholera was an airborne disease. John Snow, a physician who was critical of the airborne theory, set out to investigate.
What might this data look like?
date | last_name | first_name | address | age | cause_of_death |
---|---|---|---|---|---|
Aug 31, 1854 | Jones | Thomas | 26 Broad St. | 37 | cholera |
Aug 31, 1854 | Jones | Mary | 26 Broad St. | 11 | cholera |
Sept 1, 1854 | Warwick | Martin | 14 Broad St. | 23 | cholera |
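As a minimal sketch, the three example records above could be stored in R as a data frame. Only the rows shown in the table are used, and counting deaths per address is an illustrative summary, not part of Snow's own analysis.

```r
# The three example records from the table, one row per death.
# Dates are parsed into Date objects so they can be sorted or filtered.
deaths <- data.frame(
  date           = as.Date(c("1854-08-31", "1854-08-31", "1854-09-01")),
  last_name      = c("Jones", "Jones", "Warwick"),
  first_name     = c("Thomas", "Mary", "Martin"),
  address        = c("26 Broad St.", "26 Broad St.", "14 Broad St."),
  age            = c(37, 11, 23),
  cause_of_death = c("cholera", "cholera", "cholera")
)

# Deaths per address: the kind of count a map would show spatially
table(deaths$address)
```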
What makes “address” a useful variable is that it is linked to a specific location in the physical world. If we plot these addresses, we get something like the following:
While we can see patterns in the last plot, the underlying map of London’s streets provides helpful context that makes it more intelligible:
Snow’s insight was driven by another set of data: the locations of the street-side water pumps (they are hard to see, but they are labelled on the map). Nearly all of the cases were clustered around a single pump in the middle of Broad Street.
John Snow’s map (and water pump) are now “famous” among epidemiologists and statisticians.
Fill in regions with variable values
Need two data sources:
you don’t need to join them!
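A hedged sketch of the "no join" approach, assuming the `maps` package is installed (`map_data()` needs it). Base R's `USArrests` data set stands in for the variable values; it is not from this lecture. `geom_map()` matches each row's `map_id` to the `region` column of the boundary data, so the two sources stay separate.

```r
library(ggplot2)

# Source 1: boundary data, one row per polygon vertex (needs the maps package)
states_map <- map_data("state")

# Source 2: one value per state. map_data() stores region names in
# lower case, so the state names are converted to match.
arrests <- data.frame(
  region = tolower(rownames(USArrests)),
  murder = USArrests$Murder
)

# geom_map() looks up each row's map_id in states_map$region, so the
# two data frames are never joined explicitly. expand_limits() makes the
# axes cover the whole map, since the map coordinates are not in `arrests`.
choropleth <- ggplot(arrests, aes(map_id = region)) +
  geom_map(aes(fill = murder), map = states_map) +
  expand_limits(x = states_map$long, y = states_map$lat) +
  coord_fixed(1.3)
```

Printing `choropleth` draws the map. Alaska and Hawaii have values in `USArrests` but no boundaries in `map_data("state")`, so no shapes are drawn for them.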
Overlay symbols on an existing map, where the size of the shape is proportional to the variable
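One way to build a proportional symbol map in ggplot2, sketched under assumptions: base R's `state.center` (approximate state centers) and the `Population` column of `state.x77` (1975 estimates, in thousands) supply the symbol data, chosen only for illustration. `scale_size_area()` makes each circle's area, rather than its radius, proportional to the value.

```r
library(ggplot2)

states_map <- map_data("state")  # needs the maps package

# Symbol data: approximate centre and population of each state.
# Alaska and Hawaii are dropped because the base map covers only the lower 48.
centers <- data.frame(
  state      = state.name,
  long       = state.center$x,
  lat        = state.center$y,
  population = state.x77[, "Population"]
)
centers <- centers[!centers$state %in% c("Alaska", "Hawaii"), ]

symbol_map <- ggplot() +
  # Base map drawn from the boundary data
  geom_polygon(data = states_map, aes(x = long, y = lat, group = group),
               fill = "grey90", colour = "white") +
  # One circle per state, with area proportional to population
  geom_point(data = centers, aes(x = long, y = lat, size = population),
             colour = "steelblue", alpha = 0.7) +
  scale_size_area(max_size = 12) +
  coord_fixed(1.3)
```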
Use approximate geographical position to encode information, but not lat/long directly
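A minimal tile-grid sketch of this idea. The grid positions are hypothetical, hand-picked so that eight western states keep roughly their relative positions; the fill values are murder arrest rates from base R's `USArrests`, used only as example data.

```r
library(ggplot2)

# Hypothetical tile-grid layout (invented for illustration): each state
# gets one (col, row) cell that roughly keeps its position relative to
# its neighbours. Row 1 is the northernmost row.
tiles <- data.frame(
  abbr = c("WA", "OR", "CA", "ID", "NV", "MT", "UT", "AZ"),
  col  = c(1, 1, 1, 2, 2, 3, 3, 3),
  row  = c(1, 2, 3, 1, 2, 1, 2, 3)
)

# Example values: murder arrests per 100,000 (USArrests, 1973),
# looked up through base R's state.abb / state.name vectors
tiles$value <- USArrests[state.name[match(tiles$abbr, state.abb)], "Murder"]

tile_map <- ggplot(tiles, aes(x = col, y = row, fill = value)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = abbr)) +
  scale_y_reverse() +  # draw row 1 at the top
  coord_fixed()        # keep every tile square
```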
A bunch of latitude/longitude points that are connected with lines in a very specific order.
latitude/longitude points for all map boundaries
which boundary group each lat/long point belongs to
the order to connect points within each group
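To see why the group and the order matter, here is a toy example with two made-up squares (not real map data). `geom_polygon()` connects points in the order the rows appear, so the data is sorted before plotting. In real `map_data()` output, `order` is a running index across the whole data set rather than restarting within each group.

```r
library(ggplot2)

# Two made-up unit squares, each its own boundary group.
# `order` gives the sequence in which each group's corners are visited.
toy <- data.frame(
  long  = c(0, 1, 1, 0, 2, 3, 3, 2),
  lat   = c(0, 0, 1, 1, 0, 0, 1, 1),
  group = c(1, 1, 1, 1, 2, 2, 2, 2),
  order = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# geom_polygon() joins rows in the order they appear, so sort first
toy <- toy[order(toy$group, toy$order), ]

squares <- ggplot(toy, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black") +
  coord_fixed()

# Visiting the corners of group 1 out of order draws a self-intersecting
# "bow tie" instead of a square, even though the points are unchanged
bow_tie <- ggplot(toy[c(1, 3, 2, 4, 5:8), ],
                  aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black") +
  coord_fixed()
```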
`ggplot2::map_data()` provides the necessary information
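A sketch of the call that produces a listing like the one that follows, assuming the `maps` package (which `map_data()` draws from) and `dplyr` (for `glimpse()`) are installed.

```r
library(ggplot2)

# Boundary points for the US states (needs the maps package)
states <- map_data("state")

# One row per boundary point: long, lat, group, order, region, subregion
dplyr::glimpse(states)
```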
Rows: 15,537
Columns: 6
$ long <dbl> -87.46201, -87.48493, -87.52503, -87.53076, -87.57087, -87.5…
$ lat <dbl> 30.38968, 30.37249, 30.37249, 30.33239, 30.32665, 30.32665, …
$ group <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ order <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ region <chr> "alabama", "alabama", "alabama", "alabama", "alabama", "alab…
$ subregion <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
geom_polygon()
Using `geom_polygon()` will treat states as solid shapes, making it easier to add color
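A minimal sketch contrasting two geoms on the `map_data("state")` boundaries (requires the `maps` package): `geom_path()` only traces outlines, while `geom_polygon()` closes each group so it can be filled. Mapping `fill = region` is an arbitrary choice for illustration.

```r
library(ggplot2)

states <- map_data("state")

# geom_path() draws only the boundary lines of each group
outlines <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_path()

# geom_polygon() treats each group as a closed shape that takes a fill;
# here every region gets its own colour and the legend is hidden
filled <- ggplot(states, aes(x = long, y = lat, group = group, fill = region)) +
  geom_polygon(colour = "white", show.legend = FALSE)
```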
coord_fixed()
Using `coord_fixed()` forces one unit on the x-axis to be drawn the same length as one unit on the y-axis
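A sketch of `coord_fixed()` in use. Its `ratio` argument is the y/x aspect ratio. The value 1.3 is a common choice for the lower 48 states: at about 40° N a degree of longitude is roughly cos(40°) ≈ 0.77 times as long as a degree of latitude, and 1 / 0.77 ≈ 1.3.

```r
library(ggplot2)

states <- map_data("state")

base_map <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "white")

# ratio = 1 (the default): a degree of longitude and a degree of latitude
# are drawn the same length, however the plot window is resized
equal_units <- base_map + coord_fixed()

# ratio = 1.3: one y unit is drawn 1.3 times as long as one x unit
stretched <- base_map + coord_fixed(ratio = 1.3)
```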
`long`, `lat`, and `group`
`05-maps.Rmd` file available in the activities repo
Geospatial data exists on the globe and is generally described with a latitude and longitude. Any projection from the globe to Euclidean space (the X-Y plane) is going to cause some distortion.
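To make "distortion" concrete, here is the standard spherical Mercator projection, used only as an example. Longitude $\lambda$ and latitude $\varphi$ (in radians) on a sphere of radius $R$, with central meridian $\lambda_0$, map to:

$$
x = R\,(\lambda - \lambda_0), \qquad
y = R \ln\!\left[\tan\!\left(\frac{\pi}{4} + \frac{\varphi}{2}\right)\right], \qquad
k(\varphi) = \sec\varphi
$$

The local scale factor $k(\varphi)$ grows with latitude: at $\varphi = 60^\circ$, lengths are stretched by a factor of 2 and areas by a factor of 4, which is why high-latitude regions look too large on Mercator maps.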
The `coord_map()` function provides a Mercator projection by default (the `mapproj` package has more options)
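A sketch comparing the default Mercator projection with an Albers equal-area conic projection. It assumes the `maps` and `mapproj` packages are installed (`coord_map()` calls `mapproj::mapproject()` when the plot is drawn). The standard parallels 29.5° N and 45.5° N are a conventional choice for the conterminous US, not something taken from this lecture.

```r
library(ggplot2)

states <- map_data("state")

usa <- ggplot(states, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey85", colour = "white")

# Default projection is Mercator
mercator <- usa + coord_map()

# Albers equal-area conic: lat0 and lat1 are the two standard parallels,
# passed through to mapproj::mapproject()
albers <- usa + coord_map("albers", lat0 = 29.5, lat1 = 45.5)
```

An equal-area projection like Albers keeps region sizes comparable, which matters for choropleths where the eye judges filled area.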
Your task is to use the American Community Survey data to make a choropleth map of the US.
You should:
05-maps.Rmd