13: Intro to Strings/Regex
babynames intro
Replicate the “Amanda” plot with your own name using the starter code below. If your name doesn’t have enough data, try a friend’s or professor’s name.
babynames %>%
  ___(name == ___ ) %>%
  ggplot(aes(x = ___, y = ___, col = ___)) +
  geom____()
Practice with stringr
Your turn
No code needed, just think about what it returns
Your turn 2
Fill in the blanks of the .Rmd file to…
Isolate the last letter of every name and create a logical variable that indicates whether the last letter is one of “a”, “e”, “i”, “o”, “u”, or “y”.
Use a weighted mean to calculate the proportion of children whose name ends in a vowel, by year (see ?weighted.mean).
Display the results as a line plot.
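As a side note on why a weighted mean is suggested here: each row of a names table stands for many children, so a plain mean would give the proportion of names rather than the proportion of children. The sketch below uses a tiny made-up data set (the vectors `ends_in_vowel` and `n` are hypothetical, not taken from babynames) to show the difference.

```r
# Toy data, made up for illustration: one entry per name,
# with n = number of children given that name
ends_in_vowel <- c(TRUE, FALSE, TRUE)
n <- c(10, 30, 60)

# Unweighted mean: proportion of *names* that end in a vowel (2 out of 3)
p_names <- mean(ends_in_vowel)

# Weighted mean: proportion of *children* whose name ends in a vowel,
# (10 + 60) / (10 + 30 + 60)
p_children <- weighted.mean(ends_in_vowel, n)

p_names
p_children
```

Logical values are treated as 0/1, which is why a TRUE/FALSE column can be averaged directly.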
babynames %>%
  mutate(last = ___,
         vowel = ___) %>%
  group_by(___) %>%
  ___(p_vowel = weighted.mean(vowel, n)) %>%
  ___ +
  ___
Example: Carleton courses
The code chunk below imports a data set of the 6-credit courses offered at Carleton in Winter 2023. All three columns are character vectors.
courses <- read_csv(
  "https://stat220-s25.github.io/data/winter2023_course_tbl.csv",
  col_types = list(course = col_character())
)
(a)
How many of the course numbers end in .00? Use str_detect() or str_count() to help you answer this question.
Note that . is a special character in regular expressions (it matches any character), so use \\. to match a literal period.
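To see why the escape matters, here is a small sketch on a made-up vector (the `ids` values are hypothetical and only loosely resemble course numbers). It compares an unescaped and an escaped period.

```r
library(stringr)

# Hypothetical IDs, made up for illustration
ids <- c("STAT 220.00", "MATH 101.01", "CS 20000")

# Unescaped: "." matches ANY character, so "200" in "CS 20000" also matches
unescaped <- str_detect(ids, ".00")

# Escaped: "\\." matches only a literal period
escaped <- str_detect(ids, "\\.00")

unescaped
escaped
```

Neither pattern here checks that the match is at the end of the string; adding the `$` anchor (for example `"\\.00$"`) would do that.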
(b)
The section number appears after the decimal point. Use mutate() and str_sub() to create a section column containing this number.
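As a reminder of how str_sub() indexes, the sketch below uses negative positions, which count from the end of the string. The `ids` vector is made up, and the example assumes the part after the decimal point is always two characters long, which may not hold for the real data.

```r
library(stringr)

# Hypothetical IDs, made up for illustration
ids <- c("STAT 220.00", "MATH 101.01")

# start = -2 is the second-to-last character, end = -1 is the last one
last_two <- str_sub(ids, start = -2, end = -1)
last_two
```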
(c)
How many courses contain the word Introduction? Does case matter here?
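One way to explore the case question is to compare a plain pattern with one wrapped in regex(ignore_case = TRUE). The titles below are made up for illustration.

```r
library(stringr)

# Hypothetical course titles, made up for illustration
titles <- c("Introduction to Data Science",
            "An introduction to Proofs",
            "Linear Algebra")

# Patterns are case-sensitive by default
case_sensitive <- str_detect(titles, "Introduction")

# regex(..., ignore_case = TRUE) matches regardless of case
case_insensitive <- str_detect(titles, regex("introduction", ignore_case = TRUE))

sum(case_sensitive)
sum(case_insensitive)
```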
(d)
What is the longest course name (in terms of characters)? What is the shortest course name? Use str_length() to help you answer this question.
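A minimal sketch of the idea, on made-up titles: compute the lengths once, then use them to pick out the extremes. Note that which.max() and which.min() return only the first position when there are ties.

```r
library(stringr)

# Hypothetical course titles, made up for illustration
titles <- c("Calculus", "Introduction to Statistics", "Art")

# Number of characters in each title
len <- str_length(titles)

longest  <- titles[which.max(len)]
shortest <- titles[which.min(len)]

longest
shortest
```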
(e)
Use str_subset() to return the course names that contain exclamation points (!).
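For comparison with str_detect(), the sketch below shows that str_subset() returns the matching strings themselves rather than TRUE/FALSE. The titles are made up; "!" has no special meaning in a regular expression, so it needs no escaping.

```r
library(stringr)

# Hypothetical course titles, made up for illustration
titles <- c("Data Science!", "Probability", "Act Now! Theater Workshop")

# str_detect() returns a logical vector ...
has_bang <- str_detect(titles, "!")

# ... while str_subset() returns the strings that match
with_bang <- str_subset(titles, "!")

has_bang
with_bang
```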
Practice with Regular Expressions
Your turn 1
Detect either “.” or “-” in the info vector.
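Two common ways to express "this character or that one" are a character class and alternation. The sketch below tries both on a small made-up vector `x` (hypothetical, not the info vector).

```r
library(stringr)

# Hypothetical strings, made up for illustration
x <- c("a.b", "a-b", "ab")

# Character class: inside [ ], "." is literal; "-" is placed last
# so it is not read as a range
by_class <- str_detect(x, "[.-]")

# Alternation: outside [ ], the period must be escaped
by_alternation <- str_detect(x, "\\.|-")

by_class
by_alternation
```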
a1 <- "Home: 507-645-5489"
a2 <- "Cell: 219.917.9871"
a3 <- "My work phone is 507-202-2332"
a4 <- "I don't have a phone"
info <- c(a1, a2, a3, a4)
Your turn: ends with a vowel
Fill in the code to determine how many baby names in 2015 ended with a vowel.
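As a warm-up outside the pipeline, the `$` anchor ties a pattern to the end of a string. The sketch below uses a made-up vector `nm` (hypothetical names, not the babynames data) and treats "y" as a vowel, as in the earlier exercise.

```r
library(stringr)

# Hypothetical names, made up for illustration
nm <- c("Emma", "Noah", "Olivia", "Liam")

# [aeiouy] matches one vowel; $ requires it to be the last character
ends_with_vowel <- str_detect(nm, "[aeiouy]$")

ends_with_vowel
sum(ends_with_vowel)  # number of names ending in a vowel
```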
babynames %>%
  ___(___ == ___) %>% # extract year 2015
  ___(ends_with_vowel = ___(___, ___)) %>% # create logical column
  count(ends_with_vowel) # create a frequency table
Additional practice (if time)
A vector called words is loaded with stringr and contains a corpus of 980 words used in text analysis. Use stringr functions and regular expressions to find the words that satisfy the following descriptions.
1. Find all words that start with y.
pattern <- "type your pattern here"
str_subset(words, pattern)
character(0)
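The placeholder pattern matches nothing, which is why the result is character(0). As a hint for building a real pattern, the sketch below shows the `^` anchor on a made-up vector `toy` (hypothetical, not the words vector) with a different letter.

```r
library(stringr)

# Hypothetical words, made up for illustration
toy <- c("zebra", "lazy", "zone", "apple")

# ^ anchors the match to the start of the string,
# so "lazy" (z in the middle) is not returned
starts_with_z <- str_subset(toy, "^z")
starts_with_z
```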
