library(tidyverse)
library(palmerpenguins) # load penguins data
16-functions
Load the Data
unscaled_cancer <- read_csv("https://raw.githubusercontent.com/UBC-DSCI/introduction-to-datascience/refs/heads/main/data/wdbc_unscaled.csv")
unscaled_cancer
# A tibble: 569 × 12
ID Class Radius Texture Perimeter Area Smoothness Compactness Concavity
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 8.42e5 M 18.0 10.4 123. 1001 0.118 0.278 0.300
2 8.43e5 M 20.6 17.8 133. 1326 0.0847 0.0786 0.0869
3 8.43e7 M 19.7 21.2 130 1203 0.110 0.160 0.197
4 8.43e7 M 11.4 20.4 77.6 386. 0.142 0.284 0.241
5 8.44e7 M 20.3 14.3 135. 1297 0.100 0.133 0.198
6 8.44e5 M 12.4 15.7 82.6 477. 0.128 0.17 0.158
7 8.44e5 M 18.2 20.0 120. 1040 0.0946 0.109 0.113
8 8.45e7 M 13.7 20.8 90.2 578. 0.119 0.164 0.0937
9 8.45e5 M 13 21.8 87.5 520. 0.127 0.193 0.186
10 8.45e7 M 12.5 24.0 84.0 476. 0.119 0.240 0.227
# ℹ 559 more rows
# ℹ 3 more variables: Concave_Points <dbl>, Symmetry <dbl>,
# Fractal_Dimension <dbl>
Your turn: 3 functions
Turn the following code snippets into functions. Think about what each function does before you begin, and be sure to give each function an informative name.
mean(is.na(x))
x / sum(x, na.rm = TRUE)
sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
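One possible set of answers is sketched below. The names prop_missing, proportion_of_total and coef_variation are our own choices, not part of the exercise. The third snippet computes the coefficient of variation (the standard deviation relative to the mean).

```r
# Proportion of values in x that are missing (NA)
prop_missing <- function(x) {
  mean(is.na(x))
}

# Each value as a proportion of the total, ignoring NAs in the sum
proportion_of_total <- function(x) {
  x / sum(x, na.rm = TRUE)
}

# Coefficient of variation: standard deviation divided by the mean
coef_variation <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

prop_missing(c(1, NA, 3, NA))
#> [1] 0.5
```

Each body is the original snippet unchanged; wrapping it in a function is what lets you reuse it on any vector x.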
Your turn: column_mean
Write a function called column_mean that takes a data set and a column name (as a string) as inputs and returns the column mean as output. (Hint: access the column using [[.) You should also include an na.rm argument and set its default to TRUE so that NAs are removed from the calculation by default. Test your function on the mtcars data set.
> column_mean(mtcars, "cyl")
[1] 6.1875
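A minimal sketch of column_mean, assuming the column name arrives as a single string. The argument names data and column are our own choices.

```r
# Mean of one column of a data frame, chosen by name (a string)
column_mean <- function(data, column, na.rm = TRUE) {
  # [[ ]] extracts a single column as a vector using a string name
  mean(data[[column]], na.rm = na.rm)
}

column_mean(mtcars, "cyl")
#> [1] 6.1875
```

Using [[ rather than $ matters here: data$column would look for a column literally named "column" instead of the one whose name is stored in the argument.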
Your turn: scatterplot
Write a plotting function that makes a scatterplot of any two quantitative variables, coloring the points by a third categorical variable.
Test your function with the following examples:
scatterplot(unscaled_cancer, Radius, Texture, Class)
scatterplot(penguins, bill_length_mm, bill_depth_mm, species)
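One way to write this, sketched with ggplot2 (loaded by tidyverse). The "embracing" operator {{ }} forwards unquoted column names such as Radius into aes(). The argument names data, x, y and color are our own choices.

```r
library(ggplot2)

# Scatterplot of two quantitative columns, colored by a categorical column.
# Columns are passed unquoted; {{ }} forwards them to aes().
scatterplot <- function(data, x, y, color) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }}, color = {{ color }})) +
    geom_point()
}

# Built-in data set, so this runs without downloading anything
scatterplot(iris, Sepal.Length, Sepal.Width, Species)
```

Without {{ }}, ggplot2 would look for columns literally named x, y and color.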
Your turn: scatterplot 2
Edit your scatterplot function to include an argument called draw_line. If draw_line is TRUE, your function should add a line of best fit to your scatterplot. Test your function with the following examples:
scatterplot(unscaled_cancer, Radius, Texture, Class, draw_line = FALSE)
scatterplot(penguins, bill_length_mm, bill_depth_mm, species, draw_line = TRUE)
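A possible sketch, assuming "line of best fit" means a single least-squares line through all points. draw_line defaults to FALSE so calls without it still work.

```r
library(ggplot2)

# Scatterplot with an optional overall line of best fit
scatterplot <- function(data, x, y, color, draw_line = FALSE) {
  # Color is mapped only in the point layer, so the fitted line ignores groups
  p <- ggplot(data, aes(x = {{ x }}, y = {{ y }})) +
    geom_point(aes(color = {{ color }}))
  if (draw_line) {
    # Linear model fit; se = FALSE hides the confidence band
    p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
  }
  p
}

scatterplot(iris, Sepal.Length, Sepal.Width, Species, draw_line = TRUE)
```

If you would rather have one line per group, move color = {{ color }} into the top-level aes() so that geom_smooth() inherits it.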
More practice (if time)
1
Consider the following function and subsequent call of that function. What causes this error? How can you fix it?
2
You can tell R to print messages using print("your message here"). Edit the function above to first check whether pattern is one of the species in the data set. If it is not, your function should print an informative message and then return().
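The function this exercise refers to is not reproduced in this section, so the sketch below uses a hypothetical stand-in, filter_species(data, pattern), which keeps the rows whose species column equals pattern. It only illustrates the check, print, then return() pattern. The original function may work differently (for example, it may match pattern partially).

```r
# Hypothetical stand-in for the function above: keep rows whose
# species column equals pattern.
filter_species <- function(data, pattern) {
  # Exit early with a message if pattern is not a species in the data
  if (!pattern %in% data$species) {
    print(paste0("'", pattern, "' is not a species in this data set. ",
                 "Available species: ",
                 paste(unique(data$species), collapse = ", ")))
    return()
  }
  data[data$species == pattern, ]
}
```

For example, filter_species(penguins, "Emperor") would print the message listing the available species and return NULL.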
3
Given a vector of birthdates, write a function to compute the age in years. See if your function works on the easy vector first, then try the hard vector.
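A minimal sketch using lubridate (installed with the tidyverse). It assumes the birthdates are Date objects or strings in ISO YYYY-MM-DD format, which as.Date() accepts. The easy and hard vectors are not shown here, and the hard one may need different parsing. The name age_in_years and its as_of argument are our own.

```r
library(lubridate)

# Age in completed years for each birthdate, as of a reference date
age_in_years <- function(birthdates, as_of = today()) {
  start <- as.Date(birthdates)
  end <- as.Date(as_of)
  # time_length() on an interval counts calendar years (leap years included);
  # floor() keeps only completed years
  floor(time_length(interval(start, end), "years"))
}

age_in_years(c("2000-06-15", "1990-01-01"), as_of = "2024-06-14")
#> [1] 23 34
```

Dividing a day difference by 365.25 is a common shortcut, but it can be off by one around a birthday. Measuring the interval in calendar years avoids that.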