16-functions
library(tidyverse)
library(palmerpenguins) # load penguins data
Load the Data
unscaled_cancer <- read_csv("https://raw.githubusercontent.com/UBC-DSCI/introduction-to-datascience/refs/heads/main/data/wdbc_unscaled.csv")
unscaled_cancer
# A tibble: 569 × 12
        ID Class Radius Texture Perimeter  Area Smoothness Compactness Concavity
     <dbl> <chr>  <dbl>   <dbl>     <dbl> <dbl>      <dbl>       <dbl>     <dbl>
 1  8.42e5 M       18.0    10.4     123.  1001      0.118       0.278     0.300 
 2  8.43e5 M       20.6    17.8     133.  1326      0.0847      0.0786    0.0869
 3  8.43e7 M       19.7    21.2     130   1203      0.110       0.160     0.197 
 4  8.43e7 M       11.4    20.4      77.6  386.     0.142       0.284     0.241 
 5  8.44e7 M       20.3    14.3     135.  1297      0.100       0.133     0.198 
 6  8.44e5 M       12.4    15.7      82.6  477.     0.128       0.17      0.158 
 7  8.44e5 M       18.2    20.0     120.  1040      0.0946      0.109     0.113 
 8  8.45e7 M       13.7    20.8      90.2  578.     0.119       0.164     0.0937
 9  8.45e5 M       13      21.8      87.5  520.     0.127       0.193     0.186 
10  8.45e7 M       12.5    24.0      84.0  476.     0.119       0.240     0.227 
# ℹ 559 more rows
# ℹ 3 more variables: Concave_Points <dbl>, Symmetry <dbl>,
#   Fractal_Dimension <dbl>
Your turn: 3 functions
Turn the following code snippets into functions. Think about what each function does before you begin, and be sure to give each function an informative name.
mean(is.na(x))
x / sum(x, na.rm = TRUE)
sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
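One possible way to wrap the three snippets, offered as a sketch rather than the expected answer. The function names (prop_missing, prop_of_total, coef_variation) are illustrative choices that the exercise does not prescribe; any informative names would do.

```r
# Proportion of values in x that are missing
prop_missing <- function(x) {
  mean(is.na(x))
}

# Each value's share of the total, ignoring NAs when summing
prop_of_total <- function(x) {
  x / sum(x, na.rm = TRUE)
}

# Coefficient of variation: standard deviation relative to the mean
coef_variation <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

prop_missing(c(1, NA, 3, NA))
prop_of_total(c(1, 1, 2))
coef_variation(c(2, 4, 6))
```

In each case the body is the original snippet unchanged; the only work is choosing x as the argument and a name that describes the result.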
Your turn: column_mean
Write a function called column_mean that takes a data set and column name (as a string) as inputs and returns the column mean as output. (Hint: access the column using [[.)
You should also include a na.rm argument and set the default to TRUE so that NAs are removed from the calculation by default.
Test your function on the mtcars data set.
> column_mean(mtcars, "cyl")
[1] 6.1875
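A minimal sketch of one way column_mean could be written, following the hint about [[ and the na.rm default. The argument names data and col_name are assumptions; the exercise only fixes the function name and the na.rm argument.

```r
# data:     a data frame
# col_name: the column to average, given as a string
# na.rm:    whether to drop NAs before averaging (TRUE by default)
column_mean <- function(data, col_name, na.rm = TRUE) {
  # [[ extracts the column as a plain vector, so mean() can use it directly
  mean(data[[col_name]], na.rm = na.rm)
}

column_mean(mtcars, "cyl")
```

Passing na.rm through to mean() lets a caller opt back into NA propagation with na.rm = FALSE.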
Your turn: scatterplot
Write a plotting function that makes a scatterplot of any two quantitative variables, coloring the points by a 3rd categorical variable.
Test your function with the following examples:
scatterplot(unscaled_cancer, Radius, Texture, Class)
scatterplot(penguins, bill_length_mm, bill_depth_mm, species)
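Because the test calls pass bare column names rather than strings, one approach is ggplot2's embrace operator, {{ }}, which forwards an unquoted argument into aes(). The sketch below assumes the argument names data, x, y, and color, and uses the built-in mtcars data so that it runs on its own.

```r
library(ggplot2)

# Scatterplot of y against x, with points colored by a third variable.
# x, y, and color are passed as bare (unquoted) column names.
scatterplot <- function(data, x, y, color) {
  ggplot(data, aes(x = {{ x }}, y = {{ y }}, color = {{ color }})) +
    geom_point()
}

# cyl is numeric in mtcars, so wrap it in factor() to treat it as categorical
scatterplot(mtcars, wt, mpg, factor(cyl))
```

The two calls in the worksheet should work the same way once unscaled_cancer and penguins are loaded.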
Your turn: scatterplot 2
Edit your scatterplot function to include an argument called draw_line. If draw_line is TRUE, your function should add a line of best fit to your scatterplot. Test your function with the following examples:
scatterplot(unscaled_cancer, Radius, Texture, Class, draw_line = FALSE)
scatterplot(penguins, bill_length_mm, bill_depth_mm, species, draw_line = TRUE)
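A sketch of one way to add the draw_line switch: build the base plot, then conditionally add a geom_smooth() layer. Two assumptions here are not stated in the exercise: draw_line defaults to FALSE, and "line of best fit" is taken to mean a linear fit (method = "lm") without a confidence band. Because color is mapped in the top-level aes(), this version draws one line per color group.

```r
library(ggplot2)

scatterplot <- function(data, x, y, color, draw_line = FALSE) {
  p <- ggplot(data, aes(x = {{ x }}, y = {{ y }}, color = {{ color }})) +
    geom_point()

  # Add the fitted line(s) only when requested
  if (draw_line) {
    p <- p + geom_smooth(method = "lm", se = FALSE)
  }

  p
}

scatterplot(mtcars, wt, mpg, factor(cyl), draw_line = TRUE)
```

For a single overall line instead, the color mapping could be moved from ggplot() into geom_point().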
More practice (if time)
1. Consider the following function and subsequent call of that function. What causes this error? How can you fix it?
2. You can tell R to print messages using print("your message here"). Edit the function above to first check if pattern is one of the species in the dataset. If it’s not, your function should print an informative message and then return().
3. Given a vector of birthdates, write a function to compute the age in years. See if your function works on the easy vector first, then try the hard vector.
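For the birthdate exercise, here is a base R sketch of the core calculation. The easy and hard vectors are not shown here, so this assumes birthdates that are either Date objects or strings in YYYY-MM-DD format; other formats would need to be parsed first. The function name age_in_years and the on argument (the reference date) are illustrative.

```r
# Age in completed years for each birthdate, as of the date `on`
age_in_years <- function(birthdates, on = Sys.Date()) {
  birth <- as.POSIXlt(as.Date(birthdates))
  ref <- as.POSIXlt(as.Date(on))

  # Difference in calendar years
  age <- ref$year - birth$year

  # Subtract one where the birthday has not yet occurred in the reference year
  not_yet <- (ref$mon < birth$mon) |
    (ref$mon == birth$mon & ref$mday < birth$mday)

  age - as.integer(not_yet)
}

age_in_years(c("2000-06-15", "1990-12-31"), on = "2024-06-14")
```

Fixing on to a known date, as in the example call, makes the result reproducible; leaving it at the default uses today's date.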
