CSSS508, Week 7

class: center, top, title-slide

# CSSS508, Week 7
## Vectorization and Functions
### Chuck Lanfear
### May 12, 2021 Updated: May 18, 2021

---

class: inverse
# A Quick Aside

---
# Visualize the Goal First

Before you can write effective code, you need to know *exactly* what you want that code to produce.

* Do I want a single value? A vector? List?
* Do I want one observation per person? Person-year? Year?

Most programming problems can be reduced to having an unclear idea of your end **goal** (or your beginning state).

If you know what you *have* (the data structure) and what you *want*, the intermediate steps are usually obvious.

When in doubt, *sketch* the beginning state and the intended end state. Then consider what translates the former into the latter in the least complicated way.

If that seems complex, break it into more steps.

---
class: inverse
# Vectorization

---
# Example from Last Week

Remember when we tried find the mean for each variable in the `swiss` data?

The best solution is to just use `colMeans()` without even thinking about pre-allocation or `for()` loops:

```r
colMeans(swiss)
```

```
##        Fertility      Agriculture      Examination        Education 
##             70.1             50.7             16.5             11.0 
##         Catholic Infant.Mortality 
##             41.1             19.9
```

---
# Vectorization Avoids Loops

Loops are very powerful and applicable in almost any situation.

They are also often slower and require writing more code than vectorized commands.

Whenever possible, use existing vectorized commands like `colMeans()` or `dplyr` functions.

Sometimes no functions exist to do what you need, so you'll be tempted to write a loop.

This makes sense on a *fast, one-time operation, on small data*.

If your data are large or you're going to do it repeatedly, however, consider *writing your own functions*!

---
class: inverse
# Writing Functions

---
# Examples of Existing Functions

* `mean()`:
    + Input: a vector
    + Output: a single number

* `dplyr::filter()`:
    + Input: a data frame, logical conditions
    + Output: a data frame with rows removed using those conditions

* `readr::read_csv()`:
    + Input: a file path, optionally variable names or types
    + Output: a data frame containing info read in from file

---
# Why Write Your Own Functions?

Functions can encapsulate actions you might perform often, such as:

* Given a vector, compute some special summary stats
* Given a vector and definition of "invalid" values, replace with `NA`
* Templates for favorite `ggplot`s used in reports
* Defining a new logical operator

Advanced function applications (not covered in this class):

* Parallel processing
* Generating *other* functions
* Making custom packages containing your functions

---
# Simple Function

Let's look at a function that takes a vector as input and outputs a named vector of the first and last elements:

```r
first_and_last <- function(x) {
 first <- x[1]
 last <- x[length(x)]
 return(c("first" = first, "last" = last))
}
```

Test it out:

```r
first_and_last(c(4, 3, 1, 8))
```

```
## first  last 
##     4     8
```

---
# Testing `first_and_last`

What if I give `first_and_last()` a vector of length 1?

```r
first_and_last(7)
```

```
## first  last 
##     7     7
```

Of length 0?

```r
first_and_last(numeric(0))
```

```
## first 
##    NA
```

Maybe we want it to be a little smarter.

---
# Checking Inputs

Let's make sure we get an error message when the vector is too small:

```r
smarter_first_and_last <- function(x) {
 if(length(x) == 0L) { # specify integers with L
* stop("The input has no length!")
 } else {
 first <- x[1]
 last <- x[length(x)]
 return(c("first" = first, "last" = last)) 
 }
}
```

.footnote[`stop()` ceases running the function and prints the text inside as an error message.]
---
# Testing Smarter Function

```r
smarter_first_and_last(numeric(0))
```

```
## Error in smarter_first_and_last(numeric(0)): The input has no length!
```

```r
smarter_first_and_last(c(4, 3, 1, 8))
```

```
## first  last 
##     4     8
```

---
# Cracking Open Functions

If you type a function name without any parentheses or arguments, you can see its contents:

```r
smarter_first_and_last
```

```
## function(x) {
## if(length(x) == 0L) { # specify integers with L
## stop("The input has no length!") #<<
## } else {
## first <- x[1]
## last <- x[length(x)]
## return(c("first" = first, "last" = last)) 
## }
## }
## <environment: 0x00000148e60609d0>
```

You can also put your cursor over a function in your syntax and hit `F2`.

---
# Anatomy of a Function

.small[

```r
NAME <- function(ARGUMENT1, ARGUMENT2=DEFAULT){
 BODY
 return(OUTPUT)
}
```
]

* **Name**: What you assign the function to so you can use it later
 + You can have "anonymous" (no-name) functions
--
* **Arguments** (aka inputs, parameters): things the user passes to the function that affect how it works
 + e.g. `x` or `na.rm` in `my_new_func <- function(x, na.rm = FALSE) {...}`
 + `na.rm = FALSE` is example of setting a default value: if user doesn't say what `na.rm` is, it'll be `FALSE`
 + `x`, `na.rm` values won't exist in R outside of the function
--
* **Body**: The actual operations inside the function.

--
* **Return Value**: The output inside `return()`. Could be a vector, list, data frame, another function, or even nothing
    + If unspecified, will be the last thing calculated (maybe not what you want?)
    
---
# Example: Reporting Quantiles

Maybe you want to know more detailed quantile information than `summary()` gives you and with interpretable names.

Here's a starting point:

.smallish[

```r
quantile_report <- function(x, na.rm = FALSE) {
 quants <- quantile(x, na.rm = na.rm, 
 probs = c(0.01, 0.05, 0.10, 0.25, 0.5, 0.75, 0.90, 0.95, 0.99))
 names(quants) <- c("Bottom 1%", "Bottom 5%", "Bottom 10%", "Bottom 25%",
 "Median", "Top 25%", "Top 10%", "Top 5%", "Top 1%")
 return(quants)
}
quantile_report(rnorm(10000))
```

```
##  Bottom 1%  Bottom 5% Bottom 10% Bottom 25%     Median    Top 25% 
##   -2.34863   -1.64252   -1.28831   -0.65889   -0.00165    0.67332 
##    Top 10%     Top 5%     Top 1% 
##    1.28503    1.63437    2.27336
```
]

---
class: inverse
# An Aside on Apply functions

---
### Don't Loop, `apply()` Yourself Instead

Writing loops is challenging, particularly for new coders.

Loops also require writing a lot of code and are hard to troubleshoot.

But loops aren't the only way to iterate in R.

Like a loop, `apply` functions iterate over elements of objects, except:

* They don't need preallocation--you can directly assign the output.
* They *must use a function*

*Nearly anything you can do with an explicit loop can be done more easily with the `apply` family of functions*

---
# `lapply()`: List + Functions

`lapply()` is used to **apply** a function over a **l**ist of any kind (e.g. a data frame) and return a list. This is a lot easier than preparing a `for()` loop!

```r
lapply(swiss, FUN = quantile_report)
```

.small[

```
## $Fertility
##  Bottom 1%  Bottom 5% Bottom 10% Bottom 25%     Median    Top 25% 
##       38.6       47.6       56.2       64.7       70.4       78.4 
##    Top 10%     Top 5%     Top 1% 
##       84.6       90.7       92.5 
## 
## $Agriculture
##  Bottom 1%  Bottom 5% Bottom 10% Bottom 25%     Median    Top 25% 
##       4.19      15.65      17.36      35.90      54.10      67.65 
##    Top 10%     Top 5%     Top 1% 
##      76.82      84.81      87.95 
## 
## $Examination
##  Bottom 1%  Bottom 5% Bottom 10% Bottom 25%     Median    Top 25% 
##        3.0        5.0        6.0       12.0       16.0       22.0 
##    Top 10%     Top 5%     Top 1% 
##       26.0       30.4       36.1
```
]

---
## `sapply()`: Simple `lapply()`

A downside to `lapply()` is that lists are hard to work with. `sapply()` **s**implifies the output by making each element a column in a matrix... usually:

.small[

```r
sapply(swiss, FUN = quantile_report)
```

```
##            Fertility Agriculture Examination Education Catholic
## Bottom 1%       38.6        4.19         3.0      1.46     2.21
## Bottom 5%       47.6       15.65         5.0      2.00     2.45
## Bottom 10%      56.2       17.36         6.0      3.00     2.83
## Bottom 25%      64.7       35.90        12.0      6.00     5.20
## Median          70.4       54.10        16.0      8.00    15.14
## Top 25%         78.4       67.65        22.0     12.00    93.12
## Top 10%         84.6       76.82        26.0     23.20    99.00
## Top 5%          90.7       84.81        30.4     29.00    99.61
## Top 1%          92.5       87.95        36.1     43.34    99.87
##            Infant.Mortality
## Bottom 1%              12.8
## Bottom 5%              15.6
## Bottom 10%             16.4
## Bottom 25%             18.1
## Median                 20.0
## Top 25%                21.7
## Top 10%                23.7
## Top 5%                 24.5
## Top 1%                 25.8
```
]

---
# `apply()`

There is also `apply()` which works over matrices or data frames. You can apply the function to each row (`MARGIN = 1`) or column (`MARGIN = 2`).

.small[

```r
apply(swiss, MARGIN = 2, FUN = quantile_report)
```

Remember the loop for loading data files from last week?

```r
library(dplyr); library(readr)
file_list <- list.files("./example_data/")
file_paths <- paste0("./example_data/", file_list)
data_names <- stringr::str_remove(file_list, ".csv")
data_list <- vector("list", length(file_list))
names(data_list) <- data_names
for (i in seq_along(file_list)){
 data_list[[ data_names[i] ]] <- read_csv(file_paths[i])
} 
complete_data <- bind_rows(data_list)
head(complete_data, 3)
```

```
## # A tibble: 3 x 3
## id x z
## <dbl> <dbl> <dbl>
## 1 44 0.516 0.381
## 2 49 2.17 0.346
## 3 50 -0.122 0.711
```

---
# Data Loading with `lapply()`

Another way to load these files would be to... `lapply()` over the file names then bind the rows together. Faster and easier!

```r
complete_data <- lapply(file_paths, read_csv) %>%
 bind_rows()
head(complete_data, 3)
```

```
## # A tibble: 3 x 3
## id x z
## <dbl> <dbl> <dbl>
## 1 44 0.516 0.381
## 2 49 2.17 0.346
## 3 50 -0.122 0.711
```

---
# Data Loading with `vroom`

The fastest and easiest way is to use a fully vectorized data loading function, like `vroom::vroom()`!

```r
library(vroom)
complete_data <- vroom(file_paths)
head(complete_data, 3)
```

```
## # A tibble: 3 x 3
## id x z
## <dbl> <dbl> <dbl>
## 1 44 0.516 0.381
## 2 49 2.17 0.346
## 3 50 -0.122 0.711
```

Just give `vroom()` a vector of file locations and it determines their delimiter, loads them all (crazy fast), and binds them into one dataframe.

---
## From Loop to `apply()`

Converting code in a loop to an `apply` function is straightforward:

1. What you iterate over in the loop (e.g. `seq_along(x)`) becomes the first input.

2. The body of the loop becomes a function.

* This function should take only the iterator index (e.g. `i`) as an input.
   
3. Assign the output to what your loop stored values in.

---
# Loop vs. Apply

```r
loop_vec <- numeric(5) # Preallocation!
for(x in seq_along(loop_vec)){ # Change x to 1,2,3,4,5
 loop_vec[x] <- x^2 # Write x squared to loop_vec
}
loop_vec
```

```
## [1]  1  4  9 16 25
```

`seq_along(loop_vec)` is just `1:5`, but we need the empty `loop_vec` to store results.

```r
# No preallocation, just iterate over 1:5 and assign output!
apply_vec <- sapply(1:5, function(x){x^2})
apply_vec
```

```
## [1]  1  4  9 16 25
```

For apply functions, we don't need to prellocate, so we just `sapply()` over `1:5` directly.

---
class: inverse
## Back to Making and Using Functions!

---
# Example: Discretizing Continuous Data

Maybe you often want to bucket variables in your data into groups based on quantiles:

| Person | Income | Income Bucket |
|:------:|-------:|--------------:|
|    1   |   8000 |             1 |
|    2   | 103000 |             3 |
|    3   |  12000 |             1 |
|    4   |  52000 |             2 |
|    5   | 150000 |             3 |
|    6   |  45000 |             2 |

---
# Bucketing Function

There's already a function in R called `cut()` that does this, but you need to tell it breaks or the number of buckets.

Let's make a function that calls `cut()` using quantiles (`quants`) for splitting and returns integers:

```r
*bucket <- function(x, quants = c(0.333, 0.667)) {
 # set low extreme, quantile points, high extreme
* new_breaks <- c(min(x)-1, quantile(x, probs = quants), max(x)+1)
 # labels = FALSE will return integer codes instead of ranges
 return(cut(x, breaks = new_breaks, labels = FALSE))
}
```

By default this will produce *three buckets*:
1. Anything below 33.3rd percentile
2. Anthing from 33.3rd to 66.7th
3. Anything above 66.7th

.pull-right[
.footnote[
To capture all high/low values, we start with `min(x)-1` and end with `max(x)+1`.
]
]

---
# Trying Out `bucket()`

.smallish[

```r
dat <- rnorm(100)
dat_quants <- c(0.05, 0.25, 0.5, 0.75, 0.95)
bucketed_dat <- bucket(dat, quants = dat_quants)
plot(x = bucketed_dat, y = dat, main = "Buckets and values", pch = 16)
abline(h = quantile(dat, dat_quants), lty = "dashed", col = "red")
```

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-21-1.svg)
]

---
# Example: Removing Bad Data

Let's say we have data where impossible values occur:1

.smallish[

```r
(school_data <- 
 data.frame(school = letters[1:10],
 pr_passing_exam=c(0.78, 0.55, 0.91, -1, 0.88, 0.81, 0.90, 0.76, 99, 99),
 pr_free_lunch = c(0.33, 99, 0.25, 0.05, 0.12, 0.09, 0.22, -13, 0.21, 99)))
```

```
##    school pr_passing_exam pr_free_lunch
## 1       a            0.78          0.33
## 2       b            0.55         99.00
## 3       c            0.91          0.25
## 4       d           -1.00          0.05
## 5       e            0.88          0.12
## 6       f            0.81          0.09
## 7       g            0.90          0.22
## 8       h            0.76        -13.00
## 9       i           99.00          0.21
## 10      j           99.00         99.00
```
]

.footnote[[1] Different types of missing data are often coded this way in survey and administrative data sets.]

---
# Function to Remove Extreme Values

Goal:

* Input: a vector `x`, cutoff for `low`, cutoff for `high`
* Output: a vector with `NA` in the extreme places

```r
remove_extremes <- function(x, low, high) {
 x_no_low <- ifelse(x < low, NA, x)
 x_no_low_no_high <- ifelse(x_no_low > high, NA, x)
 return(x_no_low_no_high)
}
remove_extremes(school_data$pr_passing_exam, low = 0, high = 1)
```

```
##  [1] 0.78 0.55 0.91   NA 0.88 0.81 0.90 0.76   NA   NA
```

---
# `dplyr::across()`

The `dplyr` function `across()` allows us to a function to every variable (besides `school`) to update the columns in `school_data`:

.smallish[

```r
library(dplyr)
school_data %>%
   mutate(across(-school, ~ remove_extremes(x = ., low = 0, high = 1)))
```

```
##    school pr_passing_exam pr_free_lunch
## 1       a            0.78          0.33
## 2       b            0.55            NA
## 3       c            0.91          0.25
## 4       d              NA          0.05
## 5       e            0.88          0.12
## 6       f            0.81          0.09
## 7       g            0.90          0.22
## 8       h            0.76            NA
## 9       i              NA          0.21
## 10      j              NA            NA
```
]

---
# (Non-)Standard  Evaluation

`dplyr` uses what is called **non-standard evaluation** that lets you refer to "naked" variables (no quotes around them) like `school`.

`dplyr` verbs (like `mutate()`) recently started supporting **standard evaluation** allowing you to use quoted object names as well. This makes writing functions and loops with `dplyr` easier.

```r
swiss %>%
  select("Fertility", "Catholic") %>%
  head(2)
```

```
##            Fertility Catholic
## Courtelary      80.2     9.96
## Delemont        83.1    84.84
```

---
# Anonymous Functions in `dplyr`

You can skip naming your function in `dplyr` if you won't use it again. Code below will return the mean divided by the standard deviation for each variable in `swiss`:

.smallish[

```r
swiss %>%
    summarize(across(everything(), ~ mean(., na.rm=TRUE) / sd(., na.rm=TRUE)))
```

```
##   Fertility Agriculture Examination Education Catholic
## 1      5.62        2.23        2.07      1.14    0.987
##   Infant.Mortality
## 1             6.85
```
]

---
# Anonymous `lapply()`

Like with `dplyr`, you can use anonymous functions in `lapply()`1, but a difference is you'll need to have the `function()` part at the beginning:

.smallish[

```r
lapply(swiss, function(x) mean(x, na.rm = TRUE) / sd(x, na.rm = TRUE))
```
]

```
## $Fertility
## [1] 5.62
## 
## $Agriculture
## [1] 2.23
## 
## $Examination
## [1] 2.07
## 
## $Education
## [1] 1.14
## 
## $Catholic
## [1] 0.987
```

.pull-right[
.footnote[
[1] Note that `lapply()` produces a list as output. You could instead use `sapply()` to get a vector.
]
]

---
class: inverse
# Extended Example:

## `ggplot2` Templates

---
# Flexible `ggplot2`

Let's say you have a particular way you like your charts:

```r
library(gapminder); library(ggplot2)
ggplot(gapminder %>% filter(country == "Afghanistan"),
       aes(x = year, y = pop / 1000000)) +
       geom_line(color = "firebrick") +
       xlab(NULL) + ylab("Population (millions)") +
       ggtitle("Population of Afghanistan since 1952") +
       theme_minimal() + 
       theme(plot.title = element_text(hjust = 0, size = 20))
```

* How could we make this flexible for any country?
--

* How could we make this flexible for any `gapminder` variable?

---
# Example of Desired Chart

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-30-1.svg)

---
# Another Example

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-31-1.svg)

---
# Making Country Flexible

We can have the user input a character string for `cntry` as an argument to the function to get subsetting and the title right:

```r
gapminder_lifeplot <- function(cntry) {
* ggplot(gapminder %>% filter(country == cntry),
 aes(x = year, y = lifeExp)) +
 geom_line(color = "firebrick") +
 xlab(NULL) + ylab("Life expectancy") + theme_minimal() + 
* ggtitle(paste0("Life expectancy in ", cntry, " since 1952")) +
 theme(plot.title = element_text(hjust = 0, size = 20))
}
```

What `cntry` does:

* `filter()` to the specific value of `cntry`

* Add text value of `cntry` in `ggtitle()`

---
# Testing Plot Function

```r
gapminder_lifeplot(cntry = "Turkey")
```

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-33-1.svg)

---
# Testing Plot Function

```r
gapminder_lifeplot(cntry = "Rwanda")
```

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-34-1.svg)

---
# Making `y` Value Flexible

Now let's allow the user to say which variable they want on the y-axis. How we can get the right labels for the axis and title? We can use a named character vector to serve as a "lookup table" inside the function:

.smallish[

```r
y_axis_label <- c("lifeExp" = "Life expectancy",
 "pop" = "Population (millions)",
 "gdpPercap" = "GDP per capita, USD")
title_text <- c("lifeExp" = "Life expectancy in ",
 "pop" = "Population of ",
 "gdpPercap" = "GDP per capita in ")
# example use:
y_axis_label["pop"]
```

```
##                     pop 
## "Population (millions)"
```

```r
title_text["pop"]
```

```
##              pop 
## "Population of "
```
]

---
# `aes_string()`

`ggplot()` is usually looking for "naked" variables, but we can tell it to take them as quoted strings (standard evaluation) using `aes_string()` instead of `aes()`, which is handy when making functions:

```r
gapminder_plot <- function(cntry, yvar) {
 y_axis_label <- c("lifeExp" = "Life expectancy",
 "pop" = "Population (millions)",
* "gdpPercap" = "GDP per capita, USD")[yvar]
 title_text <- c("lifeExp" = "Life expectancy in ",
 "pop" = "Population of ",
* "gdpPercap" = "GDP per capita in ")[yvar]
* ggplot(gapminder %>% filter(country == cntry) %>%
 mutate(pop = pop / 1000000),
* aes_string(x = "year", y = yvar)) +
 geom_line(color = "firebrick") + 
* ggtitle(paste0(title_text, cntry, " since 1952")) +
 xlab(NULL) + ylab(y_axis_label) + theme_minimal() +
 theme(plot.title = element_text(hjust = 0, size = 20))
}
```

---
# Testing `gapminder_plot()`

```r
gapminder_plot(cntry = "Turkey", yvar = "pop")
```

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-37-1.svg)

---
# Testing `gapminder_plot()`

```r
gapminder_plot(cntry = "Rwanda", yvar = "gdpPercap")
```

![](CSSS508_Week7_vectorization_files/figure-html/unnamed-chunk-38-1.svg)

---
class: inverse
# Making an Operator

---
# Opposite of `%in%`

`%in%` returns `TRUE` where elements on its left equal any element on the right.

.smallish[

```r
us_ca <- c("Canada", "United States")
gapminder %>% filter(country %in% us_ca) %>% distinct(country) %>% head(2)
```

```
## # A tibble: 2 x 1
## country 
## <fct> 
## 1 Canada 
## 2 United States
```
]

We can invert this to get the opposite, but it looks a bit awkward:

.smallish[

```r
gapminder %>% filter(!country %in% us_ca) %>% distinct(country) %>% head(2)
```

```
## # A tibble: 2 x 1
## country 
## <fct> 
## 1 Afghanistan
## 2 Albania
```
]

---
# `%!in%`

We can *invert* or **negate**1 `%in%` to get a "not in" operator:

```r
`%!in%` <- Negate(`%in%`)
```

To make a new operator, you need to put it in backticks.

```r
gapminder %>% 
* filter(country %!in% us_ca) %>% # Our new operator!
  distinct(country) %>% 
  head(2)
```

```
## # A tibble: 2 x 1
## country 
## <fct> 
## 1 Afghanistan
## 2 Albania
```

.footnote[[1] `Negate()` produces logical negations of *functions*, inverting their output. e.g.: `isnt.numeric <- Negate(is.numeric)` ]

---
class: inverse
# Wrapping Up

---
# Debugging

Something not working as hoped? Try using `debug()` on a function, which will show you the world as perceived from inside the function:

```r
debug(gapminder_plot)
```

Then when you've fixed your problem, use `undebug()` so that you won't go into debug mode every time you run it:

```r
undebug(gapminder_plot)
```

---
# Overview: The Process

Data processing can be very complicated, with many valid ways of accomplishing it.

I believe the best general approach is the following:
--

1. Look carefully at the **starting data** to figure out what you can get from them.
--

2. Determine *precisely* what you want the **end product** to look like.
--

3. Identify individual steps needed to go from Step 1 to Step 2.
--

4. Make each discrete step its own set of functions or function calls.
   + If any step is confusing or complicated, **break it into more steps**.
--

5. Complete each step *separately and in order*.
   + Do not continue until a step is producing what you need for the next step.
   + **Do not worry about combining steps for efficiency until everything works**.

Once finished, if you need to do this again, *convert the prior steps into functions*!

---
# Bonus Function

My lectures are rendered with a function!

.smallish[

```r
render_and_print_slides <- function(week){
 week_dir <- paste0(getwd(), "/Lectures/", "Week", week, "/")
 current_rmd <- paste0(week_dir, stringr::str_subset(list.files(week_dir),
 "^CSSS508_Week.*\\.Rmd$"))
 rmarkdown::render(current_rmd, encoding = "UTF-8")
 current_html <- stringr::str_replace(current_rmd, "\\.Rmd", "\\.html")
 new_pdf_file <- stringr::str_replace(current_html, "\\.html", "\\.pdf")
 new_r_script <- stringr::str_replace(current_html, "\\.html", "\\.R")
 message("Slides rendered, waiting 5 seconds.")
 Sys.sleep(5)
 message("Purling slides.")
 knitr::purl(input = current_rmd, output = new_r_script, documentation = 0)
 message("Printing from Chrome.")
 pagedown::chrome_print(current_html, format="pdf")
 message(paste0("Printing complete at ", week_dir))
}
```
]

I give it a numeric week and it (1) finds the lecture `.Rmd`, (2) knits the slides, (3) creates a `.R` file, (4) then opens the slides in Chrome and prints a PDF.

---
class: inverse
# Homework

[Download](https://s3.amazonaws.com/pronto-data/open_data_year_one.zip) and analyze data from the first year of Seattle's Pronto! bike sharing program.

Using the provided template, you will write:

1. A loop (or `lapply()`) to read in the data from multiple files.
   * Don't just use `vroom()`!
2. Functions to clean up the data 
3. A function to visualize ridership over the first year.

There is some string processing needed—much of which you have already seen or can probably Google—but *some will come in the next lecture*. I give suggestions in the template, but I can cover string processing in detail in lab if needed before the homework is due.

### PART 1 DUE: Next week
### PART 2 DUE: In two weeks