Instructions

For this assignment you will practice creating, examining, and combining different data structures in R. This assignment differs from the others in that it takes a worksheet format with built-in error checking. Each time you complete an answer, knitting the document will check it (but don’t replace the “NULL” on an answer before you want it checked, or you’ll get an error!). For this reason, you should re-knit every time you answer a question, so that if something goes wrong, you know the last edit caused the issue!

You will need to do the following:

  1. Fill in code as needed in the designated locations in code chunks.
    • You will not need to create new code chunks or edit chunk options.
    • Instructions in chunks will be shown as comments (after #); remove and replace the comments with your code.
    • Where you are assigning things to objects, replace the “NULL” values with your code. See code example 1 for an example!
    • You may need to add elements, such as for subsetting, to objects on the left side of operators. See code example 2 for an example!
    • Some answers may require multiple lines of code, so add as necessary.
    • Do not modify chunks labeled with “# DO NOT EDIT”; these are used to test your code.
  2. Fill in text as block quotes (use >) in designated locations to answer questions.
    • Be brief and specific. If you reference a single value or short (<10 element) vector, use in-line R code. If the question can be answered clearly and accurately without referencing any values, you can do so. When I bold a word in a question, take it to mean I want you to reference a value or values.
    • Do not directly reference matrices, lists, or data frames in text (don’t display them). They can be big!
    • You should not need to include any code chunks in this section, but you can if you need.

Example for Code Questions:

# Ex 1) Create a vector of the numbers 1, 5, 3, 2, 4.
example <- NULL
# You would write this:
example <- c(1,5,3,2,4)

# Ex 2) Add the number 10 to the sixth position of example2
example2 <- example
example2 <- NULL

# You would write:
example2[6] <- 10

Example for Text Questions: 1) “How do atomic vectors differ from lists?”

An atomic vector is a one-dimensional object where every element is the same type of data. Lists are also one-dimensional, but their elements can be different data types.

Problems

1: Coding with (Atomic) Vectors

# 1) Use "seq()" to create a vector of 10 numbers evenly spaced from 0 to 15.
vec_num <-  seq(0, 15, length.out=10)

# 2) Use ":" to create an integer vector of the numbers 11 through 20.
vec_int <- 11:20

# 3) Use "LETTERS" and "[ ]" to create a vector of the last 10 capital letters.
vec_cha <- LETTERS[17:26]

# 4) Use letters and "[ ]" to create a factor variable using the first ten lower case letters.
vec_fac <- factor(letters[1:10])

# 5) Use "c()" to combine "vec_cha" and "vec_fac" into "vec_let". Do not convert it to a factor!
vec_let <- c(vec_cha, vec_fac)

# 6) Use "c()" and "[ ]" to combine the first 4 elements of "vec_num" with the last
# 4 elements of "vec_int" to create "vec_ni".
vec_ni <- c(vec_num[1:4], vec_int[7:10])

# 7) Use "rev()" to reverse the order of "vec_fac".
fac_vec <- rev(vec_fac)

How’d you do?

Good job on vec_num!
Good job on vec_int!
Good job on vec_cha!
Good job on vec_fac!
Good job on vec_let!
Good job on vec_ni!
Good job on fac_vec!

Questions about Vectors:

  1. If you used c() to combine vec_int with vec_fac, what class of vector would you get? Why?

You would get an integer vector. A factor is stored internally as integer codes with a levels attribute; because vec_int comes first, c() drops the factor’s attributes and keeps only the codes, so the result is vec_int followed by the codes 1 through 10.
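
As a minimal sketch of this answer, the chunk below rebuilds the two vectors under hypothetical names (int_part and fac_part stand in for vec_int and vec_fac) and inspects the result of c(). It assumes the integer vector is passed first, as in the question.

```r
# Hypothetical stand-ins for vec_int and vec_fac
int_part <- 11:20
fac_part <- factor(letters[1:10])

# A factor is stored as integer codes plus a "levels" attribute
typeof(fac_part)

# With a plain integer vector as the first argument, c() drops the
# factor's attributes and keeps only its integer codes
combined <- c(int_part, fac_part)
class(combined)
combined
```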

  2. Consider this code:
new_vec <- c(TRUE, FALSE, TRUE, TRUE) 
  • What class of vector is this?
  • In words, what happens to its values when you try to convert it to numeric? To character? To numeric and then character?
  • What about to character and then numeric?

This is a logical vector. If you convert it to numeric, TRUE becomes 1 and FALSE becomes 0. If you convert it to character, you get the strings "TRUE" and "FALSE". If you convert to numeric and then character, you get "1" and "0" as character data. If you convert to character and then numeric, every value becomes NA (with a warning), because the strings "TRUE" and "FALSE" cannot be parsed as numbers.
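
The four conversions can be checked directly in the console. This is a small sketch; the result names (as_num, as_chr, and so on) are illustrative only, and suppressWarnings() is used just to keep the coercion warning out of the output.

```r
new_vec <- c(TRUE, FALSE, TRUE, TRUE)

as_num <- as.numeric(new_vec)           # 1 0 1 1
as_chr <- as.character(new_vec)         # "TRUE" "FALSE" "TRUE" "TRUE"
num_then_chr <- as.character(as_num)    # "1" "0" "1" "1"

# The strings "TRUE" and "FALSE" are not numbers, so each becomes NA
# (R also issues a coercion warning, suppressed here)
chr_then_num <- suppressWarnings(as.numeric(as_chr))

class(new_vec)
chr_then_num
```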

2: Coding with Matrices

# 1) Use matrix() to create a matrix with 10 rows and four columns filled with NA
mat_empty <- matrix(NA, nrow=10, ncol=4)

# 2) Assign "vec_num" to the first column of "mat_1" below.
mat_1 <- mat_empty # DO NOT EDIT THIS LINE; add code below it.
mat_1[,1] <- vec_num 

# 3) Assign "vec_int" to the second column of "mat_2" below
mat_2 <- mat_1 # DO NOT EDIT THIS LINE; add code below it.
mat_2[,2] <- vec_int

# 4) Assign "vec_cha" and "vec_fac" to the third and fourth columns of "mat_3" using one assignment operator.
mat_3 <- mat_2 # DO NOT EDIT THIS LINE; add code below it.
mat_3[,c(3,4)] <- cbind(vec_cha, vec_fac)

# 5) Select the fourth row from "mat_3" and assign it to the object "row_4" as a vector.
row_4 <- mat_3[4,]

# 6) Assign the element in the 6th row and 2nd column of "mat_3" to "val_6_2" as a numeric value.
val_6_2 <- as.numeric(mat_3[6,2])

# 7) Use "cbind()" to combine "vec_num", "vec_int", "vec_cha", and "vec_fac" into "mat_4".
mat_4 <- cbind(vec_num, vec_int, vec_cha, vec_fac)

# 8) Next, first transpose mat_4, then select only the first four columns and assign to mat_t
mat_t <- t(mat_4)[,1:4]

# 9)  Then use rbind() to add the rows from mat_3 to this (mat_t first, mat_3 second) and assign this combination to mat_big.
mat_big <- rbind(mat_t, mat_3)

How’d you do?

Good job on mat_empty!
Good job on mat_1!
Good job on mat_2!
Good job on mat_3!
Good job on row_4!
Good job on val_6_2!
Good job on mat_4!
Good job on mat_t!
Good job on mat_big!

Questions about Matrices:

  1. Note the column names on mat_4. What do you get when you try to get names() from mat_4? What about colnames()? What about rownames()? Can you guess why you get all these results?

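names() returns NULL, colnames() returns the four vector names that cbind() assigned to the columns, and rownames() returns NULL because no row names were set. names() extracts the names of elements from a vector or list (a data frame being a type of list), so it assumes a single dimension. A matrix has two dimensions, each of which can have its own names, which is why colnames() and rownames() exist.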
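
A small sketch of the three name functions on a matrix built with cbind(). The vectors a and b and the matrix m are hypothetical stand-ins, not worksheet objects.

```r
# Hypothetical vectors; cbind() uses the argument names as column names
a <- 1:3
b <- 4:6
m <- cbind(a, b)

names(m)      # NULL: a matrix has no element-level names here
colnames(m)   # "a" "b"
rownames(m)   # NULL: no row names were assigned
dimnames(m)   # list holding the row names (NULL) and column names
```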

  2. Consider the code below:
mat_letters <- matrix(letters, ncol=2) 
  • This matrix goes from "a" to "m" in the first column and "n" to "z" in the second. What would be an easy way to make the matrix go in alphabetical order left to right, top to bottom?
mat_letters <- matrix(letters, ncol=2, byrow=TRUE) 
  3. Consider the code below:
math_mat <- matrix(1:5, nrow=5, ncol=5)
math_vec <- 1:5
  • Without showing your code (use the console), look at math_mat and math_vec. When you add math_mat + math_vec, what happens?
  • Without showing your code (use the console), what is the difference between the results from math_mat %*% math_vec and from math_mat * math_vec. Can you tell what is happening?

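In the first case, you get element-by-element addition: math_vec is recycled down each column, so every element in row i becomes i + i. For the second, math_mat %*% math_vec performs matrix multiplication and returns a 5 x 1 matrix (row i is 15 times i), while math_mat * math_vec performs element-by-element multiplication with the same recycling and returns a 5 x 5 matrix (every element in row i is i squared).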
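
The three operations can be compared side by side. This sketch reuses math_mat and math_vec from the question; the result names (sum_res, elem_res, prod_res) are illustrative.

```r
math_mat <- matrix(1:5, nrow = 5, ncol = 5)  # every element in row i equals i
math_vec <- 1:5

# Element-by-element operations: math_vec is recycled down each column
sum_res  <- math_mat + math_vec   # element [i, j] is i + i
elem_res <- math_mat * math_vec   # element [i, j] is i * i

# Matrix multiplication: (5 x 5) times (5 x 1) gives a 5 x 1 matrix
prod_res <- math_mat %*% math_vec # row i is i * (1 + 2 + 3 + 4 + 5)

dim(sum_res)
dim(elem_res)
dim(prod_res)
```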

3: Lists

# 1) Use "list()" to create a list that contains "vec_num" and "row_4", and assign the names
#   "vec_num" and "row_4" to these two elements of "list_1".
list_1 <- list(vec_num=vec_num, row_4=row_4)

# 2) Using "$", extract "row_4" from "list_1" and assign it to the object "row_4_2".
row_4_2 <- list_1$row_4

# 3) Create another list that contains "val_6_2" and "mat_big".
list_2 <- list(val_6_2, mat_big)

# 4) Combine list_1 and list_2 together using "c()" and assign them to "list_3"
list_3 <- c(list_1, list_2)

# 5) Use "unlist()" to turn "list_3" into a vector and assign it to "vector_3"
vector_3 <- unlist(list_3)

# 6) Use "as.list()" to convert "vector_3" into a list and assign it to "list_big"
list_big <- as.list(vector_3)

# 7) Now copy "list_3" as "list_4" and use "[[ ]]" to assign "list_3" as the last (fifth) element of "list_4";
# that is, you should have a list object with five elements named "list_4" that contains the same four
# elements as "list_3" plus a fifth element that is -all- four elements of "list_3" as one object.
list_4 <- list_3
list_4[[5]] <- list_3

# 8) Select the third element (that is, the sub-element) of the fifth element of "list_4" and assign it
# to element_5_3 using "[[ ]]".
element_5_3 <- list_4[[5]][[3]]

# 9) Lastly, repeat the previous assignment of the third element of the fifth element, but
# extract the element as a list rather than scalar using "[ ]" and assign it to "list_5_3".
list_5_3 <- list_4[[5]][3]

How’d you do?

Good job on list_1!
Good job on row_4_2!
Good job on list_2!
Good job on list_3!
Good job on vector_3!
Good job on list_big!
Good job on list_4!
Good job on element_5_3!
Good job on list_5_3!
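
The difference between "[[ ]]" and "[ ]" in items 8 and 9 can be sketched on a small nested list. The objects inner and outer are hypothetical and only mirror the shape of list_4.

```r
# Hypothetical nested list: the second element of outer is itself a list
inner <- list(a = 1:3, b = "text", c = 42)
outer <- list(first = TRUE, nested = inner)

# "[[ ]]" returns the element itself
elem <- outer[[2]][[3]]
class(elem)    # "numeric"

# "[ ]" returns a list of length one that still holds the element
sub <- outer[[2]][3]
class(sub)     # "list"
names(sub)     # "c"
```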

Questions About Lists:

Many functions in R produce lists as output because they produce objects with different types of data and of different lengths. For instance, consider the linear regression saved to lm.output below. Don’t worry if you are not familiar with regression; we’re just concerned with what the function produces!

lm.output <- lm(mpg ~ wt, data=mtcars)
lm.output
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt  
##      37.285       -5.344
  1. How many elements does this call to lm() produce?

This call produces 12 elements.

  2. What are the dimensions of the model element?

lm.output$model has 32 rows and 2 columns.

  3. What are the values of the coefficients for the intercept and wt? Remember to call on them from lm.output object for your answer!

The intercept is 37.29 and the coefficient on wt is -5.34.
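
One way to pull these three answers from the fitted object rather than typing them by hand is sketched below. It uses only base R and the built-in mtcars data; coef() is one of several ways to reach the coefficients (lm.output$coefficients is another).

```r
lm.output <- lm(mpg ~ wt, data = mtcars)

# Number of top-level elements in the list that lm() returns
length(lm.output)
names(lm.output)

# Dimensions of the model frame stored in the "model" element
dim(lm.output$model)

# Coefficients, rounded to two decimals for reporting
round(coef(lm.output), 2)
```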

4: Coding with Data Frames

# 1) Use "data.frame()" to combine "vec_num" (first column) and "vec_int" (second column) into "df_1".
df_1 <- data.frame(vec_num, vec_int)

# 2) Use "$" to extract "vec_num" from "df_1", reverse it with "rev()", and assign it as the vector "vec_num_2".
vec_num_2 <- rev(df_1$vec_num)

# 3) Use "$" to add "vec_num_2" to "df_2" as a new column with the name "number_vector".
df_2 <- df_1 # DO NOT EDIT THIS LINE; add code below it.
df_2$number_vector <- vec_num_2

# 4) Combine "df_2" with itself using "rbind()" to create "df_3"
df_3 <- rbind(df_2, df_2)

# 5) Create a new data frame "df_4" using "data.frame()" that contains the following named columns (in order):
    # "y" that contains 20 numbers evenly spaced from 31 to 125
    # "x" that has 20 numbers between 0 and 10 generated using "runif()" (get help with ?runif)
    # "color" that consists of 20 values sampled from "col_vec" below using "sample()".
col_vec <- colors() # DO NOT EDIT THIS LINE; add code below it.
df_4 <- data.frame(y=seq(31,125,length.out=20),
                   x=runif(20,0,10),
                   color=sample(col_vec, 20))

    # This code here should produce a plot of those values with those colors!
if(is.null(df_4)==FALSE) df_4 %>% ggplot(aes(x, y, color=color)) + geom_point() + theme(legend.position="none")

# 5) Use "cbind()" to combine "df_4" and "df_2" into "df_5".
df_5 <- cbind(df_4, df_2)

# 6) Now use "data.frame()" to combine "df_4" with "df_2"
df_6 <- data.frame(df_4, df_2)

How’d you do?

Good job on df_1!
Good job on vec_num_2!
Good job on df_2!
Good job on df_3!
Oh no, check your code for "df_4"
- You seem to have an incorrect or misplaced value somewhere!
Good job on df_5!
Good job on df_6!

Questions About Data Frames:

  1. First, let’s add vec_cha to df_1 to make a new data frame and then use lapply(df_cha, class) to get a list that indicates the class of each column in df_cha (you’ll need to run this yourself in the console or turn eval=TRUE):
df_cha <- data.frame(df_1, vec_cha)
lapply(df_cha, class)

Do you see anything unusual about df_cha$vec_cha, say if we get class(vec_cha)? Get help on data frames with ?data.frame and explain in a sentence what you could do to modify this behavior.

In versions of R before 4.0.0, data.frame() by default coerces character data into factors (because it assumes you want to use these data for a model). You can prevent this by setting the data.frame() option stringsAsFactors=FALSE, which has been the default since R 4.0.0.
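
Because the default for stringsAsFactors depends on the R version, setting it explicitly makes the behavior predictable. A minimal sketch with a hypothetical character vector chars:

```r
# Hypothetical character vector standing in for vec_cha
chars <- c("x", "y", "z")

# Explicitly keep character data as character
df_chr <- data.frame(n = 1:3, chars, stringsAsFactors = FALSE)

# Explicitly convert character data to factors
df_fac <- data.frame(n = 1:3, chars, stringsAsFactors = TRUE)

# Class of each column under each setting
lapply(df_chr, class)
lapply(df_fac, class)
```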

  2. Use names(), colnames(), and rownames() on df_1. How does this compare to the behavior of these functions on lists and matrices?

Because a data frame is a list but is treated as having two dimensions like a matrix, all functions to extract names work.

  3. Similarly, how do the results of length() and dim() differ between data frames, lists, matrices, and vectors?

length() gives you the number of columns in a data frame, the number of elements in a list or vector, and the total number of elements (rows times columns) in a matrix. dim() gives you the number of rows and columns for both data frames and matrices but returns NULL for vectors and lists, which are one-dimensional.
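
A quick console check of length() and dim() on the four structures. The objects v, l, m, and d are small hypothetical examples, not worksheet objects.

```r
v <- 1:6                                # atomic vector
l <- list(1, "a", TRUE)                 # list
m <- matrix(1:20, nrow = 10, ncol = 2)  # matrix
d <- data.frame(a = 1:10, b = 11:20)    # data frame

length(v)  # 6: number of elements
length(l)  # 3: number of elements
length(m)  # 20: rows times columns
length(d)  # 2: number of columns

dim(v)     # NULL
dim(l)     # NULL
dim(m)     # 10 2
dim(d)     # 10 2
```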

  4. Using the console, use == to test whether cbind() and data.frame() in questions 5 and 6 (df_5 and df_6) produce the same result. Optional: If you want to make this cleaner, integrate either the all() or any() function, which could allow you to show this result in-line in your answer.

The results are identical, that is, a test of simultaneous equality of elements produces the value TRUE.
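
The comparison can be sketched with two small hypothetical data frames (left and right stand in for df_4 and df_2). Wrapping the == test in all() collapses the element-wise matrix of comparisons into a single value.

```r
# Hypothetical stand-ins for df_4 and df_2
left  <- data.frame(y = 1:3, x = c(0.5, 1.5, 2.5))
right <- data.frame(vec_num = c(0, 5, 10), vec_int = 11:13)

via_cbind <- cbind(left, right)
via_df    <- data.frame(left, right)

# "==" compares element by element; all() reduces that to one logical
same <- all(via_cbind == via_df)
same

# identical() additionally compares names and other attributes
identical(via_cbind, via_df)
```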

  5. As a final question, which won’t be graded strictly, use between three sentences and a paragraph to describe the differences and similarities between (this is mainly to help you think them through!):
    • Vectors and lists.
    • Data frames and matrices
    • Lists and data frames.

Vectors and lists are one-dimensional data types. They can be combined using c(), names can be obtained with names(), and length() can be used to get their size. Lists, however, can have different data types for each element while vectors have the same data type in every element.

Data frames and matrices both have rows and columns, allowing them to be indexed with [rows, columns], and names can be obtained with rownames() and colnames(). Matrices, however, can contain only one data type. On a matrix, names() returns NULL and length() returns the total number of elements, because a matrix is a vector with a dim attribute; on a data frame, names() returns the column names and length() returns the number of columns, because data frames are actually lists.

Data frames are a type of list where all elements of the data frame must have the same length; this allows them to be treated as columns in a two-dimensional structure. You can access elements of a data frame and list with [[ ]], $, or by names. Unlike lists, data frames permit you to index them with [row, column] as well. Most commands that work on lists work on data frames, but many data frame functions do not work on lists. Unlike lists, data frames give a result for ncol, nrow, rownames, colnames, and dim. You can also apply t() directly to a data frame (the result is a matrix), but %*% requires converting the data frame with as.matrix() first, and the data must all be numeric.
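
A short sketch of how t() and %*% behave on a data frame, using a hypothetical all-numeric data frame df. It assumes base R only; in the R versions this was written against, %*% rejects a data frame directly, so the chunk converts with as.matrix() before multiplying.

```r
# Hypothetical all-numeric data frame
df <- data.frame(a = 1:3, b = 4:6)

# A data frame is a list with a two-dimensional interface
is.list(df)
dim(df)

# t() accepts a data frame and returns a matrix
t_df <- t(df)
is.matrix(t_df)

# %*% on the data frame itself raises an error, captured here with try()
direct <- try(df %*% c(1, 1), silent = TRUE)
inherits(direct, "try-error")

# Converting to a matrix first makes matrix multiplication work
m <- as.matrix(df)
cross <- t(m) %*% m   # 2 x 2 matrix of column cross-products
cross
```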