```{r setup, include=FALSE, purl=FALSE}
options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(comment = "##")
```
```{r xaringan-themer, include = FALSE, purl=FALSE}
library(xaringanthemer)
source("../csss508css.R")
```
```{r, include=FALSE}
library(tidyverse)
library(pander)
library(knitr)
`%!in%` <- Negate(`%in%`)
hook_output = knit_hooks$get('output')
knit_hooks$set(output = function(x, options) {
if (!is.null(n <- options$out.lines)) {
x = unlist(stringr::str_split(x, '\n'))
if (length(x) > n) {
# truncate the output
x = c(head(x, n), '....\n')
}
x = paste(x, collapse = '\n') # paste first n lines together
}
hook_output(x, options)
})
opts_chunk$set(out.lines = 20)
ex_dat <- data.frame(num1 = rnorm(200, 1, 2),
fac1 = sample(c(1, 2, 3), 200, TRUE),
num2 = rnorm(200, 0, 3),
fac2 = sample(c(1, 2))) %>%
mutate(yn = num1*0.5 + fac1*1.1 + num2*0.7 + fac2-1.5 + rnorm(200, 0, 2)) %>%
mutate(yb = as.numeric(yn > mean(yn))) %>%
mutate(fac1 = factor(fac1, labels=c("A", "B", "C")),
fac2 = factor(fac2, labels=c("Yes", "No")))
```
# Topics for Today
Displaying Model Results
* `broom`
+ Turning model output lists into dataframes
+ Summarizing models
* `ggeffects`
+ Creating counterfactual estimates
+ Plotting marginal effects
* Manual counterfactual plots
* Making regression tables
+ Using `pander` for models
+ Using `sjTable()` in `sjPlot`
* Wrapping up the course
---
class: inverse
# `broom`
---
# `broom`
`broom` is a package that "tidies up" the output from models such a `lm()` and `glm()`.
It has a small number of key functions:
* `tidy()` - Creates a dataframe summary of a model.
* `augment()` - Adds columns—such as fitted values—to the data used in the model.
* `glance()` - Provides one row of fit statistics for models.
```{r}
library(broom)
```
---
# Model Output is a List
`lm()` and `summary()` produce lists as output, which cannot go directly into
tidyverse functions, particularly those in `ggplot2`.
.smaller[
```{r}
lm_1 <- lm(yn ~ num1 + fac1, data = ex_dat)
summary(lm_1)
```
]
---
# Model Output Varies!
.smallish[
Each type of model also produces somewhat different output, so you can't just reuse
the same code to handle output from every model.
]
.smaller[
```{r, op}
glm_1 <- glm(yb ~ num1 + fac1, data = ex_dat, family=binomial(link="logit"))
summary(glm_1)
```
]
---
# `broom::tidy()`
`tidy()` produces the similar output, but as a dataframe.
```{r}
lm_1 %>% tidy()
```
Each type of model (e.g. `glm`, `lmer`) has a different *method* with its own additional arguments. See `?tidy.lm` for an example.
---
# `broom::tidy()`
This output is also completely identical between different models.
This can be very
useful and important if running models with different test statistics... or just running
a lot of models!
```{r}
glm_1 %>% tidy()
```
---
# `broom::glance()`
`glance()` produces dataframes of fit statistics for models.
If you run many models,
you can compare each model row-by-row in each column... or even plot their different
fit statistics to allow holistic comparison.
.small[
```{r}
glance(lm_1)
```
]
---
# `broom augment()`
`augment()` takes values generated by a model and adds them back to the original data.
This includes fitted values, residuals, and leverage statistics.
.small[
```{r}
augment(lm_1) %>% head()
```
]
---
# The Power of `broom`
The real advantage of `broom` becomes apparent when running many models at once. Here we run separate models for each level of `fac1`:
.small[
```{r}
ex_dat %>% group_by(fac1) %>% do(tidy(lm(yn ~ num1 + fac2 + num2, data = .))) #<<
```
]
.footnote[`do()` repeats whatever is inside it once for each level of the variable(s) in `group_by()` then puts them together as a data frame.]
---
class: inverse
# Plotting Model Results
---
# `geom_smooth()`
I have used `geom_smooth()` in many past examples.
`geom_smooth()` generates "smoothed conditional means" including loess curves and generalized additive models (GAMs).
--
Note, however, that most regression models are conditional mean models, such as ordinary least squares, generalized linear models.
--
We can use `geom_smooth()` to add a layer depicting common bivariate models.
We'll look at this with the `gapminder` data from Week 2.
```{r}
library(gapminder)
```
---
# Default `geom_smooth()`
```{r, fig.height = 2.75, fig.width = 7, dev="svg", warning=FALSE, message=FALSE}
ggplot(data = gapminder,
aes(x = year, y = lifeExp, color = continent)) +
geom_point(position = position_jitter(1,0), size = 0.5) +
geom_smooth()
```
By default, `geom_smooth()` chooses either a loess smoother (N < 1000) or a GAM depending on the number of observations.
---
# Linear `glm`
```{r, fig.height = 2.75, fig.width = 7, dev="svg", warning=FALSE, message=FALSE}
ggplot(data = gapminder,
aes(x = year, y = lifeExp, color = continent)) +
geom_point(position = position_jitter(1,0), size = 0.5) +
geom_smooth(method = "glm", formula = y ~ x) #<<
```
We could also fit a standard linear model using either `method = "glm"` or `method = "lm"` and a formula like `y ~ x`.
---
# Polynomial `glm`
```{r, fig.height = 2.75, fig.width = 7, dev="svg", warning=FALSE, message=FALSE}
ggplot(data = gapminder,
aes(x = year, y = lifeExp, color = continent)) +
geom_point(position = position_jitter(1,0), size = 0.5) +
geom_smooth(method = "glm", formula = y ~ poly(x, 2)) #<<
```
`poly(x, 2)` produces a quadratic model which contains a linear term (`x`) and a quadratic term (`x^2`).
---
# More Complex Models
What if we want something more complex than a bivariate model?
What if we have a statistically complex model, like nonlinear probability model or multilevel model?
We need to go beyond `geom_smooth()`!
---
# But first, vocab!
We are often interested in what might happen if some variables take particular values, often ones not seen in the actual data.
--
When we set variables to certain values, we refer to them as **counterfactual values** or just **counterfactuals**.
--
For example, if we know nothing about a new observation, our prediction for that estimate is often based on assuming every variable is at its mean.
--
Sometimes, however, we might have very specific questions which require setting (possibly many) combinations of variables to particular values and making an estimate or prediction.
--
Providing specific estimates, conditional on values of covariates, is a nice way to summarize results, particularly for models with unintuitive parameters (e.g. logit models).
---
class: inverse
# `ggeffects`
---
# `ggeffects`
If we want to look at more complex models, we can use `ggeffects` to create and plot tidy
*marginal effects*.
That is, tidy dataframes of *ranges* of predicted values that can be
fed straight into `ggplot2` for plotting model results.
We will focus on two `ggeffects` functions:
* `ggpredict()` - Computes predicted values for the outcome variable at margins of specific variables.
* `plot.ggeffects()` - A plot method for `ggeffects` objects (like `ggredict()` output)
```{r}
library(ggeffects)
```
---
# Quick Simulated Data
To best show off `ggeffects`, I need a data frame with numeric and categorical variables with strong relationships. It is easiest to just simulate it:
```{r}
ex_dat <- data.frame(num1 = rnorm(200, 1, 2),
fac1 = sample(c(1, 2, 3), 200, TRUE),
num2 = rnorm(200, 0, 3),
fac2 = sample(c(1, 2))) %>%
mutate(yn = num1 * 0.5 + fac1 * 1.1 + num2 * 0.7 +
fac2 - 1.5 + rnorm(200, 0, 2)) %>%
mutate(yb = as.numeric(yn > mean(yn))) %>%
mutate(fac1 = factor(fac1, labels = c("A", "B", "C")),
fac2 = factor(fac2, labels = c("Yes", "No")))
```
Now we can get `ggpredict`ing!
---
# `ggpredict()`
When you run `ggpredict()`, it produces a dataframe with a row for every unique
value of a supplied predictor ("independent") variable (`term`).
Each row contains an expected (estimated) value for the outcome ("dependent") variable, plus confidence intervals.
```{r}
lm_1 <- lm(yn ~ num1 + fac1, data = ex_dat)
lm_1_est <- ggpredict(lm_1, terms = "num1")
```
If desired, the argument `interval="prediction"` will give predicted intervals instead.
---
#`ggpredict()` output
.smallish[
```{r}
lm_1_est
```
]
---
# `plot()` for `ggpredict()`
`ggeffects` features a `plot()` *method*, `plot.ggeffects()`, which produces
a ggplot when you give `plot()` output from `ggpredict()`.
.small[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=4}
plot(lm_1_est)
```
]
---
# Grouping with `ggpredict()`
When using a vector of `terms`, `ggeffects` will plot the first along the x-axis and use
others for *grouping*. Note we can pipe a model into `ggpredict()`!
.small[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=3.5}
glm(yb ~ num1 + fac1 + num2 + fac2, data = ex_dat, family=binomial(link = "logit")) %>%
ggpredict(terms = c("num1", "fac1")) %>% plot()
```
]
---
# Faceting with `ggpredict()`
You can add `facet=TRUE` to the `plot()` call to facet over *grouping terms*.
.small[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=3.5}
glm(yb ~ num1 + fac1 + num2 + fac2, data = ex_dat, family = binomial(link = "logit")) %>%
ggpredict(terms = c("num1", "fac1")) %>% plot(facet=TRUE)
```
]
---
# Counterfactual Values
You can add values in square brackets in the `terms=` argument to specify counterfactual values.
.small[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=3.5}
glm(yb ~ num1 + fac1 + num2 + fac2, data=ex_dat, family=binomial(link="logit")) %>%
ggpredict(terms = c("num1 [-1,0,1]", "fac1 [A,B]")) %>% plot(facet=TRUE)
```
]
---
# Representative Values
You can also use `[meansd]` or `[minmax]` to set representative values.
.small[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=3.5}
glm(yb ~ num1 + fac1 + num2 + fac2, data = ex_dat, family = binomial(link = "logit")) %>%
ggpredict(terms = c("num1 [meansd]", "num2 [minmax]")) %>% plot(facet=TRUE)
```
]
---
# Dot plots with `ggpredict()`
`ggpredict` will produce dot plots with error bars for categorical predictors.
.small[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=3.5}
lm(yn ~ fac1 + fac2, data = ex_dat) %>%
ggpredict(terms=c("fac1", "fac2")) %>% plot()
```
]
---
# Notes on `ggeffects`
There is a lot more to the `ggeffects` package that you can see in [the package vignette](https://cran.r-project.org/web/packages/ggeffects/vignettes/marginaleffects.html)
and the [github repository](https://github.com/strengejacke/ggeffects). This includes,
but is not limited to:
* Predicted values for polynomial and interaction terms
* Getting predictions from models from dozens of other packages
* Sending `ggeffects` objects to `ggplot2` to freely modify plots
---
# An Advanced Example
Here is an example using a model from a [recent article I worked on](https://onlinelibrary.wiley.com/doi/abs/10.1111/cico.12346?af=R).
This models the likelihood of arrest of a target in a police contact conditional on neighborhood, race of target, and race of who called the police.
.smallish[
```{r}
load("data/any_arrest_data.RData")
mod_arrest <- glm(arrest ~ white_comp_wit_vict*black_arr_susp +
crime_type*white_comp_wit_vict + caller_type +
arr_susp_subj_count + comp_wit_vict_count +
black_arr_susp*neighb_type + crime_type*neighb_type +
serious_rate + pbl + pot + dis + year,
family = binomial(link = "logit"),
data = any_arrest_data)
```
]
There are a lot of interactions here:
* Target Race x Caller Race
* Crime Type x Caller Race
* Target Race x Neigbhorhood Type
* Crime Type x Neighborhood Type
---
# `ggeffects` Output
.smallish[
```{r, warning=FALSE, message=FALSE, dev="svg", fig.height=4}
mod_arrest %>% ggpredict(terms = c("black_arr_susp",
"white_comp_wit_vict", "neighb_type")) %>% plot()
```
]
---
# A Complex Example
`ggpredict()` can only handle three variables in its `terms=` argument.
--
For my article, I wanted to plot estimates across counterfactual values of all four variables in my interaction terms:
* Caller Race
* Target Race
* Crime Type
* Neighborhood Type
How could I do this?
--
Stats + Math + Code = $\heartsuit$
---
# Some Background
Given we've estimate a model, consider the following:
1. $\hat{Y} = X\hat{\beta}$, where $X$ is the model matrix and $\hat{\beta}$ is the coefficients.
2. $\hat{\beta}$ is a vector of *random variables* whose estimated distribution is described by parameter variance-covariance matrix $\Sigma$.
--
Using this, we can do the following:
1. Extract the model matrix $X$, estimated coefficients ( $\hat{\beta}$ ), and $\Sigma$ from our fitted model.
2. Make lots of random parameter draws centered on $\hat{\beta}$ and distributed according to $\Sigma$.
3. Multiply *each* of these draws by *counterfactual* $X$ *values* to get $\hat{Y}$ values.
4. Take the 2.5% and 97.% quantiles of these $\hat{Y}$ values.
--
This produces a *simulated* mean and confidence interval. This is called the **percentile method**, a type of *bootstrapping*.
---
# Simulating Coefficients
We can make random draws from our estimated distribution of parameters using `MASS::mvrnorm()` which takes three main arguments:
1. `n`: The number of draws
2. `mu`: mean—our coefficient estimates—obtained via `coef()`.
3. `Sigma`: a covariance matrix, obtained via `vcov()`.
.smallish[
```{r}
sim_params <- MASS::mvrnorm(n = 10000,
mu = coef(mod_arrest),
Sigma = vcov(mod_arrest))
sim_params[1:6, 1:4]
```
]
---
# Counterfactual Values
Next we need a data frame with our counterfactual values.
We want one row (or *scenario*) per estimate to plot, and all variables at their means *except* the ones we are varying. We also don't want impossible values; `neighb_type` values are mutually exclusive.
```{r, echo=FALSE}
opts_chunk$set(out.lines = 30)
```
.smallish[
```{r}
x_values <- colMeans(model.matrix(mod_arrest)) # vars at mean
n_scen <- (2*2*2*3) # Number of scenarios
x_frame <- setNames(data.frame(matrix(x_values, nrow=n_scen,
ncol=length(x_values),
byrow=T)), names(x_values))
cf_vals <- arrangements::permutations(c(0,1), k=5, replace=T) #<<
cf_vals <- cf_vals[cf_vals[,4]+cf_vals[,5]!=2 ,] # Remove impossible vals
colnames(cf_vals) <- c("white_comp_wit_vict1", "black_arr_susp1",
"crime_typeNuisance", "neighb_typeBlackDisadv",
"neighb_typeChanging")
x_frame[colnames(cf_vals)] <- cf_vals # assign to countefactual df
```
]
.footnote[`permutations()` is a quick way to get all combinations of some values.]
---
# What Do We Have?
.smaller[
```{r}
glimpse(x_frame)
```
]
---
# Fixing Interactions
Our main variables are correct... but we need to make our interaction terms.
The interaction terms in the model matrix have specific form `var1:var2`.
Their counterfactual values are just equal to the products of their components.
.small[
```{r}
x_frame <- x_frame %>%
mutate(
`white_comp_wit_vict1:black_arr_susp1` = white_comp_wit_vict1*black_arr_susp1,
`white_comp_wit_vict1:crime_typeNuisance` = white_comp_wit_vict1*crime_typeNuisance,
`black_arr_susp1:neighb_typeBlackDisadv` = black_arr_susp1*neighb_typeBlackDisadv,
`black_arr_susp1:neighb_typeChanging` = black_arr_susp1*neighb_typeChanging,
`crime_typeNuisance:neighb_typeBlackDisadv` = crime_typeNuisance*neighb_typeBlackDisadv,
`crime_typeNuisance:neighb_typeChanging` = crime_typeNuisance*neighb_typeChanging,
`black_arr_susp1:neighb_typeBlackDisadv` = black_arr_susp1*neighb_typeBlackDisadv,
`black_arr_susp1:neighb_typeChanging` = black_arr_susp1*neighb_typeChanging)
```
]
---
# Fixed
.smaller[
```{r}
glimpse(x_frame)
```
]
---
# Estimates!
Then we just multiply our parameters by our counterfactual data:
```{r}
sims_logodds <- sim_params %*% t(as.matrix(x_frame))
sims_logodds[1:6, 1:6]
dim(sims_logodds)
```
Now we log-odds 10,000 estimates each (rows) of 24 counterfactual scenarios (columns).
---
# Getting Probabilities
The model for this example is a *logistic regression*, which produces estimates in *log-odds* ( $ln(Odds(x))$ ).
We can convert these to probabilities based on two identities:
1. $Odds(x) = e^{ln(Odds(x))}$
2. $Pr(x) = \frac{Odds(x)}{(1 + Odds(x))}$
```{r}
sims_prob <- exp(sims_logodds) / (1 + exp(sims_logodds))
sims_prob[1:6, 1:6]
```
---
# A Quick Function
We are going to want to grab the mean and 95% confidence interval from our simulation estimates.
Here's a quick function to do it and make it pretty.
```{r}
extract_pe_ci <- function(x){
vals <- c(mean(x), quantile(x, probs=c(.025, .975)))
names(vals) <- c("PE", "LB", "UB")
return(vals)
}
```
This returns a length 3 vector with the following names:
* **PE** for *point estimate*
* **LB** for *lower bound* of the confidence interval
* **UB** for *upper bound*
---
# Prep for Plotting
First we extract our point estimates and confidence intervals by *applying* `extract_pe_ci()` to each column of estimated probabilities.
.small[
```{r}
estimated_pes <- as.data.frame( t(apply(sims_prob, 2, extract_pe_ci)))
```
]
Then I add columns describing the scenarios to color, group, and facet over based on the counterfactual values.
.small[
```{r}
estimated_pes$`Reporter` <- ifelse(cf_vals[,1]==1, "Any White", "All Black")
estimated_pes$`Target` <- ifelse(cf_vals[,2]==1, "Any Black", "All White")
estimated_pes$`Crime Type` <- ifelse(cf_vals[,3]==1, "Nuisance Crime", "Serious Crime")
estimated_pes$`Neighborhood` <- case_when(
cf_vals[,4]==1 ~ "Disadvantaged",
cf_vals[,5]==1 ~ "Changing",
TRUE ~ "Stable White")
```
]
---
# Final Tidy Data
.small[
```{r}
estimated_pes %>% mutate_if(is.numeric, round, digits=3) # round for display
```
]
---
# Plot Code
Finally we plot estimates (`PE`) as points with error bars (`UB`, `LB`) stratified on `Target` and `Reporter` and faceted by `Crime Type` and `Neighborhood`.
.smallish[
```{r, eval=FALSE}
ggplot(estimated_pes, aes(x = Target, y = PE, group = Reporter)) +
facet_grid(`Crime Type` ~ Neighborhood) +
geom_errorbar(aes(ymin = LB, ymax = UB),
position = position_dodge(width = .4),
size = 0.75, width = 0.15) +
geom_point(shape = 21, aes(fill = Reporter),
position = position_dodge(width = .4),
size = 2) +
scale_fill_manual("Reporter", values=c("Any White" = "white",
"All Black" = "black")) +
ggtitle("Figure 3. Probability of Arrest",
subtitle = "by Reporter and Target Race, Neighborhood and Crime Type") +
xlab("Race of Target") + ylab("Estimated Probability") +
theme_bw() + theme(legend.position = c(0.86,0.15),
legend.background = element_rect(color = 1))
```
]
---
# Plot
```{r, eval=TRUE, echo=FALSE, dev="svg", fig.height=5}
ggplot(estimated_pes, aes(x = Target, y = PE, group = Reporter)) +
facet_grid(`Crime Type` ~ Neighborhood) +
geom_errorbar(aes(ymin = LB, ymax = UB),
position = position_dodge(width = .4),
size = 0.75, width = 0.15) +
geom_point(shape = 21, position = position_dodge(width = .4),
size = 2, aes(fill = Reporter)) +
scale_fill_manual("Reporter", values=c("Any White"="white",
"All Black"="black")) +
ggtitle("Figure 3. Probability of Arrest",
subtitle = "by Reporter and Target Race, Neighborhood and Crime Type") +
xlab("Race of Target") + ylab("Estimated Probability") +
theme_bw() + theme(legend.position = c(0.86,0.15),
legend.background = element_rect(color = 1))
```
---
class: inverse
# Making Tables
---
# `pander` Regression Tables
We've used `pander` to create nice tables for dataframes. But `pander` has *methods*
to handle all sort of objects that you might want displayed nicely.
This includes
model output, such as from `lm()`, `glm()`, and `summary()`.
```{r pander}
library(pander)
```
```{r, include=FALSE}
panderOptions("table.style", "rmarkdown")
```
---
# `pander()` and `lm()`
You can send an `lm()` object straight to `pander`:
```{r, echo=TRUE, eval=FALSE}
pander(lm_1)
```
| | Estimate | Std. Error | t value | Pr(>t) |
|:----------------|:--------:|:----------:|:-------:|:---------:|
| **(Intercept)** | 37.23 | 1.599 | 23.28 | 2.565e-20 |
| **wt** | -3.878 | 0.6327 | -6.129 | 1.12e-06 |
| **hp** | -0.03177 | 0.00903 | -3.519 | 0.001451 |
Table: Fitting linear model: mpg ~ wt + hp
---
# `pander()` and `summary()`
You can do this with `summary()` as well, for added information:
```{r, eval=FALSE, echo=TRUE}
pander(summary(lm_1))
```
| | Estimate | Std. Error | t value | Pr(>t) |
|:----------------|:--------:|:----------:|:-------:|:---------:|
| **(Intercept)** | 37.23 | 1.599 | 23.28 | 2.565e-20 |
| **wt** | -3.878 | 0.6327 | -6.129 | 1.12e-06 |
| **hp** | -0.03177 | 0.00903 | -3.519 | 0.001451 |
| Observations | Residual Std. Error | $R^2$ | Adjusted $R^2$ |
|:------------:|:-------------------:|:------:|:--------------:|
| 32 | 2.593 | 0.8268 | 0.8148 |
Table: Fitting linear model: mpg ~ wt + hp
---
# `sjPlot`
`pander` tables are great for basic `rmarkdown` documents, but they're not
generally publication ready.
The `sjPlot` package produces `html` tables that look more like
those you may find in journal articles.
```{r table_packages}
library(sjPlot)
```
---
# `sjPlot` Tables
`tab_model()` will produce tables for most models.
```{r, eval=FALSE}
model_1 <- lm(mpg ~ wt, data = mtcars)
tab_model(model_1)
```
```{r, echo=FALSE, out.width="400px"}
# If you're seeing this, you are looking in my presentation files.
# I actually have to call on a saved image here because sjPlot
# doesn't display properly in .Rpres slides for some reason.
knitr::include_graphics("img/sjPlot_table.PNG")
```
---
# Multi-Model Tables with `sjTable`
Often in journal articles you will see a single table that compares multiple models.
Typically, authors will start with a simple model on the left, then add variables,
until they have their most complex model on the right.
The `sjPlot` package makes this easy to do: just give `tab_model()` more models!
---
# Multiple `tab_model()`
```{r, eval=FALSE}
model_2 <- lm(mpg ~ hp + wt, data = mtcars)
model_3 <- lm(mpg ~ hp + wt + factor(am), data = mtcars)
tab_model(model_1, model_2, model_3)
```
```{r, echo=FALSE, out.width = "1280px"}
# If you're seeing this, you are looking in my presentation files.
# I actually have to call on a saved image here because sjPlot
# doesn't display properly in .Rpres slides for some reason.
knitr::include_graphics("img/sjPlot_mtable.PNG")
```
---
# `sjPlot` does a lot more
The `sjPlot` package does *a lot* more than just make pretty tables. It is a rabbit hole
of *incredibly* powerful and useful functions for displaying descriptive and inferential results.
View the [package website](http://www.strengejacke.de/sjPlot/) for extensive documentation.
`sjPlot` is a bit more complicated than `ggeffects` but can do just about everything
it can do as well; they were written by the same author!
`sjPlot` is fairly new but offers a fairly comprehensive solution for `ggplot`
based publication-ready social science data visualization. All graphical functions in
`sjPlot` are based on `ggplot2`, so it should not take terribly long to figure out.
---
# `sjPlot` Example: Likert plots
```{r, echo=FALSE, out.width = "600px"}
knitr::include_graphics("img/sjPlot_likert.PNG")
```
---
# `sjPlot` Example: Crosstabs
```{r, echo=FALSE, out.width = "500px"}
knitr::include_graphics("img/sjPlot_crosstab.PNG")
```
---
# LaTeX Tables
For tables in $\LaTeX$—as is needed for `.pdf` files—I recommend looking into the `gt`, `stargazer`, or `kableExtra` packages.
--
`gt` and `kableExtra` allow the construction of complex tables in either HTML or $\LaTeX$ using
additive syntax similar to `ggplot2` and `dplyr`.
`stargazer` produces nicely formatted $\LaTeX$ tables but is idiosyncratic.
--
If you want to edit $\LaTeX$ documents, you can do it in R using Sweave documents (.Rnw).
Alternatively, you may want to work in a dedicated $\LaTeX$ editor. I recommend [Overleaf](http://www.overleaf.com)
for this purpose.
--
RMarkdown has support for a fair amount of basic $\LaTeX$ syntax if you aren't trying to
get too fancy!
--
Another approach I have used is to manually format $\LaTeX$ tables but use in-line R calls to
fill in the values dynamically. This gets you the *exact* format you want but without
forcing you to update values any time something changes.
---
# Bonus: `corrplot`
The `corrplot` package has functions for displaying correlograms.
These make visualizing the correlations between variables in a data set easier.
The first argument is a call to `cor()`, the base R function for generating a correlation matrix.
[See the vignette for customization options.](https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html)
```{r, message=FALSE, warning=FALSE, eval=FALSE}
library(corrplot)
corrplot(
cor(mtcars),
addCoef.col = "white",
addCoefasPercent=T,
type="upper",
order="AOE")
```
---
## Correlogram
```{r, message=FALSE, warning=FALSE, eval=TRUE, echo=FALSE, dev="svg", fig.height=5}
library(corrplot)
corrplot(cor(mtcars), addCoef.col = "white", addCoefasPercent=T, type="upper", order="FPC")
```
---
class: inverse
# Wrapping up the Course
---
# What You've Learned
A lot!
* How to get data into R from a variety of formats
* How to do "data custodian" work to manipulate and clean data
* How to make pretty visualizations
* How to automate with loops and functions
* How to combine text, calculations, plots, and tables into dynamic R Markdown reports
* How to acquire and work with spatial data
---
# What Comes Next?
* Statistical inference (e.g. more CSSS courses)
+ Functions for hypothesis testing, hierarchical/mixed effect models, machine learning, survey design, etc. are straightforward to use... once data are clean
+ Access output by working with list structures (like from regression models) or using `broom` and `ggeffects`
* Practice, practice, practice!
+ Replicate analyses you've done in Excel, SPSS, or Stata
+ Think about data using `dplyr` verbs, tidy data principles
+ R Markdown for reproducibility
* More advanced projects
+ Using version control (git) in RStudio
+ Interactive Shiny web apps
+ Write your own functions and put them in a package
---
# Course Plugs
If you...
* have no stats background yet - **SOC504: Applied Social Statistics**
* want to learn more social science computing - **SOC590: Big Data and Population Processes** ^{1}
* have (only) finished SOC506 - **CSSS510: Maximum Likelihood**
* want to master visualization - **CSSS569: Visualizing Data**
* study events or durations - **CSSS544: Event History Analysis** ^{2}
* want to use network data - **CSSS567: Social Network Analysis**
* want to work with spatial data - **CSSS554: Spatial Statistics**
* want to work with time series - **CSSS512: Time Series and Panel Data**
.footnote[
[1] We're hoping to offer that again soon!

[2] Also a great maximum likelihood introduction.
]
---
class: inverse
# Thank you!