Plotting

Scatterplots

Stata

scatter y x

R

Base R scatterplots use plot() and can take either a formula or x and y arguments:

plot(example_data$y ~ example_data$x)
plot(example_data$x, example_data$y)
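The snippets in this guide refer to example_data without showing it. As a minimal sketch, the block below simulates a stand-in with columns x, y, and z (the values and coefficients are assumptions, not the original data) and shows the formula interface with a data argument, which avoids repeating example_data$:

```r
# Simulated stand-in for example_data (an assumption; the original data are not shown)
set.seed(42)
n <- 200
example_data <- data.frame(
  x = runif(n, 0, 8),
  z = sample(c(-1, 0, 1), n, replace = TRUE)
)
example_data$y <- 1 + 0.5 * example_data$x + 0.3 * example_data$z + rnorm(n)

# Formula interface with a data argument
plot(y ~ x, data = example_data)

# Equivalent x, y interface, with axis labels set explicitly
with(example_data, plot(x, y, xlab = "x", ylab = "y"))
```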

ggplot2 uses geom_point():

ggplot(example_data, aes(x, y)) + 
  geom_point()

Fit Lines on Scatterplots

Often we want to plot regression lines on our scatterplots.

Stata

We can just draw both the scatter and a linear fit (lfit) at the same time:

graph twoway ((scatter y x) || (lfit y x))

R

In base R, we draw the scatterplot and then add the fitted line with abline():

plot(example_data$y ~ example_data$x)
abline(reg=glm(y ~ x, data = example_data))

ggplot2 adds a line with geom_smooth():

ggplot(example_data, aes(x, y)) + 
  geom_point() + 
  geom_smooth(method="glm", formula= y ~ x)

By default geom_smooth() includes a 95% confidence interval around the line.
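The base R version above draws only the line. A rough sketch of adding a comparable band by hand is shown below, using predict() with se.fit = TRUE and a normal quantile. The simulated data and the names grid and crit are assumptions for illustration, and the band is an approximation that may not match what geom_smooth() computes internally:

```r
# Simulated data (assumption; stands in for example_data)
set.seed(1)
example_data <- data.frame(x = runif(100, 0, 8))
example_data$y <- 2 + 0.5 * example_data$x + rnorm(100)

fit <- glm(y ~ x, data = example_data)

# Evenly spaced x values at which to evaluate the fit
grid <- data.frame(x = seq(min(example_data$x), max(example_data$x),
                           length.out = 50))
pred <- predict(fit, newdata = grid, se.fit = TRUE)

# Approximate 95% band: fit +/- 1.96 standard errors
crit <- qnorm(0.975)
grid$fit   <- pred$fit
grid$lower <- pred$fit - crit * pred$se.fit
grid$upper <- pred$fit + crit * pred$se.fit

plot(y ~ x, data = example_data)
lines(grid$x, grid$fit)
lines(grid$x, grid$lower, lty = 2)
lines(grid$x, grid$upper, lty = 2)
```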

Quadratic Lines on Scatterplots

We may want to put a nonlinear fit line on our plot. This is easy in both Stata and R.

Stata

qfit generates a quadratic fit line.

graph twoway ((scatter y x) || (qfit y x))

R

Quadratic fits are slightly awkward in base R:

plot(example_data$y ~ example_data$x)
curve(predict(glm(y ~ poly(x, 2), data = example_data), 
  newdata = data.frame(x = x)), add = TRUE)

They use the same syntax as linear fits in ggplot2:

ggplot(example_data, aes(x, y)) + 
  geom_point() + 
  geom_smooth(method="glm", formula= y ~ poly(x,2))
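poly(x, 2) uses orthogonal polynomial terms, so its coefficients differ from those of a model written with x and x^2 directly, but the fitted curve is the same. A small check on simulated data (the data and the names fit_poly and fit_raw are assumptions for illustration):

```r
# Simulated data with a quadratic relationship (assumption)
set.seed(7)
example_data <- data.frame(x = runif(150, 0, 8))
example_data$y <- 1 + 2 * example_data$x - 0.25 * example_data$x^2 + rnorm(150)

fit_poly <- glm(y ~ poly(x, 2), data = example_data)  # orthogonal polynomial terms
fit_raw  <- glm(y ~ x + I(x^2), data = example_data)  # raw x and x^2 terms

# Same fitted values, different coefficient parameterization
same_fit <- all.equal(unname(fitted(fit_poly)), unname(fitted(fit_raw)))
same_fit

# Coefficients on the raw scale are easier to read as a + b*x + c*x^2
coef(fit_raw)
```

Either formula can be passed to geom_smooth(); the raw form is mostly useful when the coefficients themselves need to be interpreted.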

Plotting Predictions

Stata

Plotting predictions from a model at varying levels of two variables requires multiple prgen (from spost9_ado) calls:

logit x_d1 y z
prgen y, from(0) to(8) generate(predval_a) n(30) x(z=-1)
prgen y, from(0) to(8) generate(predval_b) n(30) x(z=0)
prgen y, from(0) to(8) generate(predval_c) n(30) x(z=1)

graph twoway (line predval_ap1 predval_ax || line predval_bp1 predval_bx || line predval_cp1 predval_cx)

Note that prgen works after logit but not after glm.

Alternatively, use margins and marginsplot:

glm x_d1 y z, family(binomial) link(logit)
margins, at(y=(0(1)8) z=-1) ///
         at(y=(0(1)8) z=0) ///
         at(y=(0(1)8) z=1)
marginsplot

margins and marginsplot work after either glm or logit.

R

ggplot2 is well suited to making complex predicted value plots. Here we generate a range of y values from 0 to 8 in increments of 0.25. We do this for each value of z: -1, 0, and 1.

pr_data <- data.frame(y = rep(seq(0,8,0.25), each=3),
                      z = c(-1,0,1))
pr_data$predicted <- 
   predict(example_model2, newdata = pr_data,
   type = "response")
ggplot(pr_data, aes(y=predicted, x=y, group=z, col=z)) + 
   geom_line()
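The same grid can also be built with expand.grid(), which crosses every y value with every z value explicitly instead of relying on recycling. This is a sketch under assumptions: example_model2 is not defined in this guide, so the block fits a hypothetical logit model on simulated data, mirroring the Stata call glm x_d1 y z, family(binomial) link(logit):

```r
# Simulated data and model (assumptions; the original example_model2 is not shown)
set.seed(3)
n <- 500
example_data <- data.frame(
  y = runif(n, 0, 8),
  z = sample(c(-1, 0, 1), n, replace = TRUE)
)
p <- plogis(-2 + 0.5 * example_data$y + 0.8 * example_data$z)
example_data$x_d1 <- rbinom(n, 1, p)
example_model2 <- glm(x_d1 ~ y + z, data = example_data, family = binomial)

# One row per combination of y and z
pr_data <- expand.grid(y = seq(0, 8, 0.25), z = c(-1, 0, 1))
pr_data$predicted <- predict(example_model2, newdata = pr_data,
                             type = "response")

# Base R version of the plot: one line per value of z
plot(NULL, xlim = c(0, 8), ylim = c(0, 1), xlab = "y", ylab = "predicted")
for (zv in unique(pr_data$z)) {
  with(pr_data[pr_data$z == zv, ], lines(y, predicted))
}
```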

ggeffects streamlines this process by automatically generating the data and predictions:

library(ggeffects)
example_model2 %>% 
   ggpredict(terms=c("y [n=30]","z")) %>% 
   plot()

The n=30 in terms makes ggpredict() generate 30 values of y, which produces smoother curves.

Heteroskedasticity

Residual Plots

Stata

glm y x z
predict yres2, residual
predict yhat, xb
scatter yres2 yhat

R

Simple base R method:

plot(residuals(example_model)~fitted(example_model))
plot(residuals(example_model)^2~fitted(example_model))

broom, dplyr, and ggplot2 method:

example_model %>% 
  broom::augment() %>% 
  ggplot(aes(y=.resid, x=.fitted)) + 
  geom_point()
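To see what these plots look like when heteroskedasticity is present, the sketch below simulates data whose noise grows with x and draws the same base R residual plot. The data frame sim and the model sim_model are hypothetical names used only for this illustration:

```r
# Simulated heteroskedastic data (assumption): noise sd increases with x
set.seed(11)
n <- 500
sim <- data.frame(x = runif(n, 0, 8))
sim$y <- 1 + 2 * sim$x + rnorm(n, sd = 0.5 * sim$x)

sim_model <- glm(y ~ x, data = sim)

# Residuals fan out as the fitted values increase
plot(residuals(sim_model) ~ fitted(sim_model))
abline(h = 0, lty = 2)

# A crude numeric summary: absolute residuals rise with fitted values
spread_cor <- cor(abs(residuals(sim_model)), fitted(sim_model))
spread_cor
```

A roughly even band of points around zero suggests constant variance; a funnel shape like the one produced here suggests it is not.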