scatter y x
Base R scatterplots use plot() and can take either a formula or x and y arguments:
plot(example_data$y ~ example_data$x)
plot(example_data$x, example_data$y)
ggplot2 uses geom_point():
ggplot(example_data, aes(x, y)) +
  geom_point()
Often we want to plot regression lines on our scatterplots.
In Stata, graph twoway can draw both the scatter and a linear fit (lfit) in a single command:
graph twoway ((scatter y x) || (lfit y x))
In base R, we draw the scatterplot and then add the fitted straight line with abline():
plot(example_data$y ~ example_data$x)
abline(reg=glm(y ~ x, data = example_data))
ggplot2 adds a line with geom_smooth():
ggplot(example_data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x)
By default geom_smooth() includes a 95% confidence interval around the line.
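The confidence band can be switched off or widened through the se and level arguments of geom_smooth(). The sketch below uses simulated data as a stand-in for example_data (the values are hypothetical, not the original dataset):

```r
library(ggplot2)

# Simulated stand-in for example_data (hypothetical values)
set.seed(1)
example_data <- data.frame(x = runif(100, 0, 10))
example_data$y <- 2 + 0.5 * example_data$x + rnorm(100)

base_plot <- ggplot(example_data, aes(x, y)) +
  geom_point()

# se = FALSE draws the fitted line without the confidence band
p_no_band <- base_plot +
  geom_smooth(method = "glm", formula = y ~ x, se = FALSE)

# The default band uses level = 0.95
p_band_95 <- base_plot +
  geom_smooth(method = "glm", formula = y ~ x)

# level = 0.99 draws a wider, 99% band
p_band_99 <- base_plot +
  geom_smooth(method = "glm", formula = y ~ x, level = 0.99)
```

Printing any of the three objects (for example p_band_99) draws the corresponding plot.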
We may want to put a nonlinear fit line on our plot. This is easy in both Stata and R.
qfit generates a quadratic fit line.
graph twoway ((scatter y x) || (qfit y x))
Quadratic fits are slightly awkward in base R:
plot(example_data$y ~ example_data$x)
curve(predict(glm(y ~ poly(x, 2), data = example_data),
              newdata = data.frame(x = x)), add = TRUE)
They use the same syntax as linear fits in ggplot2:
ggplot(example_data, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ poly(x, 2))
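Note that poly(x, 2) builds orthogonal polynomial terms, so its coefficients differ from those of a model written with raw x and x^2 terms, even though the fitted curve is the same. A minimal check, assuming simulated data in place of example_data (hypothetical values):

```r
# Simulated stand-in for example_data (hypothetical values)
set.seed(1)
example_data <- data.frame(x = runif(100, -3, 3))
example_data$y <- 1 + 0.5 * example_data$x - 0.8 * example_data$x^2 + rnorm(100)

# Orthogonal polynomial terms, as in the plots above
fit_orth <- glm(y ~ poly(x, 2), data = example_data)

# Raw polynomial terms: x and x squared
fit_raw <- glm(y ~ x + I(x^2), data = example_data)

# The coefficients differ between the two parameterisations ...
coef(fit_orth)
coef(fit_raw)

# ... but the fitted values agree up to numerical tolerance
same_curve <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw)))
```

Use the raw form (or poly(x, 2, raw = TRUE)) when the coefficients themselves need to be interpreted on the scale of x.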
Plotting predictions from a model at varying levels of two variables requires multiple prgen (from spost9_ado) calls:
logit x_d1 y z
prgen y, from(0) to(8) generate(predval_a) n(30) x(z=-1)
prgen y, from(0) to(8) generate(predval_b) n(30) x(z=0)
prgen y, from(0) to(8) generate(predval_c) n(30) x(z=1)
graph twoway (line predval_ap1 predval_ax || line predval_bp1 predval_bx || line predval_cp1 predval_cx )
Note that prgen works with logit but not with glm.
Alternatively, use margins and marginsplot:
glm x_d1 y z, family(binomial) link(logit)
margins, at(y=(0(1)8) z=-1) ///
at(y=(0(1)8) z=0) ///
at(y=(0(1)8) z=1)
marginsplot
margins and marginsplot work with both glm and logit.
ggplot2 is well suited to making complex predicted value plots. Here we generate a range of y values from 0 to 8 in increments of 0.25. We do this for each value of z: -1, 0, and 1.
pr_data <- data.frame(y = rep(seq(0, 8, 0.25), each = 3),
                      z = c(-1, 0, 1))
pr_data$predicted <-
  predict(example_model2, newdata = pr_data,
          type = "response")
ggplot(pr_data, aes(y = predicted, x = y, group = z, col = z)) +
  geom_line()
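The prediction grid can also be built with expand.grid(), which creates one row per combination of y and z without relying on recycling. The sketch below is self-contained, so it fits a stand-in for example_model2 on simulated data. The assumption that the model is a logit of x_d1 on y and z is taken from the Stata example, and the data values are hypothetical:

```r
# Hypothetical stand-in for example_model2: a logit of x_d1 on y and z
set.seed(1)
sim <- data.frame(y = runif(200, 0, 8), z = rnorm(200))
sim$x_d1 <- rbinom(200, 1, plogis(-2 + 0.5 * sim$y + 0.7 * sim$z))
example_model2 <- glm(x_d1 ~ y + z, family = binomial(link = "logit"),
                      data = sim)

# One row for every combination of y (0 to 8 in steps of 0.25) and z
pr_data <- expand.grid(y = seq(0, 8, 0.25), z = c(-1, 0, 1))

# type = "response" returns predicted probabilities rather than log-odds
pr_data$predicted <- predict(example_model2, newdata = pr_data,
                             type = "response")

head(pr_data)
```

The resulting pr_data has the same columns as the version built with rep(), so it can be passed to the same ggplot() call.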
ggeffects streamlines this process by automatically generating the data and predictions:
library(ggeffects)
example_model2 %>%
  ggpredict(terms = c("y [n=30]", "z")) %>%
  plot()
The n=30 in terms makes ggpredict() generate 30 values of y, which produces smoother curves.
glm y x z
predict yres2, residual
predict yhat, xb
scatter yres2 yhat
Simple base R method:
plot(residuals(example_model)~fitted(example_model))
plot(residuals(example_model)^2~fitted(example_model))
broom, dplyr, and ggplot2 method:
example_model %>%
  broom::augment() %>%
  ggplot(aes(y = .resid, x = .fitted)) +
  geom_point()