scatter y x
Base R scatterplots use plot()
and can take either a formula or x
and y
arguments:
plot(example_data$y ~ example_data$x)
plot(example_data$x, example_data$y)
gplot2
uses geom_point()
:
ggplot(example_data, aes(x, y)) +
geom_point()
Often we want to plot regression lines on our scatterplots.
We can just draw both the scatter and a linear fit (lfit
) at the same time:
graph twoway ((scatter y x) || (lfit y x))
In base R, we draw the scatterplot and add a linear line (abline()
):
plot(example_data$y ~ example_data$x)
abline(reg=glm(y ~ x, data = example_data))
gplot2
adds a line with geom_smooth()
:
ggplot(example_data, aes(x, y)) +
geom_point() +
geom_smooth(method="glm", formula= y ~ x)
By default geom_smooth()
includes a 95% confidence interval around the line.
We may want to put a nonlinear fit line on our plot. This is easy in both Stata and R.
qfit
generates a quadratic fit line.
graph twoway ((scatter y x) || (qfit y x))
Quadratic fits are slightly awkward in base R:
plot(example_data$y ~ example_data$x)
curve(predict(glm(y~x, data=example_data),
newdata=data.frame(wt=x)), add=T)
They use the same syntax as linear fits in gplot2
:
ggplot(example_data, aes(x, y)) +
geom_point() +
geom_smooth(method="glm", formula= y ~ poly(x,2))
Plotting predictions from a model at varying levels of two variables requires multiple prgen
(from spost9_ado
) calls:
logit x_d1 y z
prgen y,from(0) to(8) generate(predval_a) n(30) x(z=-1)
prgen y,from(0) to(8) generate(predval_b) n(30) x(z=0)
prgen y,from(0) to(8) generate(predval_c) n(30) x(z=1)
graph twoway (line predval_ap1 predval_ax || line predval_bp1 predval_bx || line predval_cp1 predval_cx )
Note prgen
works only with logit
and not glm
Or using margins
and marginsplot
glm x_d1 y z, family(binomial) link(logit)
margins, at(y=(0(1)8) z=-1) ///
at(y=(0(1)8) z=0) ///
at(y=(0(1)8) z=1)
marginsplot
margins
and marginsplot
work with glm
or logit
ggplot2
is well suited to making complex predicted value plots. Here we generate a range of y
values from 0 to 8 in increments of 0.25. We do this for each value of z: -1, 0, and 1.
pr_data <- data.frame(y = rep(seq(0,8,0.25), each=3),
z = c(-1,0,1))
pr_data$predicted <-
predict(example_model2, newdata = pr_data,
type = "response")
ggplot(pr_data, aes(y=predicted, x=y, group=z, col=z)) +
geom_line()
ggeffects
streamlines this process by automatically generating the data and predictions:
library(ggeffects)
example_model2 %>%
ggpredict(terms=c("y [n=30]","z")) %>%
plot()
The n=30
in terms forces ggpredict()
to generate 30 values of y
: this makes the lines smoother curves.
glm y x z
predict yres2, residual
predict yhat, xb
scatter yres2 yhat
Simple base R method:
plot(residuals(example_model)~fitted(example_model))
plot(residuals(example_model)^2~fitted(example_model))
broom
, dplyr
, and ggplot2
method:
example_model %>%
broom::augment() %>%
ggplot(aes(y=.resid, x=.fitted)) +
geom_point()