Linear hypotheses are used to test restrictions on models (e.g. that a coefficient is zero or equal to another).
test x=0
test x=z
test x=z=0
Like most postestimation commands in Stata, linear hypotheses (with test
) must be conducted immediately after running the model of interest.
car::linearHypothesis(example_model, "x=0")
car::linearHypothesis(example_model, "x=z")
car::linearHypothesis(example_model, c("x=0", "z=0"))
In R, they must be run using a saved model object (like example_model
created above). In R, the linearHypothesis()
function is found in the car
package. You can obtain this package using install.packages("car")
. You can then either use library(car)
to load the package or run linearHypothesis()
from car directly–without loading the package–using car::linearHypothesis()
as below.
We often want to obtain residuals to check model fit or produce plots where we have controlled for covariates.
predict residual_y_xz, deviance // GLM uses deviance
predict residual_y_xz, residual // OLS uses residual
Obtaining residuals is a postestimation command in Stata, which must be run immediately after a model.
residual_y_xz <- residuals(example_model)
example_data$residual_y_xz <- residuals(example_model)
In R, we run the command on a saved model object. If you want to save the residuals to the original data, you can just assign them as a column.
This section is sparse but low priority—let me know if additions are needed.
Stata uses lrtest
for likelihood ratio tests.
lrtest model1 model2
This will conduct a likelihood ratio test comparing model1
to model2
.
The simplest likelihood ratio test in R is found in the lmtest
package.
lmtest::lrtest(model1, model2)
This will conduct a likelihood ratio test comparing model1
to model2
.
Stata uses the postestimation command estat ic
to obtain BIC values.
estat ic
BIC(example_model)
Note you can give BIC()
multiple models and it will create a table of BIC values.
BIC(model1, model2)
We can run tests manually.
glm y x z
predict yres2, residual
glm yres2 x z family(gauss) link(identity)
test x=z=0
glm yres2 yhat yhat2
Or use hettest
after an OLS run using regress
.
reg y x z
hettest
We can do this manually:
example_data$yres2 <- residuals(example_model)^2
glm(yres2 ~ x + z, data=example_data)
example_data$yhat <- fitted(example_model)
example_data$yhat2 <- fitted(example_model)^2
glm(yres2 ~ yhat + yhat2, data=example_data)
Or lmtest
contains a Breusch-Pagan test:
lmtest::bptest(example_model)
glm y x z, family(gauss) link(identity) vce(robust)
sjstats
has a simple wrapper:
sjstats::robust(example_model)
Or we can do it with lmtest
:
lmtest::coeftest(example_model, vcov = vcovHC)
glm y x z, family(gauss) link(identity) vce(bootstrap, reps(1000))
The car
package provides a shortcut to boot
for easy bootstrapped SEs.
car::Boot(glm(y ~ x + z, data=example_data))
We can also use boot
directly:
library(boot)
boot_glm <- function(d,indices) {
d <- d[indices,]
fit <- glm(y ~ x + z, data = d)
return(coef(fit))
}
boot(data = example_data,
statistic = boot_glm,
R = 1000)
Or we do it manually using modern tidyverse commands:
library(dplyr)
library(broom)
library(rsample)
library(tidyr)
library(purrr)
example_data %>%
bootstraps(times=1000) %>%
mutate(model = map(splits,
function(x) glm(y ~ x + z, data=x)),
coef_info = map(model, tidy)) %>%
unnest(coef_info) %>%
group_by(term) %>%
summarize(pe = mean(estimate),
se = sd(estimate),
low = quantile(estimate, .025),
high = quantile(estimate, .975))
We’ll create a variable to indicate the case number, then run avplot
.
generate case=_n
avplots, mlabel(case)
car
has an avPlots()
function:
car::avPlots(example_model)
Or you can do it manually:
example_data$case <- 1:nrow(example_data)
res_x_z <- residuals(glm(x~z, data=example_data))
res_y_z <- residuals(glm(y~z, data=example_data))
plot(res_y_z~res_x_z)
text(res_x_z, res_y_z, labels=example_data$case)
abline(lm(res_y_z~res_x_z), col="blue")
predict leverage, hat
hatvalues()
produces… hat values.
hatvalues(example_model)
augment()
adds values to the data used to fit the model: .hat
is the diagonal of the hat matrix.
broom::augment(example_model)
predict studentres, rstudent
rstudent(example_model)
predict dfits, dfits
dffits()
gets just the DFFITS values.
dffits(example_model)
influence.measures()
gets DFFITS and other influence measures.
influence.measures(example_model)
predict cooksd, cooksd
For just Cook’s D values:
cookds.distance(example_model)
Or augment()
can return the data with Cook’s D added for each row.
broom::augment(example_model)
dfbeta
For DFBetas alone:
dfbetas(example_model)
Or DFBetas with other statistics:
influence.measures(example_model)
To produce variance inflation factors, just run vif
after the model.
vif
car
has a function for VIFs:
car::vif(example_model)
Stata has individual functions for these but no comparable function for multiple diagnostic plots. You’d normally want to run them individually anyway.
Just running plot()
on an R model will produce a series of diagnostic plots. You must hit ENTER
in the console to browse through them.
plot(example_model)
This produces the following plots (in order):