Modeling commands are very similar between R and Stata. The primary difference is that R arguments go inside parentheses rather than just after the name of the command as in Stata. Stata options come after a comma and are separate from formulae, but all R arguments, including formulae, are treated the same and separated by commas. Default R output is simpler than Stata's: use summary() if you want Stata-like output. In Stata, you save model output using estimates store, while in R you just assign the model object to a name. Note that the R example specifies which data set to use: R can have many data sets loaded at the same time!
glm y x z, ///
family(gaussian) link(identity)
estimates store example_model
Note that in the Stata example above, a single command is spread over multiple lines using ///.
example_model <-
glm(y ~ x + z,
family = gaussian(link = "identity"),
data = example_data)
summary(example_model)
In R, a command can span multiple lines so long as each line (other than the last) leaves the expression incomplete, for example by ending in a comma inside open parentheses, as above, or in an operator such as + in ggplot2 calls.
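The line-continuation rule can be checked with a small base R sketch. The variable names here are made up for illustration and do not come from the examples above.

```r
# The trailing + leaves the expression unfinished, so R keeps reading
# onto the next line.
total_trailing <- 1 +
  2 +
  3

# Here the first line is already a complete command, so R evaluates it
# and treats "+ 2" as a separate command whose result is not stored.
total_leading <- 1
+ 2

# Inside open parentheses, a line break never ends the command.
total_parens <- sum(1,
                    2,
                    3)

print(c(total_trailing, total_leading, total_parens))
```

The second case is the one to watch for: total_leading ends up as 1, not 3, and R gives no warning about it.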
If we assume x is a treatment indicator and time is a dummy indicating before and after, the linear model diff-in-diff estimators look as follows, first in Stata and then in R. The coefficient estimates would be identical using a Gaussian GLM with an identity link instead.
gen x_time = x*time
regress y x time x_time
example_did <- lm(y ~ x + time + x*time,
data = example_data)
summary(example_did)
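In an R formula, * is not plain multiplication: x*time expands to both main effects plus the interaction x:time, so no separate product variable is needed as in Stata. The sketch below uses simulated data (the data frame and its coefficients are invented for illustration) to show that the short and long forms give the same model.

```r
set.seed(1)

# Simulated 2x2 design: 25 observations in each treatment-by-period cell.
sim_data <- data.frame(
  x    = rep(c(0, 1), each = 50),
  time = rep(c(0, 1), times = 50)
)
sim_data$y <- 1 + 0.5 * sim_data$x + 0.3 * sim_data$time +
  2 * sim_data$x * sim_data$time + rnorm(nrow(sim_data))

# Shorthand: main effects and interaction in one term.
did_short <- lm(y ~ x * time, data = sim_data)

# Long form: main effects and interaction written out.
did_long <- lm(y ~ x + time + x:time, data = sim_data)

# The coefficient on x:time is the diff-in-diff estimate in both fits.
print(coef(did_short))
print(all.equal(coef(did_short), coef(did_long)))
```

Writing y ~ x + time + x*time, as above, fits the same model because R drops the duplicated main effects.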
In Stata, you can set your fixed effects IDs first using xtset, then use the fe option when you run a regression on the data.
xtset id
xtreg y x z, fe
In R, you can run fixed effects using dummies for each unit. If id is a factor, lm() will create K-1 dummies (where K is the number of units) in your regression, leading to a fixed effects model.
lm(y ~ x + z + id, data = example_data)
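To see why unit dummies give a fixed effects model, the following base R sketch compares the dummy-variable regression with a regression on data demeaned within each unit (the "within" transformation). The simulated panel, its coefficients, and the helper demean() are assumptions made up for this illustration.

```r
set.seed(42)

# Simulated panel: 10 units observed 5 times each.
n_units <- 10
n_periods <- 5
sim_panel <- data.frame(
  id = factor(rep(seq_len(n_units), each = n_periods)),
  x  = rnorm(n_units * n_periods),
  z  = rnorm(n_units * n_periods)
)
unit_effect <- rnorm(n_units)
sim_panel$y <- 2 * sim_panel$x - 1 * sim_panel$z +
  unit_effect[as.integer(sim_panel$id)] + rnorm(nrow(sim_panel))

# Fixed effects via dummies: id is a factor, so lm() adds K-1 = 9 dummies.
fe_dummies <- lm(y ~ x + z + id, data = sim_panel)

# Fixed effects via demeaning: subtract each unit's mean from every variable.
demean <- function(v, g) v - ave(v, g)
within_data <- data.frame(
  y = demean(sim_panel$y, sim_panel$id),
  x = demean(sim_panel$x, sim_panel$id),
  z = demean(sim_panel$z, sim_panel$id)
)
fe_within <- lm(y ~ 0 + x + z, data = within_data)

# The slopes on x and z agree between the two approaches.
print(coef(fe_dummies)[c("x", "z")])
print(coef(fe_within))
```

Only the slope estimates are meant to be compared here. The standard errors from the demeaned lm() fit are not adjusted for the estimated unit means, which is one reason to use a dedicated package for real analyses.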
You can also run fixed effects using the lfe package. Here id comes after |. Terms after the first | are treated as fixed effect indicators. A second and a third bar can be added to specify instrumental variables and the clustering variables for standard errors, in that order.
library(lfe)
felm(y ~ x + z | id, data = example_data)
If you want to fit a fixed effects panel model in the econometric style, as described in class, you might use the plm package. You’ll want to use this if you want to run a Hausman test between fixed and random effects. Note you’ll need to specify indices: a variable that uniquely identifies the groups (here, "id") and one that identifies the time or unique observations within groups (here, "time").
library(plm)
ex_plm_fe <- plm(y ~ z, index = c("id","time"),
model = "within",
data = example_data)
Note that adding effect = "twoways" to the model = "within" call will give fixed effects on both indices, while model = "random" will give you random effects. Also note that plm() uses generalized least squares, not maximum likelihood, to fit models, so you cannot use a likelihood ratio test to test constraints on a plm() model, such as when comparing nested models. Use lmtest::waldtest() instead.
In Stata, random effects models use the same base syntax as fixed effects, except with an re option.
xtset id
xtreg y x z, re
In R, random effects models are usually fit using the lme4 package. The formula has two parts: the main formula (y ~ x + z) and the random effects ((1|id)). Note that it will report ordinary coefficients as “fixed effects” because, outside of econometrics, coefficients that are the same for all units are called fixed effects. This can be confusing! They’re just normal coefficients.
library(lme4)
lmer(y ~ x + z + (1|id), data = example_data)
The number 1 in front of id here represents a constant, that is, an intercept that varies by unit. If you take a more advanced course on hierarchical models, you’ll learn about random slopes, which you can fit by replacing that 1 with variables.
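As a rough illustration of the difference between random intercepts and random slopes, the sketch below uses the sleepstudy data set that ships with lme4 rather than the example data from this section. It assumes lme4 is installed. The model names are made up for illustration.

```r
library(lme4)

# sleepstudy ships with lme4: reaction times for 18 subjects over 10 days.
data("sleepstudy", package = "lme4")

# Random intercepts only: each subject gets its own baseline.
m_intercept <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Random intercepts and random slopes: the effect of Days also varies
# by subject. (Days | Subject) is shorthand for the same thing.
m_slope <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# One row per subject, one column per random term.
print(head(ranef(m_intercept)$Subject))
print(head(ranef(m_slope)$Subject))
```

The second model returns two columns of subject-level deviations, one for the intercept and one for the slope on Days.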
You can also fit random effects models in the econometric style using plm. This is the easiest approach if you want to run a Hausman test, although outside of econometrics Hausman tests are a bit uncommon these days.
library(plm)
ex_plm_re <- plm(y ~ z, index = c("id","time"),
model = "random",
effect = "individual",
data = example_data)
Note you should specify effect = "individual" in the random effects model so that random effects are applied to the id variable but no random or fixed effects are applied to time.
Hausman tests allow you to compare the econometric fixed effects model (i.e. dummies for each unit) to the random effects model (i.e. random intercepts for each unit) or a basic model (no dummies or random intercepts). A small p-value indicates that the two sets of estimates differ systematically, which favors the fixed effects model.
To do the Hausman test in Stata, you need to store your fixed effects model and your random effects model, then compare them with hausman.
xtreg y z, fe
estimates store fixed_model
xtreg y z, re
estimates store random_model
hausman fixed_model random_model
The easiest Hausman test to use in R is the one built into plm. As in Stata, you fit both models, then pass them to phtest().
library(plm)
ex_plm_fe <- plm(y ~ z, index = c("id","time"),
model = "within",
data = example_data)
ex_plm_re <- plm(y ~ z, index = c("id","time"),
model = "random",
effect = "individual",
data = example_data)
phtest(ex_plm_fe, ex_plm_re)
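For a version that runs without example_data, here is a sketch of the same workflow on the Grunfeld panel that ships with plm (10 firms over 20 years). It assumes plm is installed. The model formula and object names are chosen for illustration only.

```r
library(plm)

# Grunfeld ships with plm: investment data for 10 firms over 20 years.
data("Grunfeld", package = "plm")

# Fixed effects (within) model with firm as the unit index.
grun_fe <- plm(inv ~ value + capital, index = c("firm", "year"),
               model = "within",
               data = Grunfeld)

# Random effects model on the same formula and indices.
grun_re <- plm(inv ~ value + capital, index = c("firm", "year"),
               model = "random",
               effect = "individual",
               data = Grunfeld)

# phtest() returns a standard test object; the p-value can be pulled out.
hausman_result <- phtest(grun_fe, grun_re)
print(hausman_result)
print(hausman_result$p.value)
```

Storing the result rather than only printing it makes it easy to reuse the statistic and p-value later, for example in a table.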