Poisson regression is a generalized linear model for count data with an equal mean and variance. Coefficients are reported in log-counts. These examples assume `y_count`

is a count outcome.

There are two options for fitting a Poisson model in Stata: `glm`

and `poisson`

.

```
glm y_count x z, family(poisson) link(log)
poisson y_count x z
```

Results are equivalent but `poisson`

offers more postestimation commands.

`glm()`

with `family = poisson(link = "log")`

fits a Poisson regression.

```
example_pois <- glm(y_count ~ x + z,
family = poisson(link = "log"),
data = example_data)
summary(example_pois)
```

`summary()`

produces summary output with standard errors.

Negative binomial regression is a generalized linear model for count data with a variance which is a multiplicative factor of the mean (it is overdispersed). Coefficients are reported in log-counts. These examples assume `y_count`

is a count outcome.

We fit a negative binomial regression in Stata using `nbreg`

.

`nbreg y_count x z`

The most commonly used negative binomial function in R is in the `MASS`

package. I recommend you *do not load the MASS library*, but rather just use `MASS::glm.nb`

to call it right out of `MASS`

. This is because `MASS`

has a function called `select()`

in it which will mask over the `dplyr`

one.

```
example_nb <- MASS::glm.nb(y_count ~ x + z,
data = example_data)
summary(example_nb)
```

`summary()`

produces summary output with standard errors.

Zero-inflated Poisson regression is a generalized linear model for count data with an equal mean and variance but a greater number of zeroes than normal. These zeroes may arise from a different process than the counts: some variables may predict absence of counts while others predict levels if a count is possible. Coefficients are reported in log-counts. These examples assume `y_count`

is a count outcome.

`zip`

is used to fit a Zero-Inflated Poisson in Stata. The variables in `inflate()`

are used to predict zero counts.

`zip y_count x z, inflate(z)`

The most commonly used Zero-Inflated Poisson model for R is `zeroinfl`

found in the `pscl`

package. Variables after the bar `|`

are used to predict zero counts. Note this command only works if there are some zero counts.

```
example_zip <-
pscl::zeroinfl(y_count ~ x + z | z,
data = example_data)
summary(example_zip)
```

`summary()`

produces summary output with standard errors.

Zero-inflated Negative Binomial regression is a generalized linear model for overdispersed count data with a greater number of zeroes than normal. These zeroes may arise from a different process than the counts: some variables may predict absence of counts while others predict levels if a count is possible. Coefficients are reported in log-counts. These examples assume `y_count`

is a count outcome.

`zinb`

is used to fit a Zero-Inflated Negative Binomial in Stata. The variables in `inflate()`

are used to predict zero counts.

`zip y_count x z, inflate(z)`

You can fit Zero-Inflated Negative Binomial models in R using the same `zeroinfl`

function for ZI Poisson models found in the `pscl`

package. Just add an argument `dist = "negbin"`

. Variables after the bar `|`

are used to predict zero counts. Note this command only works if there are some zero counts.

```
example_zinb <-
pscl::zeroinfl(y_count ~ x + z | z,
data = example_data,
dist = "negbin")
summary(example_zinb)
```

`summary()`

produces summary output with standard errors.

We might want the estimated probability of observing a given count for particular values of covariates. These example use the poisson output but should work equivalently for the negative binomial (at least in R).

`prvalue`

is a postestimation command, so it must be run immediately after a model. It requires using `poisson`

instead of `glm`

. `prvalue`

will return the predicted rate (mean) count as well as probability of observing particular counts.

```
poisson y_count x z
prvalue, x(x=2 z=0) maxcnt(8)
```

`x()`

is used to set values of covariates and `maxcnt()`

sets the maximum count for which a probability is returned.

We could also plot the observed distribution of counts versus the predicted probabilities.

```
poisson y_count x z
prcounts counts, max(8) plot
graph twoway (connected countobeq countval) || (connected countpreq countval)
```

This produces a dashed predicted distribution and a solid true distribution from the data.

In R, we can do this with `predict()`

and the Poisson probability density function `dpois()`

.

```
pred_pois <- predict(example_pois, type="response",
newdata=data.frame(x=2, z=2))
count_range <- 0:10
prob_pois <- dpois(count_range, pred_pois)
barplot(prob_pois, names = count_range)
```

If we want to compare the predicted distribution to the actual distribution, we could use histograms.

```
library(ggplot2)
ggplot() +
geom_histogram(
aes(x=example_data$y_count, stat(density)),
bins=8, alpha=0.5, fill="blue") +
geom_histogram(
aes(x=fitted(example_pois), stat(density)),
bins=8, alpha=0.5, fill="red")
```

Or we could use densities, though note that densities around 0 will be lower than the true probability.

```
plot(density(fitted(example_pois)),
lty="dashed", col="red")
lines(density(example_data$y_count), col = "blue" )
```

We might be interested in plotting predicted values across ranges of covariates.

We can use `prgen`

to get predicted values given some fixed covariates and multiple values of another covariate.

```
poisson y_count x z
prgen z, x(x=0) from(-2) to (4) generate(zc) n(20)
graph twoway (connected zcp0 zcx) || (connected zcp1 zcx) || (connected zcp2 zcx)
```

This produces predicted probabilities of observing a given count (`zcp0`

is count=0, `zcp2`

is count=2) across values of `z`

.

Mimicking Stata’s `prgen`

-style plot in Base R is a bit bulky. See in-line comments. First, I’ve generated example data here to help:

```
mean_vec <- c("x" = 1.0, "y" = 2.0, "z" = 3.0)
cov_mat <- rbind(c(1.0, .75, 1.0),
c(.75, 1.5, 0.0),
c(1.0, 0.0, 2.0))
example_data <- data.frame(
MASS::mvrnorm(300,
mu = mean_vec,
Sigma = cov_mat,
empirical = TRUE))
example_data$y_count <-
with(example_data, rpois(300, 10+x+z))
```

Setting the values we want to work with and predicting counts:

```
n_vals <- 20 # number of x-values to plot over
pr_counts <- c(9, 12, 15, 18) # counts to get probs for
# z_range sets the range of a covariate to plot over
# In seq() the first value is a minimum, second is max
z_range <- round(seq(0, 6, length.out=n_vals),2)
example_pois <- glm(y_count ~ x + z,
family = poisson(link = "log"),
data = example_data)
# generate predicted means
lambdas <-
predict(example_pois, type="response",
newdata = data.frame(x = 0, z = z_range))
```

Then plotting…

First, the `ggplot2`

and `dplyr`

way to do this:

```
library(dplyr)
library(ggplot2)
data.frame(Count = rep(pr_counts, each=n_vals),
Z = rep(z_range, length(pr_counts)),
lambda = rep(lambdas, length(pr_counts))) %>%
mutate(Probability = dpois(Count, lambda),
Count = factor(Count)) %>%
ggplot(aes(x = Z, y = Probability, color = Count)) +
geom_line()
```

Then the hideous base R way to do this:

```
# generate a matrix of predicted probabilities
# rownames are value of z, colnames # of counts
pr_values <-
matrix(dpois(rep(pr_counts, each=n_vals),
rep(lambdas, length(pr_counts))),
ncol=length(pr_counts), byrow=FALSE,
dimnames = list(z_range, pr_counts))
# plot the matrix
# plotted number is the count
# y-axis is probability of that count
# x-axis is value of z
matplot(z_range, pr_values,
pch = pr_counts)
legend(x = min(z_range), y = max(pr_values),
legend = pr_counts, pch = pr_counts)
```

Another potentially useful plot is just the predicted counts across levels of covariates:

```
library(ggeffects)
plot(ggpredict(example_pois, terms="x"))
```