This section provides technical information on the models currently implemented in Zelig (5.0-1). It includes a reference list of supported models as well as individual model vignettes with detailed information on each model, its quantities of interest, and its syntax.

The following models are currently supported in Zelig 5.0-1:

- *Exponential Regression*: `zexp$new()`
- *Gamma Regression*: `zgamma$new()`
- *Logistic Regression*: `zlogit$new()`
- *Log Normal Regression*: `zlognorm$new()`
- *Least Squares Regression*: `zls$new()`
- *Negative Binomial Regression*: `znegbin$new()`
- *Normal Regression*: `znormal$new()`
- *Poisson Regression*: `zpoisson$new()`
- *Probit Regression*: `zprobit$new()`
- *Rare Events Logistic Regression*: `zrelogit$new()`
- *Tobit Regression*: `ztobit$new()`
- *Bayesian Factor Analysis*: `zfactorbayes$new()`
- *Bayesian Multinomial Logistic Regression*: `zmlogitbayes$new()`

Exponential Regression for Duration Dependent Variables

Use the exponential duration regression model if you have a dependent variable representing a duration (time until an event). The model assumes a constant hazard rate for all events. The dependent variable may be censored (for observations that had not yet completed when the data were collected).

With reference classes:

```
z5 <- zexp$new()
z5$zelig(Surv(Y, C) ~ X, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Surv(Y, C) ~ X, model = "exp", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

Exponential models require that the dependent variable be in the form
Surv(Y, C), where Y and C are vectors of length $n$. For each
observation $i$ in $1, \dots, n$, the value $y_i$ is the
duration (lifetime, for example), and the associated $c_i$ is a
binary variable such that $c_i = 1$ if the duration is not
censored (*e.g.*, the subject dies during the study) or $c_i = 0$
if the duration is censored (*e.g.*, the subject is still alive at the
end of the study and is known to live at least as long as $y_i$).
If C is omitted, all Y are assumed to be completed; that is,
the censoring indicator defaults to 1 for all observations.

In addition to the standard inputs, zelig() takes the following additional options for exponential regression:

- robust: defaults to FALSE. If TRUE, zelig() computes robust standard errors based on sandwich estimators and the options selected in cluster.
- cluster: if robust = TRUE, you may select a variable to define groups of correlated observations. Let x3 be a variable that consists of either discrete numeric values, character strings, or factors that define strata. Then

```
z.out <- zelig(y ~ x1 + x2, robust = TRUE, cluster = "x3",
model = "exp", data = mydata)
```

means that the observations can be correlated within the strata defined by the variable x3, and that robust standard errors should be calculated according to those clusters. If robust = TRUE but cluster is not specified, zelig() assumes that each observation falls into its own cluster.

Attach the sample data:

```
data(coalition)
```

Estimate the model:

```
z.out <- zelig(Surv(duration, ciep12) ~ fract + numst2, model = "exp", data = coalition)
```

```
## How to cite this model in Zelig:
## Olivia Lau, Kosuke Imai, Gary King. 2011.
## exp: Exponential Regression for Duration Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

View the regression output:

```
summary(z.out)
```

```
## Model:
## $by
## [1] 1
##
## Call:
## survival::survreg(formula = Surv(duration, ciep12) ~ fract +
## numst2, data = ., dist = "exponential", model = FALSE)
##
## Coefficients:
## (Intercept) fract numst2
## 5.535873 -0.003909 0.461179
##
## Scale fixed at 1
##
## Loglik(model)= -1077 Loglik(intercept only)= -1101
## Chisq= 46.66 on 2 degrees of freedom, p= 7.4e-11
## n= 314
## Next step: Use 'setx' method
```

Set the baseline values (with the ruling coalition in the minority) and the alternative values (with the ruling coalition in the majority) for X:

```
x.low <- setx(z.out, numst2 = 0)
```

```
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv
```

```
x.high <- setx(z.out, numst2 = 1)
```

```
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv
```

Simulate expected values and first differences:

```
s.out <- sim(z.out, x = x.low, x1 = x.high)
```

```
## Error: object 'x.low' not found
```

Summarize quantities of interest and produce some plots:

```
summary(s.out)
```

```
## Error: error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 's.out' not found
```

```
plot(s.out)
```

```
## Error: error in evaluating the argument 'x' in selecting a method for function 'plot': Error: object 's.out' not found
```

Let $Y_i^*$ be the survival time for observation $i$. This variable might be censored for some observations at a fixed time $y_c$ such that the fully observed dependent variable, $Y_i$, is defined as

$$
Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* \le y_c \\ y_c & \text{if } Y_i^* > y_c \end{cases}
$$

The *stochastic component* is described by the distribution of the partially observed variable $Y^*$. We assume $Y_i^*$ follows the exponential distribution, whose density function is given by

$$
f(y_i^* \mid \lambda_i) = \frac{1}{\lambda_i} \exp\!\left(-\frac{y_i^*}{\lambda_i}\right)
$$

for $y_i^* \ge 0$ and $\lambda_i > 0$. The mean of this distribution is $\lambda_i$.

In addition, survival models like the exponential have three additional properties. The hazard function $h(t)$ measures the probability of not surviving past time $t$ given survival up to $t$. In general, the hazard function is equal to $f(t)/S(t)$, where the survival function $S(t) = 1 - F(t)$ represents the fraction still surviving at time $t$. The cumulative hazard function $H(t)$ describes the probability of dying before time $t$. In general, $H(t) = \int_0^t h(s)\, ds = -\log S(t)$. In the case of the exponential model,

$$
h(t) = \frac{1}{\lambda_i}, \qquad S(t) = \exp\!\left(-\frac{t}{\lambda_i}\right), \qquad H(t) = \frac{t}{\lambda_i}.
$$

For the exponential model, the hazard function is constant over time. The Weibull and lognormal models allow the hazard function to vary as a function of elapsed time.

The *systematic component* is modeled as

$$
\lambda_i = \exp(x_i \beta),
$$

where $x_i$ is the vector of explanatory variables and $\beta$ is the vector of coefficients.

The expected values (qi$ev) for the exponential model are simulations of the expected duration

$$
E(Y_i) = \lambda_i = \exp(x_i \beta),
$$

given $x_i$ and draws of $\beta$ from its posterior.

The predicted values (qi$pr) are draws from the exponential distribution with mean equal to the expected value $\lambda_i$.

The first difference (or difference in expected values, qi$ev.diff) is

$$
\text{FD} = E(Y \mid x_1) - E(Y \mid x),
$$

where $x$ and $x_1$ are different vectors of values for the explanatory variables.
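These simulation steps can be sketched in base R, independent of Zelig's internals; the coefficient draws below are hypothetical illustrations, not estimates from any dataset:

```r
set.seed(42)

# Hypothetical draws of beta from its posterior (intercept and one covariate)
beta <- cbind(rnorm(1000, mean = 5.5, sd = 0.1),
              rnorm(1000, mean = 0.46, sd = 0.1))

x  <- c(1, 0)  # baseline covariate profile
x1 <- c(1, 1)  # alternative covariate profile

# Expected durations: E(Y) = lambda_i = exp(x_i beta)
ev  <- exp(beta %*% x)
ev1 <- exp(beta %*% x1)

# Predicted values: exponential draws with mean lambda_i (rate = 1 / lambda_i)
pr <- rexp(length(ev), rate = 1 / ev)

# First difference in expected durations
fd <- ev1 - ev
```

Summarizing `fd` (for example with `quantile`) approximates the qi$ev.diff output that sim() reports.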

In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. When $Y_i(t_i = 1)$ is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored $y_i^*$ and uncertainty in simulating $E[Y_i(t_i = 0)]$, the counterfactual expected value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - \widehat{Y_i(t_i = 0)} \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. When $Y_i(t_i = 1)$ is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored $y_i^*$ and uncertainty in simulating $\widehat{Y_i(t_i = 0)}$, the counterfactual predicted value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(Surv(Y, C) ~ X, model = "exp", data = mydata)`, then you may
examine the available information in `z.out` by using
`names(z.out)`, see the coefficients by using `z.out$coefficients`, and
view a default summary of information through `summary(z.out)`.

The exponential function is part of the survival library by Terry
Therneau, ported to R by Thomas Lumley. Advanced users may wish to refer
to `help(survfit)` in the survival library.

Gamma Regression for Continuous, Positive Dependent Variables

Use the gamma regression model if you have a positive-valued dependent variable such as the number of years a parliamentary cabinet endures, or the seconds you can stay airborne while jumping. The gamma distribution assumes that all waiting times are complete by the end of the study (censoring is not allowed).

With reference classes:

```
z5 <- zgamma$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "gamma", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out, x1 = NULL)
```

Attach the sample data:

```
data(coalition)
```

Estimate the model:

```
z.out <- zelig(duration ~ fract + numst2, model = "gamma", data = coalition)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2007.
## gamma: Gamma Regression for Continuous, Positive Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

View the regression output:

```
summary(z.out)
```

```
## Model:
## $by
## [1] 1
##
##
## Call: stats::glm(formula = duration ~ fract + numst2, family = Gamma("inverse"),
## data = .)
##
## Coefficients:
## (Intercept) fract numst2
## -0.012960 0.000115 -0.017387
##
## Degrees of Freedom: 313 Total (i.e. Null); 311 Residual
## Null Deviance: 301
## Residual Deviance: 272 AIC: 2430
## Next step: Use 'setx' method
```

Set the baseline values (with the ruling coalition in the minority) and the alternative values (with the ruling coalition in the majority) for X:

```
x.low <- setx(z.out, numst2 = 0)
x.high <- setx(z.out, numst2 = 1)
```

Simulate expected values (qi$ev) and first differences (qi$fd):

```
s.out <- sim(z.out, x = x.low, x1 = x.high)
```

```
summary(s.out)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 14.46 1.071 14.38 12.57 16.68
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 14.87 13.61 10.9 0.7095 49.69
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 19.16 1.102 19.11 17.19 21.34
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 18.85 17.62 14.12 0.9335 68.06
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 4.693 1.516 4.723 1.713 7.475
```

```
plot(s.out)
```

The Gamma distribution with scale parameter $\alpha$ has a *stochastic component*:

$$
Y_i \sim \text{Gamma}(y_i \mid \lambda_i, \alpha) = \frac{1}{\alpha^{\lambda_i} \Gamma(\lambda_i)}\, y_i^{\lambda_i - 1} \exp\!\left(-\frac{y_i}{\alpha}\right)
$$

for $\alpha, \lambda_i, y_i > 0$.

The *systematic component* is given by

$$
\lambda_i = \frac{1}{x_i \beta}.
$$

The expected values (qi$ev) are simulations of the mean of the stochastic component given draws of $\alpha$ and $\beta$ from their posteriors:

$$
E(Y_i) = \alpha \lambda_i.
$$

The predicted values (qi$pr) are draws from the gamma distribution for each given set of parameters $(\alpha, \lambda_i)$.

If x1 is specified, sim() also returns the differences in the expected values (qi$fd),

$$
E(Y \mid x_1) - E(Y \mid x).
$$
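A base-R sketch of these gamma quantities of interest, using hypothetical simulated parameters rather than Zelig output:

```r
set.seed(1)
n <- 1000

# Hypothetical simulated parameters: shape lambda_i and scale alpha
lambda <- rnorm(n, mean = 3.0, sd = 0.2)
alpha  <- rnorm(n, mean = 5.0, sd = 0.3)

# Expected values: E(Y) = alpha * lambda_i
ev <- alpha * lambda

# Predicted values: gamma draws for each (lambda_i, alpha) pair
pr <- rgamma(n, shape = lambda, scale = alpha)
```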

In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. Variation in the simulations is due to uncertainty in simulating $E[Y_i(t_i = 0)]$, the counterfactual expected value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - \widehat{Y_i(t_i = 0)} \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. Variation in the simulations is due to uncertainty in simulating $\widehat{Y_i(t_i = 0)}$, the counterfactual predicted value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "gamma", data = mydata)`, then you may examine the
available information in `z.out` by using `names(z.out)`, see the
coefficients by using `z.out$coefficients`, and view a default summary of
information through `summary(z.out)`.

The gamma model is part of the stats package. Advanced users may
wish to refer to `help(glm)` and `help(family)`.

Logistic Regression for Dichotomous Dependent Variables

Logistic regression specifies a dichotomous dependent variable as a function of a set of explanatory variables.

With reference classes:

```
z5 <- zlogit$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "logit", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out, x1 = NULL)
```

Attaching the sample turnout dataset:

```
data(turnout)
```

Estimating parameter values for the logistic regression:

```
z.out1 <- zelig(vote ~ age + race, model = "logit", data = turnout)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2007.
## logit: Logistic Regression for Dichotomous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Setting values for the explanatory variables:

```
x.out1 <- setx(z.out1, age = 36, race = "white")
```

Simulating quantities of interest from the posterior distribution.

```
s.out1 <- sim(z.out1, x = x.out1)
```

```
summary(s.out1)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.7486 0.01185 0.7486 0.7243 0.771
## pv
## 0 1
## [1,] 0.255 0.745
```

```
plot(s.out1)
```

Estimating the risk difference (and risk ratio) between low education (25th percentile) and high education (75th percentile), holding all other variables at their default values:

```
z.out2 <- zelig(vote ~ race + educate, model = "logit", data = turnout)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2007.
## logit: Logistic Regression for Dichotomous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

```
x.high <- setx(z.out2, educate = quantile(turnout$educate, prob = 0.75))
x.low <- setx(z.out2, educate = quantile(turnout$educate, prob = 0.25))
s.out2 <- sim(z.out2, x = x.high, x1 = x.low)
summary(s.out2)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.8229 0.0102 0.8232 0.8028 0.8413
## pv
## 0 1
## [1,] 0.195 0.805
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.7093 0.01318 0.7091 0.6834 0.7337
## pv
## 0 1
## [1,] 0.291 0.709
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -0.1136 0.01203 -0.1135 -0.1379 -0.09082
```

```
plot(s.out2)
```

Let $Y_i$ be the binary dependent variable for observation $i$, which takes the value of either 0 or 1.

The *stochastic component* is given by

$$
Y_i \sim \text{Bernoulli}(y_i \mid \pi_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i},
$$

where $\pi_i = \Pr(Y_i = 1)$.

The *systematic component* is given by

$$
\pi_i = \frac{1}{1 + \exp(-x_i \beta)},
$$

where $x_i$ is the vector of explanatory variables for observation $i$ and $\beta$ is the vector of coefficients.

The expected values (qi$ev) for the logit model are simulations of the predicted probability of a success:

$$
E(Y_i) = \pi_i = \frac{1}{1 + \exp(-x_i \beta)},
$$

given draws of $\beta$ from its sampling distribution.

The predicted values (qi$pr) are draws from the Binomial distribution with mean equal to the simulated expected value $\pi_i$.

The first difference (qi$fd) for the logit model is defined as

$$
\text{FD} = \Pr(Y = 1 \mid x_1) - \Pr(Y = 1 \mid x).
$$

The risk ratio (qi$rr) is defined as

$$
\text{RR} = \frac{\Pr(Y = 1 \mid x_1)}{\Pr(Y = 1 \mid x)}.
$$
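The expected values, first differences, and risk ratios can be sketched in base R with `plogis` (the coefficient draws below are hypothetical, not the turnout estimates):

```r
set.seed(7)

# Hypothetical draws of beta from its sampling distribution
beta <- cbind(rnorm(1000, mean = -1.0, sd = 0.1),   # intercept
              rnorm(1000, mean = 0.2,  sd = 0.02))  # education coefficient

x  <- c(1, 10)  # low-education profile
x1 <- c(1, 16)  # high-education profile

# Expected values: pi_i = 1 / (1 + exp(-x_i beta))
ev  <- plogis(beta %*% x)
ev1 <- plogis(beta %*% x1)

# Predicted values: Bernoulli draws with mean pi_i
pr <- rbinom(length(ev), size = 1, prob = ev)

fd <- ev1 - ev  # first difference (risk difference)
rr <- ev1 / ev  # risk ratio
```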

In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. Variation in the simulations is due to uncertainty in simulating $E[Y_i(t_i = 0)]$, the counterfactual expected value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - \widehat{Y_i(t_i = 0)} \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. Variation in the simulations is due to uncertainty in simulating $\widehat{Y_i(t_i = 0)}$, the counterfactual predicted value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "logit", data = mydata)`, then you may examine the
available information in `z.out` by using `names(z.out)`, see the
coefficients by using `z.out$coefficients`, and view a default summary of
information through `summary(z.out)`.

The logit model is part of the stats package. Advanced users may
wish to refer to `help(glm)` and `help(family)`.

Log-Normal Regression for Duration Dependent Variables

The log-normal model describes an event's duration, the dependent variable, as a function of a set of explanatory variables. The log-normal model may take time-censored dependent variables, and allows the hazard rate to increase and decrease over time.

With reference classes:

```
z5 <- zlognorm$new()
z5$zelig(Surv(Y, C) ~ X, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Surv(Y, C) ~ X, model = "lognorm", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

Log-normal models require that the dependent variable be in the form
Surv(Y, C), where Y and C are vectors of length $n$. For each
observation $i$ in $1, \dots, n$, the value $y_i$ is the
duration (lifetime, for example) of each subject, and the associated
$c_i$ is a binary variable such that $c_i = 1$ if the
duration is not censored (*e.g.*, the subject dies during the study) or
$c_i = 0$ if the duration is censored (*e.g.*, the subject is
still alive at the end of the study). If C is omitted, all Y
are assumed to be completed; that is, the censoring indicator defaults to 1 for all
observations.

In addition to the standard inputs, zelig() takes the following additional options for lognormal regression:

- robust: defaults to FALSE. If TRUE, zelig() computes robust standard errors based on sandwich estimators and the options selected in cluster.
- cluster: if robust = TRUE, you may select a variable to define groups of correlated observations. Let x3 be a variable that consists of either discrete numeric values, character strings, or factors that define strata. Then

```
z.out <- zelig(y ~ x1 + x2, robust = TRUE, cluster = "x3", model = "lognorm", data = mydata)
```

means that the observations can be correlated within the strata defined by the variable x3, and that robust standard errors should be calculated according to those clusters. If robust = TRUE but cluster is not specified, zelig() assumes that each observation falls into its own cluster.

Attach the sample data:

```
data(coalition)
```

Estimate the model:

```
z.out <- zelig(Surv(duration, ciep12) ~ fract + numst2, model = "lognorm", data = coalition)
```

```
## How to cite this model in Zelig:
## Matthew Owen, Olivia Lau, Kosuke Imai, Gary King. 2007.
## lognorm: Log-Normal Regression for Duration Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

View the regression output:

```
summary(z.out)
```

```
## Model:
## $by
## [1] 1
##
## Call:
## survival::survreg(formula = Surv(duration, ciep12) ~ fract +
## numst2, data = ., dist = "lognormal", model = FALSE)
##
## Coefficients:
## (Intercept) fract numst2
## 5.366670 -0.004438 0.559833
##
## Scale= 1.2
##
## Loglik(model)= -1078 Loglik(intercept only)= -1101
## Chisq= 46.58 on 2 degrees of freedom, p= 7.7e-11
## n= 314
## Next step: Use 'setx' method
```

Set the baseline values (with the ruling coalition in the minority) and the alternative values (with the ruling coalition in the majority) for X:

```
x.low <- setx(z.out, numst2 = 0)
```

```
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv
```

```
x.high <- setx(z.out, numst2 = 1)
```

```
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv
```

Simulate expected values (qi$ev) and first differences (qi$fd):

```
s.out <- sim(z.out, x = x.low, x1 = x.high)
```

```
summary(s.out)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.7091 0.01293 0.7098 0.6829 0.7328
## pv
## 0 1
## [1,] 0.294 0.706
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.8224 0.009923 0.8226 0.8031 0.8417
## pv
## 0 1
## [1,] 0.178 0.822
## fd
## mean sd 50% 2.5% 97.5%
## [1,] 0.1134 0.01112 0.1132 0.09218 0.1363
```

```
plot(s.out)
```

Let $Y_i^*$ be the survival time for observation $i$, with density function $f(y)$ and corresponding distribution function $F(t) = \int_0^t f(y)\, dy$. This variable might be censored for some observations at a fixed time $y_c$ such that the fully observed dependent variable, $Y_i$, is defined as

$$
Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* \le y_c \\ y_c & \text{if } Y_i^* > y_c \end{cases}
$$

The *stochastic component* is described by the distribution of the partially observed variable, $Y^*$. For the lognormal model, there are two equivalent representations:

$$
Y_i^* \sim \text{LogNormal}(\mu_i, \sigma^2) \quad \text{or} \quad \log(Y_i^*) \sim \text{Normal}(\mu_i, \sigma^2),
$$

where the parameters $\mu_i$ and $\sigma^2$ are the mean and variance of the Normal distribution. (Note that the output from zelig() parameterizes $\text{scale} = \sigma$.)

In addition, survival models like the lognormal have three additional properties. The hazard function $h(t)$ measures the probability of not surviving past time $t$ given survival up to $t$. In general, the hazard function is equal to $f(t)/S(t)$, where the survival function $S(t) = 1 - F(t)$ represents the fraction still surviving at time $t$. The cumulative hazard function $H(t)$ describes the probability of dying before time $t$. In general, $H(t) = \int_0^t h(s)\, ds = -\log S(t)$. In the case of the lognormal model,

$$
S(t) = 1 - \Phi\!\left(\frac{\log t - \mu_i}{\sigma}\right), \qquad h(t) = \frac{f(t)}{S(t)}, \qquad H(t) = -\log S(t),
$$

where $\Phi$ is the cumulative distribution function for the Normal distribution.

The *systematic component* is described as

$$
\mu_i = x_i \beta.
$$

The expected values (qi$ev) for the lognormal model are simulations of the expected duration:

$$
E(Y_i) = \exp\!\left(\mu_i + \frac{\sigma^2}{2}\right),
$$

given draws of $\beta$ and $\sigma$ from their sampling distributions.

The predicted value (qi$pr) is a draw from the log-normal distribution given simulations of the parameters $(\mu_i, \sigma)$.
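A base-R sketch of the lognormal expected and predicted values, using hypothetical simulated parameters rather than Zelig output:

```r
set.seed(3)
n <- 1000

# Hypothetical simulated parameters mu_i and sigma
mu    <- rnorm(n, mean = 3.0, sd = 0.1)
sigma <- rnorm(n, mean = 1.2, sd = 0.05)

# Expected durations: E(Y) = exp(mu_i + sigma^2 / 2)
ev <- exp(mu + sigma^2 / 2)

# Predicted values: log-normal draws given (mu_i, sigma)
pr <- rlnorm(n, meanlog = mu, sdlog = sigma)
```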

The first difference (qi$fd) is

$$
\text{FD} = E(Y \mid x_1) - E(Y \mid x).
$$

In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. When $Y_i(t_i = 1)$ is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored $y_i^*$ and uncertainty in simulating $E[Y_i(t_i = 0)]$, the counterfactual expected value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - \widehat{Y_i(t_i = 0)} \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. When $Y_i(t_i = 1)$ is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored $y_i^*$ and uncertainty in simulating $\widehat{Y_i(t_i = 0)}$, the counterfactual predicted value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(Surv(Y, C) ~ X, model = "lognorm", data = mydata)`, then you may
examine the available information in `z.out` by using
`names(z.out)`, see the coefficients by using `z.out$coefficients`, and
view a default summary of information through `summary(z.out)`.

The log-normal model is part of the survival library by Terry
Therneau, ported to R by Thomas Lumley. Advanced users may wish to refer
to `help(survfit)` in the survival library.

Least Squares Regression for Continuous Dependent Variables

Use least squares regression analysis to estimate the best linear predictor for the specified dependent variables.

With reference classes:

```
z5 <- zls$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "ls", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

Attach sample data:

```
data(macro)
```

Estimate model:

```
z.out1 <- zelig(unem ~ gdp + capmob + trade, model = "ls", data = macro)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Olivia Lau. 2007.
## ls: Least Squares Regression for Continuous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Summarize regression coefficients:

```
summary(z.out1)
```

```
## Model:
## $by
## [1] 1
##
##
## Call:
## stats::lm(formula = unem ~ gdp + capmob + trade, data = .)
##
## Coefficients:
## (Intercept) gdp capmob trade
## 6.1813 -0.3236 1.4219 0.0199
##
## Next step: Use 'setx' method
```

Set explanatory variables to their default (mean/mode) values, with high (80th percentile) and low (20th percentile) values for the trade variable:

```
x.high <- setx(z.out1, trade = quantile(macro$trade, 0.8))
x.low <- setx(z.out1, trade = quantile(macro$trade, 0.2))
```

Generate first differences for the effect of high versus low trade on GDP:

```
s.out1 <- sim(z.out1, x = x.high, x1 = x.low)
```

```
summary(s.out1)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## 1 5.434 0.1854 5.433 5.065 5.794
## pv
## mean sd 50% 2.5% 97.5%
## 1 5.434 0.1854 5.433 5.065 5.794
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## 1 4.591 0.1859 4.591 4.223 4.939
## pv
## mean sd 50% 2.5% 97.5%
## 1 4.591 0.1859 4.591 4.223 4.939
## fd
## mean sd 50% 2.5% 97.5%
## 1 -0.8432 0.2323 -0.8312 -1.321 -0.4059
```

```
plot(s.out1)
```

Estimate a model with fixed effects for each country (see for help with dummy variables). Note that you do not need to create dummy variables, as the program will automatically parse the unique values in the selected variable into discrete levels.

```
z.out2 <- zelig(unem ~ gdp + trade + capmob + as.factor(country), model = "ls", data = macro)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Olivia Lau. 2007.
## ls: Least Squares Regression for Continuous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Set values for the explanatory variables, using the default mean/mode values, with country set to the United States and Japan, respectively:

```
x.US <- setx(z.out2, country = "United States")
x.Japan <- setx(z.out2, country = "Japan")
```

Simulate quantities of interest:

```
s.out2 <- sim(z.out2, x = x.US, x1 = x.Japan)
```

```
plot(s.out2)
```

The *stochastic component* is described by a density with mean $\mu_i$ and the common variance $\sigma^2$:

$$
Y_i \sim f(y_i \mid \mu_i, \sigma^2).
$$

The *systematic component* models the conditional mean as

$$
\mu_i = x_i \beta,
$$

where $x_i$ is the vector of covariates and $\beta$ is the vector of coefficients.

The least squares estimator is the best linear predictor of a dependent variable given $x_i$, and minimizes the sum of squared residuals, $\sum_{i=1}^{n} (Y_i - x_i \beta)^2$.
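The minimization just described has the closed-form solution $\hat\beta = (X'X)^{-1}X'Y$; a base-R sketch on simulated data (the coefficients 2 and 3 are hypothetical) shows that solving the normal equations matches `lm()`:

```r
set.seed(5)
n <- 200
x <- cbind(1, runif(n))              # design matrix with an intercept column
y <- drop(x %*% c(2, 3)) + rnorm(n)  # simulated outcome

# Least squares estimate via the normal equations
beta_hat <- solve(t(x) %*% x, t(x) %*% y)

# The same estimate via lm()
fit <- lm(y ~ x[, 2])

# Residuals from beta_hat are orthogonal to the columns of x
resid_ls <- y - drop(x %*% beta_hat)
```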

The expected value (qi$ev) is the mean of simulations from the stochastic component,

$$
E(Y_i) = x_i \beta,
$$

given a draw of $\beta$ from its sampling distribution.

In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

$$
\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],
$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups. Variation in the simulations is due to uncertainty in simulating $E[Y_i(t_i = 0)]$, the counterfactual expected value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t_i = 0$.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "ls", data = mydata)`, then you may examine the
available information in `z.out` by using `names(z.out)`, see the
coefficients by using `z.out$coefficients`, and view a default summary of
information through `summary(z.out)`. Other elements available through
the $ operator are listed below.

From the zelig() output object z.out, you may extract:

- coefficients: parameter estimates for the explanatory variables.
- residuals: the residuals of the fitted model.
- fitted.values: fitted values.
- df.residual: the residual degrees of freedom.
- zelig.data: the input data frame if save.data = TRUE.

From summary(z.out), you may extract:

- coefficients: the parameter estimates with their associated standard errors, $p$-values, and $t$-statistics.
- sigma: the square root of the estimated variance of the random error $e$: $\hat{\sigma} = \sqrt{\sum (Y_i - x_i \hat{\beta})^2 / (n - k)}$.
- r.squared: the fraction of the variance explained by the model.
- adj.r.squared: the above statistic, penalizing for an increased number of explanatory variables.
- cov.unscaled: a matrix of unscaled covariances.

The least squares regression is part of the stats package by William N.
Venables and Brian D. Ripley. In addition, advanced users may wish to
refer to `help(lm)` and `help(lm.fit)`.

Negative Binomial Regression for Event Count Dependent Variables

Use the negative binomial regression if you have a count of events for each observation of your dependent variable. The negative binomial model is frequently used to estimate over-dispersed event count models.

With reference classes:

```
z5 <- znegbin$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "negbin", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

Load sample data:

```
data(sanction)
```

Estimate the model:

```
z.out <- zelig(num ~ target + coop, model = "negbin", data = sanction)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2008.
## negbinom: Negative Binomial Regression for Event Count Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

```
summary(z.out)
```

```
## Model:
## $by
## [1] 1
##
##
## Call: MASS::glm.nb(formula = num ~ target + coop, data = ., init.theta = 1.841603403,
## link = log)
##
## Coefficients:
## (Intercept) target coop
## -1.564 0.151 1.286
##
## Degrees of Freedom: 77 Total (i.e. Null); 75 Residual
## Null Deviance: 237
## Residual Deviance: 56.5 AIC: 360
## Next step: Use 'setx' method
```

Set values for the explanatory variables to their default mean values:

```
x.out <- setx(z.out)
```

Simulate fitted values:

```
s.out <- sim(z.out, x = x.out)
```

```
summary(s.out)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 2.975 0.3322 2.959 2.376 3.703
## pv
## qi
## 0 1 2 3 4 5 6 7 8 9 10 11
## 0.147 0.184 0.157 0.120 0.112 0.084 0.068 0.039 0.029 0.013 0.020 0.012
## 12 13 14 16
## 0.009 0.003 0.001 0.002
```

```
plot(s.out)
```

Let $Y_i$ be the number of independent events that occur during a fixed time period. This variable can take any non-negative integer value.

The negative binomial distribution is derived by letting the mean of the Poisson distribution vary according to a fixed parameter $\zeta$ given by the Gamma distribution. The *stochastic component* is given by

$$
Y_i \mid \zeta_i \sim \text{Poisson}(\zeta_i \mu_i), \qquad \zeta_i \sim \frac{1}{\theta}\,\text{Gamma}(\theta).
$$

The marginal distribution of $Y_i$ is then the negative binomial with mean $\mu_i$ and variance $\mu_i + \mu_i^2/\theta$:

$$
Y_i \sim \text{NegBin}(y_i \mid \mu_i, \theta) = \frac{\Gamma(\theta + y_i)}{y_i!\, \Gamma(\theta)}\, \frac{\mu_i^{y_i}\, \theta^{\theta}}{(\mu_i + \theta)^{\theta + y_i}},
$$

where $\theta$ is the systematic parameter of the Gamma distribution modeling $\zeta_i$.

The *systematic component* is given by

$$
\mu_i = \exp(x_i \beta),
$$

where $x_i$ is the vector of explanatory variables and $\beta$ is the vector of coefficients.

The expected values (qi$ev) are simulations of the mean of the stochastic component. Thus,

$$E(Y_i) = \mu_i = \exp(x_i \beta),$$

given simulations of $\beta$.

The predicted value (qi$pr) is drawn from the distribution defined by the set of parameters $(\mu_i, \theta)$.

The first difference (qi$fd) is

$$\textrm{FD} = E(Y \mid x_1) - E(Y \mid x),$$

where $t$ is a binary explanatory variable defining the treatment ($t = 1$) and control ($t = 0$) groups. Variation in the simulations is due to uncertainty in simulating $E[Y_i(t = 0)]$, the counterfactual predicted value of $Y_i$ for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to $t = 0$.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "negbin", data)`, then you may examine
the available information in `z.out` by using `names(z.out)`, see
the coefficients by using `z.out$coefficients`, and obtain a default
summary of information through `summary(z.out)`.

Normal Regression for Continuous Dependent Variables

The Normal regression model is a close variant of the more standard least squares regression model. Both models specify a continuous dependent variable as a linear function of a set of explanatory variables. The Normal model reports maximum likelihood (rather than least squares) estimates. The two models differ only in their estimate for the stochastic parameter $\sigma^2$.

With reference classes:

```
z5 <- znormal$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "normal", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

Attach sample data:

```
data(macro)
```

Estimate model:

```
z.out1 <- zelig(unem ~ gdp + capmob + trade, model = "normal", data = macro)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2008.
## normal: Normal Regression for Continuous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Summarize the regression coefficients:

```
summary(z.out1)
```

```
## Model:
## $by
## [1] 1
##
##
## Call: stats::glm(formula = unem ~ gdp + capmob + trade, family = gaussian("identity"),
## data = .)
##
## Coefficients:
## (Intercept) gdp capmob trade
## 6.1813 -0.3236 1.4219 0.0199
##
## Degrees of Freedom: 349 Total (i.e. Null); 346 Residual
## Null Deviance: 3660
## Residual Deviance: 2610 AIC: 1710
## Next step: Use 'setx' method
```

Set explanatory variables to their default (mean/mode) values, with high (80th percentile) and low (20th percentile) values for trade:

```
x.high <- setx(z.out1, trade = quantile(macro$trade, 0.8))
x.low <- setx(z.out1, trade = quantile(macro$trade, 0.2))
```

Generate first differences for the effect of high versus low trade on GDP:

```
s.out1 <- sim(z.out1, x = x.high, x1 = x.low)
```

```
summary(s.out1)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 5.43 0.1872 5.42 5.046 5.792
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 5.376 2.788 5.485 -0.1162 10.47
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 4.592 0.1781 4.592 4.254 4.948
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 4.723 2.73 4.727 -0.5893 10.02
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -0.8381 0.2308 -0.8242 -1.309 -0.3974
```

A visual summary of quantities of interest:

```
plot(s.out1)
```

Let $Y_i$ be the continuous dependent variable for observation $i$.

The *stochastic component* is described by a univariate normal model with a vector of means $\mu_i$ and scalar variance $\sigma^2$:

$$Y_i \sim \mathrm{Normal}(\mu_i, \sigma^2).$$

The *systematic component* is

$$\mu_i = x_i \beta,$$

where $x_i$ is the vector of explanatory variables and $\beta$ is the vector of coefficients.

The expected value (qi$ev) is the mean of simulations from the stochastic component,

$$E(Y) = \mu_i = x_i \beta,$$

given a draw of $\beta$ from its posterior.

The predicted value (qi$pr) is drawn from the distribution defined by the set of parameters $(\mu_i, \sigma^2)$.

The first difference (qi$fd) is:

$$\textrm{FD} = E(Y \mid x_1) - E(Y \mid x).$$
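The simulation logic behind qi$ev and qi$pr can be sketched outside of Zelig. The snippet below (Python, for a self-contained illustration) reuses the point estimates from summary(z.out1) above, but the standard errors and residual standard deviation are hypothetical values, and covariances between coefficients are ignored, so this illustrates the procedure rather than reproducing Zelig's output:

```python
import random, statistics

random.seed(42)
beta_hat = [6.1813, -0.3236, 1.4219, 0.0199]  # intercept, gdp, capmob, trade (from summary(z.out1))
se = [0.45, 0.06, 0.16, 0.005]                # hypothetical standard errors, illustration only
x = [1.0, 3.0, 0.0, 57.0]                     # chosen covariate values: gdp = 3, capmob = 0, trade = 57
sigma = 2.7                                   # hypothetical residual standard deviation

evs, pvs = [], []
for _ in range(1000):
    # draw beta from its (here: independent normal) sampling distribution
    beta = [random.gauss(b, s) for b, s in zip(beta_hat, se)]
    ev = sum(bj * xj for bj, xj in zip(beta, x))   # E(Y) = x beta
    evs.append(ev)
    pvs.append(random.gauss(ev, sigma))            # predicted value: a draw from Normal(ev, sigma^2)

print(round(statistics.mean(evs), 2))
```

The predicted values vary more than the expected values because each draw adds the fundamental uncertainty $\sigma$ on top of the estimation uncertainty in $\beta$.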

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "normal", data)`, then you may examine
the available information in `z.out` by using `names(z.out)`, see
the coefficients by using `z.out$coefficients`, and obtain a default
summary of information through `summary(z.out)`.

The normal model is part of the stats package. Advanced users may
wish to refer to `help(glm)` and `help(family)`.

Poisson Regression for Event Count Dependent Variables

Use the Poisson regression model if the observations of your dependent variable represent the number of independent events that occur during a fixed period of time (see the negative binomial model for over-dispersed event counts). A Bayesian implementation of this model is also available.

With reference classes:

```
z5 <- zpoisson$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "poisson", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

Load sample data:

```
data(sanction)
```

Estimate Poisson model:

```
z.out <- zelig(num ~ target + coop, model = "poisson", data = sanction)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2007.
## poisson: Poisson Regression for Event Count Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

```
summary(z.out)
```

```
## Model:
## $by
## [1] 1
##
##
## Call: stats::glm(formula = num ~ target + coop, family = poisson("log"),
## data = .)
##
## Coefficients:
## (Intercept) target coop
## -0.968 -0.021 1.211
##
## Degrees of Freedom: 77 Total (i.e. Null); 75 Residual
## Null Deviance: 1580
## Residual Deviance: 721 AIC: 944
## Next step: Use 'setx' method
```

Set values for the explanatory variables to their default mean values:

```
x.out <- setx(z.out)
```

Simulate fitted values:

```
s.out <- sim(z.out, x = x.out)
summary(s.out)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 3.244 0.2324 3.239 2.799 3.732
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 3.32 1.843 3 0 7
```

```
plot(s.out)
```

Let $Y_i$ be the number of independent events that occur during a fixed time period. This variable can take any non-negative integer value.

The Poisson distribution has *stochastic component*

$$Y_i \sim \mathrm{Poisson}(\lambda_i),$$

where $\lambda_i$ is the mean and variance parameter.

The *systematic component* is

$$\lambda_i = \exp(x_i \beta),$$

where $x_i$ is the vector of explanatory variables, and $\beta$ is the vector of coefficients.

The expected value (qi$ev) is the mean of simulations from the stochastic component,

$$E(Y) = \lambda_i = \exp(x_i \beta),$$

given draws of $\beta$ from its sampling distribution.

The predicted value (qi$pr) is a random draw from the Poisson distribution defined by mean $\lambda_i$.

The first difference in the expected values (qi$fd) is given by:

$$\textrm{FD} = E(Y \mid x_1) - E(Y \mid x).$$
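Because the systematic component is log-linear, quantities of interest are simple to compute by hand from the fitted coefficients. A sketch (in Python for a self-contained check; the coefficient values are taken from summary(z.out) above, and the covariate settings are illustrative):

```python
import math

# point estimates from summary(z.out) above
beta = {"(Intercept)": -0.968, "target": -0.021, "coop": 1.211}

def expected_count(target, coop):
    # lambda_i = exp(x_i beta), the Poisson systematic component
    return math.exp(beta["(Intercept)"] + beta["target"] * target + beta["coop"] * coop)

# first difference: raise coop from 1 to 2 while holding target at 2
fd = expected_count(target=2, coop=2) - expected_count(target=2, coop=1)
print(round(fd, 3))
```

Note that the ratio of expected counts for a one-unit change in coop is exactly $\exp(\beta_{\text{coop}})$, regardless of the baseline covariate values.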

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "poisson", data)`, then you may examine
the available information in `z.out` by using `names(z.out)`, see
the coefficients by using `z.out$coefficients`, and obtain a default
summary of information through `summary(z.out)`.

The poisson model is part of the stats package. Advanced users may
wish to refer to `help(glm)` and `help(family)`.

Probit Regression for Dichotomous Dependent Variables

Use probit regression to model binary dependent variables specified as a function of a set of explanatory variables.

With reference classes:

```
z5 <- zprobit$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "probit", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out, x1 = NULL)
```

Attach the sample turnout dataset:

```
data(turnout)
```

Estimate parameter values for the probit regression:

```
z.out <- zelig(vote ~ race + educate, model = "probit", data = turnout)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2007.
## probit: Probit Regression for Dichotomous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

```
summary(z.out)
```

```
## Model:
## $by
## [1] 1
##
##
## Call: stats::glm(formula = vote ~ race + educate, family = binomial("probit"),
## data = .)
##
## Coefficients:
## (Intercept) racewhite educate
## -0.7259 0.2991 0.0971
##
## Degrees of Freedom: 1999 Total (i.e. Null); 1997 Residual
## Null Deviance: 2270
## Residual Deviance: 2140 AIC: 2140
## Next step: Use 'setx' method
```

Set values for the explanatory variables to their default values.

```
x.out <- setx(z.out)
```

Simulate quantities of interest from the posterior distribution.

```
s.out <- sim(z.out, x = x.out)
```

```
summary(s.out)
```

```
plot(s.out)
```

Let $Y_i$ be the observed binary dependent variable for observation $i$, which takes the value of either 0 or 1.

The *stochastic component* is given by

$$Y_i \sim \mathrm{Bernoulli}(\pi_i),$$

where $\pi_i = \Pr(Y_i = 1)$.

The *systematic component* is

$$\pi_i = \Phi(x_i \beta),$$

where $\Phi$ is the cumulative distribution function of the Normal distribution with mean 0 and unit variance.

The expected value (qi$ev) is a simulation of the predicted probability of success

$$E(Y) = \pi_i = \Phi(x_i \beta),$$

given a draw of $\beta$ from its sampling distribution.

The predicted value (qi$pr) is a draw from a Bernoulli distribution with mean $\pi_i$.

The first difference (qi$fd) in expected values is defined as

$$\textrm{FD} = \Pr(Y = 1 \mid x_1) - \Pr(Y = 1 \mid x).$$

The risk ratio (qi$rr) is defined as

$$\textrm{RR} = \Pr(Y = 1 \mid x_1) \,/\, \Pr(Y = 1 \mid x).$$
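These quantities can be computed directly from the probit coefficients reported above. A sketch in Python (self-contained, not part of Zelig; the choice of 12 versus 16 years of education is illustrative):

```python
import math

def norm_cdf(z):
    # Phi(z): CDF of the standard Normal distribution
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# point estimates from summary(z.out) above
beta = {"(Intercept)": -0.7259, "racewhite": 0.2991, "educate": 0.0971}

def pr_vote(white, educate):
    xb = beta["(Intercept)"] + beta["racewhite"] * white + beta["educate"] * educate
    return norm_cdf(xb)  # pi_i = Phi(x_i beta)

fd = pr_vote(white=1, educate=16) - pr_vote(white=1, educate=12)  # first difference
rr = pr_vote(white=1, educate=16) / pr_vote(white=1, educate=12)  # risk ratio
print(round(fd, 3), round(rr, 3))
```

Unlike in Zelig, this sketch plugs in the point estimate of $\beta$ directly; Zelig instead simulates draws of $\beta$ so that the uncertainty in the estimates propagates into the quantities of interest.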

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "probit", data)`, then you may examine
the available information in `z.out` by using `names(z.out)`, see
the coefficients by using `z.out$coefficients`, and obtain a default
summary of information through `summary(z.out)`.

The probit model is part of the stats package. Advanced users may
wish to refer to `help(glm)` and `help(family)`.

Rare Events Logistic Regression for Dichotomous Dependent Variables

The relogit procedure estimates the same model as standard logistic regression (appropriate when you have a dichotomous dependent variable and a set of explanatory variables), but the estimates are corrected for the bias that occurs when the sample is small or the observed events are rare (i.e., if the dependent variable has many more 0s than 1s, or the reverse). The relogit procedure also optionally uses prior correction for case-control sampling designs.

With reference classes:

```
z5 <- zrelogit$new()
z5$zelig(Y ~ X1 + X2, tau = NULL,
case.control = c("prior", "weighting"),
bias.correct = TRUE, robust = FALSE,
data = mydata, ...)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "relogit", tau = NULL,
case.control = c("prior", "weighting"),
bias.correct = TRUE, robust = FALSE,
data = mydata, ...)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

The relogit procedure supports several optional arguments in addition to the standard arguments for zelig(). You may additionally use:

- tau: a vector containing either one or two values for $\tau$, the true population fraction of ones. Use, for example, tau = c(0.05, 0.1) to specify that the lower bound on tau is 0.05 and the upper bound is 0.1. If left unspecified, only finite-sample bias correction is performed, not case-control correction.
- case.control: if tau is specified, choose a method to correct for case-control sampling design: “prior” (default) or “weighting”.
- bias.correct: a logical value of TRUE (default) or FALSE indicating whether the intercept should be corrected for finite sample (rare events) bias.

Note that if tau = NULL and bias.correct = FALSE, the relogit procedure performs a standard logistic regression without any correction.

Due to memory and space considerations, the data used here are a sample drawn from the full data set used in King and Zeng (2001). The proportion of militarized interstate conflicts to the absence of disputes is 1,042 to 303,772. To estimate the model,

```
data(mid)
```

```
z.out1 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years,
                data = mid, model = "relogit", tau = 1042/303772)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Olivia Lau. 2014.
## relogit: Rare Events Logistic Regression for Dichotomous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Summarize the model output:

```
summary(z.out1)
```

```
## Model:
## $by
## [1] 1
##
##
## Call: relogit(formula = cbind(conflict, 1 - conflict) ~ major + contig +
## power + maxdem + mindem + years, data = ., tau = 0.00343020423212146,
## bias.correct = TRUE, case.control = "prior")
##
## Coefficients:
## (Intercept) major contig power maxdem
## -7.5084 2.4320 4.1080 1.0536 0.0480
## mindem years
## -0.0641 -0.0629
##
## Degrees of Freedom: 3125 Total (i.e. Null); 3119 Residual
## Null Deviance: 3980
## Residual Deviance: 1870 AIC: 1880
## Next step: Use 'setx' method
```

Set the explanatory variables to their means:

```
x.out1 <- setx(z.out1)
```

Simulate quantities of interest:

```
s.out1 <- sim(z.out1, x = x.out1)
summary(s.out1)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.002388 0.000155 0.002383 0.002104 0.002702
## pv
## 0 1
## [1,] 0.998 0.002
```

```
plot(s.out1)
```

Suppose that we wish to perform case control correction using weighting (rather than the default prior correction). To estimate the model:

```
z.out2 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years,
                data = mid, model = "relogit", tau = 1042/303772,
                case.control = "weighting")
```

Summarize the model output:

```
summary(z.out2)
```


Set the explanatory variables to their means:

```
x.out2 <- setx(z.out2)
```

Simulate quantities of interest:

```
s.out2 <- sim(z.out2, x = x.out2)
summary(s.out2)
```


Suppose that we did not know that $\tau \approx 0.00343$, but only that it was somewhere between 0.002 and 0.005. To estimate a model with a range of feasible estimates for $\tau$ (using the default prior correction method for case control correction):

```
z.out2 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years,
                data = mid, model = "relogit", tau = c(0.002, 0.005))
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, and Olivia Lau. 2014.
## relogit: Rare Events Logistic Regression for Dichotomous Dependent Variables
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Summarize the model output:

```
z.out2
```

```
## Model:
## $by
## [1] 1
##
## $lower.estimate
##
## Call: (function (formula, data = sys.parent(), tau = NULL, bias.correct = TRUE,
## case.control = "prior", ...)
## {
## mf <- match.call()
## mf$tau <- mf$bias.correct <- mf$case.control <- NULL
## if (!is.null(tau)) {
## tau <- unique(tau)
## if (length(case.control) > 1)
## stop("You can only choose one option for case control correction.")
## ck1 <- grep("p", case.control)
## ck2 <- grep("w", case.control)
## if (length(ck1) == 0 & length(ck2) == 0)
## stop("choose either case.control = \"prior\" ", "or case.control = \"weighting\"")
## if (length(ck2) == 0)
## weighting <- FALSE
## else weighting <- TRUE
## }
## else weighting <- FALSE
## if (length(tau) > 2)
## stop("tau must be a vector of length less than or equal to 2")
## else if (length(tau) == 2) {
## mf[[1]] <- relogit
## res <- list()
## mf$tau <- min(tau)
## res$lower.estimate <- eval(as.call(mf), parent.frame())
## mf$tau <- max(tau)
## res$upper.estimate <- eval(as.call(mf), parent.frame())
## res$formula <- formula
## class(res) <- c("Relogit2", "Relogit")
## return(res)
## }
## else {
## mf[[1]] <- glm
## mf$family <- binomial(link = "logit")
## y2 <- model.response(model.frame(mf$formula, data))
## if (is.matrix(y2))
## y <- y2[, 1]
## else y <- y2
## ybar <- mean(y)
## if (weighting) {
## w1 <- tau/ybar
## w0 <- (1 - tau)/(1 - ybar)
## wi <- w1 * y + w0 * (1 - y)
## mf$weights <- wi
## }
## res <- eval(as.call(mf), parent.frame())
## res$call <- match.call(expand.dots = TRUE)
## res$tau <- tau
## X <- model.matrix(res)
## if (bias.correct) {
## pihat <- fitted(res)
## if (is.null(tau))
## wi <- rep(1, length(y))
## else if (weighting)
## res$weighting <- TRUE
## else {
## w1 <- tau/ybar
## w0 <- (1 - tau)/(1 - ybar)
## wi <- w1 * y + w0 * (1 - y)
## res$weighting <- FALSE
## }
## W <- pihat * (1 - pihat) * wi
## Qdiag <- lm.influence(lm(y ~ X - 1, weights = W))$hat/W
## if (is.null(tau))
## xi <- 0.5 * Qdiag * (2 * pihat - 1)
## else xi <- 0.5 * Qdiag * ((1 + w0) * pihat - w0)
## res$coefficients <- res$coefficients - lm(xi ~ X -
## 1, weights = W)$coefficients
## res$bias.correct <- TRUE
## }
## else res$bias.correct <- FALSE
## if (!is.null(tau) & !weighting) {
## if (tau <= 0 || tau >= 1)
## stop("\ntau needs to be between 0 and 1.\n")
## res$coefficients["(Intercept)"] <- res$coefficients["(Intercept)"] -
## log(((1 - tau)/tau) * (ybar/(1 - ybar)))
## res$prior.correct <- TRUE
## res$weighting <- FALSE
## }
## else res$prior.correct <- FALSE
## if (is.null(res$weighting))
## res$weighting <- FALSE
## res$linear.predictors <- t(res$coefficients) %*% t(X)
## res$fitted.values <- 1/(1 + exp(-res$linear.predictors))
## res$zelig <- "Relogit"
## class(res) <- c("Relogit", "glm")
## return(res)
## }
## })(formula = cbind(conflict, 1 - conflict) ~ major + contig +
## power + maxdem + mindem + years, data = ., tau = 0.002)
##
## Coefficients:
## (Intercept) major contig power maxdem
## -8.0492 2.4320 4.1079 1.0536 0.0480
## mindem years
## -0.0641 -0.0629
##
## Degrees of Freedom: 3125 Total (i.e. Null); 3119 Residual
## Null Deviance: 3980
## Residual Deviance: 1870 AIC: 1880
##
## $upper.estimate
##
## Call: (function (formula, data = sys.parent(), tau = NULL, bias.correct = TRUE,
## case.control = "prior", ...)
## {
## mf <- match.call()
## mf$tau <- mf$bias.correct <- mf$case.control <- NULL
## if (!is.null(tau)) {
## tau <- unique(tau)
## if (length(case.control) > 1)
## stop("You can only choose one option for case control correction.")
## ck1 <- grep("p", case.control)
## ck2 <- grep("w", case.control)
## if (length(ck1) == 0 & length(ck2) == 0)
## stop("choose either case.control = \"prior\" ", "or case.control = \"weighting\"")
## if (length(ck2) == 0)
## weighting <- FALSE
## else weighting <- TRUE
## }
## else weighting <- FALSE
## if (length(tau) > 2)
## stop("tau must be a vector of length less than or equal to 2")
## else if (length(tau) == 2) {
## mf[[1]] <- relogit
## res <- list()
## mf$tau <- min(tau)
## res$lower.estimate <- eval(as.call(mf), parent.frame())
## mf$tau <- max(tau)
## res$upper.estimate <- eval(as.call(mf), parent.frame())
## res$formula <- formula
## class(res) <- c("Relogit2", "Relogit")
## return(res)
## }
## else {
## mf[[1]] <- glm
## mf$family <- binomial(link = "logit")
## y2 <- model.response(model.frame(mf$formula, data))
## if (is.matrix(y2))
## y <- y2[, 1]
## else y <- y2
## ybar <- mean(y)
## if (weighting) {
## w1 <- tau/ybar
## w0 <- (1 - tau)/(1 - ybar)
## wi <- w1 * y + w0 * (1 - y)
## mf$weights <- wi
## }
## res <- eval(as.call(mf), parent.frame())
## res$call <- match.call(expand.dots = TRUE)
## res$tau <- tau
## X <- model.matrix(res)
## if (bias.correct) {
## pihat <- fitted(res)
## if (is.null(tau))
## wi <- rep(1, length(y))
## else if (weighting)
## res$weighting <- TRUE
## else {
## w1 <- tau/ybar
## w0 <- (1 - tau)/(1 - ybar)
## wi <- w1 * y + w0 * (1 - y)
## res$weighting <- FALSE
## }
## W <- pihat * (1 - pihat) * wi
## Qdiag <- lm.influence(lm(y ~ X - 1, weights = W))$hat/W
## if (is.null(tau))
## xi <- 0.5 * Qdiag * (2 * pihat - 1)
## else xi <- 0.5 * Qdiag * ((1 + w0) * pihat - w0)
## res$coefficients <- res$coefficients - lm(xi ~ X -
## 1, weights = W)$coefficients
## res$bias.correct <- TRUE
## }
## else res$bias.correct <- FALSE
## if (!is.null(tau) & !weighting) {
## if (tau <= 0 || tau >= 1)
## stop("\ntau needs to be between 0 and 1.\n")
## res$coefficients["(Intercept)"] <- res$coefficients["(Intercept)"] -
## log(((1 - tau)/tau) * (ybar/(1 - ybar)))
## res$prior.correct <- TRUE
## res$weighting <- FALSE
## }
## else res$prior.correct <- FALSE
## if (is.null(res$weighting))
## res$weighting <- FALSE
## res$linear.predictors <- t(res$coefficients) %*% t(X)
## res$fitted.values <- 1/(1 + exp(-res$linear.predictors))
## res$zelig <- "Relogit"
## class(res) <- c("Relogit", "glm")
## return(res)
## }
## })(formula = cbind(conflict, 1 - conflict) ~ major + contig +
## power + maxdem + mindem + years, data = ., tau = 0.005)
##
## Coefficients:
## (Intercept) major contig power maxdem
## -7.1300 2.4320 4.1080 1.0536 0.0480
## mindem years
## -0.0641 -0.0629
##
## Degrees of Freedom: 3125 Total (i.e. Null); 3119 Residual
## Null Deviance: 3980
## Residual Deviance: 1870 AIC: 1880
##
## $formula
## cbind(conflict, 1 - conflict) ~ major + contig + power + maxdem +
## mindem + years
## <environment: 0x7fd4fbf8e908>
##
## attr(,"class")
## [1] "Relogit2" "Relogit"
## Next step: Use 'setx' method
```

Set the explanatory variables to their means:

```
x.out2 <- setx(z.out2)
```

Simulate quantities of interest:

```
s.out2 <- sim(z.out2, x = x.out2)
```

```
summary(s.out2)
```


```
plot(s.out2)
```

The cost of giving a range of values for $\tau$ is that point estimates are not available for quantities of interest. Instead, quantities are presented as confidence intervals whose nominal coverage is a lower bound on the actual coverage (e.g., at least 95% of the simulations are contained in the nominal 95% confidence interval).

Like the standard logistic regression, the *stochastic component* for the rare events logistic regression is:

$$Y_i \sim \mathrm{Bernoulli}(\pi_i),$$

where $Y_i$ is the binary dependent variable, and takes a value of either 0 or 1.

The *systematic component* is:

$$\pi_i = \frac{1}{1 + \exp(-x_i \beta)}.$$

If the sample is generated via a case-control (or choice-based) design, such as when drawing all events (or “cases”) and a sample from the non-events (or “controls”) and going backwards to collect the explanatory variables, you must correct for selecting on the dependent variable. While the slope coefficients are approximately unbiased, the constant term may be significantly biased. Zelig has two methods for case control correction:

The “prior correction” method adjusts the intercept term. Let $\tau$ be the true population fraction of events, $\bar{y}$ the fraction of events in the sample, and $\hat{\beta}_0$ the uncorrected intercept term. The corrected intercept $\beta_0$ is:

$$\beta_0 = \hat{\beta}_0 - \ln\left[\left(\frac{1 - \tau}{\tau}\right)\left(\frac{\bar{y}}{1 - \bar{y}}\right)\right].$$

The “weighting” method performs a weighted logistic regression to correct for a case-control sampling design. Let the 1 subscript denote observations for which the dependent variable is observed as a 1, and the 0 subscript denote observations for which the dependent variable is observed as a 0. Then the vector of weights $w_i$ is

$$w_i = \frac{\tau}{\bar{y}}\, Y_i + \frac{1 - \tau}{1 - \bar{y}}\, (1 - Y_i).$$
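Both corrections reduce to a few lines of arithmetic. The sketch below (Python, for a self-contained illustration; only $\tau$ comes from the mid example above, while the sample fraction of events and the uncorrected intercept are hypothetical values) computes the prior-corrected intercept and the case-control weights:

```python
import math

tau = 1042 / 303772   # true population fraction of events (from the mid example above)
ybar = 1042 / 3126    # hypothetical sample fraction of events, illustration only
b0_hat = -3.0         # hypothetical uncorrected intercept

# prior correction: subtract log[((1 - tau)/tau) * (ybar/(1 - ybar))] from the intercept
b0 = b0_hat - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# weighting: w1 applies to observed events, w0 to observed non-events
w1 = tau / ybar
w0 = (1 - tau) / (1 - ybar)

print(round(b0, 3), round(w1, 4), round(w0, 4))
```

A useful sanity check is that the weights average to exactly 1 over the sample ($w_1 \bar{y} + w_0 (1 - \bar{y}) = 1$), so weighting shifts influence between events and non-events without changing the effective sample size.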

If $\tau$ is unknown, you may alternatively specify an upper and lower bound for the possible range of $\tau$. In this case, the relogit procedure uses “robust Bayesian” methods to generate a confidence interval (rather than a point estimate) for each quantity of interest. The nominal coverage of the confidence interval is at least as great as the actual coverage.

By default, estimates of the coefficients are bias-corrected to account for finite sample or rare events bias. In addition, quantities of interest, such as predicted probabilities, are also corrected for rare-events bias. If $\hat{\beta}$ are the uncorrected logit coefficients and $\mathrm{bias}(\hat{\beta})$ is the bias term, the corrected coefficients $\tilde{\beta}$ are

$$\tilde{\beta} = \hat{\beta} - \mathrm{bias}(\hat{\beta}).$$

The bias term is

$$\mathrm{bias}(\hat{\beta}) = (X'WX)^{-1} X'W \xi,$$

where

$$\xi_i = \frac{1}{2}\, Q_{ii} \left[(1 + w_0)\hat{\pi}_i - w_0\right], \qquad Q = X(X'WX)^{-1}X', \qquad W = \mathrm{diag}\{\hat{\pi}_i (1 - \hat{\pi}_i)\, w_i\},$$

where $w_0$ and $w_1$ are given in the “weighting” section above.

For either one or no value of $\tau$:

The expected values (qi$ev) for the rare events logit are simulations of the predicted probability

$$E(Y) = \pi_i = \frac{1}{1 + \exp(-x_i \beta)},$$

given draws of $\beta$ from its posterior.

The predicted value (qi$pr) is a draw from a binomial distribution with mean equal to the simulated $\pi_i$.

The first difference (qi$fd) is defined as

$$\textrm{FD} = \Pr(Y = 1 \mid x_1) - \Pr(Y = 1 \mid x).$$

The risk ratio (qi$rr) is defined as

$$\textrm{RR} = \Pr(Y = 1 \mid x_1) \,/\, \Pr(Y = 1 \mid x).$$

For a range of $\tau$ defined by $[\tau_1, \tau_2]$, each of the quantities of interest is reported as a two-column matrix giving the lower and upper bounds, respectively, of a confidence interval with nominal coverage at least as great as the actual coverage. At worst, these bounds are conservative estimates for the likely range of each quantity of interest. Please refer to King and Zeng (2001) for the specific method of calculating bounded quantities of interest.

The output of each Zelig command contains useful information which you
may view. For example, if you run
`z.out <- zelig(y ~ x, model = "relogit", data)`, then you may examine
the available information in `z.out` by using `names(z.out)`, see
the coefficients by using `z.out$coefficients`, and obtain a default
summary of information through `summary(z.out)`.

The Stata version of ReLogit and the R implementation differ slightly in their coefficient estimates due to differences in the matrix inversion routines implemented in R and Stata. Zelig uses orthogonal-triangular decomposition (through lm.influence()) to compute the bias term, which is more numerically stable than standard matrix calculations.

Linear Regression for a Left-Censored Dependent Variable

Tobit regression estimates a linear regression model for a left-censored dependent variable, where the dependent variable is censored from below. While the classical tobit model has values censored at 0, you may select another censoring point. For other linear regression models with fully observed dependent variables, see Bayesian regression, maximum likelihood normal regression, or least squares.

With reference classes:

```
z5 <- ztobit$new()
z5$zelig(Y ~ X1 + X2, below = 0, above = Inf, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, below = 0, above = Inf, model = "tobit", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

zelig() accepts the following arguments to specify how the dependent variable is censored.

- `below`: (defaults to 0) the point at which the dependent variable is censored from below. If any values in the dependent variable are observed to be less than the censoring point, it is assumed that that particular observation is censored from below at the observed value. (A Bayesian implementation that supports both left and right censoring is also available.)
- `robust`: defaults to FALSE. If TRUE, zelig() computes robust standard errors based on sandwich estimators and the options selected in cluster.

cluster: if robust = TRUE, you may select a variable to define groups of correlated observations. Let x3 be a variable that consists of either discrete numeric values, character strings, or factors that define strata. Then

> z.out <- zelig(y ~ x1 + x2, robust = TRUE, cluster = "x3", model = "tobit", data = mydata)

means that the observations can be correlated within the strata defined by the variable x3, and that robust standard errors should be calculated according to those clusters. If robust = TRUE but cluster is not specified, zelig() assumes that each observation falls into its own cluster.

Zelig users may wish to refer to `help(survreg)` for more information.

Attaching the sample dataset:

```
data(tobin)
```

Estimating linear regression using `tobit`:

```
z.out <- zelig(durable ~ age + quant, model = "tobit", data = tobin)
```

```
## How to cite this model in Zelig:
## Kosuke Imai, Gary King, Olivia Lau. 2011.
## tobit: Linear regression for Left-Censored Dependent Variable
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Setting values for the explanatory variables to their sample averages:

```
x.out <- setx(z.out)
```

Simulating quantities of interest from the posterior distribution given `x.out`.

```
s.out1 <- sim(z.out, x = x.out)
```

```
summary(s.out1)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## 1 1.547 0.6275 1.462 0.5676 2.978
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 3.291 4.331 1.637 0 13.96
```

Set explanatory variables to their default (mean/mode) values, with
high (80th percentile) and low (20th percentile) liquidity ratio
(`quant`):

```
x.high <- setx(z.out, quant = quantile(tobin$quant, prob = 0.8))
x.low <- setx(z.out, quant = quantile(tobin$quant, prob = 0.2))
```

Estimating the first difference for the effect of high versus low
liquidity ratio on duration (`durable`):

```
s.out2 <- sim(z.out, x = x.high, x1 = x.low)
```

```
summary(s.out2)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## 1 1.184 0.7182 1.069 0.1974 2.957
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 3.113 4.147 1.364 0 13.46
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## 1 2.103 1.05 1.954 0.5592 4.579
## pv
## mean sd 50% 2.5% 97.5%
## [1,] 3.759 4.338 2.412 0 14.35
## fd
## mean sd 50% 2.5% 97.5%
## 1 0.9197 1.204 0.8075 -1.227 3.563
```

```
plot(s.out1)
```

Let $Y_i^*$ be a latent dependent variable which is distributed with *stochastic component*

$$Y_i^* \sim \mathrm{Normal}(\mu_i, \sigma^2),$$

where $\mu_i$ is a vector of means and $\sigma^2$ is a scalar variance parameter. $Y_i^*$ is not directly observed, however. Rather, we observe $Y_i$, which is defined as:

$$Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* > c \\ c & \text{if } Y_i^* \le c \end{cases}$$

where $c$ is the lower bound below which $Y_i^*$ is censored.
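The observation rule can be stated in a couple of lines. A sketch in Python (self-contained, not part of Zelig) of the left-censoring rule above, with the classical censoring point $c = 0$:

```python
def observe(y_star, c=0.0):
    # we see the latent value only when it clears the censoring point c
    return y_star if y_star > c else c

latent = [-1.3, 0.0, 0.4, 2.2]
observed = [observe(y) for y in latent]
print(observed)  # [0.0, 0.0, 0.4, 2.2]
```

Because negative latent values are all mapped to $c$, an ordinary least squares fit to the observed values would be biased; the tobit likelihood accounts for the piled-up mass at the censoring point.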

The *systematic component* is given by

$$\mu_i = x_i \beta,$$

where $x_i$ is the vector of explanatory variables for observation $i$ and $\beta$ is the vector of coefficients.

The expected values (`qi$ev`) for the tobit regression model are the same as the expected value of $Y^*$:

$$E[Y_i^*] = \mu_i = x_i \beta.$$

The first difference (`qi$fd`) for the tobit regression model is defined as

$$\textrm{FD} = E[Y^* \mid x_1] - E[Y^* \mid x].$$

In conditional prediction models, the average expected treatment effect (`qi$att.ev`) for the treatment group is

$$\frac{1}{\sum_i t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups.

The output of each Zelig command contains useful information which you may view. For example, if you run:

```
z.out <- zelig(y ~ x, model = "tobit", data)
```

then you may examine the available information in `z.out`.

The tobit function is part of the survival library by Terry Therneau,
ported to R by Thomas Lumley. Advanced users may wish to refer to
`help(survreg)` in the survival library.

Bayesian Factor Analysis

Given some unobserved explanatory variables and observed dependent variables, the Normal theory factor analysis model estimates the latent factors. The model is implemented using a Markov Chain Monte Carlo algorithm (Gibbs sampling with data augmentation). For factor analysis with ordinal dependent variables, see ordered factor analysis, and for a mix of types of dependent variables, see the mixed factor analysis model.

With reference classes:

```
z5 <- zfactorbayes$new()
z5$zelig(cbind(Y1, Y2, Y3) ~ NULL, factors = 2,
model = "factor.bayes", data = mydata)
model = "factor.bayes", data = mydata)
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(cbind(Y1, Y2, Y3) ~ NULL, factors = 2,
model = "factor.bayes", data = mydata)
model = "factor.bayes", data = mydata)
```

zelig() takes the following arguments for factor.bayes:

- `Y1`, `Y2`, and `Y3`: variables of interest in factor analysis (manifest variables), assumed to be normally distributed. The model requires a minimum of three manifest variables.
- `factors`: number of the factors to be fitted (defaults to 2).

In addition, zelig() accepts the following additional arguments for model specification:

- `lambda.constraints`: list containing the equality or inequality constraints on the factor loadings. Choose from one of the following forms:
  - `varname = list()`: by default, no constraints are imposed.
  - `varname = list(d, c)`: constrains the d-th loading for the variable named `varname` to be equal to `c`.
  - `varname = list(d, "+")`: constrains the d-th loading for the variable named `varname` to be positive.
  - `varname = list(d, "-")`: constrains the d-th loading for the variable named `varname` to be negative.

- `std.var`: defaults to `FALSE` (manifest variables are rescaled to zero mean, but retain observed variance). If `TRUE`, the manifest variables are rescaled to be mean zero and unit variance.

In addition, zelig() accepts the following additional inputs for factor.bayes:

- `burnin`: number of the initial MCMC iterations to be discarded (defaults to 1,000).
- `mcmc`: number of the MCMC iterations after burnin (defaults to 20,000).
- `thin`: thinning interval for the Markov chain. Only every `thin`-th draw from the Markov chain is kept. The value of `mcmc` must be divisible by this value. The default value is 1.
- `verbose`: defaults to `FALSE`. If `TRUE`, the progress of the sampler is printed to the screen.
- `seed`: seed for the random number generator. The default is `NA`, which corresponds to a random seed of 12345.
- `Lambda.start`: starting values of the factor loading matrix Lambda, either a scalar (all unconstrained loadings are set to that value), or a matrix with compatible dimensions. The default is `NA`, where the start values are set to 0 for unconstrained factor loadings, and 0.5 or -0.5 for constrained factor loadings (depending on the nature of the constraints).
- `Psi.start`: starting values for the uniquenesses, either a scalar (the starting values for all diagonal elements of Psi are set to this value), or a vector with length equal to the number of manifest variables. In the latter case, the starting values of the diagonal elements of Psi take the values of `Psi.start`. The default value is `NA`, where the starting values of all the uniquenesses are set to 0.5.
- `store.lambda`: defaults to `TRUE`, which stores the posterior draws of the factor loadings.
- `store.scores`: defaults to `FALSE`. If `TRUE`, stores the posterior draws of the factor scores. (Storing factor scores may take a large amount of memory for a large number of draws or observations.)
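The chain-length bookkeeping implied by these arguments can be checked with a line of arithmetic (the values below are arbitrary):

```r
# With thinning, the chain retains mcmc / thin draws, so mcmc must be
# divisible by thin; burnin iterations are run first and discarded.
burnin <- 1000; mcmc <- 20000; thin <- 4
stopifnot(mcmc %% thin == 0)
kept  <- mcmc / thin           # 5000 draws retained after thinning
total <- burnin + mcmc         # 21000 iterations run in all
```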

The model also accepts the following additional arguments to specify prior parameters:

- `l0`: mean of the Normal prior for the factor loadings, either a scalar or a matrix with the same dimensions as Lambda. If a scalar value, that value will be the prior mean for all the factor loadings. Defaults to 0.
- `L0`: precision parameter of the Normal prior for the factor loadings, either a scalar or a matrix with the same dimensions as Lambda. If `L0` takes a scalar value, then the precision matrix will be a diagonal matrix with the diagonal elements set to that value. The default value is 0, which leads to an improper prior.
- `a0`: the shape parameter of the Inverse Gamma prior for the uniquenesses is `a0/2`. It can take a scalar value or a vector. The default value is 0.001.
- `b0`: the scale parameter of the Inverse Gamma prior for the uniquenesses is `b0/2`. It can take a scalar value or a vector. The default value is 0.001.

Zelig users may wish to refer to `help(MCMCfactanal)` for more
information.

Attaching the sample dataset:

```
data(swiss)
names(swiss) <- c("Fert", "Agr", "Exam", "Educ", "Cath", "InfMort")
```

Factor analysis:

```
z.out <- zelig(cbind(Agr, Exam, Educ, Cath, InfMort) ~ NULL,
model = "factor.bayes", data = swiss, factors = 2, verbose = TRUE,
a0 = 1, b0 = 0.15, burnin = 5000, mcmc = 50000)
```

Checking for convergence before summarizing the estimates:

```
algor <- try(geweke.diag(z.out$coefficients), silent = TRUE)
if (inherits(algor, "try-error"))
    print(algor)
```

Since the algorithm did not converge, we now add some constraints on the factor loadings Lambda to help the algorithm converge:

```
z.out <- zelig(cbind(Agr, Exam, Educ, Cath, InfMort) ~ NULL,
               model = "factor.bayes", data = swiss, factors = 2,
               lambda.constraints = list(Exam = list(1, "+"),
                                         Exam = list(2, "-"),
                                         Educ = c(2, 0),
                                         InfMort = c(1, 0)),
               verbose = TRUE, a0 = 1, b0 = 0.15,
               burnin = 5000, mcmc = 50000)
geweke.diag(z.out$coefficients)
heidel.diag(z.out$coefficients)
raftery.diag(z.out$coefficients)
summary(z.out)
```

Suppose for observation i we observe k variables and hypothesize that there are d underlying factors, such that:

Y_i = Lambda phi_i + epsilon_i

where Y_i is the vector of k manifest variables for observation i, Lambda is the k x d factor loading matrix, and phi_i is the d-vector of latent factor scores. Both Lambda and phi need to be estimated.

The *stochastic component* is given by:

epsilon_i ~ Normal(0, Psi)

where Psi is a diagonal, positive definite matrix. The diagonal elements of Psi are referred to as uniquenesses.

The *systematic component* is given by

mu_i = E(Y_i) = Lambda phi_i

The independent conjugate *prior* for each element of the loading matrix, Lambda_ij, is given by

Lambda_ij ~ Normal(l0_ij, 1 / L0_ij), for i = 1, ..., k and j = 1, ..., d

The independent conjugate *prior* for each uniqueness, Psi_ii, is given by

Psi_ii ~ InverseGamma(a0 / 2, b0 / 2), for i = 1, ..., k

The *prior* for the factor scores, phi_i, is

phi_i ~ Normal(0, I_d)

where I_d is the d x d identity matrix.
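As an illustration of the assumed data-generating process (all dimensions and values below are made up; this is not how Zelig or MCMCpack estimates the model), one can simulate manifest variables from the factor model in base R:

```r
# Illustrative simulation of the factor model Y_i = Lambda phi_i + epsilon_i.
# All dimensions and values are hypothetical; this only sketches the assumed
# data-generating process, not the Gibbs sampler itself.
set.seed(1)
k <- 5    # number of manifest variables
d <- 2    # number of latent factors
n <- 200  # number of observations

Lambda <- matrix(rnorm(k * d), k, d)        # k x d factor loading matrix
Psi    <- diag(runif(k, 0.2, 0.8))          # diagonal uniquenesses
Phi    <- matrix(rnorm(n * d), n, d)        # factor scores, phi_i ~ N(0, I_d)
Eps    <- matrix(rnorm(n * k), n, k) %*% sqrt(Psi)  # epsilon_i ~ N(0, Psi)

Y <- Phi %*% t(Lambda) + Eps                # n x k matrix of manifest variables
dim(Y)  # 200 rows (observations) by 5 columns (manifest variables)
```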

The output of each Zelig command contains useful information which you may view. For example, if you run:

```
z.out <- zelig(cbind(Y1, Y2, Y3) ~ NULL, model = "factor.bayes", data = mydata)
```

then you may examine the available information in `z.out` by using
`names(z.out)`, see the draws from the posterior distribution of the
`coefficients` by using `z.out$coefficients`, and view a default
summary of information through `summary(z.out)`. Other elements
available through the `$` operator are listed below.

From the `zelig()` output object `z.out`, you may extract:

- `coefficients`: draws from the posterior distributions of the estimated factor loadings and the uniquenesses. If `store.scores = TRUE`, the estimated factor scores are also contained in `coefficients`.
- `data`: the name of the input data frame.
- `seed`: the random seed used in the model.

Since there are no explanatory variables, the `sim()` procedure is not applicable for factor analysis models.

Use Bayesian multinomial logistic regression to model unordered categorical variables. The dependent variable may be in the format of either character strings or integer values. The model is estimated via a random walk Metropolis algorithm or a slice sampler. For maximum-likelihood estimation of this model, see the multinomial logit model.

With reference classes:

```
z5 <- zmlogitbayes$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()
```

With the Zelig 4 compatibility wrappers:

```
z.out <- zelig(Y ~ X1 + X2, model = "mlogit.bayes", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
```

zelig() accepts the following arguments for mlogit.bayes:

- `baseline`: either a character string or numeric value (equal to one of the observed values of the dependent variable) specifying a baseline category. The default value is `NA`, which sets the baseline to the first alphabetical or numerical unique value of the dependent variable.

The model accepts the following additional arguments to monitor the Markov chains:

- `burnin`: number of the initial MCMC iterations to be discarded (defaults to 1,000).
- `mcmc`: number of the MCMC iterations after burnin (defaults to 10,000).
- `thin`: thinning interval for the Markov chain. Only every `thin`-th draw from the Markov chain is kept. The value of `mcmc` must be divisible by this value. The default value is 1.
- `mcmc.method`: either "MH" or "slice", specifying whether to use the Metropolis algorithm or the slice sampler. The default value is `MH`.
- `tune`: tuning parameter for the Metropolis-Hastings step, either a scalar or a numeric vector (with one element per coefficient). The tuning parameter should be set such that the acceptance rate is satisfactory (between 0.2 and 0.5). The default value is 1.1.
- `verbose`: defaults to `FALSE`. If `TRUE`, the progress of the sampler is printed to the screen.
- `seed`: seed for the random number generator. The default is `NA`, which corresponds to a random seed of 12345.
- `beta.start`: starting values for the Markov chain, either a scalar or a vector (with one element per coefficient). The default is `NA`, where the maximum likelihood estimates are used as the starting values.

Use the following arguments to specify the priors for the model:

- `b0`: prior mean for the coefficients, either a scalar or a vector. If a scalar, that value will be the prior mean for all the coefficients. The default is 0.
- `B0`: prior precision parameter for the coefficients, either a square matrix with dimensions equal to the number of coefficients or a scalar. If a scalar, that value times an identity matrix will be the prior precision parameter. The default is 0, which leads to an improper prior.

Zelig users may wish to refer to `help(MCMCmnl)` for more information.

Attaching the sample dataset:

```
data(mexico)
```

Estimating multinomial logistic regression using `mlogit.bayes`:

```
z.out <- zelig(vote88 ~ pristr + othcok + othsocok,
model = "mlogit.bayes", data = mexico)
```

```
## Calculating MLEs and large sample var-cov matrix.
## This may take a moment...
## Inverting Hessian to get large sample var-cov matrix.
```

```
## Warning: the condition has length > 1 and only the first element will be used
## Warning: the condition has length > 1 and only the first element will be used
```

```
##
##
## MCMCmnl Metropolis iteration 1 of 11000
## beta =
## -2.43503
## -2.72348
## -0.70416
## -0.71276
## 0.99671
## 1.16367
## 0.52819
## 0.50504
## Metropolis acceptance rate for beta = 1.00000
##
##
##
## MCMCmnl Metropolis iteration 1101 of 11000
## beta =
## -3.08406
## -3.23117
## -0.68172
## -0.58702
## 1.32006
## 1.46886
## 0.34517
## 0.08113
## Metropolis acceptance rate for beta = 0.76203
##
##
##
## MCMCmnl Metropolis iteration 2201 of 11000
## beta =
## -2.37163
## -2.72435
## -0.75901
## -0.65170
## 1.12979
## 1.04706
## 0.30992
## 0.63282
## Metropolis acceptance rate for beta = 0.77010
##
##
##
## MCMCmnl Metropolis iteration 3301 of 11000
## beta =
## -2.61718
## -2.37818
## -0.81263
## -0.72653
## 1.21112
## 1.15121
## 0.29893
## 0.22116
## Metropolis acceptance rate for beta = 0.76704
##
##
##
## MCMCmnl Metropolis iteration 4401 of 11000
## beta =
## -2.55979
## -3.33989
## -0.77931
## -0.58265
## 1.15036
## 1.32952
## 0.45473
## 0.48890
## Metropolis acceptance rate for beta = 0.76096
##
##
##
## MCMCmnl Metropolis iteration 5501 of 11000
## beta =
## -2.11066
## -2.75327
## -0.85810
## -0.62697
## 1.08461
## 1.14543
## 0.27174
## 0.40096
## Metropolis acceptance rate for beta = 0.76077
##
##
##
## MCMCmnl Metropolis iteration 6601 of 11000
## beta =
## -2.46162
## -2.81167
## -0.85914
## -0.57910
## 1.12878
## 1.21514
## 0.48948
## 0.32908
## Metropolis acceptance rate for beta = 0.75746
##
##
##
## MCMCmnl Metropolis iteration 7701 of 11000
## beta =
## -2.48861
## -3.34618
## -0.78412
## -0.50089
## 1.21538
## 1.47418
## 0.25571
## 0.14282
## Metropolis acceptance rate for beta = 0.76029
##
##
##
## MCMCmnl Metropolis iteration 8801 of 11000
## beta =
## -2.74870
## -2.47143
## -0.55028
## -0.66065
## 1.21813
## 1.19189
## 0.14761
## 0.21105
## Metropolis acceptance rate for beta = 0.75844
##
##
##
## MCMCmnl Metropolis iteration 9901 of 11000
## beta =
## -3.17385
## -3.28283
## -0.65244
## -0.60003
## 1.31771
## 1.24363
## 0.35401
## 0.61689
## Metropolis acceptance rate for beta = 0.75891
##
## How to cite this model in Zelig:
## Ben Goodrich, Ying Lu. 2013.
## mlogitbayes: Bayesian Multinomial Logistic Regression for Dependent Variables with Unordered Categorical Values
## in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
## http://zeligproject.org/
```

Checking for convergence before summarizing the estimates:

```
raftery.diag(z.out$coefficients)
```

```
summary(z.out)
```

Setting values for the explanatory variables to their sample averages:

```
x.out <- setx(z.out)
```

Simulating quantities of interest from the posterior distribution
given `x.out`.

```
s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## P(Y=1) 0.5613 0.01592 0.5615 0.5307 0.5915
## P(Y=2) 0.2099 0.01273 0.2098 0.1854 0.2351
## P(Y=3) 0.2288 0.01360 0.2286 0.2034 0.2558
## pv
## qi
## 1 2 3
## 0.5596 0.2088 0.2316
```

Estimating the first difference (and risk ratio) in the
probabilities of voting for different candidates when `pristr` (the
strength of the PRI) is set to weak (equal to 1) versus strong
(equal to 3), while all the other variables are held at their default
values.

```
x.weak <- setx(z.out, pristr = 1)
x.strong <- setx(z.out, pristr = 3)
s.out2 <- sim(z.out, x = x.strong, x1 = x.weak)
summary(s.out2)
```

```
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## P(Y=1) 0.7157 0.02128 0.7158 0.6726 0.7561
## P(Y=2) 0.1270 0.01458 0.1266 0.1001 0.1563
## P(Y=3) 0.1573 0.01646 0.1568 0.1261 0.1910
## pv
## qi
## 1 2 3
## 0.7179 0.1264 0.1557
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## P(Y=1) 0.4028 0.02358 0.4028 0.3563 0.4484
## P(Y=2) 0.3037 0.02131 0.3029 0.2638 0.3471
## P(Y=3) 0.2935 0.02189 0.2932 0.2518 0.3372
## pv
## qi
## 1 2 3
## 0.4051 0.2986 0.2963
## fd
## mean sd 50% 2.5% 97.5%
## P(Y=1) -0.3129 0.03460 -0.3129 -0.38111 -0.2443
## P(Y=2) 0.1767 0.02735 0.1765 0.12360 0.2314
## P(Y=3) 0.1362 0.02881 0.1363 0.07966 0.1936
```

Let Y_i be the (unordered) categorical dependent variable for observation i, which takes integer values j = 1, ..., J.

The *stochastic component* is given by:

Y_i ~ Multinomial(Y_i | pi_ij)

where pi_ij = Pr(Y_i = j) for j = 1, ..., J.

The *systematic component* is given by

pi_ij = exp(x_i beta_j) / sum_{k=1}^{J} exp(x_i beta_k), for j = 1, ..., J

where x_i is the vector of explanatory variables for observation i and beta_j is the vector of coefficients for category j. The baseline category (by default the first) has its coefficient vector fixed at 0.

The *prior* for beta_j is given by

beta_j ~ Normal(b0, B0^-1), for j = 1, ..., J

where b0 is the vector of prior means for the coefficients and B0 is the precision matrix (the inverse of a variance-covariance matrix).

The expected values (`qi$ev`) for the multinomial logistic regression model are the predicted probabilities of belonging to each category:

pi_ij = Pr(Y_i = j) = exp(x_i beta_j) / sum_{k=1}^{J} exp(x_i beta_k)

given the posterior draws of beta_j for all categories from the MCMC iterations.
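To make this formula concrete, here is a small hand computation of the category probabilities for a single observation and a single posterior draw; `beta.draw` and `x` are invented values, with the baseline category's coefficients fixed at zero:

```r
# Hand computation of pi_ij for one observation and one (hypothetical)
# posterior draw of the coefficients; the baseline row is fixed at zero.
x <- c(1, 0.5, -1)                      # intercept plus two covariates
beta.draw <- rbind(baseline = c(0,    0,    0),
                   cat2     = c(0.3, -0.2,  0.1),
                   cat3     = c(-0.5, 0.4,  0.2))

eta   <- as.vector(beta.draw %*% x)     # linear predictor x_i beta_j per category
probs <- exp(eta) / sum(exp(eta))       # pi_ij = exp(x_i beta_j) / sum_k exp(x_i beta_k)
sum(probs)                              # probabilities sum to 1
```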

The predicted values (`qi$pr`) are draws of Y_i from a multinomial distribution whose parameters are the expected values (`qi$ev`) computed from the posterior draws of beta from the MCMC iterations.

The first difference (`qi$fd`) in category j for the multinomial logistic model is defined as

FD_j = Pr(Y = j | x_1) - Pr(Y = j | x)

The risk ratio (`qi$rr`) in category j is defined as

RR_j = Pr(Y = j | x_1) / Pr(Y = j | x)

In conditional prediction models, the average expected treatment effect (`qi$att.ev`) for the treatment group in category j is

(1 / n_j) * sum_{i: t_i = 1} [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ]

where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups, and n_j is the number of treated observations in category j.

In conditional prediction models, the average predicted treatment effect (`qi$att.pr`) for the treatment group in category j is

(1 / n_j) * sum_{i: t_i = 1} [ Y_i(t_i = 1) - Yhat_i(t_i = 0) ]

where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups, and n_j is the number of treated observations in category j.
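These quantities are simple transformations of the category probabilities. Using the posterior means reported in the `pristr` example above (strong PRI as x, weak PRI as x1), the first differences can be reproduced by hand:

```r
# First differences and risk ratios from two vectors of category
# probabilities, taken from the posterior means in the example output above.
p.x  <- c(0.7157, 0.1270, 0.1573)  # Pr(Y = j | x):  pristr strong
p.x1 <- c(0.4028, 0.3037, 0.2935)  # Pr(Y = j | x1): pristr weak

fd <- p.x1 - p.x   # qi$fd per category: -0.3129  0.1767  0.1362
rr <- p.x1 / p.x   # qi$rr per category
round(fd, 4)
```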

The output of each Zelig command contains useful information which you may view. For example, if you run:

```
z.out <- zelig(y ~ x, model = "mlogit.bayes", data)
```

then you may examine the available information in `z.out` by using
`names(z.out)`, see the draws from the posterior distribution of the
`coefficients` by using `z.out$coefficients`, and view a default
summary of information through `summary(z.out)`. Other elements
available through the `$` operator are listed below.

Bayesian multinomial logistic regression is part of the MCMCpack library by Andrew D. Martin and Kevin M. Quinn. The convergence diagnostics are part of the CODA library by Martyn Plummer, Nicky Best, Kate Cowles, Karen Vines, Deepayan Sarkar, and Russell Almond.