Model Reference and Vignettes

This section provides technical information on the models currently implemented in Zelig (5.0-1): a reference list of supported models, plus individual model vignettes with detailed information on each model, its quantities of interest, and its syntax.

Reference

The following models are currently supported in Zelig 5.0-1:


zelig-exp

Exponential Regression for Duration Dependent Variables

Use the exponential duration regression model if you have a dependent variable representing a duration (time until an event). The model assumes a hazard rate that is constant over time. The dependent variable may be censored (for observations that had not yet been completed when the data were collected).

Syntax

With reference classes:

z5 <- zexp$new()
z5$zelig(Surv(Y, C) ~ X, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Surv(Y, C) ~ X, model = "exp", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Exponential models require that the dependent variable be in the form Surv(Y, C), where Y and C are vectors of length n. For each observation i in 1, …, n, the value y_i is the duration (lifetime, for example), and the associated c_i is a binary variable such that c_i = 1 if the duration is not censored (e.g., the subject dies during the study) or c_i = 0 if the duration is censored (e.g., the subject is still alive at the end of the study and is known to live at least as long as y_i). If C is omitted, all Y are assumed to be completed; that is, C defaults to 1 for all observations.
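As a concrete illustration (with made-up durations, not from any dataset used in this vignette), a Surv object from the survival package encodes exactly this pairing of durations and censoring indicators:

```r
# Illustrative only: three subjects with durations 5, 10, and 15,
# where the second subject is still under observation at the end of
# the study (censored), so its indicator is 0.
library(survival)

duration <- c(5, 10, 15)
observed <- c(1, 0, 1)  # 1 = event observed, 0 = censored

y <- Surv(duration, observed)
print(y)  # censored durations print with a trailing "+", e.g. "10+"
```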

Input Values

In addition to the standard inputs, zelig() takes the following additional options for exponential regression:

  • robust: defaults to FALSE. If TRUE, zelig() computes robust standard errors based on sandwich estimators and the options selected in cluster.
  • cluster: if robust = TRUE, you may select a variable to define groups of correlated observations. Let x3 be a variable that consists of either discrete numeric values, character strings, or factors that define strata. Then
z.out <- zelig(y ~ x1 + x2, robust = TRUE, cluster = "x3",
                        model = "exp", data = mydata)

means that the observations can be correlated within the strata defined by the variable x3, and that robust standard errors should be calculated according to those clusters. If robust = TRUE but cluster is not specified, zelig() assumes that each observation falls into its own cluster.

Example

Attach the sample data:

data(coalition)

Estimate the model:

z.out <- zelig(Surv(duration, ciep12) ~ fract + numst2, model = "exp", data = coalition)
## How to cite this model in Zelig:
##   Olivia Lau, Kosuke Imai, Gary King. 2011.
##   exp: Exponential Regression for Duration Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

View the regression output:

summary(z.out)
## Model:
## $by
## [1] 1
##
## Call:
## survival::survreg(formula = Surv(duration, ciep12) ~ fract +
##     numst2, data = ., dist = "exponential", model = FALSE)
##
## Coefficients:
## (Intercept)       fract      numst2
##    5.535873   -0.003909    0.461179
##
## Scale fixed at 1
##
## Loglik(model)= -1077   Loglik(intercept only)= -1101
##  Chisq= 46.66 on 2 degrees of freedom, p= 7.4e-11
## n= 314
## Next step: Use 'setx' method

Set the baseline values (with the ruling coalition in the minority) and the alternative values (with the ruling coalition in the majority) for X:

x.low <- setx(z.out, numst2 = 0)
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv
x.high <- setx(z.out, numst2 = 1)
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv

Simulate expected values and first differences:

s.out <- sim(z.out, x = x.low, x1 = x.high)
## Error: object 'x.low' not found

Summarize quantities of interest and produce some plots:

summary(s.out)
## Error: error in evaluating the argument 'object' in selecting a method for function 'summary': Error: object 's.out' not found
plot(s.out)
## Error: error in evaluating the argument 'x' in selecting a method for function 'plot': Error: object 's.out' not found

Model

Let Y_i^* be the survival time for observation i. This variable might be censored for some observations at a fixed time y_c such that the fully observed dependent variable, Y_i, is defined as

    Y_i = Y_i^*  if Y_i^* <= y_c
    Y_i = y_c    if Y_i^* > y_c

  • The stochastic component is described by the distribution of the partially observed variable Y^*. We assume Y_i^* follows the exponential distribution, whose density function is given by

        f(y_i^* | λ_i) = (1/λ_i) exp(-y_i^*/λ_i)

    for y_i^* >= 0 and λ_i > 0. The mean of this distribution is λ_i.

    In addition, survival models like the exponential have three additional properties. The hazard function h(t) measures the probability of not surviving past time t given survival up to t. In general, the hazard function is equal to f(t)/S(t), where the survival function S(t) represents the fraction still surviving at time t. The cumulative hazard function H(t) describes the probability of dying before time t. In general, H(t) = ∫₀ᵗ h(s) ds = -log S(t). In the case of the exponential model,

        h(t) = 1/λ_i
        S(t) = exp(-t/λ_i)
        H(t) = t/λ_i

    For the exponential model, the hazard function is constant over time. The Weibull and log-normal models allow the hazard function to vary as a function of elapsed time.

  • The systematic component is modeled as

        λ_i = exp(x_i β)

    where x_i is the vector of explanatory variables, and β is the vector of coefficients.
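The constant-hazard property above is easy to check numerically with base R's exponential functions. Note that dexp()/pexp() are parameterized by the rate, which is 1/λ in this vignette's notation; the rate 0.2 below is arbitrary:

```r
# Hazard h(t) = f(t) / S(t) for the exponential distribution.
# The rate 0.2 is arbitrary; in the vignette's notation it is 1 / lambda.
rate <- 0.2
hazard <- function(t) dexp(t, rate) / (1 - pexp(t, rate))

hazard(c(1, 5, 25))  # constant: equals the rate at every t
```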

Quantities of Interest

  • The expected values (qi$ev) for the exponential model are simulations of the expected duration

        E(Y_i) = λ_i = exp(x_i β),

    given x_i and draws of β from its posterior.

  • The predicted values (qi$pr) are draws from the exponential distribution with rate equal to the expected value.

  • The first difference (or difference in expected values, qi$ev.diff) is

        FD = E(Y | x_1) - E(Y | x),

    where x and x_1 are different vectors of values for the explanatory variables.

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. When Y_i(t_i = 1) is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored y_i^* and uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. When Y_i(t_i = 1) is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored y_i^* and uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(Surv(Y, C) ~ X, model = "exp", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The exponential function is part of the survival library by Terry Therneau, ported to R by Thomas Lumley. Advanced users may wish to refer to help(survfit) in the survival library.


zelig-gamma

Gamma Regression for Continuous, Positive Dependent Variables

Use the gamma regression model if you have a positive-valued dependent variable such as the number of years a parliamentary cabinet endures, or the seconds you can stay airborne while jumping. The gamma distribution assumes that all waiting times are complete by the end of the study (censoring is not allowed).

Syntax

With reference classes:

z5 <- zgamma$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "gamma", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out, x1 = NULL)

Example

Attach the sample data:

data(coalition)

Estimate the model:

z.out <- zelig(duration ~ fract + numst2, model = "gamma", data = coalition)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2007.
##   gamma: Gamma Regression for Continuous, Positive Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

View the regression output:

summary(z.out)
## Model:
## $by
## [1] 1
##
##
## Call:  stats::glm(formula = duration ~ fract + numst2, family = Gamma("inverse"),
##     data = .)
##
## Coefficients:
## (Intercept)        fract       numst2
##   -0.012960     0.000115    -0.017387
##
## Degrees of Freedom: 313 Total (i.e. Null);  311 Residual
## Null Deviance:       301
## Residual Deviance: 272   AIC: 2430
## Next step: Use 'setx' method

Set the baseline values (with the ruling coalition in the minority) and the alternative values (with the ruling coalition in the majority) for X:

x.low <- setx(z.out, numst2 = 0)
x.high <- setx(z.out, numst2 = 1)

Simulate expected values (qi$ev) and first differences (qi$fd):

s.out <- sim(z.out, x = x.low, x1 = x.high)
summary(s.out)
##
##  sim x :
##  -----
## ev
##       mean    sd   50%  2.5% 97.5%
## [1,] 14.46 1.071 14.38 12.57 16.68
## pv
##       mean    sd  50%   2.5% 97.5%
## [1,] 14.87 13.61 10.9 0.7095 49.69
##
##  sim x1 :
##  -----
## ev
##       mean    sd   50%  2.5% 97.5%
## [1,] 19.16 1.102 19.11 17.19 21.34
## pv
##       mean    sd   50%   2.5% 97.5%
## [1,] 18.85 17.62 14.12 0.9335 68.06
## fd
##       mean    sd   50%  2.5% 97.5%
## [1,] 4.693 1.516 4.723 1.713 7.475
plot(s.out)

Zelig-gamma

Model

  • The Gamma distribution with scale parameter α has a stochastic component:

        Y_i ~ Gamma(y_i | λ_i, α)
        f(y_i) = (1 / (α^{λ_i} Γ(λ_i))) y_i^{λ_i - 1} exp(-y_i/α)

    for α, λ_i, y_i > 0.
  • The systematic component is given by

        λ_i = 1 / (x_i β)
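This systematic component corresponds to the default inverse link of R's Gamma family, as used in the glm() call shown in the example above. A minimal sketch on simulated data (all coefficient values below are made up):

```r
# Sketch: fit a Gamma GLM with the inverse link on simulated data and
# recover the fitted mean 1 / (x_i beta) by hand. Coefficients are made up.
set.seed(1)
x <- runif(500, 1, 5)
mu <- 1 / (0.05 + 0.02 * x)                  # mean under the inverse link
y <- rgamma(500, shape = 2, scale = mu / 2)  # Gamma draws with mean mu

fit <- glm(y ~ x, family = Gamma("inverse"))
coef(fit)                                    # should be near c(0.05, 0.02)

mu.hat <- 1 / sum(coef(fit) * c(1, 3))       # fitted mean at x = 3
```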

Quantities of Interest

  • The expected values (qi$ev) are simulations of the mean of the stochastic component given draws of α and β from their posteriors:

        E(Y_i) = α λ_i

  • The predicted values (qi$pr) are draws from the gamma distribution for each given set of parameters (α, λ_i).

  • If x1 is specified, sim() also returns the differences in the expected values (qi$fd),

        FD = E(Y | x_1) - E(Y | x).

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "gamma", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The gamma model is part of the stats package. Advanced users may wish to refer to help(glm) and help(family).


zelig-logit

Logistic Regression for Dichotomous Dependent Variables

Logistic regression specifies a dichotomous dependent variable as a function of a set of explanatory variables.

Syntax

With reference classes:

z5 <- zlogit$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "logit", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out, x1 = NULL)

Examples

Basic Example

Attaching the sample turnout dataset:

data(turnout)

Estimating parameter values for the logistic regression:

z.out1 <- zelig(vote ~ age + race, model = "logit", data = turnout)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2007.
##   logit: Logistic Regression for Dichotomous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Setting values for the explanatory variables:

x.out1 <- setx(z.out1, age = 36, race = "white")

Simulating quantities of interest from the posterior distribution.

s.out1 <- sim(z.out1, x = x.out1)
summary(s.out1)
##
##  sim x :
##  -----
## ev
##        mean      sd    50%   2.5% 97.5%
## [1,] 0.7486 0.01185 0.7486 0.7243 0.771
## pv
##          0     1
## [1,] 0.255 0.745
plot(s.out1)

Zelig-logit-1

Simulating First Differences

Estimating the risk difference (and risk ratio) between low education (25th percentile) and high education (75th percentile), with all other variables held at their default values.

z.out2 <- zelig(vote ~ race + educate, model = "logit", data = turnout)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2007.
##   logit: Logistic Regression for Dichotomous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/
x.high <- setx(z.out2, educate = quantile(turnout$educate, prob = 0.75))
x.low <- setx(z.out2, educate = quantile(turnout$educate, prob = 0.25))
s.out2 <- sim(z.out2, x = x.high, x1 = x.low)
summary(s.out2)
##
##  sim x :
##  -----
## ev
##        mean     sd    50%   2.5%  97.5%
## [1,] 0.8229 0.0102 0.8232 0.8028 0.8413
## pv
##          0     1
## [1,] 0.195 0.805
##
##  sim x1 :
##  -----
## ev
##        mean      sd    50%   2.5%  97.5%
## [1,] 0.7093 0.01318 0.7091 0.6834 0.7337
## pv
##          0     1
## [1,] 0.291 0.709
## fd
##         mean      sd     50%    2.5%    97.5%
## [1,] -0.1136 0.01203 -0.1135 -0.1379 -0.09082
plot(s.out2)

Zelig-logit-2

Model

Let Y_i be the binary dependent variable for observation i, which takes the value of either 0 or 1.

  • The stochastic component is given by

        Y_i ~ Bernoulli(y_i | π_i) = π_i^{y_i} (1 - π_i)^{1 - y_i}

    where π_i = Pr(Y_i = 1).

  • The systematic component is given by:

        π_i = 1 / (1 + exp(-x_i β))

    where x_i is the vector of explanatory variables for observation i and β is the vector of coefficients.

Quantities of Interest

  • The expected values (qi$ev) for the logit model are simulations of the predicted probability of a success:

        E(Y_i) = π_i = 1 / (1 + exp(-x_i β)),

    given draws of β from its sampling distribution.

  • The predicted values (qi$pr) are draws from the Binomial distribution with mean equal to the simulated expected value π_i.

  • The first difference (qi$fd) for the logit model is defined as

        FD = Pr(Y = 1 | x_1) - Pr(Y = 1 | x).

  • The risk ratio (qi$rr) is defined as

        RR = Pr(Y = 1 | x_1) / Pr(Y = 1 | x).

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.
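The simulation behind qi$ev can be sketched by hand: draw β from its estimated sampling distribution (here via MASS::mvrnorm) and push each draw through the inverse logit. The data and covariate profile below are simulated for illustration only:

```r
# Sketch of the qi$ev simulation for a logit model: draw coefficients
# from their asymptotic sampling distribution, then compute the
# predicted probability at a chosen covariate profile.
library(MASS)

set.seed(1)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-0.5 + 1.2 * x))  # made-up data
fit <- glm(y ~ x, family = binomial)

beta.sims <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))
x0 <- c(1, 0.5)                     # intercept and x = 0.5
ev <- plogis(beta.sims %*% x0)      # 1000 simulated expected values

quantile(ev, c(0.025, 0.5, 0.975))  # cf. the ev row in summary(s.out)
```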

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "logit", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The logit model is part of the stats package. Advanced users may wish to refer to help(glm) and help(family).


zelig-lognorm

Log-Normal Regression for Duration Dependent Variables

The log-normal model describes an event's duration, the dependent variable, as a function of a set of explanatory variables. The log-normal model accepts time-censored dependent variables and allows the hazard rate to increase and then decrease over time.

Syntax

With reference classes:

z5 <- zlognorm$new()
z5$zelig(Surv(Y, C) ~ X, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Surv(Y, C) ~ X, model = "lognorm", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Log-normal models require that the dependent variable be in the form Surv(Y, C), where Y and C are vectors of length n. For each observation i in 1, …, n, the value y_i is the duration (lifetime, for example) of each subject, and the associated c_i is a binary variable such that c_i = 1 if the duration is not censored (e.g., the subject dies during the study) or c_i = 0 if the duration is censored (e.g., the subject is still alive at the end of the study). If C is omitted, all Y are assumed to be completed; that is, C defaults to 1 for all observations.

Input Values

In addition to the standard inputs, zelig() takes the following additional options for lognormal regression:

  • robust: defaults to FALSE. If TRUE, zelig() computes robust standard errors based on sandwich estimators and the options selected in cluster.
  • cluster: if robust = TRUE, you may select a variable to define groups of correlated observations. Let x3 be a variable that consists of either discrete numeric values, character strings, or factors that define strata. Then
z.out <- zelig(y ~ x1 + x2, robust = TRUE, cluster = "x3", model = "lognorm", data = mydata)

means that the observations can be correlated within the strata defined by the variable x3, and that robust standard errors should be calculated according to those clusters. If robust = TRUE but cluster is not specified, zelig() assumes that each observation falls into its own cluster.

Example

Attach the sample data:

data(coalition)

Estimate the model:

z.out <- zelig(Surv(duration, ciep12) ~ fract + numst2, model = "lognorm", data = coalition)
## How to cite this model in Zelig:
##   Matthew Owen, Olivia Lau, Kosuke Imai, Gary King. 2007.
##   lognorm: Log-Normal Regression for Duration Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

View the regression output:

summary(z.out)
## Model:
## $by
## [1] 1
##
## Call:
## survival::survreg(formula = Surv(duration, ciep12) ~ fract +
##     numst2, data = ., dist = "lognormal", model = FALSE)
##
## Coefficients:
## (Intercept)       fract      numst2
##    5.366670   -0.004438    0.559833
##
## Scale= 1.2
##
## Loglik(model)= -1078   Loglik(intercept only)= -1101
##  Chisq= 46.58 on 2 degrees of freedom, p= 7.7e-11
## n= 314
## Next step: Use 'setx' method

Set the baseline values (with the ruling coalition in the minority) and the alternative values (with the ruling coalition in the majority) for X:

x.low <- setx(z.out, numst2 = 0)
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv
x.high <- setx(z.out, numst2= 1)
## Error: error in evaluating the argument 'x' in selecting a method for function 'terms': Error in Ops.Surv(y, z$residuals) : Invalid operation on a survival time
## Calls: lm -> lm.fit -> Ops.Surv

Simulate expected values (qi$ev) and first differences (qi$fd):

s.out <- sim(z.out, x = x.low, x1 = x.high)
summary(s.out)
##
##  sim x :
##  -----
## ev
##        mean      sd    50%   2.5%  97.5%
## [1,] 0.7091 0.01293 0.7098 0.6829 0.7328
## pv
##          0     1
## [1,] 0.294 0.706
##
##  sim x1 :
##  -----
## ev
##        mean       sd    50%   2.5%  97.5%
## [1,] 0.8224 0.009923 0.8226 0.8031 0.8417
## pv
##          0     1
## [1,] 0.178 0.822
## fd
##        mean      sd    50%    2.5%  97.5%
## [1,] 0.1134 0.01112 0.1132 0.09218 0.1363
plot(s.out)

Zelig-lognorm

Model

Let Y_i^* be the survival time for observation i, with density function f(y) and corresponding distribution function F(t) = ∫₀ᵗ f(y) dy. This variable might be censored for some observations at a fixed time y_c such that the fully observed dependent variable, Y_i, is defined as

    Y_i = Y_i^*  if Y_i^* <= y_c
    Y_i = y_c    if Y_i^* > y_c

  • The stochastic component is described by the distribution of the partially observed variable, Y^*. For the lognormal model, there are two equivalent representations:

        Y_i^* ~ LogNormal(μ_i, σ²)    or    log(Y_i^*) ~ Normal(μ_i, σ²)

    where the parameters μ_i and σ² are the mean and variance of the Normal distribution. (Note that the output from zelig() reports Scale = σ.)

    In addition, survival models like the lognormal have three additional properties. The hazard function h(t) measures the probability of not surviving past time t given survival up to t. In general, the hazard function is equal to f(t)/S(t), where the survival function S(t) = 1 - F(t) represents the fraction still surviving at time t. The cumulative hazard function H(t) describes the probability of dying before time t. In general, H(t) = ∫₀ᵗ h(s) ds = -log S(t). In the case of the lognormal model,

        S(t) = 1 - Φ((log t - μ_i) / σ)
        h(t) = f(t) / S(t)

    where Φ(·) is the cumulative distribution function for the Normal distribution.

  • The systematic component is described as:

        μ_i = x_i β
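The two representations of the lognormal can be checked numerically in base R: plnorm() and the Normal CDF applied to log(t) give the same survival probabilities. The values of μ and σ below are arbitrary illustrations (σ = 1.2 echoes the Scale reported in the output above):

```r
# Survival function of the lognormal model computed two equivalent ways.
mu <- 5
sigma <- 1.2
t <- c(10, 20, 40)

s1 <- 1 - plnorm(t, meanlog = mu, sdlog = sigma)
s2 <- 1 - pnorm((log(t) - mu) / sigma)

all.equal(s1, s2)  # TRUE: identical survival probabilities
```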

Quantities of Interest

  • The expected values (qi$ev) for the lognormal model are simulations of the expected duration:

        E(Y_i) = exp(μ_i + σ²/2),

    given draws of β and σ from their sampling distributions.

  • The predicted value is a draw from the log-normal distribution given simulations of the parameters (μ_i, σ).

  • The first difference (qi$fd) is

        FD = E(Y | x_1) - E(Y | x).

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. When Y_i(t_i = 1) is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored y_i^* and uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. When Y_i(t_i = 1) is censored rather than observed, we replace it with a simulation from the model given available knowledge of the censoring process. Variation in the simulations is due to two factors: uncertainty in the imputation process for censored y_i^* and uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(Surv(Y, C) ~ X, model = "lognorm", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The log-normal model is part of the survival library by Terry Therneau, ported to R by Thomas Lumley. Advanced users may wish to refer to help(survfit) in the survival library.


zelig-ls

Least Squares Regression for Continuous Dependent Variables

Use least squares regression analysis to estimate the best linear predictor for the specified dependent variable.

Syntax

With reference classes:

z5 <- zls$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "ls", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Examples

Basic Example with First Differences

Attach sample data:

data(macro)

Estimate model:

z.out1 <- zelig(unem ~ gdp + capmob + trade, model = "ls", data = macro)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, and Olivia Lau. 2007.
##   ls: Least Squares Regression for Continuous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Summarize regression coefficients:

summary(z.out1)
## Model:
## $by
## [1] 1
##
##
## Call:
## stats::lm(formula = unem ~ gdp + capmob + trade, data = .)
##
## Coefficients:
## (Intercept)          gdp       capmob        trade
##      6.1813      -0.3236       1.4219       0.0199
##
## Next step: Use 'setx' method

Set explanatory variables to their default (mean/mode) values, with high (80th percentile) and low (20th percentile) values for the trade variable:

x.high <- setx(z.out1, trade = quantile(macro$trade, 0.8))
x.low <- setx(z.out1, trade = quantile(macro$trade, 0.2))

Generate first differences for the effect of high versus low trade on unemployment:

s.out1 <- sim(z.out1, x = x.high, x1 = x.low)
summary(s.out1)
##
##  sim x :
##  -----
## ev
##    mean     sd   50%  2.5% 97.5%
## 1 5.434 0.1854 5.433 5.065 5.794
## pv
##    mean     sd   50%  2.5% 97.5%
## 1 5.434 0.1854 5.433 5.065 5.794
##
##  sim x1 :
##  -----
## ev
##    mean     sd   50%  2.5% 97.5%
## 1 4.591 0.1859 4.591 4.223 4.939
## pv
##    mean     sd   50%  2.5% 97.5%
## 1 4.591 0.1859 4.591 4.223 4.939
## fd
##      mean     sd     50%   2.5%   97.5%
## 1 -0.8432 0.2323 -0.8312 -1.321 -0.4059
plot(s.out1)

Using Dummy Variables

Estimate a model with fixed effects for each country. Note that you do not need to create dummy variables, as the program will automatically parse the unique values in the selected variable into discrete levels.

z.out2 <- zelig(unem ~ gdp + trade + capmob + as.factor(country), model = "ls", data = macro)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, and Olivia Lau. 2007.
##   ls: Least Squares Regression for Continuous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Set values for the explanatory variables, using the default mean/mode values, with country set to the United States and Japan, respectively:

x.US <- setx(z.out2, country = "United States")
x.Japan <- setx(z.out2, country = "Japan")

Simulate quantities of interest:

s.out2 <- sim(z.out2, x = x.US, x1 = x.Japan)
plot(s.out2)

Model

  • The stochastic component is described by a density with mean μ_i and the common variance σ²:

        Y_i ~ Normal(μ_i, σ²)

  • The systematic component models the conditional mean as

        μ_i = x_i β

    where x_i is the vector of covariates, and β is the vector of coefficients.

    The least squares estimator is the best linear predictor of a dependent variable given x_i, and minimizes the sum of squared residuals, Σ_i (Y_i - x_i β)².
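This minimization has the familiar closed form β̂ = (XᵀX)⁻¹ Xᵀy, which agrees with lm(). A self-contained sketch on simulated data (coefficients below are made up):

```r
# The closed-form least squares estimate (X'X)^{-1} X'y matches lm().
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)  # made-up coefficients

X <- cbind(1, x1, x2)
beta.hat <- solve(t(X) %*% X, t(X) %*% y)

fit <- lm(y ~ x1 + x2)
all.equal(as.numeric(beta.hat), unname(coef(fit)))  # TRUE
```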

Quantities of Interest

  • The expected value (qi$ev) is the mean of simulations from the stochastic component,

        E(Y_i) = x_i β,

    given a draw of β from its sampling distribution.

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "ls", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out). Other elements available through the $ operator are listed below.

  • From the zelig() output object z.out, you may extract:

    • coefficients: parameter estimates for the explanatory variables.
    • residuals: the residuals of the fitted model.
    • fitted.values: fitted values.
    • df.residual: the residual degrees of freedom.
    • zelig.data: the input data frame if save.data = TRUE.
  • From summary(z.out), you may extract:

    • coefficients: the parameter estimates with their associated standard errors, p-values, and t-statistics.

    • sigma: the square root of the estimated variance of the random error ε:

          σ̂ = sqrt( Σ_i ε̂_i² / (n - k) )

    • r.squared: the fraction of the variance explained by the model.

    • adj.r.squared: the above statistic, penalizing for an increased number of explanatory variables.

    • cov.unscaled: a matrix of unscaled covariances.

See also

The least squares regression is part of the stats package by William N. Venables and Brian D. Ripley. In addition, advanced users may wish to refer to help(lm) and help(lm.fit).


zelig-negbin

Negative Binomial Regression for Event Count Dependent Variables

Use the negative binomial regression if you have a count of events for each observation of your dependent variable. The negative binomial model is frequently used to estimate over-dispersed event count models.

Syntax

With reference classes:

z5 <- znegbin$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "negbin", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Example

Load sample data:

data(sanction)

Estimate the model:

z.out <- zelig(num ~ target + coop, model = "negbin", data = sanction)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2008.
##   negbinom: Negative Binomial Regression for Event Count Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/
summary(z.out)
## Model:
## $by
## [1] 1
##
##
## Call:  MASS::glm.nb(formula = num ~ target + coop, data = ., init.theta = 1.841603403,
##     link = log)
##
## Coefficients:
## (Intercept)       target         coop
##      -1.564        0.151        1.286
##
## Degrees of Freedom: 77 Total (i.e. Null);  75 Residual
## Null Deviance:       237
## Residual Deviance: 56.5  AIC: 360
## Next step: Use 'setx' method

Set values for the explanatory variables to their default mean values:

x.out <- setx(z.out)

Simulate fitted values:

s.out <- sim(z.out, x = x.out)
summary(s.out)
##
##  sim x :
##  -----
## ev
##       mean     sd   50%  2.5% 97.5%
## [1,] 2.975 0.3322 2.959 2.376 3.703
## pv
## qi
##     0     1     2     3     4     5     6     7     8     9    10    11
## 0.147 0.184 0.157 0.120 0.112 0.084 0.068 0.039 0.029 0.013 0.020 0.012
##    12    13    14    16
## 0.009 0.003 0.001 0.002
plot(s.out)

Zelig-negbin

Model

Let Y_i be the number of independent events that occur during a fixed time period. This variable can take any non-negative integer value.

  • The negative binomial distribution is derived by letting the mean of the Poisson distribution vary according to a fixed parameter given by the Gamma distribution. The stochastic component is given by

    The marginal distribution of is then the negative binomial with mean and variance :

    where is the systematic parameter of the Gamma distribution modeling .

  • The systematic component is given by

    where is the vector of explanatory variables and is the vector of coefficients.
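As a quick numerical sanity check (not part of the original vignette), R's rnbinom(), parameterized by mu and size (which plays the role of θ here), reproduces this mean–variance relationship:

```r
# Numerical check of the negative binomial mean-variance relationship,
# Var(Y) = mu + mu^2 / theta, using rnbinom()'s mu/size parameterization.
set.seed(1)
mu <- 3
theta <- 1.8
y <- rnbinom(1e5, mu = mu, size = theta)
c(mean = mean(y), var = var(y), theory = mu + mu^2 / theta)
```

With 100,000 draws the sample mean and variance should land close to 3 and 8, respectively.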

Quantities of Interest

  • The expected values (qi$ev) are simulations of the mean of the stochastic component. Thus,

        E(Y_i) = μ_i = exp(x_i β),

    given simulations of β.

  • The predicted value (qi$pr) is drawn from the distribution defined by the set of parameters (μ_i, θ).

  • The first difference (qi$fd) is

        FD = E(Y | x_1) − E(Y | x)

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.
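The expected-value simulation above can be sketched outside of Zelig. The following is an illustrative approximation (not Zelig's internal code) using a toy dataset: coefficients are drawn from their asymptotic sampling distribution, as sim() does, and transformed through the inverse link.

```r
# Illustrative sketch: approximate qi$ev for a negative binomial fit by
# drawing coefficients from their asymptotic sampling distribution.
library(MASS)  # glm.nb(), mvrnorm()
set.seed(2)
x <- rnorm(200)
y <- rnbinom(200, mu = exp(0.5 + 0.8 * x), size = 2)   # toy count data
fit <- glm.nb(y ~ x)
beta.sims <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))
x.mean <- c(1, mean(x))            # analogue of setx() default profile
ev <- exp(beta.sims %*% x.mean)    # E(Y) = mu = exp(x beta) per draw
quantile(ev, c(0.025, 0.5, 0.975))
```

The quantiles of ev correspond to the mean/2.5%/97.5% columns that summary(s.out) reports for qi$ev.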

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "negbin", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The negative binomial model is part of the MASS package by William N. Venables and Brian D. Ripley. Advanced users may wish to refer to help(glm.nb).


zelig-normal

Normal Regression for Continuous Dependent Variables

The Normal regression model is a close variant of the more standard least squares regression model (see ). Both models specify a continuous dependent variable as a linear function of a set of explanatory variables. The Normal model reports maximum likelihood (rather than least squares) estimates. The two models differ only in their estimate for the stochastic parameter σ².

Syntax

With reference classes:

z5 <- znormal$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "normal", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Examples

Basic Example with First Differences

Attach sample data:

data(macro)

Estimate model:

z.out1 <- zelig(unem ~ gdp + capmob + trade, model = "normal", data = macro)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2008.
##   normal: Normal Regression for Continuous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Summarize the regression coefficients:

summary(z.out1)
## Model:
## $by
## [1] 1
##
##
## Call:  stats::glm(formula = unem ~ gdp + capmob + trade, family = gaussian("identity"),
##     data = .)
##
## Coefficients:
## (Intercept)          gdp       capmob        trade
##      6.1813      -0.3236       1.4219       0.0199
##
## Degrees of Freedom: 349 Total (i.e. Null);  346 Residual
## Null Deviance:       3660
## Residual Deviance: 2610  AIC: 1710
## Next step: Use 'setx' method

Set explanatory variables to their default (mean/mode) values, with high (80th percentile) and low (20th percentile) values for trade:

x.high <- setx(z.out1, trade = quantile(macro$trade, 0.8))
x.low <- setx(z.out1, trade = quantile(macro$trade, 0.2))

Generate first differences for the effect of high versus low trade on unemployment:

s.out1 <- sim(z.out1, x = x.high, x1 = x.low)
summary(s.out1)
##
##  sim x :
##  -----
## ev
##      mean     sd  50%  2.5% 97.5%
## [1,] 5.43 0.1872 5.42 5.046 5.792
## pv
##       mean    sd   50%    2.5% 97.5%
## [1,] 5.376 2.788 5.485 -0.1162 10.47
##
##  sim x1 :
##  -----
## ev
##       mean     sd   50%  2.5% 97.5%
## [1,] 4.592 0.1781 4.592 4.254 4.948
## pv
##       mean   sd   50%    2.5% 97.5%
## [1,] 4.723 2.73 4.727 -0.5893 10.02
## fd
##         mean     sd     50%   2.5%   97.5%
## [1,] -0.8381 0.2308 -0.8242 -1.309 -0.3974

A visual summary of quantities of interest:

plot(s.out1)

Zelig-normal

Model

Let Y_i be the continuous dependent variable for observation i.

  • The stochastic component is described by a univariate normal model with a vector of means μ_i and scalar variance σ²:

        Y_i ~ Normal(μ_i, σ²)

  • The systematic component is

        μ_i = x_i β,

    where x_i is the vector of explanatory variables and β is the vector of coefficients.

Quantities of Interest

  • The expected value (qi$ev) is the mean of simulations from the stochastic component,

        E(Y) = μ_i = x_i β,

    given a draw of β from its posterior.

  • The predicted value (qi$pr) is drawn from the distribution defined by the set of parameters (μ_i, σ²).

  • The first difference (qi$fd) is:

        FD = E(Y | x_1) − E(Y | x)

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.
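The first-difference computation in the Basic Example can be sketched directly. This is an illustrative approximation (not Zelig's internal code) on toy data, mirroring the example's setx() profiles at the 80th (x) and 20th (x1) percentiles:

```r
# Illustrative sketch: first differences (qi$fd) for a gaussian GLM,
# comparing a low (x1) against a high (x) profile of the covariate.
set.seed(3)
x <- rnorm(150)
y <- 2 + 1.5 * x + rnorm(150)            # toy continuous outcome
fit <- glm(y ~ x, family = gaussian("identity"))
beta.sims <- MASS::mvrnorm(1000, coef(fit), vcov(fit))
x.high <- c(1, quantile(x, 0.8))         # analogue of x.high in the example
x.low  <- c(1, quantile(x, 0.2))         # analogue of x.low
fd <- beta.sims %*% x.low - beta.sims %*% x.high   # E(Y | x1) - E(Y | x)
summary(as.vector(fd))
```

Because the true slope is positive, the simulated first differences here are centered below zero, just as fd is negative in the trade example above.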

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "normal", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The normal model is part of the stats package by the R Core Team. Advanced users may wish to refer to help(glm) and help(family).


zelig-poisson

Poisson Regression for Event Count Dependent Variables

Use the Poisson regression model if the observations of your dependent variable represent the number of independent events that occur during a fixed period of time (see the negative binomial model for over-dispersed event counts). For a Bayesian implementation of this model, see .

Syntax

With reference classes:

z5 <- zpoisson$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "poisson", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Example

Load sample data:

data(sanction)

Estimate Poisson model:

z.out <- zelig(num ~ target + coop, model = "poisson", data = sanction)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2007.
##   poisson: Poisson Regression for Event Count Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/
summary(z.out)
## Model:
## $by
## [1] 1
##
##
## Call:  stats::glm(formula = num ~ target + coop, family = poisson("log"),
##     data = .)
##
## Coefficients:
## (Intercept)       target         coop
##      -0.968       -0.021        1.211
##
## Degrees of Freedom: 77 Total (i.e. Null);  75 Residual
## Null Deviance:       1580
## Residual Deviance: 721   AIC: 944
## Next step: Use 'setx' method

Set values for the explanatory variables to their default mean values:

x.out <- setx(z.out)

Simulate fitted values:

s.out <- sim(z.out, x = x.out)
summary(s.out)
##
##  sim x :
##  -----
## ev
##       mean     sd   50%  2.5% 97.5%
## [1,] 3.244 0.2324 3.239 2.799 3.732
## pv
##      mean    sd 50% 2.5% 97.5%
## [1,] 3.32 1.843   3    0     7
plot(s.out)

Zelig-poisson

Model

Let Y_i be the number of independent events that occur during a fixed time period. This variable can take any non-negative integer value.

  • The Poisson distribution has stochastic component

        Y_i ~ Poisson(λ_i),

    where λ_i is the mean and variance parameter.

  • The systematic component is

        λ_i = exp(x_i β),

    where x_i is the vector of explanatory variables, and β is the vector of coefficients.

Quantities of Interest

  • The expected value (qi$ev) is the mean of simulations from the stochastic component,

        E(Y_i) = λ_i = exp(x_i β),

    given draws of β from its sampling distribution.

  • The predicted value (qi$pr) is a random draw from the Poisson distribution defined by mean λ_i.

  • The first difference in the expected values (qi$fd) is given by:

        FD = E(Y | x_1) − E(Y | x)

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.
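The distinction between qi$ev (simulated means λ) and qi$pr (integer draws from Poisson(λ)) can be sketched on toy data. This is an illustrative approximation, not Zelig's internal code:

```r
# Illustrative sketch: qi$ev versus qi$pr for a Poisson GLM.
set.seed(4)
x <- rnorm(200)
y <- rpois(200, exp(0.3 + 0.7 * x))        # toy event counts
fit <- glm(y ~ x, family = poisson("log"))
beta.sims <- MASS::mvrnorm(1000, coef(fit), vcov(fit))
lambda <- exp(beta.sims %*% c(1, mean(x))) # qi$ev: simulated means
pv <- rpois(1000, lambda)                  # qi$pr: one Poisson draw per mean
c(ev.mean = mean(lambda), pv.mean = mean(pv))
```

As in the sanction example, the ev draws vary smoothly around the mean while the pv draws are integers with much larger spread.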

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "poisson", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The Poisson model is part of the stats package by the R Core Team. Advanced users may wish to refer to help(glm) and help(family).


zelig-probit

Probit Regression for Dichotomous Dependent Variables

Use probit regression to model binary dependent variables specified as a function of a set of explanatory variables.

Syntax

With reference classes:

z5 <- zprobit$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "probit", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out, x1 = NULL)

Example

Attach the sample turnout dataset:

data(turnout)

Estimate parameter values for the probit regression:

z.out <- zelig(vote ~ race + educate, model = "probit", data = turnout)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2007.
##   probit: Probit Regression for Dichotomous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/
summary(z.out)
## Model:
## $by
## [1] 1
##
##
## Call:  stats::glm(formula = vote ~ race + educate, family = binomial("probit"),
##     data = .)
##
## Coefficients:
## (Intercept)    racewhite      educate
##     -0.7259       0.2991       0.0971
##
## Degrees of Freedom: 1999 Total (i.e. Null);  1997 Residual
## Null Deviance:       2270
## Residual Deviance: 2140  AIC: 2140
## Next step: Use 'setx' method

Set values for the explanatory variables to their default values.

x.out <- setx(z.out)

Simulate quantities of interest from the posterior distribution.

s.out <- sim(z.out, x = x.out)
summary(s.out)
plot(s.out)

Zelig-probit

Model

Let Y_i be the observed binary dependent variable for observation i, which takes the value of either 0 or 1.

  • The stochastic component is given by

        Y_i ~ Bernoulli(π_i),

    where π_i = Pr(Y_i = 1).

  • The systematic component is

        π_i = Φ(x_i β),

    where Φ is the cumulative distribution function of the Normal distribution with mean 0 and unit variance.

Quantities of Interest

  • The expected value (qi$ev) is a simulation of the predicted probability of success

        E(Y) = π_i = Φ(x_i β),

    given a draw of β from its sampling distribution.

  • The predicted value (qi$pr) is a draw from a Bernoulli distribution with mean π_i.

  • The first difference (qi$fd) in expected values is defined as

        FD = Pr(Y = 1 | x_1) − Pr(Y = 1 | x)

  • The risk ratio (qi$rr) is defined as

        RR = Pr(Y = 1 | x_1) / Pr(Y = 1 | x)

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

        (1 / Σ_i t_i) Σ_{i: t_i = 1} [ Y_i(t_i = 1) − Ŷ_i(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating Ŷ_i(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.
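The probit quantities of interest can be sketched with base R alone. This illustrative approximation (not Zelig's internal code) uses toy binary data; the key step is passing the linear predictor through pnorm(), the Φ of the model above:

```r
# Illustrative sketch: simulated predicted probabilities (qi$ev) and
# Bernoulli draws (qi$pr) for a probit GLM.
set.seed(5)
x <- rnorm(300)
y <- rbinom(300, 1, pnorm(-0.2 + 0.9 * x))     # toy binary outcome
fit <- glm(y ~ x, family = binomial("probit"))
beta.sims <- MASS::mvrnorm(1000, coef(fit), vcov(fit))
pi.sims <- pnorm(beta.sims %*% c(1, mean(x)))  # qi$ev: Phi(x beta)
pr <- rbinom(1000, 1, pi.sims)                 # qi$pr: Bernoulli draws
c(ev = mean(pi.sims), pr = mean(pr))
```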

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "probit", data = mydata), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

See also

The probit model is part of the stats package by the R Core Team. Advanced users may wish to refer to help(glm) and help(family).


zelig-relogit

Rare Events Logistic Regression for Dichotomous Dependent Variables

The relogit procedure estimates the same model as standard logistic regression (appropriate when you have a dichotomous dependent variable and a set of explanatory variables; see ), but the estimates are corrected for the bias that occurs when the sample is small or the observed events are rare (i.e., if the dependent variable has many more 1s than 0s or the reverse). The relogit procedure also optionally uses prior correction for case-control sampling designs.

Syntax

With reference classes:

z5 <- zrelogit$new()
z5$zelig(Y ~ X1 + X2, tau = NULL,
                       case.control = c("prior", "weighting"),
                       bias.correct = TRUE, robust = FALSE,
                       data = mydata, ...)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "relogit", tau = NULL,
                       case.control = c("prior", "weighting"),
                       bias.correct = TRUE, robust = FALSE,
                       data = mydata, ...)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Arguments

The relogit procedure supports four optional arguments in addition to the standard arguments for zelig(). You may additionally use:

  • tau: a vector containing either one or two values for , the true population fraction of ones. Use, for example, tau = c(0.05, 0.1) to specify that the lower bound on tau is 0.05 and the upper bound is 0.1. If left unspecified, only finite-sample bias correction is performed, not case-control correction.
  • case.control: if tau is specified, choose a method to correct for case-control sampling design: “prior” (default) or “weighting”.
  • bias.correct: a logical value of TRUE (default) or FALSE indicating whether the intercept should be corrected for finite sample (rare events) bias.

Note that if tau = NULL and bias.correct = FALSE, the relogit procedure performs a standard logistic regression without any correction.
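To make the prior-correction option concrete, here is an illustrative sketch on toy data (not Zelig's internal code; the intercept formula matches the relogit source printed in Example 3 below). With tau = NULL and bias.correct = FALSE the fit reduces to a plain glm() logit; prior correction then shifts only the intercept:

```r
# Illustrative sketch: prior correction applied to a plain logit fit.
set.seed(6)
x <- rnorm(5000)
y <- rbinom(5000, 1, plogis(-4 + x))   # rare events: few 1s in the sample
fit <- glm(y ~ x, family = binomial("logit"))
tau  <- 0.02      # assumed true population fraction of ones (hypothetical)
ybar <- mean(y)   # sample fraction of ones
b <- coef(fit)
# corrected intercept: b0 - log(((1 - tau)/tau) * (ybar/(1 - ybar)))
b["(Intercept)"] <- b["(Intercept)"] -
  log(((1 - tau) / tau) * (ybar / (1 - ybar)))
b   # slope unchanged; only the intercept moves
```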

Example 1: One Tau with Prior Correction and Bias Correction

Due to memory and space considerations, the data used here are a sample drawn from the full data set used in King and Zeng, 2001. The proportion of militarized interstate conflicts to the absence of disputes is 1,042 to 303,772. To estimate the model,

data(mid)
z.out1 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years, data = mid, model = "relogit", tau = 1042/303772)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, and Olivia Lau. 2014.
##   relogit: Rare Events Logistic Regression for Dichotomous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Summarize the model output:

summary(z.out1)
## Model:
## $by
## [1] 1
##
##
## Call:  relogit(formula = cbind(conflict, 1 - conflict) ~ major + contig +
##     power + maxdem + mindem + years, data = ., tau = 0.00343020423212146,
##     bias.correct = TRUE, case.control = "prior")
##
## Coefficients:
## (Intercept)        major       contig        power       maxdem
##     -7.5084       2.4320       4.1080       1.0536       0.0480
##      mindem        years
##     -0.0641      -0.0629
##
## Degrees of Freedom: 3125 Total (i.e. Null);  3119 Residual
## Null Deviance:       3980
## Residual Deviance: 1870  AIC: 1880
## Next step: Use 'setx' method

Set the explanatory variables to their means:

x.out1 <- setx(z.out1)

Simulate quantities of interest:

s.out1 <- sim(z.out1, x = x.out1)
summary(s.out1)
##
##  sim x :
##  -----
## ev
##          mean       sd      50%     2.5%    97.5%
## [1,] 0.002388 0.000155 0.002383 0.002104 0.002702
## pv
##          0     1
## [1,] 0.998 0.002
plot(s.out1)

Zelig-relogit

Example 2: One Tau with Weighting, Robust Standard Errors, and Bias Correction

Suppose that we wish to perform case control correction using weighting (rather than the default prior correction). To estimate the model:

z.out2 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years, data = mid, model = "relogit", tau = 1042/303772, case.control = "weighting", robust = TRUE)
## Error: unused argument (robust = TRUE)


Example 3: Two Taus with Bias Correction and Prior Correction

Suppose that we did not know that τ = 1042/303772, but only that it was somewhere between 0.002 and 0.005. To estimate a model with a range of feasible estimates for τ (using the default prior correction method for case control correction):

z.out2 <- zelig(conflict ~ major + contig + power + maxdem + mindem + years, data = mid, model = "relogit", tau = c(0.002, 0.005))
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, and Olivia Lau. 2014.
##   relogit: Rare Events Logistic Regression for Dichotomous Dependent Variables
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Summarize the model output:

z.out2
## Model:
## $by
## [1] 1
##
## $lower.estimate
##
## Call:  (function (formula, data = sys.parent(), tau = NULL, bias.correct = TRUE,
##     case.control = "prior", ...)
## {
##     mf <- match.call()
##     mf$tau <- mf$bias.correct <- mf$case.control <- NULL
##     if (!is.null(tau)) {
##         tau <- unique(tau)
##         if (length(case.control) > 1)
##             stop("You can only choose one option for case control correction.")
##         ck1 <- grep("p", case.control)
##         ck2 <- grep("w", case.control)
##         if (length(ck1) == 0 & length(ck2) == 0)
##             stop("choose either case.control = \"prior\" ", "or case.control = \"weighting\"")
##         if (length(ck2) == 0)
##             weighting <- FALSE
##         else weighting <- TRUE
##     }
##     else weighting <- FALSE
##     if (length(tau) > 2)
##         stop("tau must be a vector of length less than or equal to 2")
##     else if (length(tau) == 2) {
##         mf[[1]] <- relogit
##         res <- list()
##         mf$tau <- min(tau)
##         res$lower.estimate <- eval(as.call(mf), parent.frame())
##         mf$tau <- max(tau)
##         res$upper.estimate <- eval(as.call(mf), parent.frame())
##         res$formula <- formula
##         class(res) <- c("Relogit2", "Relogit")
##         return(res)
##     }
##     else {
##         mf[[1]] <- glm
##         mf$family <- binomial(link = "logit")
##         y2 <- model.response(model.frame(mf$formula, data))
##         if (is.matrix(y2))
##             y <- y2[, 1]
##         else y <- y2
##         ybar <- mean(y)
##         if (weighting) {
##             w1 <- tau/ybar
##             w0 <- (1 - tau)/(1 - ybar)
##             wi <- w1 * y + w0 * (1 - y)
##             mf$weights <- wi
##         }
##         res <- eval(as.call(mf), parent.frame())
##         res$call <- match.call(expand.dots = TRUE)
##         res$tau <- tau
##         X <- model.matrix(res)
##         if (bias.correct) {
##             pihat <- fitted(res)
##             if (is.null(tau))
##                 wi <- rep(1, length(y))
##             else if (weighting)
##                 res$weighting <- TRUE
##             else {
##                 w1 <- tau/ybar
##                 w0 <- (1 - tau)/(1 - ybar)
##                 wi <- w1 * y + w0 * (1 - y)
##                 res$weighting <- FALSE
##             }
##             W <- pihat * (1 - pihat) * wi
##             Qdiag <- lm.influence(lm(y ~ X - 1, weights = W))$hat/W
##             if (is.null(tau))
##                 xi <- 0.5 * Qdiag * (2 * pihat - 1)
##             else xi <- 0.5 * Qdiag * ((1 + w0) * pihat - w0)
##             res$coefficients <- res$coefficients - lm(xi ~ X -
##                 1, weights = W)$coefficients
##             res$bias.correct <- TRUE
##         }
##         else res$bias.correct <- FALSE
##         if (!is.null(tau) & !weighting) {
##             if (tau <= 0 || tau >= 1)
##                 stop("\ntau needs to be between 0 and 1.\n")
##             res$coefficients["(Intercept)"] <- res$coefficients["(Intercept)"] -
##                 log(((1 - tau)/tau) * (ybar/(1 - ybar)))
##             res$prior.correct <- TRUE
##             res$weighting <- FALSE
##         }
##         else res$prior.correct <- FALSE
##         if (is.null(res$weighting))
##             res$weighting <- FALSE
##         res$linear.predictors <- t(res$coefficients) %*% t(X)
##         res$fitted.values <- 1/(1 + exp(-res$linear.predictors))
##         res$zelig <- "Relogit"
##         class(res) <- c("Relogit", "glm")
##         return(res)
##     }
## })(formula = cbind(conflict, 1 - conflict) ~ major + contig +
##     power + maxdem + mindem + years, data = ., tau = 0.002)
##
## Coefficients:
## (Intercept)        major       contig        power       maxdem
##     -8.0492       2.4320       4.1079       1.0536       0.0480
##      mindem        years
##     -0.0641      -0.0629
##
## Degrees of Freedom: 3125 Total (i.e. Null);  3119 Residual
## Null Deviance:       3980
## Residual Deviance: 1870  AIC: 1880
##
## $upper.estimate
##
## Call:  (function (formula, data = sys.parent(), tau = NULL, bias.correct = TRUE,
##     case.control = "prior", ...)
## {
##     mf <- match.call()
##     mf$tau <- mf$bias.correct <- mf$case.control <- NULL
##     if (!is.null(tau)) {
##         tau <- unique(tau)
##         if (length(case.control) > 1)
##             stop("You can only choose one option for case control correction.")
##         ck1 <- grep("p", case.control)
##         ck2 <- grep("w", case.control)
##         if (length(ck1) == 0 & length(ck2) == 0)
##             stop("choose either case.control = \"prior\" ", "or case.control = \"weighting\"")
##         if (length(ck2) == 0)
##             weighting <- FALSE
##         else weighting <- TRUE
##     }
##     else weighting <- FALSE
##     if (length(tau) > 2)
##         stop("tau must be a vector of length less than or equal to 2")
##     else if (length(tau) == 2) {
##         mf[[1]] <- relogit
##         res <- list()
##         mf$tau <- min(tau)
##         res$lower.estimate <- eval(as.call(mf), parent.frame())
##         mf$tau <- max(tau)
##         res$upper.estimate <- eval(as.call(mf), parent.frame())
##         res$formula <- formula
##         class(res) <- c("Relogit2", "Relogit")
##         return(res)
##     }
##     else {
##         mf[[1]] <- glm
##         mf$family <- binomial(link = "logit")
##         y2 <- model.response(model.frame(mf$formula, data))
##         if (is.matrix(y2))
##             y <- y2[, 1]
##         else y <- y2
##         ybar <- mean(y)
##         if (weighting) {
##             w1 <- tau/ybar
##             w0 <- (1 - tau)/(1 - ybar)
##             wi <- w1 * y + w0 * (1 - y)
##             mf$weights <- wi
##         }
##         res <- eval(as.call(mf), parent.frame())
##         res$call <- match.call(expand.dots = TRUE)
##         res$tau <- tau
##         X <- model.matrix(res)
##         if (bias.correct) {
##             pihat <- fitted(res)
##             if (is.null(tau))
##                 wi <- rep(1, length(y))
##             else if (weighting)
##                 res$weighting <- TRUE
##             else {
##                 w1 <- tau/ybar
##                 w0 <- (1 - tau)/(1 - ybar)
##                 wi <- w1 * y + w0 * (1 - y)
##                 res$weighting <- FALSE
##             }
##             W <- pihat * (1 - pihat) * wi
##             Qdiag <- lm.influence(lm(y ~ X - 1, weights = W))$hat/W
##             if (is.null(tau))
##                 xi <- 0.5 * Qdiag * (2 * pihat - 1)
##             else xi <- 0.5 * Qdiag * ((1 + w0) * pihat - w0)
##             res$coefficients <- res$coefficients - lm(xi ~ X -
##                 1, weights = W)$coefficients
##             res$bias.correct <- TRUE
##         }
##         else res$bias.correct <- FALSE
##         if (!is.null(tau) & !weighting) {
##             if (tau <= 0 || tau >= 1)
##                 stop("\ntau needs to be between 0 and 1.\n")
##             res$coefficients["(Intercept)"] <- res$coefficients["(Intercept)"] -
##                 log(((1 - tau)/tau) * (ybar/(1 - ybar)))
##             res$prior.correct <- TRUE
##             res$weighting <- FALSE
##         }
##         else res$prior.correct <- FALSE
##         if (is.null(res$weighting))
##             res$weighting <- FALSE
##         res$linear.predictors <- t(res$coefficients) %*% t(X)
##         res$fitted.values <- 1/(1 + exp(-res$linear.predictors))
##         res$zelig <- "Relogit"
##         class(res) <- c("Relogit", "glm")
##         return(res)
##     }
## })(formula = cbind(conflict, 1 - conflict) ~ major + contig +
##     power + maxdem + mindem + years, data = ., tau = 0.005)
##
## Coefficients:
## (Intercept)        major       contig        power       maxdem
##     -7.1300       2.4320       4.1080       1.0536       0.0480
##      mindem        years
##     -0.0641      -0.0629
##
## Degrees of Freedom: 3125 Total (i.e. Null);  3119 Residual
## Null Deviance:       3980
## Residual Deviance: 1870  AIC: 1880
##
## $formula
## cbind(conflict, 1 - conflict) ~ major + contig + power + maxdem +
##     mindem + years
## <environment: 0x7fd4fbf8e908>
##
## attr(,"class")
## [1] "Relogit2" "Relogit"
## Next step: Use 'setx' method

Set the explanatory variables to their means:

x.out2 <- setx(z.out2)

Simulate quantities of interest:

s.out <- sim(z.out2, x = x.out2)
## Error: no applicable method for 'vcov' applied to an object of class
## "c('Relogit2', 'Relogit')"

The cost of giving a range of values for τ is that point estimates are not available for quantities of interest. Instead, quantities are presented as confidence intervals with significance less than or equal to a specified level (e.g., at least 95% of the simulations are contained in the nominal 95% confidence interval).

Model

  • Like the standard logistic regression, the stochastic component for the rare events logistic regression is:

    where is the binary dependent variable, and takes a value of either 0 or 1.

  • The systematic component is:

  • If the sample is generated via a case-control (or choice-based) design, such as when drawing all events (or “cases”) and a sample from the non-events (or “controls”) and going backwards to collect the explanatory variables, you must correct for selecting on the dependent variable. While the slope coefficients are approximately unbiased, the constant term may be significantly biased. Zelig has two methods for case control correction:

    1. The “prior correction” method adjusts the intercept term. Let be the true population fraction of events, the fraction of events in the sample, and the uncorrected intercept term. The corrected intercept is:

    2. The “weighting” method performs a weighted logistic regression to correct for a case-control sampling design. Let the 1 subscript denote observations for which the dependent variable is observed as a 1, and the 0 subscript denote observations for which the dependent variable is observed as a 0. Then the vector of weights

    If is unknown, you may alternatively specify an upper and lower bound for the possible range of . In this case, the relogit procedure uses “robust Bayesian” methods to generate a confidence interval (rather than a point estimate) for each quantity of interest. The nominal coverage of the confidence interval is at least as great as the actual coverage.

  • By default, estimates of the coefficients are bias-corrected to account for finite sample or rare events bias. In addition, quantities of interest, such as predicted probabilities, are also corrected for rare-events bias. If hat(beta) are the uncorrected logit coefficients and bias(hat(beta)) is the bias term, the corrected coefficients tilde(beta) are

    tilde(beta) = hat(beta) - bias(hat(beta))

    The bias term is

    bias(hat(beta)) = (X' W X)^(-1) X' W xi

    where

    xi_i = 0.5 Q_ii [(1 + w_1) hat(pi)_i - w_1],  Q = X (X' W X)^(-1) X',  W = diag{hat(pi)_i (1 - hat(pi)_i) w_i}

    where w_1 and w_i are given in the “weighting” section above.
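The two case-control corrections can be sketched in a few lines of base R. This is an illustrative sketch of the formulas above, not Zelig's internal implementation; the helper function names are hypothetical.

```r
# Hypothetical helpers sketching the relogit case-control corrections,
# with tau the true population fraction of events and ybar the sample fraction.

# "Prior correction": subtract ln[((1 - tau)/tau) * (ybar/(1 - ybar))]
# from the uncorrected intercept b0_hat.
prior_correct_intercept <- function(b0_hat, tau, ybar) {
  b0_hat - log(((1 - tau) / tau) * (ybar / (1 - ybar)))
}

# "Weighting": w1 = tau/ybar for events, w0 = (1 - tau)/(1 - ybar) for non-events.
relogit_weights <- function(y, tau) {
  ybar <- mean(y)
  ifelse(y == 1, tau / ybar, (1 - tau) / (1 - ybar))
}

# Events that are 1% of the population but 50% of the sample:
prior_correct_intercept(-1.2, tau = 0.01, ybar = 0.5)
relogit_weights(c(1, 0, 0, 1), tau = 0.01)
```

Note that the weights sum to roughly the sample size only when the sample fractions match tau; under heavy oversampling of events, the event observations are down-weighted and the non-events up-weighted.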

Quantities of Interest

  • For either one or no tau:

    • The expected values (qi$ev) for the rare events logit are simulations of the predicted probability

      Pr(Y_i = 1) = pi_i = 1 / (1 + exp(-x_i beta)),

      given draws of beta from its posterior.

    • The predicted value (qi$pr) is a draw from a binomial distribution with mean equal to the simulated pi_i.

    • The first difference (qi$fd) is defined as

      FD = Pr(Y = 1 | x_1) - Pr(Y = 1 | x)

    • The risk ratio (qi$rr) is defined as

      RR = Pr(Y = 1 | x_1) / Pr(Y = 1 | x)

  • For a range of tau defined by [tau_1, tau_2], each of the quantities of interest are matrices, which report the lower and upper bounds, respectively, for a confidence interval with nominal coverage at least as great as the actual coverage. At worst, these bounds are conservative estimates for the likely range for each quantity of interest. Please refer to for the specific method of calculating bounded quantities of interest.

  • In conditional prediction models, the average expected treatment effect (att.ev) for the treatment group is

    (1 / sum t_i) * sum over {i : t_i = 1} of [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ]

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating E[Y_i(t_i = 0)], the counterfactual expected value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

  • In conditional prediction models, the average predicted treatment effect (att.pr) for the treatment group is

    (1 / sum t_i) * sum over {i : t_i = 1} of [ Y_i(t_i = 1) - hat(Y_i)(t_i = 0) ]

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups. Variation in the simulations is due to uncertainty in simulating hat(Y_i)(t_i = 0), the counterfactual predicted value of Y_i for observations in the treatment group, under the assumption that everything stays the same except that the treatment indicator is switched to t_i = 0.

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run z.out <- zelig(y ~ x, model = "relogit", data), then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and a default summary of information through summary(z.out).

Differences with Stata Version

The Stata version of ReLogit and the R implementation differ slightly in their coefficient estimates due to differences in the matrix inversion routines implemented in R and Stata. Zelig uses orthogonal-triangular decomposition (through lm.influence()) to compute the bias term, which is more numerically stable than standard matrix calculations.

See also


zelig-tobit

Linear Regression for a Left-Censored Dependent Variable

Tobit regression estimates a linear regression model for a left-censored dependent variable, where the dependent variable is censored from below. While the classical tobit model has values censored at 0, you may select another censoring point. For other linear regression models with fully observed dependent variables, see Bayesian regression (), maximum likelihood normal regression (), or least squares ().

Syntax

With reference classes:

z5 <- ztobit$new()
z5$zelig(Y ~ X1 + X2, below = 0, above = Inf, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, below = 0, above = Inf, model = "tobit", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Inputs

zelig() accepts the following arguments to specify how the dependent variable is censored.

  • below: (defaults to 0) The point at which the dependent variable is censored from below. If any values in the dependent variable are observed to be less than the censoring point, it is assumed that that particular observation is censored from below at the observed value. (See for a Bayesian implementation that supports both left and right censoring.)

  • robust: defaults to FALSE. If TRUE, zelig() computes robust standard errors based on sandwich estimators (see and ) and the options selected in cluster.

  • cluster: if robust = TRUE, you may select a variable to define groups of correlated observations. Let x3 be a variable that consists of either discrete numeric values, character strings, or factors that define strata. Then

    > z.out <- zelig(y ~ x1 + x2, robust = TRUE, cluster = "x3",
                     model = "tobit", data = mydata)
    

    means that the observations can be correlated within the strata defined by the variable x3, and that robust standard errors should be calculated according to those clusters. If robust = TRUE but cluster is not specified, zelig() assumes that each observation falls into its own cluster.

Zelig users may wish to refer to help(survreg) for more information.

Examples

Basic Example

Attaching the sample dataset:

data(tobin)

Estimating linear regression using tobit:

z.out <- zelig(durable ~ age + quant, model = "tobit", data = tobin)
## How to cite this model in Zelig:
##   Kosuke Imai, Gary King, Olivia Lau. 2011.
##   tobit: Linear regression for Left-Censored Dependent Variable
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Setting values for the explanatory variables to their sample averages:

x.out <- setx(z.out)

Simulating quantities of interest from the posterior distribution given x.out.

s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
##
##  sim x :
##  -----
## ev
##    mean     sd   50%   2.5% 97.5%
## 1 1.547 0.6275 1.462 0.5676 2.978
## pv
##       mean    sd   50% 2.5% 97.5%
## [1,] 3.291 4.331 1.637    0 13.96

Simulating First Differences

Set explanatory variables to their default (mean/mode) values, with high (80th percentile) and low (20th percentile) liquidity ratio (quant):

x.high <- setx(z.out, quant = quantile(tobin$quant, prob = 0.8))
x.low <- setx(z.out, quant = quantile(tobin$quant, prob = 0.2))

Estimating the first difference for the effect of high versus low liquidity ratio on the demand for durable goods (durable):

s.out2 <- sim(z.out, x = x.high, x1 = x.low)
summary(s.out2)
##
##  sim x :
##  -----
## ev
##    mean     sd   50%   2.5% 97.5%
## 1 1.184 0.7182 1.069 0.1974 2.957
## pv
##       mean    sd   50% 2.5% 97.5%
## [1,] 3.113 4.147 1.364    0 13.46
##
##  sim x1 :
##  -----
## ev
##    mean   sd   50%   2.5% 97.5%
## 1 2.103 1.05 1.954 0.5592 4.579
## pv
##       mean    sd   50% 2.5% 97.5%
## [1,] 3.759 4.338 2.412    0 14.35
## fd
##     mean    sd    50%   2.5% 97.5%
## 1 0.9197 1.204 0.8075 -1.227 3.563
plot(s.out1)


Model

  • Let Y_i* be a latent dependent variable which is distributed with stochastic component

    Y_i* ~ Normal(mu_i, sigma^2)

    where mu_i is the mean and sigma^2 is a scalar variance parameter. Y_i* is not directly observed, however. Rather, we observe Y_i, which is defined as:

    Y_i = Y_i*  if c < Y_i*
    Y_i = c     if c >= Y_i*

    where c is the lower bound below which Y_i* is censored.

  • The systematic component is given by

    mu_i = x_i beta,

    where x_i is the vector of explanatory variables for observation i and beta is the vector of coefficients.
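The latent-variable setup above can be simulated directly. A minimal sketch in base R, assuming a censoring point of 0 and made-up coefficient values:

```r
# Simulate the tobit data-generating process: the latent Y* is normal with
# mean x*beta, and we observe Y = max(Y*, 0) (left-censoring at 0).
set.seed(1)
n <- 1000
x <- rnorm(n)
ystar <- 1 + 2 * x + rnorm(n, sd = 1.5)  # latent dependent variable
y <- pmax(ystar, 0)                      # observed, censored from below at 0
mean(y == 0)                             # share of censored observations
```

Fitting an ordinary linear regression to y would be biased here, which is the motivation for the tobit likelihood.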

Quantities of Interest

  • The expected values (qi$ev) for the tobit regression model are the same as the expected value of Y*:

    E[Y_i*] = mu_i = x_i beta

  • The first difference (qi$fd) for the tobit regression model is defined as

    FD = E[Y* | x_1] - E[Y* | x]

  • In conditional prediction models, the average expected treatment effect (qi$att.ev) for the treatment group is

    (1 / sum t_i) * sum over {i : t_i = 1} of [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ]

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups.
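For intuition about the censoring, the mean of the observed (censored) variable Y = max(Y*, 0) has the textbook closed form E[Y] = Phi(mu/sigma) mu + sigma phi(mu/sigma). A hedged sketch of that formula (not Zelig's simulation code):

```r
# Closed-form E[Y] for Y = max(Y*, 0), with Y* ~ Normal(mu, sigma^2).
tobit_ev <- function(mu, sigma) {
  z <- mu / sigma
  pnorm(z) * mu + sigma * dnorm(z)
}
tobit_ev(1, 1.5)  # exceeds mu because censoring removes the left tail
```

As mu grows large relative to sigma, almost nothing is censored and E[Y] approaches mu.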

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run:

z.out <- zelig(y ~ x, model = "tobit", data)

then you may examine the available information in z.out by using names(z.out), see the coefficients by using z.out$coefficients, and view a default summary of information through summary(z.out).

See also

The tobit function is part of the survival library by Terry Therneau, ported to R by Thomas Lumley. Advanced users may wish to refer to help(survreg) in the survival library.


zelig-factorbayes

Given some unobserved explanatory variables and observed dependent variables, the Normal theory factor analysis model estimates the latent factors. The model is implemented using a Markov Chain Monte Carlo algorithm (Gibbs sampling with data augmentation). For factor analysis with ordinal dependent variables, see ordered factor analysis (), and for a mix of types of dependent variables, see the mixed factor analysis model ().

Syntax

With reference classes:

z5 <- zfactorbayes$new()
z5$zelig(cbind(Y1, Y2, Y3) ~ NULL, factors = 2,
         model = "factor.bayes", data = mydata)

With the Zelig 4 compatibility wrappers:

z.out <- zelig(cbind(Y1, Y2, Y3) ~ NULL, factors = 2,
               model = "factor.bayes", data = mydata)

Inputs

zelig() takes the following functions for factor.bayes:

  • Y1, Y2, and Y3: variables of interest in factor analysis (manifest variables), assumed to be normally distributed. The model requires a minimum of three manifest variables.
  • factors: number of the factors to be fitted (defaults to 2).

Additional Inputs

In addition, zelig() accepts the following additional arguments for model specification:

  • lambda.constraints: list containing the equality or inequality constraints on the factor loadings. Choose from one of the following forms:
    • varname = list(): by default, no constraints are imposed.
    • varname = list(d, c): constrains the d-th loading for the variable named varname to be equal to c.
    • varname = list(d, "+"): constrains the d-th loading for the variable named varname to be positive;
    • varname = list(d, "-"): constrains the d-th loading for the variable named varname to be negative.
  • std.var: defaults to FALSE (manifest variables are rescaled to zero mean, but retain observed variance). If TRUE, the manifest variables are rescaled to be mean zero and unit variance.

In addition, zelig() accepts the following additional inputs for bayes.factor:

  • burnin: number of the initial MCMC iterations to be discarded (defaults to 1,000).
  • mcmc: number of the MCMC iterations after burnin (defaults to 20,000).
  • thin: thinning interval for the Markov chain. Only every thin-th draw from the Markov chain is kept. The value of mcmc must be divisible by this value. The default value is 1.
  • verbose: defaults to FALSE. If TRUE, the progress of the sampler (every verbose-th iteration) is printed to the screen.
  • seed: seed for the random number generator. The default is NA, which corresponds to a seed of 12345.
  • Lambda.start: starting values of the factor loading matrix Lambda, either a scalar (all unconstrained loadings are set to that value), or a matrix with compatible dimensions. The default is NA, where the start values are set to be 0 for unconstrained factor loadings, and 0.5 or -0.5 for constrained factor loadings (depending on the nature of the constraints).
  • Psi.start: starting values for the uniquenesses, either a scalar (the starting values for all diagonal elements of are set to be this value), or a vector with length equal to the number of manifest variables. In the latter case, the starting values of the diagonal elements of take the values of Psi.start. The default value is NA where the starting values of the all the uniquenesses are set to be 0.5.
  • store.lambda: defaults to TRUE, which stores the posterior draws of the factor loadings.
  • store.scores: defaults to FALSE. If TRUE, stores the posterior draws of the factor scores. (Storing factor scores may take large amount of memory for a large number of draws or observations.)

The model also accepts the following additional arguments to specify prior parameters:

  • l0: mean of the Normal prior for the factor loadings, either a scalar or a matrix with the same dimensions as . If a scalar value, that value will be the prior mean for all the factor loadings. Defaults to 0.
  • L0: precision parameter of the Normal prior for the factor loadings, either a scalar or a matrix with the same dimensions as . If L0 takes a scalar value, then the precision matrix will be a diagonal matrix with the diagonal elements set to that value. The default value is 0, which leads to an improper prior.
  • a0: the shape parameter of the Inverse Gamma prior for the uniquenesses is a0/2. It can take a scalar value or a vector. The default value is 0.001.
  • b0: the shape parameter of the Inverse Gamma prior for the uniquenesses is b0/2. It can take a scalar value or a vector. The default value is 0.001.

Zelig users may wish to refer to help(MCMCfactanal) for more information.

Example

Attaching the sample dataset:

data(swiss)
names(swiss) <- c("Fert", "Agr", "Exam", "Educ", "Cath", "InfMort")

Factor analysis:

z.out <- zelig(cbind(Agr, Exam, Educ, Cath, InfMort) ~ NULL,
               model = "factor.bayes", data = swiss, factors = 2, verbose = TRUE,
               a0 = 1, b0 = 0.15, burnin = 5000, mcmc = 50000)

Checking for convergence before summarizing the estimates:

algor <- try(geweke.diag(z.out$coefficients), silent = TRUE)
if (inherits(algor, "try-error"))
    print(algor)

Since the algorithm did not converge, we now add some constraints on Lambda to aid convergence:

z.out <- zelig(cbind(Agr, Exam, Educ, Cath, InfMort) ~ NULL,
               model = "factor.bayes", data = swiss, factors = 2,
               lambda.constraints = list(Exam = list(1,"+"),
                   Exam = list(2,"-"), Educ = c(2, 0),
                   InfMort = c(1, 0)),
               verbose = TRUE, a0 = 1, b0 = 0.15,
               burnin = 5000, mcmc = 50000)
geweke.diag(z.out$coefficients)
heidel.diag(z.out$coefficients)
raftery.diag(z.out$coefficients)
summary(z.out)

Model

Suppose for observation i we observe k variables and hypothesize that there are d underlying factors such that:

    Y_i = Lambda phi_i + epsilon_i

where Y_i is the k-vector of manifest variables for observation i, Lambda is the k × d factor loading matrix, and phi_i is the d-vector of latent factor scores. Both Lambda and phi need to be estimated.

  • The stochastic component is given by:

    where is a diagonal, positive definite matrix. The diagonal elements of are referred to as uniquenesses.

  • The systematic component is given by

  • The independent conjugate prior for each is given by

  • The independent conjugate prior for each is given by

  • The prior for phi_i is

    phi_i ~ Normal(0, I_d)

    where I_d is a d × d identity matrix.
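The factor model above can be simulated in a few lines. A sketch assuming k = 5 manifest variables and d = 2 factors; all numeric values here are illustrative:

```r
# Simulate Y_i = Lambda %*% phi_i + eps_i, with eps_i ~ Normal(0, Psi),
# Psi diagonal (the uniquenesses) and phi_i ~ Normal(0, I_d).
set.seed(42)
n <- 200; k <- 5; d <- 2
Lambda <- matrix(rnorm(k * d), k, d)           # factor loadings
psi    <- runif(k, 0.3, 0.7)                   # uniquenesses (diagonal of Psi)
phi    <- matrix(rnorm(n * d), n, d)           # latent factor scores
eps    <- matrix(rnorm(n * k), n, k) %*% diag(sqrt(psi))
Y      <- phi %*% t(Lambda) + eps              # n x k manifest variables
dim(Y)
```

Note the well-known rotation indeterminacy of Lambda and phi, which is why the constraints described above are often needed for the sampler to converge.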

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run:

z.out <- zelig(cbind(Y1, Y2, Y3) ~ NULL, model = "factor.bayes", data)

then you may examine the available information in z.out by using names(z.out), see the draws from the posterior distribution of the coefficients by using z.out$coefficients, and view a default summary of information through summary(z.out). Other elements available through the $ operator are listed below.

  • From the zelig() output object z.out, you may extract:
    • coefficients: draws from the posterior distributions of the estimated factor loadings and the uniquenesses. If store.scores = TRUE, the estimated factors scores are also contained in coefficients.
    • data: the name of the input data frame.
    • seed: the random seed used in the model.
  • Since there are no explanatory variables, the sim() procedure is not applicable for factor analysis models.

zelig-mlogitbayes

Use Bayesian multinomial logistic regression to model unordered categorical variables. The dependent variable may be in the format of either character strings or integer values. The model is estimated via a random walk Metropolis algorithm or a slice sampler. See for the maximum-likelihood estimation of this model.

Syntax

With reference classes:

z5 <- zmlogitbayes$new()
z5$zelig(Y ~ X1 + X2, data = mydata)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "mlogit.bayes", data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Additional Inputs

zelig() accepts the following arguments for mlogit.bayes:

  • baseline: either a character string or numeric value (equal to one of the observed values in the dependent variable) specifying a baseline category. The default value is NA which sets the baseline to the first alphabetical or numerical unique value of the dependent variable.

The model accepts the following additional arguments to monitor the Markov chains:

  • burnin: number of the initial MCMC iterations to be discarded (defaults to 1,000).
  • mcmc: number of the MCMC iterations after burnin (defaults to 10,000).
  • thin: thinning interval for the Markov chain. Only every thin-th draw from the Markov chain is kept. The value of mcmc must be divisible by this value. The default value is 1.
  • mcmc.method: either "MH" or "slice", specifying whether to use the Metropolis algorithm or the slice sampler. The default value is "MH".
  • tune: tuning parameter for the Metropolis-Hasting step, either a scalar or a numeric vector (for coefficients, enter a vector). The tuning parameter should be set such that the acceptance rate is satisfactory (between 0.2 and 0.5). The default value is 1.1.
  • verbose: defaults to FALSE. If TRUE, the progress of the sampler (every verbose-th iteration) is printed to the screen.
  • seed: seed for the random number generator. The default is NA which corresponds to a random seed of 12345.
  • beta.start: starting values for the Markov chain, either a scalar or a vector (for coefficients, enter a vector). The default is NA where the maximum likelihood estimates are used as the starting values.

Use the following arguments to specify the priors for the model:

  • b0: prior mean for the coefficients, either a scalar or vector. If a scalar, that value will be the prior mean for all the coefficients. The default is 0.
  • B0: prior precision parameter for the coefficients, either a square matrix with the dimensions equal to the number of coefficients or a scalar. If a scalar, that value times an identity matrix will be the prior precision parameter. The default is 0 which leads to an improper prior.

Zelig users may wish to refer to help(MCMCmnl) for more information.

Examples

Basic Example

Attaching the sample dataset:

data(mexico)

Estimating multinomial logistic regression using mlogit.bayes:

z.out <- zelig(vote88 ~ pristr + othcok + othsocok,
               model = "mlogit.bayes", data = mexico)
## Calculating MLEs and large sample var-cov matrix.
## This may take a moment...
## Inverting Hessian to get large sample var-cov matrix.
## Warning: the condition has length > 1 and only the first element will be used
## Warning: the condition has length > 1 and only the first element will be used
##
##
## MCMCmnl Metropolis iteration 1 of 11000
## beta =
##   -2.43503
##   -2.72348
##   -0.70416
##   -0.71276
##    0.99671
##    1.16367
##    0.52819
##    0.50504
## Metropolis acceptance rate for beta = 1.00000
##
##
##
## MCMCmnl Metropolis iteration 1101 of 11000
## beta =
##   -3.08406
##   -3.23117
##   -0.68172
##   -0.58702
##    1.32006
##    1.46886
##    0.34517
##    0.08113
## Metropolis acceptance rate for beta = 0.76203
##
##
##
## MCMCmnl Metropolis iteration 2201 of 11000
## beta =
##   -2.37163
##   -2.72435
##   -0.75901
##   -0.65170
##    1.12979
##    1.04706
##    0.30992
##    0.63282
## Metropolis acceptance rate for beta = 0.77010
##
##
##
## MCMCmnl Metropolis iteration 3301 of 11000
## beta =
##   -2.61718
##   -2.37818
##   -0.81263
##   -0.72653
##    1.21112
##    1.15121
##    0.29893
##    0.22116
## Metropolis acceptance rate for beta = 0.76704
##
##
##
## MCMCmnl Metropolis iteration 4401 of 11000
## beta =
##   -2.55979
##   -3.33989
##   -0.77931
##   -0.58265
##    1.15036
##    1.32952
##    0.45473
##    0.48890
## Metropolis acceptance rate for beta = 0.76096
##
##
##
## MCMCmnl Metropolis iteration 5501 of 11000
## beta =
##   -2.11066
##   -2.75327
##   -0.85810
##   -0.62697
##    1.08461
##    1.14543
##    0.27174
##    0.40096
## Metropolis acceptance rate for beta = 0.76077
##
##
##
## MCMCmnl Metropolis iteration 6601 of 11000
## beta =
##   -2.46162
##   -2.81167
##   -0.85914
##   -0.57910
##    1.12878
##    1.21514
##    0.48948
##    0.32908
## Metropolis acceptance rate for beta = 0.75746
##
##
##
## MCMCmnl Metropolis iteration 7701 of 11000
## beta =
##   -2.48861
##   -3.34618
##   -0.78412
##   -0.50089
##    1.21538
##    1.47418
##    0.25571
##    0.14282
## Metropolis acceptance rate for beta = 0.76029
##
##
##
## MCMCmnl Metropolis iteration 8801 of 11000
## beta =
##   -2.74870
##   -2.47143
##   -0.55028
##   -0.66065
##    1.21813
##    1.19189
##    0.14761
##    0.21105
## Metropolis acceptance rate for beta = 0.75844
##
##
##
## MCMCmnl Metropolis iteration 9901 of 11000
## beta =
##   -3.17385
##   -3.28283
##   -0.65244
##   -0.60003
##    1.31771
##    1.24363
##    0.35401
##    0.61689
## Metropolis acceptance rate for beta = 0.75891
##
## How to cite this model in Zelig:
##   Ben Goodrich, Ying Lu. 2013.
##   mlogitbayes: Bayesian Multinomial Logistic Regression for Dependent Variables with Unordered Categorical Values
##   in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software,"
##   http://zeligproject.org/

Checking for convergence before summarizing the estimates:

raftery.diag(z.out$coefficients)
summary(z.out)

Setting values for the explanatory variables to their sample averages:

x.out <- setx(z.out)

Simulating quantities of interest from the posterior distribution given x.out.

s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
##
##  sim x :
##  -----
## ev
##          mean      sd    50%   2.5%  97.5%
## P(Y=1) 0.5613 0.01592 0.5615 0.5307 0.5915
## P(Y=2) 0.2099 0.01273 0.2098 0.1854 0.2351
## P(Y=3) 0.2288 0.01360 0.2286 0.2034 0.2558
## pv
## qi
##      1      2      3
## 0.5596 0.2088 0.2316

Simulating First Differences

Estimating the first difference (and risk ratio) in the probabilities of voting different candidates when pristr (the strength of the PRI) is set to be weak (equal to 1) versus strong (equal to 3) while all the other variables held at their default values.

x.weak <- setx(z.out, pristr = 1)
x.strong <- setx(z.out, pristr = 3)
s.out2 <- sim(z.out, x = x.strong, x1 = x.weak)
summary(s.out2)
##
##  sim x :
##  -----
## ev
##          mean      sd    50%   2.5%  97.5%
## P(Y=1) 0.7157 0.02128 0.7158 0.6726 0.7561
## P(Y=2) 0.1270 0.01458 0.1266 0.1001 0.1563
## P(Y=3) 0.1573 0.01646 0.1568 0.1261 0.1910
## pv
## qi
##      1      2      3
## 0.7179 0.1264 0.1557
##
##  sim x1 :
##  -----
## ev
##          mean      sd    50%   2.5%  97.5%
## P(Y=1) 0.4028 0.02358 0.4028 0.3563 0.4484
## P(Y=2) 0.3037 0.02131 0.3029 0.2638 0.3471
## P(Y=3) 0.2935 0.02189 0.2932 0.2518 0.3372
## pv
## qi
##      1      2      3
## 0.4051 0.2986 0.2963
## fd
##           mean      sd     50%     2.5%   97.5%
## P(Y=1) -0.3129 0.03460 -0.3129 -0.38111 -0.2443
## P(Y=2)  0.1767 0.02735  0.1765  0.12360  0.2314
## P(Y=3)  0.1362 0.02881  0.1363  0.07966  0.1936

Model

Let Y_i be the (unordered) categorical dependent variable for observation i, which takes integer values j = 1, ..., J.

  • The stochastic component is given by:

    Y_i ~ Multinomial(pi_ij),

    where pi_ij = Pr(Y_i = j) for j = 1, ..., J.

  • The systematic component is given by

    pi_ij = exp(x_i beta_j) / sum over k = 1, ..., J of exp(x_i beta_k),

    where x_i is the vector of explanatory variables for observation i and beta_j is the vector of coefficients for category j. Category J is assumed to be the baseline category.

  • The prior for beta_j is given by

    beta_j ~ Normal(b0, B0^(-1)),

    where b0 is the vector of prior means for the coefficients and B0 is the precision matrix (the inverse of a variance-covariance matrix).
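The systematic component can be computed directly. A sketch with the baseline category's coefficients fixed at zero; the coefficient values are made up for illustration:

```r
# Multinomial logit probabilities:
# Pr(Y = j) = exp(x . beta_j) / sum_k exp(x . beta_k),
# with beta fixed to 0 for the baseline category.
mnl_probs <- function(x, betas) {
  # betas: list of coefficient vectors for the non-baseline categories
  eta <- c(0, vapply(betas, function(b) sum(x * b), numeric(1)))  # baseline first
  exp(eta) / sum(exp(eta))
}
p <- mnl_probs(c(1, 0.5), list(c(-2.4, 1.0), c(-2.7, 1.2)))
sum(p)  # the probabilities sum to 1
```

Because only differences in the linear predictors matter, fixing one category's coefficients at zero identifies the model; the baseline argument above chooses which category plays that role.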

Quantities of Interest

  • The expected values (qi$ev) for the multinomial logistic regression model are the predicted probabilities of belonging to each category:

    Pr(Y_i = j) = pi_ij = exp(x_i beta_j) / sum over k of exp(x_i beta_k), for j = 1, ..., J - 1,

    and

    Pr(Y_i = J) = 1 - sum over j = 1, ..., J - 1 of pi_ij,

    given the posterior draws of beta_j for all categories from the MCMC iterations.

  • The predicted values (qi$pr) are draws of Y_i from a multinomial distribution whose parameters are the expected values (qi$ev) computed based on the posterior draws of beta from the MCMC iterations.

  • The first difference (qi$fd) in category j for the multinomial logistic model is defined as

    FD_j = Pr(Y = j | x_1) - Pr(Y = j | x)

  • The risk ratio (qi$rr) in category j is defined as

    RR_j = Pr(Y = j | x_1) / Pr(Y = j | x)

  • In conditional prediction models, the average expected treatment effect (qi$att.ev) for the treatment group in category j is

    (1 / n_j) * sum over {i : t_i = 1} of [ Y_i(t_i = 1) - E[Y_i(t_i = 0)] ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups, and n_j is the number of treated observations in category j.

  • In conditional prediction models, the average predicted treatment effect (qi$att.pr) for the treatment group in category j is

    (1 / n_j) * sum over {i : t_i = 1} of [ Y_i(t_i = 1) - hat(Y_i)(t_i = 0) ],

    where t_i is a binary explanatory variable defining the treatment (t_i = 1) and control (t_i = 0) groups, and n_j is the number of treated observations in category j.

Output Values

The output of each Zelig command contains useful information which you may view. For example, if you run:

z.out <- zelig(y ~ x, model = "mlogit.bayes", data)

then you may examine the available information in z.out by using names(z.out), see the draws from the posterior distribution of the coefficients by using z.out$coefficients, and view a default summary of information through summary(z.out).

See also

Bayesian multinomial logistic regression is part of the MCMCpack library by Andrew D. Martin and Kevin M. Quinn. The convergence diagnostics are part of the CODA library by Martyn Plummer, Nicky Best, Kate Cowles, Karen Vines, Deepayan Sarkar, and Russell Almond.