Bayesian Multinomial Logistic Regression
Use Bayesian multinomial logistic regression to model unordered categorical variables. The dependent variable may be in the format of either character strings or integer values. The model is estimated via a random walk Metropolis algorithm or a slice sampler. See for the maximum-likelihood estimation of this model.
With reference classes:
z5 <- zmlogitbayes$new()
z5$zelig(Y ~ X1 + X2, weights = w, data = mydata)
z5$setx()
z5$sim()
With the Zelig 4 compatibility wrappers:
z.out <- zelig(Y ~ X1 + X2, model = "mlogit.bayes", weights = w, data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
zelig() accepts the following arguments for mlogit.bayes:
The model accepts the following additional arguments to monitor the Markov chains:
Use the following arguments to specify the priors for the model:
Zelig users may wish to refer to help(MCMCmnl) for more information.
Attaching the sample dataset:
data(mexico)
Estimating multinomial logistics regression using mlogit.bayes:
z.out <- zelig(vote88 ~ pristr + othcok + othsocok,
model = "mlogit.bayes", data = mexico,
verbose = FALSE)
## Warning in readLines(zeligmixedmodels): incomplete final line found on
## '/usr/lib64/R/library/ZeligMultilevel/JSON/zelig5mixedmodels.json'
## Calculating MLEs and large sample var-cov matrix.
## This may take a moment...
## Inverting Hessian to get large sample var-cov matrix.
## Warning in if (mcmc.method == "RWM") {: the condition has length > 1 and
## only the first element will be used
## Warning in if (mcmc.method == "IndMH") {: the condition has length > 1 and
## only the first element will be used
## How to cite this model in Zelig:
## Ben Goodrich, and Ying Lu. 2013.
## mlogit-bayes: Bayesian Multinomial Logistic Regression for Dependent Variables with Unordered Categorical Values
## in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
## "Zelig: Everyone's Statistical Software," http://zeligproject.org/
You can check for convergence before summarizing the estimates with three diagnostic tests. See the section Diagnostics for Zelig Models for examples of the output with interpretation:
z.out$geweke.diag()
z.out$heidel.diag()
z.out$raftery.diag()
summary(z.out)
## Model:
##
## Iterations = 1001:11000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## (Intercept).2 -2.484 0.4050 0.004050 0.004192
## (Intercept).3 -2.882 0.4032 0.004032 0.004328
## pristr.2 -0.726 0.0955 0.000955 0.001001
## pristr.3 -0.601 0.0932 0.000932 0.000977
## othcok.2 1.109 0.1146 0.001146 0.001200
## othcok.3 1.250 0.1116 0.001116 0.001205
## othsocok.2 0.352 0.1563 0.001563 0.001660
## othsocok.3 0.302 0.1504 0.001504 0.001504
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## (Intercept).2 -3.2943 -2.755 -2.479 -2.212 -1.694
## (Intercept).3 -3.6919 -3.144 -2.876 -2.607 -2.103
## pristr.2 -0.9199 -0.789 -0.724 -0.662 -0.544
## pristr.3 -0.7871 -0.663 -0.601 -0.538 -0.417
## othcok.2 0.8906 1.032 1.107 1.187 1.333
## othcok.3 1.0298 1.177 1.246 1.322 1.478
## othsocok.2 0.0503 0.245 0.351 0.459 0.660
## othsocok.3 0.0133 0.199 0.302 0.404 0.596
##
## Next step: Use 'setx' method
Setting values for the explanatory variables to their sample averages:
x.out <- setx(z.out)
Simulating quantities of interest from the posterior distribution given x.out.
s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
plot(s.out1)
Estimating the first difference (and risk ratio) in the probabilities of voting different candidates when pristr (the strength of the PRI) is set to be weak (equal to 1) versus strong (equal to 3) while all the other variables held at their default values.
x.weak <- setx(z.out, pristr = 1)
x.strong <- setx(z.out, pristr = 3)
s.out2 <- sim(z.out, x = x.strong, x1 = x.weak)
summary(s.out2)
plot(s.out2)
Let be the (unordered) categorical dependent variable for observation which takes an integer values .
The stochastic component is given by:
where for .
The systematic component is given by
where is the vector of explanatory variables for observation and is the vector of coefficient for category . Category is assumed to be the baseline category.
The prior for is given by
where is the vector of means for the explanatory variables and is the precision matrix (the inverse of a variance-covariance matrix).
The expected values (qi$ev) for the multinomial logistics regression model are the predicted probability of belonging to each category:
and
given the posterior draws of for all categories from the MCMC iterations.
The predicted values (qi$pr) are the draws of from a multinomial distribution whose parameters are the expected values(qi$ev) computed based on the posterior draws of from the MCMC iterations.
The first difference (qi$fd) in category for the multinomial logistic model is defined as
The risk ratio (qi$rr) in category is defined as
In conditional prediction models, the average expected treatment effect (qi$att.ev) for the treatment group in category is
where is a binary explanatory variable defining the treatment () and control () groups, and is the number of treated observations in category .
In conditional prediction models, the average predicted treatment effect (qi$att.pr) for the treatment group in category is
where is a binary explanatory variable defining the treatment () and control () groups, and is the number of treated observations in category .
The output of each Zelig command contains useful information which you may view. For example, if you run:
z.out <- zelig(y ~ x, model = "mlogit.bayes", data)
then you may examine the available information in z.out by using names(z.out), see the draws from the posterior distribution of the coefficients by using z.out$coefficients, and view a default summary of information through summary(z.out). Other elements available through the $ operator are listed below.
Bayesian logistic regression is part of the MCMCpack library by Andrew D. Martin and Kevin M. Quinn . The convergence diagnostics are part of the CODA library by Martyn Plummer, Nicky Best, Kate Cowles, Karen Vines, Deepayan Sarkar, Russell Almond.
Martin AD, Quinn KM and Park JH (2011). “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software, 42 (9), pp. 22. <URL: http://www.jstatsoft.org/v42/i09/>.