Bayesian Ordered Probit Regression
Use the ordinal probit regression model if your dependent variable is ordered and categorical. It may take either integer values or character strings. The model is estimated using a Gibbs sampler with data augmentation. For a maximum-likelihood implementation of this model, see oprobit.
With reference classes:
z5 <- zoprobitbayes$new()
z5$zelig(Y ~ X1 + X2, weights = w, data = mydata)
z5$setx()
z5$sim()
With the Zelig 4 compatibility wrappers:
z.out <- zelig(Y ~ X1 + X2, model = "oprobit.bayes", weights = w, data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
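zelig() accepts arguments that control and monitor the Markov chain, including burnin (the number of initial iterations to discard), mcmc (the number of iterations to run after burn-in), thin (the thinning interval), and verbose (whether to print the sampler's progress).
The model's priors for the coefficients are set through b0 (the prior mean) and B0 (the prior precision).
These arguments are passed to MCMCoprobit; refer to help(MCMCoprobit) for the full list and their defaults.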
Attaching the sample dataset:
data(sanction)
Creating an ordered dependent variable:
sanction$ncost <- factor(sanction$ncost, ordered = TRUE,
                         levels = c("net gain", "little effect", "modest loss",
                                    "major loss"))
Estimating ordered probit regression using oprobit.bayes:
z.out <- zelig(ncost ~ mil + coop, model = "oprobit.bayes",
               data = sanction, verbose = FALSE)
## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored
## How to cite this model in Zelig:
## Ben Goodrich, and Ying Lu. 2013.
## oprobit-bayes: Bayesian Probit Regression for Dichotomous Dependent Variables
## in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
## "Zelig: Everyone's Statistical Software," http://zeligproject.org/
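The level order passed to factor() before estimation is what defines categories 1 to J for the model. The base R sketch below illustrates this on a toy vector; the values in `cost` are made up for illustration and are not taken from the sanction data.

```r
# Toy responses; hypothetical values, not the sanction data
cost <- c("modest loss", "net gain", "major loss", "little effect", "net gain")

# Declare the substantive ordering explicitly; without `levels`,
# factor() would sort the labels alphabetically
lev <- c("net gain", "little effect", "modest loss", "major loss")
cost_ord <- factor(cost, ordered = TRUE, levels = lev)

# Integer codes give the category index j of each observation
codes <- as.integer(cost_ord)
print(data.frame(cost = cost_ord, code = codes))

# Comparisons respect the declared order: "modest loss" > "net gain"
print(cost_ord[1] > cost_ord[2])
```

If the levels were left in alphabetical order, "little effect" would become the lowest category, which would change the meaning of every coefficient and cut point.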
You can check for convergence before summarizing the estimates with three diagnostic tests. See the section Diagnostics for Zelig Models for examples of the output with interpretation:
z.out$geweke.diag()
z.out$heidel.diag()
z.out$raftery.diag()
summary(z.out)
## Model:
##
## Iterations = 1001:11000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
##                Mean    SD Naive SE Time-series SE
## (Intercept) -0.7088 0.281  0.00281        0.00364
## mil         -0.0378 0.426  0.00426        0.00529
## coop         0.5861 0.142  0.00142        0.00641
## gamma2       1.5451 0.210  0.00210        0.06882
## gamma3       2.3390 0.208  0.00208        0.06582
##
## 2. Quantiles for each variable:
##
##               2.5%    25%     50%    75%  97.5%
## (Intercept) -1.269 -0.896 -0.7102 -0.518 -0.165
## mil         -0.871 -0.322 -0.0374  0.246  0.803
## coop         0.311  0.488  0.5854  0.683  0.865
## gamma2       1.172  1.401  1.5459  1.659  2.001
## gamma3       1.960  2.171  2.3557  2.517  2.682
##
## Next step: Use 'setx' method
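The standard-error columns in this summary can be checked by hand. The naive SE is the posterior SD divided by the square root of the number of retained draws, and comparing it with the time-series SE gives a rough effective sample size. The sketch below uses only the rounded values printed above, so the results are approximate; the variable names are illustrative.

```r
# Values copied from the summary table above (rounded as printed)
post_sd <- c(intercept = 0.281, mil = 0.426, coop = 0.142,
             gamma2 = 0.210, gamma3 = 0.208)
naive_se_printed <- c(intercept = 0.00281, mil = 0.00426, coop = 0.00142,
                      gamma2 = 0.00210, gamma3 = 0.00208)
ts_se <- c(intercept = 0.00364, mil = 0.00529, coop = 0.00641,
           gamma2 = 0.06882, gamma3 = 0.06582)
n_draws <- 10000   # "Sample size per chain" in the output

# Naive SE ignores autocorrelation between successive draws
naive_se <- post_sd / sqrt(n_draws)
print(naive_se)

# Rough effective sample size implied by the time-series SE:
# ts_se is approximately sd / sqrt(ESS), so ESS is approximately (sd / ts_se)^2
ess <- (post_sd / ts_se)^2
print(round(ess))
```

Under this approximation the cut points gamma2 and gamma3 have a far smaller effective sample size than the regression coefficients, which is a reason to look at the convergence diagnostics before relying on their posterior summaries.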
Setting values for the explanatory variables to their sample averages:
x.out <- setx(z.out)
## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
Simulating quantities of interest from the posterior distribution given x.out:
s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
##
## sim x :
## -----
## ev
##                 mean     sd    50%    2.5%  97.5%
## net gain      0.3657 0.0548 0.3647 0.26323 0.4774
## little effect 0.5145 0.0573 0.5149 0.40508 0.6305
## modest loss   0.0941 0.0338 0.0905 0.03942 0.1674
## major loss    0.0257 0.0128 0.0230 0.00846 0.0562
## pv
## qi
## little effect      net gain
##         9.899         0.101
plot(s.out1)
Estimating the first difference (and risk ratio) in the probabilities of incurring different levels of cost when there is no military action versus military action, while all the other variables are held at their default values:
x.high <- setx(z.out, mil = 0)
## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
x.low <- setx(z.out, mil = 1)
## Warning in model.response(mf, "numeric"): using type = "numeric" with a
## factor response will be ignored
## Warning in Ops.ordered(y, z$residuals): '-' is not meaningful for ordered
## factors
s.out2 <- sim(z.out, x = x.high, x1 = x.low)
summary(s.out2)
##
## sim x :
## -----
## ev
##                 mean     sd    50%    2.5%  97.5%
## net gain      0.3644 0.0572 0.3629 0.25733 0.4816
## little effect 0.5150 0.0582 0.5154 0.40408 0.6337
## modest loss   0.0946 0.0342 0.0908 0.03925 0.1686
## major loss    0.0260 0.0132 0.0234 0.00831 0.0577
## pv
## qi
## little effect      net gain
##         9.873         0.127
##
## sim x1 :
## -----
## ev
##                 mean     sd    50%    2.5% 97.5%
## net gain      0.3861 0.1450 0.3761 0.13212 0.684
## little effect 0.4845 0.0887 0.4943 0.28913 0.641
## modest loss   0.0977 0.0625 0.0847 0.01480 0.251
## major loss    0.0318 0.0331 0.0214 0.00191 0.122
## pv
## qi
## little effect   modest loss      net gain
##         7.752         0.028         2.220
## fd
##                   mean     sd      50%    2.5%  97.5%
## net gain       0.02163 0.1521  0.01386 -0.2533 0.3337
## little effect -0.03049 0.0794 -0.00898 -0.2315 0.0771
## modest loss    0.00302 0.0576 -0.00465 -0.0921 0.1362
## major loss     0.00583 0.0323 -0.00182 -0.0357 0.0916
plot(s.out2)
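As a sanity check on this output, the mean first differences should be close to the difference between the mean expected values under x1 and x, assuming the first difference is computed draw by draw from the same simulations. The sketch below uses only the rounded numbers printed above, so agreement is approximate.

```r
cats <- c("net gain", "little effect", "modest loss", "major loss")

# Mean expected values copied from the output above
ev_x  <- setNames(c(0.3644, 0.5150, 0.0946, 0.0260), cats)  # sim x  (mil = 0)
ev_x1 <- setNames(c(0.3861, 0.4845, 0.0977, 0.0318), cats)  # sim x1 (mil = 1)

# Mean first differences copied from the fd block
fd_printed <- setNames(c(0.02163, -0.03049, 0.00302, 0.00583), cats)

# Difference of means, to compare against the printed mean of differences
fd_from_ev <- ev_x1 - ev_x
print(round(cbind(fd_from_ev, fd_printed), 4))
```

All four 95% intervals for the first differences in the output include zero, which is consistent with the wide posterior interval for the mil coefficient in the model summary.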
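Let $Y_i$ be the ordered categorical dependent variable for observation $i$, which takes an integer value $j = 1, \ldots, J$.
The stochastic component is described by an unobserved continuous variable, $Y_i^*$,
$$Y_i^* \sim \text{Normal}(\mu_i, 1).$$
Instead of $Y_i^*$, we observe the categorical variable $Y_i$,
$$Y_i = j \quad \text{if} \quad \tau_{j-1} < Y_i^* \le \tau_j \quad \text{for } j = 1, \ldots, J,$$
where $\tau_j$ for $j = 0, \ldots, J$ are the threshold parameters with the following constraints: $\tau_l < \tau_m$ for $l < m$, and $\tau_0 = -\infty$, $\tau_J = \infty$.
The probability of observing $Y_i$ equal to category $j$ is
$$\Pr(Y_i = j) = \Phi(\tau_j \mid \mu_i) - \Phi(\tau_{j-1} \mid \mu_i) \quad \text{for } j = 1, \ldots, J,$$
where $\Phi(\cdot \mid \mu_i)$ is the cumulative distribution function of the Normal distribution with mean $\mu_i$ and variance 1.
The systematic component is given by
$$\mu_i = x_i \beta,$$
where $x_i$ is the vector of $k$ explanatory variables for observation $i$ and $\beta$ is the vector of coefficients.
The prior for $\beta$ is given by
$$\beta \sim \text{Normal}_k(b_0, B_0^{-1}),$$
where $b_0$ is the vector of prior means for the $k$ coefficients and $B_0$ is the $k \times k$ precision matrix (the inverse of a variance-covariance matrix).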
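To make the category probabilities in this model concrete, the sketch below evaluates them in base R at the posterior means reported by summary(z.out). Two things are assumptions here rather than part of the text above: the first interior cut point is taken to be fixed at 0 for identification (which is why only gamma2 and gamma3 are reported), and the covariate profile (mil = 0, coop = 2) is hypothetical. Plugging in posterior means is only an approximation to averaging over the posterior draws.

```r
# Posterior means copied from summary(z.out)
beta  <- c(intercept = -0.7088, mil = -0.0378, coop = 0.5861)
gamma <- c(gamma2 = 1.5451, gamma3 = 2.3390)

# Assumption: first interior cut point fixed at 0, so the full
# threshold vector is (-Inf, 0, gamma2, gamma3, Inf)
tau <- c(-Inf, 0, gamma, Inf)

# Hypothetical covariate profile: intercept, mil = 0, coop = 2
x  <- c(1, 0, 2)
mu <- sum(x * beta)   # systematic component: linear predictor x * beta

# Pr(Y = j) = Phi(tau_j - mu) - Phi(tau_{j-1} - mu), with unit variance
probs <- diff(pnorm(tau - mu))
names(probs) <- c("net gain", "little effect", "modest loss", "major loss")
print(round(probs, 4))
```

Because the thresholds run from minus infinity to infinity, the differences telescope and the four probabilities sum to one for any value of the linear predictor.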
The expected values (qi$ev) for the ordered probit model are the predicted probabilities of belonging to each category:
$$\Pr(Y_i = j) = \Phi(\tau_j \mid \mu_i) - \Phi(\tau_{j-1} \mid \mu_i),$$
given the posterior draws of $\beta$ and the threshold parameters $\tau$ from the MCMC iterations.
The predicted values (qi$pr) are the observed values of $Y_i$ given the observation scheme and the posterior draws of $\beta$ and the cut points $\tau$ from the MCMC iterations.
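The distinction between expected and predicted values can be sketched with simulated draws. The code below is not Zelig's implementation: the "posterior draws" are hypothetical stand-ins generated independently around the posterior means reported earlier (real MCMC draws are correlated), the first cut point is assumed fixed at 0, and the covariate profile is made up.

```r
set.seed(1)
n_sims <- 1000

# Hypothetical stand-ins for posterior draws, not real MCMC output
beta0  <- rnorm(n_sims, -0.7088, 0.281)
b_mil  <- rnorm(n_sims, -0.0378, 0.426)
b_coop <- rnorm(n_sims,  0.5861, 0.142)
gamma2 <- rnorm(n_sims,  1.5451, 0.1)
gamma3 <- gamma2 + rlnorm(n_sims, log(0.79), 0.2)  # keeps gamma3 > gamma2

x  <- c(mil = 0, coop = 2)                         # hypothetical profile
mu <- beta0 + b_mil * x[["mil"]] + b_coop * x[["coop"]]

cats <- c("net gain", "little effect", "modest loss", "major loss")

# Expected values: one row of category probabilities per draw
tau <- cbind(-Inf, 0, gamma2, gamma3, Inf)         # assumes first cut point = 0
cdf <- pnorm(tau - mu)                             # mu[s] is applied to row s
ev  <- cdf[, -1] - cdf[, -ncol(cdf)]
colnames(ev) <- cats
print(round(colMeans(ev), 3))

# Predicted values: draw the latent Y* and apply the observation scheme
y_star <- rnorm(n_sims, mean = mu, sd = 1)
pv_idx <- vapply(seq_len(n_sims),
                 function(s) findInterval(y_star[s], tau[s, ]),
                 integer(1))
pv <- factor(cats[pv_idx], levels = cats, ordered = TRUE)
print(table(pv) / n_sims)
```

The expected values are probabilities averaged over draws, while the predicted values are simulated category labels, so the latter carry the additional sampling variability of the latent variable.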
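The first difference (qi$fd) in category $j$ for the ordered probit model is defined as
$$\text{FD}_j = \Pr(Y_i = j \mid X_1) - \Pr(Y_i = j \mid X).$$
The risk ratio (qi$rr) in category $j$ is defined as
$$\text{RR}_j = \Pr(Y_i = j \mid X_1) \ / \ \Pr(Y_i = j \mid X).$$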
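A plug-in version of these two quantities can be sketched in a few lines of base R. The helper `cat_probs` is hypothetical, the first cut point is assumed fixed at 0, and the covariate profiles are made up; Zelig summarizes these quantities over posterior draws, so its reported means will differ somewhat from values computed at the posterior means.

```r
# Hypothetical helper: category probabilities for a linear predictor mu,
# assuming the first interior cut point is fixed at 0
cat_probs <- function(mu, gamma) diff(pnorm(c(-Inf, 0, gamma, Inf) - mu))

beta  <- c(-0.7088, -0.0378, 0.5861)  # posterior means: intercept, mil, coop
gamma <- c(1.5451, 2.3390)            # posterior means: gamma2, gamma3

x  <- c(1, 0, 2)   # hypothetical baseline: mil = 0, coop = 2
x1 <- c(1, 1, 2)   # same profile with mil = 1

p_x  <- cat_probs(sum(x * beta), gamma)
p_x1 <- cat_probs(sum(x1 * beta), gamma)

fd <- p_x1 - p_x   # first difference per category
rr <- p_x1 / p_x   # risk ratio per category

cats <- c("net gain", "little effect", "modest loss", "major loss")
print(round(data.frame(category = cats, fd = fd, rr = rr)[, -1], 4))
```

Since both sets of probabilities sum to one, the first differences across categories sum to zero, whereas the risk ratios have no such constraint.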
In conditional prediction models, the average expected treatment effect (qi$att.ev) for the treatment group in category $j$ is
$$\frac{1}{n_j} \sum_{i: t_i = 1}^{n_j} \left\{ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right\},$$
where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups, and $n_j$ is the number of observations in the treatment group that belong to category $j$.
In conditional prediction models, the average predicted treatment effect (qi$att.pr) for the treatment group in category $j$ is
$$\frac{1}{n_j} \sum_{i: t_i = 1}^{n_j} \left[ Y_i(t_i = 1) - \widehat{Y_i(t_i = 0)} \right],$$
where $t_i$ and $n_j$ are defined as for the average expected treatment effect.
The output of each Zelig command contains useful information which you may view. For example, if you run:
z.out <- zelig(y ~ x, model = "oprobit.bayes", data)
then you may examine the available information in z.out by using names(z.out), see the draws from the posterior distribution of the coefficients by using z.out$coefficients, and view a default summary of information through summary(z.out). Other elements are also available through the $ operator.
Bayesian ordinal probit regression is part of the MCMCpack library by Andrew D. Martin and Kevin M. Quinn (Martin, Quinn and Park 2011). The convergence diagnostics are part of the CODA library by Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines.
Martin AD, Quinn KM and Park JH (2011). “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software, 42 (9), pp. 22. <URL: http://www.jstatsoft.org/v42/i09/>.