Bayesian Probit Regression
Use the probit regression model to model binary dependent variables specified as a function of a set of explanatory variables. The model is estimated using a Gibbs sampler. For other models suitable for binary response variables, see Bayesian logistic regression, maximum likelihood logit regression, and maximum likelihood probit regression.
With reference classes:
z5 <- zprobitbayes$new()
z5$zelig(Y ~ X1 + X2, weights = w, data = mydata)
z5$setx()
z5$sim()
With the Zelig 4 compatibility wrappers:
z.out <- zelig(Y ~ X1 + X2, model = "probit.bayes", weights = w, data = mydata)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)
Use the following arguments to monitor the Markov chains:

- burnin: number of initial MCMC iterations to discard (default: 1,000).
- mcmc: number of MCMC iterations to keep after the burn-in period (default: 10,000).
- thin: thinning interval for the Markov chain; only every thin-th draw is kept, and mcmc must be divisible by this value (default: 1).
- verbose: if TRUE, the progress of the sampler is printed to the screen (default: FALSE).
- seed: seed for the random number generator (default: NA).
- beta.start: starting values for the Markov chain, either a scalar or a vector with length equal to the number of estimated coefficients (default: NA, which uses the maximum likelihood estimates as starting values).

Use the following parameters to specify the model's priors:

- b0: prior mean for the coefficients, either a vector or a scalar; if a scalar, that value is the prior mean for all coefficients (default: 0).
- B0: prior precision for the coefficients, either a square precision matrix or a scalar; if a scalar, that value times the identity matrix is the prior precision (default: 0, an improper uniform prior).

Use the following arguments to specify optional output for the model:

- bayes.resid: if TRUE, returns the latent Bayesian residuals for all observations (default: FALSE).

Zelig users may wish to refer to help(MCMCprobit) for more information.
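For instance, a call that lengthens the chain and places an informative Normal prior on the coefficients might look like the following sketch (the variable names follow the syntax examples above, and the argument values are purely illustrative):

```r
## Illustrative: 2,000 burn-in draws, 20,000 kept draws thinned by 2,
## and a Normal prior on the coefficients with mean 0 and precision 0.1
z.out <- zelig(Y ~ X1 + X2, model = "probit.bayes", data = mydata,
               burnin = 2000, mcmc = 20000, thin = 2,
               b0 = 0, B0 = 0.1, verbose = FALSE)
```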
Attaching the sample dataset:
data(turnout)
Estimating the probit regression using probit.bayes:
z.out <- zelig(vote ~ race + educate, model = "probit.bayes",
data = turnout, verbose = FALSE)
## How to cite this model in Zelig:
## Ben Goodrich, and Ying Lu. 2013.
## probit-bayes: Bayesian Probit Regression for Dichotomous Dependent Variables
## in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
## "Zelig: Everyone's Statistical Software," http://zeligproject.org/
You can check for convergence before summarizing the estimates with three diagnostic tests. See the section Diagnostics for Zelig Models for examples of the output with interpretation:
z.out$geweke.diag()
z.out$heidel.diag()
z.out$raftery.diag()
summary(z.out)
## Model:
##
## Iterations = 1001:11000
## Thinning interval = 1
## Number of chains = 1
## Sample size per chain = 10000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## (Intercept) -0.7327 0.12805 1.28e-03 0.002140
## racewhite 0.2989 0.08480 8.48e-04 0.001369
## educate 0.0977 0.00957 9.57e-05 0.000169
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## (Intercept) -0.9854 -0.8189 -0.7325 -0.646 -0.483
## racewhite 0.1359 0.2410 0.2983 0.356 0.467
## educate 0.0791 0.0912 0.0976 0.104 0.117
##
## Next step: Use 'setx' method
Setting values for the explanatory variables to their sample averages:
x.out <- setx(z.out)
Simulating quantities of interest from the posterior distribution given x.out:
s.out1 <- sim(z.out, x = x.out)
summary(s.out1)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.772 0.0103 0.772 0.751 0.791
## pv
## 0 1
## [1,] 2.25 7.75
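The simulations can also be inspected graphically with Zelig's generic plot method, which displays the distributions of the simulated quantities of interest:

```r
plot(s.out1)
```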
Estimating the first difference (and risk ratio) in an individual's probability of voting when education is set to be low (25th percentile) versus high (75th percentile), while all other variables are held at their default values:
x.high <- setx(z.out, educate = quantile(turnout$educate, prob = 0.75))
x.low <- setx(z.out, educate = quantile(turnout$educate, prob = 0.25))
s.out2 <- sim(z.out, x = x.high, x1 = x.low)
summary(s.out2)
##
## sim x :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.825 0.0105 0.825 0.803 0.844
## pv
## 0 1
## [1,] 1.77 8.23
##
## sim x1 :
## -----
## ev
## mean sd 50% 2.5% 97.5%
## [1,] 0.706 0.0129 0.706 0.681 0.731
## pv
## 0 1
## [1,] 2.99 7.01
## fd
## mean sd 50% 2.5% 97.5%
## [1,] -0.118 0.0116 -0.118 -0.141 -0.0957
Let $Y_i$ be the binary dependent variable for observation $i$, which takes the value of either 0 or 1.

The stochastic component is given by

$$Y_i \sim \mathrm{Bernoulli}(\pi_i),$$

where $\pi_i = \Pr(Y_i = 1)$.

The systematic component is given by

$$\pi_i = \Phi(x_i \beta),$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard Normal distribution with mean 0 and variance 1, $x_i$ is the vector of explanatory variables for observation $i$, and $\beta$ is the vector of coefficients.

The prior for $\beta$ is given by

$$\beta \sim \mathrm{Normal}_k\!\left(b_0, B_0^{-1}\right),$$

where $b_0$ is the vector of means for the $k$ explanatory variables and $B_0$ is the $k \times k$ precision matrix (the inverse of a variance-covariance matrix).
The expected values (qi$ev) for the probit model are the predicted probability of a success:

$$E(Y) = \pi_i = \Phi(x_i \beta),$$

given the posterior draws of $\beta$ from the MCMC iterations.

The predicted values (qi$pr) are draws from the Bernoulli distribution with mean equal to the simulated expected value $\pi_i$.
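Concretely, each expected value applies the standard Normal CDF to $x_i \beta$ for one posterior draw of $\beta$, and each predicted value is a Bernoulli draw with that probability. A minimal sketch in R of these two steps (the objects here are hypothetical stand-ins; Zelig performs this internally from the actual MCMC output):

```r
## beta.draws: a (draws x k) matrix standing in for posterior coefficient draws
## x.vec: a chosen covariate profile, including the intercept
beta.draws <- matrix(rnorm(3000), ncol = 3)    # stand-in for MCMC output
x.vec <- c(1, 1, 12)                           # e.g. intercept, racewhite, educate
ev <- pnorm(beta.draws %*% x.vec)              # expected probabilities (qi$ev)
pr <- rbinom(length(ev), size = 1, prob = ev)  # predicted 0/1 outcomes (qi$pr)
```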
The first difference (qi$fd) for the probit model is defined as

$$\mathrm{FD} = \Pr(Y = 1 \mid X_1) - \Pr(Y = 1 \mid X).$$

The risk ratio (qi$rr) is defined as

$$\mathrm{RR} = \Pr(Y = 1 \mid X_1) \, / \, \Pr(Y = 1 \mid X).$$
In conditional prediction models, the average expected treatment effect (qi$att.ev) for the treatment group is

$$\frac{1}{\sum_{i=1}^{n} t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - E[Y_i(t_i = 0)] \right],$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups.

In conditional prediction models, the average predicted treatment effect (qi$att.pr) for the treatment group is

$$\frac{1}{\sum_{i=1}^{n} t_i} \sum_{i : t_i = 1} \left[ Y_i(t_i = 1) - \widehat{Y_i(t_i = 0)} \right],$$

where $t_i$ is a binary explanatory variable defining the treatment ($t_i = 1$) and control ($t_i = 0$) groups.
The output of each Zelig command contains useful information which you may view. For example, if you run:
z.out <- zelig(y ~ x, model = "probit.bayes", data = mydata)
then you may examine the available information in z.out by using names(z.out), see the draws from the posterior distribution of the coefficients by using z.out$coefficients, and view a default summary of information through summary(z.out). Other elements available through the $ operator are listed below.
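For instance, assuming z.out was estimated with the Zelig 4 wrapper as above, a short inspection might look like the following sketch:

```r
names(z.out)                 # list the components stored in the fitted object
coefs <- z.out$coefficients  # matrix of posterior draws of the coefficients
dim(coefs)                   # rows = saved MCMC draws, columns = parameters
colMeans(coefs)              # posterior means, matching summary(z.out)
```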
Bayesian probit regression is part of the MCMCpack library by Andrew D. Martin and Kevin M. Quinn. The convergence diagnostics are part of the CODA library by Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines.
Martin AD, Quinn KM and Park JH (2011). “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software, 42 (9), pp. 22. <URL: http://www.jstatsoft.org/v42/i09/>.