Built using Zelig version 5.1.4.90000
Multinomial Dirichlet model for Ecological Inference in RxC tables with rxc using ZeligEI.
Unlike other EI models in Zelig, in the \(RxC\) case, all the row and column variables need to be specified. Let’s assume \(C1\) through \(CC\) are the column totals, and \(R1\) through \(RR\) are the row totals, and \(N_i=R1_i + R2_i + \ldots + RR_i = C1_i + C2_i + \ldots + CC_i\) is the total in unit \(i\). In the case with three row variables and three column variables, the syntax is:
z.out <- zelig(cbind(C1,C2,C3) ~ cbind(R1,R2,R3), data=data)
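Because the row and column totals must add up to the same \(N_i\) in every unit, it can be worth checking this before fitting. The following is a minimal sketch in base R; the data frame `toy` and its values are made up for illustration and are not part of Zelig or eiPack.

```r
# Hypothetical toy data: three units with counts for three row groups
# (R1-R3) and three column groups (C1-C3).
toy <- data.frame(
  R1 = c(40, 10, 25), R2 = c(50, 70, 25), R3 = c(10, 20, 50),
  C1 = c(30, 60, 20), C2 = c(60, 30, 30), C3 = c(10, 10, 50)
)

# Per-unit totals computed from the rows and from the columns.
row_total <- rowSums(toy[, c("R1", "R2", "R3")])
col_total <- rowSums(toy[, c("C1", "C2", "C3")])

# The two margins should agree in every unit.
margins_agree <- row_total == col_total
stopifnot(all(margins_agree))

# Store the common total as the unit size N.
toy$N <- row_total
```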
Additionally, if C1, C2, C3, R1, R2, and R3 are percentages rather than counts, then you must also provide N, the unit totals, as:
z.out <- zelig( cbind(C1,C2,C3) ~ cbind(R1,R2,R3), N=N, data=data)
The argument N can be either a numeric vector of the total in each i-th unit, or the character name of a variable in the dataset that contains these values.
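As an illustration of the percentage form, the sketch below turns counts into within-unit proportions and keeps the unit totals alongside them. The data frame `counts` is hypothetical, and the `zelig()` calls in the trailing comments simply repeat the syntax shown above; they are not run here.

```r
# Hypothetical counts for three units of different sizes.
counts <- data.frame(
  R1 = c(80, 10, 25), R2 = c(100, 70, 10), R3 = c(20, 20, 15),
  C1 = c(60, 60, 20), C2 = c(120, 30, 20), C3 = c(20, 10, 10)
)

# Unit totals: the sum of the row variables in each unit.
N <- rowSums(counts[, c("R1", "R2", "R3")])

# Dividing a data frame by a vector of length nrow() divides each
# row i by N[i], giving proportions within each unit.
props <- counts / N
props$N <- N

# N could then be supplied either way (assuming Zelig is loaded):
#   zelig(cbind(C1,C2,C3) ~ cbind(R1,R2,R3), N = props$N, data = props)
#   zelig(cbind(C1,C2,C3) ~ cbind(R1,R2,R3), N = "N",     data = props)
```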
First load packages:
library(zeligverse)
Here is an example of the full syntax for the analysis, using the first syntax method and the reference classes directly:
z5 <- zeirxc$new()
z5$zelig(cbind(C1, C2, C3) ~ cbind(R1, R2, R3), N = myN,
         weights = w, data = myData)
With the Zelig 4 compatibility wrappers this looks like:
z.out <- zelig(cbind(C1, C2, C3) ~ cbind(R1, R2, R3), N = myN,
               model = "rxc", weights = w, data = myData)
We’ll use a dataset from the eiPack package, of registration data for White, Black, and Native American voters, and party voteshare, in 277 precincts in eight counties of south-eastern North Carolina in 2001.
library("eiPack", quietly=TRUE)
data(senc)
Here is the model estimated in Zelig.
z.out <- zeirxc$new()
z.out$zelig(cbind(dem, rep, non) ~ cbind(black, white, natam),
            N = "total", data = senc)
summary(z.out)
## Model:
##
## Formula: cbind(dem, rep, non) ~ cbind(black, white, natam)
## Total sims:
## Burnin discarded:
## Sims saved:
##
##
## Acceptance ratios for Beta (averaged over units):
## dem rep
## black 0.467 0.394
## white 0.313 0.285
## natam 0.593 0.522
##
## Acceptance ratios for alpha:
## columns
## rows dem rep non
## black 0.683 0.192 0.144
## white 0.702 0.552 0.388
## natam 0.629 0.203 0.156
##
## Draws for Alpha:
## Mean Std. Error 2.5% 97.5%
## black.dem 4.9606 0.8259 3.6629 7.0602
## white.dem 6.9323 0.4460 6.1259 7.8585
## natam.dem 5.9738 1.1092 3.6357 7.8956
## black.rep 0.6211 0.0873 0.4562 0.8134
## white.rep 4.1032 0.2658 3.5925 4.5817
## natam.rep 0.6134 0.0897 0.4676 0.8301
## black.non 0.4989 0.0810 0.3727 0.6651
## white.non 2.1443 0.1392 1.9154 2.4543
## natam.non 0.6025 0.0891 0.4654 0.7921
##
## Aggregate cell counts (summed over units):
## Mean Std. Error 2.5% 97.5%
## black.dem 66542 1155 64360 68005
## white.dem 124692 1332 122913 127083
## natam.dem 25112 192 24683 25404
## black.rep 6733 547 6117 7862
## white.rep 94161 670 92712 95146
## natam.rep 1240 118 1020 1476
## black.non 4971 801 3792 6347
## white.non 41682 889 40120 43024
## natam.non 1156 169 840 1482
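The aggregate cell counts are often easier to read as within-group shares. As a rough sketch, the posterior means printed above can be copied into a matrix and each row divided by its total; the object names `cells` and `shares` are chosen here for illustration.

```r
# Posterior mean aggregate cell counts, copied from the summary output.
cells <- matrix(
  c( 66542,  6733,  4971,
    124692, 94161, 41682,
     25112,  1240,  1156),
  nrow = 3, byrow = TRUE,
  dimnames = list(rows = c("black", "white", "natam"),
                  columns = c("dem", "rep", "non"))
)

# Divide each row by its total to get the share of each racial group
# registered under each party label.
shares <- prop.table(cells, margin = 1)
print(round(shares, 3))
```

These are ratios of posterior means rather than posterior means of ratios, so they are only a quick summary; credible intervals for the shares would have to be computed draw by draw from the simulations.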
You can check for convergence before summarizing the estimates with three diagnostic tests. See the section Diagnostics for Zelig Models for more examples of these tests, and interpretation of their meaning. For brevity we only show the first set of parameters, the \(\alpha\)’s, as there are twice as many \(\beta\)’s as there are precincts:
gd <- z.out$geweke.diag()
print(gd[[1]])
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## alpha.black.dem alpha.white.dem alpha.natam.dem alpha.black.rep
## -2.9601 3.1625 -5.0757 0.1445
## alpha.white.rep alpha.natam.rep alpha.black.non alpha.white.non
## 1.5115 0.3101 5.9617 -1.9623
## alpha.natam.non
## -0.8079
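The Geweke statistics are z-scores comparing the mean of the early part of the chain with the mean of the later part. One common reading, assumed here rather than prescribed by Zelig, is to flag parameters whose score falls outside the two-sided 5% normal bounds. A small sketch using the values printed above:

```r
# Geweke z-scores copied from the output above.
z <- c(
  alpha.black.dem = -2.9601, alpha.white.dem =  3.1625,
  alpha.natam.dem = -5.0757, alpha.black.rep =  0.1445,
  alpha.white.rep =  1.5115, alpha.natam.rep =  0.3101,
  alpha.black.non =  5.9617, alpha.white.non = -1.9623,
  alpha.natam.non = -0.8079
)

# Two-sided 5% cutoff for a standard normal (about 1.96); the choice
# of 5% is a convention, not a Zelig default.
threshold <- qnorm(0.975)

# Parameters whose early and late means differ by more than the cutoff.
flagged <- names(z)[abs(z) > threshold]
print(flagged)
```

Several of the \(\alpha\) parameters exceed the cutoff, which suggests that this chain would benefit from a longer run or a longer burn-in.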
hd <- z.out$heidel.diag()
print(hd[[1]])
##
## Stationarity start p-value
## test iteration
## alpha.black.dem passed 301 0.1087
## alpha.white.dem passed 101 0.1294
## alpha.natam.dem passed 201 0.0667
## alpha.black.rep passed 1 0.0993
## alpha.white.rep passed 1 0.4821
## alpha.natam.rep passed 1 0.7501
## alpha.black.non passed 401 0.0889
## alpha.white.non passed 301 0.3559
## alpha.natam.non passed 1 0.1661
##
## Halfwidth Mean Halfwidth
## test
## alpha.black.dem failed 5.248 0.5307
## alpha.white.dem passed 6.887 0.1577
## alpha.natam.dem failed 6.309 0.6997
## alpha.black.rep passed 0.621 0.0449
## alpha.white.rep passed 4.103 0.0790
## alpha.natam.rep passed 0.613 0.0438
## alpha.black.non passed 0.454 0.0319
## alpha.white.non passed 2.175 0.0365
## alpha.natam.non passed 0.603 0.0443
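The halfwidth results can be checked by hand. Assuming the diagnostic is the one from the `coda` package with its default tolerance `eps = 0.1`, a parameter passes when the ratio of the halfwidth to the mean does not exceed that tolerance. The sketch below recomputes the ratio from the printed values; the data frame `hw` is built here only for illustration.

```r
# Means and halfwidths copied from the output above.
hw <- data.frame(
  mean      = c(5.248, 6.887, 6.309, 0.621, 4.103, 0.613, 0.454, 2.175, 0.603),
  halfwidth = c(0.5307, 0.1577, 0.6997, 0.0449, 0.0790, 0.0438, 0.0319,
                0.0365, 0.0443),
  row.names = c("alpha.black.dem", "alpha.white.dem", "alpha.natam.dem",
                "alpha.black.rep", "alpha.white.rep", "alpha.natam.rep",
                "alpha.black.non", "alpha.white.non", "alpha.natam.non")
)

# Tolerance assumed to be the coda default.
eps <- 0.1

# Relative halfwidth of the interval around each mean.
hw$ratio  <- hw$halfwidth / abs(hw$mean)
hw$passed <- hw$ratio <= eps

# Parameters that fail the halfwidth test.
failed <- rownames(hw)[!hw$passed]
print(failed)
```

Under that assumption the two failures sit only slightly above the 10% tolerance, so a somewhat longer chain may be enough to bring them within it.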
rd <- z.out$raftery.diag()
print(rd[[1]])
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
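The minimum sample size in this message follows from the three settings printed above. As a sketch, assuming the usual bound for estimating a quantile \(q\) to within \(\pm r\) with probability \(s\) from independent draws, it can be reproduced in base R:

```r
# Settings reported by the diagnostic.
q <- 0.025  # quantile of interest
r <- 0.005  # desired accuracy (+/-)
s <- 0.95   # probability of attaining that accuracy

# Normal critical value for a two-sided interval with coverage s.
z_crit <- qnorm((1 + s) / 2)

# Minimum number of independent draws: binomial variance q(1-q),
# scaled by (z_crit / r)^2 and rounded up.
n_min <- ceiling(q * (1 - q) * (z_crit / r)^2)
print(n_min)  # 3746
```

The message indicates that the saved chain is shorter than this minimum, so the model would need to be re-estimated with more saved simulations before the diagnostic can report anything further.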
This model is part of the eiPack package by Olivia Lau, Ryan T. Moore and Michael Kellermann. Advanced users may wish to refer to help(ei.MD.bayes) in the eiPack package.
Rosen O, Jiang W, King G and Tanner M (2001). “Bayesian and Frequentist Inference for Ecological Inference: The R x C case.” Statistica Neerlandica, 55(2), pp. 134-156.
Lau O, Moore RT and Kellermann M (2012). eiPack: Ecological Inference and Higher-Dimension Data Management. R package version 0.1-7, <URL: https://CRAN.R-project.org/package=eiPack>.