zeligei-eirxc

Multinomial Dirichlet model for Ecological Inference in RxC tables

Syntax

Unlike other EI models in Zelig, the RxC model requires all of the row and column variables to be specified. Let C1 through CC be the column totals and R1 through RR the row totals, so that N_i = R1_i + R2_i + \ldots + RR_i = C1_i + C2_i + \ldots + CC_i is the total in unit i. With three row variables and three column variables, the syntax is:

z.out <- zelig( cbind(C1,C2,C3) ~ cbind(R1,R2,R3), data=data)
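Because the row totals and column totals must agree within every unit, it can be worth verifying this before fitting. The following is a minimal sketch in base R. The data frame `toy` and its values are hypothetical and exist only to illustrate the check.

```r
# Hypothetical 3x3 margins for four units (values invented for illustration)
toy <- data.frame(
  R1 = c(10, 20, 5, 8),
  R2 = c(30, 10, 15, 12),
  R3 = c(5, 5, 10, 20),
  C1 = c(25, 15, 10, 10),
  C2 = c(15, 15, 10, 20),
  C3 = c(5, 5, 10, 10)
)

# Unit totals computed from each set of margins
row_total <- rowSums(toy[, c("R1", "R2", "R3")])
col_total <- rowSums(toy[, c("C1", "C2", "C3")])

# Indices of units where the two totals disagree (empty if the data are consistent)
mismatch <- which(row_total != col_total)
print(mismatch)
```

An empty result means every unit satisfies the identity above; any index returned points at a unit whose margins should be inspected before the model is run.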

If the row and column variables are percentages rather than counts, the same formula syntax applies, but you must also supply the unit totals N:

z.out <- zelig( cbind(C1,C2,C3) ~ cbind(R1,R2,R3), N=N, data=data)

The argument N can be either a numeric vector of the total in each i-th unit, or the character name of a variable in the dataset that contains these values.
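As an illustration of the two ways of passing N, the sketch below converts hypothetical count margins into within-unit shares and stores the unit totals in a column. The data frame `counts`, its values, and the column name `total` are assumptions made for this example. The sketch uses proportions between 0 and 1, which is also an assumption, since the text above does not say which scale is expected. The commented calls mirror the syntax above and are not run here.

```r
# Hypothetical count margins for two units (values invented for illustration)
counts <- data.frame(
  R1 = c(10, 20), R2 = c(30, 10), R3 = c(5, 5),
  C1 = c(25, 15), C2 = c(15, 15), C3 = c(5, 5)
)

# Unit totals, then every margin expressed as a share of its unit total
unit_total <- unname(rowSums(counts[, c("R1", "R2", "R3")]))
shares <- as.data.frame(lapply(counts, function(x) x / unit_total))
shares$total <- unit_total

# N given as the name of a column in the data:
# z.out <- zelig(cbind(C1, C2, C3) ~ cbind(R1, R2, R3), N = "total", data = shares)
# N given as a numeric vector:
# z.out <- zelig(cbind(C1, C2, C3) ~ cbind(R1, R2, R3), N = shares$total, data = shares)
```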

First load packages:

library("Zelig")
library("ZeligEI")

Here is the full sequence of calls for an analysis, using the reference classes directly:

z5 <- zeirxc$new()
z5$zelig(cbind(C1, C2, C3) ~ cbind(R1, R2, R3), N = myN, weights = w, data = myData)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers this looks like:

z.out <- zelig(cbind(C1, C2, C3) ~ cbind(R1, R2, R3), N = myN, model = "eirxc", weights = w, data = myData)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Examples

We’ll use a dataset from the eiPack package, of registration data for White, Black, and Native American voters, and party voteshare, in 277 precincts in eight counties of south-eastern North Carolina in 2001.

library("eiPack", quietly=TRUE)
data(senc)

Here is the model estimated in Zelig.

z.out <- zeirxc$new()
z.out$zelig(cbind(dem, rep, non) ~ cbind(black, white, natam), N="total", data = senc)
summary(z.out)
## Model:
##
## Formula:  cbind(dem, rep, non) ~ cbind(black, white, natam)
## Total sims:
## Burnin discarded:
## Sims saved:
##
##
## Acceptance ratios for Beta (averaged over units):
##       dem   rep
## black 0.467 0.394
## white 0.313 0.285
## natam 0.593 0.522
##
## Acceptance ratios for alpha:
##        columns
## rows    dem   rep   non
##   black 0.683 0.192 0.144
##   white 0.702 0.552 0.388
##   natam 0.629 0.203 0.156
##
## Draws for Alpha:
##           Mean   Std. Error 2.5%   97.5%
## black.dem 4.9606 0.8259     3.6629 7.0602
## white.dem 6.9323 0.4460     6.1259 7.8585
## natam.dem 5.9738 1.1092     3.6357 7.8956
## black.rep 0.6211 0.0873     0.4562 0.8134
## white.rep 4.1032 0.2658     3.5925 4.5817
## natam.rep 0.6134 0.0897     0.4676 0.8301
## black.non 0.4989 0.0810     0.3727 0.6651
## white.non 2.1443 0.1392     1.9154 2.4543
## natam.non 0.6025 0.0891     0.4654 0.7921
##
## Aggregate cell counts (summed over units):
##           Mean   Std. Error 2.5%   97.5%
## black.dem  66542   1155      64360  68005
## white.dem 124692   1332     122913 127083
## natam.dem  25112    192      24683  25404
## black.rep   6733    547       6117   7862
## white.rep  94161    670      92712  95146
## natam.rep   1240    118       1020   1476
## black.non   4971    801       3792   6347
## white.non  41682    889      40120  43024
## natam.non   1156    169        840   1482
## Next step: Use 'setx' method
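The aggregate cell counts can be turned into within-group shares, which are often easier to read. The sketch below does this in base R using the posterior means printed above. Note that it takes ratios of posterior means, which is a rough summary and not the posterior mean of the ratio, and it ignores the uncertainty in the draws.

```r
# Posterior mean cell counts copied from the summary output above
cells <- matrix(
  c( 66542,  6733,  4971,
    124692, 94161, 41682,
     25112,  1240,  1156),
  nrow = 3, byrow = TRUE,
  dimnames = list(rows = c("black", "white", "natam"),
                  columns = c("dem", "rep", "non"))
)

# Share of each row group falling in each column category
row_shares <- prop.table(cells, margin = 1)
print(round(row_shares, 3))
```

On these figures roughly 85 percent of Black registrants fall in the `dem` column, since 66542 / (66542 + 6733 + 4971) is about 0.85.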

Before summarizing the estimates, you can check for convergence with three diagnostic tests. See the section Diagnostics for Zelig Models for more examples of these tests and for the interpretation of their results. For brevity we show only the first set of parameters, the \alpha's, as there are twice as many \beta's as there are precincts:

gd <- z.out$geweke.diag()
print(gd[[1]])
##
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5
##
## alpha.black.dem alpha.white.dem alpha.natam.dem alpha.black.rep
##          -2.960           3.163          -5.076           0.144
## alpha.white.rep alpha.natam.rep alpha.black.non alpha.white.non
##           1.512           0.310           5.962          -1.962
## alpha.natam.non
##          -0.808
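The Geweke statistics are z-scores comparing the mean of the early part of the chain with the mean of the later part. A quick way to read them is to flag values that are large in absolute terms. The sketch below copies the values printed above and uses the conventional two-sided 5 percent normal cutoff, which is an assumption of this example and not a rule imposed by the diagnostic.

```r
# Geweke z-scores copied from the output above
geweke_z <- c(
  alpha.black.dem = -2.960, alpha.white.dem =  3.163, alpha.natam.dem = -5.076,
  alpha.black.rep =  0.144, alpha.white.rep =  1.512, alpha.natam.rep =  0.310,
  alpha.black.non =  5.962, alpha.white.non = -1.962, alpha.natam.non = -0.808
)

# Assumed cutoff: two-sided 5% critical value of the standard normal (about 1.96)
cutoff <- qnorm(0.975)

# Parameters whose early and late chain means differ by more than the cutoff
flagged <- names(geweke_z)[abs(geweke_z) > cutoff]
print(flagged)
```

Flagged parameters suggest that the chain may not yet have settled, in which case a longer run or a longer burn-in is worth considering.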
hd <- z.out$heidel.diag()
print(hd[[1]])
##
##                 Stationarity start     p-value
##                 test         iteration
## alpha.black.dem passed       301       0.1087
## alpha.white.dem passed       101       0.1294
## alpha.natam.dem passed       201       0.0667
## alpha.black.rep passed         1       0.0993
## alpha.white.rep passed         1       0.4821
## alpha.natam.rep passed         1       0.7501
## alpha.black.non passed       401       0.0889
## alpha.white.non passed       301       0.3559
## alpha.natam.non passed         1       0.1661
##
##                 Halfwidth Mean  Halfwidth
##                 test
## alpha.black.dem failed    5.248 0.5307
## alpha.white.dem passed    6.887 0.1577
## alpha.natam.dem failed    6.309 0.6997
## alpha.black.rep passed    0.621 0.0449
## alpha.white.rep passed    4.103 0.0790
## alpha.natam.rep passed    0.613 0.0438
## alpha.black.non passed    0.454 0.0319
## alpha.white.non passed    2.175 0.0365
## alpha.natam.non passed    0.603 0.0443
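The halfwidth test in the second table compares the halfwidth of the confidence interval for the mean with the mean itself. The sketch below reproduces that comparison from the printed values, assuming the tolerance `eps = 0.1` that is the default of `heidel.diag` in the coda package. Because the printed figures are rounded, the ratios are approximate.

```r
# Means and halfwidths copied from the output above
hw <- data.frame(
  mean      = c(5.248, 6.887, 6.309, 0.621, 4.103, 0.613, 0.454, 2.175, 0.603),
  halfwidth = c(0.5307, 0.1577, 0.6997, 0.0449, 0.0790, 0.0438, 0.0319, 0.0365, 0.0443),
  row.names = c("alpha.black.dem", "alpha.white.dem", "alpha.natam.dem",
                "alpha.black.rep", "alpha.white.rep", "alpha.natam.rep",
                "alpha.black.non", "alpha.white.non", "alpha.natam.non")
)

# Assumed tolerance: the coda default for the halfwidth-to-mean ratio
eps <- 0.1

# The test passes when the halfwidth is small relative to the mean
hw$ratio  <- abs(hw$halfwidth / hw$mean)
hw$passed <- hw$ratio <= eps
print(hw)
```

The two parameters that fail, alpha.black.dem and alpha.natam.dem, have ratios just above 0.1, which indicates that their means are not yet estimated to the requested relative precision.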
rd <- z.out$raftery.diag()
print(rd[[1]])
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
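The minimum sample size in this message follows from the three settings printed with it. It is the number of independent draws needed to estimate the probability q to within r with probability s, and the diagnostic reports it when the saved chain is shorter than that. A minimal sketch of the arithmetic in base R:

```r
# Settings reported in the output above
q <- 0.025  # quantile of interest
r <- 0.005  # desired accuracy
s <- 0.95   # probability of achieving that accuracy

# Minimum number of independent draws under a normal approximation
# to the binomial variance q * (1 - q)
nmin <- ceiling(q * (1 - q) * qnorm(0.5 * (1 + s))^2 / r^2)
print(nmin)
```

This gives 3746, matching the message, so the run should save at least that many draws before the full diagnostic can be computed.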

See also

This model is part of the eiPack package by Olivia Lau, Ryan T. Moore and Michael Kellermann. Advanced users may wish to refer to help(ei.MD.bayes) in the eiPack package.

Rosen O, Jiang W, King G and Tanner M (2001). “Bayesian and Frequentist Inference for Ecological Inference: The R x C case.” Statistica Neerlandica, 55(2), pp. 134-156.

Lau O, Moore RT and Kellermann M (2012). eiPack: Ecological Inference and Higher-Dimension Data Management. R package version 0.1-7, <URL: https://CRAN.R-project.org/package=eiPack>.