Weights are often added to statistical models to adjust the observed sample distribution in the data to an underlying population of interest. For example, some types of observations may have been intentionally oversampled, and need to be downweighted for population inferences, or weights may have been created by a matching procedure to create a dataset with treatment and control groups that resemble randomized designs and achieve balance in covariates.
The weights argument can be either a vector of weight values or the name of a variable in the dataset.
Not all of the R implementations of statistical models that Zelig uses have been written to accept weights or to use them in estimation. When the user supplies weights but the underlying package for that model does not support them, Zelig can still use the weights by one of two procedures: if the weights are all integer valued, it constructs a dataset by duplicating and deleting observations; otherwise, it constructs a bootstrapped dataset using the weights as sampling probabilities.
Here we build a simulated dataset in which y has a positive relationship with x in the first 45 observations and a negative relationship in the next 45.
x <- runif(90)
y <- c( 2*x[1:45], -3*x[46:90] ) + rnorm(90)
z <- as.numeric(y>0)
w1 <- c(rep(1.8, 45), rep(0.2,45))
mydata <- data.frame(z,y,x,w1)
w2 <- rep(c(1.8,0.2), 45)
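As a quick sanity check on this construction, one could fit an unweighted regression to each half of the data separately. The following is only a sketch in base R: the seed and the names `x_sim`, `y_sim`, `sim`, `fit_first`, and `fit_second` are assumptions introduced here so the block runs on its own, and they are not part of the original example.

```r
# Regenerate data of the same shape under a fixed seed (the seed is an assumption).
set.seed(1)
x_sim <- runif(90)
y_sim <- c(2 * x_sim[1:45], -3 * x_sim[46:90]) + rnorm(90)
sim <- data.frame(y = y_sim, x = x_sim)

# Fit each half separately to expose the two underlying relationships.
fit_first  <- lm(y ~ x, data = sim[1:45, ])    # generated with slope 2
fit_second <- lm(y ~ x, data = sim[46:90, ])   # generated with slope -3

coef(fit_first)
coef(fit_second)
```

The slope estimated from the first half should be well above the slope estimated from the second half, which is the contrast the weights in the examples that follow are meant to pick up.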
In the first example below, we pass the name of a variable included in the dataset. The weights are applied as intended: the first 45 observations, where the relationship is positive, are weighted more heavily, and the estimated slope is positive.
z1.out <- zelig( y ~ x, cite=FALSE, model="ls", weights="w1", data=mydata)
summary(z1.out)
## Model:
##
## Call:
## z5$zelig(formula = y ~ x, data = mydata, weights = "w1")
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.5371 -1.1768 -0.4990 0.2021 3.6811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.07124 0.26640 -0.267 0.78978
## x 1.56864 0.50014 3.136 0.00233
##
## Residual standard error: 1.277 on 88 degrees of freedom
## Multiple R-squared: 0.1005, Adjusted R-squared: 0.09033
## F-statistic: 9.837 on 1 and 88 DF, p-value: 0.002326
##
## Next step: Use 'setx' method
In our second example, the weights are provided as a separate vector of the same length as the dataset. These weights alternate between 1.8 and 0.2 across observations, so they give equal weight to both relationships present when we constructed the data, and the estimated slope is now slightly negative and not statistically distinguishable from zero.
z2.out <- zelig( y ~ x, cite=FALSE, model="ls", weights=w2, data=mydata)
summary(z2.out)
## Model:
##
## Call:
## z5$zelig(formula = y ~ x, data = mydata, weights = w2)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -3.4060 -0.9061 -0.1566 0.6452 4.2139
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.04364 0.25848 0.169 0.866
## x -0.22032 0.55572 -0.396 0.693
##
## Residual standard error: 1.398 on 88 degrees of freedom
## Multiple R-squared: 0.001783, Adjusted R-squared: -0.00956
## F-statistic: 0.1572 on 1 and 88 DF, p-value: 0.6927
##
## Next step: Use 'setx' method
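For intuition about what weights do in a linear model, weighted least squares chooses the coefficients that minimize the weighted sum of squared residuals, which has the closed form b = (X'WX)^{-1} X'Wy with W the diagonal matrix of weights. The sketch below checks that closed form against base R's `lm()` with a `weights` argument. It does not call Zelig; it assumes that the `ls` model fits by weighted least squares in this way, as the "Weighted Residuals" output suggests. The seed, the helper `wls_closed_form`, and the `_sim` names are introduced here for illustration only.

```r
# Regenerate data of the same shape under a fixed seed (the seed is an assumption).
set.seed(2)
x_sim <- runif(90)
y_sim <- c(2 * x_sim[1:45], -3 * x_sim[46:90]) + rnorm(90)
w_blocks <- c(rep(1.8, 45), rep(0.2, 45))   # same pattern as w1

# Hypothetical helper: closed-form weighted least squares for one predictor.
wls_closed_form <- function(x, y, w) {
  X <- cbind(1, x)                           # design matrix with intercept
  # Solve (X'WX) b = X'Wy; w * X scales row i of X by w[i].
  drop(solve(crossprod(X, w * X), crossprod(X, w * y)))
}

b_closed <- wls_closed_form(x_sim, y_sim, w_blocks)
b_lm     <- coef(lm(y_sim ~ x_sim, weights = w_blocks))

# The two sets of estimates should agree up to numerical tolerance.
all.equal(unname(b_closed), unname(b_lm))
```

Swapping `w_blocks` for an alternating pattern such as `rep(c(1.8, 0.2), 45)` gives the analogue of the second example.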
Some checks of the supplied weights are conducted, and warnings or error messages are given to the user if, for example, the supplied weights are of the wrong length or the variable name supplied is not present in the dataset. Negative weights are treated as zero weights. Here we use the object-oriented approach to building the Zelig object.
z3.out <- zls$new()
z3.out$zelig( y ~ x, weights="noSuchName", data=mydata)
## Variable name given for weights not found in dataset, so will be ignored.
z4.out <- zls$new()
z4.out$zelig( y ~ x, weights=w2[1:10], data=mydata)
## Length of vector given for weights is not equal to number of observations in dataset, and will be ignored.
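The checks described above can be sketched as a small base R function. This is a hypothetical helper named `check_weights`, written only to mirror the behaviour described in this section; it is not Zelig's internal code, and the `toy` data frame is made up for the illustration.

```r
# Hypothetical helper: NOT part of Zelig, only mirrors the documented behaviour.
check_weights <- function(weights, data) {
  if (is.null(weights)) return(NULL)

  # A single character string is treated as a variable name in the dataset.
  if (is.character(weights) && length(weights) == 1) {
    if (!weights %in% names(data)) {
      warning("Variable name given for weights not found in dataset, so will be ignored.")
      return(NULL)
    }
    weights <- data[[weights]]
  }

  # A vector must have one weight per observation.
  if (length(weights) != nrow(data)) {
    warning("Length of vector given for weights is not equal to number of observations in dataset, and will be ignored.")
    return(NULL)
  }

  # Negative weights are treated as zero weights.
  pmax(weights, 0)
}

toy <- data.frame(y = c(1, 2, 3), x = c(0.1, 0.5, 0.9), w = c(2, -1, 0.5))
ok_weights  <- check_weights("w", toy)                             # 2.0 0.0 0.5
bad_name    <- suppressWarnings(check_weights("noSuchName", toy))  # NULL
bad_length  <- suppressWarnings(check_weights(c(1, 2), toy))       # NULL
```

Returning `NULL` stands in for "weights ignored", so the model would be fit as if no weights had been given.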
Here we use a model whose underlying package does not accept sampling weights, so Zelig gives a warning message that a bootstrapped dataset will be constructed from the weights.
continuous.weights <- rep(x=c(0.6, 1, 1.4), times=30)
z5.out <- zelig( z ~ x, model="logit", weights=continuous.weights, data=mydata)
## Noninteger weights were set, but the model in Zelig is only able to use integer valued weights.
## A bootstrapped version of the dataset was constructed using the weights as sample probabilities.
##
## How to cite this model in Zelig:
## R Core Team. 2007.
## logit: Logistic Regression for Dichotomous Dependent Variables
## in Christine Choirat, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
## "Zelig: Everyone's Statistical Software," http://zeligproject.org/
When the weights are integer valued, Zelig can instead construct a dataset by a combination of duplicating and deleting observations.
integer.weights <- rep(x=c(0, 1, 2), times=30)
z6.out <- zelig( z ~ x, model="logit", weights=integer.weights, data=mydata)
## How to cite this model in Zelig:
## R Core Team. 2007.
## logit: Logistic Regression for Dichotomous Dependent Variables
## in Christine Choirat, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
## "Zelig: Everyone's Statistical Software," http://zeligproject.org/
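The two fallback procedures illustrated above, duplication and deletion for integer weights and bootstrapping for noninteger weights, can be sketched in base R as follows. This is an illustration of the idea only, not Zelig's implementation: the `toy` data frame, the seed, and the choice to draw a bootstrap sample of the same size as the original data are all assumptions.

```r
set.seed(3)  # assumed seed, only so the resampling step is reproducible
toy <- data.frame(id = 1:6, z = c(0, 1, 1, 0, 1, 0))

# Integer weights: a weight of 0 deletes the row, a weight of k keeps k copies.
integer_w <- c(0, 1, 2, 0, 1, 2)
expanded  <- toy[rep(seq_len(nrow(toy)), times = integer_w), ]
expanded$id   # 2 3 3 5 6 6

# Noninteger weights: resample rows with replacement, using the weights
# as (unnormalized) sampling probabilities.
continuous_w <- c(0.6, 1, 1.4, 0.6, 1, 1.4)
idx <- sample(seq_len(nrow(toy)), size = nrow(toy),
              replace = TRUE, prob = continuous_w)
bootstrapped <- toy[idx, ]
nrow(bootstrapped)   # 6
```

The expanded dataset is deterministic, whereas the bootstrapped dataset changes from run to run unless a seed is fixed, so estimates based on it carry extra resampling variability.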
Weights that are created using the matching mechanisms in the MatchIt package will be automatically employed in Zelig analyses if the output object from MatchIt is passed to Zelig as the data argument. For more detail, see Using Zelig with MatchIt.