Nonparametric Bootstraps in Zelig Models

Bootstrapping is often used to obtain a robust estimate of the uncertainty of a parameter due to sampling error.

  • In the nonparametric bootstrap, new datasets are iteratively created by resampling with replacement from the original dataset. Resampling from the available sample data gives an approximation of sampling new datasets from the original population. The model of interest is rerun in each newly constructed dataset, and the distribution of parameter estimates shows the variance of the sampling distribution. This can create a robust numerical confidence interval for the sampling uncertainty of this parameter, and indeed recover a confidence interval when such is not analytically tractable.
  • In the parametric bootstrap, we instead use the model estimates from the original sample data to create these new datasets, or some other function of those datsets.

The normal algorithm that Zelig uses to simulate quantities of interest is a form of the parametric bootstrap. Zelig has an argument, however, to switch to the nonparametric bootstrap. Hereafter, when we say bootstrap, we imply the nonparametric form.

The bootstrap argument has a default of FALSE, and can be set to TRUE or a numeric value giving the number of bootstrapped datasets to run. If set to TRUE the default is 100 bootstraps. The bootstrap works in combination with other Zelig arguments as follows:

  • If a dataset of observations is supplied with weights, the bootstrap will provide a new dataset of size , with probability of each observation being resampled directly proportional to the weighting on that observation.
  • Presently, if the data is multiply imputed, the bootstrap function is disabled. We are currently researching correct interaction and implementation of the bootstrap across imputed datasets.
  • Presently, the bootstrap is available for all models implemented in Zelig, except MCMC models and time-series models. Models utilizing MCMC sampling aren’t conformable with a bootstrap, as they are different approaches to the same concept. The serial correlation assumptions in time-series models violate the assumption that the observations present in the sample are drawn independently and identically (iid) from the population, and make appropriate bootstrap designs more complicated. Zelig does not presently have a time-series appropriate bootstrap design (such as residual bootstraps or the block bootstrap), so the bootstrap is not an option for this set of models.

Examples

Attach sample data:

data(macro)

Estimate the model, setting the number of bootstrapped datasets to construct:

z.out <- zls$new()
z.out$zelig(unem ~ gdp + capmob +
 trade, data = macro, bootstrap=500)

Summary by default shows the point parameter estimates, with the standard errors generated by the bootstrap.

summary(z.out)
## Model: Combined Bootstraps
##             Estimate Std.Error z value  Pr(>|z|)
## (Intercept)  6.18129  0.437869  14.117 0.000e+00 ***
## gdp         -0.32360  0.055803  -5.799 6.671e-09 ***
## capmob       1.42194  0.165019   8.617 0.000e+00 ***
## trade        0.01985  0.005335   3.721 1.981e-04  **
## ---
## Signif. codes:  '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## For results from individual bootstrapped datasets, use summary(x, subset = i:j)
## Next step: Use 'setx' method

We can instead choose to show the bagging estimator, that is, the average parameter value across all the bootstrapped datasets. Bagging generally trades bias for a reduction in variance that results in lower mean squared error (notably in non-linear models). The bagging estimator can be obtained as:

summary(z.out, bagging=TRUE)
## Model: Combined Bootstraps
##             Estimate Std.Error z value  Pr(>|z|)
## (Intercept)  6.16514  0.437869  14.080 0.000e+00 ***
## gdp         -0.32175  0.055803  -5.766 8.125e-09 ***
## capmob       1.41801  0.165019   8.593 0.000e+00 ***
## trade        0.01998  0.005335   3.744 1.809e-04  **
## ---
## Signif. codes:  '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## For results from individual bootstrapped datasets, use summary(x, subset = i:j)
## Next step: Use 'setx' method

If we want to inspect particular individual results, the subset argument is available:

summary(z.out, subset=13:15)
## Bootstrapped Dataset 13
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
##     bootstrap = 500)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.4731 -2.1510 -0.7507  1.8016  8.0866
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5.341864   0.419883  12.722  < 2e-16
## gdp         -0.317835   0.062931  -5.050 7.14e-07
## capmob       0.915843   0.180335   5.079 6.23e-07
## trade        0.019740   0.005985   3.298  0.00107
##
## Residual standard error: 2.708 on 346 degrees of freedom
## Multiple R-squared:  0.2023,     Adjusted R-squared:  0.1954
## F-statistic: 29.26 on 3 and 346 DF,  p-value: < 2.2e-16
##
## Bootstrapped Dataset 14
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
##     bootstrap = 500)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.2987 -2.2535 -0.4098  2.0188  5.9420
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5.930725   0.435651  13.613  < 2e-16
## gdp         -0.303369   0.062188  -4.878 1.64e-06
## capmob       1.414034   0.174193   8.118 8.43e-15
## trade        0.024247   0.005542   4.375 1.61e-05
##
## Residual standard error: 2.817 on 346 degrees of freedom
## Multiple R-squared:  0.2789,     Adjusted R-squared:  0.2726
## F-statistic: 44.61 on 3 and 346 DF,  p-value: < 2.2e-16
##
## Bootstrapped Dataset 15
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
##     bootstrap = 500)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7300 -2.3095  0.0192  2.1121  7.1824
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.834144   0.490435  13.935  < 2e-16
## gdp         -0.314014   0.067781  -4.633 5.12e-06
## capmob       1.581995   0.176602   8.958  < 2e-16
## trade        0.013709   0.006002   2.284    0.023
##
## Residual standard error: 2.837 on 346 degrees of freedom
## Multiple R-squared:  0.2667,     Adjusted R-squared:  0.2604
## F-statistic: 41.95 on 3 and 346 DF,  p-value: < 2.2e-16
##
## Next step: Use 'setx' method

If bootstraps were obtained, the first results are those models on the bootstrapped data, and the -th result is the model estimated on the original data. The value of is stored in a field in the Zelig object, named bootstrap.num. In our running example this was 500:

summary(z.out, subset=(z.out$bootstrap.num + 1))
## Bootstrapped Dataset 501
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
##     bootstrap = 500)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.3008 -2.0768 -0.3187  1.9789  7.7715
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  6.181294   0.450572  13.719  < 2e-16
## gdp         -0.323601   0.062820  -5.151 4.36e-07
## capmob       1.421939   0.166443   8.543 4.22e-16
## trade        0.019854   0.005606   3.542 0.000452
##
## Residual standard error: 2.746 on 346 degrees of freedom
## Multiple R-squared:  0.2878,     Adjusted R-squared:  0.2817
## F-statistic: 46.61 on 3 and 346 DF,  p-value: < 2.2e-16
##
## Next step: Use 'setx' method