Bootstrapping is often used to obtain a robust estimate of the uncertainty of a parameter due to sampling error.
The normal algorithm that Zelig uses to simulate quantities of interest is a form of the parametric bootstrap. Zelig has an argument, however, to switch to the nonparametric bootstrap. Hereafter, when we say bootstrap, we imply the nonparametric form.
The bootstrap argument has a default of FALSE, and can be set to TRUE or a numeric value giving the number of bootstrapped datasets to run. If set to TRUE the default is 100 bootstraps. The bootstrap works in combination with other Zelig arguments as follows:
Attach sample data:
data(macro)
Estimate the model, setting the number of bootstrapped datasets to construct:
z.out <- zls$new()
z.out$zelig(unem ~ gdp + capmob +
trade, data = macro, bootstrap=500)
Summary by default shows the point parameter estimates, with the standard errors generated by the bootstrap.
summary(z.out)
## Model: Combined Bootstraps
## Estimate Std.Error z value Pr(>|z|)
## (Intercept) 6.18129 0.437869 14.117 0.000e+00 ***
## gdp -0.32360 0.055803 -5.799 6.671e-09 ***
## capmob 1.42194 0.165019 8.617 0.000e+00 ***
## trade 0.01985 0.005335 3.721 1.981e-04 **
## ---
## Signif. codes: '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## For results from individual bootstrapped datasets, use summary(x, subset = i:j)
## Next step: Use 'setx' method
We can instead choose to show the bagging estimator, that is, the average parameter value across all the bootstrapped datasets. Bagging generally trades bias for a reduction in variance that results in lower mean squared error (notably in non-linear models). The bagging estimator can be obtained as:
summary(z.out, bagging=TRUE)
## Model: Combined Bootstraps
## Estimate Std.Error z value Pr(>|z|)
## (Intercept) 6.16514 0.437869 14.080 0.000e+00 ***
## gdp -0.32175 0.055803 -5.766 8.125e-09 ***
## capmob 1.41801 0.165019 8.593 0.000e+00 ***
## trade 0.01998 0.005335 3.744 1.809e-04 **
## ---
## Signif. codes: '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## For results from individual bootstrapped datasets, use summary(x, subset = i:j)
## Next step: Use 'setx' method
If we want to inspect particular individual results, the subset argument is available:
summary(z.out, subset=13:15)
## Bootstrapped Dataset 13
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4731 -2.1510 -0.7507 1.8016 8.0866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.341864 0.419883 12.722 < 2e-16
## gdp -0.317835 0.062931 -5.050 7.14e-07
## capmob 0.915843 0.180335 5.079 6.23e-07
## trade 0.019740 0.005985 3.298 0.00107
##
## Residual standard error: 2.708 on 346 degrees of freedom
## Multiple R-squared: 0.2023, Adjusted R-squared: 0.1954
## F-statistic: 29.26 on 3 and 346 DF, p-value: < 2.2e-16
##
## Bootstrapped Dataset 14
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2987 -2.2535 -0.4098 2.0188 5.9420
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.930725 0.435651 13.613 < 2e-16
## gdp -0.303369 0.062188 -4.878 1.64e-06
## capmob 1.414034 0.174193 8.118 8.43e-15
## trade 0.024247 0.005542 4.375 1.61e-05
##
## Residual standard error: 2.817 on 346 degrees of freedom
## Multiple R-squared: 0.2789, Adjusted R-squared: 0.2726
## F-statistic: 44.61 on 3 and 346 DF, p-value: < 2.2e-16
##
## Bootstrapped Dataset 15
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7300 -2.3095 0.0192 2.1121 7.1824
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.834144 0.490435 13.935 < 2e-16
## gdp -0.314014 0.067781 -4.633 5.12e-06
## capmob 1.581995 0.176602 8.958 < 2e-16
## trade 0.013709 0.006002 2.284 0.023
##
## Residual standard error: 2.837 on 346 degrees of freedom
## Multiple R-squared: 0.2667, Adjusted R-squared: 0.2604
## F-statistic: 41.95 on 3 and 346 DF, p-value: < 2.2e-16
##
## Next step: Use 'setx' method
If bootstraps were obtained, the first results are those models on the bootstrapped data, and the -th result is the model estimated on the original data. The value of is stored in a field in the Zelig object, named bootstrap.num. In our running example this was 500:
summary(z.out, subset=(z.out$bootstrap.num + 1))
## Bootstrapped Dataset 501
## Call:
## z.out$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3008 -2.0768 -0.3187 1.9789 7.7715
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.181294 0.450572 13.719 < 2e-16
## gdp -0.323601 0.062820 -5.151 4.36e-07
## capmob 1.421939 0.166443 8.543 4.22e-16
## trade 0.019854 0.005606 3.542 0.000452
##
## Residual standard error: 2.746 on 346 degrees of freedom
## Multiple R-squared: 0.2878, Adjusted R-squared: 0.2817
## F-statistic: 46.61 on 3 and 346 DF, p-value: < 2.2e-16
##
## Next step: Use 'setx' method