Built using Zelig version 5.1.4.90000
Bootstrapping is often used to obtain a robust estimate of the uncertainty of a parameter due to sampling error.
In the nonparametric bootstrap, new datasets are iteratively created by resampling with replacement from the original dataset. Resampling from the available sample data gives an approximation of sampling new datasets from the original population. The model of interest is rerun in each newly constructed dataset, and the distribution of parameter estimates shows the variance of the sampling distribution. This can create a robust numerical confidence interval for the sampling uncertainty of this parameter, and indeed recover a confidence interval when such is not analytically tractable.
In the parametric bootstrap, we instead use the model estimates from the original sample data to create these new datasets, or some other function of those datsets.
The normal algorithm that Zelig uses to simulate quantities of interest is a form of the parametric bootstrap. Zelig has an argument, however, to switch to the nonparametric bootstrap. Hereafter, when we say bootstrap, we imply the nonparametric form.
The zelig()
bootstrap
argument has a default of FALSE
, and can be set to TRUE
or a numeric value giving the number of bootstrapped datasets to run. If set to TRUE
the default is 100 bootstraps. The bootstrap works in combination with other Zelig arguments as follows:
If a dataset of \(N\) observations is supplied with weights, the bootstrap will provide a new dataset of size \(N\), with probability of each observation being resampled directly proportional to the weighting on that observation.
Presently, if the data is multiply imputed, the bootstrap function is disabled. We are currently researching correct interaction and implementation of the bootstrap across imputed datasets.
Presently, the bootstrap is available for all models implemented in Zelig, except MCMC models and time-series models. Models utilizing MCMC sampling aren’t conformable with a bootstrap, as they are different approaches to the same concept. The serial correlation assumptions in time-series models violate the assumption that the observations present in the sample are drawn independently and identically (iid) from the population, and make appropriate bootstrap designs more complicated. Zelig does not presently have a time-series appropriate bootstrap design (such as residual bootstraps or the block bootstrap), so the bootstrap is not an option for this set of models.
Attach sample data:
data(macro)
Estimate the model, setting the number of bootstrapped datasets to construct:
z.out <- zelig(unem ~ gdp + capmob + trade,
model = "ls", data = macro, bootstrap = 500)
## How to cite this model in Zelig:
## R Core Team. 2007.
## ls: Least Squares Regression for Continuous Dependent Variables
## in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
## "Zelig: Everyone's Statistical Software," http://zeligproject.org/
Summary by default shows the point parameter estimates, with the standard errors generated by the bootstrap.
summary(z.out)
## Model: Combined Bootstraps
##
## Estimate Std.Error z value Pr(>|z|)
## (Intercept) 6.18129 0.43787 14.12 < 2e-16
## gdp -0.32360 0.05580 -5.80 6.7e-09
## capmob 1.42194 0.16502 8.62 < 2e-16
## trade 0.01985 0.00534 3.72 2e-04
##
## For results from individual bootstrapped datasets, use summary(x, subset = i:j)
## Next step: Use 'setx' method
We can instead choose to show the bagging estimator, that is, the average parameter value across all the bootstrapped datasets. Bagging generally trades bias for a reduction in variance that results in lower mean squared error (notably in non-linear models). The bagging estimator can be obtained as:
summary(z.out, bagging = TRUE)
## Model: Combined Bootstraps
##
## Estimate Std.Error z value Pr(>|z|)
## (Intercept) 6.18129 0.43787 14.12 < 2e-16
## gdp -0.32360 0.05580 -5.80 6.7e-09
## capmob 1.42194 0.16502 8.62 < 2e-16
## trade 0.01985 0.00534 3.72 2e-04
##
## For results from individual bootstrapped datasets, use summary(x, subset = i:j)
## Next step: Use 'setx' method
If we want to inspect particular individual results, the subset argument is available:
summary(z.out, subset = 13:15)
## Bootstrapped Dataset 13
## Call:
## z5$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4731 -2.1510 -0.7507 1.8016 8.0866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.341864 0.419883 12.722 < 2e-16
## gdp -0.317835 0.062931 -5.050 7.14e-07
## capmob 0.915843 0.180335 5.079 6.23e-07
## trade 0.019740 0.005985 3.298 0.00107
##
## Residual standard error: 2.708 on 346 degrees of freedom
## Multiple R-squared: 0.2023, Adjusted R-squared: 0.1954
## F-statistic: 29.26 on 3 and 346 DF, p-value: < 2.2e-16
##
## Bootstrapped Dataset 14
## Call:
## z5$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2987 -2.2535 -0.4098 2.0188 5.9420
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.930725 0.435651 13.613 < 2e-16
## gdp -0.303369 0.062188 -4.878 1.64e-06
## capmob 1.414034 0.174193 8.118 8.43e-15
## trade 0.024247 0.005542 4.375 1.61e-05
##
## Residual standard error: 2.817 on 346 degrees of freedom
## Multiple R-squared: 0.2789, Adjusted R-squared: 0.2726
## F-statistic: 44.61 on 3 and 346 DF, p-value: < 2.2e-16
##
## Bootstrapped Dataset 15
## Call:
## z5$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7300 -2.3095 0.0192 2.1121 7.1824
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.834144 0.490435 13.935 < 2e-16
## gdp -0.314014 0.067781 -4.633 5.12e-06
## capmob 1.581995 0.176602 8.958 < 2e-16
## trade 0.013709 0.006002 2.284 0.023
##
## Residual standard error: 2.837 on 346 degrees of freedom
## Multiple R-squared: 0.2667, Adjusted R-squared: 0.2604
## F-statistic: 41.95 on 3 and 346 DF, p-value: < 2.2e-16
##
## Next step: Use 'setx' method
If \(B\) bootstraps were obtained, the first \(B\) results are those models on the bootstrapped data, and the \(B+1\) -th result is the model estimated on the original data. The value of \(B\) is stored in a field in the Zelig object, named bootstrap.num
. In our running example this was 500:
summary(z.out, subset = (z.out$bootstrap.num + 1))
## Bootstrapped Dataset 501
## Call:
## z5$zelig(formula = unem ~ gdp + capmob + trade, data = macro,
## bootstrap = 500)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3008 -2.0768 -0.3187 1.9789 7.7715
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.181294 0.450572 13.719 < 2e-16
## gdp -0.323601 0.062820 -5.151 4.36e-07
## capmob 1.421939 0.166443 8.543 4.22e-16
## trade 0.019854 0.005606 3.542 0.000452
##
## Residual standard error: 2.746 on 346 degrees of freedom
## Multiple R-squared: 0.2878, Adjusted R-squared: 0.2817
## F-statistic: 46.61 on 3 and 346 DF, p-value: < 2.2e-16
##
## Next step: Use 'setx' method