zelig-quantile¶

Quantile Regression for Continuous Dependent Variables

Use a linear programming implementation of quantile regression to estimate a linear predictor of the th conditional quantile of the population.

Syntax¶

z5 <- zquantile$new()
z5$zelig(Y ~ X1 + X2, weights = w, data = mydata, tau = 0.5)
z5$setx()
z5$sim()

With the Zelig 4 compatibility wrappers:

z.out <- zelig(Y ~ X1 + X2, model = "rq", weights = w, data = mydata, tau = 0.5)
x.out <- setx(z.out)
s.out <- sim(z.out, x = x.out)

Additional Inputs¶

In addition to the standard inputs, zelig() takes the following additional options for quantile regression:

tau: defaults to 0.5. Specifies the conditional quantile(s) that will be estimated. 0.5 corresponds to estimating the conditional median, 0.25 and 0.75 correspond to the conditional quartiles, etc. tau vectors with length greater than 1 are not currently supported. If tau is set outside of the interval [0,1], zelig returns the solution for all possible conditional quantiles given the data, but does not support inference on this fit (setx and sim will fail).
se: a string value that defaults to “nid”. Specifies the method by which the covariance matrix of coefficients is estimated during the sim stage of analysis. se can take the following values, which are passed to the summary.rq function from the quantreg package. These descriptions are copied from the summary.rq documentation.
- "iid" which presumes that the errors are iid and computes an estimate of the asymptotic covariance matrix as in KB(1978).
- "nid" which presumes local (in tau) linearity (in x) of the the conditional quantile functions and computes a Huber sandwich estimate using a local estimate of the sparsity.
- "ker" which uses a kernel estimate of the sandwich as proposed by Powell(1990).
...: additional options passed to rq when fitting the model. See documentation for rq in the quantreg package for more information.

Examples¶

Basic Example with First Differences¶

Attach sample data, in this case a dataset pertaining to the efficiency of plants that convert ammonia to nitric acid. The dependent variable, stack.loss, is 10 times the percentage of ammonia that escaped unconverted:

data(stackloss)

Estimate model:

z.out1 <- zelig(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
                model = "rq", data = stackloss,
                tau = 0.5)

## Warning in readLines(zeligmixedmodels): incomplete final line found on
## '/usr/lib64/R/library/ZeligMultilevel/JSON/zelig5mixedmodels.json'

## How to cite this model in Zelig:
##   Alexander D'Amour. 2008.
##   quantile: Quantile Regression for Continuous Dependent Variables
##   in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
##   "Zelig: Everyone's Statistical Software," http://zeligproject.org/

Summarize regression coefficients:

summary(z.out1)

## Model:
##
## Call: z5$zelig(formula = stack.loss ~ Air.Flow + Water.Temp + Acid.Conc.,
##     data = stackloss, tau = 0.5)
##
## tau: [1] 0.5
##
## Coefficients:
##             coefficients lower bd upper bd
## (Intercept) -39.6899     -41.6197 -29.6775
## Air.Flow      0.8319       0.5128   1.1412
## Water.Temp    0.5739       0.3218   1.4109
## Acid.Conc.   -0.0609      -0.2135  -0.0289
## Next step: Use 'setx' method

Set explanatory variables to their default (mean/mode) values, with high (80th percentile) and low (20th percentile) values for the water temperature variable (the variable that indiates the temperature of water in the plant’s cooling coils):

x.high <- setx(z.out1, Water.Temp = quantile(stackloss$Water.Temp, 0.8))
x.low <- setx(z.out1, Water.Temp = quantile(stackloss$Water.Temp, 0.2))

Generate first differences for the effect of high versus low water temperature on stack loss:

s.out1 <- sim(z.out1, x = x.high, x1 = x.low)

summary(s.out1)

##
##  sim x :
##  -----
## ev
##   mean   sd  50% 2.5% 97.5%
## 1 19.1 1.57 19.2 15.9  22.1
## pv
##   mean   sd  50% 2.5% 97.5%
## 1 19.1 1.81 19.2 15.6  22.7
##
##  sim x1 :
##  -----
## ev
##   mean   sd  50% 2.5% 97.5%
## 1 15.7 1.17 15.7 13.4  17.9
## pv
##   mean   sd  50% 2.5% 97.5%
## 1 15.7 1.31 15.7 13.2  18.2
## fd
##    mean   sd   50%  2.5% 97.5%
## 1 -3.45 2.08 -3.43 -7.52 0.619

plot(s.out1)

Graphs of Quantities of Interest for Quantile Regression

Using Dummy Variables¶

We can estimate a model of unemployment as a function of macroeconomic indicators and fixed effects for each country (see for help with dummy variables). Note that you do not need to create dummy variables, as the program will automatically parse the unique values in the selected variable into discrete levels.

data(macro)
z.out2 <- zelig(unem ~ gdp + trade + capmob + as.factor(country),
                model = "rq", tau = 0.5,
                data = macro)

## Warning in readLines(zeligmixedmodels): incomplete final line found on
## '/usr/lib64/R/library/ZeligMultilevel/JSON/zelig5mixedmodels.json'

## How to cite this model in Zelig:
##   Alexander D'Amour. 2008.
##   quantile: Quantile Regression for Continuous Dependent Variables
##   in Christine Choirat, Christopher Gandrud, James Honaker, Kosuke Imai, Gary King, and Olivia Lau,
##   "Zelig: Everyone's Statistical Software," http://zeligproject.org/

Set values for the explanatory variables, using the default mean/mode values, with country set to the United States and Japan, respectively:

x.US <- setx(z.out2, country = "United States")
x.Japan <- setx(z.out2, country = "Japan")

Simulate quantities of interest:

s.out2 <- sim(z.out2, x = x.US, x1 = x.Japan)

summary(s.out2)

##
##  sim x :
##  -----
## ev
##   mean    sd  50% 2.5% 97.5%
## 1 11.6 0.609 11.6 10.4  12.8
## pv
##   mean   sd  50% 2.5% 97.5%
## 1 11.6 0.61 11.6 10.4  12.8
##
##  sim x1 :
##  -----
## ev
##   mean    sd  50% 2.5% 97.5%
## 1 6.85 0.404 6.85 6.08  7.65
## pv
##   mean    sd  50% 2.5% 97.5%
## 1 6.85 0.404 6.85 6.09  7.65
## fd
##    mean    sd   50%  2.5% 97.5%
## 1 -4.75 0.423 -4.75 -5.56 -3.95

plot(s.out2)

Graphs of Quantities of Interest for Quantile Regression

Estimating Multiple Quantiles¶

Using the Engel dataset (from the quantreg package) on food expenditure as a function of income, we can use the “quantile” model to estimate multiple conditional quantiles:

data(engel, package="quantreg")
z.out3 <- zelig(foodexp ~ income, model = "rq",
                tau = seq(0.1, 0.9, by = 0.1), data = engel)

We can summarize the coefficient fits, or plot them to compare them to the least squares conditional mean estimator.

summary(z.out3)

plot(summary(z.out3))

Set the value of income to the top quartile and the bottom quartile of the income distribution for each fit:

x.bottom <- setx(z.out3, income=quantile(engel$income, 0.25))
x.top <- setx(z.out3, income=quantile(engel$income, 0.75))

Simulate quantities of interest for each fit simultaneously:

s.out3 <- sim(z.out3, x = x.bottom, x1 = x.top)

Summary

summary(s.out3)

Model¶

The quantile estimator is best introduced by considering the sample median estimator and comparing it to the sample mean estimator. To find the mean of a sample, we solve for the quantity which minimizes the sum squared residuals:

Estimating a quantile is similar, but we solve for which minimizes the sum absolute residuals:

One can confirm the equivalence of these optimization problems and the standard mean and median operators by taking the derivative with respect to the argument and setting it to zero.

The relationship between quantile regression and ordinary least squares regression is analogous to the relationship between the sample median and the sample mean, except we are now solving for the conditional median or conditional mean given covariates and a linear functional form. The optimization problems for the sample mean and median are then easily generalized to optimization problems for estimating conditional means or medians by replacing or with a linear combination of covariates :

Equation [median] can be generalized to provide any quantile of the conditional distribution, not just the median. We do this by weighting the aboslute value function asymmetrically in proportion to the requested th quantile:

0)) + (1-\tau)I(Y-X_i'\beta > 0) \nonumber\end{aligned}"/>

We call the asymmetric absolute value function a “check function”. This optimization problem has no closed form solution and is solved via linear programming.

Equation [beta] now lets us define a conditional quantile estimator. Suppose that for a given set of covariates , the response variable has as true conditional probability distribution where can be any probability density function parametrized by a vector of parameters . This density function defines a value , the true th population quantile given . We can write our conditional quantile estimator as:

Where is the vector that solves equation [beta]. Because we solve for the estimator without constructing a likelihood function, it is not straightforward to specify a systematic and stochastic component for conditional quantile estimates. However, systematic and stochastic components do emerge asymptotically in the large- limit. Asymptotically, is normally distributed, and can be written with stochastic component

And systematic components

Where is the number of datapoints, and is the true population density at the th conditional quantile. Zelig uses this asymptotic approximation of stochastic and systematic components in simulation and numerically estimates the population density to derive . The simulation results should thus be treated with caution when using small datasets as both this asymptotic approximation and the population density approximation can break down.

Quantities of Interest¶

The expected value (qi$ev) is the mean of simulations from the stochastic component,

given a draw of from its sampling distribution. Variation in the expected value distribution comes from estimation uncertainty of .
The predicted value (qi$pr) is the result of a single draw from the stochastic component given a draw of from its sampling distribution. The distribution of predicted values should be centered around the same place as the expected values but have larger variance because it includes both estimation uncertainty and fundamental uncertainty.
This model does not support conditional prediction.

Output Values¶

The Zelig object stores fields containing everything needed to rerun the Zelig output, and all the results and simulations as they are generated. In addition to the summary commands demonstrated above, some simply utility functions (known as getters) provide easy access to the raw fields most commonly of use for further investigation.

In the example above z.out$getcoef() returns the estimated coefficients, z.out$getvcov() returns the estimated covariance matrix, and z.out$getpredict() provides predicted values for all observations in the dataset from the analysis.