Extended log-F Distribution Family Function

Maximum likelihood estimation of the 1-parameter extended log-F distribution.

Usage

extlogF1(tau = c(0.25, 0.5, 0.75), parallel = TRUE ~ 0,
          seppar = 0, tol0 = -0.001,
          llocation = "identitylink", ilocation = NULL,
          lambda.arg = NULL, scale.arg = 1, ishrinkage = 0.95,
          digt = 4, idf.mu = 3, imethod = 1)

Arguments

tau

Numeric, the desired quantiles: a strictly increasing sequence, each value of which must lie in $(0, 1)$. The default values are the three quartiles, matching lms.bcn.
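The requirements on tau can be checked before fitting. The helper below, valid_tau(), is hypothetical and not part of VGAM; it is a minimal sketch that tests exactly the conditions stated above (numeric, finite, inside the open interval, strictly increasing).

```r
# Hypothetical helper (not a VGAM function): TRUE if tau satisfies the
# conditions stated for extlogF1(): numeric, finite, each value in (0, 1),
# and strictly increasing.
valid_tau <- function(tau) {
  is.numeric(tau) && length(tau) >= 1 &&
    all(is.finite(tau)) &&
    all(tau > 0 & tau < 1) &&
    all(diff(tau) > 0)   # strictly increasing; trivially TRUE for one value
}

valid_tau(c(0.25, 0.5, 0.75))  # the default quartiles
valid_tau(c(0.5, 0.25))        # not increasing
valid_tau(c(0, 0.5))           # 0 lies outside (0, 1)
```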

parallel

Similar to alaplace1, applying to the location parameters. One can try to fix up the quantile-crossing problem after fitting the model by calling fix.crossing; use is.crossing to see whether there is a problem. The default for parallel is totally FALSE, i.e., FALSE for every variable including the intercept. Quantile crossing can occur when values of tau are too close together, given the data. How the quantiles are modelled with respect to the covariates also has a big effect: if they are too flexible or too inflexible, then the problem is likely to occur. For example, using bs with df = 10 is likely to create problems.

Setting parallel = TRUE results in a totally parallel model: all quantiles are parallel, and this assumption can be too strong for some data sets. In contrast, fix.crossing repairs only the quantiles that cross. Hence the values of tau must be chosen carefully for the original fit.
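To make "quantile crossing" concrete, here is an illustration in base R. It assumes the fitted quantiles are held in a matrix with one column per tau value, in increasing order of tau, as the Examples section suggests for fitted(fit). The function crosses() is hypothetical and is not VGAM's is.crossing(); in particular, treating ties as crossing is an assumption.

```r
# Illustration only (not VGAM's is.crossing()). Rows are observations,
# columns are fitted quantiles in increasing order of tau. A crossing means
# some row is not strictly increasing from left to right.
crosses <- function(fq) {
  fq <- as.matrix(fq)
  if (ncol(fq) < 2) return(FALSE)
  # Compare each column with the one to its left.
  any(fq[, -1, drop = FALSE] <= fq[, -ncol(fq), drop = FALSE])
}

ok  <- cbind(q25 = c(1.0, 2.0, 3.0), q75 = c(1.5, 2.5, 3.5))
bad <- cbind(q25 = c(1.0, 2.6, 3.0), q75 = c(1.5, 2.5, 3.5))  # row 2 crosses
crosses(ok)
crosses(bad)
```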

seppar, tol0

Numeric, both of unit length; seppar is the separation parameter and must be nonnegative, and tol0 is the shift parameter (negative values are allowed; see below). If seppar is positive, then any crossing quantile is penalized by the cube of the crossing amount multiplied by seppar, and this penalty is subtracted from the log-likelihood. The shift parameter ensures that the result is strictly noncrossing when seppar is large enough; otherwise, if tol0 = 0 and seppar is large, then the crossing quantiles remain crossed: the offending amount becomes small but never exactly 0. Informally, tol0 pushes the adjustment far enough that is.crossing should return FALSE.
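The penalty described above can be sketched as follows. This assumes it is summed over observations and over adjacent pairs of tau values; the exact form used inside extlogF1() may differ, and crossing_penalty() is a hypothetical name.

```r
# A minimal sketch of the crossing penalty: seppar times the cube of the
# crossing amount, summed (an assumption) over rows and adjacent tau pairs.
# fq: matrix of fitted quantiles, one column per tau, tau increasing.
crossing_penalty <- function(fq, seppar) {
  fq <- as.matrix(fq)
  if (ncol(fq) < 2 || seppar == 0) return(0)
  # Amount by which each lower quantile exceeds the next higher one;
  # zero where that pair does not cross.
  overlap <- pmax(fq[, -ncol(fq), drop = FALSE] - fq[, -1, drop = FALSE], 0)
  seppar * sum(overlap^3)
}

fq <- cbind(c(1.0, 2.6, 3.0), c(1.5, 2.5, 3.5))  # row 2 crosses by 0.1
crossing_penalty(fq, seppar = 10)  # approximately 10 * 0.1^3 = 0.01
# The penalized log-likelihood would then be loglik - crossing_penalty(fq, seppar).
```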

If tol0 is positive, then it is the shift in absolute terms. However, tol0 may be assigned a negative value, in which case it is interpreted relative to the midspread of the response: tol0 <- abs(tol0) * midspread. Regardless, fit@extra$tol0 holds the amount in absolute terms.
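The rule for converting tol0 to an absolute shift is short enough to write out. The sketch below takes the midspread to be the interquartile range computed by IQR() with its default quantile type; that choice is an assumption, since VGAM may compute the midspread differently. absolute_tol0() is a hypothetical name.

```r
# Sketch of the tol0 rule: a negative tol0 is relative to the midspread
# (interquartile range) of the response; a nonnegative one is absolute.
# Using IQR() with its default quantile type is an assumption.
absolute_tol0 <- function(tol0, y) {
  if (tol0 < 0) abs(tol0) * IQR(y) else tol0
}

y <- c(1, 2, 3, 4, 5)       # IQR(y) is 2
absolute_tol0(-0.001, y)    # 0.002, relative to the midspread
absolute_tol0(0.05, y)      # 0.05, already absolute
```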

If avoiding the quantile-crossing problem is of concern, try increasing seppar to decrease the amount of crossing. It is probably best to choose the smallest value of seppar for which is.crossing returns FALSE. Increasing tol0, in relative or absolute terms, allows the fitted quantiles to move further apart. However, the values of tau must be taken into account when choosing tol0.

llocation, ilocation

See Links for more choices and CommonVGAMffArguments for more information. Choosing loglink should usually be good for counts, and choosing logitlink should be reasonable for proportions. However, avoid choosing tau values close to the boundary; for example, if $p_0$ is the proportion of 0s, then choose $p_0 \ll \tau$. For proportions, grouped data are much better than ungrouped data, and the bigger the groups, the finer the granularity, so that the empirical proportion can approximate tau more closely.
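For count responses, the advice to keep $p_0 \ll \tau$ can be checked before fitting. The helper tau_near_zero_mass() below is hypothetical and not a VGAM function. The factor of 2 used to define "well above $p_0$" is an arbitrary choice made for illustration.

```r
# Hypothetical check (not a VGAM function): return the tau values that are
# not well above p0, the sample proportion of zeros. "Well above" is taken
# here, arbitrarily, to mean greater than factor * p0.
tau_near_zero_mass <- function(y, tau, factor = 2) {
  p0 <- mean(y == 0)
  tau[tau <= factor * p0]
}

set.seed(1)
y <- rpois(1000, lambda = 3)   # about 5% zeros
tau_near_zero_mass(y, c(0.05, 0.25, 0.5, 0.75))   # flags only 0.05
```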

lambda.arg

Positive tuning parameter that controls the sharpness of the cusp. As it approaches 0, the distribution is probably very similar to the asymmetric Laplace distribution (see dalap). The default is to choose the value internally. If scale.arg increases, then lambda.arg probably needs to increase accordingly. If lambda.arg is too large, then the empirical quantiles may not be very close to tau. If lambda.arg is too close to 0, then the convergence behaviour will be poor, local solutions may be found, and numerical problems may arise in general. Monitoring convergence is recommended when varying lambda.arg.
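The role of lambda.arg can be seen from the density itself. The sketch below writes the ELF log-density in the form we understand Fasiolo et al. (2020) to use; this is an assumption, and VGAM's internal parameterization may differ. It checks that the density integrates to 1 and that, for small lambda, it approaches the asymmetric Laplace density written in its quantile-regression (check-loss) form. delf_log(), dald_log() and softplus() are hypothetical names.

```r
# Assumed ELF form, with z = (y - location) / scale:
#   f(y) = exp((1 - tau) z) (1 + exp(z / lambda))^(-lambda) /
#          (lambda * scale * B(lambda (1 - tau), lambda tau))
softplus <- function(x) pmax(x, 0) + log1p(exp(-abs(x)))  # log(1 + exp(x)), overflow-safe

delf_log <- function(y, location = 0, scale = 1, tau = 0.5, lambda = 0.5) {
  z <- (y - location) / scale
  (1 - tau) * z - lambda * softplus(z / lambda) -
    log(lambda * scale) - lbeta(lambda * (1 - tau), lambda * tau)
}

# Asymmetric Laplace log-density in check-loss form:
#   tau (1 - tau) / scale * exp(-z (tau - I(z < 0)))
dald_log <- function(y, location = 0, scale = 1, tau = 0.5) {
  z <- (y - location) / scale
  log(tau * (1 - tau) / scale) - z * (tau - (z < 0))
}

# The ELF density integrates to (numerically) 1 ...
integrate(function(y) exp(delf_log(y, tau = 0.25)), -Inf, Inf)$value
# ... and approaches the ALD as lambda shrinks towards 0 (the cusp sharpens).
yy <- c(-2, -0.5, 0.5, 2)
delf_log(yy, tau = 0.25, lambda = 0.01) - dald_log(yy, tau = 0.25)  # all near 0
```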

scale.arg

Positive scale parameter, sometimes called scale. The transformation used is (y - location) / scale. This function should be okay for response variables having a moderate range (0–100, say), but if the range is very different from this, then experimenting with this argument is a good idea.

ishrinkage, idf.mu, digt

Similar to alaplace1.

imethod

Initialization method. Either the value 1, 2, or .... See CommonVGAMffArguments for more information.

Details

This is an experimental family function for quantile regression. Fasiolo et al. (2020) propose an extended log-F distribution (ELF); however, this family function estimates only the location parameter. The distribution has a scale parameter which can be supplied (its default value is unity). One location parameter is estimated for each tau value, and these are the estimated quantiles. For quantile regression it is not necessary to estimate the scale parameter, since the log-likelihood function is triangle shaped.
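Why the scale need not be estimated can be illustrated with the triangle-shaped check ("pinball") loss of the asymmetric Laplace limit, not the ELF itself. Dividing by the scale only rescales the objective, so the minimizing location is unchanged and equals a sample tau-quantile. check_loss() is a hypothetical name; with n = 101 and tau = 0.25 the minimizer is unique and coincides with quantile(y, tau, type = 1).

```r
# Summed check loss for a candidate location; its graph in location is
# piecewise linear (triangle shaped), and scale only rescales it.
check_loss <- function(location, y, tau, scale = 1) {
  z <- (y - location) / scale
  sum(z * (tau - (z < 0)))
}

set.seed(123)
y <- rnorm(101)
tau <- 0.25
m1 <- optimize(check_loss, range(y), y = y, tau = tau, scale = 1)$minimum
m5 <- optimize(check_loss, range(y), y = y, tau = tau, scale = 5)$minimum
# The two minimizers agree (up to optimize()'s tolerance) with the sample quantile.
c(m1 = m1, m5 = m5, sample_quantile = unname(quantile(y, tau, type = 1)))
```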

The ELF is used as an approximation to the asymmetric Laplace distribution (ALD). The latter cannot be estimated properly using Fisher scoring/IRLS, but the ELF holds promise because it has continuous derivatives and therefore fewer problems with the regularity conditions. Because the ELF is fitted to data to obtain an empirical result, the convergence behaviour may not be gentle and smooth. Hence there is a function-specific control function called extlogF1.control, which has settings such as stepsize = 0.5 and maxits = 100. Slowing down the rate of convergence has been found to produce greater stability during the estimation process. Regardless, convergence should always be monitored carefully.
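The effect of a reduced step size can be illustrated on a toy problem: estimating a single ELF location by damped Newton-Raphson, where each update moves only a fraction stepsize of the full Newton step. This is a sketch under the ELF form assumed earlier in this page's examples and is not VGAM's Fisher scoring/IRLS code; fit_elf_location() is hypothetical, and its stepsize and maxits arguments only mirror the names mentioned above.

```r
# Toy damped Newton-Raphson for one ELF location (scale and lambda fixed).
# Log-likelihood up to a constant: sum((1 - tau) z - lambda * log(1 + exp(z / lambda))),
# with z = (y - mu) / scale.
fit_elf_location <- function(y, tau, lambda = 0.1, scale = 1,
                             stepsize = 0.5, maxits = 100, eps = 1e-10) {
  mu <- median(y)                                  # starting value
  for (iter in seq_len(maxits)) {
    p <- plogis((y - mu) / (scale * lambda))
    score <- sum(p - (1 - tau)) / scale            # d loglik / d mu
    hess  <- -sum(p * (1 - p)) / (lambda * scale^2)  # d^2 loglik / d mu^2 (< 0)
    step  <- -stepsize * score / hess              # damped Newton step
    mu <- mu + step
    if (abs(step) < eps) break
  }
  list(location = mu, iterations = iter,
       score = sum(plogis((y - mu) / (scale * lambda)) - (1 - tau)) / scale)
}

set.seed(1)
y <- rnorm(1000)
res <- fit_elf_location(y, tau = 0.25)
c(elf = res$location, sample = unname(quantile(y, 0.25)))  # close, not identical
res$iterations
```

With stepsize = 1 this is plain Newton-Raphson; halving the step trades speed for a more cautious path, which is the kind of stability the text describes.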

This function accepts a vector response but not a matrix response.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm and vgam.

References

Fasiolo, M., Wood, S. N., Zaffran, M., Nedellec, R. and Goude, Y. (2020). Fast calibrated additive quantile regression. J. Amer. Statist. Assoc., in press.

Yee, T. W. (2020). On quantile regression based on the 1-parameter extended log-F distribution. In preparation.

Author

Thomas W. Yee

Note

Changes will occur in the future to fine-tune things. In general, setting trace = TRUE is strongly encouraged, because it is necessary to check that convergence occurs properly.

If seppar > 0 then logLik(fit) will return the penalized log-likelihood.

Examples

# Not run:
library(VGAM)     # vglm(), extlogF1(), is.crossing()
library(splines)  # bs()
set.seed(1)
nn <- 1000; mytau <- c(0.25, 0.75)
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata,
                   y1 = 1 + x2 + rnorm(nn, sd = exp(-1)),
                   y2 = cos(x2) / (1 + abs(x2)) + rnorm(nn, sd = exp(-1)))
# Add trace = TRUE to the vglm() calls to monitor convergence
fit1 <- vglm(y1 ~ x2, extlogF1(tau = mytau), data = edata)
fit2 <- vglm(y2 ~ bs(x2, 6), extlogF1(tau = mytau), data = edata)
coef(fit1, matrix = TRUE)
fit2@extra$percentile  # Empirical percentiles here
summary(fit2)
c(is.crossing(fit1), is.crossing(fit2))
head(fitted(fit1))
plot(y2 ~ x2, data = edata, col = "blue")
matlines(edata$x2, fitted(fit2), col = "orange", lty = 1, lwd = 2)