Extended log-F Distribution Family Function
Maximum likelihood estimation of the 1-parameter extended log-F distribution.
Usage
extlogF1(tau = c(0.25, 0.5, 0.75), parallel = TRUE ~ 0,
seppar = 0, tol0 = -0.001,
llocation = "identitylink", ilocation = NULL,
lambda.arg = NULL, scale.arg = 1, ishrinkage = 0.95,
digt = 4, idf.mu = 3, imethod = 1)
Arguments
- tau
Numeric, the desired quantiles. A strictly increasing sequence; each value must be in \((0, 1)\). The default values are the three quartiles, matching lms.bcn.
- parallel
Similar to alaplace1, applying to the location parameters. One can try to fix up the quantile-crossing problem after fitting the model by calling fix.crossing. Use is.crossing to see if there is a problem. The default for parallel is totally FALSE, i.e., FALSE for every variable including the intercept. Quantile-crossing can occur when values of tau are too close, given the data. How the quantiles are modelled with respect to the covariates also has a big effect, e.g., if they are too flexible or too inflexible then the problem is likely to occur. For example, using bs with df = 10 is likely to create problems.
Setting parallel = TRUE results in a totally parallel model; all quantiles are parallel and this assumption can be too strong for some data sets. Instead, fix.crossing only repairs the quantiles that cross. So one must carefully choose the values of tau for the original fit.
- seppar, tol0
Numeric, both of unit length, the separation and shift parameters. seppar must be nonnegative; tol0 may be negative, as described below. If seppar is positive then any crossing quantile is penalized by the difference cubed multiplied by seppar. The log-likelihood subtracts the penalty. The shift parameter ensures that the result is strictly noncrossing when seppar is large enough; otherwise, if tol0 = 0 and seppar is large, then the crossing quantiles remain crossed: the offending amount becomes small but never exactly 0. Informally, tol0 pushes the adjustment enough so that is.crossing should return FALSE.
If tol0 is positive then that is the shift in absolute terms. But tol0 may be assigned a negative value, in which case it is interpreted multiplicatively relative to the midspread of the response; tol0 <- abs(tol0) * midspread. Regardless, fit@extra$tol0 is the amount in absolute terms.
If avoiding the quantile-crossing problem is of concern to you, try increasing seppar to decrease the amount of crossing. Probably it is best to choose the smallest value of seppar so that is.crossing returns FALSE. Increasing tol0 relatively or absolutely means the fitted quantiles are allowed to move apart more. However, tau must be considered when choosing tol0.
- llocation, ilocation
See Links for more choices and CommonVGAMffArguments for more information. Choosing loglink should usually be good for counts, and choosing logitlink should be reasonable for proportions. However, avoid choosing tau values close to the boundary; for example, if \(p_0\) is the proportion of 0s then choose \(p_0 \ll \tau\). For proportions, grouped data is much better than ungrouped data, and the bigger the groups the finer the granularity, so that the empirical proportion can approximate tau more closely.
- lambda.arg
Positive tuning parameter which controls the sharpness of the cusp. The limit as it approaches 0 is probably very similar to dalap. The default is to choose the value internally. If scale.arg increases, then probably lambda.arg needs to increase accordingly. If lambda.arg is too large then the empirical quantiles may not be very close to tau. If lambda.arg is too close to 0 then convergence behaviour will be poor: local solutions may be found and numerical problems may arise. Monitoring convergence is recommended when varying lambda.arg.
- scale.arg
Positive scale parameter, sometimes called scale. The transformation used is (y - location) / scale. This function should be okay for response variables having a moderate range (0–100, say), but if the range is very different from this then experimenting with this argument is a good idea.
- ishrinkage, idf.mu, digt
Similar to alaplace1.
- imethod
Initialization method. Either the value 1, 2, or .... See CommonVGAMffArguments for more information.
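The seppar and tol0 descriptions above can be made concrete with a small base R sketch. Everything here is an illustrative assumption rather than VGAM's internal code: the helper names abs_tol0, has_crossing and crossing_penalty are hypothetical, the midspread is taken to be the interquartile range, and the penalty is assumed to apply to adjacent quantiles with the shift added before cubing.

```r
# Hypothetical helpers; names and exact formulas are illustrative
# assumptions, not VGAM internals.

# Convert tol0 to an absolute shift: a negative value is scaled by the
# midspread (assumed here to be the interquartile range) of the response.
abs_tol0 <- function(tol0, y) {
  if (tol0 >= 0) tol0 else abs(tol0) * IQR(y)
}

# TRUE if any row of a fitted-quantile matrix (columns ordered by
# increasing tau) has a quantile below its lower neighbour.
has_crossing <- function(q) {
  any(q[, -1, drop = FALSE] < q[, -ncol(q), drop = FALSE])
}

# Cubic penalty on crossing amounts between adjacent quantiles
# (assumed form: seppar times the sum of cubed positive differences).
crossing_penalty <- function(q, seppar, tol0 = 0) {
  gap <- q[, -ncol(q), drop = FALSE] - q[, -1, drop = FALSE] + tol0
  seppar * sum(pmax(gap, 0)^3)
}

y <- c(1, 2, 3, 4, 5)
q <- cbind(c(1, 2, 3), c(2, 1.5, 4))  # the second row crosses by 0.5
shift   <- abs_tol0(-0.001, y)        # relative tol0 turned into absolute
crossed <- has_crossing(q)
pen     <- crossing_penalty(q, seppar = 10)
```

On a fitted model, is.crossing and fit@extra$tol0 report the corresponding quantities directly; the sketch is only meant to show how a negative tol0 and a positive seppar interact.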
Details
This is an experimental family function for quantile regression.
Fasiolo et al. (2020) propose an extended log-F distribution
(ELF);
however, this family function only estimates the location parameter.
The distribution has a scale parameter which can be supplied
(default value is unity).
One location parameter is estimated for each tau value
and these are the estimated quantiles.
For quantile regression it is not necessary to estimate
the scale parameter since the log-likelihood function is
triangle shaped.
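As a rough base R illustration of why the scale need not be estimated, the sketch below uses the check (pinball) loss, which is the negative of the triangle-shaped log-likelihood kernel of the asymmetric Laplace distribution. The names check_loss, total_loss and best_at are hypothetical, and the simulated data are arbitrary. Dividing the residuals by a fixed scale only rescales the loss, so the minimizing location is unchanged and coincides with the sample quantile.

```r
# Check (pinball) loss: tau * z for z >= 0 and (tau - 1) * z for z < 0.
check_loss <- function(z, tau) z * (tau - (z < 0))

set.seed(1)
y   <- rnorm(1001)
tau <- 0.25

# Total loss at a candidate location mu, for a given fixed scale.
total_loss <- function(mu, scale) sum(check_loss((y - mu) / scale, tau))

# The loss is piecewise linear in mu, so a minimizer lies at a data point;
# searching over the observations is therefore exact.
best_at <- function(scale) y[which.min(sapply(y, total_loss, scale = scale))]

mu1  <- best_at(1)    # minimizer with scale = 1
mu10 <- best_at(10)   # minimizer with scale = 10
q25  <- unname(quantile(y, tau))
```

Here mu1, mu10 and q25 agree, which is the sense in which the location estimate alone carries the quantile.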
The ELF is used as an approximation of the asymmetric Laplace
distribution (ALD).
The latter cannot be estimated properly using Fisher scoring/IRLS
but the ELF holds promise because it has continuous derivatives
and therefore fewer problems with the regularity conditions.
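The smoothing can be sketched in base R using the ELF loss as formulated in Fasiolo et al. (2020), where the kink of the check loss is replaced by a scaled softplus term. This is an assumption about the parameterization: dextlogF may be parameterized differently, and the function names pinball and elf_loss are hypothetical. The sketch shows the smooth loss approaching the check loss as lambda shrinks.

```r
# Check (pinball) loss of the ALD, with a kink at zero.
pinball <- function(z, tau) z * (tau - (z < 0))

# Smooth ELF-type loss (assumed form, following Fasiolo et al. 2020):
#   (tau - 1) * z / scale + lambda * log(1 + exp(z / (lambda * scale)))
# The softplus term is computed in a numerically stable way.
elf_loss <- function(z, tau, lambda, scale = 1) {
  u <- z / (lambda * scale)
  softplus <- pmax(u, 0) + log1p(exp(-abs(u)))
  (tau - 1) * z / scale + lambda * softplus
}

z   <- seq(-3, 3, by = 0.5)
tau <- 0.25
lambdas <- c(1, 0.1, 0.01)

# Largest absolute gap between the smooth loss and the check loss
# over the grid, for each lambda.
gap <- sapply(lambdas, function(lam)
  max(abs(elf_loss(z, tau, lam) - pinball(z, tau))))
```

Under this form the gap is largest at the cusp, where it equals lambda * log(2), so a smaller lambda gives a sharper cusp at the cost of a less smooth objective.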
Because the ELF is fitted to data to obtain an
empirical result, the convergence behaviour may not be gentle
and smooth.
Hence there is a function-specific control function called
extlogF1.control which has something like
stepsize = 0.5 and maxits = 100.
It has been found that
slowing down the rate of convergence produces greater
stability during the estimation process.
Regardless, convergence should always be monitored carefully.
This function accepts a vector response but not a matrix response.
Value
An object of class "vglmff" (see vglmff-class).
The object is used by modelling functions such as vglm
and vgam.
References
Fasiolo, M., Wood, S. N., Zaffran, M., Nedellec, R. and Goude, Y. (2020). Fast calibrated additive quantile regression. J. Amer. Statist. Assoc., in press.
Yee, T. W. (2020). On quantile regression based on the 1-parameter extended log-F distribution. In preparation.
Note
Changes will occur in the future to fine-tune things.
In general,
setting trace = TRUE is strongly encouraged because it is
necessary to check that convergence occurs properly.
If seppar > 0 then logLik(fit) will return the
penalized log-likelihood.
See also
dextlogF,
is.crossing,
fix.crossing,
eCDF,
vglm.control,
logF,
alaplace1,
dalap,
lms.bcn.
Examples
if (FALSE) { # \dontrun{
library(VGAM)  # Provides vglm(), extlogF1() and is.crossing()
nn <- 1000; mytau <- c(0.25, 0.75)
edata <- data.frame(x2 = sort(rnorm(nn)))
edata <- transform(edata, y1 = 1 + x2 + rnorm(nn, sd = exp(-1)),
y2 = cos(x2) / (1 + abs(x2)) + rnorm(nn, sd = exp(-1)))
fit1 <- vglm(y1 ~ x2, extlogF1(tau = mytau), data = edata) # trace = TRUE
fit2 <- vglm(y2 ~ bs(x2, 6), extlogF1(tau = mytau), data = edata)
coef(fit1, matrix = TRUE)
fit2@extra$percentile # Empirical percentiles here
summary(fit2)
c(is.crossing(fit1), is.crossing(fit2))
head(fitted(fit1))
plot(y2 ~ x2, edata, col = "blue")
matlines(with(edata, x2), fitted(fit2), col = "orange", lty = 1, lwd = 2)
} # }