LMS Quantile Regression with a Yeo-Johnson Transformation to Normality
LMS quantile regression with the Yeo-Johnson transformation to normality. This family function is experimental; the LMS-BCN family function is recommended instead.
Usage
lms.yjn(percentiles = c(25, 50, 75), zero = c("lambda", "sigma"),
llambda = "identitylink", lsigma = "loglink",
idf.mu = 4, idf.sigma = 2,
ilambda = 1, isigma = NULL, rule = c(10, 5),
yoffset = NULL, diagW = FALSE, iters.diagW = 6)
lms.yjn2(percentiles = c(25, 50, 75), zero = c("lambda", "sigma"),
llambda = "identitylink", lmu = "identitylink", lsigma = "loglink",
idf.mu = 4, idf.sigma = 2, ilambda = 1.0,
isigma = NULL, yoffset = NULL, nsimEIM = 250)
Arguments
- percentiles
A numerical vector containing values between 0 and 100, which are the percentiles at which the quantiles are to be estimated. They will be returned as `fitted values'.
- zero
See
lms.bcn. See CommonVGAMffArguments for more information.
- llambda, lmu, lsigma
See
lms.bcn.
- idf.mu, idf.sigma
See
lms.bcn.
- ilambda, isigma
See
lms.bcn.
- rule
Number of abscissae used in the Gaussian integration scheme to work out elements of the weight matrices. The values given are the possible choices, with the first value being the default. The larger the value, the more accurate the approximation is likely to be, at the cost of greater computational expense.
- yoffset
A value to be added to the response y, for the purpose of centering the response before fitting the model to the data. The default value,
NULL, means -median(y) is used, so that the response actually used has median zero. The yoffset is saved on the object and used during prediction.
- diagW
Logical. This argument is offered because the expected information matrix may not be positive-definite. Using the diagonal elements of this matrix results in a higher chance of it being positive-definite; however, convergence will be very slow.
If
TRUE, then the first iters.diagW iterations will use the diagonal of the expected information matrix. The default is FALSE, meaning faster convergence.
- iters.diagW
Integer. Number of iterations in which the diagonal elements of the expected information matrix are used. Only used if
diagW = TRUE.
- nsimEIM
See
CommonVGAMffArguments for more information.
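As a hedged illustration of how the arguments above fit together, one might call the family function with non-default settings inside vgam (the argument values here are arbitrary illustrations, not recommendations; bmi.nz is the data set used in the Examples below):

```r
library(VGAM)
# Report the 10th/50th/90th percentiles, suppress the default
# median-centering of the response, and use the diagonal-EIM
# workaround for a possibly non-positive-definite EIM.
fit2 <- vgam(BMI ~ s(age, df = 4),
             lms.yjn(percentiles = c(10, 50, 90),
                     yoffset = 0,                 # default NULL uses -median(y)
                     diagW = TRUE, iters.diagW = 6),
             data = bmi.nz, trace = TRUE)
head(fitted(fit2))  # columns 10%, 50%, 90%
```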
Details
Given a value of the covariate, this function applies a
Yeo-Johnson transformation to the response to best obtain
normality. The parameters chosen to do this are estimated by
maximum likelihood or penalized maximum likelihood.
The function lms.yjn2() estimates the expected information
matrices using simulation (and is consequently slower) while
lms.yjn() uses numerical integration.
If one function fails, try the other.
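The Yeo-Johnson transformation referred to above can be written down directly. The sketch below is a plain-R transcription of the transformation as published by Yeo and Johnson (2000), for exposition only; it is not VGAM's internal implementation, and the function name is chosen here for illustration.

```r
# psi(lambda, y) of Yeo and Johnson (2000); lambda is a scalar,
# y may contain both positive and negative values.
yeo.johnson.sketch <- function(y, lambda) {
  out <- numeric(length(y))
  p <- y >= 0
  out[p] <- if (lambda != 0)
    ((y[p] + 1)^lambda - 1) / lambda else log1p(y[p])
  out[!p] <- if (lambda != 2)
    -((1 - y[!p])^(2 - lambda) - 1) / (2 - lambda) else -log1p(-y[!p])
  out
}

# lambda = 1 is the identity transformation:
yeo.johnson.sketch(c(-2, 0, 3), 1)  # c(-2, 0, 3)
```

Note how the negative-y branch mirrors the Box-Cox form with power 2 - lambda, which is what allows a response containing negative values (see the Note section).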
Value
An object of class "vglmff"
(see vglmff-class).
The object is used by modelling functions
such as vglm
and vgam.
References
Yeo, I.-K. and Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87, 954–959.
Yee, T. W. (2004). Quantile regression via vector generalized additive models. Statistics in Medicine, 23, 2295–2315.
Yee, T. W. (2002). An implementation for regression quantile estimation. Pages 3–14 in: Haerdle, W. and Ronz, B. (eds.), Proceedings in Computational Statistics COMPSTAT 2002. Heidelberg: Physica-Verlag.
Note
The response may contain both positive and negative values. In contrast, the LMS-Box-Cox-normal and LMS-Box-Cox-gamma methods only handle a positive response because the Box-Cox transformation cannot handle negative values.
Some other notes can be found at lms.bcn.
Warning
The computations are not simple, therefore convergence may fail. In that case, try different starting values.
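One way to supply different starting values is through the ilambda and isigma arguments documented above; the values below are arbitrary illustrations, not recommendations.

```r
library(VGAM)
# Retry a failed fit with explicit initial values for lambda and sigma.
fit.try <- vgam(BMI ~ s(age, df = 4),
                lms.yjn(ilambda = 0.5, isigma = 1),
                data = bmi.nz, trace = TRUE)
```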
The generic function predict, when applied to a
lms.yjn fit, does not add back the yoffset value.
As described above, this family function is experimental and the LMS-BCN family function is recommended instead.
Examples
fit <- vgam(BMI ~ s(age, df = 4), lms.yjn, bmi.nz, trace = TRUE)
#> VGAM s.vam loop 1 : loglikelihood = -1361.2243
#> VGAM s.vam loop 2 : loglikelihood = -1357.5289
#> VGAM s.vam loop 3 : loglikelihood = -1357.3829
#> VGAM s.vam loop 4 : loglikelihood = -1357.3542
#> VGAM s.vam loop 5 : loglikelihood = -1357.3539
#> VGAM s.vam loop 6 : loglikelihood = -1357.3519
#> VGAM s.vam loop 7 : loglikelihood = -1357.3521
#> VGAM s.vam loop 8 : loglikelihood = -1357.3519
#> VGAM s.vam loop 9 : loglikelihood = -1357.3519
head(predict(fit))
#> lambda mu loglink(sigma)
#> 1 0.8078551 -0.77249525 1.422234
#> 2 0.8078551 -0.05619456 1.422234
#> 3 0.8078551 0.42445223 1.422234
#> 4 0.8078551 -0.50654859 1.422234
#> 5 0.8078551 0.90505962 1.422234
#> 6 0.8078551 -0.07976647 1.422234
head(fitted(fit))
#> 25% 50% 75%
#> 1 23.07743 25.37026 28.41762
#> 2 23.63333 26.04344 29.34193
#> 3 24.01604 26.53978 29.98527
#> 4 23.28199 25.61305 28.75568
#> 5 24.40811 27.07187 30.64528
#> 6 23.61477 26.02016 29.31084
head(bmi.nz)
#> age BMI
#> 1 31.52966 22.77107
#> 2 39.38045 27.70033
#> 3 43.38940 28.18127
#> 4 34.84894 25.08380
#> 5 53.81990 26.46388
#> 6 39.17002 36.19648
# Person 1 is near the lower quartile of BMI amongst people his age
head(cdf(fit))
#> 1 2 3 4 5 6
#> 0.2201431 0.6410319 0.6331574 0.4435149 0.4470689 0.9646218
if (FALSE) { # \dontrun{
# Quantile plot
par(bty = "l", mar = c(5, 4, 4, 3) + 0.1, xpd = TRUE)
qtplot(fit, percentiles = c(5, 50, 90, 99), main = "Quantiles",
xlim = c(15, 90), las = 1, ylab = "BMI", lwd = 2, lcol = 4)
# Density plot
ygrid <- seq(15, 43, len = 100) # BMI ranges
par(mfrow = c(1, 1), lwd = 2)
(Z <- deplot(fit, x0 = 20, y = ygrid, xlab = "BMI", col = "black",
main = "PDFs at Age = 20 (black), 42 (red) and 55 (blue)"))
Z <- deplot(fit, x0 = 42, y = ygrid, add = TRUE, llty = 2, col = "red")
Z <- deplot(fit, x0 = 55, y = ygrid, add = TRUE, llty = 4, col = "blue",
Attach = TRUE)
with(Z@post, deplot) # Contains PDF values; == Z@post$deplot
} # }