Crossvalidation using replicate weights

In each set of replicate weights there will be some clusters that have essentially zero weight. These are used as the test set, with the other clusters used as the training set. Jackknife weights ("JK1","JKn") are very similar to cross-validation at the cluster level; bootstrap weights are similar to bootstrapping for cross-validation.

Usage

withCrossval(design, formula, trainfun, testfun, loss = c("MSE",
"entropy", "AbsError"), intercept, tuning,  nearly_zero=1e-4,...)

Arguments

design: A survey design object (currently only svyrep.design)
formula: Model formula where the left-hand side specifies the outcome vairable and the right-hand side specifies the variables that will be used for prediction
trainfun: Function taking a predictor matrix X, an outcome vector y, a weights vector w, and an element of tuning, and training a model that is returned as some R object.
testfun: Function taking a predictor matrix X and the output from trainfun and returning fitted values for the outcome variable.
loss: Loss function for assessing prediction
intercept: Should the predictor matrix have an intercept added?
tuning: vector of tuning parameters, such as the regularisation parameter in information criteria or the number of predictors. trainfun and testfun will be called with each element of this vector in turn. Use any single-element vector if no tuning parameter is needed
nearly_zero: test-set threshold on the scale of replicate weight divided by sampling weight.
...: future expansion

Value

A number

References

Iparragirre, A., Lumley, T., Barrio, I., & Arostegui, I. (2023). Variable selection with LASSO regression for complex survey data. Stat, 12(1), e578.

Examples

data(api)
rclus1<-as.svrepdesign(svydesign(id=~dnum, weights=~pw, data=apiclus1,
fpc=~fpc))


withCrossval(rclus1, api00~api99+ell+stype,
  trainfun=function(X,y,w,tuning) lm.wfit(X,y,w),
  testfun=function(X, trainfit,tuning) X%*%coef(trainfit),
  intercept=TRUE,loss="MSE",tuning=1)
#> [1] 723.04


## More realistic example using lasso
## tuning parameter is number of variables in model
##
##  library(glmnet)
##  ftrain=function(X,y,w,tuning) {
##     m<-glmnet(X,y,weights=w)
##     lambda<-m$lambda[min(which(m$df>=tuning))]
##   list(m,lambda)
##     }
##   ftest=function(X, trainfit, tuning){
##     predict(trainfit[[1]], newx=X, s=trainfit[[2]])
##   }
##
##  withCrossval(rclus1, api00~api99+ell+stype+mobility+enroll,
##    trainfun=ftrain,
##    testfun=ftest,
##    intercept=FALSE,loss="MSE",
##    tuning=0:3)
##
## [1] 11445.2379  9649.1150   800.0742   787.4171
##
## Models with two or three predictors are about equally good

Usage

Arguments

Value

References

See also

Examples