Smooth Coefficient Kernel Regression
npscoef computes a kernel regression estimate of a one (1)
dimensional dependent variable on \(p\)-variate explanatory data,
using the model \(Y_i = W_{i}^{\prime} \gamma (Z_i) + u_i\) where
\(W_i'=(1,X_i')\), given a set of evaluation
points, training points (consisting of explanatory data and dependent
data), and a bandwidth specification. A bandwidth specification can be
a scbandwidth object, or a bandwidth vector, bandwidth type and
kernel type.
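The model above can be illustrated with a short base-R sketch. It assumes a local-constant estimator, in which \(\gamma(z)\) is obtained by kernel-weighted least squares of \(Y\) on \(W\), with a Gaussian kernel and a fixed bandwidth. The helper scoef_at and the bandwidth h are hypothetical names introduced for illustration only. This is not the package's internal code, and npscoef itself should be used for real work.

```r
# Base-R sketch of a local-constant smooth coefficient estimator.
# Illustrative only: NOT the np package's implementation.
set.seed(42)
n <- 250
x <- runif(n)
z <- runif(n, min = -2, max = 2)
y <- x * exp(z) * (1.0 + rnorm(n, sd = 0.2))

W <- cbind(1, x)   # W_i' = (1, X_i')
h <- 0.3           # hypothetical fixed bandwidth (assumption, not data-driven)

# Estimate gamma(z0) as (W' K W)^{-1} W' K y, where K is the diagonal
# matrix of Gaussian kernel weights centred at z0.
scoef_at <- function(z0, W, y, z, h) {
  k <- dnorm((z - z0) / h)            # kernel weight for each observation
  WtK <- t(W * k)                     # row i of W scaled by k[i], then transposed
  drop(solve(WtK %*% W, WtK %*% y))   # solve the weighted normal equations
}

# One coefficient vector per training point, then fitted values W_i' gamma(Z_i)
gamma_hat <- t(sapply(z, scoef_at, W = W, y = y, z = z, h = h))
colnames(gamma_hat) <- c("intercept", "x")
fit <- rowSums(W * gamma_hat)
```

In this simulated design the coefficient on x varies with z roughly like exp(z), so plotting gamma_hat[, "x"] against z gives a quick visual check of the sketch.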
Usage
npscoef(bws, ...)
# S3 method for class 'formula'
npscoef(bws, data = NULL, newdata = NULL, y.eval =
FALSE, ...)
# S3 method for class 'call'
npscoef(bws, ...)
# Default S3 method
npscoef(bws, txdat, tydat, tzdat, ...)
# S3 method for class 'scbandwidth'
npscoef(bws,
txdat = stop("training data 'txdat' missing"),
tydat = stop("training data 'tydat' missing"),
tzdat = NULL,
exdat,
eydat,
ezdat,
residuals = FALSE,
errors = TRUE,
iterate = TRUE,
maxiter = 100,
tol = .Machine$double.eps,
leave.one.out = FALSE,
betas = FALSE,
...)
Arguments
- bws
a bandwidth specification. This can be set as a scbandwidth object returned from an invocation of npscoefbw, or as a vector of bandwidths, with each element \(i\) corresponding to the bandwidth for column \(i\) in tzdat. If specified as a vector, additional arguments must be supplied as necessary to specify the bandwidth type, kernel types, training data, and so on.
- ...
additional arguments supplied to specify the regression type, bandwidth type, kernel types, selection methods, and so on. To do this, you may specify any of bwscaling, bwtype, ckertype, ckerorder, as described in npscoefbw.
- data
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(bws), typically the environment from which npscoefbw was called.
- newdata
An optional data frame in which to look for evaluation data. If omitted, the training data are used.
- y.eval
If newdata contains dependent data and y.eval = TRUE, np will compute goodness of fit statistics on these data and return them. Defaults to FALSE.
- txdat
a \(p\)-variate data frame of explanatory data (training data), which, by default, populates the columns \(2\) through \(p+1\) of \(W\) in the model equation, and in the absence of tzdat, will also correspond to \(Z\) from the model equation. Defaults to the training data used to compute the bandwidth object.
- tydat
a one (1) dimensional numeric or integer vector of dependent data, each element \(i\) corresponding to each observation (row) \(i\) of txdat. Defaults to the training data used to compute the bandwidth object.
- tzdat
an optionally specified \(q\)-variate data frame of explanatory data (training data), which corresponds to \(Z\) in the model equation. Defaults to the training data used to compute the bandwidth object.
- exdat
a \(p\)-variate data frame of points on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by txdat.
- eydat
a one (1) dimensional numeric or integer vector of the true values of the dependent variable. Optional, and used only to calculate the true errors.
- ezdat
an optionally specified \(q\)-variate data frame of points on which the regression will be estimated (evaluation data), which corresponds to \(Z\) in the model equation. Defaults to the same as txdat.
- errors
a logical value indicating whether or not asymptotic standard errors should be computed and returned in the resulting smoothcoefficient object. Defaults to TRUE.
- residuals
a logical value indicating that you want residuals computed and returned in the resulting smoothcoefficient object. Defaults to FALSE.
- iterate
a logical value indicating whether or not backfitted estimates should be iterated for self-consistency. Defaults to TRUE.
- maxiter
integer specifying the maximum number of times to iterate the backfitted estimates while attempting to make the backfitted estimates converge to the desired tolerance. Defaults to 100.
- tol
desired tolerance on the relative convergence of backfit estimates. Defaults to .Machine$double.eps.
- leave.one.out
a logical value to specify whether or not to compute the leave one out estimates. Will not work if e[xyz]dat is specified. Defaults to FALSE.
- betas
a logical value indicating whether or not estimates of the components of \(\gamma\) should be returned in the smoothcoefficient object along with the regression estimates. Defaults to FALSE.
Value
npscoef returns a smoothcoefficient object. The generic
functions fitted, residuals, coef,
se, and predict,
extract (or generate) estimated values,
residuals, coefficients, bootstrapped standard
errors on estimates, and predictions, respectively, from
the returned object. Furthermore, the functions summary
and plot support objects of this type. The returned object
has the following components:
- eval
evaluation points
- mean
estimation of the regression function (conditional mean) at the evaluation points
- merr
if errors = TRUE, standard errors of the regression estimates
- beta
if betas = TRUE, estimates of the coefficients \(\gamma\) at the evaluation points
- resid
if residuals = TRUE, in-sample or out-of-sample residuals where appropriate (or possible)
- R2
coefficient of determination (Doksum and Samarov (1995))
- MSE
mean squared error
- MAE
mean absolute error
- MAPE
mean absolute percentage error
- CORR
absolute value of Pearson's correlation coefficient
- SIGN
fraction of observations where fitted and observed values agree in sign
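To make the summary measures above concrete, the following base-R sketch computes several of them from a pair of observed and fitted vectors. It uses the standard textbook definitions, which are an assumption here: the package may differ in details such as scaling or the handling of zeros. The vectors y and yhat are made-up values, and R2 is left out because its exact formula is not stated on this page.

```r
# Hypothetical observed and fitted values, for illustration only
y    <- c(1.2, -0.5, 2.0, 0.8, -1.1)
yhat <- c(1.0, -0.2, 1.7, 1.1,  0.3)

# Textbook definitions (assumed; not taken from the np source code)
MSE  <- mean((y - yhat)^2)            # mean squared error
MAE  <- mean(abs(y - yhat))           # mean absolute error
MAPE <- mean(abs((y - yhat) / y))     # mean absolute percentage error
CORR <- abs(cor(y, yhat))             # absolute Pearson correlation
SIGN <- mean(sign(y) == sign(yhat))   # share of observations with matching sign

round(c(MSE = MSE, MAE = MAE, MAPE = MAPE, CORR = CORR, SIGN = SIGN), 3)
```

Note that MAPE is undefined when any observed value is exactly zero, so it is worth checking y before relying on it.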
References
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Cai, Z. (2007), “Trending time-varying coefficient time series models with serially correlated errors,” Journal of Econometrics, 136, 163-188.
Doksum, K. and A. Samarov (1995), “Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression,” The Annals of Statistics, 23, 1443-1473.
Hastie, T. and R. Tibshirani (1993), “Varying-coefficient models,” Journal of the Royal Statistical Society, B 55, 757-796.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Li, Q. and J.S. Racine (2010), “Smooth varying-coefficient estimation and inference for qualitative and quantitative data,” Econometric Theory, 26, 1-31.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Li, Q. and D. Ouyang and J.S. Racine (2013), “Categorical semiparametric varying-coefficient models,” Journal of Applied Econometrics, 28, 551-589.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
Author
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Usage Issues
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not
cbind, since cbind will typically not work as
intended on mixed data types and will coerce the data to the same
type.
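The coercion described above is easy to reproduce in base R. The sketch below uses made-up vectors (x_num, x_fac and x_chr are illustrative names only) to show what cbind does to mixed types and how data.frame keeps each column's type.

```r
# Made-up mixed-type inputs, for illustration only
x_num <- c(1.5, 2.5, 3.5)
x_fac <- factor(c("a", "b", "a"))
x_chr <- c("low", "high", "low")

m_fac <- cbind(x_num, x_fac)   # factor is reduced to its integer codes
m_chr <- cbind(x_num, x_chr)   # everything is coerced to character
d     <- data.frame(x_num, x_fac)   # each column keeps its own type

is.numeric(m_fac)    # the factor labels are lost in the matrix
is.character(m_chr)  # the numeric column is now text
sapply(d, class)     # numeric and factor columns are preserved
```

Because the kernel used for each variable depends on its type, passing the data frame d rather than either matrix lets categorical variables be treated as categorical.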
Support for backfitted bandwidths is experimental and is limited in functionality. The code does not support asymptotic standard errors or out of sample estimates with backfitting.
Examples
if (FALSE) { # \dontrun{
library(np)
# EXAMPLE 1 (INTERFACE=FORMULA):
n <- 250
x <- runif(n)
z <- runif(n, min=-2, max=2)
y <- x*exp(z)*(1.0+rnorm(n,sd = 0.2))
bw <- npscoefbw(y~x|z)
model <- npscoef(bw)
plot(model)
# EXAMPLE 1 (INTERFACE=DATA FRAME):
n <- 250
x <- runif(n)
z <- runif(n, min=-2, max=2)
y <- x*exp(z)*(1.0+rnorm(n,sd = 0.2))
bw <- npscoefbw(xdat=x, ydat=y, zdat=z)
model <- npscoef(bw)
plot(model)
} # }