Kernel Shape Constrained Bounded Univariate Density Estimation
npuniden.sc.Rdnpuniden.sc computes shape constrained kernel univariate
unconditional density estimates given a vector of continuously
distributed training data and a bandwidth. Lower and upper bounds
[a,b] can be supplied (default is [0,1]) and if a
is set to -Inf there is only one bound on the right, while if
b is set to Inf there is only one bound on the left.
Usage
npuniden.sc(X = NULL,
Y = NULL,
h = NULL,
a = 0,
b = 1,
lb = NULL,
ub = NULL,
extend.range = 0,
num.grid = 0,
function.distance = TRUE,
integral.equal = FALSE,
constraint = c("density",
"mono.incr",
"mono.decr",
"concave",
"convex",
"log-concave",
"log-convex"))Arguments
- X
a required numeric vector of training data lying in \([a,b]\)
- Y
an optional numeric vector of evaluation data lying in \([a,b]\)
- h
a bandwidth (\(>0\))
- a
an optional lower bound on the support of
XorY(defaults to 0)- b
an optional upper bound on the support of
XorY(defaults to 1)- lb
a scalar lower bound (\(\ge 0\)) to be used in conjunction with
constraint="density"- ub
a scalar upper bound (\(\ge 0\) and \(\ge\)
lb) to be used in conjunction withconstraint="density"- extend.range
number specifying the fraction by which the range of the training data should be extended for the additional grid points (passed to the function
extendrange)- num.grid
number of additional grid points (in addition to
XandY) placed on an equi-spaced grid spanningextendrange(c(X,Y),f=extend.range)(ifnum.grid=0no additional grid points will be used regardless of the value ofextend.range)- function.distance
a logical value that, if
TRUE, minimizes the squared deviation between the constrained and unconstrained estimates, otherwise, minimizes the squared deviation between the constrained and unconstrained weights- integral.equal
a logical value, that, if
TRUE, adjusts the constrained estimate to have the same probability mass over the rangeX,Y, and the additional grid points- constraint
a character string indicating whether the estimate is to be constrained to be monotonically increasing (
constraint="mono.incr"), decreasing (constraint="mono.decr"), convex (constraint="convex"), concave (constraint="concave"), log-convex (constraint="log-convex"), or log-concave (constraint="log-concave")
Details
Typical usages are (see below for a complete list of options and also the examples at the end of this help file)
model <- npuniden.sc(X,a=-2,b=3)
npuniden.sc implements a methods for estimating a univariate
density function defined over a continuous random variable in the
presence of bounds subject to a variety of shape constraints. The
bounded estimates use the truncated Gaussian kernel function.
Note that for the log-constrained estimates, the derivative estimate returned is that for the log-constrained estimate not the non-log value of the estimate returned by the function. See Example 5 below hat manually plots the log-density and returned derivative (no transformation is needed when plotting the density estimate itself).
If the quadratic program solver fails to find a solution, the
unconstrained estimate is returned with an immediate warning. Possible
causes to be investigated are undersmoothing, sparsity, and the
presence of non-sample grid points. To investigate the possibility of
undersmoothing try using a larger bandwidth, to investigate sparsity
try decreasing extend.range, and to investigate non-sample grid
points try setting num.grid to 0.
Mean square error performance seems to improve generally when using
additional grid points in the empirical support of X and
Y (i.e., in the observed range of the data sample) but appears
to deteriorate when imposing constraints beyond the empirical support
(i.e., when extend.range is positive). Increasing the number of
additional points beyond a hundred or so appears to have a limited
impact.
The option function.distance=TRUE appears to perform better for
imposing convexity, concavity, log-convexity and log-concavity, while
function.distance=FALSE appears to perform better for imposing
monotonicity, whether increasing or decreasing (based on simulations
for the Beta(s1,s2) distribution with sample size \(n=100\)).
Value
A list with the following elements:
- f
unconstrained density estimate
- f.sc
shape constrained density estimate
- se.f
asymptotic standard error of the unconstrained density estimate
- se.f.sc
asymptotic standard error of the shape constrained density estimate
- f.deriv
unconstrained derivative estimate (of order 1 or 2 or log thereof)
- f.sc.deriv
shape constrained derivative estimate (of order 1 or 2 or log thereof)
- F
unconstrained distribution estimate
- F.sc
shape constrained distribution estimate
- integral.f
the integral of the unconstrained estimate over
X,Y, and the additional grid points- integral.f.sc
the integral of the constrained estimate over
X,Y, and the additional grid points- solve.QP
logical, if
TRUEsolve.QPsucceeded, otherwise failed- attempts
number of attempts when
solve.QPfails (max = 9)
References
Du, P. and C. Parmeter and J. Racine (2024), “Shape Constrained Kernel PDF and PMF Estimation”, Statistica Sinica, 34 (1), 257-289, doi:10.5705/ss.202021.0112
Author
Jeffrey S. Racine racinej@mcmaster.ca
See also
The logcondens, LogConDEAD, and scdensity packages,
and the function npuniden.boundary.
Examples
if (FALSE) { # \dontrun{
n <- 100
set.seed(42)
## Example 1: N(0,1), constrain the density to lie within lb=.1 and ub=.2
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X,h=h,constraint="density",a=-Inf,b=Inf,lb=.1,ub=.2)
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 2: Beta(5,1), DGP is monotone increasing, impose valid
## restriction
X <- sort(rbeta(n,5,1))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("mono.incr"))
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="First Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 3: Beta(1,5), DGP is monotone decreasing, impose valid
## restriction
X <- sort(rbeta(n,1,5))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("mono.decr"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="First Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 4: N(0,1), DGP is log-concave, impose invalid concavity
## restriction
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X=X,h=h,a=-Inf,b=Inf,constraint=c("concave"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 45: Beta(3/4,3/4), DGP is convex, impose valid restriction
X <- sort(rbeta(n,3/4,3/4))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("convex"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 6: N(0,1), DGP is log-concave, impose log-concavity
## restriction
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X=X,h=h,a=-Inf,b=Inf,constraint=c("log-concave"))
par(mfrow=c(1,2))
ylim <- range(c(log(foo$f.sc),log(foo$f)))
plot(X,log(foo$f.sc),type="l",ylim=ylim,xlab="X",ylab="Log-Density")
lines(X,log(foo$f),col=2,lty=2)
rug(X)
legend("topleft",c("Constrained-log","Unconstrained-log"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative of Log-Density")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained-log","Unconstrained-log"),lty=1:2,col=1:2,bty="n")
par(oldpar)
} # }