npuniden.sc.Rdnpuniden.sc computes shape constrained kernel univariate
unconditional density estimates given a vector of continuously
distributed training data and a bandwidth. Lower and upper bounds
[a,b] can be supplied (default is [0,1]) and if a
is set to -Inf there is only one bound on the right, while if
b is set to Inf there is only one bound on the left.
npuniden.sc(X = NULL,
Y = NULL,
h = NULL,
a = 0,
b = 1,
lb = NULL,
ub = NULL,
extend.range = 0,
num.grid = 0,
function.distance = TRUE,
integral.equal = FALSE,
constraint = c("density",
"mono.incr",
"mono.decr",
"concave",
"convex",
"log-concave",
"log-convex"))a required numeric vector of training data lying in \([a,b]\)
an optional numeric vector of evaluation data lying in \([a,b]\)
a bandwidth (\(>0\))
an optional lower bound on the support of X or Y (defaults to 0)
an optional upper bound on the support of X or Y (defaults to 1)
a scalar lower bound (\(\ge 0\)) to be used in conjunction with
constraint="density"
a scalar upper bound (\(\ge 0\) and \(\ge\) lb) to be used
in conjunction with constraint="density"
number specifying the fraction by which the range of the training data
should be extended for the additional grid points (passed to the
function extendrange)
number of additional grid points (in addition to X and
Y) placed on an equi-spaced grid spanning
extendrange(c(X,Y),f=extend.range) (if num.grid=0 no
additional grid points will be used regardless of the value of
extend.range)
a logical value that, if TRUE, minimizes the squared deviation between
the constrained and unconstrained estimates, otherwise, minimizes the
squared deviation between the constrained and unconstrained weights
a logical value, that, if TRUE, adjusts the constrained estimate
to have the same probability mass over the range X, Y,
and the additional grid points
a character string indicating whether the estimate is to be
constrained to be monotonically increasing
(constraint="mono.incr"), decreasing
(constraint="mono.decr"), convex (constraint="convex"),
concave (constraint="concave"), log-convex
(constraint="log-convex"), or log-concave
(constraint="log-concave")
Typical usages are (see below for a complete list of options and also the examples at the end of this help file)
model <- npuniden.sc(X,a=-2,b=3)
npuniden.sc implements a methods for estimating a univariate
density function defined over a continuous random variable in the
presence of bounds subject to a variety of shape constraints. The
bounded estimates use the truncated Gaussian kernel function.
Note that for the log-constrained estimates, the derivative estimate returned is that for the log-constrained estimate not the non-log value of the estimate returned by the function. See Example 5 below hat manually plots the log-density and returned derivative (no transformation is needed when plotting the density estimate itself).
If the quadratic program solver fails to find a solution, the
unconstrained estimate is returned with an immediate warning. Possible
causes to be investigated are undersmoothing, sparsity, and the
presence of non-sample grid points. To investigate the possibility of
undersmoothing try using a larger bandwidth, to investigate sparsity
try decreasing extend.range, and to investigate non-sample grid
points try setting num.grid to 0.
Mean square error performance seems to improve generally when using
additional grid points in the empirical support of X and
Y (i.e., in the observed range of the data sample) but appears
to deteriorate when imposing constraints beyond the empirical support
(i.e., when extend.range is positive). Increasing the number of
additional points beyond a hundred or so appears to have a limited
impact.
The option function.distance=TRUE appears to perform better for
imposing convexity, concavity, log-convexity and log-concavity, while
function.distance=FALSE appears to perform better for imposing
monotonicity, whether increasing or decreasing (based on simulations
for the Beta(s1,s2) distribution with sample size \(n=100\)).
A list with the following elements:
unconstrained density estimate
shape constrained density estimate
asymptotic standard error of the unconstrained density estimate
asymptotic standard error of the shape constrained density estimate
unconstrained derivative estimate (of order 1 or 2 or log thereof)
shape constrained derivative estimate (of order 1 or 2 or log thereof)
unconstrained distribution estimate
shape constrained distribution estimate
the integral of the unconstrained estimate over X, Y, and the additional grid points
the integral of the constrained estimate over X, Y, and the additional grid points
logical, if TRUE solve.QP succeeded, otherwise failed
number of attempts when solve.QP fails (max = 9)
Du, P. and C. Parmeter and J. Racine (2024), “Shape Constrained Kernel PDF and PMF Estimation”, Statistica Sinica, 34 (1), 257–289, https://doi.org/10.5705/ss.202021.0112
The logcondens, LogConDEAD, and scdensity packages,
and the function npuniden.boundary.
if (FALSE) { # \dontrun{
n <- 100
set.seed(42)
## Example 1: N(0,1), constrain the density to lie within lb=.1 and ub=.2
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X,h=h,constraint="density",a=-Inf,b=Inf,lb=.1,ub=.2)
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 2: Beta(5,1), DGP is monotone increasing, impose valid
## restriction
X <- sort(rbeta(n,5,1))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("mono.incr"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="First Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 3: Beta(1,5), DGP is monotone decreasing, impose valid
## restriction
X <- sort(rbeta(n,1,5))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("mono.decr"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="First Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 4: N(0,1), DGP is log-concave, impose invalid concavity
## restriction
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X=X,h=h,a=-Inf,b=Inf,constraint=c("concave"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 45: Beta(3/4,3/4), DGP is convex, impose valid restriction
X <- sort(rbeta(n,3/4,3/4))
h <- npuniden.boundary(X)$h
foo <- npuniden.sc(X=X,h=h,constraint=c("convex"))
par(mfrow=c(1,2))
ylim <- range(c(foo$f.sc,foo$f))
plot(X,foo$f.sc,type="l",ylim=ylim,xlab="X",ylab="Density")
lines(X,foo$f,col=2,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained","Unconstrained"),lty=1:2,col=1:2,bty="n")
## Example 6: N(0,1), DGP is log-concave, impose log-concavity
## restriction
X <- sort(rnorm(n))
h <- npuniden.boundary(X,a=-Inf,b=Inf)$h
foo <- npuniden.sc(X=X,h=h,a=-Inf,b=Inf,constraint=c("log-concave"))
par(mfrow=c(1,2))
ylim <- range(c(log(foo$f.sc),log(foo$f)))
plot(X,log(foo$f.sc),type="l",ylim=ylim,xlab="X",ylab="Log-Density")
lines(X,log(foo$f),col=2,lty=2)
rug(X)
legend("topleft",c("Constrained-log","Unconstrained-log"),lty=1:2,col=1:2,bty="n")
ylim <- range(c(foo$f.sc.deriv,foo$f.deriv))
plot(X,foo$f.sc.deriv,type="l",ylim=ylim,xlab="X",ylab="Second Derivative of Log-Density")
lines(X,foo$f.deriv,col=2,lty=2)
abline(h=0,lty=2)
rug(X)
legend("topleft",c("Constrained-log","Unconstrained-log"),lty=1:2,col=1:2,bty="n")
} # }