Kernel Copula Estimation with Mixed Data Types
np.copula.Rdnpcopula implements the nonparametric mixed data kernel copula
approach of Racine (2015) for an arbitrary number of dimensions
Arguments
- bws
an unconditional joint distribution (
npudistbw) or joint density (npudensbw) bandwidth object (ifbwsis delivered vianpudistbwthe copula distribution is estimated, while ifbwsis delivered vianpudensbwthe copula density is estimated)- data
a data frame containing variables used to construct
bws- u
an optional matrix of real numbers lying in [0,1], each column of which corresponds to the vector of uth quantile values desired for each variable in the copula (otherwise the u values returned are those corresponding to the sample realizations)
- n.quasi.inv
number of grid points generated when
uis provided in order to compute the quasi-inverse of each marginal distribution (see details)- er.quasi.inv
number passed to
extendrangewhenuis provided specifying the fraction by which the data range should be extended when constructing the grid used to compute the quasi-inverse (see details)
Value
npcopula returns an object of type data.frame
with the following components
- copula
the copula (bandwidth object obtained from
npudistbw) or copula density (bandwidth object obtained fromnpudensbw)- u
the matrix of marginal u values associated with the sample realizations (
u=NULL) or those created viaexpand.gridwhenuis provided- data
the matrix of marginal quantiles constructed when
uis provided (datareturned has the same names asdatainputted)
References
Nelsen, R. B. (2006), An Introduction to Copulas, Second Edition, Springer-Verlag.
Racine, J.S. (2015), “Mixed Data Kernel Copulas,” Empirical Economics, 48, 37-59.
Author
Jeffrey S. Racine racinej@mcmaster.ca
Details
npcopula computes the nonparametric copula or copula density
using inversion (Nelsen (2006), page 51). For the inversion approach,
we exploit Sklar's theorem (Corollary 2.3.7, Nelsen (2006)) to produce
copulas directly from the joint distribution function using
\(C(u,v) = H(F^{-1}(u),G^{-1}(v))\) rather than the typical approach
that instead uses \(H(x,y) = C(F(x),G(y))\). Whereas the latter
requires kernel density estimation on a d-dimensional unit hypercube
which necessitates the use of boundary correction methods, the former
does not.
Note that if u is provided then expand.grid is
called on u. As the dimension increases this can become
unwieldy and potentially consume an enormous amount of memory unless
the number of grid points is kept very small. Given that computing the
copula on a grid is typically done for graphical purposes, providing
u is typically done for two-dimensional problems only. Even
here, however, providing a grid of length 100 will expand into a
matrix of dimension 10000 by 2 which, though not memory intensive, may
be computationally burdensome.
The ‘quasi-inverse’ is computed via Definition 2.3.6 from
Nelsen (2006). We compute an equi-quantile grid on the range of the
data of length n.quasi.inv/2. We then extend the range of the
data by the factor er.quasi.inv and compute an equi-spaced grid
of points of length n.quasi.inv/2 (e.g. using the default
er.quasi.inv=1 we go from the minimum data value minus
\(1\times\) the range to the maximum data value plus
\(1\times\) the range for each marginal). We then take these two
grids, concatenate and sort, and these form the final grid of length
n.quasi.inv for computing the quasi-inverse.
Note that if u is provided and any elements of (the columns of)
u are such that they lie beyond the respective values of
F for the evaluation data for the respective marginal, such
values are reset to the minimum/maximum values of F for the
respective marginal. It is therefore prudent to inspect the values of
u returned by npcopula when u is provided.
Note that copula are only defined for data of type
numeric or ordered.
Examples
if (FALSE) { # \dontrun{
## Example 1: Bivariate Mixed Data
require(MASS)
set.seed(42)
## Simulate correlated Gaussian data (rho(x,y)=0.99)
n <- 1000
n.eval <- 100
rho <- 0.99
mu <- c(0,0)
Sigma <- matrix(c(1,rho,rho,1),2,2)
mydat <- mvrnorm(n=n, mu, Sigma)
mydat <- data.frame(x=mydat[,1],
y=ordered(as.integer(cut(mydat[,2],
quantile(mydat[,2],seq(0,1,by=.1)),
include.lowest=TRUE))-1))
q.min <- 0.0
q.max <- 1.0
grid.seq <- seq(q.min,q.max,length=n.eval)
grid.dat <- cbind(grid.seq,grid.seq)
## Estimate the copula (bw object obtained from npudistbw())
bw.cdf <- npudistbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.cdf,data=mydat,u=grid.dat)
## Plot the copula
contour(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
xlab="u1",
ylab="u2",
main="Copula Contour")
persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
ticktype="detailed",
xlab="u1",
ylab="u2",
zlab="Copula",zlim=c(0,1))
## Plot the empirical copula
copula.emp <- npcopula(bws=bw.cdf,data=mydat)
plot(copula.emp$u1,copula.emp$u2,
xlab="u1",
ylab="u2",
cex=.25,
main="Empirical Copula")
## Estimate the copula density (bw object obtained from npudensbw())
bw.pdf <- npudensbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.pdf,data=mydat,u=grid.dat)
## Plot the copula density
persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
ticktype="detailed",
xlab="u1",
ylab="u2",
zlab="Copula Density")
## Example 2: Bivariate Continuous Data
require(MASS)
set.seed(42)
## Simulate correlated Gaussian data (rho(x,y)=0.99)
n <- 1000
n.eval <- 100
rho <- 0.99
mu <- c(0,0)
Sigma <- matrix(c(1,rho,rho,1),2,2)
mydat <- mvrnorm(n=n, mu, Sigma)
mydat <- data.frame(x=mydat[,1],y=mydat[,2])
q.min <- 0.0
q.max <- 1.0
grid.seq <- seq(q.min,q.max,length=n.eval)
grid.dat <- cbind(grid.seq,grid.seq)
## Estimate the copula (bw object obtained from npudistbw())
bw.cdf <- npudistbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.cdf,data=mydat,u=grid.dat)
## Plot the copula
contour(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
xlab="u1",
ylab="u2",
main="Copula Contour")
persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
ticktype="detailed",
xlab="u1",
ylab="u2",
zlab="Copula",
zlim=c(0,1))
## Plot the empirical copula
copula.emp <- npcopula(bws=bw.cdf,data=mydat)
plot(copula.emp$u1,copula.emp$u2,
xlab="u1",
ylab="u2",
cex=.25,
main="Empirical Copula")
## Estimate the copula density (bw object obtained from npudensbw())
bw.pdf <- npudensbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.pdf,data=mydat,u=grid.dat)
## Plot the copula density
persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
ticktype="detailed",
xlab="u1",
ylab="u2",
zlab="Copula Density")
} # }