Kernel Copula Estimation with Mixed Data Types

npcopula implements the nonparametric mixed data kernel copula approach of Racine (2015) for an arbitrary number of dimensions

Usage

npcopula(bws,
         data,
         u = NULL,
         n.quasi.inv = 1000,
         er.quasi.inv = 1)

Arguments

bws: an unconditional joint distribution (npudistbw) or joint density (npudensbw) bandwidth object (if bws is delivered via npudistbw the copula distribution is estimated, while if bws is delivered via npudensbw the copula density is estimated)
data: a data frame containing variables used to construct bws
u: an optional matrix of real numbers lying in [0,1], each column of which corresponds to the vector of uth quantile values desired for each variable in the copula (otherwise the u values returned are those corresponding to the sample realizations)
n.quasi.inv: number of grid points generated when u is provided in order to compute the quasi-inverse of each marginal distribution (see details)
er.quasi.inv: number passed to extendrange when u is provided specifying the fraction by which the data range should be extended when constructing the grid used to compute the quasi-inverse (see details)

Value

npcopula returns an object of type data.frame with the following components

copula: the copula (bandwidth object obtained from npudistbw) or copula density (bandwidth object obtained from npudensbw)
u: the matrix of marginal u values associated with the sample realizations (u=NULL) or those created via expand.grid when u is provided
data: the matrix of marginal quantiles constructed when u is provided (data returned has the same names as data inputted)

References

Nelsen, R. B. (2006), An Introduction to Copulas, Second Edition, Springer-Verlag.

Racine, J.S. (2015), “Mixed Data Kernel Copulas,” Empirical Economics, 48, 37-59.

Author

Jeffrey S. Racine racinej@mcmaster.ca

Details

npcopula computes the nonparametric copula or copula density using inversion (Nelsen (2006), page 51). For the inversion approach, we exploit Sklar's theorem (Corollary 2.3.7, Nelsen (2006)) to produce copulas directly from the joint distribution function using \(C(u,v) = H(F^{-1}(u),G^{-1}(v))\) rather than the typical approach that instead uses \(H(x,y) = C(F(x),G(y))\). Whereas the latter requires kernel density estimation on a d-dimensional unit hypercube which necessitates the use of boundary correction methods, the former does not.

Note that if u is provided then expand.grid is called on u. As the dimension increases this can become unwieldy and potentially consume an enormous amount of memory unless the number of grid points is kept very small. Given that computing the copula on a grid is typically done for graphical purposes, providing u is typically done for two-dimensional problems only. Even here, however, providing a grid of length 100 will expand into a matrix of dimension 10000 by 2 which, though not memory intensive, may be computationally burdensome.

The ‘quasi-inverse’ is computed via Definition 2.3.6 from Nelsen (2006). We compute an equi-quantile grid on the range of the data of length n.quasi.inv/2. We then extend the range of the data by the factor er.quasi.inv and compute an equi-spaced grid of points of length n.quasi.inv/2 (e.g. using the default er.quasi.inv=1 we go from the minimum data value minus \(1\times\) the range to the maximum data value plus \(1\times\) the range for each marginal). We then take these two grids, concatenate and sort, and these form the final grid of length n.quasi.inv for computing the quasi-inverse.

Note that if u is provided and any elements of (the columns of) u are such that they lie beyond the respective values of F for the evaluation data for the respective marginal, such values are reset to the minimum/maximum values of F for the respective marginal. It is therefore prudent to inspect the values of u returned by npcopula when u is provided.

Note that copula are only defined for data of type numeric or ordered.

Usage Issues

See the example below for proper usage.

Examples

if (FALSE) { # \dontrun{
## Example 1: Bivariate Mixed Data

require(MASS)

set.seed(42)

## Simulate correlated Gaussian data (rho(x,y)=0.99)

n <- 1000
n.eval <- 100
rho <- 0.99
mu <- c(0,0)
Sigma <- matrix(c(1,rho,rho,1),2,2)
mydat <- mvrnorm(n=n, mu, Sigma)
mydat <- data.frame(x=mydat[,1],
                    y=ordered(as.integer(cut(mydat[,2],
                      quantile(mydat[,2],seq(0,1,by=.1)),
                      include.lowest=TRUE))-1))
q.min <- 0.0
q.max <- 1.0
grid.seq <- seq(q.min,q.max,length=n.eval)
grid.dat <- cbind(grid.seq,grid.seq)

## Estimate the copula (bw object obtained from npudistbw())

bw.cdf <- npudistbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.cdf,data=mydat,u=grid.dat)

## Plot the copula


contour(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
        xlab="u1",
        ylab="u2",
        main="Copula Contour")

persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
      ticktype="detailed",
      xlab="u1",
      ylab="u2",
      zlab="Copula",zlim=c(0,1))

## Plot the empirical copula

copula.emp <- npcopula(bws=bw.cdf,data=mydat)

plot(copula.emp$u1,copula.emp$u2,
     xlab="u1",
     ylab="u2",
     cex=.25,
     main="Empirical Copula")

## Estimate the copula density (bw object obtained from npudensbw())

bw.pdf <- npudensbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.pdf,data=mydat,u=grid.dat)

## Plot the copula density

persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
      ticktype="detailed",
      xlab="u1",
      ylab="u2",
      zlab="Copula Density")

## Example 2: Bivariate Continuous Data

require(MASS)

set.seed(42)

## Simulate correlated Gaussian data (rho(x,y)=0.99)

n <- 1000
n.eval <- 100
rho <- 0.99
mu <- c(0,0)
Sigma <- matrix(c(1,rho,rho,1),2,2)
mydat <- mvrnorm(n=n, mu, Sigma)
mydat <- data.frame(x=mydat[,1],y=mydat[,2])

q.min <- 0.0
q.max <- 1.0
grid.seq <- seq(q.min,q.max,length=n.eval)
grid.dat <- cbind(grid.seq,grid.seq)

## Estimate the copula (bw object obtained from npudistbw())

bw.cdf <- npudistbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.cdf,data=mydat,u=grid.dat)

## Plot the copula

contour(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
        xlab="u1",
        ylab="u2",
        main="Copula Contour")

persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
      ticktype="detailed",
      xlab="u1",
      ylab="u2",
      zlab="Copula",
      zlim=c(0,1))

## Plot the empirical copula

copula.emp <- npcopula(bws=bw.cdf,data=mydat)

plot(copula.emp$u1,copula.emp$u2,
     xlab="u1",
     ylab="u2",
     cex=.25,
     main="Empirical Copula")

## Estimate the copula density (bw object obtained from npudensbw())

bw.pdf <- npudensbw(~x+y,data=mydat)
copula <- npcopula(bws=bw.pdf,data=mydat,u=grid.dat)

## Plot the copula density

persp(grid.seq,grid.seq,matrix(copula$copula,n.eval,n.eval),
      ticktype="detailed",
      xlab="u1",
      ylab="u2",
      zlab="Copula Density")
} # }