Kernel Consistent Density Equality Test with Mixed Data Types
npdeneqtest implements a consistent integrated squared
difference test for equality of densities as described in Li, Maasoumi,
and Racine (2009).
Usage
npdeneqtest(x = NULL,
y = NULL,
bw.x = NULL,
bw.y = NULL,
boot.num = 399,
random.seed = 42,
...)
Arguments
- x,y
data frames for the two samples for which one wishes to test equality of densities. The variables in each data frame must be the same (i.e. have identical names).
- bw.x,bw.y
optional bandwidth objects for x,y
- boot.num
an integer value specifying the number of bootstrap replications to use. Defaults to 399.
- random.seed
an integer used to seed R's random number generator. This is to ensure replicability. Defaults to 42.
- ...
additional arguments supplied to specify the bandwidth type, kernel types, and so on. This is used if you do not pass in bandwidth objects and you do not desire the default behaviours. To do this, you may specify any of
bwscaling,bwtype,ckertype,ckerorder,ukertype,okertype.
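The two ways of controlling the smoothing described above can be sketched as follows. This is a minimal illustration, assuming the np package is installed and that bandwidth objects returned by npudensbw are what bw.x and bw.y expect. The sample names, the sample size, and the choice of ckertype = "epanechnikov" are illustrative only.

```r
library(np)

set.seed(1234)
n <- 100
sample.A <- data.frame(x = rnorm(n))
sample.B <- data.frame(x = rnorm(n))

## Option 1: no bandwidth objects; forward a kernel option through `...`
test.dots <- npdeneqtest(sample.A, sample.B,
                         boot.num = 99,
                         ckertype = "epanechnikov")

## Option 2: compute bandwidth objects first, then pass them in
## (assumption: npudensbw is the appropriate bandwidth selector here)
bw.A <- npudensbw(dat = sample.A)
bw.B <- npudensbw(dat = sample.B)
test.bw <- npdeneqtest(sample.A, sample.B,
                       bw.x = bw.A, bw.y = bw.B,
                       boot.num = 99)
```

Precomputing the bandwidth objects may be convenient when the same samples are tested repeatedly, since the bandwidth selection step is then done only once.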
Value
npdeneqtest returns an object of type deneqtest with the
following components
- Tn
the (standardized) statistic Tn
- In
the (unstandardized) statistic In
- Tn.bootstrap
contains the bootstrap replications of Tn
- In.bootstrap
contains the bootstrap replications of In
- Tn.P
the P-value of the Tn statistic
- In.P
the P-value of the In statistic
- boot.num
number of bootstrap replications
summary supports objects of type deneqtest.
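As a brief sketch of how the returned components might be inspected, the following stores the result and reads its fields with `$`. It assumes the np package is installed. The 5% threshold and the variable names out and reject are illustrative choices, not part of the package.

```r
library(np)

set.seed(1234)
n <- 100
sample.A <- data.frame(x = rnorm(n))
sample.B <- data.frame(x = rchisq(n, df = 5))

out <- npdeneqtest(sample.A, sample.B, boot.num = 99)

out$Tn                      # standardized statistic
out$Tn.P                    # bootstrap P-value for Tn
length(out$Tn.bootstrap)    # one replication per bootstrap draw

summary(out)                # formatted report of the test

## Illustrative decision rule at the 5% level (threshold is an assumption)
reject <- out$Tn.P < 0.05
```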
References
Li, Q., E. Maasoumi, and J.S. Racine (2009), “A Nonparametric Test for Equality of Distributions with Mixed Categorical and Continuous Data,” Journal of Econometrics, 148, pp. 186-200.
Author
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Details
npdeneqtest computes the integrated squared density difference
between the estimated densities/probabilities of two samples having
identical variables/datatypes. See Li, Maasoumi, and Racine (2009) for
details.
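To make the idea of an integrated squared density difference concrete, here is a rough univariate sketch in base R. It is not the estimator used by npdeneqtest: it uses stats::density with its default bandwidth and a simple Riemann sum on a common grid, and the helper name isd.sketch and the padding of three standard deviations are hypothetical choices made for illustration.

```r
## Hypothetical helper: approximate the integral of (f - g)^2 for two
## univariate continuous samples. Illustration only; npdeneqtest uses a
## different estimator and handles mixed data types.
isd.sketch <- function(x, y, n.grid = 512) {
  pad  <- 3 * sd(c(x, y))
  from <- min(x, y) - pad
  to   <- max(x, y) + pad
  ## Evaluate both kernel density estimates on the same grid
  f.hat <- density(x, from = from, to = to, n = n.grid)
  g.hat <- density(y, from = from, to = to, n = n.grid)
  dx <- f.hat$x[2] - f.hat$x[1]
  ## Riemann-sum approximation of the integrated squared difference
  sum((f.hat$y - g.hat$y)^2) * dx
}

set.seed(42)
x1 <- rnorm(200)
x2 <- rnorm(200)
y  <- rchisq(200, df = 5)

isd.equal   <- isd.sketch(x1, x2)  # same distribution: small value
isd.unequal <- isd.sketch(x1, y)   # different distributions: larger value
```

A value near zero is consistent with equal densities, but the raw number has no calibrated scale on its own, which is why the test relies on a standardized statistic and bootstrap P-values.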
Usage Issues
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not
cbind, since cbind will typically not work as
intended on mixed data types and will coerce the data to the same
type.
It is crucial that both data frames have the same variable names.
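The coercion issue and the naming requirement above can be checked with a few lines of base R. The variable names below are illustrative only.

```r
a <- c(0.5, 1.2, -0.3)
b <- factor(c("low", "high", "low"))

## cbind() discards the factor class and keeps only the integer codes,
## so the result is a plain numeric matrix
m <- cbind(a, b)
is.numeric(m)
m[, "b"]

## data.frame() keeps each column's own type
df <- data.frame(a = a, b = b)
sapply(df, class)

## Check that two samples share the same variable names before testing
sample.A <- data.frame(a = rnorm(5), b = factor(rbinom(5, 2, 0.5)))
sample.B <- data.frame(a = rnorm(5), b = factor(rbinom(5, 2, 0.5)))
same.names <- identical(names(sample.A), names(sample.B))
```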
Examples
if (FALSE) { # \dontrun{
set.seed(1234)
## Distributions are equal
n <- 250
sample.A <- data.frame(x=rnorm(n))
sample.B <- data.frame(x=rnorm(n))
npdeneqtest(sample.A,sample.B,boot.num=99)
Sys.sleep(5)
## Distributions are unequal
sample.A <- data.frame(x=rnorm(n))
sample.B <- data.frame(x=rchisq(n,df=5))
npdeneqtest(sample.A,sample.B,boot.num=99)
## Mixed datatypes, distributions are equal
sample.A <- data.frame(a=rnorm(n),b=factor(rbinom(n,2,.5)))
sample.B <- data.frame(a=rnorm(n),b=factor(rbinom(n,2,.5)))
npdeneqtest(sample.A,sample.B,boot.num=99)
Sys.sleep(5)
## Mixed datatypes, distributions are unequal
sample.A <- data.frame(a=rnorm(n),b=factor(rbinom(n,2,.5)))
sample.B <- data.frame(a=rnorm(n,sd=10),b=factor(rbinom(n,2,.25)))
npdeneqtest(sample.A,sample.B,boot.num=99)
} # }