ContingencyTests.RdTesting the independence of two nominal or ordered factors.
# S3 method for class 'formula'
chisq_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for class 'table'
chisq_test(object, ...)
# S3 method for class 'IndependenceProblem'
chisq_test(object, ...)
# S3 method for class 'formula'
cmh_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for class 'table'
cmh_test(object, ...)
# S3 method for class 'IndependenceProblem'
cmh_test(object, ...)
# S3 method for class 'formula'
lbl_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for class 'table'
lbl_test(object, ...)
# S3 method for class 'IndependenceProblem'
lbl_test(object, ...)a formula of the form y ~ x | block where y and x are
factors and block is an optional factor for stratification.
an optional data frame containing the variables in the model formula.
an optional vector specifying a subset of observations to be used. Defaults
to NULL.
an optional formula of the form ~ w defining integer valued case
weights for each observation. Defaults to NULL, implying equal
weight for all observations.
an object inheriting from classes "table" or
"IndependenceProblem".
further arguments to be passed to independence_test().
chisq_test(), cmh_test() and lbl_test() provide the
Pearson chi-squared test, the generalized Cochran-Mantel-Haenszel test and the
linear-by-linear association test. A general description of these methods is
given by Agresti (2002).
The null hypothesis of independence, or conditional independence given
block, between y and x is tested.
If y and/or x are ordered factors, the default scores,
1:nlevels(y) and 1:nlevels(x), respectively, can be altered
using the scores argument (see independence_test()); this
argument can also be used to coerce nominal factors to class "ordered".
(lbl_test() coerces to class "ordered" under any circumstances.)
If both y and x are ordered factors, a linear-by-linear
association test is computed and the direction of the alternative hypothesis
can be specified using the alternative argument. For the Pearson
chi-squared test, this extension was given by Yates (1948) who also discussed
the situation when either the response or the covariate is an ordered factor;
see also Cochran (1954) and Armitage (1955) for the particular case when
y is a binary factor and x is ordered. The Mantel-Haenszel
statistic (Mantel and Haenszel, 1959) was similarly extended by Mantel (1963)
and Landis, Heyman and Koch (1978).
The conditional null distribution of the test statistic is used to obtain
\(p\)-values and an asymptotic approximation of the exact distribution is
used by default (distribution = "asymptotic"). Alternatively, the
distribution can be approximated via Monte Carlo resampling or computed
exactly for univariate two-sample problems by setting distribution to
"approximate" or "exact", respectively. See
asymptotic(), approximate() and
exact() for details.
An object inheriting from class "IndependenceTest".
The exact versions of the Pearson chi-squared test and the generalized Cochran-Mantel-Haenszel test do not necessarily result in the same \(p\)-value as Fisher's exact test (Davis, 1986).
Agresti, A. (2002). Categorical Data Analysis, Second Edition. Hoboken, New Jersey: John Wiley & Sons.
Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics 11(3), 375–386. doi:10.2307/3001775
Cochran, W.G. (1954). Some methods for strengthening the common \(\chi^2\) tests. Biometrics 10(4), 417–451. doi:10.2307/3001616
Davis, L. J. (1986). Exact tests for \(2 \times 2\) contingency tables. The American Statistician 40(2), 139–141. doi:10.1080/00031305.1986.10475377
Landis, J. R., Heyman, E. R. and Koch, G. G. (1978). Average partial association in three-way contingency tables: a review and discussion of alternative tests. International Statistical Review 46(3), 237–254. doi:10.2307/1402373
Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22(4), 719–748. doi:10.1093/jnci/22.4.719
Mantel, N. (1963). Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association 58(303), 690–700. doi:10.1080/01621459.1963.10500879
Yates, F. (1948). The analysis of contingency tables with groupings based on quantitative characters. Biometrika 35(1/2), 176–181. doi:10.1093/biomet/35.1-2.176
## Example data
## Davis (1986, p. 140)
davis <- matrix(
c(3, 6,
2, 19),
nrow = 2, byrow = TRUE
)
davis <- as.table(davis)
## Asymptotic Pearson chi-squared test
chisq_test(davis)
#>
#> Asymptotic Pearson Chi-Squared Test
#>
#> data: Var2 by Var1 (A, B)
#> chi-squared = 2.5714, df = 1, p-value = 0.1088
#>
chisq.test(davis, correct = FALSE) # same as above
#> Warning: Chi-squared approximation may be incorrect
#>
#> Pearson's Chi-squared test
#>
#> data: davis
#> X-squared = 2.5714, df = 1, p-value = 0.1088
#>
## Approximative (Monte Carlo) Pearson chi-squared test
ct <- chisq_test(davis,
distribution = approximate(nresample = 10000))
pvalue(ct) # standard p-value
#> [1] 0.294
#> 99 percent confidence interval:
#> 0.2823153 0.3058818
#>
midpvalue(ct) # mid-p-value
#> [1] 0.15665
#> 99 percent confidence interval:
#> 0.1474022 0.1661244
#>
pvalue_interval(ct) # p-value interval
#> p_0 p_1
#> 0.0193 0.2940
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
#> [1] 0.0193
## Exact Pearson chi-squared test (Davis, 1986)
## Note: disagrees with Fisher's exact test
ct <- chisq_test(davis,
distribution = "exact")
pvalue(ct) # standard p-value
#> [1] 0.2860301
midpvalue(ct) # mid-p-value
#> [1] 0.1527409
pvalue_interval(ct) # p-value interval
#> p_0 p_1
#> 0.01945181 0.28603006
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
#> [1] 0.01945181
fisher.test(davis)
#>
#> Fisher's Exact Test for Count Data
#>
#> data: davis
#> p-value = 0.1432
#> alternative hypothesis: true odds ratio is not equal to 1
#> 95 percent confidence interval:
#> 0.410317 65.723900
#> sample estimates:
#> odds ratio
#> 4.462735
#>
## Laryngeal cancer data
## Agresti (2002, p. 107, Tab. 3.13)
cancer <- matrix(
c(21, 2,
15, 3),
nrow = 2, byrow = TRUE,
dimnames = list(
"Treatment" = c("Surgery", "Radiation"),
"Cancer" = c("Controlled", "Not Controlled")
)
)
cancer <- as.table(cancer)
## Exact Pearson chi-squared test (Agresti, 2002, p. 108, Tab. 3.14)
## Note: agrees with Fishers's exact test
(ct <- chisq_test(cancer,
distribution = "exact"))
#>
#> Exact Pearson Chi-Squared Test
#>
#> data: Cancer by Treatment (Surgery, Radiation)
#> chi-squared = 0.59915, p-value = 0.6384
#>
midpvalue(ct) # mid-p-value
#> [1] 0.5006832
pvalue_interval(ct) # p-value interval
#> p_0 p_1
#> 0.3629407 0.6384258
size(ct, alpha = 0.05) # test size at alpha = 0.05 using the p-value
#> [1] 0.01143318
fisher.test(cancer)
#>
#> Fisher's Exact Test for Count Data
#>
#> data: cancer
#> p-value = 0.6384
#> alternative hypothesis: true odds ratio is not equal to 1
#> 95 percent confidence interval:
#> 0.2089115 27.5538747
#> sample estimates:
#> odds ratio
#> 2.061731
#>
## Homework conditions and teacher's rating
## Yates (1948, Tab. 1)
yates <- matrix(
c(141, 67, 114, 79, 39,
131, 66, 143, 72, 35,
36, 14, 38, 28, 16),
byrow = TRUE, ncol = 5,
dimnames = list(
"Rating" = c("A", "B", "C"),
"Condition" = c("A", "B", "C", "D", "E")
)
)
yates <- as.table(yates)
## Asymptotic Pearson chi-squared test (Yates, 1948, p. 176)
chisq_test(yates)
#>
#> Asymptotic Pearson Chi-Squared Test
#>
#> data: Condition by Rating (A, B, C)
#> chi-squared = 9.0928, df = 8, p-value = 0.3345
#>
## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, pp. 180-181)
## Note: 'Rating' and 'Condition' as ordinal
(ct <- chisq_test(yates,
alternative = "less",
scores = list("Rating" = c(-1, 0, 1),
"Condition" = c(2, 1, 0, -1, -2))))
#>
#> Asymptotic Linear-by-Linear Association Test
#>
#> data: Condition (ordered) by Rating (A < B < C)
#> Z = -1.5269, p-value = 0.06339
#> alternative hypothesis: less
#>
statistic(ct)^2 # chi^2 = 2.332
#> [1] 2.33154
## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, p. 181)
## Note: 'Rating' as ordinal
chisq_test(yates,
scores = list("Rating" = c(-1, 0, 1))) # Q = 3.825
#>
#> Asymptotic Generalized Pearson Chi-Squared Test
#>
#> data: Condition by Rating (A < B < C)
#> chi-squared = 3.8242, df = 4, p-value = 0.4303
#>
## Change in clinical condition and degree of infiltration
## Cochran (1954, Tab. 6)
cochran <- matrix(
c(11, 7,
27, 15,
42, 16,
53, 13,
11, 1),
byrow = TRUE, ncol = 2,
dimnames = list(
"Change" = c("Marked", "Moderate", "Slight",
"Stationary", "Worse"),
"Infiltration" = c("0-7", "8-15")
)
)
cochran <- as.table(cochran)
## Asymptotic Pearson chi-squared test (Cochran, 1954, p. 435)
chisq_test(cochran) # X^2 = 6.88
#>
#> Asymptotic Pearson Chi-Squared Test
#>
#> data: Infiltration by
#> Change (Marked, Moderate, Slight, Stationary, Worse)
#> chi-squared = 6.8807, df = 4, p-value = 0.1423
#>
## Asymptotic Cochran-Armitage test (Cochran, 1954, p. 436)
## Note: 'Change' as ordinal
(ct <- chisq_test(cochran,
scores = list("Change" = c(3, 2, 1, 0, -1))))
#>
#> Asymptotic Linear-by-Linear Association Test
#>
#> data: Infiltration by
#> Change (Marked < Moderate < Slight < Stationary < Worse)
#> Z = -2.5818, p-value = 0.009829
#> alternative hypothesis: two.sided
#>
statistic(ct)^2 # X^2 = 6.66
#> [1] 6.665691
## Change in size of ulcer crater for two treatment groups
## Armitage (1955, Tab. 2)
armitage <- matrix(
c( 6, 4, 10, 12,
11, 8, 8, 5),
byrow = TRUE, ncol = 4,
dimnames = list(
"Treatment" = c("A", "B"),
"Crater" = c("Larger", "< 2/3 healed",
">= 2/3 healed", "Healed")
)
)
armitage <- as.table(armitage)
## Approximative (Monte Carlo) Pearson chi-squared test (Armitage, 1955, p. 379)
chisq_test(armitage,
distribution = approximate(nresample = 10000)) # chi^2 = 5.91
#>
#> Approximative Pearson Chi-Squared Test
#>
#> data: Crater by Treatment (A, B)
#> chi-squared = 5.9085, p-value = 0.1276
#>
## Approximative (Monte Carlo) Cochran-Armitage test (Armitage, 1955, p. 379)
(ct <- chisq_test(armitage,
distribution = approximate(nresample = 10000),
scores = list("Crater" = c(-1.5, -0.5, 0.5, 1.5))))
#>
#> Approximative Linear-by-Linear Association Test
#>
#> data: Crater (ordered) by Treatment (A, B)
#> Z = 2.2932, p-value = 0.0238
#> alternative hypothesis: two.sided
#>
statistic(ct)^2 # chi_0^2 = 5.26
#> [1] 5.258804
## Relationship between job satisfaction and income stratified by gender
## Agresti (2002, p. 288, Tab. 7.8)
## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, p. 297)
(ct <- cmh_test(jobsatisfaction)) # CMH = 10.2001
#>
#> Asymptotic Generalized Cochran-Mantel-Haenszel Test
#>
#> data: Job.Satisfaction by
#> Income (<5000, 5000-15000, 15000-25000, >25000)
#> stratified by Gender
#> chi-squared = 10.2, df = 9, p-value = 0.3345
#>
## The standardized linear statistic
statistic(ct, type = "standardized")
#> Very Dissatisfied A Little Satisfied Moderately Satisfied
#> <5000 1.3112789 0.69201053 -0.2478705
#> 5000-15000 0.6481783 0.83462550 0.5175755
#> 15000-25000 -1.0958361 -1.50130926 0.2361231
#> >25000 -1.0377629 -0.08983052 -0.5946119
#> Very Satisfied
#> <5000 -0.9293458
#> 5000-15000 -1.6257547
#> 15000-25000 1.4614123
#> >25000 1.2031648
## The standardized linear statistic for each block
statistic(ct, type = "standardized", partial = TRUE)
#> , , Female
#>
#> Very Dissatisfied A Little Satisfied Moderately Satisfied
#> <5000 0.2698452 0.4922196 0.2175066
#> 5000-15000 0.9959059 -0.3770330 0.7219631
#> 15000-25000 -0.9314188 -0.8360078 -0.4647580
#> >25000 -0.6652991 0.9438798 -0.7745967
#> Very Satisfied
#> <5000 -0.8543153
#> 5000-15000 -1.0990044
#> 15000-25000 1.8254610
#> >25000 0.4803845
#>
#> , , Male
#>
#> Very Dissatisfied A Little Satisfied Moderately Satisfied
#> <5000 2.6457513 0.5352855 -0.8355894
#> 5000-15000 -0.5388159 2.1196904 -0.1323549
#> 15000-25000 -0.5773503 -1.3627703 0.9117028
#> >25000 -0.8164966 -0.9636241 -0.1289343
#> Very Satisfied
#> <5000 -0.3964690
#> 5000-15000 -1.2350566
#> 15000-25000 0.2018721
#> >25000 1.1419611
#>
## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, p. 297)
## Note: 'Job.Satisfaction' as ordinal
cmh_test(jobsatisfaction,
scores = list("Job.Satisfaction" = c(1, 3, 4, 5))) # L^2 = 9.0342
#>
#> Asymptotic Generalized Cochran-Mantel-Haenszel Test
#>
#> data: Job.Satisfaction (ordered) by
#> Income (<5000, 5000-15000, 15000-25000, >25000)
#> stratified by Gender
#> chi-squared = 9.0342, df = 3, p-value = 0.02884
#>
## Asymptotic linear-by-linear association test (Agresti, p. 297)
## Note: 'Job.Satisfaction' and 'Income' as ordinal
(lt <- lbl_test(jobsatisfaction,
scores = list("Job.Satisfaction" = c(1, 3, 4, 5),
"Income" = c(3, 10, 20, 35))))
#>
#> Asymptotic Linear-by-Linear Association Test
#>
#> data: Job.Satisfaction (ordered) by
#> Income (<5000 < 5000-15000 < 15000-25000 < >25000)
#> stratified by Gender
#> Z = 2.4812, p-value = 0.01309
#> alternative hypothesis: two.sided
#>
statistic(lt)^2 # M^2 = 6.1563
#> [1] 6.156301
## The standardized linear statistic
statistic(lt, type = "standardized")
#>
#> 2.48119
## The standardized linear statistic for each block
statistic(lt, type = "standardized", partial = TRUE)
#> , , Female
#>
#>
#> 1.271093
#>
#> , , Male
#>
#>
#> 2.317108
#>