Calculate Generalized Log Odds Ratios for Frequency Tables

Computes (log) odds ratios and their asymptotic variance-covariance matrix for R x C (x strata) tables. Odds ratios are calculated for two array dimensions, separately for each level of all stratifying dimensions. See Friendly et al. (2011) for a sketch of a general theory.

loddsratio(x, ...)
# Default S3 method
loddsratio(x, strata = NULL, log = TRUE,
  ref = NULL, correct = any(x == 0L), ...)

# S3 method for class 'formula'
loddsratio(formula, data = NULL, ...,
  subset = NULL, na.action = NULL)

oddsratio(x, stratum = NULL, log = TRUE)

# S3 method for class 'loddsratio'
coef(object, log = object$log, ...)
# S3 method for class 'loddsratio'
vcov(object, log = object$log, ...)
# S3 method for class 'loddsratio'
print(x, log = x$log, ...)
# S3 method for class 'loddsratio'
confint(object, parm, level = 0.95, log = object$log, ...)

# S3 method for class 'loddsratio'
as.array(x, log = x$log, ...)
# S3 method for class 'loddsratio'
t(x)
# S3 method for class 'loddsratio'
aperm(a, perm, ...)

Arguments

x: an object. For the default method, a k-way matrix, table, or array of frequencies with at least two margins.
strata, stratum: numeric or character. The margins of a k-way table x (with k greater than 2) that should be employed as strata. By default, all dimensions except the first two are used.
ref: numeric or character. Reference categories for the (non-stratum) row and column dimensions that should be employed for computing the odds ratios. By default, odds ratios for profile contrasts (or sequential contrasts, i.e., successive differences of adjacent categories) are used. See details below.
formula: a formula specifying the variables used to create a contingency table from data. A conditioning formula can be specified; the conditioning variables will then be used as strata variables.
data: either a data frame, or an object of class "table" or "ftable".
subset: an optional vector specifying a subset of observations to be used.
na.action: a function which indicates what should happen when the data contain NAs. Ignored if data is a contingency table.
log: logical. Should the results be reported on the log scale? All internal computations are carried out on the log scale; the results are transformed back to the odds ratio scale only if log = FALSE.
correct: logical or numeric. Should a continuity correction be applied before computing odds ratios? If TRUE, 0.5 is added to all cells; if numeric (or an array conforming to the data), that value is added to all cells. By default, no correction is applied unless the table contains zero cells, although the correction is often recommended to reduce bias when some frequencies are small (Fleiss, 1981).
a, object: an object of class loddsratio as computed by loddsratio.
perm: numeric or character vector specifying a permutation of strata.
...: arguments passed to methods.
parm: a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
level: the confidence level required for the confint method.
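
The effect of correct and log can be illustrated with a minimal base R sketch. The table below is made up, and the helper lor() is hypothetical (it is not part of the package); it only mirrors the textbook definition of a 2 x 2 log odds ratio.

```r
# Hypothetical 2 x 2 table with one zero cell (made-up counts)
x <- matrix(c(12, 0, 5, 9), nrow = 2,
            dimnames = list(Row = c("a", "b"), Col = c("u", "v")))

# Hypothetical helper: log odds ratio of a 2 x 2 table,
# log(n11 * n22 / (n12 * n21)), computed on the log scale
lor <- function(tab) {
  log(tab[1, 1]) + log(tab[2, 2]) - log(tab[1, 2]) - log(tab[2, 1])
}

lor_raw <- lor(x)        # not finite, because one cell is zero
lor_adj <- lor(x + 0.5)  # finite after adding 0.5 to every cell

# Back-transformation from the log scale to the odds ratio scale,
# which is what log = FALSE requests
or_adj <- exp(lor_adj)
```

The same idea extends to a numeric correct: the chosen constant is simply added to the counts before taking logs.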

Details

For an R x C table, (log) odds ratios are formed for the set of (R-1) x (C-1) 2 x 2 tables, corresponding to some set of contrasts among the row and column variables. The ref argument allows these to be specified in a general way.

ref = NULL (default) corresponds to “profile contrasts” (or sequential contrasts or successive differences) for ordered categories, i.e., R1–R2, R2–R3, R3–R4, etc., and similarly for the column categories. These are sometimes called “local odds ratios”.

ref = 1 gives contrasts with the first category; ref = dim(x) gives contrasts with the last category; ref = c(2, 4) or ref = list(2, 4) corresponds to the reference being the second category in rows and the fourth in columns.

Combinations like ref = list(NULL, 3) are also possible, as are character vectors, e.g., ref = c("foo", "bar"), with "foo" giving the row reference and "bar" the column reference.
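
As a rough illustration of the two most common parameterizations, the following base R sketch computes profile (local) and first-category reference log odds ratios by hand. The counts are copied from the VA.fem table in the Examples section; the variable names are arbitrary, and this is not the package's implementation.

```r
# Counts copied from the VA.fem table shown in the Examples section
x <- matrix(c(1520,  266,  124,  66,
               234, 1512,  432,  78,
               117,  362, 1772, 205,
                36,   82,  179, 492),
            nrow = 4,
            dimnames = list(left = as.character(1:4),
                            right = as.character(1:4)))
lx <- log(x)
R <- nrow(x)
C <- ncol(x)

# Profile (local) contrasts: adjacent rows i, i+1 and adjacent columns j, j+1
lor_local <- lx[-R, -C] + lx[-1, -1] - lx[-R, -1] - lx[-1, -C]

# Reference contrasts against the first row and the first column (ref = 1)
lor_ref1 <- lx[1, 1] + lx[-1, -1] - outer(lx[-1, 1], lx[1, -1], "+")

round(lor_local, 4)
round(lor_ref1, 4)
```

The values agree with the loddsratio(VA.fem) and loddsratio(VA.fem, ref = 1) output in the Examples section; only the row and column labels differ, because this sketch does not build the Ri:Rj style names.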

Note that all such parameterizations are equivalent, in that one can derive all other possible odds ratios from any non-redundant set, but the interpretation of these values depends on the parameterization.
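
One way to see this equivalence: first-category reference log odds ratios are cumulative sums of the local ones over rows and columns. The sketch below uses the local log odds ratios printed for VA.fem in the Examples section (rounded to 7 decimals); the object names are arbitrary.

```r
# Local log odds ratios for VA.fem, copied from the Examples output
local <- matrix(c( 3.6088367, -0.4895482, -1.0810899,
                  -0.7363972,  2.8409829, -0.4451374,
                  -0.3062700, -0.8075534,  3.1679471),
                nrow = 3)

# Cumulate down the columns, then along the rows; apply() over rows
# returns the results as columns, hence the transpose
ref1 <- t(apply(apply(local, 2, cumsum), 1, cumsum))

round(ref1, 5)
```

Up to rounding, the result reproduces the ref = 1 table shown in the Examples section.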

Note also that these reference level parameterizations only have meaning when the primary (non-strata) table dimensions are larger than 2x2. In the 2x2 case, the odds ratios are defined by the order of levels of those variables in the table, so you can achieve a desired interpretation by manipulating the table.
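
For the 2x2 case, a small sketch with made-up counts shows how reordering levels changes the sign of the log odds ratio. The helper lor() is hypothetical and only encodes the usual 2 x 2 definition.

```r
# Hypothetical 2 x 2 table (made-up counts)
x <- matrix(c(30, 10, 20, 40), nrow = 2,
            dimnames = list(Exposure = c("yes", "no"),
                            Outcome  = c("case", "control")))

# Hypothetical helper: log(n11 * n22 / (n12 * n21))
lor <- function(tab) log(tab[1, 1] * tab[2, 2] / (tab[1, 2] * tab[2, 1]))

lor_orig    <- lor(x)           # levels as given
lor_rowflip <- lor(x[2:1, ])    # row levels reversed: sign flips
lor_both    <- lor(x[2:1, 2:1]) # both reversed: value unchanged
```

Reversing the levels of one variable negates the log odds ratio, while reversing both leaves it unchanged.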

See the help page of plot.loddsratio for visualization methods.

Value

An object of class loddsratio, with the following components:

coefficients: A named vector, of length (R-1) x (C-1) x prod(dim(x)[strata]) containing the log odds ratios. Use the coef method to extract these from the object, and the confint method for confidence intervals. For a two-way table, the names for the log odds ratios are constructed in the form Ri:Rj/Ci:Cj using the table names for rows and columns. For a stratified table, the names are constructed in the form Ri:Rj/Ci:Cj|Lk.
vcov: Variance covariance matrix of the log odds ratios.
dimnames: Dimension names for the log odds ratios, considered as a table of size (R-1, C-1, dim(x)[strata]). Use the dim and dimnames methods to extract these and manipulate the log odds ratios in relation to the original table.
dim: Corresponding dimension vector.
contrasts: A matrix C, such that C %*% as.vector(log(x)) gives the log odds ratios. Each row corresponds to one log odds ratio, and is all zero, except for 4 elements of c(1, -1, -1, 1) for a given 2 x 2 subtable.
log: A logical, indicating the value of log in the original call.
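
To make the role of the contrasts component concrete, here is a hand-built contrast matrix for a made-up 2 x 3 table with profile contrasts. The object names (x, contr, lor) are illustrative and the matrix is written out manually rather than taken from a fitted object.

```r
# Hypothetical 2 x 3 table (made-up counts)
x <- matrix(c(10, 20, 15, 5, 8, 12), nrow = 2)

# as.vector(log(x)) is in column-major order: n11, n21, n12, n22, n13, n23.
# Each row picks out one 2 x 2 subtable with weights 1, -1, -1, 1.
contr <- rbind(c(1, -1, -1,  1,  0, 0),   # columns 1:2
               c(0,  0,  1, -1, -1, 1))   # columns 2:3

lor <- drop(contr %*% as.vector(log(x)))

# Direct computation for comparison
lor_direct <- c(log(x[1, 1] * x[2, 2] / (x[2, 1] * x[1, 2])),
                log(x[1, 2] * x[2, 3] / (x[2, 2] * x[1, 3])))
```

Both computations give the same two log odds ratios, which is the relationship the contrasts component encodes.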

References

Agresti, A. (2013). Categorical Data Analysis. 3rd Edition. New York: Wiley.

Fleiss, J. L. (1981). Statistical Methods for Rates and Proportions. 2nd Edition. New York: Wiley.

Friendly, M. (2000). Visualizing Categorical Data. Cary, NC: SAS Institute.

Friendly, M., Turner, H., Firth, D., Zeileis, A. (2011). Advances in Visualizing Categorical Data Using the vcd, gnm and vcdExtra Packages in R. Correspondence Analysis and Related Methods (CARME 2011). http://www.datavis.ca/papers/adv-vcd-4up.pdf

Author

Achim Zeileis, Michael Friendly and David Meyer.

Note

The method of calculation is an example of the use of the delta method described by Agresti (2013), Section 16.1.6, giving estimates of log odds ratios and their asymptotic covariance matrix.
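
A minimal sketch of the standard delta-method result, under the usual assumption that the log counts are asymptotically independent with variances 1/n: for a contrast matrix C, the covariance of the log odds ratios is C diag(1/n) C'. The table and names below are made up, and this is not a copy of the package source.

```r
# Hypothetical 2 x 3 table (made-up counts)
x <- matrix(c(10, 20, 15, 5, 8, 12), nrow = 2)

# Profile contrasts on as.vector(log(x)) (column-major cell order)
contr <- rbind(c(1, -1, -1,  1,  0, 0),
               c(0,  0,  1, -1, -1, 1))

# Delta method: asymptotic covariance of the log odds ratios
V  <- contr %*% diag(1 / as.vector(x)) %*% t(contr)
se <- sqrt(diag(V))

V
se
```

Each diagonal element is the familiar sum of reciprocal counts over the four cells of the subtable, and the off-diagonal element is nonzero because the two subtables share the cells in column 2.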

The coef method returns the coefficients component as a vector of length (R-1) x (C-1) x prod(dim(x)[strata]). The dim and dimnames methods provide the proper attributes for treating the coefficients vector as an (R-1) x (C-1) x strata array. as.matrix and as.array methods are also provided for this purpose.

The confint method computes confidence intervals for the log odds ratios (or for odds ratios, with log = FALSE). The coeftest method (summary is an alias) prints the asymptotic standard errors, z tests (standardized log odds ratios), and the corresponding p values.
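
The numbers reported by confint and summary are consistent with ordinary Wald intervals and z tests. The sketch below assumes normal quantiles are used and takes the estimate and standard error for the 40-:elementary stratum from the Punishment output in the Examples section (the standard error is only available to 5 decimals there).

```r
# Estimate and standard error copied from the Examples output
est <- -0.8777167
se  <- 0.26316
level <- 0.95

# Wald interval on the log scale (assumes a normal approximation)
z_crit <- qnorm(1 - (1 - level) / 2)
ci_log <- est + c(-1, 1) * z_crit * se

# Interval on the odds ratio scale, as with log = FALSE
ci_or <- exp(ci_log)

# z statistic and two-sided p value
z <- est / se
p <- 2 * pnorm(-abs(z))
```

Up to rounding of the inputs, ci_log, z, and p match the corresponding rows of the confint(lor_pun) and summary(lor_pun) output.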

Structural zeros: In addition to the options for zero cells provided by correct, the function allows structural zeros to be represented as NA in the data argument. An NA cell yields NA for every log odds ratio (LOR) estimate that involves it, but does not affect the estimates based on other cells.

oddsratio is an alias for loddsratio, kept for backward compatibility.

Examples

## artificial example
set.seed(1)
x <- matrix(rpois(5 * 3, 7), ncol = 5, nrow = 3)
dimnames(x) <- list(Row = head(letters, 3), Col = tail(letters, 5))

x_lor <- loddsratio(x)
coef(x_lor)
#>    a:b/v:w    b:c/v:w    a:b/w:x    b:c/w:x    a:b/x:y    b:c/x:y    a:b/y:z 
#> -0.9707789  0.5389965  0.4700036 -0.6931472  0.8292794  0.0000000 -0.7985077 
#>    b:c/y:z 
#>  0.4054651 
x_lor
#> log odds ratios for Row and Col 
#> 
#>      Col
#> Row          v:w        w:x       x:y        y:z
#>   a:b -0.9707789  0.4700036 0.8292794 -0.7985077
#>   b:c  0.5389965 -0.6931472 0.0000000  0.4054651
confint(x_lor)
#>              2.5 %    97.5 %
#> a:b/v:w -2.5601342 0.6185764
#> b:c/v:w -0.9911867 2.0691797
#> a:b/w:x -0.9253175 1.8653248
#> b:c/w:x -2.1466954 0.7604010
#> a:b/x:y -0.8672418 2.5258005
#> b:c/x:y -1.5801735 1.5801735
#> a:b/y:z -2.5787334 0.9817180
#> b:c/y:z -1.2081195 2.0190498
summary(x_lor)
#> 
#> z test of coefficients:
#> 
#>         Estimate Std. Error z value Pr(>|z|)
#> a:b/v:w -0.97078    0.81091 -1.1971   0.2312
#> b:c/v:w  0.53900    0.78072  0.6904   0.4900
#> a:b/w:x  0.47000    0.71191  0.6602   0.5091
#> b:c/w:x -0.69315    0.74162 -0.9346   0.3500
#> a:b/x:y  0.82928    0.86559  0.9581   0.3380
#> b:c/x:y  0.00000    0.80623  0.0000   1.0000
#> a:b/y:z -0.79851    0.90830 -0.8791   0.3793
#> b:c/y:z  0.40547    0.82327  0.4925   0.6224
#> 

## 2 x 2 x k cases
#data(CoalMiners, package = "vcd")
lor_CM <- loddsratio(CoalMiners)
lor_CM
#> log odds ratios for Breathlessness and Wheeze by Age 
#> 
#>    20-24    25-29    30-34    35-39    40-44    45-49    50-54    55-59 
#> 3.215502 3.695261 3.398339 3.140658 3.014687 2.782049 2.926395 2.440571 
#>    60-64 
#> 2.637954 
coef(lor_CM)
#>    20-24    25-29    30-34    35-39    40-44    45-49    50-54    55-59 
#> 3.215502 3.695261 3.398339 3.140658 3.014687 2.782049 2.926395 2.440571 
#>    60-64 
#> 2.637954 
confint(lor_CM)
#>          2.5 %   97.5 %
#> 20-24 2.206477 4.224527
#> 25-29 2.899801 4.490721
#> 30-34 2.853283 3.943394
#> 35-39 2.782392 3.498925
#> 40-44 2.682873 3.346501
#> 45-49 2.513658 3.050439
#> 50-54 2.679571 3.173220
#> 55-59 2.204392 2.676749
#> 60-64 2.349906 2.926002
confint(lor_CM, log = FALSE)
#>           2.5 %   97.5 %
#> 20-24  9.083655 68.34216
#> 25-29 18.170533 89.18574
#> 30-34 17.344638 51.59341
#> 35-39 16.157616 33.07988
#> 40-44 14.627062 28.40318
#> 45-49 12.350021 21.12463
#> 50-54 14.578836 23.88427
#> 55-59  9.064743 14.53775
#> 60-64 10.484582 18.65291

## 2 x k x 2
lor_Emp <- loddsratio(Employment)
lor_Emp
#> log odds ratios for EmploymentStatus and EmploymentLength by LayoffCause 
#> 
#>                 LayoffCause
#> EmploymentLength     Closure   Replaced
#>     <1Mo:1-3Mo   -0.04082199 -0.1941560
#>     1-3Mo:3-12Mo  0.02353050 -0.7799433
#>     3-12Mo:1-2Yr  0.04904020 -0.1851376
#>     1-2Yr:2-5Yr  -0.07555132  0.1952148
#>     2-5Yr:>5Yr   -0.26157903 -0.2479188
confint(lor_Emp)
#>                                              2.5 %     97.5 %
#> NewJob:Unemployed/<1Mo:1-3Mo|Closure    -1.0730756  0.9914316
#> NewJob:Unemployed/1-3Mo:3-12Mo|Closure  -0.5248903  0.5719513
#> NewJob:Unemployed/3-12Mo:1-2Yr|Closure  -0.4086970  0.5067774
#> NewJob:Unemployed/1-2Yr:2-5Yr|Closure   -0.5612569  0.4101543
#> NewJob:Unemployed/2-5Yr:>5Yr|Closure    -0.8419062  0.3187482
#> NewJob:Unemployed/<1Mo:1-3Mo|Replaced   -0.8208571  0.4325450
#> NewJob:Unemployed/1-3Mo:3-12Mo|Replaced -1.2815154 -0.2783712
#> NewJob:Unemployed/3-12Mo:1-2Yr|Replaced -0.8177531  0.4474780
#> NewJob:Unemployed/1-2Yr:2-5Yr|Replaced  -0.4831036  0.8735331
#> NewJob:Unemployed/2-5Yr:>5Yr|Replaced   -1.0401147  0.5442771

## 4 way tables 
data(Punishment, package = "vcd")
lor_pun <- loddsratio(Freq ~ memory + attitude | age + education, data = Punishment)
lor_pun
#> log odds ratios for memory and attitude by age, education 
#> 
#>        education
#> age     elementary  secondary       high
#>   15-24 -1.7700195 -0.2451225  0.3794896
#>   25-39 -1.6644777 -0.4367177  0.4855078
#>   40-   -0.8777167 -1.3682759 -1.8111776
confint(lor_pun)
#>                       2.5 %     97.5 %
#> 15-24:elementary -3.8226860  0.2826470
#> 25-39:elementary -2.8851216 -0.4438338
#> 40-:elementary   -1.3935029 -0.3619305
#> 15-24:secondary  -1.9601755  1.4699306
#> 25-39:secondary  -1.3265835  0.4531482
#> 40-:secondary    -2.5121032 -0.2244485
#> 15-24:high       -2.0927803  2.8517596
#> 25-39:high       -0.8959469  1.8669625
#> 40-:high         -4.0118834  0.3895283
summary(lor_pun)
#> 
#> z test of coefficients:
#> 
#>                  Estimate Std. Error z value  Pr(>|z|)    
#> 15-24:elementary -1.77002    1.04730 -1.6901 0.0910123 .  
#> 25-39:elementary -1.66448    0.62279 -2.6726 0.0075262 ** 
#> 40-:elementary   -0.87772    0.26316 -3.3353 0.0008521 ***
#> 15-24:secondary  -0.24512    0.87504 -0.2801 0.7793807    
#> 25-39:secondary  -0.43672    0.45402 -0.9619 0.3361061    
#> 40-:secondary    -1.36828    0.58360 -2.3446 0.0190496 *  
#> 15-24:high        0.37949    1.26139  0.3009 0.7635278    
#> 25-39:high        0.48551    0.70484  0.6888 0.4909346    
#> 40-:high         -1.81118    1.12283 -1.6130 0.1067342    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 

# fit linear model using WLS
lor_pun_df <- as.data.frame(lor_pun)
pun_mod1 <- lm(LOR ~ as.numeric(age) * as.numeric(education),
               data = lor_pun_df, weights = 1 / ASE^2)
anova(pun_mod1)
#> Analysis of Variance Table
#> 
#> Response: LOR
#>                                       Df Sum Sq Mean Sq F value  Pr(>F)  
#> as.numeric(age)                        1 1.0437  1.0437  2.7167 0.16022  
#> as.numeric(education)                  1 1.8395  1.8395  4.7883 0.08028 .
#> as.numeric(age):as.numeric(education)  1 5.0441  5.0441 13.1299 0.01516 *
#> Residuals                              5 1.9208  0.3842                  
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## illustrate ref levels
VA.fem <- xtabs(Freq ~ left + right, subset = gender == "female", data = VisualAcuity)
VA.fem
#>     right
#> left    1    2    3    4
#>    1 1520  234  117   36
#>    2  266 1512  362   82
#>    3  124  432 1772  179
#>    4   66   78  205  492
loddsratio(VA.fem)                  # profile contrasts
#> log odds ratios for left and right 
#> 
#>      right
#> left         1:2        2:3        3:4
#>   1:2  3.6088367 -0.7363972 -0.3062700
#>   2:3 -0.4895482  2.8409829 -0.8075534
#>   3:4 -1.0810899 -0.4451374  3.1679471
loddsratio(VA.fem, ref = 1)           # contrasts against level 1
#> log odds ratios for left and right 
#> 
#>      right
#> left       1:2      1:3      1:4
#>   1:2 3.608837 2.872440 2.566170
#>   1:3 3.119289 5.223874 4.110051
#>   1:4 2.038199 3.697647 5.751771
loddsratio(VA.fem, ref = dim(VA.fem)) # contrasts against level 4
#> log odds ratios for left and right 
#> 
#>      right
#> left       1:4      2:4      3:4
#>   1:4 5.751771 3.713572 2.054124
#>   2:4 3.185601 4.756239 2.360394
#>   3:4 1.641720 2.722810 3.167947