Cate-Nelson models for bivariate data

Produces critical-x and critical-y values for bivariate data according to a Cate-Nelson analysis.

cateNelson(
  x,
  y,
  plotit = TRUE,
  hollow = TRUE,
  xlab = "X",
  ylab = "Y",
  trend = "positive",
  clx = 1,
  cly = 1,
  xthreshold = 0.1,
  ythreshold = 0.1,
  progress = TRUE,
  verbose = TRUE,
  listout = FALSE
)

Arguments

x: A vector of values for the x variable.
y: A vector of values for the y variable.
plotit: If TRUE, produces plots of the output.
hollow: If TRUE, uses hollow circles on the plot to indicate data not fitting the model.
xlab: The label for the x-axis.
ylab: The label for the y-axis.
trend: "positive" if the trend of y vs. x is generally positive. "negative" if negative.
clx: Indicates which of the listed critical x values should be chosen as the critical x value for the final model.
cly: Indicates which of the listed critical y values should be chosen as the critical y value for the final model.
xthreshold: Indicates the proportion of potential critical x values to display in the output. A value of 1 would display all of them.
ythreshold: Indicates the proportion of potential critical y values to display in the output. A value of 1 would display all of them.
progress: If TRUE, prints a progress indicator while the function's for loops run.
verbose: If FALSE, suppresses printed output of tables.
listout: If TRUE, outputs a list of data frames instead of a single data frame. This allows a data frame of critical values and associated statistics to be extracted, for example to sort by Cramer's V.

Value

A data frame of statistics from the analysis: number of observations, critical level for x, sum of squares, critical value for y, the number of observations in each of the quadrants (I, II, III, IV), the number of observations that conform with the model, the proportion of observations that conform with the model, the number of observations that do not conform to the model, the proportion of observations that do not conform to the model, a p-value for the Fisher exact test for the data divided into the groups indicated by the model, and Cramer's V for the data divided into the groups indicated by the model.

Output also includes printed lists of critical values, explanation of the values in the data frame, and plots: y vs. x; sum of squares vs. critical x value; the number of observations that do not conform to the model vs. critical y value; and y vs. x with the critical values shown as lines on the plot, and the quadrants labeled.
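The sum of squares reported for each candidate critical x can be pictured with a short base-R sketch. It is an illustration under stated assumptions, not the function's actual code: the helper name scan_critical_x is hypothetical, the candidates are assumed to be midpoints between adjacent sorted x values, and the sum of squares is assumed to be the between-group sum of squares of y for the two groups on either side of the candidate. The data are made up.

```r
# Hypothetical helper, not part of rcompanion.
# Assumption: candidates are midpoints between adjacent sorted x values, and
# each is scored by the between-group sum of squares of y.
scan_critical_x <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(x)
  splits <- 1:(n - 1)

  ss <- sapply(splits, function(i) {
    low  <- y[1:i]          # observations below the candidate
    high <- y[(i + 1):n]    # observations above the candidate
    length(low)  * (mean(low)  - mean(y))^2 +
      length(high) * (mean(high) - mean(y))^2
  })

  out <- data.frame(
    Critical.x.value = (x[splits] + x[splits + 1]) / 2,
    Sum.of.squares   = ss
  )
  # Largest sum of squares first
  out[order(-out$Sum.of.squares), ]
}

# Made-up toy data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0.2, 0.3, 0.7, 0.25, 0.8, 0.9, 0.4, 0.85)

res <- scan_critical_x(x, y)
print(head(res, 3))
```

Sorting by descending sum of squares puts the strongest candidate in the first row, which mirrors how clx = 1 picks the first listed critical x.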

Details

Cate-Nelson analysis divides bivariate data into two groups. For data with a positive trend, one group has a large x value associated with a large y value, and the other group has a small x value associated with a small y value. For a negative trend, a small x is associated with a large y, and so on.
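The grouping described above can be sketched in base R. This is a minimal illustration, not the function's internal code: the helper name quadrant_counts is hypothetical, the quadrants are assumed to be numbered clockwise from the upper left (consistent with the example output, where Q.Model equals Q.II + Q.IV for a positive trend), and the handling of observations that fall exactly on a critical value is an assumption. The data are made up.

```r
# Hypothetical helper, not part of rcompanion.
# Assumption: quadrants are numbered clockwise from the upper left, and an
# observation exactly on a critical value counts as "low".
quadrant_counts <- function(x, y, clx, cly, trend = "positive") {
  high_x <- x > clx
  high_y <- y > cly

  q <- c(
    Q.I   = sum(!high_x &  high_y),  # upper left
    Q.II  = sum( high_x &  high_y),  # upper right
    Q.III = sum( high_x & !high_y),  # lower right
    Q.IV  = sum(!high_x & !high_y)   # lower left
  )

  # Positive trend: large x with large y, small x with small y.
  # Negative trend: small x with large y, large x with small y.
  conform <- if (trend == "positive") {
    q[["Q.II"]] + q[["Q.IV"]]
  } else {
    q[["Q.I"]] + q[["Q.III"]]
  }

  c(q, Q.Model = conform, Q.Error = length(x) - conform)
}

# Made-up toy data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0.2, 0.3, 0.7, 0.25, 0.8, 0.9, 0.4, 0.85)

res <- quadrant_counts(x, y, clx = 4.5, cly = 0.5)
print(res)
```

With these toy values, six of the eight observations fall in the quadrants predicted by a positive trend and two do not.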

The analysis is useful for bivariate data which don't conform well to linear, curvilinear, or plateau models.

This function will fail if the two largest x values are identical or if the two smallest x values are identical.
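A quick check before calling the function can flag this condition. The helper below is a hypothetical sketch in base R and is not part of rcompanion. It only reports the tie; how to handle it is left to the analyst.

```r
# Hypothetical helper, not part of rcompanion.
# Returns TRUE if the two smallest or the two largest x values are identical.
has_tied_extremes <- function(x) {
  xs <- sort(x)            # sort() drops NA values by default
  n  <- length(xs)
  stopifnot(n >= 2)        # need at least two values to compare
  xs[1] == xs[2] || xs[n] == xs[n - 1]
}

ok        <- has_tied_extremes(c(1, 2, 3, 4))   # no ties at either end
tied_low  <- has_tied_extremes(c(1, 1, 3, 4))   # two smallest are identical
tied_high <- has_tied_extremes(c(1, 2, 4, 4))   # two largest are identical
print(c(ok = ok, tied_low = tied_low, tied_high = tied_high))
```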

Note

The method in this function follows Cate, R. B., & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings 35, 658-660.

An earlier version of this function was published in Mangiafico, S.S. 2013. Cate-Nelson Analysis for Bivariate Data Using R-project. J. of Extension 51:5, 5TOT1.

References

https://rcompanion.org/rcompanion/h_02.html

Cate, R. B., & Nelson, L.A. (1971). A simple statistical procedure for partitioning soil test correlation data into two classes. Soil Science Society of America Proceedings 35, 658–660.

Author

Salvatore Mangiafico, mangiafico@njaes.rutgers.edu

Examples

data(Nurseries)
cateNelson(x          = Nurseries$Size,
           y          = Nurseries$Proportion,
           plotit     = TRUE,
           hollow     = TRUE,
           xlab       = "Nursery size in hectares",
           ylab       = "Proportion of good practices adopted",
           trend      = "positive",
           clx        = 1,
           xthreshold = 0.10,
           ythreshold = 0.15)
#> ..................................
#> 
#> Critical x that maximize sum of squares: 
#>  
#>   Critical.x.value Sum.of.squares
#> 1            4.035      0.2254775
#> 2            4.740      0.2046979
#> ......................................
#> 
#> Critical y that minimize errors: 
#>  
#>   Critical.y.value Q.i Q.ii Q.iii Q.iv Q.model Q.err Cramer.V
#> 1           0.6355   3   20     2   13      33     5   0.7289
#> 2           0.6430   3   19     3   13      32     6   0.6761
#> 3           0.6470   3   19     3   13      32     6   0.6761
#> 4           0.6545   2   18     4   14      32     6   0.6854
#> 5           0.6620   2   18     4   14      32     6   0.6854
#> 6           0.6015   6   21     1   10      31     7   0.6309
#> 7           0.6280   5   20     2   11      31     7   0.6209
#> 8           0.6320   5   20     2   11      31     7   0.6209
#> 
#> n         = Number of observations 
#> CLx       = Critical value of x 
#> SS        = Sum of squares for that critical value of x 
#> CLy       = Critical value of y 
#> Q         = Number of observations which fall into quadrants I, II, III, IV 
#> Q.Model   = Total observations which fall into the quadrants predicted by the model 
#> p.Model   = Percent observations which fall into the quadrants predicted by the model 
#> Q.Error   = Observations which do not fall into the quadrants predicted by the model 
#> p.Error   = Percent observations which do not fall into the quadrants predicted by the model 
#> Fisher.p  = p-value from Fisher exact test dividing data into these quadrants 
#> Cramer.V  = Cramer's V statistic from dividing data into these quadrants 
#> 
#> Final model: 
#>  
#>    n   CLx        SS    CLy Q.I Q.II Q.III Q.IV Q.Model   p.Model Q.Error
#> 1 38 4.035 0.2254775 0.6355   3   20     2   13      33 0.8684211       5
#>     p.Error Fisher.p.value Cramer.V
#> 1 0.1315789   8.532968e-06   0.7289