Computes pairwise correlations between numeric columns and returns them in a tidy long format, one row per variable pair, sorted by descending absolute correlation.

Usage

cor_df(data, columns = NULL, use = "complete.obs", method = "pearson")

Arguments

data

A data.frame containing the variables to correlate

columns

Character vector of column names to include. If NULL, all numeric columns will be used.

use

Method for handling missing values, passed to cor(). Default is "complete.obs".

method

Correlation method, passed to cor(). Default is "pearson".
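
Because use and method are passed to cor(), the accepted values are presumably those of stats::cor() (for example "pairwise.complete.obs" for use, or "spearman" and "kendall" for method). The sketch below calls base cor() directly on a small made-up data frame, not cor_df(), to illustrate how these options change the result.

```r
# Illustrative data with missing values (made up for this sketch)
x <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 1, 4, 3, 5),
  c = c(5, 3, NA, 1, 2)
)

# "complete.obs": rows containing any NA are dropped before
# all correlations are computed (rows 1, 2 and 4 remain here)
m_complete <- cor(x, use = "complete.obs")

# "pairwise.complete.obs": each pair uses every row where
# both of its variables are observed (rows 1 to 4 for a and b)
m_pairwise <- cor(x, use = "pairwise.complete.obs")

# Rank-based correlation on the complete rows
m_spearman <- cor(x, use = "complete.obs", method = "spearman")
```

With missing data, the two use settings can give different coefficients for the same pair, since they are computed on different subsets of rows.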

Value

A tibble with columns:

name1, name2

Names of the two variables in each pair; within a pair, name1 precedes name2 lexicographically

CORR

Correlation coefficient

ABSCORR

Absolute correlation coefficient

Results are sorted by ABSCORR in descending order.
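
The output shape described above can be approximated with base R. This is a minimal sketch under stated assumptions, not the package's implementation: cor_long() is a hypothetical helper name, it returns a plain data.frame rather than a tibble, and it does not support the columns argument.

```r
# Hypothetical helper: mimics the documented output of cor_df() in base R
cor_long <- function(data, use = "complete.obs", method = "pearson") {
  # Keep numeric columns only, sorted by name so that
  # name1 precedes name2 within every pair
  num <- data[vapply(data, is.numeric, logical(1))]
  num <- num[sort(names(num))]

  m <- stats::cor(num, use = use, method = method)

  # Upper triangle: each unordered pair exactly once, no self-pairs
  idx <- which(upper.tri(m), arr.ind = TRUE)

  out <- data.frame(
    name1 = rownames(m)[idx[, "row"]],
    name2 = colnames(m)[idx[, "col"]],
    CORR = m[idx],
    stringsAsFactors = FALSE
  )
  out$ABSCORR <- abs(out$CORR)

  # Strongest relationships first
  out <- out[order(-out$ABSCORR), ]
  rownames(out) <- NULL
  out
}
```

For k numeric columns this yields k * (k - 1) / 2 rows, which matches the three rows shown for three numeric columns in the examples.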

See also

Other statistics: cv(), geom_cv(), geom_mean(), geom_sd()

Examples

# Create sample data
set.seed(123)
df <- data.frame(
  A = rnorm(100, 5, 2),
  B = rnorm(100, 10, 3),
  C = rnorm(100, 15, 1),
  D = rep(letters, length.out = 100) # non-numeric; letters has only 26 elements, so recycle it
)
df$B <- df$A * 0.8 + rnorm(100, 0, 1) # Create some correlation

# All numeric columns
cor_df(df)
#> # A tibble: 3 × 4
#>   name1 name2   CORR ABSCORR
#>   <chr> <chr>  <dbl>   <dbl>
#> 1 A     B      0.806   0.806
#> 2 B     C     -0.134   0.134
#> 3 A     C     -0.129   0.129

# Specific columns
cor_df(df, columns = c("A", "B", "C"))
#> # A tibble: 3 × 4
#>   name1 name2   CORR ABSCORR
#>   <chr> <chr>  <dbl>   <dbl>
#> 1 A     B      0.806   0.806
#> 2 B     C     -0.134   0.134
#> 3 A     C     -0.129   0.129