These functions sort or order character strings containing embedded numbers so that the numbers are sorted numerically rather than by character value. For example, "Aspirin 50mg" will come before "Aspirin 100mg". In addition, the case of character strings is ignored, so that "a" will come before "B" and "C".

mixedsort(
  x,
  decreasing = FALSE,
  na.last = TRUE,
  blank.last = FALSE,
  numeric.type = c("decimal", "roman"),
  roman.case = c("upper", "lower", "both"),
  scientific = TRUE
)

mixedorder(
  x,
  decreasing = FALSE,
  na.last = TRUE,
  blank.last = FALSE,
  numeric.type = c("decimal", "roman"),
  roman.case = c("upper", "lower", "both"),
  scientific = TRUE
)

Arguments

x

Vector to be sorted.

decreasing

logical. Should the sort be increasing or decreasing? Note that decreasing=TRUE reverses the meanings of na.last and blank.last.

na.last

for controlling the treatment of NA values. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed.

blank.last

for controlling the treatment of blank values. If TRUE, blank values in the data are put last; if FALSE, they are put first; if NA, they are removed.

numeric.type

either "decimal" (default) or "roman". Are numeric values represented as decimal numbers (numeric.type="decimal") or as Roman numerals (numeric.type="roman")?

roman.case

one of "upper", "lower", or "both". Are Roman numerals represented using only capital letters ('IX'), only lower-case letters ('ix'), or both?

scientific

logical. Should exponential notation be allowed for numeric values?

Value

mixedorder returns a vector giving the sort order of the input elements. mixedsort returns the sorted vector.
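
As a small illustration of the difference between the two return values, the sketch below uses mixedorder to reorder the rows of a data frame. It assumes the functions are provided by the gtools package; the trial data frame and its counts are made up.

```r
library(gtools)  # assumption: mixedsort() and mixedorder() come from gtools

# made-up data for illustration
trial <- data.frame(
  treatment = c("Aspirin 100mg/day", "Aspirin 10mg/day", "Aspirin 50mg/day"),
  n = c(7, 8, 9),
  stringsAsFactors = FALSE  # keep the labels as character strings
)

# mixedorder() returns indices, so it can reorder whole rows
trial[mixedorder(trial$treatment), ]

# indexing the labels by that order gives the same result as mixedsort()
identical(
  trial$treatment[mixedorder(trial$treatment)],
  mixedsort(trial$treatment)
)
```

Working with the index vector is the usual choice when other columns have to move together with the labels.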

Details

I often have character vectors (e.g. factor labels), such as compound and dose, that contain both text and numeric data. These functions are useful for sorting such character vectors into a logical order.

They do so by splitting each character string into a sequence of character and numeric sections, and then sorting along these sections, with numbers sorted by numeric value (e.g. "50" comes before "100"), followed by character strings sorted by character value (e.g. "A" comes before "B"), ignoring case (e.g. 'A' has the same sort order as 'a').
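
The splitting idea can be sketched in base R. This is only an illustration, not the code that mixedsort actually runs: split_sections is a hypothetical helper, and it recognises plain decimal numbers only (no signs, exponents, or Roman numerals).

```r
# Hypothetical helper: break a string into alternating runs of
# number-like text and everything else.
split_sections <- function(s) {
  regmatches(s, gregexpr("[0-9]+\\.?[0-9]*|[^0-9]+", s))[[1]]
}

split_sections("Aspirin 50mg/day")
split_sections("Aspirin 100mg/day")

# The numeric sections are where the two kinds of comparison disagree:
"100" < "50"                          # compared as text
as.numeric("100") < as.numeric("50")  # compared as numbers
```

Comparing the sections pairwise, numerically where both are numbers and as text otherwise, is what puts "Aspirin 50mg/day" ahead of "Aspirin 100mg/day".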

By default, the sort order is ascending, empty strings are sorted to the front, and NA values to the end. Setting decreasing=TRUE changes the sort order to descending and reverses the meanings of na.last and blank.last.

Parsing looks for decimal numbers unless numeric.type="roman", in which case parsing looks for Roman numerals, with character case specified by roman.case.

See also

sort, order

Author

Gregory R. Warnes greg@warnes.net

Examples


## compound & dose labels
Treatment <- c(
  "Control", "Aspirin 10mg/day", "Aspirin 50mg/day",
  "Aspirin 100mg/day", "Acetomycin 100mg/day",
  "Acetomycin 1000mg/day"
)

## ordinary sort puts the dosages in the wrong order
sort(Treatment)
#> [1] "Acetomycin 1000mg/day" "Acetomycin 100mg/day"  "Aspirin 100mg/day"    
#> [4] "Aspirin 10mg/day"      "Aspirin 50mg/day"      "Control"              

## but mixedsort does the 'right' thing
mixedsort(Treatment)
#> [1] "Acetomycin 100mg/day"  "Acetomycin 1000mg/day" "Aspirin 10mg/day"     
#> [4] "Aspirin 50mg/day"      "Aspirin 100mg/day"     "Control"              

## Here is a more complex example
x <- rev(c(
  "AA 0.50 ml", "AA 1.5 ml", "AA 500 ml", "AA 1500 ml",
  "EXP 1", "AA 1e3 ml", "A A A", "1 2 3 A", "NA", NA, "1e2",
  "", "-", "1A", "1 A", "100", "100A", "Inf"
))

mixedorder(x)
#>  [1]  7 11  4  5  3  8  2  1  6 12 18 17 16 13 15 14 10  9

## Notice that plain numbers, including 'Inf', show up before strings,
## NAs at the end, and blanks at the beginning.
mixedsort(x)
#>  [1] ""           "1 2 3 A"    "1 A"        "1A"         "100"       
#>  [6] "1e2"        "100A"       "Inf"        "-"          "A A A"     
#> [11] "AA 0.50 ml" "AA 1.5 ml"  "AA 500 ml"  "AA 1e3 ml"  "AA 1500 ml"
#> [16] "EXP 1"      "NA"         NA          
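
## The argument descriptions say that NA drops values instead of moving
## them. A quick way to try that; y is a made-up vector, and gtools is
## assumed to be the package providing mixedsort().
library(gtools)
y <- c("b10", "", NA, "b2")
mixedsort(y, na.last = NA) # NA entries should be dropped
mixedsort(y, blank.last = NA) # empty strings should be dropped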


mixedsort(x, na.last = TRUE) # default
#>  [1] ""           "1 2 3 A"    "1 A"        "1A"         "100"       
#>  [6] "1e2"        "100A"       "Inf"        "-"          "A A A"     
#> [11] "AA 0.50 ml" "AA 1.5 ml"  "AA 500 ml"  "AA 1e3 ml"  "AA 1500 ml"
#> [16] "EXP 1"      "NA"         NA          
mixedsort(x, na.last = FALSE) # push NAs to the front
#>  [1] NA           ""           "1 2 3 A"    "1 A"        "1A"        
#>  [6] "100"        "1e2"        "100A"       "Inf"        "-"         
#> [11] "A A A"      "AA 0.50 ml" "AA 1.5 ml"  "AA 500 ml"  "AA 1e3 ml" 
#> [16] "AA 1500 ml" "EXP 1"      "NA"        


mixedsort(x, blank.last = FALSE) # default
#>  [1] ""           "1 2 3 A"    "1 A"        "1A"         "100"       
#>  [6] "1e2"        "100A"       "Inf"        "-"          "A A A"     
#> [11] "AA 0.50 ml" "AA 1.5 ml"  "AA 500 ml"  "AA 1e3 ml"  "AA 1500 ml"
#> [16] "EXP 1"      "NA"         NA          
mixedsort(x, blank.last = TRUE) # push blanks to the end
#>  [1] "1 2 3 A"    "1 A"        "1A"         "100"        "1e2"       
#>  [6] "100A"       "Inf"        "-"          "A A A"      "AA 0.50 ml"
#> [11] "AA 1.5 ml"  "AA 500 ml"  "AA 1e3 ml"  "AA 1500 ml" "EXP 1"     
#> [16] "NA"         ""           NA          

mixedsort(x, decreasing = FALSE) # default
#>  [1] ""           "1 2 3 A"    "1 A"        "1A"         "100"       
#>  [6] "1e2"        "100A"       "Inf"        "-"          "A A A"     
#> [11] "AA 0.50 ml" "AA 1.5 ml"  "AA 500 ml"  "AA 1e3 ml"  "AA 1500 ml"
#> [16] "EXP 1"      "NA"         NA          
mixedsort(x, decreasing = TRUE) # reverse sort order
#>  [1] NA           "NA"         "EXP 1"      "AA 1500 ml" "AA 1e3 ml" 
#>  [6] "AA 500 ml"  "AA 1.5 ml"  "AA 0.50 ml" "A A A"      "-"         
#> [11] "Inf"        "100A"       "1e2"        "100"        "1A"        
#> [16] "1 A"        "1 2 3 A"    ""          

## Roman numerals
chapters <- c(
  "V. Non Sequiturs", "II. More Nonsense",
  "I. Nonsense", "IV. Nonesensical Citations",
  "III. Utter Nonsense"
)
mixedsort(chapters, numeric.type = "roman")
#> [1] "I. Nonsense"                "II. More Nonsense"         
#> [3] "III. Utter Nonsense"        "IV. Nonesensical Citations"
#> [5] "V. Non Sequiturs"          

## Lower-case Roman numerals
vals <- c(
  "xix", "xii", "mcv", "iii", "iv", "dcclxxii", "cdxcii",
  "dcxcviii", "dcvi", "cci"
)
(ordered <- mixedsort(vals, numeric.type = "roman", roman.case = "lower"))
#>  [1] "iii"      "iv"       "xii"      "xix"      "cci"      "cdxcii"  
#>  [7] "dcvi"     "dcxcviii" "dcclxxii" "mcv"     
roman2int(ordered)
#>      III       IV      XII      XIX      CCI   CDXCII     DCVI DCXCVIII 
#>        3        4       12       19      201      492      606      698 
#> DCCLXXII      MCV 
#>      772     1105 

## Control scientific notation for number matching:
vals <- c("3E1", "2E3", "4e0")

mixedsort(vals) # With scientific notation
#> [1] "4e0" "3E1" "2E3"
mixedsort(vals, scientific = FALSE) # Without scientific notation
#> [1] "2E3" "3E1" "4e0"