Create Ordinal Variables With a Given Precision
Arguments
- y
a numeric, factor, or character vector with no NAs
- precision
number of places to the right of the decimal place to round y if y is numeric but not integer, for the purpose of finding the distinct values. Real values rounding to the same values under precision are mapped to the same integer output y
- ftable
set to FALSE to suppress creation of freq
Value
a list with the following elements:
- y: vector of integer-coded y
- ylevels: vector of corresponding original y values, possibly rounded to precision. This vector is numeric unless y is factor or character, in which case it is a character vector.
- freq: frequency table of rounded or categorical y, with names attribute for the (possibly rounded) y levels of the frequencies
- median: median y from original values if numeric, otherwise median of the new integer codes for y
- whichmedian: the integer valued y that most closely corresponds to median; for an ordinal regression model this represents one plus the index of the intercept vector corresponding to median.
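The relationship between median, whichmedian, and the intercept index can be sketched in base R. This is an illustration only: the data are made up, and picking the closest level with which.min() is an assumption that matches the example output on this page, not a statement about how recode2integer computes it.

```r
# Illustrative data, not taken from the package
y_orig  <- c(1.5, 2.5, 2.5, 4.0, 7.0)
ylevels <- sort(unique(y_orig))          # 1.5 2.5 4.0 7.0
med     <- median(y_orig)                # 2.5

# Assumed rule: the level closest to the median (first one on ties)
whichmedian <- which.min(abs(ylevels - med))   # 2

# An ordinal model with k distinct levels has k - 1 intercepts, so the
# intercept corresponding to the median sits at index whichmedian - 1
intercept_index <- whichmedian - 1       # 1
```

When the median falls on the lowest level, whichmedian is 1 and there is no corresponding intercept, so any code that subtracts one should guard against an index of zero.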
Details
For a factor variable y, uses existing factor levels and codes the output y as integer. For a character y, converts to factor and does the same. For a numeric y that is integer, leaves the levels intact and codes y as consecutive positive integers corresponding to distinct values in the data. For numeric y that contains any non-integer values, rounds y to precision decimal places to the right before finding the distinct values.
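The coding rules above can be sketched in base R as follows. sketch_recode is a hypothetical helper written for illustration; it is not the rms implementation and returns only the y and ylevels elements.

```r
# Hypothetical helper, NOT the rms implementation: a base-R sketch of the
# coding rules described in Details.
sketch_recode <- function(y, precision = 7) {
  stopifnot(!anyNA(y))
  if (is.character(y)) y <- factor(y)      # character: convert to factor
  if (is.factor(y)) {
    ylevels <- levels(y)                   # keep existing factor levels
    codes   <- as.integer(y)
  } else {
    # Round only when at least one value is not a whole number
    if (any(y != round(y))) y <- round(y, precision)
    ylevels <- sort(unique(y))             # distinct values, ascending
    codes   <- match(y, ylevels)           # consecutive positive integers
  }
  list(y = codes, ylevels = ylevels)
}

r <- sketch_recode(c(10, 30, 10, 20))
r$y        # 1 3 1 2
r$ylevels  # 10 20 30
```

Whole-number input is left unrounded, so the levels 10, 20, 30 are kept as they are and only the codes are made consecutive.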
This function is used to prepare ordinal variables for orm.fit() and lrm.fit(). It was written because factor() alone can create slightly different sets of distinct y levels on different hardware: factor() relies on unique(), which behaves slightly differently across systems when floating point numbers contain non-significant digits. See this for more details.
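A small base-R illustration of the underlying problem follows. It does not reproduce a cross-hardware difference; it only shows how non-significant digits in floating point values can produce extra distinct values, and how rounding before finding distinct values removes them.

```r
a <- 0.1 + 0.2   # stored with a tiny representation error
b <- 0.3

a == b                               # FALSE
length(unique(c(a, b)))              # 2: treated as two distinct values
length(unique(round(c(a, b), 7)))    # 1: identical after rounding
```

Rounding to a fixed number of decimal places before looking for distinct values makes the set of levels depend only on the significant digits.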
Examples
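# recode2integer() is in the rms package
library(rms)
# Recode y, print the result, and cross-tabulate original vs. recoded values
w <- function(y, precision=7) {
  v <- recode2integer(y, precision=precision)
  print(v)
  print(table(y, ynew=v$y))
}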
set.seed(1)
w(sample(1:3, 20, TRUE))
#> $y
#> [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3
#>
#> $ylevels
#> [1] 1 2 3
#>
#> $freq
#> 1 2 3
#> 5 7 8
#>
#> $median
#> [1] 2
#>
#> $whichmedian
#> [1] 2
#>
#>    ynew
#> y   1 2 3
#>   1 5 0 0
#>   2 0 7 0
#>   3 0 0 8
w(sample(letters[1:3], 20, TRUE))
#> $y
#> [1] 3 1 2 1 1 2 1 2 3 2 2 2 2 1 3 3 3 1 3 2
#>
#> $ylevels
#> [1] "a" "b" "c"
#>
#> $freq
#> a b c
#> 6 8 6
#>
#> $median
#> [1] 2
#>
#> $whichmedian
#> [1] 2
#>
#>    ynew
#> y   1 2 3
#>   a 6 0 0
#>   b 0 8 0
#>   c 0 0 6
y <- runif(20)
w(y)
#> $y
#> [1] 19 13 17 12 11 18 1 8 16 15 9 20 7 4 2 3 5 10 14 6
#>
#> $ylevels
#> [1] 0.023331 0.070679 0.099466 0.244797 0.316272 0.406830 0.438097 0.477230
#> [9] 0.477620 0.518634 0.529720 0.553036 0.647060 0.662005 0.692732 0.732314
#> [17] 0.782933 0.789356 0.820946 0.861209
#>
#> $freq
#> 0.0233312 0.070679 0.0994662 0.2447973 0.3162717 0.4068302 0.4380971 0.4772301
#> 1 1 1 1 1 1 1 1
#> 0.4776196 0.5186343 0.5297196 0.5530363 0.6470602 0.6620051 0.6927316 0.7323137
#> 1 1 1 1 1 1 1 1
#> 0.7829328 0.7893562 0.8209463 0.8612095
#> 1 1 1 1
#>
#> $median
#> [1] 0.52418
#>
#> $whichmedian
#> [1] 10
#>
#> ynew
#> y 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#> 0.023331202333793 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.0706790471449494 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.0994661601725966 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.244797277031466 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.31627170718275 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.406830187188461 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.438097107224166 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.477230065036565 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.477619622135535 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
#> 0.518634263193235 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
#> 0.529719580197707 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
#> 0.553036311641335 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
#> 0.647060193819925 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
#> 0.662005076417699 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> 0.692731556482613 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> 0.7323137386702 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
#> 0.78293276228942 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> 0.789356231689453 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#> 0.820946294115856 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
#> 0.8612094768323 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
w(y, precision=2)
#> $y
#> [1] 18 12 16 11 10 17 1 8 15 14 8 19 7 4 2 3 5 9 13 6
#>
#> $ylevels
#> [1] 0.02 0.07 0.10 0.24 0.32 0.41 0.44 0.48 0.52 0.53 0.55 0.65 0.66 0.69 0.73
#> [16] 0.78 0.79 0.82 0.86
#>
#> $freq
#> 0.02 0.07 0.1 0.24 0.32 0.41 0.44 0.48 0.52 0.53 0.55 0.65 0.66 0.69 0.73 0.78
#> 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1
#> 0.79 0.82 0.86
#> 1 1 1
#>
#> $median
#> [1] 0.52418
#>
#> $whichmedian
#> [1] 9
#>
#> ynew
#> y 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
#> 0.023331202333793 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.0706790471449494 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.0994661601725966 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.244797277031466 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.31627170718275 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.406830187188461 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.438097107224166 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
#> 0.477230065036565 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
#> 0.477619622135535 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
#> 0.518634263193235 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
#> 0.529719580197707 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
#> 0.553036311641335 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
#> 0.647060193819925 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
#> 0.662005076417699 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
#> 0.692731556482613 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
#> 0.7323137386702 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
#> 0.78293276228942 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
#> 0.789356231689453 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
#> 0.820946294115856 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
#> 0.8612094768323 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
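The drop from 20 levels to 19 under precision=2 comes from two original values that round to the same number. A quick base-R check, using the two values printed in the table above:

```r
# The two closest original values from the example output
close_pair <- c(0.477230065036565, 0.477619622135535)

length(unique(round(close_pair, 7)))  # 2: distinct at the default precision
length(unique(round(close_pair, 2)))  # 1: merged at precision 2
round(close_pair, 2)                  # 0.48 0.48
```

This is why the frequency for level 0.48 is 2 and why both rows fall in column 8 of the cross-tabulation.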