Normal scores transformation (inverse normal transformation) by the Elfving, Blom, van der Waerden, Tukey, and rankit methods, as well as z-score transformation (standardization) and scaling to a range (normalization).
blom(
x,
method = "general",
alpha = pi/8,
complete = FALSE,
na.last = "keep",
na.rm = TRUE,
adjustN = TRUE,
min = 1,
max = 10,
...
)
x: A vector of numeric values.
method: One of "general" (the default),
"blom", "vdw",
"tukey", "elfving",
"rankit",
"zscore", or "scale".
alpha: A value used in the "general" method.
If alpha=pi/8 (the default), the "general" method reduces
to the "elfving" method.
If alpha=3/8, the "general" method reduces
to the "blom" method.
If alpha=1/2, the "general" method reduces to the
"rankit" method.
If alpha=1/3, the "general" method reduces to the
"tukey" method.
If alpha=0, the "general" method reduces to the
"vdw" method.
complete: If TRUE, NA values are removed
before transformation. The default is FALSE.
na.last: Passed to rank in the normal scores methods.
See the documentation for the
rank function.
The default is "keep".
na.rm: Used in the "zscore" and "scale" methods.
Passed to the mean, min, and max
functions in those methods.
The default is TRUE.
adjustN: If TRUE, the default, the normal scores methods
use only non-NA values to determine the sample size,
N. This seems to work well under default conditions
where NA values are retained, even if there is
a high percentage of NA values.
min: For the "scale" method, the minimum value of the
transformed values.
max: For the "scale" method, the maximum value of the
transformed values.
...: Additional arguments passed to rank.
A vector of numeric values.
By default, NA values are retained in the output.
This behavior can be changed with the na.rm argument
for "zscore" and "scale" methods, or
with na.last for the normal scores methods.
Or NA values can be removed from the input with
complete=TRUE.
For normal scores methods, if there are NA values
or tied values,
it is helpful to look up
the documentation for rank.
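Because the normal scores methods rely on rank, it may help to see how rank treats NA values and ties. The short sketch below uses only base R. na.last and ties.method are documented arguments of rank; the variable names are illustrative only.

```r
x <- c(10, NA, 30, 30, 50)

# NA stays NA; tied values share the average of their ranks
r_keep <- rank(x, na.last = "keep")

# NA is given the last rank instead of being kept as NA
r_last <- rank(x, na.last = TRUE)

# Tied values both receive the lowest of their ranks
r_min <- rank(x, na.last = "keep", ties.method = "min")

# Number of non-NA values, which is the count that adjustN = TRUE
# is described as using for the sample size N
n_complete <- sum(!is.na(x))
```

Since tied values receive identical ranks, they also receive identical normal scores.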
In general, for normal scores methods, either of the arguments
method or alpha can be used.
With the current algorithms, there is no need to use both.
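The relationship between method and alpha can be illustrated with a minimal base R sketch. It assumes the common rank-based formula qnorm((r - alpha) / (N - 2 * alpha + 1)), which is consistent with the alpha values listed for each method above; the internals of blom may differ. The helper name normal_scores is hypothetical and is not part of this package.

```r
# Hypothetical helper (not part of the package): rank-based inverse
# normal transformation with offset alpha, assuming the formula
# qnorm((r - alpha) / (N - 2 * alpha + 1)).
normal_scores <- function(x, alpha = pi / 8) {
  r <- rank(x, na.last = "keep")   # NA values keep their positions
  n <- sum(!is.na(x))              # sample size from non-NA values only
  qnorm((r - alpha) / (n - 2 * alpha + 1))
}

x <- c(709, 679, 699, 657, 594, 677)

# Under this formula, alpha = 0 gives the van der Waerden scores
# qnorm(r / (N + 1)), and alpha = 1/2 gives the rankit scores
# qnorm((r - 1/2) / N).
vdw    <- normal_scores(x, alpha = 0)
rankit <- normal_scores(x, alpha = 1 / 2)

all.equal(vdw,    qnorm(rank(x) / (length(x) + 1)))
all.equal(rankit, qnorm((rank(x) - 0.5) / length(x)))
```

Under this assumption, choosing a named method is simply a shorthand for choosing a particular alpha, which is why only one of the two arguments is needed.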
Normal scores transformation returns values that approximately follow a normal distribution with a mean of 0 and a standard deviation close to 1.
The "scale" method converts values to the range specified
in max and min without transforming the distribution
of values. By default, the "scale" method converts values
to a 1 to 10 range.
Using the "scale" method with
min = 0 and max = 1 is
sometimes called "normalization".
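As a rough illustration of the "scale" method, the sketch below rescales values linearly to a target range in base R. The helper name rescale_range is hypothetical, and the formula is the usual min-max rescaling, assumed here rather than taken from the source of blom.

```r
# Hypothetical helper (not part of the package): linear min-max
# rescaling of x to the range [min, max].
rescale_range <- function(x, min = 1, max = 10, na.rm = TRUE) {
  lo <- base::min(x, na.rm = na.rm)   # smallest observed value
  hi <- base::max(x, na.rm = na.rm)   # largest observed value
  (x - lo) * (max - min) / (hi - lo) + min
}

x <- c(2, 4, 6, 10)

scaled_1_10 <- rescale_range(x)                    # smallest maps to 1, largest to 10
scaled_0_1  <- rescale_range(x, min = 0, max = 1)  # "normalization" to 0 to 1
```

Because the rescaling is linear, the relative spacing of the values is preserved, which is why the shape of the distribution does not change.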
The "zscore" method converts values by the usual method
for z scores: (x - mean(x)) / sd(x). The transformed
values will have a mean of 0 and a standard deviation of
1 but won't be coerced into a normal distribution.
Sometimes this method is called "standardization".
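To illustrate why standardization does not change the shape of a distribution, the base R sketch below computes z scores directly with the formula given above. It does not call blom, and the variable names are illustrative only.

```r
set.seed(12345)
a <- rlnorm(100)                 # right-skewed data

z <- (a - mean(a)) / sd(a)       # z scores by the usual formula

mean(z)                          # effectively 0
sd(z)                            # 1

# z is a linear function of a, so the two are perfectly correlated
# and the right skew of the original data is unchanged.
cor(a, z)
```

A histogram of z would therefore show the same right skew as a histogram of a, only with a shifted and rescaled axis.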
It's possible that Gustav Elfving didn't recommend the
formula used in this function for the Elfving method.
I would like to thank Terence Cooke
at the University of Exeter for their
diligence in trying to track down a reference for this formula.
Conover, 1995, Practical Nonparametric Statistics, 3rd ed.
Solomon and Sawilowsky, 2009, Impact of rank-based normalizing transformations on the accuracy of test scores.
Beasley and Erickson, 2009, Rank-based inverse normal transformations are increasingly used, but are they merited?
set.seed(12345)
A = rlnorm(100)
if (FALSE) hist(A) # \dontrun{}
### Convert data to normal scores by Elfving method
B = blom(A)
if (FALSE) hist(B) # \dontrun{}
### Convert data to z scores
C = blom(A, method="zscore")
if (FALSE) hist(C) # \dontrun{}
### Convert data to a scale of 1 to 10
D = blom(A, method="scale")
if (FALSE) hist(D) # \dontrun{}
### Data from Sokal and Rohlf, 1995,
### Biometry: The Principles and Practice of Statistics
### in Biological Research
Value = c(709,679,699,657,594,677,592,538,476,508,505,539)
Sex = c(rep("Male",3), rep("Female",3), rep("Male",3), rep("Female",3))
Fat = c(rep("Fresh", 6), rep("Rancid", 6))
ValueBlom = blom(Value)
Sokal = data.frame(ValueBlom, Sex, Fat)
model = lm(ValueBlom ~ Sex * Fat, data=Sokal)
anova(model)
#> Analysis of Variance Table
#>
#> Response: ValueBlom
#>           Df Sum Sq Mean Sq F value    Pr(>F)
#> Sex        1 0.5399  0.5399  2.0932 0.1859728
#> Fat        1 6.7936  6.7936 26.3374 0.0008939 ***
#> Sex:Fat    1 0.5938  0.5938  2.3022 0.1676690
#> Residuals  8 2.0636  0.2579
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
if (FALSE) { # \dontrun{
hist(residuals(model))
plot(predict(model), residuals(model))
} # }