Generalized k-Nearest Neighbors Classification or Regression
gknn is an implementation of the k-nearest neighbours algorithm that makes use of general distance measures. A formula interface is provided.
Usage
# S3 method for class 'formula'
gknn(formula, data = NULL, ..., subset, na.action = na.pass, scale = TRUE)
# Default S3 method
gknn(x, y, k = 1, method = NULL,
     scale = TRUE, use_all = TRUE,
     FUN = mean, ...)
# S3 method for class 'gknn'
predict(object, newdata,
        type = c("class", "votes", "prob"),
        ...,
        na.action = na.pass)
Arguments
- formula
a symbolic description of the model to be fit.
- data
an optional data frame containing the variables in the model. By default the variables are taken from the environment which ‘gknn’ is called from.
- x
a data matrix.
- y
a response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).
- k
number of neighbours considered.
- scale
a logical vector indicating the variables to be scaled. If scale is of length 1, the value is recycled as many times as needed. By default, numeric matrices are scaled to zero mean and unit variance. The center and scale values are returned and used for later predictions. Note that the default metric for data frames is the Gower metric, which standardizes the values to the unit interval.
- method
Argument passed to dist() from the proxy package to select the distance metric used: a function, or a mnemonic string referencing the distance measure. Defaults to "Euclidean" for metric matrices, to "Jaccard" for logical matrices and to "Gower" for data frames.
- use_all
controls handling of ties. If true, all neighbours whose distance equals that of the kth nearest neighbour are included, so more than k neighbours may be used. If false, a random selection among the neighbours tied at the kth distance is made so that exactly k neighbours are used.
- FUN
function used to aggregate the k nearest target values in case of regression.
- object
object of class gknn.
- newdata
matrix or data frame with new instances.
- type
character specifying the return type in case of class predictions: for "class", the class labels; for "prob", the class distribution for all k neighbours considered; for "votes", the raw counts.
- ...
additional parameters passed to dist()
- subset
An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
- na.action
A function to specify the action to be taken if NAs are found. The default action is na.pass. (NOTE: If given, this argument must be named.)
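The tie rule described for use_all can be sketched in base R. This is only an illustration of the documented behaviour, assuming ties are decided on the distance values alone; it is not the package's internal code, and the helper name select_neighbours is hypothetical.

```r
# Hypothetical helper: choose neighbour indices from a vector of distances
# between one new instance and all training instances.
select_neighbours <- function(d, k, use_all = TRUE) {
  o <- order(d)                  # training indices by increasing distance
  kth <- d[o[k]]                 # distance of the k-th nearest neighbour
  closer <- o[d[o] < kth]        # neighbours strictly closer than that
  tied <- which(d == kth)        # all neighbours tied at the k-th distance
  if (use_all) {
    c(closer, tied)              # may yield more than k neighbours
  } else {
    need <- k - length(closer)   # slots still to be filled from the tied set
    # index into 'tied' so that a tied set of length 1 is handled correctly
    c(closer, tied[sample.int(length(tied), need)])
  }
}

d <- c(0.5, 1.0, 1.0, 1.0, 2.0)
select_neighbours(d, k = 2)                   # all three tied neighbours kept
select_neighbours(d, k = 2, use_all = FALSE)  # exactly two neighbours
```

With use_all = TRUE the second call position is shared by three instances, so four neighbours are returned; with use_all = FALSE one of the three is drawn at random.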
Value
For gknn(), an object of class "gknn" containing the data and the specified parameters. For predict.gknn(), a vector of predictions, or a matrix with votes for all classes. In case of an overall class tie, the predicted class is chosen at random.
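How the three return types relate to each other can be sketched in base R. The vote counts below are made up for illustration, the helper pick_class is hypothetical, and the sketch assumes that the class distribution is simply the row-normalised vote count; it does not reproduce the package's internal code.

```r
# Hypothetical vote counts: two new instances (rows), three classes (columns).
votes <- matrix(c(0, 1, 2,
                  3, 0, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("a", "b"),
                                c("setosa", "versicolor", "virginica")))

# "prob": each row of counts divided by the number of neighbours that voted.
prob <- votes / rowSums(votes)

# "class": the label with the most votes; an overall tie is broken at random.
pick_class <- function(v) {
  winners <- names(v)[v == max(v)]
  winners[sample.int(length(winners), 1)]
}
cls <- factor(apply(votes, 1, pick_class), levels = colnames(votes))

prob
cls
```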
Author
David Meyer (David.Meyer@R-project.org)
Examples
data(iris)
model <- gknn(Species ~ ., data = iris)
predict(model, iris[c(1, 51, 101),])
#> 1 51 101
#> setosa versicolor virginica
#> Levels: setosa versicolor virginica
test = c(45:50, 95:100, 145:150)
model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "votes")
#> setosa versicolor virginica
#> 45 3 0 0
#> 46 3 0 0
#> 47 3 0 0
#> 48 3 0 0
#> 49 3 0 0
#> 50 3 0 0
#> 95 0 3 0
#> 96 0 3 0
#> 97 0 3 0
#> 98 0 3 0
#> 99 0 3 0
#> 100 0 3 0
#> 145 0 0 3
#> 146 0 0 3
#> 147 0 1 2
#> 148 0 0 3
#> 149 0 0 3
#> 150 0 1 2
model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "prob")
#> setosa versicolor virginica
#> 45 1 0.0000000 0.0000000
#> 46 1 0.0000000 0.0000000
#> 47 1 0.0000000 0.0000000
#> 48 1 0.0000000 0.0000000
#> 49 1 0.0000000 0.0000000
#> 50 1 0.0000000 0.0000000
#> 95 0 1.0000000 0.0000000
#> 96 0 1.0000000 0.0000000
#> 97 0 1.0000000 0.0000000
#> 98 0 1.0000000 0.0000000
#> 99 0 1.0000000 0.0000000
#> 100 0 1.0000000 0.0000000
#> 145 0 0.0000000 1.0000000
#> 146 0 0.0000000 1.0000000
#> 147 0 0.3333333 0.6666667
#> 148 0 0.0000000 1.0000000
#> 149 0 0.0000000 1.0000000
#> 150 0 0.3333333 0.6666667
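The examples above cover classification only. A regression fit through the default (matrix) interface might look like the following sketch. The mtcars data, the two predictors and the use of median as the aggregate are illustrative assumptions, and the e1071 package is assumed to be installed.

```r
library(e1071)

# Illustrative regression task: predict fuel consumption from weight and power.
x <- as.matrix(mtcars[, c("wt", "hp")])
y <- mtcars$mpg

# k = 3 neighbours; their target values are aggregated with the median
# instead of the default mean.
model <- gknn(x, y, k = 3, FUN = median)

# One numeric prediction per row of the new data.
pred <- predict(model, x[1:5, ])
pred
```

Because y is numeric, the type argument of predict() plays no role here; it only applies to class predictions.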