
kNN (k-Nearest Neighbors Algorithm)

Fundamental Concept


  • kNN can be applied to both classification and regression problems.
  • When classifying an unknown sample, kNN computes the distance from that sample to the training data, finds the k nearest training points, and assigns the class by majority vote among them.

Algorithm

  1. Compute the distance between the input sample and every training sample.
  2. Find the k training samples closest to the input sample.
  3. Assign the unknown input sample the label held by the majority of those k training samples (see the sketch below).
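
To make these three steps concrete, here is a minimal from-scratch sketch using NumPy. The function name knn_predict and the toy arrays are illustrative only, not part of any library.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Step 1: Euclidean distance from x to every training sample
    distances = np.linalg.norm(X_train - x, axis=1)
    # Step 2: indices of the k nearest training samples
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two small clusters around (0, 0) and (3, 3)
X_train = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([3.5, 3.5])))  # -> 1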

Sample Code

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a noisy two-class "moons" dataset
X, y = make_moons(noise = 0.3)

# Hold out 30% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)

# Classify by majority vote among the 5 nearest training samples
model = KNeighborsClassifier(n_neighbors = 5)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy_score(y_test, y_pred)  # the exact score varies per run (no fixed random seed)
0.8666666666666667

Difference in Decision Boundary According to k Value

  • The smaller the value of k, the more likely the model is to overfit.
  • Conversely, if k is too large, the decision boundary becomes too smooth and cannot be learned accurately.
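
A quick way to see this trade-off is to compare training and test accuracy across several values of k. The snippet below is a minimal sketch reusing the make_moons setup from the sample code above; the random_state values are arbitrary choices added for reproducibility.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for k in [1, 5, 15, 50]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Small k: near-perfect training accuracy but a jagged, overfit boundary.
    # Large k: an overly smooth boundary that underfits both sets.
    print(f"k={k:3d}  train={model.score(X_train, y_train):.3f}  "
          f"test={model.score(X_test, y_test):.3f}")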

Points to Note

  • kNN is fundamentally unsuitable for processing large amounts of data, since every prediction requires computing distances to all stored training samples.
  • High-dimensional data is also difficult to learn from, because the asymptotic assumptions behind the kNN algorithm do not necessarily hold for high-dimensional data.
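
One practical mitigation for the data-size issue is to replace brute-force search with a tree-based neighbor index; scikit-learn exposes this through the algorithm parameter, as in the sketch below. Note that this mainly helps in low dimensions, since tree-based search also degrades as dimensionality grows; the synthetic data here is an assumption for illustration.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(50_000, 2))          # large, low-dimensional training set
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # simple synthetic labels

# 'kd_tree' builds a spatial index so queries avoid scanning every sample;
# the default 'auto' chooses a strategy based on the data.
model = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
model.fit(X, y)
print(model.predict(rng.normal(size=(3, 2))))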

References

  • 秋庭伸也 (아키바 신야), 스기야마 아세이, 데라다 마나부. 머신러닝 도감: 그림으로 공부하는 머신러닝 알고리즘 17. 이중민 옮김, 2019.
