K-Means

Fundamental Concept

  • Clustering is the task of dividing data into groups of points with similar characteristics.
  • Each data point is assigned to the group whose center is closest to it.
  • Determining the centers of these groups is the key operation of the K-means algorithm.
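
To make the assignment rule concrete, here is a minimal sketch in plain Python. The function name `nearest_center`, the two centers, and the sample points are all made up for illustration, and Euclidean distance is assumed.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

# Hypothetical group centers and data points, chosen only for illustration
centers = [(0.0, 0.0), (5.0, 5.0)]
points = [(1.0, 0.5), (4.0, 6.0), (2.0, 2.0)]

# Each point joins the group of its nearest center
groups = [nearest_center(p, centers) for p in points]
print(groups)  # [0, 1, 0]
```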

Algorithm

  • The K-means algorithm's learning process
    1. Arbitrarily choose as many data points as there are groups and use them as the initial centers
    2. Calculate the distance between each data point and each center, and assign the point to the group of its nearest center
    3. Average the data points in each group and set the result as that group's new center
    4. Repeat steps 2-3 until convergence (the assignments no longer change)
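
The four steps above can be sketched in plain Python roughly as follows. This is an illustrative sketch, not how scikit-learn implements `KMeans`; the function name `kmeans`, the `max_iter` and `seed` parameters, the empty-group handling, and the sample points are assumptions made for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: pick k data points as the initial centers
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the group of its nearest center
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its group's points
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # assumption: an empty group keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical 2-D points, for illustration only
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Because the initial centers are picked at random, the result can depend on the seed, and the algorithm can settle on a grouping that is not the best possible one.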

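from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris dataset (150 samples, 4 features)
data = load_iris()

# Cluster the samples into 3 groups
model = KMeans(n_clusters=3)
model.fit(data.data)

# Cluster index assigned to each sample
print(model.labels_)
# Coordinates of the 3 cluster centers (one row per cluster)
print(model.cluster_centers_)
# Output (cluster numbering may vary between runs):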
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 0 0 0 0 2 0 0 0 0
 0 0 2 2 0 0 0 0 2 0 2 0 2 0 0 2 2 0 0 0 0 0 2 0 0 0 0 2 0 0 0 2 0 0 0 2 0
 0 2]
[[6.85       3.07368421 5.74210526 2.07105263]
 [5.006      3.428      1.462      0.246     ]
 [5.9016129  2.7483871  4.39354839 1.43387097]]

Within-Cluster Sum of Squares (WCSS)

  • The quality of a clustering is evaluated quantitatively by calculating the WCSS.
  • The WCSS is the sum of squared distances between each data point and the center of its group, so the closer the points lie to their centers, the smaller the WCSS.
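
As a small worked example, the WCSS can be computed directly from its definition. The helper name `wcss` and the four sample points are made up for illustration. On a fitted scikit-learn `KMeans` model, the same quantity is available as the `inertia_` attribute.

```python
import math

def wcss(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its group's center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))

# Hypothetical points lying on a line, for illustration only
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

# Two tight groups, each with its own center
tight = wcss(points, [0, 0, 1, 1], [(1.0, 0.0), (11.0, 0.0)])
# Everything forced into one group centered at the overall mean
loose = wcss(points, [0, 0, 0, 0], [(6.0, 0.0)])
print(tight, loose)  # 4.0 104.0
```

The grouping whose points sit close to their centers gets the much smaller value.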

WCSS elbow method

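  • The number of clusters can be chosen with the elbow method: plot the WCSS against the number of clusters.
  • Choose the number of clusters at the point where the curve bends like an elbow, beyond which adding more clusters reduces the WCSS only slightly.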
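
One way to collect the values for such a graph, sketched with the same iris data: fit `KMeans` for a range of cluster counts and record `inertia_` each time. The range 1 to 10, `n_init=10`, and `random_state=0` are arbitrary choices made here for reproducibility, not values taken from the original post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Fit one model per candidate number of clusters and record its WCSS
wcss_by_k = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_by_k[k] = model.inertia_  # sum of squared distances to the closest center

for k, value in wcss_by_k.items():
    print(f"k={k:2d}  WCSS={value:.2f}")
```

Plotting these values against k (for example with matplotlib) gives the curve on which to look for the elbow.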

References

  • 秋庭伸也 (Akiba Shinya), Sugiyama Asei, Terada Manabu. 머신러닝 도감: 그림으로 공부하는 머신러닝 알고리즘 17. Korean translation by 이중민, 2019.
