K-Means
Fundamental Concept
- Clustering is a method of dividing data into groups of points with similar characteristics.
- Each data point is assigned to the group whose center is closest to it.
- Determining the centers of these groups is the key operation of the K-means algorithm.
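As a small illustration of the "closest center" rule, the sketch below picks the nearest center for a single point. It assumes Euclidean distance, and the helper name `nearest_center` and the sample coordinates are made up for this example.

```python
import math

def nearest_center(point, centers):
    """Return the index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Two hypothetical group centers in 2D
centers = [(0.0, 0.0), (10.0, 10.0)]

print(nearest_center((1.0, 2.0), centers))  # 0
print(nearest_center((8.0, 9.0), centers))  # 1
```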
Algorithm
- The K-means algorithm's learning process:
  1. Choose as many data points as there are groups and use them as the initial centers.
  2. Calculate the distance between each data point and each center, and assign the data point to the group of its nearest center.
  3. Average the data points in each group and set the averages as the new centers.
  4. Repeat steps 2-3 until convergence, that is, until the assignments stop changing.
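The learning process above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements it. It assumes Euclidean distance, random selection of the initial centers, and "assignments stop changing" as the convergence test. The function name `kmeans`, its parameters, and the sample points are made up for this example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a group ends up empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Two well-separated groups of 2D points (made-up data)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

Which group is numbered 0 and which is numbered 1 depends on the random initial centers, so only the grouping itself is meaningful.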
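from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris dataset (150 samples, 4 features)
data = load_iris()

# Fit K-means with 3 clusters; label numbering depends on the random initialization
model = KMeans(n_clusters=3)
model.fit(data.data)

print(model.labels_)           # cluster index assigned to each sample
print(model.cluster_centers_)  # coordinates of the 3 cluster centers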
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 0 0 0 0 2 0 0 0 0
0 0 2 2 0 0 0 0 2 0 2 0 2 0 0 2 2 0 0 0 0 0 2 0 0 0 0 2 0 0 0 2 0 0 0 2 0
0 2]
[[6.85 3.07368421 5.74210526 2.07105263]
[5.006 3.428 1.462 0.246 ]
[5.9016129 2.7483871 4.39354839 1.43387097]]
Within-Cluster Sum of Squares (WCSS)
- The quality of a clustering can be evaluated quantitatively by calculating the WCSS: the sum, over all data points, of the squared distance between each point and the center of its group.
- The closer the data points are to the center of their group, the smaller the WCSS value.
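A minimal sketch of the WCSS calculation, assuming squared Euclidean distance. The function name `wcss` and the sample data are made up for this example. In scikit-learn, a fitted `KMeans` model exposes the same quantity as its `inertia_` attribute.

```python
def wcss(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total

# Four made-up points in two groups; each point is exactly 1 unit from its center
points = [(1.0, 1.0), (1.0, 3.0), (5.0, 5.0), (7.0, 5.0)]
labels = [0, 0, 1, 1]
centers = [(1.0, 2.0), (6.0, 5.0)]

print(wcss(points, labels, centers))  # 4.0
```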
WCSS elbow method
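- The number of clusters can be chosen using the WCSS elbow method.
- Plot the WCSS against the number of clusters and choose the number at the point where the curve bends like an elbow, beyond which adding more clusters reduces the WCSS only slightly.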
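One way to gather the values for an elbow plot is to fit K-means for several cluster counts and record the WCSS of each fit. The sketch below does this on the iris data and reads the WCSS from scikit-learn's `inertia_` attribute. The range of 1 to 6 clusters, `n_init=10`, and `random_state=0` are choices made for this example, not values from the text above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Fit K-means for each candidate number of clusters and record its WCSS.
wcss_by_k = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss_by_k[k] = model.inertia_

for k, value in wcss_by_k.items():
    print(f"k={k}: WCSS={value:.2f}")
```

Plotting these values against k, for example with matplotlib, gives the curve in which to look for the elbow.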
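References
- Akiba Shinya (秋庭伸也), Sugiyama Asei, and Terada Manabu. 머신러닝 도감: 그림으로 공부하는 머신러닝 알고리즘 17 [Illustrated Guide to Machine Learning: 17 Machine Learning Algorithms Studied Through Pictures]. Translated by Lee Jung-min (이중민), 2019.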