1 분 소요

t-distributed stochastic neighbor embedding, t-SNE

Fundamental Concept

  • It is a method of dimensionally reducing complex data in high dimensions to two dimensions.
  • t-SNE is one of manifold learning and aims to visualize complex data.

img

Algorithm

  • t-SNE steps
    1. The similarity of x_i and x_j for all i and j pairs is expressed as similarity using Gaussian distribution.
    2. We randomly place the same number of points y_i as x_i in a low-dimensional space, and show the similarities of y_i and y_j for all i and j pairs using the t distribution.
    3. If possible, the data point y_i is updated so that the similarity distribution defined in 1 and 2 is the same.
    4. Repeat step 3 until the convergence condition.

Sample Code

from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

data = load_digits()
model = TSNE(n_components=2)
print(model.fit_transform(data.data))
C:\Users\bl4an\anaconda3\lib\site-packages\sklearn\manifold\_t_sne.py:780: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  warnings.warn(
C:\Users\bl4an\anaconda3\lib\site-packages\sklearn\manifold\_t_sne.py:790: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  warnings.warn(


[[-26.89956    55.44969  ]
 [ 17.039724  -16.77795  ]
 [-13.765958  -10.344211 ]
 ...
 [  2.8657644  -3.5069797]
 [  1.727609   23.916462 ]
 [ -5.1929913  -0.6161056]]

참고문헌

  • 秋庭伸也 et al. 머신러닝 도감 : 그림으로 공부하는 머신러닝 알고리즘 17 / 아키바 신야, 스기야마 아세이, 데라다 마나부 [공] 지음 ; 이중민 옮김, 2019.

댓글남기기