t-distributed Stochastic Neighbor Embedding (t-SNE)

Fundamental Concept

  • t-SNE is a dimensionality reduction method that maps complex high-dimensional data to a low-dimensional space, typically two or three dimensions.
  • It is a manifold learning technique and is used mainly to visualize complex data.



  • t-SNE steps
    1. For every pair of points x_i and x_j, express their similarity using a Gaussian distribution.
    2. Randomly place points y_i, one for each x_i, in the low-dimensional space, and express the similarity of every pair y_i and y_j using a t-distribution.
    3. Update the points y_i so that the similarity distribution from step 2 moves closer to the one from step 1 (the mismatch is measured by the Kullback-Leibler divergence).
    4. Repeat step 3 until a convergence condition is met.
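
The four steps above can be sketched in NumPy as follows. This is a simplified illustration, not the scikit-learn implementation: it assumes one fixed Gaussian bandwidth sigma for all points (real t-SNE tunes a bandwidth per point from the perplexity), plain gradient descent without momentum, and a fixed iteration count in place of a convergence test. The function names, the toy data, and the default values are all hypothetical.

```python
import numpy as np


def squared_distances(points):
    """Pairwise squared Euclidean distances, shape (n, n)."""
    sq_norms = np.sum(points ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    return np.maximum(dists, 0.0)  # clip tiny negatives caused by round-off


def gaussian_similarities(x, sigma=1.0):
    """Step 1: symmetric Gaussian similarities p_ij in the input space."""
    n = x.shape[0]
    logits = -squared_distances(x) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # row i holds p_{j|i}
    return (cond + cond.T) / (2.0 * n)           # symmetrize; entries sum to 1


def student_t_similarities(y):
    """Step 2: Student-t similarities q_ij in the low-dimensional space."""
    kernel = 1.0 / (1.0 + squared_distances(y))
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum(), kernel


def kl_divergence(p, q, eps=1e-12):
    """Mismatch between the two similarity distributions, KL(P || Q)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))


def tsne_sketch(x, n_components=2, n_iter=300, learning_rate=1.0, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    p = gaussian_similarities(x, sigma)
    # Step 2: one random low-dimensional point y_i per input point x_i
    y = rng.normal(scale=1e-2, size=(x.shape[0], n_components))
    history = []
    for _ in range(n_iter):  # Step 4: repeat (fixed count instead of a convergence test)
        q, kernel = student_t_similarities(y)
        history.append(kl_divergence(p, q))
        # Step 3: gradient 4 * sum_j (p_ij - q_ij) * kernel_ij * (y_i - y_j)
        weights = (p - q) * kernel
        grad = 4.0 * (np.diag(weights.sum(axis=1)) - weights) @ y
        y -= learning_rate * grad
    return y, history


# Toy data (an assumption for illustration): two separated clusters in 5 dimensions
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.5, size=(20, 5)),
               rng.normal(5.0, 0.5, size=(20, 5))])
y, history = tsne_sketch(x)
print(y.shape, history[0], history[-1])
```

The printed divergence values show how far the two similarity distributions are apart before the first and before the last update. The step size here was chosen for this small toy set and would likely need adjusting for other data.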

Sample Code

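from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Load the handwritten digits dataset (1797 samples, 64 features each)
data = load_digits()

# Embed the 64-dimensional samples into two dimensions
model = TSNE(n_components=2)
embedded = model.fit_transform(data.data)
print(embedded)

Output (excerpt; exact values can differ between runs and library versions):

[[-26.89956    55.44969  ]
 [ 17.039724  -16.77795  ]
 [-13.765958  -10.344211 ]
 [  2.8657644  -3.5069797]
 [  1.727609   23.916462 ]
 [ -5.1929913  -0.6161056]]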
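
As a variation on the sample above, the following sketch sets two further TSNE parameters explicitly: perplexity, which must be smaller than the number of samples, and random_state, which fixes the seed of the random number generator. Restricting the data to the first 300 samples is an assumption made only to keep the run short, and the variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
subset = digits.data[:300]    # first 300 samples only, to keep the run short
labels = digits.target[:300]  # digit classes, e.g. for coloring a scatter plot

# perplexity must be smaller than the number of samples;
# random_state fixes the seed of the random number generator.
model = TSNE(n_components=2, perplexity=30.0, random_state=0)
embedding = model.fit_transform(subset)

print(embedding.shape)  # (300, 2)
print(embedding[:3])    # values depend on the scikit-learn version
```

Each row of embedding is the two-dimensional position of one digit image, so plotting the rows colored by labels gives the usual t-SNE visualization.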


