The Self-Organizing Map (SOM) is an unsupervised learning algorithm used for nonlinear dimensionality reduction and data clustering. In this lecture, we will explain the basic concepts of SOM, how it works, and how to implement it using PyTorch.
What is a Self-Organizing Map (SOM)?
The Self-Organizing Map is a neural network originally developed by Teuvo Kohonen. SOM maps high-dimensional data onto a lower-dimensional space (usually a 2D grid). In this process, the data is organized onto a map in which neighboring nodes come to represent similar characteristics.
Main Features of SOM
- Unsupervised Learning: It can handle unlabeled data.
- Dimensionality Reduction: Reduces high-dimensional data to lower dimensions while preserving important features of the data.
- Clustering: Similar data points are grouped in the same region.
How SOM Works
SOM learns by calculating the distance between the input vector and the node vectors. Here are the typical learning steps of SOM:
1. Initialization
All nodes are initialized randomly. Each node has a weight vector with the same dimension as the input data.
2. Input Data Selection
Randomly select a training sample. Each sample becomes an input to the SOM.
3. Finding the Nearest Node
Find the node that is most similar to the selected input data. This node is called the Best Matching Unit (BMU).
4. Weight Update
Update the weights of the BMU and its neighboring nodes so that they move closer to the input vector (a short PyTorch sketch of this step follows the list). The update rule is:
w_{i}(t+1) = w_{i}(t) + α(t) * h_{i,j}(t) * (x(t) - w_{i}(t))
Where:
- w_{i}(t): weight vector of node i
- α(t): learning rate
- h_{i,j}(t): neighborhood function of node i with respect to the BMU j
- x(t): input vector
5. Iteration
Repeat steps 2-4 for a sufficient number of epochs to gradually update the weights.
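To make steps 3 and 4 concrete, here is a minimal PyTorch sketch of a single BMU search and weight update. The grid size, learning rate, neighborhood radius, and variable names (weights, bmu_row, lr, sigma) are illustrative assumptions; the full class implementation follows in the next section.

import torch

# Assumed setup: a 10x10 grid of 3-dimensional weight vectors and one input sample
weights = torch.rand(10, 10, 3)
x = torch.rand(3)
lr, sigma = 0.5, 5.0  # learning rate α(t) and neighborhood radius

# Step 3: find the BMU (the node whose weight vector is closest to x)
dists = torch.sqrt(((weights - x) ** 2).sum(dim=2))
flat = torch.argmin(dists).item()
bmu_row, bmu_col = flat // 10, flat % 10

# Step 4: update every node, scaled by a Gaussian neighborhood h_{i,j}(t)
rows = torch.arange(10).view(-1, 1).float()
cols = torch.arange(10).view(1, -1).float()
grid_dist_sq = (rows - bmu_row) ** 2 + (cols - bmu_col) ** 2
h = torch.exp(-grid_dist_sq / (2 * sigma ** 2))  # shape (10, 10)
weights += lr * h.unsqueeze(2) * (x - weights)  # w(t+1) = w(t) + α·h·(x - w)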
Implementing SOM with PyTorch
Now let’s implement SOM using PyTorch. Here we will show you how to build and visualize a basic SOM.
Installing Required Libraries
First, install the required libraries.
!pip install torch numpy matplotlib
Defining the Model Class
Next, we define the SOM class. This class includes methods for weight initialization, finding the BMU, and updating weights.
import numpy as np
import torch

class SelfOrganizingMap:
    def __init__(self, m, n, input_dim, learning_rate=0.5, sigma=None):
        self.m = m  # grid rows
        self.n = n  # grid columns
        self.input_dim = input_dim
        self.learning_rate = learning_rate
        self.sigma = sigma if sigma else max(m, n) / 2
        # Initialize one random weight vector per grid node
        self.weights = torch.rand(m, n, input_dim)

    def find_bmu(self, x):
        # Euclidean distance between the input and every node's weight vector
        distances = torch.sqrt(torch.sum((self.weights - x) ** 2, dim=2))
        bmu_index = torch.argmin(distances).item()
        return bmu_index // self.n, bmu_index % self.n  # return (row, column)

    def update_weights(self, x, bmu, iteration):
        # Decay the learning rate and neighborhood radius over time
        learning_rate = self.learning_rate * np.exp(-iteration / 100)
        sigma = self.sigma * np.exp(-iteration / 100)
        for i in range(self.m):
            for j in range(self.n):
                h = self.neighbourhood(bmu, (i, j), sigma)
                self.weights[i, j] += learning_rate * h * (x - self.weights[i, j])

    def neighbourhood(self, bmu, point, sigma):
        # Gaussian neighborhood function centered on the BMU
        distance = np.sqrt((bmu[0] - point[0]) ** 2 + (bmu[1] - point[1]) ** 2)
        return np.exp(-distance ** 2 / (2 * sigma ** 2))

    def train(self, data, num_iterations):
        for i in range(num_iterations):
            for x in data:
                bmu = self.find_bmu(x)
                self.update_weights(x, bmu, i)
Preparing Data and Training the Model
We will prepare appropriate data and train the SOM model. Here we will use randomly generated data.
# Generate random data
data = torch.rand(200, 3) # 200 samples, 3 dimensions
# Create and train SOM
som = SelfOrganizingMap(10, 10, 3)
som.train(data, 100)
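After training, each sample can be assigned to the grid coordinates of its BMU, which is how the SOM is used for clustering. The following is a small illustrative sketch that reuses the som and data objects defined above.

# Map every sample to the (row, column) of its BMU; each grid cell acts as a cluster label
assignments = [som.find_bmu(x) for x in data]
print(assignments[:5])  # first five (row, column) assignments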
Visualizing the Results
We will visualize the weights of the trained SOM to check the distribution of the data.
import matplotlib.pyplot as plt

def plot_som(som):
    plt.figure(figsize=(8, 8))
    for i in range(som.m):
        for j in range(som.n):
            # Plot only the first two of the three weight dimensions
            plt.scatter(som.weights[i, j, 0].item(), som.weights[i, j, 1].item(), c='blue')
    plt.title('Self Organizing Map')
    plt.xlabel('Dimension 1')
    plt.ylabel('Dimension 2')
    plt.show()

plot_som(som)
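Because the input data here is 3-dimensional with values in [0, 1], another option is to interpret each node's weight vector as an RGB color and display the whole grid as an image; after training, neighboring cells should show similar colors. This is an optional sketch that reuses the som object and matplotlib import from above.

# Optional: render each 3-dimensional weight vector as an RGB color
plt.figure(figsize=(6, 6))
plt.imshow(som.weights.numpy())  # an (m, n, 3) array of values in [0, 1] is shown as an RGB image
plt.title('SOM weights as colors')
plt.axis('off')
plt.show()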
Conclusion
In this lecture, we explored the basic principles of Self-Organizing Maps (SOM) and how to implement one using PyTorch. SOM is an effective unsupervised learning technique for identifying patterns in data and performing clustering. As a next step, you can experiment with applying SOM to more complex datasets or apply optimization techniques to enhance learning performance.
I hope this article has helped you explore the world of deep learning! If you have any questions or feedback, please leave a comment.