Deep Learning PyTorch Course, LeNet-5

Deep learning has become a widely used tool across many areas of data science in recent years. In this course, we take a closer look at one of the best-known early deep learning architectures, LeNet-5.

What is LeNet-5?

LeNet-5 is a convolutional neural network (CNN) architecture developed by researchers including Yann LeCun in 1998. It was designed for image recognition, primarily handwritten digit recognition. The model follows the basic structure of a CNN: convolution and pooling layers alternate, and fully connected layers come last. LeNet-5 is composed of the following layers:

Input Layer: Grayscale image of 32×32 pixels.
Convolution Layer (C1): Applies 6 filters (5×5) to generate 6 feature maps of size 28×28.
Pooling Layer (S2): Generates 6 feature maps of size 14×14 through 2×2 average pooling.
Convolution Layer (C3): Applies 16 filters (5×5) to generate 16 feature maps of size 10×10.
Pooling Layer (S4): Generates 16 feature maps of size 5×5 through 2×2 average pooling.
Convolution Layer (C5): Applies 120 filters (5×5) to generate 120 feature maps of size 1×1.
Fully Connected Layer (F6): Connects the 120 values from C5 to 84 neurons.
Output Layer: Produces 10 outputs, one for each digit class (0-9).
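
The feature-map sizes in the list above follow from simple arithmetic: with no padding, a window of size k moved with stride s over an input of width n yields floor((n - k) / s) + 1 outputs. Below is a minimal sketch that traces the sizes through the network. The helper name `out_size` is made up for this illustration.

```python
def out_size(n, kernel, stride=1):
    """Output width for an unpadded convolution or pooling window."""
    return (n - kernel) // stride + 1

size = 32                                   # input image is 32x32
sizes = {}
# (layer name, kernel size, stride) in network order
for name, kernel, stride in [
    ("C1", 5, 1),
    ("S2", 2, 2),
    ("C3", 5, 1),
    ("S4", 2, 2),
    ("C5", 5, 1),
]:
    size = out_size(size, kernel, stride)
    sizes[name] = size
    print(f"{name}: {size}x{size}")
```

The printed sizes are 28, 14, 10, 5, and 1, matching the layer descriptions.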

The Importance of LeNet-5

LeNet-5 is one of the foundational CNN architectures. Its pattern of stacked convolution and pooling layers followed by fully connected layers forms the basis for many later deep networks, and modified variants are still in use today. Because the network is small and simple, it trains quickly and performs well on simple image datasets such as MNIST.

Implementing LeNet-5

Now, let’s implement LeNet-5 using PyTorch. PyTorch is a user-friendly deep learning framework widely used in research and industry. It builds its computation graph dynamically as the code runs, so models can use ordinary Python control flow and are easy to debug.
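
To illustrate what a dynamic computation graph means in practice, here is a small sketch that is separate from the LeNet-5 code. The number of loop iterations depends on the data, and autograd still tracks every operation that actually ran. The function name `double_until_large` and the threshold of 10 are arbitrary choices for this example.

```python
import torch

def double_until_large(x, threshold=10.0):
    # Ordinary Python control flow: the loop count depends on the values in x.
    y = x
    while y.norm() < threshold:
        y = y * 2
    return y.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
out = double_until_large(x)
out.backward()          # gradients flow through the operations that were executed
print(x.grad)           # each entry equals 2 ** (number of doublings)
```

For this input the loop doubles the tensor three times, so each gradient entry is 8.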

Environment Setup

First, we need to install the necessary libraries and set up the environment. Use the following command to install PyTorch and torchvision:

pip install torch torchvision

Implementing LeNet-5 Model

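Now let’s implement the structure of LeNet-5. This version uses ReLU activations, a common modern substitute for the tanh-style activations of the original network, and represents C5 as the fully connected layer fc1: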

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # C1: 1x32x32 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)   # C3: 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)          # C5, written as a fully connected layer
        self.fc2 = nn.Linear(120, 84)                  # F6
        self.fc3 = nn.Linear(84, 10)                   # Output: one score (logit) per digit

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.avg_pool2d(x, kernel_size=2, stride=2)   # S2: 6x28x28 -> 6x14x14
        x = F.relu(self.conv2(x))
        x = F.avg_pool2d(x, kernel_size=2, stride=2)   # S4: 16x10x10 -> 16x5x5
        x = torch.flatten(x, 1)                        # flatten all but the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                                # raw logits, no softmax here
        return x
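
The layer list describes C5 as a convolution, while the class above uses `nn.Linear(16 * 5 * 5, 120)` in its place. The two are interchangeable here because a 5×5 kernel applied to a 5×5 input covers the whole map exactly once. The following sketch checks this numerically by copying a convolution's weights into a linear layer. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

conv = nn.Conv2d(16, 120, kernel_size=5)   # C5 written as a convolution
fc = nn.Linear(16 * 5 * 5, 120)            # C5 written as a fully connected layer

# Copy the convolution's parameters into the linear layer.
with torch.no_grad():
    fc.weight.copy_(conv.weight.view(120, -1))
    fc.bias.copy_(conv.bias)

x = torch.randn(4, 16, 5, 5)               # a batch shaped like the S4 output
conv_out = conv(x).flatten(1)              # (4, 120, 1, 1) -> (4, 120)
fc_out = fc(x.flatten(1))                  # (4, 400) -> (4, 120)

same = torch.allclose(conv_out, fc_out, atol=1e-5)
print(same)
```

Both layers hold the same number of parameters (120 × 400 weights plus 120 biases), so the choice between them is a matter of convenience.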

Preparing Dataset for Model Training

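We will train LeNet-5 on the MNIST dataset, which torchvision can download and load for you. MNIST images are 28×28 pixels, so the transform below resizes them to the 32×32 input that LeNet-5 expects and then normalizes the pixel values to the range [-1, 1]. Use the following code to prepare the MNIST dataset: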

from torchvision import datasets, transforms

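transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),   # MNIST is already single-channel; kept as a safeguard
    transforms.Resize((32, 32)),                   # 28x28 -> 32x32
    transforms.ToTensor(),                         # PIL image -> float tensor in [0, 1]
    transforms.Normalize((0.5,), (0.5,))           # (x - 0.5) / 0.5 maps [0, 1] to [-1, 1]
])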

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)
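
As a quick check of what `transforms.Normalize((0.5,), (0.5,))` does to the pixel values: `ToTensor` scales pixels to [0, 1], and normalization then computes (x - mean) / std for each channel. Below is a plain-Python sketch of that arithmetic, using a hypothetical helper named `normalize`.

```python
def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize, applied to a single value."""
    return (x - mean) / std

# Black, mid-gray, and white pixels after ToTensor scaling
pixels = [0.0, 0.5, 1.0]
normalized = [normalize(p) for p in pixels]
print(normalized)   # [-1.0, 0.0, 1.0]
```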

Model Training

To train the model, we need a loss function and an optimizer. Here, we use cross-entropy loss and the Adam optimizer:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = LeNet5().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

num_epochs = 5
model.train()  # put the model in training mode before the loop
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], Loss: {loss.item():.4f}')
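
When reading the printed loss, it helps to know roughly what value to expect. `nn.CrossEntropyLoss` takes raw scores (logits) and computes the negative log of the softmax probability assigned to the correct class. The sketch below reproduces that formula in plain Python for a single sample. `cross_entropy_from_logits` is a helper written for this illustration, not part of PyTorch.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Negative log-softmax of the target class for one sample."""
    m = max(logits)                                   # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Equal scores for all 10 digits: the model is guessing.
uniform_loss = cross_entropy_from_logits([0.0] * 10, target=3)
# A large score on the correct class: the model is confident and right.
confident_loss = cross_entropy_from_logits([0.0] * 9 + [10.0], target=9)

print(f"{uniform_loss:.4f}")     # 2.3026, i.e. ln(10)
print(f"{confident_loss:.4f}")
```

A model that guesses uniformly over 10 digits has a loss of ln(10), about 2.30, so printed values well below that suggest the network is learning.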

Model Evaluation

After training is completed, you can evaluate the model’s performance. We will check the accuracy using the test dataset:

model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # index of the largest score is the predicted digit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Accuracy of the model on the test images: {100 * correct / total:.2f}%')
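
To make the accuracy computation concrete, here is the same logic applied to a tiny hand-made batch instead of the test set. The scores and labels are invented for the illustration. `argmax(dim=1)` returns the same indices as the second value of `torch.max(outputs, 1)`.

```python
import torch

# Made-up scores for 4 samples and 3 classes (rows are samples).
outputs = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.2, 0.4, 0.1],
])
labels = torch.tensor([0, 1, 2, 1])

predicted = outputs.argmax(dim=1)                 # class with the highest score per row
correct = (predicted == labels).sum().item()      # number of matching predictions
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), accuracy)               # [0, 1, 2, 0] 75.0
```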

Conclusion

In this course, we implemented and trained the LeNet-5 architecture using PyTorch. LeNet-5 is a good example for understanding and practicing the fundamentals of CNNs, and it can serve as a starting point for more complex network architectures and applications. As a next step, we recommend exploring deeper network structures or more challenging datasets.