Deep Learning PyTorch Course: Introduction to Convolutional Neural Networks

Deep learning has established itself as a dominant methodology in the fields of artificial intelligence and machine learning in recent years. Today, we will take a look at Convolutional Neural Networks (CNNs). CNNs are particularly effective for image recognition and processing, and they are widely used across various industries.

What is a Convolutional Neural Network?

A Convolutional Neural Network is a type of neural network specialized in recognizing visual patterns in given data, such as photos or videos. CNNs are fundamentally composed of convolutional layers, pooling layers, and fully connected layers.

Convolutional Layer

The convolutional layer is responsible for extracting features from the input data. It slides small filters (kernels) across the input image, computing a weighted sum at each position to produce an output. The resulting feature map highlights local patterns in the input, such as edges, corners, and textures.
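
To make this concrete, here is a minimal illustrative snippet (separate from the model we build later; the channel counts are arbitrary example values) showing how a convolutional layer transforms an input tensor in PyTorch:

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)  # 8 filters of size 3x3
dummy_image = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)
feature_map = conv(dummy_image)
print(feature_map.shape)  # torch.Size([1, 8, 26, 26]) -- 28 - 3 + 1 = 26 without padding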

Pooling Layer

The pooling layer is used to reduce the spatial size of the feature map. This lowers model complexity and computational cost and can help mitigate overfitting. The most common method is max pooling, which shrinks the feature map by keeping only the largest value in each region.
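
As a small illustration (again with arbitrary example sizes), max pooling with a 2x2 window and stride 2 halves the height and width of a feature map:

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)  # keep the largest value in each 2x2 region
feature_map = torch.randn(1, 8, 26, 26)
pooled = pool(feature_map)
print(pooled.shape)  # torch.Size([1, 8, 13, 13]) -- height and width halved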

Fully Connected Layer

At the end of the network, there is a fully connected layer. This layer makes the final predictions based on the features extracted by the previous layers. Since every neuron is connected to every activation in the previous layer, it can combine those features into complex decisions about the input.
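
For illustration, a fully connected layer in PyTorch expects a flattened input; the sizes below are arbitrary example values, not those of the model we build later:

import torch
import torch.nn as nn

fc = nn.Linear(in_features=8 * 13 * 13, out_features=10)  # one output score per class
flattened = torch.randn(1, 8, 13, 13).view(1, -1)  # flatten feature maps to (batch, features)
scores = fc(flattened)
print(scores.shape)  # torch.Size([1, 10])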

Implementing CNN with PyTorch

Now, let’s implement a simple CNN model using PyTorch. We will create a model to classify handwritten digits using the MNIST dataset.

Preparation

First, we will install the necessary libraries; the dataset itself is downloaded in the next step. The following packages are required:

pip install torch torchvision

Preparing the Dataset

We will download and load the MNIST dataset. You can use the code below to prepare the training and testing datasets.


import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose(
    [transforms.ToTensor(), 
     transforms.Normalize((0.5,), (0.5,))])  # Normalize with mean 0.5 and std 0.5, mapping pixel values to [-1, 1]

# Training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Testing dataset
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)
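
As an optional sanity check, you can pull one batch from the loader and confirm the tensor shapes before moving on:

# Optional: inspect one batch to confirm the data pipeline works
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])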

Defining the Model

Now, let’s define the convolutional neural network model. CNN models are typically designed with a structure that combines convolutional layers and pooling layers.


import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)  # Input channels 1, output channels 32
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # Max pooling
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)  # Input channels 32, output channels 64
        self.fc1 = nn.Linear(64 * 5 * 5, 128)  # Fully connected layer; after two conv+pool stages, 28x28 shrinks to 5x5 (28->26->13->11->5)
        self.fc2 = nn.Linear(128, 10)  # Final output 10 classes (0-9)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First convolution and pooling
        x = self.pool(F.relu(self.conv2(x)))  # Second convolution and pooling
        x = x.view(-1, 64 * 5 * 5)  # Flatten the feature maps into one vector per sample
        x = F.relu(self.fc1(x))  # First fully connected layer
        x = self.fc2(x)  # Second fully connected layer
        return x
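
Before training, it can help to verify the layer dimensions with a dummy forward pass; this optional check is not part of the tutorial flow itself:

# Optional: verify the architecture with a dummy input
model = CNN()
dummy = torch.randn(1, 1, 28, 28)  # one fake MNIST-sized image
print(model(dummy).shape)  # torch.Size([1, 10])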

Training the Model

To train the model, we need to define a loss function and an optimizer, and iteratively train on the data.


# Initialize the model
cnn = CNN()
criterion = nn.CrossEntropyLoss()  # Loss function
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.01)  # Stochastic Gradient Descent

# Model training
for epoch in range(5):  # Number of epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()  # Zero the gradients
        outputs = cnn(inputs)  # Prediction
        loss = criterion(outputs, labels)  # Loss calculation
        loss.backward()  # Gradient calculation
        optimizer.step()  # Parameter update
        running_loss += loss.item()
    print(f'Epoch {epoch + 1}, Loss: {running_loss / len(trainloader):.4f}')
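
If you want to reuse the trained weights later, a common (optional) approach is to save the model's state dictionary; the filename below is just an example:

# Optional: save the trained weights for later reuse
torch.save(cnn.state_dict(), 'mnist_cnn.pth')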

Evaluating the Model

After training is complete, we evaluate the model’s performance using the test dataset.


cnn.eval()  # Switch to evaluation mode (good practice, though this model has no dropout or batch norm layers)
correct = 0
total = 0

with torch.no_grad():  # Disable gradient calculation
    for data in testloader:
        images, labels = data
        outputs = cnn(images)
        _, predicted = torch.max(outputs, 1)  # Index of the highest score is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the test set: {100 * correct / total:.2f} %')

Conclusion

We have now walked through building a simple convolutional neural network with PyTorch and training and evaluating it on a real dataset. We hope this tutorial has helped you understand the basic structure of a CNN and how to implement one in practice with Python. Challenge yourself to tackle more complex models and diverse datasets in the future!
