In recent years, deep learning has established itself as a dominant methodology in artificial intelligence and machine learning. Today, we will take a look at Convolutional Neural Networks (CNNs). CNNs are particularly effective at image recognition and processing, and they are widely used across various industries.
What is a Convolutional Neural Network?
A Convolutional Neural Network is a type of neural network specialized in recognizing visual patterns in given data, such as photos or videos. CNNs are fundamentally composed of convolutional layers, pooling layers, and fully connected layers.
Convolutional Layer
The convolutional layer is responsible for extracting features from the input data. This layer slides small filters (kernels) across the input image and computes an output at each position. The resulting feature map indicates where local patterns, such as edges or textures, appear in the input.
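To make this concrete, here is a minimal sketch of a single convolution using torch.nn.functional.conv2d; the input values and the 3x3 kernel are purely illustrative (in a real CNN the kernel weights are learned during training):
import torch
import torch.nn.functional as F
x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])  # shape (batch=1, channels=1, 4, 4)
kernel = torch.tensor([[[[1., 0., -1.],
                         [1., 0., -1.],
                         [1., 0., -1.]]]])    # illustrative 3x3 edge-style filter
out = F.conv2d(x, kernel)  # no padding, so the 4x4 input yields a 2x2 feature map
print(out.shape)  # torch.Size([1, 1, 2, 2])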
Pooling Layer
The pooling layer is used to reduce the spatial size of the feature map. This lowers model complexity and computational cost, and it also helps prevent overfitting. The most common method is max pooling, which downsamples the feature map by keeping only the largest value in each region.
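As a quick illustration with made-up numbers, a 2x2 max pooling keeps only the largest value in each 2x2 block, halving the height and width:
import torch
import torch.nn as nn
x = torch.tensor([[[[1., 3., 2., 4.],
                    [5., 6., 1., 2.],
                    [7., 2., 9., 1.],
                    [3., 4., 1., 8.]]]])  # shape (1, 1, 4, 4)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x))  # tensor([[[[6., 4.], [7., 9.]]]]) - the max of each 2x2 block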
Fully Connected Layer
At the end of the network sits a fully connected layer. This layer makes the final predictions based on the features extracted by the previous layers. Since each of its neurons is connected to every output of the previous layer, it can combine those features into complex decisions about the input data.
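For example, a single nn.Linear layer maps a flattened feature vector to one score per class (the sizes here are illustrative, not tied to a particular model):
import torch
import torch.nn as nn
fc = nn.Linear(128, 10)          # 128 input features -> 10 class scores
features = torch.randn(1, 128)   # one flattened feature vector
print(fc(features).shape)        # torch.Size([1, 10])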
Implementing CNN with PyTorch
Now, let’s implement a simple CNN model using PyTorch. We will create a model to classify handwritten digits using the MNIST dataset.
Preparation
First, we will install the necessary libraries and download the dataset. The following libraries are required:
pip install torch torchvision
Preparing the Dataset
We will download and load the MNIST dataset. You can use the code below to prepare the training and testing datasets.
import torch
import torchvision
import torchvision.transforms as transforms
# Define data transformations
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5,), (0.5,))]) # Normalization using mean and standard deviation
# Training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
# Testing dataset
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)
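As an optional sanity check, you can inspect the shape of one batch to confirm the loaders work as expected:
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) - 64 single-channel 28x28 images
print(labels.shape)  # torch.Size([64]) - one digit label per image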
Defining the Model
Now, let’s define the convolutional neural network model. CNN models are typically designed with a structure that combines convolutional layers and pooling layers.
import torch.nn as nn
import torch.nn.functional as F
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)       # input channels 1, output channels 32
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # 2x2 max pooling
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)      # input channels 32, output channels 64
        self.fc1 = nn.Linear(64 * 5 * 5, 128)  # fully connected layer; the conv/pool stack leaves 64 channels of 5x5
        self.fc2 = nn.Linear(128, 10)          # final output: 10 classes (digits 0-9)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # first convolution and pooling: 28x28 -> 13x13
        x = self.pool(F.relu(self.conv2(x)))  # second convolution and pooling: 13x13 -> 5x5
        x = x.view(-1, 64 * 5 * 5)            # flatten the feature maps
        x = F.relu(self.fc1(x))               # first fully connected layer
        x = self.fc2(x)                       # second fully connected layer (class scores)
        return x
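Before training, it is worth verifying the flattened size with a dummy forward pass. A 28x28 input shrinks to 26x26 after the first 3x3 convolution (no padding), to 13x13 after pooling, to 11x11 after the second convolution, and to 5x5 after the final pooling, which is why fc1 expects 64 * 5 * 5 input features:
model = CNN()
dummy = torch.randn(1, 1, 28, 28)  # one fake MNIST-sized image
print(model(dummy).shape)          # torch.Size([1, 10])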
Training the Model
To train the model, we need to define a loss function and an optimizer, and iteratively train on the data.
# Initialize the model
cnn = CNN()
criterion = nn.CrossEntropyLoss()  # loss function
optimizer = torch.optim.SGD(cnn.parameters(), lr=0.01)  # stochastic gradient descent

# Model training
for epoch in range(5):  # number of epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()              # zero the gradients
        outputs = cnn(inputs)              # forward pass (prediction)
        loss = criterion(outputs, labels)  # loss calculation
        loss.backward()                    # backpropagate the gradients
        optimizer.step()                   # parameter update
        running_loss += loss.item()
    print(f'Epoch {epoch + 1}, Loss: {running_loss / len(trainloader):.4f}')
Evaluating the Model
After training is complete, we evaluate the model’s performance using the test dataset.
correct = 0
total = 0
with torch.no_grad():  # disable gradient calculation during evaluation
    for data in testloader:
        images, labels = data
        outputs = cnn(images)
        _, predicted = torch.max(outputs.data, 1)  # class with the highest score
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy on the test set: {100 * correct / total:.2f} %')
Conclusion
We have now walked through building a simple convolutional neural network with PyTorch and training and evaluating it on a real dataset. We hope this tutorial has helped you understand the basic structure of a CNN and how to implement one in Python. Challenge yourself to tackle more complex models and diverse datasets in the future!