Deep learning is a field of machine learning that learns patterns from data through multiple layers of neurons. Among the many deep learning models, Convolutional Neural Networks (CNNs) are architectures particularly well suited to image processing. In this course, we look at why convolutional layers are needed, how they work, and how to implement them using PyTorch.
1. Concept of Convolutional Layer
A convolutional layer is designed to extract features from images and operates differently from a typical fully connected layer. Instead of connecting every input to every output, it applies small sets of learnable weights, called kernels or filters, to local regions of the input image, so that each output value depends only on a small neighborhood of pixels.
1.1. Convolution Operation
The convolution operation slides the kernel over the input image to extract local features. At each position, the pixel values under the kernel are multiplied element-wise by the kernel values, and the products are summed to produce one value of the output, called a feature map. (Strictly speaking, frameworks such as PyTorch compute cross-correlation, since the kernel is not flipped, but the operation is conventionally called convolution.)
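To make the sliding-window description concrete, here is a minimal sketch that computes the operation with explicit loops and compares it with PyTorch's built-in `F.conv2d`. The helper name `conv2d_manual`, the 4×4 input, and the 2×2 kernel are illustrative assumptions, not part of the course code; the sketch uses no padding and a stride of 1.

```python
import torch
import torch.nn.functional as F

def conv2d_manual(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1) and
    sum the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = torch.zeros(out_h, out_w)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # local region under the kernel
            out[i, j] = (patch * kernel).sum()  # multiply and sum
    return out

# Illustrative 4x4 "image" holding the values 0..15, and a 2x2 kernel
image = torch.arange(16, dtype=torch.float32).reshape(4, 4)
kernel = torch.tensor([[1.0, 0.0],
                       [0.0, -1.0]])

manual = conv2d_manual(image, kernel)

# F.conv2d expects input of shape (batch, channels, H, W)
# and weights of shape (out_channels, in_channels, kH, kW)
builtin = F.conv2d(image[None, None], kernel[None, None])[0, 0]

print(manual)
print(torch.allclose(manual, builtin))
```

A 4×4 input and a 2×2 kernel give a 3×3 output, because the kernel fits in three positions along each axis.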
1.2. Pooling Layer
After the convolution operation, a pooling layer is often used to downsample the feature map. This reduces its spatial dimensions and the computation required by later layers while keeping the strongest responses. Max pooling and average pooling are the most common choices. Pooling also makes the output less sensitive to noise and to small shifts in the input.
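As a small illustration of the two pooling variants, the sketch below applies 2×2 max pooling and 2×2 average pooling to a hand-written 4×4 feature map. The values are made up for the example.

```python
import torch
import torch.nn.functional as F

# Illustrative 4x4 feature map, reshaped to (batch=1, channels=1, H=4, W=4)
x = torch.tensor([[1.0, 3.0, 2.0, 0.0],
                  [4.0, 2.0, 1.0, 1.0],
                  [0.0, 0.0, 5.0, 6.0],
                  [1.0, 2.0, 7.0, 8.0]])[None, None]

# Each non-overlapping 2x2 window is reduced to a single value
max_pooled = F.max_pool2d(x, kernel_size=2, stride=2)  # keeps the largest value
avg_pooled = F.avg_pool2d(x, kernel_size=2, stride=2)  # keeps the mean value

print(max_pooled[0, 0])
print(avg_pooled[0, 0])
```

Both outputs are 2×2: the height and width are halved, which quarters the number of values passed to the next layer.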
2. Necessity of Convolutional Layers
2.1. Reduction in Number of Parameters
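In fully connected layers, every input node is connected to every output node, so the number of parameters grows with the product of the input and output sizes. In contrast, a convolutional layer reuses the same kernel at every position of the image, so its parameter count depends only on the kernel size and the number of input and output channels (kernel height × kernel width × input channels × output channels, plus one bias per output channel), not on the image size. This allows effective feature extraction with far fewer parameters than a fully connected layer.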
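The difference can be checked by counting parameters directly. The sketch below assumes 28×28 single-channel inputs and the layer sizes used in the implementation section of this course; `count_params` is a hypothetical helper. The fully connected figure is computed arithmetically rather than by allocating the layer, for a layer that would map the same input to an output of the same size as the first convolution (32 × 28 × 28 values).

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable values (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters())

conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)

conv1_params = count_params(conv1)  # 3*3*1*32 weights + 32 biases
conv2_params = count_params(conv2)  # 3*3*32*64 weights + 64 biases

# A fully connected layer producing the same number of outputs as conv1
in_features = 1 * 28 * 28
out_features = 32 * 28 * 28
fc_equivalent_params = in_features * out_features + out_features

print(conv1_params, conv2_params, fc_equivalent_params)
```

The convolutional counts would stay the same for a larger image, whereas the fully connected count grows with both the input and output sizes.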
2.2. Extraction of Local Features
Much of the information in an image is local: edges, corners, and textures are defined by small groups of neighboring pixels. For example, if a particular region of the image contains a characteristic object, the features of that region must be extracted accurately. Because each kernel looks at only a small neighborhood at a time, convolutional layers are well suited to learning such local patterns.
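One way to see what a local feature looks like is to apply a hand-written edge-detection kernel. The sketch below uses the well-known Sobel kernel for horizontal intensity changes on a synthetic 6×6 image whose left half is dark and right half is bright. In a CNN the kernel values are learned rather than fixed; the fixed kernel and the toy image here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Synthetic 6x6 image: columns 0-2 are 0 (dark), columns 3-5 are 1 (bright)
image = torch.zeros(6, 6)
image[:, 3:] = 1.0

# Sobel kernel that responds to left-to-right intensity changes (vertical edges)
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])

# No padding, stride 1: a 6x6 input and a 3x3 kernel give a 4x4 response map
response = F.conv2d(image[None, None], sobel_x[None, None])[0, 0]

print(response)
```

The response is zero wherever the 3×3 window covers a uniform area and nonzero only where the window straddles the edge, so the feature map marks where the local pattern occurs.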
2.3. Position Invariance
Because the same kernel is applied at every position, a convolutional layer responds to a pattern in the same way wherever it appears: if the object moves, the response in the feature map moves with it. Pooling then makes the result tolerant of small shifts. In other words, features can be recognized largely regardless of where an object is located in the image, which is a significant advantage in tasks such as image classification.
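The following sketch illustrates this behavior under simplifying assumptions: a single random 3×3 kernel stands in for a learned filter, the "object" is a 2×2 patch on an otherwise empty 8×8 image, and the pooling step is a global maximum over the whole feature map rather than the 2×2 pooling used in the model of this course. The object is moved with `torch.roll`, which is safe here because it stays away from the image border.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
kernel = torch.randn(1, 1, 3, 3)  # arbitrary filter standing in for a learned one

# 8x8 image that is zero except for a small 2x2 "object" near the top-left
image = torch.zeros(1, 1, 8, 8)
image[0, 0, 1:3, 1:3] = torch.tensor([[1.0, 2.0],
                                      [3.0, 4.0]])

# The same object moved 3 pixels down and 3 pixels to the right
shifted = torch.roll(image, shifts=(3, 3), dims=(2, 3))

response = F.conv2d(image, kernel, padding=1)
response_shifted = F.conv2d(shifted, kernel, padding=1)

# Convolution: shifting the input shifts the feature map by the same amount
moves_with_object = torch.allclose(
    torch.roll(response, shifts=(3, 3), dims=(2, 3)), response_shifted
)

# Global max pooling: the strongest response is the same at either location
same_peak = torch.isclose(response.max(), response_shifted.max()).item()

print(moves_with_object, same_peak)
```

Local 2×2 pooling gives a weaker guarantee than the global maximum used here: it absorbs shifts of about a pixel, and stacking several convolution and pooling stages extends that tolerance.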
2.4. Diverse Application Fields
Convolutional layers can be applied across various fields such as image classification, object detection, image generation, and even natural language processing. Despite the rapid advancement of artificial intelligence, the fundamental structure of CNNs remains a core component in many modern models.
3. Implementing Convolutional Layers in PyTorch
Now, let’s implement a simple CNN using PyTorch. Below is an example of a CNN model that includes basic convolutional layers, pooling layers, and fully connected layers.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torch.optim as optim
# Define CNN model
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)  # Convolutional Layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # Pooling Layer
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)  # Fully Connected Layer
        self.fc2 = nn.Linear(128, 10)  # Output Layer
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # First Convolution + Pooling
        x = self.pool(F.relu(self.conv2(x)))  # Second Convolution + Pooling
        x = x.view(-1, 64 * 7 * 7)  # Flatten
        x = F.relu(self.fc1(x))  # First Fully Connected Layer
        x = self.fc2(x)  # Output Layer
        return x
# Load and preprocess dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # Normalization
])
trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
# Define model, loss function, and optimization algorithm
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()  # Loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Optimization Algorithm
# Training loop
for epoch in range(10):  # 10 epochs
    running_loss = 0.0
    for inputs, labels in trainloader:
        optimizer.zero_grad()  # Reset gradients from the previous step
        outputs = model(inputs)  # Forward pass
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights
        running_loss += loss.item()
    # Report the mean loss over all batches, not just the last batch
    print(f'Epoch [{epoch+1}/10], Loss: {running_loss / len(trainloader):.4f}')
print('Training completed!')
3.1. Code Explanation
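In the above code, we defined the SimpleCNN class, a CNN composed of two convolutional layers and two fully connected layers. The convolutional layers are defined with torch.nn.Conv2d and the pooling layer with torch.nn.MaxPool2d. Because each convolution uses a 3×3 kernel with padding=1, it preserves the 28×28 size of the MNIST images, and each 2×2 pooling step halves it (28 → 14 → 7), which is why fc1 takes 64 * 7 * 7 inputs. The model is trained on the MNIST training set for 10 epochs with the Adam optimizer and cross-entropy loss.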
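The flattened size expected by fc1 can be verified without downloading any data by passing a dummy batch through the same layer configuration and recording the tensor shape after each stage. This is a minimal sketch: the layers are recreated standalone with the same arguments as in SimpleCNN, and the batch of four all-zero 28×28 images is an assumption made only for the shape check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same layer settings as in SimpleCNN
conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

x = torch.zeros(4, 1, 28, 28)  # dummy batch: 4 grayscale 28x28 images
shapes = [tuple(x.shape)]

x = pool(F.relu(conv1(x)))     # convolution keeps 28x28, pooling halves it to 14x14
shapes.append(tuple(x.shape))

x = pool(F.relu(conv2(x)))     # convolution keeps 14x14, pooling halves it to 7x7
shapes.append(tuple(x.shape))

x = x.view(-1, 64 * 7 * 7)     # flatten each image to a vector of 3136 values
shapes.append(tuple(x.shape))

for s in shapes:
    print(s)
```

If the input size, padding, or number of pooling steps changes, the in_features argument of fc1 has to change with it, and a shape trace like this is a quick way to find the new value.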
4. Conclusion
Convolutional layers play a crucial role in effectively extracting significant features from image data. Understanding the structure and operating principles of basic convolutional neural networks is important in the field of deep learning. In this article, we explored the necessity of convolutional layers, their functions, and a simple implementation example using PyTorch. We hope to explore more complex CNN architectures and various application fields in the future.