Deep Learning PyTorch Course, Convolutional & Deconvolutional Networks

Deep learning has produced breakthrough results in computer vision, natural language processing, and many other fields. In this course, we take an in-depth look at Convolutional Neural Networks (CNNs) and Deconvolutional Neural Networks (also called Transposed Convolutional Networks) using PyTorch.

1. Introduction to Convolutional Neural Networks (CNN)

Convolutional Neural Networks (CNNs) are a class of deep learning models that perform especially well on image recognition and processing tasks. CNNs process input images with specialized layers known as convolutional layers, which extract features by exploiting the spatial structure of the image.

1.1 How Convolutional Layers Work

A convolutional layer slides filters (also called kernels) over the input image. Each filter is a small matrix of weights that responds to a specific feature, such as an edge or a texture, and a layer uses many filters to extract different features. The filter weights are not hand-designed; they are learned during training.

1.2 Convolution Operations

The convolution operation is performed by sliding the filter over the input image. It can be expressed by the following formula:

\[ Y(i, j) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} X(i + m, j + n) \, K(m, n) \]

Here, \(Y\) is the output, \(X\) is the input image, \(K\) is the filter, \(M\) and \(N\) are the dimensions of the filter, and \((i, j)\) is a position in the output.
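
To make the sliding-window idea concrete, here is a minimal sketch in plain Python (no PyTorch required). Each output value is the sum of elementwise products between the filter and the image patch it currently covers. The function name `conv2d_naive` and the sample values are illustrative assumptions, and the sketch assumes no padding and a stride of 1. Like PyTorch's `nn.Conv2d`, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_naive(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0.0
            for m in range(kh):
                for n in range(kw):
                    total += image[i + m][j + n] * kernel[m][n]
            output[i][j] = total
    return output

# Illustrative 3x3 input and 2x2 filter
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_naive(image, kernel)
print(result)  # [[-4.0, -4.0], [-4.0, -4.0]]
```

A 3x3 input with a 2x2 filter yields a 2x2 output, which is why real layers often add padding to preserve the spatial size.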

1.3 Activation Functions

After the convolution operation, an activation function is applied to introduce non-linearity. The ReLU (Rectified Linear Unit) function is primarily used:

\[ \text{ReLU}(x) = \max(0, x) \]
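
As a quick illustration, ReLU can be sketched in plain Python as follows. The helper name `relu` and the sample values are assumptions made for this example; in PyTorch the same operation is available as `F.relu`.

```python
def relu(values):
    """Apply max(0, x) to every element."""
    return [max(0.0, v) for v in values]

activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Negative inputs are clipped to zero while positive inputs pass through unchanged, which is what makes the function non-linear.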

2. Implementing CNN in PyTorch

Now, let’s explore how to implement a CNN using PyTorch. Below is an example of a basic CNN structure.

2.1 Preparing the Dataset

We will use the MNIST dataset, which consists of 70,000 grayscale 28x28 images of handwritten digits (60,000 for training and 10,000 for testing). It is a common choice for testing basic image processing models.


import torch
import torchvision
import torchvision.transforms as transforms

# Data preprocessing
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))])

# Download MNIST dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True)
testset = torchvision.datasets.MNIST(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64,
                                         shuffle=False)
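
The preprocessing step deserves a closer look. `ToTensor()` scales 8-bit pixel values to the range [0, 1], and `Normalize((0.5,), (0.5,))` then computes `(x - 0.5) / 0.5`, which maps that range to [-1, 1]. The plain-Python sketch below reproduces only the arithmetic; the helper name `normalize` and the sample pixel values are assumptions for illustration.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-channel normalization: (pixel - mean) / std."""
    return (pixel - mean) / std

# ToTensor() scales 8-bit pixel values from [0, 255] to [0.0, 1.0].
raw_pixels = [0, 128, 255]
scaled = [p / 255 for p in raw_pixels]

# Normalize with mean 0.5 and std 0.5 maps [0.0, 1.0] to [-1.0, 1.0].
normalized = [normalize(p) for p in scaled]
print(normalized)
```

Black pixels end up at -1, white pixels at 1, and mid-gray pixels near 0, so the inputs are roughly centered around zero.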
    

2.2 Defining the CNN Model

The code for defining the CNN structure is as follows. It includes convolutional layers, fully connected layers, and activation functions.


import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)  # First convolution layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # Max pooling layer
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)  # Second convolution layer
        self.fc1 = nn.Linear(64 * 7 * 7, 128)  # First fully connected layer
        self.fc2 = nn.Linear(128, 10)  # Second fully connected layer

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # Convolution -> Activation -> Pooling
        x = self.pool(F.relu(self.conv2(x)))  # Convolution -> Activation -> Pooling
        x = x.view(-1, 64 * 7 * 7)  # Reshape tensor
        x = F.relu(self.fc1(x))  # Fully connected -> Activation
        x = self.fc2(x)  # Output layer
        return x
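
The value `64 * 7 * 7` in the first fully connected layer follows from how each layer changes the spatial size. The sketch below traces a 28x28 MNIST image through the layers using the standard output-size formula; the helper name `conv_output_size` is an assumption for this example, and dilation is assumed to be 1.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension for a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28  # MNIST images are 28x28
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # conv1 keeps 28
size = conv_output_size(size, kernel_size=2, stride=2)             # pool halves to 14
size = conv_output_size(size, kernel_size=3, stride=1, padding=1)  # conv2 keeps 14
size = conv_output_size(size, kernel_size=2, stride=2)             # pool halves to 7

flattened = 64 * size * size  # 64 channels after conv2
print(size, flattened)  # 7 3136
```

If the input size, padding, or number of pooling steps changes, the input dimension of the first fully connected layer has to be recomputed the same way.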
    

2.3 Training the Model

To train the model, we will define the loss function and optimizer.


import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN().to(device)
criterion = nn.CrossEntropyLoss()  # Loss function
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD optimizer

# Training the model
for epoch in range(10):  # 10 epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)
        
        # Zero the gradients
        optimizer.zero_grad()
        
        # Forward pass + backward pass + optimization
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        if i % 100 == 99:    # Print every 100 batches
            print(f'[Epoch {epoch + 1}, Batch {i + 1}] loss: {running_loss / 100:.3f}')
            running_loss = 0.0
    print("Epoch finished")
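
Note that the model returns raw scores (logits) rather than probabilities. That is intentional: `nn.CrossEntropyLoss` applies log-softmax internally before taking the negative log-likelihood of the correct class. The plain-Python sketch below shows that computation for a single sample; the function name `cross_entropy` and the sample logits are assumptions for illustration.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class, computed from raw logits."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]  # the model favors class 0
loss_if_right = cross_entropy(logits, target=0)
loss_if_wrong = cross_entropy(logits, target=2)
print(f"{loss_if_right:.3f} {loss_if_wrong:.3f}")
```

The loss is small when the highest score belongs to the correct class and grows when the model is confident about the wrong one, which is the signal the optimizer uses to update the weights.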
    

2.4 Evaluating the Model

We will evaluate the trained model and measure its accuracy.


correct = 0
total = 0

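model.eval()  # Switch to evaluation mode
with torch.no_grad():  # Gradients are not needed for evaluation
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # Index of the highest score is the predicted class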
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f}%')
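
The accuracy computation boils down to taking the index of the largest score for each sample and comparing it with the label. Here is a minimal plain-Python sketch of that logic; the logits and labels are made-up values for illustration, and `argmax` is a hypothetical helper standing in for `torch.max(outputs, 1)`.

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda k: scores[k])

# Hypothetical logits for three samples over three classes
batch_logits = [[0.1, 2.3, -0.7],
                [1.9, 0.2, 0.4],
                [0.0, 0.3, 0.9]]
labels = [1, 0, 1]

predicted = [argmax(row) for row in batch_logits]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)
print(predicted, f"{accuracy:.2f}%")  # [1, 0, 2] 66.67%
```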
    

3. Introduction to Deconvolutional Neural Networks

Deconvolutional Neural Networks, more precisely called Transposed Convolutional Networks, upsample the feature maps extracted by a CNN to reconstruct or generate images. They are mainly used in image generation tasks, for example in the generator of a Generative Adversarial Network (GAN).

3.1 How Deconvolutional Layers Work

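Deconvolutional layers run the convolution in the opposite direction: where a strided convolution maps a large input to a smaller output, a transposed convolution maps a small input to a larger output. This makes them useful for converting low-resolution feature maps into higher-resolution images. Such layers are known as “Transposed Convolution” or, more loosely, “Deconvolution”. Note that they do not compute the mathematical inverse of a convolution; they reverse only its effect on the spatial dimensions, using learned filters.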
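
One way to picture a transposed convolution is as a scatter operation: every input value is multiplied by the whole kernel and added into the output at a position determined by the stride. The one-dimensional plain-Python sketch below illustrates this under simplifying assumptions (no padding, a single channel); the function name `conv_transpose1d_naive` is hypothetical.

```python
def conv_transpose1d_naive(signal, kernel, stride=1):
    """Scatter each input value, scaled by the kernel, into a longer output."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    output = [0.0] * out_len
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            output[i * stride + k] += value * weight
    return output

upsampled = conv_transpose1d_naive([1.0, 2.0, 3.0], kernel=[1.0, 1.0], stride=2)
print(upsampled)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With a stride of 2 and a kernel of ones, three input values become six output values, so the layer doubles the resolution. With learned kernel weights the same mechanism produces a learned upsampling.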

3.2 Example of Deconvolution

Let’s look at an example of implementing a Deconvolutional Neural Network in PyTorch.


class DeconvNetwork(nn.Module):
    def __init__(self):
        super(DeconvNetwork, self).__init__()
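        # output_padding=1 makes each layer exactly double the spatial size
        self.deconv1 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1)  # First deconvolution layer: 7x7 -> 14x14
        self.deconv2 = nn.ConvTranspose2d(32, 1, kernel_size=3, stride=2, padding=1, output_padding=1)  # Second deconvolution layer: 14x14 -> 28x28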

    def forward(self, x):
        x = F.relu(self.deconv1(x))  # Activation
        x = torch.sigmoid(self.deconv2(x))  # Output layer
        return x
    

3.3 Image Reconstruction via Deconvolution Networks

We can check the basic structure of image reconstruction by passing a tensor through the model we have defined. The same building block is used in the generator of a GAN and in the decoder of an autoencoder.


deconv_model = DeconvNetwork().to(device)

# Create a random 64-channel 7x7 feature map as a stand-in for encoder output
image = torch.randn(1, 64, 7, 7).to(device)  # Random tensor
reconstructed_image = deconv_model(image)
print(reconstructed_image.shape)  # It can reconstruct to (1, 1, 28, 28)
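
The output size of a transposed convolution can be checked by hand with the formula that PyTorch documents for `nn.ConvTranspose2d`. The plain-Python sketch below applies it to the layer settings used in this section (kernel size 3, stride 2, padding 1); the helper name `conv_transpose_output_size` is an assumption, and dilation is assumed to be 1.

```python
def conv_transpose_output_size(size, kernel_size, stride=1, padding=0, output_padding=0):
    """Output length along one spatial dimension (dilation assumed to be 1)."""
    return (size - 1) * stride - 2 * padding + kernel_size + output_padding

without_extra = conv_transpose_output_size(7, kernel_size=3, stride=2, padding=1)
with_extra = conv_transpose_output_size(7, kernel_size=3, stride=2, padding=1,
                                        output_padding=1)
print(without_extra, with_extra)  # 13 14

# Stacking two layers with output_padding=1 doubles the size twice: 7 -> 14 -> 28.
final = conv_transpose_output_size(with_extra, kernel_size=3, stride=2, padding=1,
                                   output_padding=1)
print(final)  # 28
```

Without `output_padding`, a 7x7 input grows to 13x13 rather than 14x14, so reaching exactly 28x28 in two layers requires `output_padding=1` on both.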
    

4. Conclusion

In this course, we learned about two core technologies of deep learning: Convolutional Neural Networks (CNN) and Deconvolutional Neural Networks. We explained how to build and train a CNN structure using the PyTorch framework, alongside the basic operation principles of Deconvolutional Networks. These technologies are foundational to many state-of-the-art deep learning models and continue to evolve.

We hope this aids your deep learning journey, and may you continue to develop your models through deeper research and exploration!