Deep Learning PyTorch Course, Fully Convolutional Network

A Fully Convolutional Network (FCN) is a neural network architecture designed for image segmentation, the task of separating specific objects within an image at the pixel level. Traditional Convolutional Neural Networks (CNNs) are primarily used for classification and end in fully connected layers, so they produce a single fixed-size output per image. An FCN instead keeps the spatial layout of its feature maps and outputs a map of class scores, so that each output pixel carries a prediction for the corresponding location in the input image.

1. Basic Structure of Fully Convolutional Networks

FCNs inherit the convolutional part of the CNN architecture. The key difference is that the fully connected layers at the end of the CNN are removed and replaced with convolutional and upsampling layers, which bring the output to the desired spatial size.
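To make this concrete, here is a minimal sketch of a network with no fully connected layers. The names backbone and head, and the channel counts, are illustrative assumptions and not part of the model built later in this article. Because a 1x1 convolution acts as a per-location classifier, the same weights work for inputs of different sizes.

```python
import torch
import torch.nn as nn

# Hypothetical tiny backbone: one conv layer followed by 2x max pooling
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
# A 1x1 convolution takes the place of a fully connected classifier:
# it maps the 8 features at every spatial location to 5 class scores.
head = nn.Conv2d(8, 5, kernel_size=1)

shapes = {}
for h, w in [(32, 32), (48, 80)]:
    out = head(backbone(torch.randn(1, 3, h, w)))
    shapes[(h, w)] = tuple(out.shape)
    print((h, w), "->", shapes[(h, w)])
```

The output keeps a spatial grid (here at half the input resolution), which a classifier ending in a fully connected layer would have flattened away.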

The main components of FCNs are as follows:

Convolutional Layer: A layer that extracts features from the input image.
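Non-linear Activation Function: Typically the ReLU (Rectified Linear Unit) function.
Upsampling: Restores downsampled feature maps to the size of the original image, for example with transposed convolutions or interpolation.
Skip Connection: Combines higher-resolution feature maps from earlier layers with the upsampled output so that fine spatial detail from the original resolution is preserved.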
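The components above can be combined as in the following sketch. TinySkipFCN is a hypothetical two-stage model written only for illustration, and fusing by element-wise addition is one common choice among several (concatenation is another).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinySkipFCN(nn.Module):
    """Hypothetical two-stage FCN that fuses a shallow feature map into the decoder."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # runs at full resolution
        self.enc2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # runs at half resolution
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.head = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, x):
        f1 = F.relu(self.enc1(x))              # (N, 16, H, W)
        f2 = F.relu(self.enc2(self.pool(f1)))  # (N, 32, H/2, W/2)
        up = F.relu(self.up(f2))               # (N, 16, H, W)
        fused = up + f1                        # skip connection: element-wise sum
        return self.head(fused)                # (N, num_classes, H, W)


scores = TinySkipFCN(num_classes=2)(torch.randn(1, 3, 64, 64))
print(scores.shape)
```

The addition only works because the upsampled map and the earlier feature map have the same number of channels and the same spatial size, which is why the transposed convolution reduces 32 channels to 16.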

2. Implementing FCN with PyTorch

Now, let's implement an FCN using PyTorch. Below is a simple Python example; to keep the code short, it leaves out skip connections.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FCN(nn.Module):
    def __init__(self, num_classes):
        super(FCN, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
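        # Three transposed convolutions, each doubling the resolution,
        # undo the three 2x max-pooling steps applied in forward()
        self.upconv1 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.upconv2 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.upconv3 = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)
        
        self.final_conv = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        # Encoder: H and W should be divisible by 8 so the output matches the input size
        x1 = F.relu(self.conv1(x))   # (N, 64, H, W)
        x2 = self.pool(x1)
        x2 = F.relu(self.conv2(x2))  # (N, 128, H/2, W/2)
        x3 = self.pool(x2)
        x3 = F.relu(self.conv3(x3))  # (N, 256, H/4, W/4)
        x4 = self.pool(x3)
        x4 = F.relu(self.conv4(x4))  # (N, 256, H/8, W/8)

        # Decoder: upsample back to the input resolution
        x = F.relu(self.upconv1(x4))  # (N, 128, H/4, W/4)
        x = F.relu(self.upconv2(x))   # (N, 64, H/2, W/2)
        x = F.relu(self.upconv3(x))   # (N, 64, H, W)
        x = self.final_conv(x)        # (N, num_classes, H, W)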
        
        return x

2.1 Model Description

In the code above, our FCN model undergoes the following steps:

Takes a 3-channel input (a typical RGB image) and passes it through the first convolutional layer, which produces 64 feature maps.
Passes through three more convolutional layers, each preceded by a max-pooling step that halves the height and width, while the number of feature maps grows to 128 and then 256.
Upsamples the feature maps back to the original image size with transposed convolutions.
Applies a final 1x1 convolutional layer whose number of output channels equals the number of classes, giving one score per class for every pixel.
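The size bookkeeping in these steps can be checked by hand. The sketch below uses the standard output-size formulas for convolution, pooling, and transposed convolution with dilation 1; the helper names are hypothetical. It shows that every 2x pooling step needs a matching stride-2 transposed convolution if the output is to have the same size as the input.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of Conv2d or MaxPool2d along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of ConvTranspose2d along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_size(size, num_pools, num_upconvs):
    """Follow one spatial dimension through the encoder and the decoder."""
    for _ in range(num_pools):
        size = conv_out(size, kernel=3, padding=1)  # 3x3 conv with padding 1 keeps the size
        size = conv_out(size, kernel=2, stride=2)   # 2x2 max pooling halves it
    for _ in range(num_upconvs):
        size = conv_transpose_out(size, kernel=2, stride=2)  # doubles it
    return size


print(trace_size(128, num_pools=3, num_upconvs=3))  # 128: matches the input
print(trace_size(128, num_pools=3, num_upconvs=2))  # 64: one upsampling step short
```

A mismatch like the second case matters in training, because a per-pixel loss requires the score map and the mask to have the same height and width.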

3. Preparing the Dataset

To train the FCN model, an appropriate dataset is needed. Commonly used datasets for image segmentation include Pascal VOC and COCO. To keep things simple, here we will instead use a synthetic example consisting of a single image and its mask.

3.1 Generating Example Dataset

import numpy as np
import matplotlib.pyplot as plt

def generate_example_data():
    h, w = 128, 128
    # Random RGB noise; the upper bound of randint is exclusive, so 256 covers 0..255
    image = np.random.randint(0, 256, (h, w, 3), dtype=np.uint8)
    # Mask of class indices: 0 = background, 1 = object
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[30:70, 30:70] = 1  # Rectangular object

    return image, mask

image, mask = generate_example_data()

# Visualizing the image and mask
plt.subplot(1, 2, 1)
plt.title('Image')
plt.imshow(image)
plt.subplot(1, 2, 2)
plt.title('Mask')
plt.imshow(mask, cmap='gray')
plt.show()
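For training on more than one example, the same idea can be wrapped in a PyTorch Dataset. The following is only a sketch: the class name SyntheticSegmentationDataset, its parameters, and the choice to place the same square in every mask are assumptions made for illustration.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class SyntheticSegmentationDataset(Dataset):
    """Hypothetical dataset of random-noise images that share one square mask."""

    def __init__(self, length=8, size=128, seed=0):
        rng = np.random.default_rng(seed)
        self.images = rng.integers(0, 256, (length, size, size, 3), dtype=np.uint8)
        self.masks = np.zeros((length, size, size), dtype=np.int64)
        self.masks[:, 30:70, 30:70] = 1  # same square object in every mask

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # HWC uint8 -> CHW float32 in [0, 1]
        image = torch.from_numpy(self.images[idx]).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(self.masks[idx])  # (H, W) int64 class indices
        return image, mask


loader = DataLoader(SyntheticSegmentationDataset(), batch_size=4, shuffle=True)
images, masks = next(iter(loader))
print(images.shape, masks.shape)
```

Each batch then has images of shape (4, 3, 128, 128) and masks of shape (4, 128, 128), which is the layout that a per-pixel cross-entropy loss expects.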

4. Training the Model

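Once the dataset is created, we are ready to train the FCN model. For training, we need to choose a loss function and an optimizer. With cross-entropy loss for segmentation, the model output has shape (N, C, H, W), where C is the number of classes, and the target holds integer class indices with shape (N, H, W).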

import torch.optim as optim

# Initialize the model, loss function, and optimizer
num_classes = 2  # Object and background
model = FCN(num_classes)
criterion = nn.CrossEntropyLoss()  # Cross-entropy loss function
optimizer = optim.Adam(model.parameters(), lr=0.001)

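# Convert the example to tensors once, outside the loop
inputs = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float() / 255.0  # (1, 3, 128, 128), scaled to [0, 1]
targets = torch.from_numpy(mask).long().unsqueeze(0)  # (1, 128, 128), class indices

# Dummy training loop on the single example
num_epochs = 5
model.train()
for epoch in range(num_epochs):
    optimizer.zero_grad()

    # Forward pass: per-pixel class scores; their spatial size must match targets
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Backward pass
    loss.backward()
    optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')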
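The loss alone says little about segmentation quality. A minimal sketch of two common checks, pixel accuracy and per-class intersection over union (IoU), is shown below on a hand-made 4x4 example. The function names are hypothetical, and the toy logits stand in for the output of a trained model.

```python
import torch


def pixel_accuracy(logits, targets):
    """Fraction of pixels whose argmax class equals the target class."""
    preds = logits.argmax(dim=1)  # (N, C, H, W) -> (N, H, W)
    return (preds == targets).float().mean().item()


def iou_for_class(logits, targets, cls):
    """Intersection over union for one class; NaN if the class never occurs."""
    preds = logits.argmax(dim=1)
    pred_c = preds == cls
    target_c = targets == cls
    union = (pred_c | target_c).sum().item()
    if union == 0:
        return float("nan")
    return (pred_c & target_c).sum().item() / union


# Toy scores: class 1 wins in the top two rows, class 0 in the bottom two
logits = torch.zeros(1, 2, 4, 4)
logits[:, 1, :2, :] = 1.0
logits[:, 0, 2:, :] = 1.0
# Ground truth: only the top row belongs to class 1
targets = torch.zeros(1, 4, 4, dtype=torch.long)
targets[:, :1, :] = 1

acc = pixel_accuracy(logits, targets)
iou = iou_for_class(logits, targets, cls=1)
print(acc, iou)  # 0.75 0.5
```

In the toy case 12 of 16 pixels are correct, and for class 1 the prediction covers 8 pixels of which 4 overlap the ground truth, so the IoU is 4 / 8.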

5. Conclusion

In this article, we examined the basic concepts of Fully Convolutional Networks (FCN), the implementation process of a simple FCN model using PyTorch, dataset preparation, and training methods. FCNs are highly useful models for image segmentation and can be used in various application fields.

Exploring more advanced FCN models and training on additional datasets are natural next steps toward better performance.