Deep Learning PyTorch Course, Explainable CNN

1. Introduction: The Development of Deep Learning and CNNs

Deep learning is a field of artificial intelligence (AI) that has the ability to learn patterns and make predictions from large amounts of data. Among these, Convolutional Neural Networks (CNNs) have established themselves as a powerful tool for image processing. CNNs effectively extract patterns from low-dimensional data and have a structure capable of learning high-dimensional features. However, understanding the internal workings of CNNs can be challenging, making explainability a topic of great interest for many researchers today.

2. The Necessity of Explainable Deep Learning

Deep learning models, especially those with complex structures like CNNs, are often perceived as ‘black boxes’. This means it is difficult to understand how the model makes decisions. Therefore, developing explainable CNN models has become increasingly important. This helps users to understand the predictions made by the model and contributes to enhancing the model’s reliability.

3. Implementing CNN with PyTorch

First, let’s go through the basic setup required to implement a CNN. PyTorch is a powerful machine learning library that helps us build our CNN easily. We will start by installing the necessary libraries and preparing the data.

3.1 Installing PyTorch

pip install torch torchvision

3.2 Preparing the Dataset

We will use the CIFAR-10 dataset here. CIFAR-10 consists of 60,000 32×32 pixel images across 10 classes. We can easily load the dataset using the torchvision library in PyTorch.


import torch
import torchvision
import torchvision.transforms as transforms

# Data transformation
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Download CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
    

3.3 Defining the CNN Model

Now, we will define the CNN model. We will use a simple CNN architecture by stacking different layers. The model is built by combining convolutional layers and pooling layers.


import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 3-channel input, 6-channel output, kernel size 5
        self.pool = nn.MaxPool2d(2, 2)   # 2x2 max pooling
        self.conv2 = nn.Conv2d(6, 16, 5) # 6-channel input, 16-channel output, kernel size 5
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # Fully connected layer
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)  # Flattening the output
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    

3.4 Training the Model

Having defined the model, we will now proceed with the training process. We will set up the loss function and optimizer, and train the model for a specified number of epochs.


import torch.optim as optim

# Create model instance
net = SimpleCNN()

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Train the model
for epoch in range(2):  # Setting the number of iterations
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()  # Zero the gradients
        outputs = net(inputs)  # Model predictions
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()  # Calculate gradients
        optimizer.step()  # Update parameters
        running_loss += loss.item()
        if i % 2000 == 1999:  # Print every 2000th batch
            print(f"[{epoch + 1}, {i + 1}] Loss: {running_loss / 2000:.3f}")
            running_loss = 0.0
    print("Training complete!")
    

3.5 Evaluating the Model

We will evaluate the trained model using the test dataset. By measuring accuracy, we can check how well the model has learned.


correct = 0
total = 0

with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')
    

4. Implementing Explainable CNNs

Now, we will explore how to make CNNs explainable. One approach is to use the Grad-CAM (Gradient-weighted Class Activation Mapping) technique to visualize which parts of the model had a significant impact on the predictions.

4.1 Defining Grad-CAM

Grad-CAM is a method for visualizing contributions to the predictions of a CNN. This can provide users with insights into the model’s interpretability. Here is the code for implementing Grad-CAM.


import cv2
import numpy as np
import matplotlib.pyplot as plt

def grad_cam(input_model, image, category_index):
    # Get the last convolutional layer of the model.
    final_conv_layer = 'conv2'
    grad_model = nn.Sequential(*list(input_model.children())[:-1])
    
    with torch.enable_grad():
        # Convert input image to tensor
        inputs = image.unsqueeze(0)  # Add batch dimension
        inputs.requires_grad = True  # Set to require gradients
        preds = grad_model(inputs)  # Predictions
        class_channel = preds[0][category_index]  # Target class channel
        
        # Compute gradients for the predicted class
        grad_model.zero_grad()
        class_channel.backward()
        
        # Get the output and gradients of the last convolutional layer
        conv_layer_output = grad_model[-1].forward(inputs).cpu().data.numpy()
        gradients = grad_model[-1].weight.grad.cpu().data.numpy()
        
        # Calculate the ratio for generating Grad-CAM
        alpha = np.mean(gradients, axis=(2, 3))[0, :]
        cam = np.dot(alpha, conv_layer_output[0])  # Contribution calculation
        cam = np.maximum(cam, 0)  # ReLU application
        cam = cam / np.max(cam)  # Normalization
        
        # Overlay on the original image
        return cam
    

4.2 Applying Grad-CAM

Now, let’s apply Grad-CAM to the trained model and visualize some images.


# Load example image
image, label = testset[0]
category_index = label  # Target class index
cam = grad_cam(net, image, category_index)

# Visualizing original image and Grad-CAM heatmap
plt.subplot(1, 2, 1)
plt.imshow(image.permute(1, 2, 0))
plt.title('Original Image')

plt.subplot(1, 2, 2)
plt.imshow(cam, cmap='jet', alpha=0.5)  # Apply color map
plt.title('Grad-CAM Heatmap')
plt.show()
    

5. Conclusion

Explainability in deep learning is becoming an increasingly important topic. There is a need for ways to understand the internal workings of CNNs and to visually explain their results. We explored how to implement CNNs using PyTorch and interpret the model’s predictions through the Grad-CAM technique.

This process began with training a simple CNN model and culminated in utilizing the state-of-the-art explainable deep learning technique, Grad-CAM, to interpret and visualize the predictions of CNNs. In the future, we should continue to explore more complex models and methodologies through various attempts. The development of explainable AI systems is crucial alongside the advancement of deep learning.

6. References