Deep Learning PyTorch Course: Training Process Monitoring

Monitoring a model's performance during training is a vital part of deep learning. It helps you tune hyperparameters appropriately, catch overfitting early, and improve generalization. In this article, we explain how to monitor the training process using the PyTorch framework.

1. Importance of Monitoring the Training Process

When training a deep learning model, simply checking the model's final accuracy is not enough. By monitoring the loss and accuracy on both the training and validation datasets, you can:

  • Detect early whether the model is overfitting or underfitting
  • Identify when hyperparameters need tuning
  • Assess how much room the model has for further improvement

For these reasons, visualizing and monitoring the training process is essential.

2. Installing PyTorch

First, you need to have PyTorch installed. You can install it using the following command:

pip install torch torchvision
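
To confirm the installation, you can print the installed version and check whether a GPU is available (the GPU check is optional; everything in this article also runs on a CPU):

import torch

print(torch.__version__)          # Installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable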

3. Preparing the Dataset

Here, we will demonstrate how to monitor the training process with a simple example: classifying handwritten digits from the MNIST dataset. The dataset can be loaded through PyTorch's torchvision package.

import torch
import torchvision
import torchvision.transforms as transforms

# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Training dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Test dataset (used here as the validation set)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)
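
Before moving on, it is worth pulling one batch from the loader and checking its shape; this is a quick sanity check on the preprocessing pipeline, not part of the course code itself:

# Inspect a single batch from the training loader
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) - 64 normalized grayscale images
print(labels.shape)  # torch.Size([64]) - one class index (0-9) per image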

4. Defining the Model

Next, we will define a neural network model. We will use a simple multilayer perceptron (MLP) structure.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)  # Flatten
        x = F.relu(self.fc1(x))  # First layer
        x = F.relu(self.fc2(x))  # Second layer
        x = self.fc3(x)          # Output layer
        return x

# Create model instance
model = Net()
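
Before wiring the model into a training loop, a dummy forward pass is a cheap way to confirm that the output shape matches the 10 MNIST classes (a minimal check using random input, not real data):

# Pass a dummy batch through the untrained model
dummy = torch.randn(4, 1, 28, 28)  # Four fake grayscale images
logits = model(dummy)
print(logits.shape)  # torch.Size([4, 10]) - one raw score (logit) per class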

5. Loss Function and Optimization Algorithm

Next, set the loss function and the optimizer. For multi-class classification, cross-entropy loss combined with the Adam optimizer is a common choice.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
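
The learning rate itself is a hyperparameter you may want to adjust as training progresses. The course code keeps it fixed, but PyTorch's lr_scheduler module makes decay straightforward if you need it; the step_size and gamma values below are illustrative, not tuned:

from torch.optim import lr_scheduler

# Illustrative values: halve the learning rate every 5 epochs
scheduler = lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

Call scheduler.step() once per epoch inside the training loop, and log the current rate with scheduler.get_last_lr() if you want to monitor it alongside the loss.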

6. Setting Up the Training Process

Set up the training process with monitoring in mind: we will record the loss and accuracy at each epoch and visualize them afterward.

import matplotlib.pyplot as plt

num_epochs = 10
train_losses = []
test_losses = []
train_accuracies = []
test_accuracies = []

# Training function
def train():
    model.train()  # Switch model to training mode
    running_loss = 0.0
    correct = 0
    total = 0
    
    for inputs, labels in trainloader:
        optimizer.zero_grad()  # Reset gradients
        outputs = model(inputs)  # Predictions
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()  # Backpropagation
        optimizer.step()  # Update parameters
        
        running_loss += loss.item()
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    
    # Save training loss and accuracy
    train_losses.append(running_loss / len(trainloader))
    train_accuracies.append(correct / total)

# Validation function
def test():
    model.eval()  # Switch model to evaluation mode
    running_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():  # Disable gradient calculation
        for inputs, labels in testloader:
            outputs = model(inputs)  # Predictions
            loss = criterion(outputs, labels)  # Calculate loss
            
            running_loss += loss.item()
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    # Save validation loss and accuracy
    test_losses.append(running_loss / len(testloader))
    test_accuracies.append(correct / total)

7. Training Loop

Run the training loop to train the model and record the training and validation loss and accuracy at each epoch.

for epoch in range(num_epochs):
    train()  # Call training function
    test()   # Call validation function

    print(f'Epoch [{epoch+1}/{num_epochs}], '
          f'Train Loss: {train_losses[-1]:.4f}, Train Accuracy: {train_accuracies[-1]:.4f}, '
          f'Test Loss: {test_losses[-1]:.4f}, Test Accuracy: {test_accuracies[-1]:.4f}')
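
One of the main payoffs of tracking validation metrics per epoch is that you can act on them. A common pattern is to save a checkpoint whenever the validation loss improves and stop training once it has stagnated. The sketch below shows how the loop above could be extended; the patience value and checkpoint filename are illustrative choices, not part of the original course code:

best_loss = float('inf')
patience = 3  # Illustrative: epochs to wait for an improvement
epochs_without_improvement = 0

for epoch in range(num_epochs):
    train()
    test()

    if test_losses[-1] < best_loss:
        best_loss = test_losses[-1]
        epochs_without_improvement = 0
        torch.save(model.state_dict(), 'best_model.pth')  # Illustrative filename
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f'Early stopping at epoch {epoch+1}')
            break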

8. Visualizing Results

We will use the Matplotlib library to visualize the training process by plotting the loss and accuracy.

plt.figure(figsize=(12, 5))

# Visualizing Loss
plt.subplot(1, 2, 1)
plt.plot(train_losses, label='Train Loss')
plt.plot(test_losses, label='Test Loss')
plt.title('Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Visualizing Accuracy
plt.subplot(1, 2, 2)
plt.plot(train_accuracies, label='Train Accuracy')
plt.plot(test_accuracies, label='Test Accuracy')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()
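
The Matplotlib plots above are produced after training finishes. If you prefer to watch the curves update live while training runs, PyTorch also ships a TensorBoard integration (it requires the separate tensorboard package); here is a minimal sketch of logging the same metrics, with an arbitrary log directory name:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('runs/mnist_monitoring')  # Arbitrary log directory

for epoch in range(num_epochs):
    train()
    test()
    writer.add_scalar('Loss/train', train_losses[-1], epoch)
    writer.add_scalar('Loss/test', test_losses[-1], epoch)
    writer.add_scalar('Accuracy/train', train_accuracies[-1], epoch)
    writer.add_scalar('Accuracy/test', test_accuracies[-1], epoch)

writer.close()

Running tensorboard --logdir=runs in a terminal then serves an interactive dashboard of the logged curves.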

9. Conclusion

In this course, we covered how to monitor the training process of deep learning models using PyTorch. Tracking and visualizing metrics such as loss and accuracy provides the insight needed to diagnose problems and improve a model's performance.

Monitoring and visualizing the training process play a crucial role in optimizing a model, so make them a standard part of every training workflow.