Deep Learning PyTorch Course, AlexNet

Deep learning has emerged as one of the most notable technologies in the field of artificial intelligence (AI) in recent years. In particular, deep learning-based models have demonstrated outstanding performance in computer vision. Among them, AlexNet helped popularize deep learning by winning the 2012 ImageNet competition (ILSVRC 2012) by a wide margin. Its structure consists of five convolutional layers interleaved with three max pooling layers, followed by three fully connected layers.

1. Introduction to AlexNet Structure

AlexNet consists of the following key components:

Input Layer: Color image of size 224×224
Layer 1: Convolutional Layer: Uses 96 filters, filter size 11×11, stride 4
Layer 2: Max Pooling Layer: 3×3 max pooling, stride 2
Layer 3: Convolutional Layer: Uses 256 filters, filter size 5×5
Layer 4: Max Pooling Layer: 3×3 max pooling, stride 2
Layer 5: Convolutional Layer: Uses 384 filters, filter size 3×3
Layer 6: Convolutional Layer: Uses 384 filters, filter size 3×3
Layer 7: Convolutional Layer: Uses 256 filters, filter size 3×3
Layer 8: Max Pooling Layer: 3×3 max pooling, stride 2
Layer 9: Fully Connected Layer: 4096 neurons
Layer 10: Fully Connected Layer: 4096 neurons
Layer 11: Output Layer: Softmax output for 1000 classes
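
The spatial size after each layer follows from the usual formula floor((n + 2p - k) / s) + 1, where n is the input size, k the filter size, s the stride and p the padding. The sketch below applies it to the layers listed above. The padding values are assumptions, since the list does not state them: 2 for the 5×5 convolution, 1 for the 3×3 convolutions, and either 0 or 2 for the first convolution. The helper names conv_out and alexnet_feature_sizes are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def alexnet_feature_sizes(input_size, first_padding):
    # (name, kernel, stride, padding); paddings are assumed, not given in the list
    layers = [
        ("conv1", 11, 4, first_padding),
        ("pool1", 3, 2, 0),
        ("conv2", 5, 1, 2),
        ("pool2", 3, 2, 0),
        ("conv3", 3, 1, 1),
        ("conv4", 3, 1, 1),
        ("conv5", 3, 1, 1),
        ("pool3", 3, 2, 0),
    ]
    sizes = []
    size = input_size
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


sizes_pad2 = alexnet_feature_sizes(224, first_padding=2)
sizes_pad0 = alexnet_feature_sizes(224, first_padding=0)

for name, size in sizes_pad2:
    print(f"{name}: {size}x{size}")
print("final map with no padding on conv1:", sizes_pad0[-1][1])
```

Under these assumptions, a 224×224 input ends in a 6×6 map only when the first convolution uses padding 2. With no padding the final map is 5×5, which matters later because the first fully connected layer is sized for 256 × 6 × 6 inputs.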

2. How AlexNet Works

The basic idea of AlexNet is to extract features from images and use them to classify the images. In the early layers, it learns low-level features such as edges and color blobs, and in later layers it combines them into more complex patterns. Each convolutional layer produces feature maps by sliding learned filters over its input, and each max pooling layer downsamples those maps, which reduces the computational load and adds some tolerance to small shifts in the image.
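
To make the two operations concrete, here is a minimal sketch in plain Python, without PyTorch, of one convolution followed by 3×3 max pooling with stride 2. The filter is hand-picked to respond to a vertical edge. In AlexNet the filter values are learned during training, so this only illustrates the mechanics. The functions conv2d and max_pool2d are written for this example and are not library calls.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, kernel=3, stride=2):
    """Keep the largest value in each kernel x kernel window."""
    rows = range(0, len(fmap) - kernel + 1, stride)
    cols = range(0, len(fmap[0]) - kernel + 1, stride)
    return [
        [max(fmap[r + i][c + j] for i in range(kernel) for j in range(kernel))
         for c in cols]
        for r in rows
    ]


# 6x6 image: dark on the left, bright on the right (edge between columns 3 and 4)
image = [[0, 0, 0, 0, 1, 1] for _ in range(6)]
edge_filter = [[-1, 1]]  # responds where brightness increases left to right

feature_map = conv2d(image, edge_filter)  # 6x5 map, nonzero only at the edge
pooled = max_pool2d(feature_map)          # 2x2 map after downsampling

print(feature_map[0])
print(pooled)
```

The feature map is nonzero only where the edge is, and the pooled map still records that the edge lies on the right side while holding far fewer values.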

3. Implementing AlexNet with PyTorch

Now, let’s implement the AlexNet model using PyTorch, a deep learning framework with a flexible and intuitive API.

3.1 Importing Packages

Import the packages needed to use PyTorch.

python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

3.2 Defining the AlexNet Model

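Now, we will define the AlexNet architecture as a single class that inherits from nn.Module. The convolutional and pooling layers are grouped in self.features and the fully connected layers in self.classifier, both built with nn.Sequential.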

python
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        # Feature extractor: five convolutions with three max pooling stages
        self.features = nn.Sequential(
            # padding=2 so that a 224x224 input gives 55x55 here and 6x6 after the last pool
            # (with padding=0 the last map would be 5x5 and the first Linear layer would fail)
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # 55x55 -> 27x27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # 27x27 -> 13x13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),   # 13x13 -> 6x6
        )
        # Classifier: fully connected layers regularized with dropout
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)      # (N, 256, 6, 6)
        x = torch.flatten(x, 1)   # (N, 9216), batch dimension kept
        x = self.classifier(x)    # (N, num_classes) raw scores; the loss applies softmax
        return x
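
As a rough check on the size of this model, the following sketch counts the weights and biases of each layer in plain Python. It assumes the layer sizes in the class above with num_classes=1000 and one bias per output channel or neuron. The helpers conv_params and linear_params are written for this example.

```python
def conv_params(in_channels, out_channels, kernel):
    # one kernel x kernel x in_channels filter plus one bias per output channel
    return (kernel * kernel * in_channels + 1) * out_channels


def linear_params(in_features, out_features):
    # one weight per input plus one bias, for each output neuron
    return (in_features + 1) * out_features


layer_params = {
    "conv1": conv_params(3, 96, 11),
    "conv2": conv_params(96, 256, 5),
    "conv3": conv_params(256, 384, 3),
    "conv4": conv_params(384, 384, 3),
    "conv5": conv_params(384, 256, 3),
    "fc1": linear_params(256 * 6 * 6, 4096),
    "fc2": linear_params(4096, 4096),
    "fc3": linear_params(4096, 1000),
}

total = sum(layer_params.values())
fc_total = layer_params["fc1"] + layer_params["fc2"] + layer_params["fc3"]

for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"total: {total:,}")
print(f"share in fully connected layers: {fc_total / total:.1%}")
```

By this count the model has roughly 62 million parameters, and the large majority of them sit in the three fully connected layers, which is where the dropout layers are placed.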

3.3 Preparing the Dataset

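Prepare the dataset for model training. Datasets such as CIFAR-10 or ImageNet can be used. Here, we will take CIFAR-10 as an example. CIFAR-10 images are 32×32, so the transform resizes them to the 224×224 input size the network expects.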

python
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
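
Two small pieces of arithmetic sit behind this setup. Normalize computes (x - mean) / std per channel, so with mean 0.5 and std 0.5 the [0, 1] values produced by ToTensor are mapped to [-1, 1]. The number of batches per epoch is the dataset size divided by the batch size, rounded up. The sketch below works through both in plain Python. It relies on the CIFAR-10 training split containing 50,000 images, and the function names are made up for illustration.

```python
import math


def normalize(value, mean=0.5, std=0.5):
    """Same per-value arithmetic as transforms.Normalize for one channel."""
    return (value - mean) / std


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)


for pixel in (0.0, 0.5, 1.0):
    print(f"{pixel} -> {normalize(pixel)}")

num_batches = batches_per_epoch(50_000, 32)
print("batches per epoch:", num_batches)
```

With a batch size of 32, the final batch of each epoch holds only 16 images, because 50,000 is not a multiple of 32.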

3.4 Training the Model

Define the loss function and optimizer for model training and proceed with the training process.

python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = AlexNet(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

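num_epochs = 10
for epoch in range(num_epochs):
    model.train()  # training mode, so dropout is active
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = model(images)            # raw class scores, shape (batch, 10)
        loss = criterion(outputs, labels)  # CrossEntropyLoss applies log-softmax itself
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * images.size(0)

    # Report the mean loss over the whole epoch, not just the last batch
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {epoch_loss:.4f}')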
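
nn.CrossEntropyLoss takes the raw scores from the last linear layer and combines log-softmax with negative log-likelihood, which is why the model's forward method does not apply softmax itself. The following plain-Python sketch reproduces that computation for a single sample. By default the PyTorch loss additionally averages over the batch. The function cross_entropy is written for this example.

```python
import math


def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    # subtract the maximum before exponentiating for numerical stability
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


# An untrained 10-class model that scores every class equally
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A model that scores the correct class well above the others
confident_loss = cross_entropy([0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], target=3)

print(f"uniform scores:   {uniform_loss:.4f}")
print(f"confident scores: {confident_loss:.4f}")
```

The uniform case gives ln(10), about 2.30. A printed loss that stays near that value over several epochs suggests the network is not learning yet, and in that situation a smaller learning rate is one common thing to try.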

4. Conclusion

AlexNet is a simple model by today's standards, but it played a very important role in the advancement of deep learning. It demonstrated that a deep convolutional network trained on a large labeled dataset could clearly outperform earlier approaches to image classification, and it became the foundation for many advanced models developed thereafter. Through this tutorial, we explored the structure of AlexNet and a simple implementation example using PyTorch. The path of deep learning is long, but a solid understanding of these basic concepts makes more complex models much easier to follow.

5. References

AlexNet paper: Krizhevsky, Sutskever, and Hinton, "ImageNet Classification with Deep Convolutional Neural Networks" (2012)
PyTorch documentation