Deep Learning PyTorch Course, GoogLeNet

Deep learning has become one of the most important technologies in artificial intelligence, and neural networks are widely used to solve a broad range of problems. In this course, we take a closer look at GoogLeNet, a convolutional neural network (CNN). GoogLeNet gained significant attention by winning the classification task of the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) in 2014.

1. Overview of GoogLeNet

GoogLeNet, also known as ‘Inception v1’, is a 22-layer network built from stacked convolution layers. Its main feature is the ‘Inception module’, which applies filters of several sizes to the same input in parallel. This lets the network capture both fine and coarse features at each stage.

2. Structure of GoogLeNet

  • Input Layer: Accepts RGB images of size 224×224.
  • Convolution Layer: Uses filters of various sizes (1×1, 3×3, and 5×5 inside the Inception modules, and a 7×7 filter in the first layer).
  • Pooling Layer: Reduces the size of the feature map through downsampling.
  • Fully Connected Layer: Produces the final classification scores.

2.1 Inception Module

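The Inception module applies several filters in parallel to capture details at different scales. Each module has four branches:

  • 1×1 Convolution
  • 3×3 Convolution (preceded by a 1×1 convolution that reduces the number of channels)
  • 5×5 Convolution (preceded by a 1×1 convolution that reduces the number of channels)
  • 3×3 Max Pooling (followed by a 1×1 convolution)

The outputs of all four branches are concatenated along the channel dimension and passed to the next layer. This way, features at various scales are obtained from the same input.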
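
As a minimal sketch of this combination step, the snippet below runs four parallel branches on the same input and joins their outputs along the channel dimension. The channel counts (8, 16, 4, 4) and the 28×28 input are arbitrary values chosen for illustration, not the sizes used by GoogLeNet, and the 1×1 convolution after pooling follows the implementation later in this course.

```python
import torch
import torch.nn as nn

# Illustrative input: batch of 1, 16 channels, 28x28 (assumed sizes)
x = torch.randn(1, 16, 28, 28)

# Padding is chosen so that every branch keeps the 28x28 spatial size
branch1 = nn.Conv2d(16, 8, kernel_size=1)
branch3 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
branch5 = nn.Conv2d(16, 4, kernel_size=5, padding=2)
branch_pool = nn.Sequential(
    nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
    nn.Conv2d(16, 4, kernel_size=1),
)

# Concatenate along dim=1 (channels): 8 + 16 + 4 + 4 = 32
out = torch.cat([branch1(x), branch3(x), branch5(x), branch_pool(x)], dim=1)
print(out.shape)  # torch.Size([1, 32, 28, 28])
```

Concatenation only works because all branches produce the same height and width; only the channel counts differ.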

3. Implementing GoogLeNet in PyTorch

Now let’s look at how to implement GoogLeNet in PyTorch. First, we need to install PyTorch and other essential libraries.

pip install torch torchvision

3.1 Preparing the Dataset

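In this example, we will use the CIFAR-10 dataset. It consists of 60,000 32×32 color images in 10 classes, split into 50,000 training images and 10,000 test images. Because the network in this course expects 224×224 inputs, the images are resized when they are loaded.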


import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose(
    [transforms.Resize((224, 224)),
     transforms.ToTensor()])

# Download CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
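
With a batch size of 32, the number of batches per epoch follows from the dataset sizes. The short calculation below assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images and the DataLoader default of keeping the last, smaller batch.

```python
import math

batch_size = 32
train_images = 50_000  # standard CIFAR-10 training split
test_images = 10_000   # standard CIFAR-10 test split

# DataLoader keeps the final partial batch by default (drop_last=False),
# so the batch count is rounded up.
train_batches = math.ceil(train_images / batch_size)
test_batches = math.ceil(test_images / batch_size)

print(train_batches, test_batches)  # 1563 313
```

These are the values that len(trainloader) and len(testloader) should report for the loaders defined above.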

3.2 Defining the GoogLeNet Model

Next, we define the GoogLeNet model, starting with the Inception module it is built from.


import torch.nn as nn
import torch.nn.functional as F

class Inception(nn.Module):
    def __init__(self, in_channels):
        super(Inception, self).__init__()
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True)
        )

        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )

        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, 32, kernel_size=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        branch1 = self.branch1x1(x)
        branch3 = self.branch3x3(x)
        branch5 = self.branch5x5(x)
        branch_pool = self.branch_pool(x)

        # Concatenate along the channel dimension:
        # 64 + 128 + 64 + 32 = 288 output channels
        outputs = [branch1, branch3, branch5, branch_pool]
        return torch.cat(outputs, 1)
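
The 1×1 convolutions placed before the 3×3 and 5×5 convolutions reduce the number of channels first, which lowers the parameter count. The rough calculation below compares the 5×5 branch above, for 192 input channels, with a hypothetical version that applies the 5×5 convolution directly. BatchNorm parameters are ignored, and `conv_params` is a helper introduced only for this illustration.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch

in_channels = 192

# Hypothetical direct 5x5 convolution: 192 -> 64
direct = conv_params(in_channels, 64, 5)

# Branch as defined above: 1x1 (192 -> 32), then 5x5 (32 -> 64)
reduced = conv_params(in_channels, 32, 1) + conv_params(32, 64, 5)

print(direct, reduced)  # 307264 57440
```

Under these assumptions the reduced branch needs less than a fifth of the parameters of the direct version.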

3.3 Defining the Full GoogLeNet


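class GoogLeNet(nn.Module):
    # Simplified GoogLeNet: three identical Inception modules
    # instead of the nine modules of the original network
    def __init__(self, num_classes=10):
        super(GoogLeNet, self).__init__()
        # Stem: 224x224 input -> 112x112 (conv1) -> 56x56 (pool1)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # 56x56 (conv2) -> 28x28 (pool2)
        self.conv2 = nn.Conv2d(64, 192, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Each Inception module outputs 64 + 128 + 64 + 32 = 288 channels,
        # so every module after the first takes 288 input channels.
        self.inception1 = Inception(192)
        self.inception2 = Inception(288)
        self.inception3 = Inception(288)

        # Global average pooling reduces each 28x28 feature map to 1x1
        self.pool3 = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(288, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)

        x = self.inception1(x)
        x = self.inception2(x)
        x = self.inception3(x)

        x = self.pool3(x)
        x = torch.flatten(x, 1)  # (batch, 288, 1, 1) -> (batch, 288)
        x = self.fc(x)
        return x

model = GoogLeNet()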
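
To see what spatial size reaches the Inception modules, the output size of a convolution or pooling layer can be computed as floor((n + 2p − k) / s) + 1 for input size n, kernel size k, stride s, and padding p. The sketch below applies this formula to the four stem layers defined above; `out_size` is a helper introduced for this illustration.

```python
def out_size(n, k, s=1, p=0):
    """Output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# (kernel, stride, padding) of conv1, pool1, conv2, pool2
stem = [(7, 2, 3), (3, 2, 1), (3, 1, 1), (3, 2, 1)]

n = 224
sizes = []
for k, s, p in stem:
    n = out_size(n, k, s, p)
    sizes.append(n)

print(sizes)  # [112, 56, 56, 28]
```

The Inception modules preserve this size because all of their convolutions and the pooling branch are padded, so each module outputs 28×28 feature maps.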

3.4 Defining the Loss Function and Optimizer

Before training the model, we define the loss function and the optimizer.


import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
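
nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and integer class labels, and applies log-softmax internally. As a plain-Python sketch of the same formula for a single sample, the hypothetical helper `cross_entropy` below shows that a model with no preference among 10 classes gives a loss of ln(10) ≈ 2.30, which is a useful reference value for the first training steps.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Equal scores for all 10 classes: the model has no preference
uniform_logits = [0.0] * 10
loss = cross_entropy(uniform_logits, target=3)
print(round(loss, 4))  # 2.3026
```

A loss that starts far above this value, or that never drops below it, is a hint that something in the data pipeline or the model is wrong.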

3.5 Training the Model

Now we will train the model. During training, we print the average loss every 100 batches.


num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(trainloader):
        optimizer.zero_grad()              # Clear gradients from the previous step
        outputs = model(inputs)            # Forward pass
        loss = criterion(outputs, labels)
        loss.backward()                    # Backward pass
        optimizer.step()                   # Update the weights

        running_loss += loss.item()
        if i % 100 == 99:  # Print the average loss every 100 batches
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(trainloader)}], Loss: {running_loss / 100:.4f}')
            running_loss = 0.0
    print(f'Epoch {epoch+1} complete')

print('Model training finished!')

3.6 Evaluating the Model

Once training is complete, we will evaluate the model’s performance using the test dataset.


correct = 0
total = 0
model.eval()
with torch.no_grad():
    for images, labels in testloader:
        outputs = model(images)
        # The predicted class is the index of the highest score
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy: {100 * correct / total:.2f}%')
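
The loops in this course run on the CPU. If a GPU is available, both the model and each batch need to be moved to it. The following is one possible sketch, not the only way to do it; `evaluate` is a helper name introduced here, and the identity model and hand-made batch only stand in for the real model and `testloader`.

```python
import torch
import torch.nn as nn

def evaluate(model, loader, device):
    """Return accuracy in percent for `model` over the batches in `loader`."""
    model.to(device)
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            # Inputs must live on the same device as the model
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100 * correct / total

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stand-in data: the identity "model" predicts the index of the largest input
toy_loader = [(torch.eye(4), torch.tensor([0, 1, 2, 0]))]
accuracy = evaluate(nn.Identity(), toy_loader, device)
print(f'Accuracy: {accuracy:.2f}%')  # Accuracy: 75.00%
```

With the objects from this course, the call would be evaluate(model, testloader, device). The training loop would need the same two `.to(device)` steps.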

4. Conclusion

GoogLeNet offers a powerful network structure that can leverage features at various scales. In this course, we learned the fundamental concepts of GoogLeNet and how to implement it in PyTorch. With this understanding, you will be able to apply similar methods in more complex models.

Additionally, there are many variations of GoogLeNet. Models like Inception v2 and Inception v3 improve performance by adjusting the depth or structure of the model. These variations can help achieve even more accurate predictions. In the next course, we will also cover these variant models.

That concludes our explanation of GoogLeNet. Thank you!