Deep learning has become one of the most important technologies in artificial intelligence, and convolutional neural networks (CNNs) are widely used to solve image-related problems. In this course, we will take a closer look at GoogLeNet, a CNN architecture that gained significant attention by winning the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) in 2014.
1. Overview of GoogLeNet
GoogLeNet, also known as ‘Inception v1’, is a 22-layer network built from repeated blocks of parallel convolutions. Its main feature is the ‘Inception module’, which applies filters of several sizes to the same input simultaneously. This approach lets the network capture both fine details and larger patterns at every stage.
2. Structure of GoogLeNet
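GoogLeNet is built from the following types of layers:
- Input Layer: Accepts RGB images of size 224×224.
- Convolution Layer: Uses filters of various sizes (1×1, 3×3, and 5×5 inside the Inception modules, plus a 7×7 filter in the first layer).
- Pooling Layer: Reduces the size of the feature map through downsampling.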
- Fully Connected Layer: Provides classification results as the final output.
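To make the downsampling concrete, the short sketch below traces how the spatial size of a 224×224 input shrinks through a few layers. It uses the standard output-size formula floor((n + 2p - k) / s) + 1. The helper names `conv_out` and `trace_sizes` are made up for this illustration, and the layer settings are an assumption that mirrors the first four layers of the model defined later in this article.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed stem layers as (name, kernel, stride, padding)
stem = [
    ("conv 7x7", 7, 2, 3),
    ("pool 3x3", 3, 2, 1),
    ("conv 3x3", 3, 1, 1),
    ("pool 3x3", 3, 2, 1),
]


def trace_sizes(size, layers):
    """Return the spatial size before the first layer and after every layer."""
    sizes = [size]
    for _name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


sizes = trace_sizes(224, stem)
for layer, before, after in zip(stem, sizes, sizes[1:]):
    print(f"{layer[0]}: {before} -> {after}")
```

With these settings the size goes 224 → 112 → 56 → 56 → 28, so the Inception modules operate on 28×28 feature maps.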
2.1 Inception Module
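The Inception module applies several filters to the same input in parallel to capture features at different scales. Each module consists of four branches:
- 1×1 Convolution
- 3×3 Convolution (preceded by a 1×1 convolution that reduces the number of channels)
- 5×5 Convolution (preceded by a 1×1 convolution that reduces the number of channels)
- 3×3 Max Pooling (followed by a 1×1 convolution)
Padding keeps the spatial size of every branch identical, so the outputs can be concatenated along the channel dimension and passed to the next layer. This way, features at various scales can be obtained.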
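GoogLeNet places a 1×1 convolution in front of the larger filters to keep the module cheap. The sketch below is a rough illustration of why that helps: it counts the parameters of a 5×5 branch with and without the 1×1 reduction, and adds up the channels of the concatenated output. The helper `conv_params` is hypothetical, the 192 input channels are an assumption, and the branch widths (64, 128, 64, 32) are taken from the implementation in section 3.2.

```python
def conv_params(in_ch, out_ch, kernel):
    """Number of weights plus biases in a 2D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


in_ch = 192  # assumed number of input channels

# 5x5 convolution applied directly to the input
direct = conv_params(in_ch, 64, 5)

# 1x1 reduction to 32 channels, then the 5x5 convolution
reduced = conv_params(in_ch, 32, 1) + conv_params(32, 64, 5)

# Concatenation stacks the branch outputs along the channel dimension
branch_channels = {"1x1": 64, "3x3": 128, "5x5": 64, "pool": 32}
out_channels = sum(branch_channels.values())

print(f"5x5 branch without reduction: {direct} parameters")
print(f"5x5 branch with 1x1 reduction: {reduced} parameters")
print(f"channels after concatenation: {out_channels}")
```

Under these assumptions the reduction cuts the 5×5 branch from 307,264 to 57,440 parameters, and the module outputs 288 channels.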
3. Implementing GoogLeNet in PyTorch
Now let’s look at how to implement GoogLeNet in PyTorch. First, we need to install PyTorch and torchvision, which provides the dataset and image transformations used below.
pip install torch torchvision
3.1 Preparing the Dataset
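In this example, we will use the CIFAR-10 dataset. This dataset consists of 60,000 color images of size 32×32 divided into 10 classes, with 50,000 images for training and 10,000 for testing. Because the network expects 224×224 inputs, the images are resized as they are loaded.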
import torch
import torchvision
import torchvision.transforms as transforms

# Define data transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Download CIFAR-10 dataset
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=32,
                                         shuffle=False, num_workers=2)
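As a quick sanity check on the loaders, the sketch below estimates how many batches one pass over each split produces with a batch size of 32. The helper `batches_per_epoch` is a hypothetical name; it assumes the CIFAR-10 split sizes of 50,000 training and 10,000 test images and the default `DataLoader` behavior of keeping the final, smaller batch (`drop_last=False`).

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields in one pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(50_000, 32)
test_batches = batches_per_epoch(10_000, 32)
last_train_batch = 50_000 - (train_batches - 1) * 32

print(f"training batches per epoch: {train_batches}")
print(f"test batches: {test_batches}")
print(f"size of the last training batch: {last_train_batch}")
```

This gives 1,563 training batches and 313 test batches, with a final training batch of 16 images, which should match `len(trainloader)` and `len(testloader)`.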
3.2 Defining the GoogLeNet Model
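Next, we will define the GoogLeNet model, starting with the Inception module it is built from. Each branch below applies batch normalization after every convolution. The original GoogLeNet did not use batch normalization, but it usually makes training more stable.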
import torch.nn as nn
import torch.nn.functional as F

class Inception(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Branch 1: 1x1 convolution
        self.branch1x1 = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )
        # Branch 2: 1x1 convolution followed by a 3x3 convolution
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True)
        )
        # Branch 3: 1x1 reduction to 32 channels followed by a 5x5 convolution
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True)
        )
        # Branch 4: 3x3 max pooling followed by a 1x1 convolution
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_channels, 32, kernel_size=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        branch1 = self.branch1x1(x)
        branch3 = self.branch3x3(x)
        branch5 = self.branch5x5(x)
        branch_pool = self.branch_pool(x)
        # Concatenate along the channel dimension: 64 + 128 + 64 + 32 = 288 channels
        outputs = [branch1, branch3, branch5, branch_pool]
        return torch.cat(outputs, 1)
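3.3 Defining the Full GoogLeNet
The network below is a simplified GoogLeNet: a convolutional stem, three Inception modules, global average pooling, and a linear classifier. The original architecture stacks nine Inception modules. Each Inception module defined above outputs 64 + 128 + 64 + 32 = 288 channels, so every layer that follows one must accept 288 input channels.
class GoogLeNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Stem: reduces the 224x224 input to 28x28 feature maps
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        self.pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(64, 192, kernel_size=3, padding=1)
        self.pool2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Every Inception module outputs 288 channels
        self.inception1 = Inception(192)
        self.inception2 = Inception(288)
        self.inception3 = Inception(288)
        # Global average pooling collapses each 28x28 feature map to one value
        self.pool3 = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(288, num_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.inception1(x)
        x = self.inception2(x)
        x = self.inception3(x)
        x = self.pool3(x)
        x = x.view(x.size(0), -1)  # Flatten to (batch_size, 288)
        x = self.fc(x)
        return x

model = GoogLeNet()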
3.4 Defining the Loss Function and Optimizer
Before training the model, we define the loss function and the optimizer: cross-entropy loss for the 10-class classification task, and Adam with a learning rate of 0.001.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
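`nn.CrossEntropyLoss` takes the raw scores (logits) from the model, applies a log-softmax, and returns the negative log-probability of the correct class, averaged over the batch by default. The plain-Python sketch below computes the same quantity for a single sample. The function `cross_entropy` is written for this illustration and is not part of PyTorch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # Subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# A model that scores all 10 classes equally has a loss of ln(10)
uniform = cross_entropy([0.0] * 10, target=3)

# A confident, correct prediction has a loss close to zero
confident = cross_entropy([5.0, 0.0, 0.0], target=0)

print(f"uniform logits: {uniform:.4f}")
print(f"confident and correct: {confident:.4f}")
```

This is a useful reference point during training: with 10 classes, a loss near ln(10) ≈ 2.30 means the model is doing no better than guessing.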
3.5 Training the Model
Now we will train the model for the given number of epochs, printing the average loss every 100 batches.
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        optimizer.zero_grad()              # Reset gradients from the previous step
        outputs = model(inputs)            # Forward pass
        loss = criterion(outputs, labels)
        loss.backward()                    # Backward pass
        optimizer.step()                   # Update the weights
        running_loss += loss.item()
        if i % 100 == 99:  # Print every 100 batches
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(trainloader)}], Loss: {running_loss / 100:.4f}')
            running_loss = 0.0
print('Training complete')
3.6 Evaluating the Model
Once training is complete, we will evaluate the model’s performance using the test dataset.
correct = 0
total = 0
model.eval()
with torch.no_grad():  # Gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        outputs = model(images)
        # The predicted class is the index of the highest score
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f'Accuracy: {100 * correct / total:.2f}%')
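Overall accuracy can hide classes that the model handles poorly. As one possible extension, the sketch below computes accuracy per class from two plain lists of predictions and labels. The helper `per_class_accuracy` and the sample data are made up for illustration.

```python
from collections import Counter


def per_class_accuracy(predictions, labels):
    """Return {class: fraction of that class's samples predicted correctly}."""
    correct = Counter()
    total = Counter()
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return {cls: correct[cls] / total[cls] for cls in sorted(total)}


# Toy data: six samples from three classes
preds = [0, 1, 1, 2, 2, 2]
labels = [0, 1, 0, 2, 2, 1]

result = per_class_accuracy(preds, labels)
for cls, acc in result.items():
    print(f"class {cls}: {100 * acc:.1f}%")
```

To use it with the evaluation loop, one option is to collect `predicted.tolist()` and `labels.tolist()` from every batch into two lists and pass them to the function after the loop.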
4. Conclusion
GoogLeNet offers a powerful network structure that can leverage features at various scales. In this course, we learned the fundamental concepts of GoogLeNet and how to implement it in PyTorch. With this understanding, you will be able to apply similar methods in more complex models.
Additionally, there are many variations of GoogLeNet. Later models such as Inception v2 and Inception v3 improve performance through changes like batch normalization and factorizing large convolutions into smaller ones. These variations can help achieve even more accurate predictions. In the next course, we will also cover these variant models.
That concludes the explanation about GoogLeNet. Thank you!