Deep Learning with PyTorch: GANs and Neural Style Transfer

1. Introduction

In recent years, artificial intelligence and deep learning have driven much of the innovation in information technology. Within deep learning, Generative Adversarial Networks (GANs) and
Neural Style Transfer have gained attention as methods for generating and transforming visual content. This course explains the basic concepts of GANs and
shows how to implement a GAN and Neural Style Transfer using PyTorch.

2. Basic Concepts of GAN

A GAN consists of two neural networks: a Generator and a Discriminator. The Generator produces fake data from random noise, while the Discriminator tries to distinguish real data from fake data.
The two networks are trained against each other. The Generator keeps improving the quality of its output to fool the Discriminator, and the Discriminator learns to better identify the
data created by the Generator.
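
This competition is commonly formalized as a minimax game. As a sketch in standard notation (the symbols are introduced here for illustration and are not defined elsewhere in this course: $D(x)$ is the Discriminator's estimated probability that $x$ is real, $G(z)$ is the Generator's output for noise $z$, $p_{\text{data}}$ is the data distribution, and $p_z$ is the noise distribution):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The Discriminator increases $V$ by assigning high probability to real data and low probability to generated data; the Generator decreases it by making $D(G(z))$ large.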

2.1 Structure of GAN

The GAN training process can be summarized as follows:

  1. Generate fake images by feeding random noise into the generator.
  2. Input the generated images and real images into the discriminator.
  3. The discriminator outputs, for each input image, the probability that it is real.
  4. The generator improves the fake images based on the feedback from the discriminator.
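
The feedback in step 4 is usually delivered through a loss function. As a minimal sketch, assuming binary cross-entropy (the loss used in the training code later in this course) and made-up discriminator outputs, the generator's loss shrinks as the discriminator is fooled:

```python
import math
import torch
import torch.nn as nn

criterion = nn.BCELoss()
real_label = torch.ones(1, 1)  # the generator wants its fakes to be labelled "real"

# Hypothetical discriminator outputs for one fake image (illustrative values only)
fooled = torch.tensor([[0.9]])      # discriminator thinks the fake is probably real
not_fooled = torch.tensor([[0.1]])  # discriminator thinks the fake is probably fake

loss_fooled = criterion(fooled, real_label).item()          # equals -log(0.9)
loss_not_fooled = criterion(not_fooled, real_label).item()  # equals -log(0.1)

print(f'fooled: {loss_fooled:.4f}, not fooled: {loss_not_fooled:.4f}')
```

The larger loss in the second case produces larger gradients, which push the generator toward outputs the discriminator rates as real.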

3. Implementing GAN

Now let’s implement a GAN. In this example, we will build a GAN that generates handwritten digit images using the MNIST dataset.

3.1 Installing Required Libraries


pip install torch torchvision matplotlib
    

3.2 Loading MNIST Dataset


import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms

# Download and load MNIST dataset
def load_mnist(batch_size):
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    train_dataset = dsets.MNIST(root='./data', train=True, transform=transform, download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    return train_loader

# Set batch size to 100
batch_size = 100
train_loader = load_mnist(batch_size)
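
`Normalize((0.5,), (0.5,))` applies `(x - mean) / std` per channel, which maps the [0, 1] range produced by `ToTensor` to [-1, 1] and so matches the `Tanh` output of the generator defined later. A small sketch of that arithmetic in plain torch, with a hypothetical `denormalize` helper (not part of torchvision) for mapping images back before display:

```python
import torch

def normalize(x, mean=0.5, std=0.5):
    # Same per-channel formula that transforms.Normalize applies
    return (x - mean) / std

def denormalize(x, mean=0.5, std=0.5):
    # Hypothetical inverse helper: back to [0, 1] for plotting or saving
    return x * std + mean

pixels = torch.tensor([0.0, 0.5, 1.0])  # values as produced by ToTensor
scaled = normalize(pixels)               # black -> -1, mid-gray -> 0, white -> 1
restored = denormalize(scaled)
print(scaled, restored)
```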
        

3.3 Building GAN Model

Now we will define the generator and discriminator models.


import torch.nn as nn

# Define generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()  # Pixel value range for MNIST images: -1 to 1
        )

    def forward(self, z):
        return self.model(z).reshape(-1, 1, 28, 28)  # Reshape to MNIST image format

# Define discriminator model
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output values between 0 and 1
        )

    def forward(self, img):
        return self.model(img)
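
A quick sanity check of the tensor shapes can catch wiring mistakes before training. The sketch below rebuilds the same layer stacks as plain `nn.Sequential` modules so that it runs on its own; the names `gen`, `disc`, and `count_params` are illustrative stand-ins, not part of the code above.

```python
import torch
import torch.nn as nn

# Same layers as the Generator above, followed by the reshape done in forward()
gen = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 784), nn.Tanh(),
    nn.Unflatten(1, (1, 28, 28)),
)
# Same layers as the Discriminator above
disc = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

z = torch.randn(16, 100)  # a batch of 16 noise vectors
with torch.no_grad():
    fake = gen(z)         # one 1x28x28 image per noise vector
    prob = disc(fake)     # one probability per image

def count_params(module):
    # Total number of learnable scalars in the module
    return sum(p.numel() for p in module.parameters())

print(fake.shape, prob.shape, count_params(gen), count_params(disc))
```

Both networks are small fully connected models, so this check runs quickly on a CPU.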
        

3.4 GAN Training Process

Now we will implement the training loop for the GAN.


import os
import torchvision.utils as vutils

def train_gan(epochs, train_loader):
    os.makedirs('output', exist_ok=True)  # save_image does not create the directory itself
    generator = Generator()
    discriminator = Discriminator()

    criterion = nn.BCELoss()
    lr = 0.0002
    beta1 = 0.5
    g_optimizer = torch.optim.Adam(generator.parameters(), lr=lr, betas=(beta1, 0.999))
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, 0.999))

    for epoch in range(epochs):
        for i, (imgs, _) in enumerate(train_loader):
            # Generate real and fake labels
            real_labels = torch.ones(imgs.size(0), 1)
            fake_labels = torch.zeros(imgs.size(0), 1)

            # Train discriminator
            discriminator.zero_grad()
            outputs = discriminator(imgs)
            d_loss_real = criterion(outputs, real_labels)
            d_loss_real.backward()

            z = torch.randn(imgs.size(0), 100)  # Noise vector
            fake_imgs = generator(z)
            outputs = discriminator(fake_imgs.detach())
            d_loss_fake = criterion(outputs, fake_labels)
            d_loss_fake.backward()

            d_loss = d_loss_real + d_loss_fake
            d_optimizer.step()

            # Train generator
            generator.zero_grad()
            outputs = discriminator(fake_imgs)
            g_loss = criterion(outputs, real_labels)
            g_loss.backward()
            g_optimizer.step()

            if (i + 1) % 100 == 0:
                print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(train_loader)}], '
                      f'D Loss: {d_loss.item()}, G Loss: {g_loss.item()}')

        # Save a grid of generated images every 10 epochs
        if (epoch + 1) % 10 == 0:
            with torch.no_grad():
                fake_imgs = generator(z)  # Reuses the last noise batch of the epoch
                vutils.save_image(fake_imgs, f'output/fake_images-{epoch + 1}.png', normalize=True)


train_gan(epochs=50, train_loader=train_loader)
        

4. Neural Style Transfer

Neural Style Transfer is a technique that separates the content and style of images and renders a content image in the visual style of a style image.
This process is based on Convolutional Neural Networks (CNN) and typically involves the following steps:

  1. Extract content features from the content image and style features from the style image using a pretrained CNN.
  2. Optimize a generated image so that its content features match the content image and its style features match the style image.

4.1 Installing Required Libraries


pip install Pillow numpy matplotlib
    

4.2 Preparing the Model


import torch
import torch.nn as nn
from torchvision import models

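class StyleTransferModel(nn.Module):
    def __init__(self):
        super(StyleTransferModel, self).__init__()
        # Pretrained VGG19 feature extractor. torchvision >= 0.13 takes the `weights` argument;
        # older versions use the now-deprecated `pretrained=True` instead.
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for param in self.vgg.parameters():
            param.requires_grad_(False)  # The network stays fixed; only the image is optimized

    def forward(self, x):
        return self.vgg(x)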
        

4.3 Defining Style and Content Loss


class ContentLoss(nn.Module):
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        self.target = target.detach()  # Prevent gradient calculation for target

    def forward(self, x):
        return nn.functional.mse_loss(x, self.target)

class StyleLoss(nn.Module):
    def __init__(self, target):
        super(StyleLoss, self).__init__()
        self.target = self.gram_matrix(target).detach()  # Calculate Gram matrix 

    def gram_matrix(self, x):
        b, c, h, w = x.size()
        features = x.view(b, c, h * w)
        G = torch.bmm(features, features.transpose(1, 2))  # Create Gram matrix
        return G.div(c * h * w)

    def forward(self, x):
        G = self.gram_matrix(x)
        return nn.functional.mse_loss(G, self.target)
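
The Gram matrix records how strongly each pair of feature channels activates together, while discarding where in the image the activation occurred. A small worked example, assuming a standalone copy of the `gram_matrix` method above and hand-picked values:

```python
import torch

def gram_matrix(x):
    # Same computation as StyleLoss.gram_matrix, written as a standalone function
    b, c, h, w = x.size()
    features = x.view(b, c, h * w)
    G = torch.bmm(features, features.transpose(1, 2))
    return G.div(c * h * w)

# One image with two channels, each a 1x2 feature map
x = torch.tensor([[[[1.0, 2.0]], [[3.0, 4.0]]]])  # shape (1, 2, 1, 2)
G = gram_matrix(x)

# Unnormalized products: [[1*1 + 2*2, 1*3 + 2*4], [3*1 + 4*2, 3*3 + 4*4]] = [[5, 11], [11, 25]]
# Dividing by c * h * w = 4 gives [[1.25, 2.75], [2.75, 6.25]]
print(G)
```

The result has one row and one column per channel, so its size does not depend on the height and width of the feature map.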
        

4.4 Running Style Transfer

Now we will define the style transfer function and implement its optimization loop. Unlike GAN training, the network weights stay fixed here: the loop updates the pixels of the generated image to minimize a weighted sum of the content and style losses.
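
Written out, the combined objective has the following form. This is a sketch in notation introduced here for illustration: $x$ is the image being optimized, $c$ and $s$ are the content and style images, $F_l(\cdot)$ denotes the feature maps at layer $l$, $G(\cdot)$ is the Gram matrix, and $w_c$ and $w_s$ correspond to `content_weight` and `style_weight`.

```latex
\mathcal{L}_{\text{total}}(x) =
  w_c \sum_{l \in \text{content layers}} \operatorname{MSE}\big(F_l(x),\, F_l(c)\big)
  + w_s \sum_{l \in \text{style layers}} \operatorname{MSE}\big(G(F_l(x)),\, G(F_l(s))\big)
```

With the default arguments of `run_style_transfer`, the style term is weighted one million times more heavily than the content term. A large style weight is commonly used because the normalized Gram-matrix values, and therefore the raw style loss, tend to be very small.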


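def extract_features(model, x, layer_ids):
    # Run x through the VGG layers and collect the activations at the requested indices
    feats = {}
    for idx, layer in enumerate(model.vgg):
        # Apply ReLU out of place so that stored activations are not overwritten
        x = nn.functional.relu(x) if isinstance(layer, nn.ReLU) else layer(x)
        if idx in layer_ids:
            feats[idx] = x
    return feats

def run_style_transfer(content_img, style_img, model, num_steps=500, style_weight=1000000, content_weight=1,
                       content_layers=(21,), style_layers=(0, 5, 10, 19, 28)):
    # The default indices are an assumption: conv4_2 for content and conv1_1..conv5_1 for style
    # in torchvision's VGG19 `features` module, a common choice for style transfer.
    target = content_img.clone().requires_grad_(True)  # Start from the content image
    optimizer = torch.optim.LBFGS([target])  # LBFGS optimizes the image pixels directly
    layer_ids = set(content_layers) | set(style_layers)

    # Targets are computed once from the fixed content and style images
    with torch.no_grad():
        content_feats = extract_features(model, content_img, layer_ids)
        style_feats = extract_features(model, style_img, layer_ids)
    content_losses = {i: ContentLoss(content_feats[i]) for i in content_layers}
    style_losses = {i: StyleLoss(style_feats[i]) for i in style_layers}

    for step in range(num_steps):
        def closure():
            optimizer.zero_grad()
            feats = extract_features(model, target, layer_ids)

            # Keep the losses as tensors (no .item()) so that backward() can reach `target`
            style_loss_val = sum(style_losses[i](feats[i]) for i in style_layers)
            content_loss_val = sum(content_losses[i](feats[i]) for i in content_layers)
            total_loss = style_weight * style_loss_val + content_weight * content_loss_val

            total_loss.backward()
            return total_loss

        optimizer.step(closure)

    return target.detach()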
        

5. Conclusion

In this course, we implemented image generation with a GAN and neural style transfer with a pretrained VGG19 network using PyTorch. A GAN learns to generate new images through the competition between a generator and a discriminator, while
neural style transfer creates stylized images by combining the content of one image with the style of another. Both techniques are widely used in deep learning and are
applicable in a variety of fields.
