Deep Learning with GAN using PyTorch, Literary Club for Notorious Offenders

Generative Adversarial Networks (GANs) are considered one of the most innovative advances in deep learning. A GAN consists of two neural networks: a Generator and a Discriminator. The Generator creates data, while the Discriminator determines whether the data is real or fake. This competition pushes both networks to improve. In this course, we will build a GAN using PyTorch and generate data around the theme of ‘The Literary Club for Notorious Offenders’.

1. Basic Structure and Principles of GAN

A GAN operates as follows:

  • Generator: Takes random noise (z) as input and generates realistic data.
  • Discriminator: Determines whether the input data is real or generated by the Generator.
  • The Generator tries to deceive the Discriminator, while the Discriminator tries to tell real data from generated data. As this competition continues, both networks improve.

2. Preparing Required Libraries and Datasets

Install PyTorch and the other necessary libraries, then choose a dataset. In this example, we will use the MNIST dataset, which consists of images of handwritten digits, to generate images of digits.

2.1 Setting Up the Environment

pip install torch torchvision

2.2 Loading the Dataset

import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Load MNIST Dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

mnist_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
data_loader = DataLoader(dataset=mnist_dataset, batch_size=64, shuffle=True)
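
As a quick check on the Normalize((0.5,), (0.5,)) step above: ToTensor yields pixel values in [0, 1], and normalizing with mean 0.5 and standard deviation 0.5 rescales them to [-1, 1], the same range a Tanh output layer produces. The helper names normalize_pixel and denormalize_pixel below are purely illustrative and are not part of torchvision.

```python
def normalize_pixel(x, mean=0.5, std=0.5):
    """Apply the per-pixel formula used by Normalize: (x - mean) / std."""
    return (x - mean) / std


def denormalize_pixel(y, mean=0.5, std=0.5):
    """Invert the normalization, e.g. before displaying a generated image."""
    return y * std + mean


# Darkest, middle, and brightest pixel values after ToTensor
for x in (0.0, 0.5, 1.0):
    print(x, "->", normalize_pixel(x))
```

The inverse mapping is useful when generated images need to be brought back to [0, 1] for saving or plotting.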

3. Constructing the GAN Model

We define the Generator and Discriminator models to implement the Generative Adversarial Network.

3.1 Generator Model

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 28 * 28),  # MNIST image size
            nn.Tanh()  # Adjust the pixel value range of generated images to -1 ~ 1
        )

    def forward(self, z):
        img = self.model(z)
        img = img.view(img.size(0), 1, 28, 28)  # Transform into image shape
        return img

3.2 Discriminator Model

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output value between 0 and 1
        )

    def forward(self, img):
        img_flat = img.view(img.size(0), -1)  # Flatten the image
        validity = self.model(img_flat)
        return validity

4. Training Process of GAN

The training process of GAN is carried out as follows:

  • Provide real images and generated images to the Discriminator to calculate its loss.
  • Update the Generator to make the generated images closer to the real ones.
  • Repeat this process to help each network improve.

4.1 Defining Loss Function and Optimizers

import torch.optim as optim

# Create instances of Generator and Discriminator
generator = Generator()
discriminator = Discriminator()

# Set loss function and optimizer
adversarial_loss = nn.BCELoss()
optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

4.2 Training Loop

num_epochs = 200
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(data_loader):
        # Label for real data: 1
        real_imgs = imgs
        valid = torch.ones(imgs.size(0), 1)  # Ground truth for real images
        fake = torch.zeros(imgs.size(0), 1)  # Ground truth for fake images

        # Train Discriminator
        optimizer_D.zero_grad()
        z = torch.randn(imgs.size(0), 100)  # Sample random noise
        generated_imgs = generator(z)  # Generated images
        real_loss = adversarial_loss(discriminator(real_imgs), valid)
        fake_loss = adversarial_loss(discriminator(generated_imgs.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2
        d_loss.backward()
        optimizer_D.step()

        # Train Generator
        optimizer_G.zero_grad()
        g_loss = adversarial_loss(discriminator(generated_imgs), valid)
        g_loss.backward()
        optimizer_G.step()

    # Report the losses of the last batch in this epoch
    print(f'Epoch {epoch + 1}/{num_epochs} | D Loss: {d_loss.item():.4f} | G Loss: {g_loss.item():.4f}')

5. Visualizing Results

After training is completed, we visualize the generated images to check the results.

import matplotlib.pyplot as plt

# Visualize generated images
def show_generated_images(generator, num_images=16):
    z = torch.randn(num_images, 100)  # Sample random noise
    generated_images = generator(z)
    generated_images = generated_images.detach().numpy()

    fig, axs = plt.subplots(4, 4, figsize=(10, 10))
    for i in range(4):
        for j in range(4):
            axs[i, j].imshow(generated_images[i * 4 + j, 0], cmap='gray')
            axs[i, j].axis('off')
    plt.show()

show_generated_images(generator)

In this way, you can build and train a GAN and verify the generated images. The potential applications of GANs are vast, and they can facilitate creative tasks. Now, you can take a step closer to the world of GANs!

6. Conclusion

Generative Adversarial Networks are a very interesting area of deep learning, actively used in many research and development projects. In this course, we explored the basic principles and structures of GAN using PyTorch and covered the process of building and training deep learning models. I hope you gain a deep understanding and interest in GAN through this course and that it greatly helps you in your future deep learning journey.


Deep Learning and Reinforcement Learning using PyTorch

1. Introduction

Generative Adversarial Networks (GANs) are models proposed by Ian Goodfellow in 2014 that generate data through competition between two neural networks. GANs are widely used particularly in image generation, style transfer, and data augmentation. In this post, we will introduce the basic structure of GANs, how to implement them using PyTorch, the basic concepts of reinforcement learning, and various applications.

2. Basic Structure of GANs

GANs consist of two neural networks: a Generator and a Discriminator. The Generator takes random noise as input and generates new data, while the Discriminator distinguishes whether the input data is real or generated. These two networks learn by competing with each other.

2.1 Generator

The Generator takes a noise vector and produces data that looks real. The goal is to deceive the Discriminator.

2.2 Discriminator

The Discriminator assesses the authenticity of the input data. Ideally, it outputs a probability close to 1 for real data and close to 0 for generated data.

2.3 Loss Function of GANs

The loss function of GANs is defined as follows:

min_G max_D V(D, G) = E[log(D(x))] + E[log(1 - D(G(z)))]

Here, E represents expectation, x is real data, z is a random noise vector, and G(z) is the data generated by the Generator. The Discriminator tries to maximize V(D, G), while the Generator tries to minimize it.
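
To make the value function concrete, the sketch below estimates V(D, G) from a handful of made-up Discriminator outputs, replacing each expectation with a sample mean. The function name gan_value and the probability values are illustrative assumptions.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A Discriminator that separates real from fake well (high V)
confident = gan_value(d_real=[0.9, 0.95, 0.85], d_fake=[0.1, 0.05, 0.2])
# A Discriminator that outputs 0.5 everywhere, i.e. it is fully fooled
fooled = gan_value(d_real=[0.5, 0.5, 0.5], d_fake=[0.5, 0.5, 0.5])

print(f"confident D: {confident:.4f}")
print(f"fooled D:    {fooled:.4f}")
```

When the Discriminator outputs 0.5 for every input, the value equals -log 4, which is the level the Generator is pushing toward.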

3. Implementing GANs Using PyTorch

Now, let’s implement a GAN using PyTorch. We will use the MNIST dataset of handwritten digits.

3.1 Preparing the Dataset

import torch
import torchvision
from torchvision import datasets, transforms

# Data transformation and download
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)

3.2 Defining the Generator Model

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(True)
        )
        self.layer2 = nn.Sequential(
            nn.Linear(256, 512),
            nn.ReLU(True)
        )
        self.layer3 = nn.Sequential(
            nn.Linear(512, 1024),
            nn.ReLU(True)
        )
        self.layer4 = nn.Sequential(
            nn.Linear(1024, 28*28),
            nn.Tanh()  # Pixel values are between -1 and 1
        )
    
    def forward(self, z):
        out = self.layer1(z)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        return out.view(-1, 1, 28, 28)  # Reshape to image format

3.3 Defining the Discriminator Model

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(28*28, 1024),
            nn.LeakyReLU(0.2, inplace=True)
        )
        self.layer2 = nn.Sequential(
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True)
        )
        self.layer3 = nn.Sequential(
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True)
        )
        self.layer4 = nn.Sequential(
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output value is between 0 and 1
        )
    
    def forward(self, x):
        out = self.layer1(x.view(-1, 28*28))  # Flatten
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        return out

3.4 Model Training

import torch.optim as optim

# Initialize models
generator = Generator()
discriminator = Discriminator()

# Set loss function and optimizers
criterion = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer_g = optim.Adam(generator.parameters(), lr=0.0002)
optimizer_d = optim.Adam(discriminator.parameters(), lr=0.0002)

# Training
num_epochs = 200
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(train_loader):
        # Real data labels
        real_labels = torch.ones(images.size(0), 1)
        fake_labels = torch.zeros(images.size(0), 1)

        # Train Discriminator
        optimizer_d.zero_grad()
        outputs = discriminator(images)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()
        
        z = torch.randn(images.size(0), 100)
        fake_images = generator(z)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()
        
        optimizer_d.step()
        
        # Train Generator
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()
    
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item():.4f}, g_loss: {g_loss.item():.4f}')

3.5 Visualizing the Results

import matplotlib.pyplot as plt

# Function to visualize generated images
def plot_generated_images(generator, n=10):
    z = torch.randn(n, 100)
    with torch.no_grad():
        generated_images = generator(z).cpu()
    generated_images = generated_images.view(-1, 28, 28)
    
    plt.figure(figsize=(10, 1))
    for i in range(n):
        plt.subplot(1, n, i+1)
        plt.imshow(generated_images[i], cmap='gray')
        plt.axis('off')
    plt.show()

# Generate images
plot_generated_images(generator)

4. Basic Concepts of Reinforcement Learning

Reinforcement Learning (RL) is a field of machine learning where an agent learns optimal actions through interaction with the environment. The agent observes states, selects actions, receives rewards, and learns the optimal policy.

4.1 Components of Reinforcement Learning

  • State: Information representing the current environment for the agent.
  • Action: The task that the agent can perform in the current state.
  • Reward: Feedback received from the environment after the agent performs an action.
  • Policy: The probability distribution of the actions the agent can take in each state.

4.2 Reinforcement Learning Algorithms

  • Q-Learning: A value-based method that learns Q values to derive optimal policies.
  • Policy Gradient: A method that directly learns policies.
  • Actor-Critic: A method that learns value functions and policies simultaneously.
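
As a minimal sketch of the Q-Learning entry above, the tabular update rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)] can be written in a few lines. The two-state toy table, the learning rate alpha, and the discount gamma are illustrative assumptions, not values from this post.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # No future value is added when the episode has ended
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)   # unseen (state, action) pairs start at 0
actions = [0, 1]
q_table[("s1", 1)] = 2.0       # pretend action 1 in state s1 is already valued

new_value = q_learning_update(q_table, "s0", 0, reward=1.0,
                              next_state="s1", actions=actions)
print(new_value)
```

The update moves the old estimate part of the way toward the target, with alpha controlling how far.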

4.3 Implementing Reinforcement Learning Using PyTorch

We will use OpenAI’s Gym library for a simple reinforcement learning implementation. Here, we will address the CartPole environment.

4.3.1 Setting up the Gym Environment

import gym

# Create Gym environment
env = gym.make('CartPole-v1')  # CartPole environment

4.3.2 Defining the DQN Model

import torch.nn.functional as F

class DQN(nn.Module):
    def __init__(self, input_size, num_actions):
        super(DQN, self).__init__()
        self.fc1 = nn.Linear(input_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, num_actions)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

4.3.3 Model Training

def train_dqn(env, num_episodes):
    model = DQN(input_size=env.observation_space.shape[0], num_actions=env.action_space.n)
    optimizer = optim.Adam(model.parameters())
    criterion = nn.MSELoss()

    for episode in range(num_episodes):
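        # Assumes the classic Gym API (gym < 0.26): reset() returns only the
        # observation and step() returns a 4-tuple. In newer versions reset()
        # returns (obs, info) and step() returns five values.
        state = env.reset()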
        state = torch.FloatTensor(state)
        done = False
        total_reward = 0

        while not done:
            q_values = model(state)
            action = torch.argmax(q_values).item()  # or use epsilon-greedy policy

            next_state, reward, done, _ = env.step(action)
            next_state = torch.FloatTensor(next_state)

            total_reward += reward

            # Add DQN update logic here

            state = next_state

        print(f'Episode {episode+1}, Total Reward: {total_reward}')  

    return model

# Start DQN training
train_dqn(env, num_episodes=1000)
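
The "# Add DQN update logic here" placeholder above leaves the learning step open. One common way to fill it, shown here as a hedged sketch rather than the author's implementation, is a one-step temporal-difference update that regresses Q(s, a) toward r + γ max_a' Q(s', a'). The helper names select_action and dqn_update, the discount gamma, the exploration rate epsilon, and the stand-in network are assumptions; a full DQN would typically also add a replay buffer and a target network.

```python
import random

import torch
import torch.nn as nn
import torch.optim as optim


def select_action(model, state, num_actions, epsilon=0.1):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return torch.argmax(model(state)).item()


def dqn_update(model, optimizer, state, action, reward, next_state, done,
               gamma=0.99):
    """One TD(0) step on a single transition; returns the loss as a float."""
    q_value = model(state)[action]
    with torch.no_grad():  # the target is treated as a constant
        next_max = torch.tensor(0.0) if done else model(next_state).max()
        target = reward + gamma * next_max
    loss = nn.functional.mse_loss(q_value, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Stand-in network with CartPole-like sizes (4 observations, 2 actions)
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 24), nn.ReLU(), nn.Linear(24, 2))
optimizer = optim.Adam(model.parameters(), lr=1e-3)

state, next_state = torch.randn(4), torch.randn(4)
action = select_action(model, state, num_actions=2)
loss = dqn_update(model, optimizer, state, action, reward=1.0,
                  next_state=next_state, done=False)
print(f"TD loss: {loss:.4f}")
```

Inside train_dqn, a call like dqn_update(model, optimizer, state, action, reward, next_state, done) would sit where the placeholder comment is.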

5. Conclusion

In this post, we explored the basic concepts of GANs and reinforcement learning as well as implementation methods using PyTorch. GANs are very useful models for data generation, and reinforcement learning is a technique that helps agents learn optimal policies. These technologies can be applied in various fields, and future research and development are expected.

Deep Learning with PyTorch, GAN, WGAN – Wasserstein GAN

With the advancement of deep learning, the use of Generative Adversarial Networks (GANs) is increasing in various fields such as image generation, reinforcement learning, image transformation, and image combination. GANs are used to generate high-resolution images through the competition between two networks: the Generator and the Discriminator. This article will cover the basic concepts of GANs, as well as the structure and operation of WGAN (Wasserstein GAN), along with example PyTorch code for implementation.

1. Basic Concept of GAN

GAN is a model proposed by Ian Goodfellow in 2014, composed of two neural networks: the Generator and the Discriminator. The Generator takes a random noise vector as input to generate data similar to real data, while the Discriminator determines whether the input data is real or generated. In this process, both neural networks learn in a competitive manner, and the generated data becomes increasingly realistic.

1.1 Structure of GAN

  • Generator (G): A network that takes random noise as input to generate data.
  • Discriminator (D): A network that distinguishes between real data and generated data.

1.2 Loss Function of GAN

The loss function of GAN is as follows:

    L(D) = -E[log(D(x))] - E[log(1 - D(G(z)))],
    L(G) = -E[log(D(G(z)))]
    

Here, D(x) is the probability that the Discriminator assigns to the input x being real, and G(z) is the data generated by the Generator from the noise vector z.
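
Note that L(G) above is the so-called non-saturating generator loss rather than the log(1 - D(G(z))) term of the original minimax objective. A short calculation suggests why it is often preferred early in training: writing D(G(z)) = sigmoid(a) for a logit a, the gradient of log(1 - sigmoid(a)) with respect to a is -sigmoid(a), which vanishes when the Discriminator confidently rejects a sample, whereas the gradient of -log(sigmoid(a)) is -(1 - sigmoid(a)), which stays near -1. The sketch below checks this numerically; the function names and the logit value are illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the minimax generator term."""
    return -sigmoid(a)


def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the L(G) used above."""
    return -(1.0 - sigmoid(a))


logit = -6.0  # the Discriminator is confident the sample is fake
print(f"D(G(z))             = {sigmoid(logit):.4f}")
print(f"saturating grad     = {grad_saturating(logit):.4f}")
print(f"non-saturating grad = {grad_non_saturating(logit):.4f}")
```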

2. WGAN – Wasserstein GAN

The traditional GAN suffers from an unstable Discriminator loss and unstable training. WGAN addresses these issues by using the Wasserstein distance. The Wasserstein distance (or Earth Mover’s Distance) measures the minimum transportation cost needed to turn one probability distribution into another.
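
For intuition, in one dimension the Wasserstein-1 distance between two empirical distributions with the same number of equally weighted samples reduces to the mean absolute difference between the sorted samples. The sketch below uses that special case; the function name wasserstein_1d and the sample values are illustrative, and this shortcut does not carry over to images, which is why WGAN estimates the distance with a Critic network instead.

```python
def wasserstein_1d(xs, ys):
    """W1 between two equal-size 1-D samples: mean |x_(i) - y_(i)| after sorting."""
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equal sample counts")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


real = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 5.0 for x in real]     # same shape, moved 5 units

print(wasserstein_1d(real, real))     # identical samples
print(wasserstein_1d(real, shifted))  # grows with the size of the shift
```

Unlike a saturating classification loss, this distance keeps growing as the two distributions move further apart, which gives a useful training signal even when they do not overlap.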

2.1 Improvements of WGAN

  • WGAN uses a ‘Critic’ instead of a Discriminator. The Critic outputs an unbounded real-valued score rather than a probability.
  • The loss functions of WGAN, both of which are minimized, are as follows:
                L(D) = E[D(G(z))] - E[D(x)],
                L(G) = -E[D(G(z))]

  • WGAN enforces the Lipschitz constraint on the Critic through Weight Clipping.
  • Later variants such as WGAN-GP replace Weight Clipping with a Gradient Penalty, a softer way of enforcing the Lipschitz constraint.

2.2 Structure of WGAN

WGAN keeps the basic structure of GAN but replaces the Discriminator with a Critic:

  • The Critic has no Sigmoid on its output layer, so it returns a raw score instead of a probability.

3. WGAN Implementation Using PyTorch

Now we will implement WGAN using PyTorch. This example will build a model to generate handwritten digits using the MNIST dataset.

3.1 Preparing the Dataset

First, we load and preprocess the dataset.


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Load and preprocess the dataset.
transform = transforms.Compose([
    transforms.Resize(28),
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
    

3.2 Defining the WGAN Model

Now it’s time to define the Generator and Critic models.


# Define Generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()
        )
        
    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)  # Reshape to 28x28 image

# Define Critic model
class Critic(nn.Module):
    def __init__(self):
        super(Critic, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1)
        )
    
    def forward(self, img):
        return self.model(img.view(-1, 784))  # Reshape to 784 dimensions
    

3.3 Training Process of WGAN

Now we define the training process for WGAN.


# Set GPU usage
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def train_wgan(num_epochs):
    generator = Generator().to(device)
    critic = Critic().to(device)

    # Set optimizers
    optimizer_G = optim.RMSprop(generator.parameters(), lr=0.00005)
    optimizer_C = optim.RMSprop(critic.parameters(), lr=0.00005)

    for epoch in range(num_epochs):
        for i, (imgs, _) in enumerate(train_loader):
            imgs = imgs.to(device)

            # Update Critic: minimize E[C(G(z))] - E[C(x)]
            optimizer_C.zero_grad()
            z = torch.randn(imgs.size(0), 100, device=device)
            fake_imgs = generator(z)
            c_real = critic(imgs)
            c_fake = critic(fake_imgs.detach())
            c_loss = c_fake.mean() - c_real.mean()
            c_loss.backward()
            optimizer_C.step()

            # Weight Clipping
            for p in critic.parameters():
                p.data.clamp_(-0.01, 0.01)

            # Update Generator once every 5 Critic updates
            if i % 5 == 0:
                optimizer_G.zero_grad()
                g_loss = -critic(fake_imgs).mean()
                g_loss.backward()
                optimizer_G.step()

        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss C: {c_loss.item():.4f}, Loss G: {g_loss.item():.4f}')

    # Return the trained Generator so it can be used for visualization
    return generator

generator = train_wgan(num_epochs=50)
    

3.4 Visualizing the Results

After training is complete, we visualize the generated images to check the results.


import matplotlib.pyplot as plt

def show_generated_images(num_images):
    z = torch.randn(num_images, 100).to(device)
    generated_imgs = generator(z).cpu().detach()
    
    fig, axes = plt.subplots(1, num_images, figsize=(15, 15))
    for i in range(num_images):
        axes[i].imshow(generated_imgs[i][0], cmap='gray')
        axes[i].axis('off')
    plt.show()

# Visualize the results
show_generated_images(5)
    

4. Conclusion

WGAN provides a more stable training process by utilizing Wasserstein Distance to overcome the issues of traditional GANs. This article introduced the method of implementing WGAN using PyTorch, hoping to enhance the understanding of generative adversarial networks. GANs and their variant models are powerful tools that can yield innovative results in various fields beyond image generation.

5. References

  • Ian J. Goodfellow et al., “Generative Adversarial Nets”, 2014.
  • Martin Arjovsky et al., “Wasserstein Generative Adversarial Networks”, 2017.
  • PyTorch Documentation: https://pytorch.org/docs/stable/index.html

Deep Learning with GANs using PyTorch, WGAN-GP

A Generative Adversarial Network (GAN) is a powerful generative model proposed by Ian Goodfellow in 2014. A GAN consists of two neural networks, the Generator and the Discriminator, which learn by competing with each other. The Generator tries to create new data that resembles real data, while the Discriminator attempts to distinguish whether the given data is real or generated. Both improve continuously, and the Generator ultimately becomes capable of producing very realistic data.

This article explains Wasserstein GAN with Gradient Penalty (WGAN-GP), a variant of GAN, and demonstrates how to implement WGAN-GP using PyTorch. WGAN-GP is based on the Wasserstein distance and adds a Gradient Penalty to the loss of the critic, its counterpart of the Discriminator, to enhance training stability.

1. Basic Structure of GAN

The basic structure of GAN is as follows.

  • Generator: Receives random noise as input and generates fake data.
  • Discriminator: Receives real data and fake data produced by the Generator as input and judges whether each sample is real or fake.

The learning process of GAN alternates between the following two steps.

  1. The Generator generates fake data from random noise.
  2. The Discriminator distinguishes between real data and the generated data.

This process is performed repeatedly, leading to improvements in both networks. However, traditional GANs often face training instability and mode collapse issues, prompting research into various approaches for more stable training.

2. Introduction to WGAN-GP

WGAN aims to address the inherent problems of GAN by introducing the concept of the Wasserstein distance. The Wasserstein distance gives a more informative measure of the difference between two distributions, which facilitates network training. The key idea of WGAN is to introduce a “critic” instead of a Discriminator. The critic scores how real its input looks, and the networks are updated using the Wasserstein loss computed from these scores rather than the binary cross-entropy loss of the standard GAN.

WGAN-GP adds a Gradient Penalty (GP) to the critic’s loss, which further enhances training stability by encouraging the critic to satisfy the 1-Lipschitz condition. The Gradient Penalty is defined as follows:

GP = λ * E[(||∇_x̂ D(x̂)||_2 - 1)²]

Here, λ is a hyperparameter, D is the critic, and x̂ is a point sampled at random on the line between a real sample and a generated sample. The Gradient Penalty pushes the L2 norm of the critic’s gradient with respect to its input toward 1. This approach enables WGAN-GP to overcome the instability of GANs and allows for more stable training.
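
As a small worked case of the penalty term: for a linear critic D(x) = w · x the gradient with respect to x is w everywhere, so the penalty is simply λ (||w||_2 - 1)². The sketch below evaluates it for two weight vectors; the linear critic and the function name linear_critic_penalty are illustrative assumptions, not part of the implementation that follows.

```python
import math


def linear_critic_penalty(w, lambda_gp=10.0):
    """Gradient penalty for D(x) = w . x, whose input gradient is w for every x."""
    grad_norm = math.sqrt(sum(wi * wi for wi in w))
    return lambda_gp * (grad_norm - 1.0) ** 2


print(linear_critic_penalty([0.6, 0.8]))   # gradient norm 1: penalty is numerically zero
print(linear_critic_penalty([3.0, 4.0]))   # gradient norm 5: heavily penalized
```

For a neural-network critic the gradient depends on the input, which is why the implementation has to compute it with automatic differentiation at sampled points.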

3. Implementing WGAN-GP in PyTorch

Now, let’s implement WGAN-GP using PyTorch. The following steps will be followed:

  1. Install necessary libraries and load the dataset
  2. Define the Generator and Discriminator models
  3. Implement the WGAN-GP training loop
  4. Visualize the results

3.1 Installing Libraries and Loading the Dataset

First, install the necessary libraries and load the MNIST dataset.

!pip install torch torchvision matplotlib
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

3.2 Defining the Generator and Discriminator Models

Define the Generator and Discriminator models. The Generator takes a random noise vector as input and transforms it into an image, while the Discriminator evaluates whether the input image is real or fake.

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 28 * 28),
            nn.Tanh()
        )
    
    def forward(self, z):
        return self.model(z).reshape(-1, 1, 28, 28)

class Critic(nn.Module):
    def __init__(self):
        super(Critic, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1)
        )

    def forward(self, x):
        return self.model(x.view(-1, 28 * 28))

3.3 Implementing the WGAN-GP Training Loop

Now, let’s implement the training loop of WGAN-GP. During training, the Critic is updated several times (critic_iterations) for each Generator update, and the Gradient Penalty is added to the Critic’s loss.

def compute_gradient_penalty(critic, real_samples, fake_samples):
    # One random interpolation weight per sample, on the same device as the data
    alpha = torch.rand(real_samples.size(0), 1, 1, 1, device=real_samples.device)
    interpolated_samples = alpha * real_samples + (1 - alpha) * fake_samples
    interpolated_samples.requires_grad_(True)

    d_interpolated = critic(interpolated_samples)

    gradients = torch.autograd.grad(outputs=d_interpolated, inputs=interpolated_samples,
                                    grad_outputs=torch.ones_like(d_interpolated),
                                    create_graph=True, retain_graph=True)[0]

    # Flatten so the L2 norm is taken over all pixels of each sample
    gradients = gradients.reshape(gradients.size(0), -1)
    gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
    return gradient_penalty

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

generator = Generator().to(device)
critic = Critic().to(device)

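learning_rate = 0.00005
num_epochs = 100
critic_iterations = 5
lambda_gp = 10

# The WGAN-GP losses are built directly from the critic scores,
# so no separate criterion object is needed
optimizer_generator = optim.Adam(generator.parameters(), lr=learning_rate)
optimizer_critic = optim.Adam(critic.parameters(), lr=learning_rate)

# Scale images to [-1, 1] to match the Generator's Tanh output
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
real_data = dsets.MNIST(root='./data', train=True, transform=transform, download=True)

data_loader = torch.utils.data.DataLoader(real_data, batch_size=64, shuffle=True)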

for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(data_loader):
        real_images = real_images.to(device)

        for _ in range(critic_iterations):
            optimizer_critic.zero_grad()

            # Generate fake images (detached: the Generator is not updated here)
            z = torch.randn(real_images.size(0), 100, device=device)
            fake_images = generator(z).detach()

            # Get critic scores
            real_validity = critic(real_images)
            fake_validity = critic(fake_images)
            gradient_penalty = compute_gradient_penalty(critic, real_images, fake_images)

            # Compute loss
            critic_loss = -torch.mean(real_validity) + torch.mean(fake_validity) + lambda_gp * gradient_penalty
            critic_loss.backward()
            optimizer_critic.step()

        # Update generator
        optimizer_generator.zero_grad()
        
        # Get generator score
        fake_images = generator(z)
        validity = critic(fake_images)
        generator_loss = -torch.mean(validity)
        generator_loss.backward()
        optimizer_generator.step()

    if epoch % 10 == 0:
        print(f"Epoch: {epoch}/{num_epochs}, Critic Loss: {critic_loss.item():.4f}, Generator Loss: {generator_loss.item():.4f}")

3.4 Visualizing Results

Finally, let’s visualize the generated images. This is a good way to verify how well the Generator has learned during the training process.

def show_generated_images(generator, num_images=25):
    z = torch.randn(num_images, 100).to(device)
    generated_images = generator(z).cpu().detach().numpy()

    plt.figure(figsize=(5, 5))
    for i in range(num_images):
        plt.subplot(5, 5, i + 1)
        plt.imshow(generated_images[i][0], cmap='gray')
        plt.axis('off')
    plt.show()

show_generated_images(generator)

4. Conclusion

This article discussed WGAN-GP, a variant of GAN, and demonstrated how to implement it using PyTorch. WGAN-GP offers the advantage of more stable training by leveraging the Wasserstein distance and Gradient Penalty. These GAN-based models can be applied in various fields, including image generation, image translation, and style transfer.

As deep learning continues to advance, GANs and their variants are receiving ongoing attention, and future developments are highly anticipated. I encourage you to also take on various projects using GANs and WGAN-GP!

Deep Learning with PyTorch for GANs, Generating Face Images using VAE

Recently, in the field of artificial intelligence, Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE) have established themselves as key technologies that significantly enhance the efficiency and quality of image generation. In this article, we will take a detailed look at the basic concepts of GAN and VAE, along with the process of generating face images using PyTorch.

1. Overview of GAN (Generative Adversarial Networks)

Generative Adversarial Networks (GAN) have a structure where two neural networks, a Generator and a Discriminator, compete and learn from each other. The Generator tries to create images that are similar to real ones, while the Discriminator tries to determine whether the generated images are real or fake. This process helps the Generator learn to create increasingly realistic images by deceiving the Discriminator.

1.1 How GAN Works

GAN consists of two networks as follows:

  • Generator: Takes random noise as input and generates images similar to real ones.
  • Discriminator: Classifies whether the input image is real or fake.

As training progresses, the Generator gradually produces higher quality images, while the Discriminator becomes better at telling real images from fake ones. This process takes the form of a zero-sum game in which a gain for one network is a loss for the other, and training aims for a balance in which the generated images are hard to distinguish from real ones.

2. Overview of VAE (Variational Autoencoder)

Variational Autoencoders (VAE) are models that learn the latent space of images or data to generate new data. A VAE maps input data to a distribution over a lower-dimensional latent space through an encoder, samples a latent vector from that distribution, and reconstructs the image from the sample with a decoder. VAE is a probabilistic model that learns the distribution of input data and generates new samples based on it.

2.1 Structure of VAE

VAE consists of three main components:

  • Encoder: Transforms the input data into latent variables.
  • Sampling: Extracts samples from the latent variables.
  • Decoder: Generates new images using the sampled latent variables.
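
The Sampling step is commonly implemented with the reparameterization trick: the encoder outputs a mean and a log-variance per latent dimension, and a sample is formed as z = μ + σ · ε with ε drawn from a standard normal, which keeps the sample differentiable with respect to μ and σ. The sketch below shows the arithmetic in plain Python; the name sample_latent, the log-variance parameterization, and the numbers are assumptions for illustration, since the encoder itself is not defined at this point in the article.

```python
import math
import random


def sample_latent(mu, log_var, rng=random):
    """Draw z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)                  # fixed seed for repeatability
mu = [0.0, 2.0]
log_var = [0.0, math.log(0.25)]         # sigma = 1.0 and 0.5

samples = [sample_latent(mu, log_var, rng) for _ in range(10000)]
mean_z1 = sum(s[1] for s in samples) / len(samples)
print(f"empirical mean of z[1]: {mean_z1:.2f}")
```

In a PyTorch model the same formula would be applied to tensors, with the noise drawn by a function such as torch.randn_like.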

3. Project Goals and Dataset

The goal of this project is to generate face images similar to real ones using GAN and VAE. For this purpose, we will use the CelebA dataset. The CelebA dataset contains various face images and is suitable for measuring the performance of GAN and VAE.

4. Environment Setup

To carry out this project, Python and the PyTorch framework are required. The necessary packages can be installed with pip:

pip install torch torchvision matplotlib

5. Implementing GAN with PyTorch

First, we will implement the GAN model. The implementation consists of the following steps:

  • Loading the dataset
  • Defining the Generator and Discriminator
  • Setting up the training loop
  • Visualizing the results

5.1 Loading the Dataset

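First, we prepare the CelebA dataset. The code below does not download anything: it assumes the images are already in a local folder, and `path_to_celeba` is a placeholder for that folder.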

import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

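transform = transforms.Compose([
    transforms.Resize((64, 64)),  # resize every image to 64x64
    transforms.ToTensor(),        # convert to a float tensor with values in [0, 1]
])

# ImageFolder expects the images inside at least one subfolder of root,
# for example path_to_celeba/<subfolder>/<image>.jpg
dataset = ImageFolder(root='path_to_celeba', transform=transform)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)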

5.2 Defining the Generator and Discriminator

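We define the Generator and Discriminator of GAN as fully connected networks. The Generator maps a 100-dimensional noise vector to a 3 x 64 x 64 image, and the Discriminator maps a flattened image to a single probability that the image is real.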

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, z):
        z = self.model(z)
        return z.view(-1, 3, 64, 64)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(3 * 64 * 64, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, img):
        img_flat = img.view(img.size(0), -1)
        return self.model(img_flat)
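To get a feel for the size of these two networks, the parameter counts can be worked out by hand from the layer sizes above. The sketch below is plain arithmetic; `linear_params` is a helper introduced for this example, not part of PyTorch.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# Layer sizes copied from the Generator and Discriminator definitions.
generator_layers = [(100, 256), (256, 512), (512, 1024), (1024, 3 * 64 * 64)]
discriminator_layers = [(3 * 64 * 64, 512), (512, 256), (256, 1)]

generator_total = sum(linear_params(i, o) for i, o in generator_layers)
discriminator_total = sum(linear_params(i, o) for i, o in discriminator_layers)

print(generator_total)      # 13277952
print(discriminator_total)  # 6423553
```

Almost all of the parameters sit in the layers that touch the 3 * 64 * 64 = 12288 pixel values directly, which is one reason convolutional architectures are commonly preferred for images.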

5.3 Setting up the Training Loop

Now we implement the training process for GAN.

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)
discriminator = Discriminator().to(device)

criterion = nn.BCELoss()
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 50
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(dataloader):
        # ToTensor yields values in [0, 1]; rescale to [-1, 1] to match the
        # Generator's Tanh output range
        imgs = imgs.to(device) * 2 - 1
        batch_size = imgs.size(0)

        # Labels: 1 for real images, 0 for generated images
        real_labels = torch.ones(batch_size, 1, device=device)
        fake_labels = torch.zeros(batch_size, 1, device=device)

        # Train the Discriminator on a real batch and a generated batch
        d_optimizer.zero_grad()
        outputs = discriminator(imgs)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        z = torch.randn(batch_size, 100, device=device)
        fake_imgs = generator(z)
        # detach() keeps this step from propagating gradients into the Generator
        outputs = discriminator(fake_imgs.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()

        d_loss = d_loss_real + d_loss_fake  # used for logging only
        d_optimizer.step()

        # Train the Generator: it succeeds when its images are classified as real
        g_optimizer.zero_grad()
        outputs = discriminator(fake_imgs)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        g_optimizer.step()

    # Losses of the last batch in the epoch
    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')
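The printed losses are easier to read with a reference point. The sketch below evaluates binary cross-entropy by hand for a single prediction; `bce` is a helper written for this example and mirrors what `nn.BCELoss` computes per element. It assumes an idealized Discriminator that outputs exactly 0.5 for every image, which is a simplification and not something the training loop guarantees.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# A Discriminator that cannot tell real from fake outputs 0.5 for every image.
d_loss = bce(0.5, 1) + bce(0.5, 0)   # real batch + fake batch
g_loss = bce(0.5, 1)                 # the Generator wants fakes labeled as real

print(round(d_loss, 4))  # 1.3863
print(round(g_loss, 4))  # 0.6931

# A confident Discriminator (fake scored 0.1) gives the Generator a larger loss.
print(round(bce(0.1, 1), 4))  # 2.3026
```

A d_loss near 1.39 together with a g_loss near 0.69 therefore suggests the two networks are roughly balanced, while a g_loss that keeps growing suggests the Discriminator is winning.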

5.4 Visualizing the Results

We visualize the images generated by the trained Generator.

import matplotlib.pyplot as plt

z = torch.randn(64, 100).to(device)
fake_images = generator(z).detach().cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    plt.imshow(fake_images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)
    plt.axis('off')
plt.show()

6. Implementing VAE with PyTorch

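Now let’s implement VAE. Unlike GAN, VAE has no Discriminator: an Encoder and a Decoder are trained together, and each input is encoded as a probability distribution over the latent space rather than as a single point. The implementation steps of VAE are as follows: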

  • Preparing the dataset
  • Defining the Encoder and Decoder
  • Setting up the training loop
  • Visualizing the results

6.1 Preparing the Dataset

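The dataset is loaded the same way as when using GAN, so the `dataloader` from section 5.1 can be reused. Its images have pixel values in [0, 1], which matches the Sigmoid output of the Decoder and the binary cross-entropy reconstruction loss used below.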

6.2 Defining the Encoder and Decoder

We define the Encoder and Decoder of VAE.

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, 128)
        self.fc_logvar = nn.Linear(64 * 8 * 8, 128)
        self.fc_decode = nn.Linear(128, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.encoder(x)
        h = h.view(h.size(0), -1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        z = self.fc_decode(z).view(-1, 64, 8, 8)
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
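The `64 * 8 * 8` in the linear layers follows from how each convolution changes the image size. The arithmetic can be checked with the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` (assuming dilation 1 and no `output_padding`, as in the model above). The helper names are introduced only for this example.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output side length of nn.Conv2d for a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Output side length of nn.ConvTranspose2d (no output_padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

# Encoder: three stride-2 convolutions starting from a 64x64 image.
size = 64
encoder_sizes = [size]
for _ in range(3):
    size = conv_out(size)
    encoder_sizes.append(size)

# Decoder: three stride-2 transposed convolutions starting from 8x8.
decoder_sizes = [size]
for _ in range(3):
    size = conv_transpose_out(size)
    decoder_sizes.append(size)

print(encoder_sizes)  # [64, 32, 16, 8]
print(decoder_sizes)  # [8, 16, 32, 64]
print(64 * encoder_sizes[-1] ** 2)  # 4096
```

Each encoder layer halves the side length and each decoder layer doubles it, so the encoder output has 64 channels of size 8 x 8, or 4096 values per image.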

6.3 Setting up the Training Loop

We implement the training process for VAE. VAE is trained with the sum of two losses: the reconstruction loss, which measures the difference between the original image and the reconstructed image, and the Kullback-Leibler (KL) divergence loss, which measures how far the latent distribution produced by the Encoder is from a standard normal distribution.
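Written out, the loss for one image takes the following form. The symbols are introduced here for illustration: $x_i$ are the pixel values of the input, $\hat{x}_i$ the pixel values of the reconstruction, $\mu_j$ and $\sigma_j^2$ the mean and variance that the Encoder produces for latent dimension $j$, and $d$ the latent dimension (128 in this model). The `logvar` tensor in the code corresponds to $\log \sigma_j^2$.

```latex
\mathcal{L}(x) =
  \underbrace{-\sum_{i} \big[ x_i \log \hat{x}_i + (1 - x_i) \log(1 - \hat{x}_i) \big]}_{\text{reconstruction loss}}
  + \underbrace{\Big( -\tfrac{1}{2} \sum_{j=1}^{d} \big( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \big) \Big)}_{\text{KL divergence to } \mathcal{N}(0, I)}
```

The KL term is zero exactly when $\mu_j = 0$ and $\sigma_j^2 = 1$ for every $j$, that is, when the Encoder's distribution equals the standard normal distribution.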

vae = VAE().to(device)
optimizer = optim.Adam(vae.parameters(), lr=0.0002)

num_epochs = 50
for epoch in range(num_epochs):
    for imgs, _ in dataloader:
        imgs = imgs.to(device)

        optimizer.zero_grad()
        reconstructed, mu, logvar = vae(imgs)

        re_loss = nn.functional.binary_cross_entropy(reconstructed.view(-1, 3 * 64 * 64), imgs.view(-1, 3 * 64 * 64), reduction='sum')
        kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = re_loss + kl_loss

        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')
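The closed-form `kl_loss` expression in the loop can be cross-checked against `torch.distributions`. The sketch below assumes random values for `mu` and `logvar` (they are stand-ins, not outputs of the trained Encoder) and compares the two computations.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Hypothetical encoder outputs: batch of 4, latent dimension 128.
mu = torch.randn(4, 128)
logvar = torch.randn(4, 128)

# Closed form used in the training loop.
kl_closed_form = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Same quantity computed from distribution objects.
posterior = Normal(mu, torch.exp(0.5 * logvar))
prior = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kl_reference = kl_divergence(posterior, prior).sum()

print(torch.allclose(kl_closed_form, kl_reference, rtol=1e-4))
```

Note that both loss terms in the loop are summed over the whole batch, so the printed value scales with the batch size.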

6.4 Visualizing the Results

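We generate new images with the trained VAE by sampling latent vectors from a standard normal distribution and passing them through the Decoder, then visualize the results.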

with torch.no_grad():
    z = torch.randn(64, 128).to(device)
    generated_images = vae.decode(z).cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    plt.imshow(generated_images[i].permute(1, 2, 0).numpy())
    plt.axis('off')
plt.show()

7. Conclusion

In this article, we explored how to generate face images with GAN and VAE using PyTorch. GAN learns to generate increasingly realistic images through competition between the Generator and Discriminator, while VAE learns a distribution over the latent space and generates new images by sampling from it. Both techniques play a significant role in image generation and reach their results in different ways.

8. Additional References