Deep Learning with GANs using PyTorch, WGAN-GP

Generative Adversarial Networks (GANs) are a powerful class of generative models proposed by Ian Goodfellow in 2014. A GAN consists of two neural networks, the Generator and the Discriminator, which learn by competing with each other. The Generator tries to create new data that resembles real data, while the Discriminator attempts to distinguish whether a given sample is real or generated. As training proceeds, both networks improve, and the Generator ultimately becomes capable of producing highly realistic data.

This article explains Wasserstein GAN with Gradient Penalty (WGAN-GP), a variant of GAN, and demonstrates how to implement WGAN-GP using PyTorch. WGAN-GP is based on the Wasserstein distance and adds a Gradient Penalty to the Discriminator to enhance training stability.

1. Basic Structure of GAN

The basic structure of GAN is as follows.

  • Generator: Receives random noise as input and generates fake data.
  • Discriminator: Receives both real data and fake data produced by the Generator as input and judges whether each sample is real or fake.

The learning process of GAN consists of the following two steps.

  1. The Generator generates fake data from random noise.
  2. The Discriminator distinguishes between real data and the generated data.

This process is performed repeatedly, leading to improvements in both networks. However, traditional GANs often face training instability and mode collapse issues, prompting research into various approaches for more stable training.

2. Introduction to WGAN-GP

WGAN addresses the inherent problems of GAN by introducing the Wasserstein distance, which gives a clearer, smoother measure of the difference between two distributions and therefore makes training easier. The key idea of WGAN is to replace the Discriminator with a “critic”. The critic assigns a scalar score to each sample, and the gap between its average scores on real and generated data approximates the Wasserstein distance; the networks are updated with this Wasserstein loss instead of the binary cross-entropy loss used in a standard GAN.
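
For reference, the objectives minimized by the code in Section 3.3 can be written as follows, where x denotes real data, G(z) generated data, and D the critic (the gradient penalty described in the next paragraph is added to the critic loss):

Critic loss:      L_critic = E[D(G(z))] - E[D(x)]
Generator loss:   L_generator = -E[D(G(z))]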

The original WGAN enforced the Lipschitz condition on the critic by clipping its weights; WGAN-GP replaces this with a Gradient Penalty (GP), which further improves training stability. The Gradient Penalty is defined as follows:

GP = λ * E[(||∇x̂D(x̂)||₂ - 1)²]

Here, λ is the penalty coefficient (10 in the code below), x̂ is a point sampled on the straight line between a real and a generated sample, and D(x̂) is the critic’s output at that point. The penalty pushes the norm of the critic’s gradient toward 1, which is exactly what the 1-Lipschitz condition requires. This is what allows WGAN-GP to overcome the instability of standard GANs and train more reliably.

3. Implementing WGAN-GP in PyTorch

Now, let’s implement WGAN-GP using PyTorch. The following steps will be followed:

  1. Install necessary libraries and load the dataset
  2. Define the Generator and Critic models
  3. Implement the WGAN-GP training loop
  4. Visualize the results

3.1 Installing Libraries and Loading the Dataset

First, install the necessary libraries and load the MNIST dataset.

!pip install torch torchvision matplotlib  # the '!' prefix assumes a Jupyter/Colab notebook; from a shell, run pip install directly
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

3.2 Defining the Generator and Critic Models

Define the Generator and Critic models. The Generator takes a random noise vector as input and transforms it into an image, while the Critic assigns a scalar score indicating how real the input image looks. Note that, unlike a standard GAN Discriminator, the Critic has no Sigmoid at the end, because it outputs an unbounded score rather than a probability.

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 28 * 28),
            nn.Tanh()
        )
    
    def forward(self, z):
        return self.model(z).reshape(-1, 1, 28, 28)

class Critic(nn.Module):
    def __init__(self):
        super(Critic, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1)
        )

    def forward(self, x):
        return self.model(x.view(-1, 28 * 28))
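
Before wiring up the training loop, a quick shape check helps confirm the two models fit together. This is only a sanity-check sketch; the batch size of 4 is arbitrary, and the latent dimension of 100 matches the first Linear layer of the Generator above.

g = Generator()
c = Critic()
z = torch.randn(4, 100)   # a small batch of latent vectors
fake = g(z)               # expected shape: (4, 1, 28, 28)
score = c(fake)           # expected shape: (4, 1)
print(fake.shape, score.shape)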

3.3 Implementing the WGAN-GP Training Loop

Now, let’s implement the training loop of WGAN-GP. In each iteration, the Critic is updated several times (critic_iterations) before the Generator is updated once, and the Gradient Penalty is added to the Critic’s loss.

def compute_gradient_penalty(critic, real_samples, fake_samples):
    # Random interpolation coefficients, created on the same device as the data
    alpha = torch.rand(real_samples.size(0), 1, 1, 1, device=real_samples.device).expand_as(real_samples)
    # Points sampled on the straight lines between real and fake samples
    interpolated_samples = alpha * real_samples + (1 - alpha) * fake_samples
    interpolated_samples.requires_grad_(True)

    d_interpolated = critic(interpolated_samples)

    # Gradient of the critic's output with respect to the interpolated inputs
    gradients = torch.autograd.grad(outputs=d_interpolated, inputs=interpolated_samples,
                                    grad_outputs=torch.ones_like(d_interpolated),
                                    create_graph=True, retain_graph=True)[0]

    # Flatten per sample, take the L2 norm, and penalize its deviation from 1
    gradients = gradients.reshape(gradients.size(0), -1)
    gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean()
    return gradient_penalty

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

generator = Generator().to(device)
critic = Critic().to(device)

learning_rate = 0.00005
num_epochs = 100
critic_iterations = 5
lambda_gp = 10

# WGAN-GP does not use a classification loss such as BCE or MSE; the raw critic scores are used directly.
# Adam with beta1 = 0 and beta2 = 0.9 follows the setting recommended in the WGAN-GP paper.
optimizer_generator = optim.Adam(generator.parameters(), lr=learning_rate, betas=(0.0, 0.9))
optimizer_critic = optim.Adam(critic.parameters(), lr=learning_rate, betas=(0.0, 0.9))

# Normalize to [-1, 1] to match the Tanh output of the Generator
Real_data = dsets.MNIST(root='./data', train=True,
                        transform=transforms.Compose([transforms.ToTensor(),
                                                      transforms.Normalize((0.5,), (0.5,))]),
                        download=True)

data_loader = torch.utils.data.DataLoader(Real_data, batch_size=64, shuffle=True)

for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(data_loader):
        real_images = real_images.to(device)

        for _ in range(critic_iterations):
            optimizer_critic.zero_grad()

            # Generate fake images
            z = torch.randn(real_images.size(0), 100).to(device)
            fake_images = generator(z)

            # Get critic scores
            real_validity = critic(real_images)
            fake_validity = critic(fake_images)
            gradient_penalty = compute_gradient_penalty(critic, real_images.data, fake_images.data)

            # Compute loss
            critic_loss = -torch.mean(real_validity) + torch.mean(fake_validity) + lambda_gp * gradient_penalty
            critic_loss.backward()
            optimizer_critic.step()

        # Update generator
        optimizer_generator.zero_grad()
        
        # Get generator score
        fake_images = generator(z)
        validity = critic(fake_images)
        generator_loss = -torch.mean(validity)
        generator_loss.backward()
        optimizer_generator.step()

    if epoch % 10 == 0:
        print(f"Epoch: {epoch}/{num_epochs}, Critic Loss: {critic_loss.item():.4f}, Generator Loss: {generator_loss.item():.4f}")

3.4 Visualizing Results

Finally, let’s visualize the generated images. This is a good way to verify how well the Generator has learned during the training process.

def show_generated_images(generator, num_images=25):
    z = torch.randn(num_images, 100).to(device)
    generated_images = generator(z).cpu().detach().numpy()

    plt.figure(figsize=(5, 5))
    for i in range(num_images):
        plt.subplot(5, 5, i + 1)
        plt.imshow(generated_images[i][0], cmap='gray')
        plt.axis('off')
    plt.show()

show_generated_images(generator)

4. Conclusion

This article discussed WGAN-GP, a variant of GAN, and demonstrated how to implement it using PyTorch. WGAN-GP offers the advantage of more stable training by leveraging the Wasserstein distance and Gradient Penalty. These GAN-based models can be applied in various fields, including image generation, image translation, and style transfer.

As deep learning continues to advance, GANs and their variants are receiving ongoing attention, and future developments are highly anticipated. I encourage you to also take on various projects using GANs and WGAN-GP!

Deep Learning with PyTorch for GANs, Generating Face Images using VAE

Recently, in the field of artificial intelligence, Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE) have established themselves as key technologies that significantly enhance the efficiency and quality of image generation. In this article, we will take a detailed look at the basic concepts of GAN and VAE, along with the process of generating face images using PyTorch.

1. Overview of GAN (Generative Adversarial Networks)

Generative Adversarial Networks (GAN) have a structure where two neural networks, a Generator and a Discriminator, compete and learn from each other. The Generator tries to create images that are similar to real ones, while the Discriminator tries to determine whether the generated images are real or fake. This process helps the Generator learn to create increasingly realistic images by deceiving the Discriminator.

1.1 How GAN Works

GAN consists of two networks as follows:

  • Generator: Takes random noise as input and generates images similar to real ones.
  • Discriminator: Classifies whether the input image is real or fake.

As training progresses, the Generator gradually produces higher-quality images, while the Discriminator becomes better at telling real from fake. This process takes the form of a zero-sum game through which both networks improve together.
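
Formally, this game can be written as the minimax objective below; the binary cross-entropy losses used in the code later in this article are its practical implementation:

min_G max_D  E_x[log D(x)] + E_z[log(1 - D(G(z)))]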

2. Overview of VAE (Variational Autoencoder)

Variational Autoencoders (VAE) are models that learn the latent space of images or data in order to generate new data. A VAE maps input data into a lower-dimensional latent space with an encoder, samples from that latent space, and reconstructs the images with a decoder. VAE is a probabilistic model: it learns the distribution of the input data and generates new samples based on it.

2.1 Structure of VAE

VAE consists of three main components:

  • Encoder: Transforms the input data into latent variables.
  • Sampling: Draws a latent vector from the distribution defined by the encoder’s mean and variance (via the reparameterization trick, sketched after this list).
  • Decoder: Generates new images using the sampled latent variables.
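
Sampling directly from a random distribution would block gradients from reaching the encoder, so in practice the sampling step uses the reparameterization trick. Below is a minimal, self-contained sketch; the names mu and logvar mirror the ones used in the VAE code later in this article.

import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I); the randomness lives in eps,
    # so the result stays differentiable with respect to mu and logvar
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + eps * std

# Example: a batch of 4 latent vectors of dimension 128
z = reparameterize(torch.zeros(4, 128), torch.zeros(4, 128))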

3. Project Goals and Dataset

The goal of this project is to generate face images similar to real ones using GAN and VAE. For this purpose, we will use the CelebA dataset. The CelebA dataset contains various face images and is suitable for measuring the performance of GAN and VAE.

4. Environment Setup

To carry out this project, Python and the PyTorch framework are required. Below is a list of necessary packages:

pip install torch torchvision matplotlib

5. Implementing GAN with PyTorch

First, we will implement the GAN model. The structure of GAN consists of the following steps:

  • Loading the dataset
  • Defining the Generator and Discriminator
  • Setting up the training loop
  • Visualizing the results

5.1 Loading the Dataset

First, we will prepare the CelebA dataset. The code below assumes it has already been downloaded and extracted to path_to_celeba.

import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),  # scale to [-1, 1] to match the Generator's Tanh output
])

dataset = ImageFolder(root='path_to_celeba', transform=transform)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

5.2 Defining the Generator and Discriminator

We define the Generator and Discriminator of GAN.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, z):
        z = self.model(z)
        return z.view(-1, 3, 64, 64)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(3 * 64 * 64, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, img):
        img_flat = img.view(img.size(0), -1)
        return self.model(img_flat)

5.3 Setting up the Training Loop

Now we implement the training process for GAN.

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)
discriminator = Discriminator().to(device)

criterion = nn.BCELoss()
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 50
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(dataloader):
        imgs = imgs.to(device)
        batch_size = imgs.size(0)

        # Setting labels
        real_labels = torch.ones(batch_size, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1).to(device)

        # Training the Discriminator
        d_optimizer.zero_grad()
        outputs = discriminator(imgs)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        z = torch.randn(batch_size, 100).to(device)
        fake_imgs = generator(z)
        outputs = discriminator(fake_imgs.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()
        
        d_loss = d_loss_real + d_loss_fake
        d_optimizer.step()

        # Training the Generator
        g_optimizer.zero_grad()
        outputs = discriminator(fake_imgs)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        g_optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss.item()}, g_loss: {g_loss.item()}')

5.4 Visualizing the Results

We visualize the images generated by the trained Generator.

import matplotlib.pyplot as plt

z = torch.randn(64, 100).to(device)
fake_images = generator(z).detach().cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    plt.imshow(fake_images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)
    plt.axis('off')
plt.show()

6. Implementing VAE with PyTorch

Now let’s implement VAE. Unlike GAN, VAE takes a probabilistic approach: an encoder maps images into a latent distribution, and a decoder reconstructs images from samples of that distribution. The implementation steps of VAE are as follows:

  • Preparing the dataset
  • Defining the Encoder and Decoder
  • Setting up the training loop
  • Visualizing the results

6.1 Preparing the Dataset

The dataset is loaded in almost the same way as for the GAN. The only difference is that the [-1, 1] normalization is dropped, because the VAE below ends in a Sigmoid and is trained with a binary cross-entropy reconstruction loss, which expects pixel values in [0, 1], as sketched below.
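
A minimal sketch of that loader (the same path_to_celeba placeholder and imports as in Section 5.1 are assumed):

transform_vae = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),  # pixel values stay in [0, 1], matching the Sigmoid output and BCE loss
])

dataset_vae = ImageFolder(root='path_to_celeba', transform=transform_vae)
dataloader = DataLoader(dataset_vae, batch_size=64, shuffle=True)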

6.2 Defining the Encoder and Decoder

We define the Encoder and Decoder of VAE.

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, 128)
        self.fc_logvar = nn.Linear(64 * 8 * 8, 128)
        self.fc_decode = nn.Linear(128, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.encoder(x)
        h = h.view(h.size(0), -1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        z = self.fc_decode(z).view(-1, 64, 8, 8)
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

6.3 Setting up the Training Loop

We implement the training process for VAE. VAE is trained with two losses: the difference between the original image and the reconstructed image (reconstruction loss) and the divergence between the latent distribution produced by the encoder and a standard normal distribution (Kullback-Leibler divergence loss).
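
For an encoder that outputs a mean μ and log-variance logσ² per latent dimension, the KL term has the closed form used in the code below, and the total loss is simply the sum of the two terms:

KL = -0.5 * Σ(1 + logσ² - μ² - σ²)
Loss = ReconstructionLoss + KL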

vae = VAE().to(device)
optimizer = optim.Adam(vae.parameters(), lr=0.0002)

num_epochs = 50
for epoch in range(num_epochs):
    for imgs, _ in dataloader:
        imgs = imgs.to(device)

        optimizer.zero_grad()
        reconstructed, mu, logvar = vae(imgs)

        re_loss = nn.functional.binary_cross_entropy(reconstructed.view(-1, 3 * 64 * 64), imgs.view(-1, 3 * 64 * 64), reduction='sum')
        kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = re_loss + kl_loss

        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')

6.4 Visualizing the Results

Finally, we sample new images from the trained VAE and visualize them.

with torch.no_grad():
    z = torch.randn(64, 128).to(device)
    generated_images = vae.decode(z).cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    plt.imshow(generated_images[i].permute(1, 2, 0).numpy())
    plt.axis('off')
plt.show()

7. Conclusion

In this article, we explored how to generate face images using GAN and VAE leveraging PyTorch. While GAN learns to generate increasingly realistic images through competition between the Generator and Discriminator, VAE learns the distribution of the latent space to generate new images. Both technologies play a significant role in the field of image generation and can produce remarkable results in different ways.

PyTorch-based GAN Deep Learning, VAE Training

1. Introduction

In recent years, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have emerged as revolutionary technologies in the field of artificial intelligence, particularly for data generation and manipulation. The two models generate data in different ways: a GAN consists of two competing neural networks, while a VAE compresses data into a latent space and regenerates it using a probabilistic model.

2. Concept and Structure of GAN

GAN is a model proposed by Ian Goodfellow in 2014, consisting of a Generator and a Discriminator. The generator takes random noise as input to generate data, while the discriminator determines whether the input data is real or fake. These two networks compete against each other during training, with the generator progressively creating more realistic data.

2.1 How GAN Works

The training process of GAN is as follows:

  1. Training the Generator: The generator receives a random noise vector as input and generates fake images. The generated images are passed as input to the discriminator.
  2. Training the Discriminator: The discriminator receives real and fake images and outputs probabilities for each. The goal of the discriminator is to correctly identify fake images.
  3. Loss Function Calculation: The loss functions for both the generator and discriminator are computed. The objective of the generator is to fool the discriminator, while the discriminator’s goal is to correctly identify fake images.
  4. Network Updates: The weights of the networks are updated based on the loss.
  5. Repetition: The above process is repeated, improving the performance of each network.

3. Concept and Structure of VAE

Variational Autoencoder (VAE) is a variant of autoencoders that provides the ability to generate new data by modeling the distribution of the data. VAEs consist of an Encoder and a Decoder and learn the latent space of the data.

3.1 How VAE Works

The training process of VAE is as follows:

  1. Input Data Encoding: The encoder maps the input data to the latent space, generating mean and variance.
  2. Sampling: Sampling from the latent space using the mean and variance.
  3. Decoding: Inputting the sampled latent vector into the decoder to generate data similar to the original data.
  4. Loss Function Calculation: VAE minimizes a loss function that includes reconstruction loss and Kullback-Leibler (KL) divergence.
  5. Network Updates: Weights are updated based on the loss.
  6. Repetition: The above process is repeated to enhance the quality of the model.

4. Differences Between GAN and VAE

While both GAN and VAE are models for generating data, there are several key differences in their approaches:

  • Model Structure: GAN has a competitive structure formed by a generator and a discriminator, while VAE consists of an encoder-decoder structure.
  • Loss Function: GAN learns through the adversarial relationship between two networks, while VAE learns through reconstruction and KL divergence.
  • Data Generation Method: GAN excels at generating realistic images, whereas VAE emphasizes diversity and continuity.

5. Implementing GAN Using PyTorch

Now let’s implement GAN using PyTorch. We will look at an example of generating handwritten digit images from the MNIST dataset.

5.1 Library Installation

pip install torch torchvision matplotlib

5.2 Loading the Dataset

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)

5.3 Defining the GAN Model

class Generator(torch.nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = torch.nn.Sequential(
            torch.nn.Linear(100, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 512),
            torch.nn.ReLU(),
            torch.nn.Linear(512, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, 784),
            torch.nn.Tanh()
        )

    def forward(self, x):
        return self.model(x).view(-1, 1, 28, 28)

class Discriminator(torch.nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(784, 512),
            torch.nn.LeakyReLU(0.2),
            torch.nn.Linear(512, 256),
            torch.nn.LeakyReLU(0.2),
            torch.nn.Linear(256, 1),
            torch.nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

5.4 Training the Model

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)
discriminator = Discriminator().to(device)

criterion = torch.nn.BCELoss()
optimizer_G = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 200
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(train_loader):
        images = images.to(device)
        batch_size = images.size(0)

        # Generate real and fake labels
        real_labels = torch.ones(batch_size, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1).to(device)

        # Training the Discriminator
        optimizer_D.zero_grad()
        outputs = discriminator(images)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        noise = torch.randn(batch_size, 100).to(device)
        fake_images = generator(noise)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()
        optimizer_D.step()

        # Training the Generator
        optimizer_G.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_G.step()

    print(f'Epoch [{epoch}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item():.4f}, g_loss: {g_loss.item():.4f}')

6. Implementing VAE Using PyTorch

Now let’s implement VAE. Again, we will look at an example of generating handwritten digit images using the MNIST dataset.

6.1 Defining the VAE Model

class VAE(torch.nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(784, 400),
            torch.nn.ReLU()
        )

        self.fc_mu = torch.nn.Linear(400, 20)
        self.fc_var = torch.nn.Linear(400, 20)

        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(20, 400),
            torch.nn.ReLU(),
            torch.nn.Linear(400, 784),
            torch.nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x)
        return self.fc_mu(h), self.fc_var(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        recon_x = self.decode(z)
        return recon_x, mu, logvar

6.2 VAE Loss Function

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: binary cross-entropy between reconstructed and original pixels
    BCE = torch.nn.functional.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    # KL divergence between N(mu, sigma^2) and the standard normal prior
    KLD = 0.5 * torch.sum(torch.exp(logvar) + mu.pow(2) - 1 - logvar)
    return BCE + KLD

6.3 Training the VAE Model

vae = VAE().to(device)
optimizer_VAE = torch.optim.Adam(vae.parameters(), lr=1e-3)

num_epochs = 100
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(train_loader):
        # Undo the [-1, 1] normalization so the BCE reconstruction targets lie in [0, 1]
        images = images.to(device) * 0.5 + 0.5

        optimizer_VAE.zero_grad()
        recon_images, mu, logvar = vae(images)
        loss = vae_loss(recon_images, images, mu, logvar)
        loss.backward()
        optimizer_VAE.step()

    print(f'Epoch [{epoch}/{num_epochs}], Loss: {loss.item():.4f}')
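
After training, new digits can be generated by sampling random latent vectors and passing them through the decoder. The snippet below is a minimal sketch; it assumes matplotlib is installed, and the latent dimension of 20 matches fc_mu defined above.

import matplotlib.pyplot as plt

with torch.no_grad():
    z = torch.randn(16, 20).to(device)                  # 16 random latent vectors
    samples = vae.decode(z).view(-1, 1, 28, 28).cpu()

plt.figure(figsize=(4, 4))
for i in range(16):
    plt.subplot(4, 4, i + 1)
    plt.imshow(samples[i][0].numpy(), cmap='gray')
    plt.axis('off')
plt.show()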

7. Conclusion

In this article, we explored the concepts and structures of GAN and VAE, and implemented these models using PyTorch. While GAN is powerful in generating realistic images through the competitive structure of a generator and discriminator, VAE demonstrates excellent performance in modeling and generating data through the latent space. Understanding and leveraging the characteristics of these two models can help solve various data generation problems.

Creating GAN Deep Learning and VAE Using PyTorch

1. Introduction

The advancement of artificial intelligence has increased the importance of generative models, which learn to produce new data resembling their training data. Among them, GAN (Generative Adversarial Networks) and VAE (Variational Autoencoder), two structurally different approaches, are the most widely used. This article explains in detail how to implement GAN and VAE using PyTorch.

2. GAN (Generative Adversarial Networks)

GAN is a model proposed by Ian Goodfellow in 2014, where two neural networks (the generator and the discriminator) compete against each other during training. The generator creates fake data while the discriminator is responsible for distinguishing between real and fake data.

2.1 Structure of GAN

GAN consists of the following structure:

  • Generator: Takes random noise as input and generates high-quality fake data that resembles real data.
  • Discriminator: Reviews the input data to determine whether it is real or fake.

2.2 GAN Training Process

The GAN training process includes the following steps.

  1. The generator generates random noise to create fake data.
  2. The discriminator receives the generated fake data and real data, outputting probabilities for each class.
  3. The generator tries to minimize the loss to make the discriminator judge the fake data as real.
  4. The discriminator minimizes its loss to output a high probability for real data and a low probability for fake data.

2.3 GAN Implementation Code

Below is Python code implementing a simple GAN:


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define Generator class
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)

# Define Discriminator class
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.model(img)

# Data loading
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])
mnist = datasets.MNIST('data', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)

# Initialize models
generator = Generator()
discriminator = Discriminator()

# Set loss function and optimization algorithm
criterion = nn.BCELoss()
optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# GAN training
num_epochs = 100
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(dataloader):
        # Set labels for real and fake data
        real_labels = torch.ones(imgs.size(0), 1)
        fake_labels = torch.zeros(imgs.size(0), 1)

        # Train discriminator
        optimizer_D.zero_grad()
        outputs = discriminator(imgs.view(imgs.size(0), -1))
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        z = torch.randn(imgs.size(0), 100)
        fake_imgs = generator(z)
        outputs = discriminator(fake_imgs.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()
        optimizer_D.step()

        # Train generator
        optimizer_G.zero_grad()
        outputs = discriminator(fake_imgs)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_G.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item()}, g_loss: {g_loss.item()}')
        

3. VAE (Variational Autoencoder)

VAE is a model proposed by D. P. Kingma and M. Welling in 2013, which generates data in a probabilistic manner. VAE is composed of an encoder and a decoder, where the encoder compresses the data into a latent space, and the decoder reconstructs the data from this latent space.

3.1 Structure of VAE

The main components of VAE are as follows:

  • Encoder: Transforms input data into a latent vector, which is learned to follow a normal distribution.
  • Decoder: Takes the latent vector as input and generates output similar to the original data.

3.2 VAE Training Process

The training process for VAE is as follows.

  1. Pass the data through the encoder to obtain the mean and variance.
  2. Use the reparameterization trick to sample.
  3. Pass the sampled latent vector through the decoder to reconstruct the data.
  4. Calculate the loss between the reconstructed data and the original data.

3.3 VAE Implementation Code

Below is Python code implementing a simple VAE:


class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 400),
            nn.ReLU()
        )
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)
        self.decoder = nn.Sequential(
            nn.Linear(20, 400),
            nn.ReLU(),
            nn.Linear(400, 784),
            nn.Sigmoid()
        )

    def reparametrize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x):
        h1 = self.encoder(x.view(-1, 784))
        mu = self.fc_mu(h1)
        logvar = self.fc_logvar(h1)
        z = self.reparametrize(mu, logvar)
        return self.decoder(z), mu, logvar

# VAE training
vae = VAE()
optimizer = optim.Adam(vae.parameters(), lr=0.001)
criterion = nn.BCELoss(reduction='sum')

num_epochs = 10
for epoch in range(num_epochs):
    for imgs, _ in dataloader:
        # Undo the [-1, 1] normalization so the BCE reconstruction targets lie in [0, 1]
        imgs = imgs * 0.5 + 0.5
        optimizer.zero_grad()
        recon_batch, mu, logvar = vae(imgs)
        recon_loss = criterion(recon_batch, imgs.view(-1, 784))
        # Kullback-Leibler divergence
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = recon_loss + kld
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')
        

4. Conclusion

GAN and VAE each have unique advantages and can be used in various generative tasks. This article has explained how to implement GAN and VAE using PyTorch, providing an opportunity to understand the principles behind each model and implement them in code. Generative models like GAN and VAE are utilized in numerous fields, such as image generation, style transfer, and data augmentation. These models have the potential to advance further and play a significant role in the field of artificial intelligence.

Training Data Collection for GAN Deep Learning and RNN using PyTorch

The advancement of artificial intelligence and machine learning has brought innovation to all areas of our lives. Among them, GAN (Generative Adversarial Networks) and RNN (Recurrent Neural Networks) are gaining attention as very powerful deep learning techniques.
In this article, we will implement a GAN model using PyTorch and discuss how to collect training data for RNN in detail.

1. What is GAN?

GAN is a learning method in which two neural networks (Generator and Discriminator) compete with each other.
The Generator generates data similar to reality, and the Discriminator determines whether this data is real or fake.
GAN is used in various fields such as image generation, video creation, and music generation.

2. Structure of GAN

GAN consists of two parts:

  • Generator: Generates new data based on a given random vector.
  • Discriminator: Distinguishes between real data and fake data generated by the Generator.

The two networks compete to improve each other’s performance, and through this process, they generate higher quality data.

3. Learning Process of GAN

The learning process of GAN generally includes the following steps:

  • (1) Generate random noise and input it into the Generator.
  • (2) The Generator generates fake data.
  • (3) The Discriminator receives real and fake data and outputs predictions for each.
  • (4) The weights of the Generator and the Discriminator are updated based on their respective losses.
  • (5) Repeat this process until training is complete.

4. PyTorch Implementation of GAN

Environment Setup

First, you need to install the PyTorch library. Run the command below to install it.

pip install torch torchvision

GAN Code Example Using PyTorch

Below is a simple implementation example of GAN. We will create a model that generates handwritten digits using the MNIST dataset.


import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader

# Hyperparameters
latent_size = 64
batch_size = 100
learning_rate = 0.0002
num_epochs = 200

# Load dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

mnist = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
data_loader = DataLoader(dataset=mnist, batch_size=batch_size, shuffle=True)

# Define Generator class
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(latent_size, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, x):
        return self.main(x)

# Define Discriminator class
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(784, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.main(x)

# A CUDA-capable GPU is assumed throughout this example; remove the .cuda() calls to run on CPU
generator = Generator().cuda()
discriminator = Discriminator().cuda()

criterion = nn.BCELoss()
optimizer_G = optim.Adam(generator.parameters(), lr=learning_rate)
optimizer_D = optim.Adam(discriminator.parameters(), lr=learning_rate)

# Start training
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(data_loader):
        # Real data labels
        real_images = images.view(-1, 28*28).cuda()
        real_labels = torch.ones(batch_size, 1).cuda()
        # Fake data labels
        noise = torch.randn(batch_size, latent_size).cuda()
        fake_images = generator(noise)
        fake_labels = torch.zeros(batch_size, 1).cuda()

        # Discriminator training
        optimizer_D.zero_grad()
        outputs_real = discriminator(real_images)
        outputs_fake = discriminator(fake_images.detach())
        loss_D_real = criterion(outputs_real, real_labels)
        loss_D_fake = criterion(outputs_fake, fake_labels)
        loss_D = loss_D_real + loss_D_fake
        loss_D.backward()
        optimizer_D.step()

        # Generator training
        optimizer_G.zero_grad()
        outputs = discriminator(fake_images)
        loss_G = criterion(outputs, real_labels)
        loss_G.backward()
        optimizer_G.step()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss D: {loss_D.item()}, Loss G: {loss_G.item()}")
    if (epoch+1) % 10 == 0:
        # Code to save results can be added here
        pass
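        # Illustrative addition (not part of the original article): save the generator weights
        # every 10 epochs so samples can be regenerated later; the filename pattern is an assumption.
        torch.save(generator.state_dict(), f'generator_epoch_{epoch+1}.pth')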
    

5. Introduction to RNN (Recurrent Neural Network)

RNN is a neural network structure suitable for processing ordered data, or sequence data. For example, data such as text, music, and time-series data fall into this category.
RNN works by remembering previous states and updating the current state based on these memories.
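
Concretely, the vanilla RNN cell used by PyTorch's nn.RNN (and in the example later in this article) updates its hidden state at every time step t as:

h_t = tanh(W_ih * x_t + b_ih + W_hh * h_(t-1) + b_hh)

where x_t is the input at step t and h_(t-1) is the previous hidden state.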

Structure of RNN

RNN consists of the following components:

  • Input Layer: The first layer of the model that receives sequence data.
  • Hidden Layer: Remembers previous states and combines them with the current input to produce outputs.
  • Output Layer: The layer that generates the final output.

6. Collecting Training Data for RNN

To train an RNN, appropriate training data is required. Here, we will explain the process of collecting and preprocessing text data.

6.1 Data Collection

A wide variety of data can be used to train an RNN, for example text in many forms such as movie reviews, novels, and news articles.
Data can be collected using web scraping tools (e.g., BeautifulSoup).


import requests
from bs4 import BeautifulSoup

url = 'https://example.com/articles'  # Change to the desired URL
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

articles = []
for item in soup.find_all('article'):
    title = item.find('h2').text
    content = item.find('p').text
    articles.append(f"{title}\n{content}")

with open('data.txt', 'w', encoding='utf-8') as f:
    for article in articles:
        f.write(article + "\n\n")
    

6.2 Data Preprocessing

The collected data needs to undergo a preprocessing procedure before being used as input to the RNN model. A typical preprocessing process includes:

  • Lowercasing
  • Removing special characters and numbers
  • Removing stop words

import re
import nltk
from nltk.corpus import stopwords

# Downloading NLTK's list of stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Lowercasing
    text = text.lower()
    # Remove special characters and numbers
    text = re.sub(r'[^a-z\s]', '', text)
    # Remove stop words
    text = ' '.join([word for word in text.split() if word not in stop_words])
    return text

# Apply preprocessing
preprocessed_articles = [preprocess_text(article) for article in articles]
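
The RNN model in the next section expects fixed-length sequences of integer word indices rather than raw strings. The snippet below is one possible way to build that input; the helper names build_vocab and encode, the vocabulary size of 1000 (chosen to match input_size later), and the sequence length of 50 are assumptions for illustration, not part of the original article.

from collections import Counter

def build_vocab(texts, max_words=1000):
    # Count word frequencies over all preprocessed articles and keep the most frequent ones
    counter = Counter(word for text in texts for word in text.split())
    # Index 0 is reserved for padding and out-of-vocabulary words
    return {word: i + 1 for i, (word, _) in enumerate(counter.most_common(max_words - 1))}

def encode(text, vocab, max_len=50):
    # Map each word to its index, then truncate or pad to a fixed length so sequences can be batched
    ids = [vocab.get(word, 0) for word in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

vocab = build_vocab(preprocessed_articles)
encoded_articles = [encode(text, vocab) for text in preprocessed_articles]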
    

7. RNN Model Implementation Example

Environment Setup

pip install torch torchvision nltk

RNN Code Example Using PyTorch

Below is a simple RNN model implementation example. It processes text data using word embedding.


import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Define RNN model
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.embedding = nn.Embedding(input_size, hidden_size)
        # batch_first=True so inputs/outputs have shape (batch, seq_len, hidden)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.embedding(x)               # (batch, seq_len) -> (batch, seq_len, hidden)
        output, hidden = self.rnn(x)
        output = self.fc(output[:, -1, :])  # classify from the last time step
        return output

# Create training dataset
class TextDataset(Dataset):
    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.tensor(self.texts[idx]), torch.tensor(self.labels[idx])

# Set hyperparameters
input_size = 1000  # Number of words
hidden_size = 128
output_size = 2  # Number of classes to classify (e.g., positive/negative)
num_epochs = 20
learning_rate = 0.001

# Load and preprocess data
# Here replaced with dummy data.
texts = [...]  # Preprocessed text data
labels = [...]  # Corresponding class labels

dataset = TextDataset(texts, labels)
data_loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Initialize model
model = RNN(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Start training
for epoch in range(num_epochs):
    for texts, labels in data_loader:
        optimizer.zero_grad()
        outputs = model(texts)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}")
    

8. Conclusion

In this article, we learned the basic principles and implementation examples of GAN and RNN using PyTorch.
We examined the process of generating image data using GAN and processing text data in the case of RNN.
These technologies will continue to evolve and be used in more fields.
I encourage you to start new projects using these technologies.