Deep Learning with PyTorch: Generating Face Images using GAN and VAE

Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE) are two of the most widely used families of deep generative models for image generation. In this article, we look at the basic concepts behind GAN and VAE and then walk through generating face images with each of them in PyTorch.

1. Overview of GAN (Generative Adversarial Networks)

Generative Adversarial Networks (GAN) consist of two neural networks, a Generator and a Discriminator, that are trained against each other. The Generator tries to create images that look like real ones, while the Discriminator tries to tell whether each image it receives is real (from the dataset) or fake (from the Generator). As training proceeds, the Generator learns to create increasingly realistic images because it is rewarded for deceiving the Discriminator.

1.1 How GAN Works

GAN consists of two networks as follows:

Generator: Takes random noise as input and generates images similar to real ones.
Discriminator: Classifies whether the input image is real or fake.

As training progresses, the Generator gradually produces higher-quality images, while the Discriminator becomes better at telling real images from generated ones. Training takes the form of a zero-sum (minimax) game: the Discriminator tries to minimize its classification error while the Generator tries to maximize it, so each network improves in response to the other.
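To make the zero-sum game concrete, the sketch below computes both players' binary cross-entropy losses for a few hand-picked Discriminator outputs. The probabilities are made up for illustration and do not come from a trained model.

```python
import torch
import torch.nn.functional as F

# Hypothetical Discriminator outputs (probability that an image is real)
d_on_real = torch.tensor([0.9, 0.8])  # real images, mostly recognized as real
d_on_fake = torch.tensor([0.2, 0.1])  # generated images, mostly recognized as fake

# The Discriminator wants real -> 1 and fake -> 0
d_loss = (F.binary_cross_entropy(d_on_real, torch.ones(2))
          + F.binary_cross_entropy(d_on_fake, torch.zeros(2)))

# The Generator wants the Discriminator to output 1 on its fakes
g_loss = F.binary_cross_entropy(d_on_fake, torch.ones(2))

print(f"d_loss={d_loss.item():.3f}, g_loss={g_loss.item():.3f}")
```

In this made-up situation the Discriminator is winning: its loss is small while the Generator's is large. If the Generator improves, the values in d_on_fake rise, which lowers g_loss and raises the fake part of d_loss.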

2. Overview of VAE (Variational Autoencoder)

Variational Autoencoders (VAE) are models that learn a latent space of images or other data and use it to generate new data. An encoder maps each input to a distribution over a lower-dimensional latent space, a latent vector is sampled from that distribution, and a decoder reconstructs the image from the sample. VAE is a probabilistic model: it learns an approximation of the input data distribution and generates new samples from it.

2.1 Structure of VAE

VAE consists of three main components:

Encoder: Maps the input data to the parameters (mean and log-variance) of a distribution over latent variables.
Sampling: Draws a latent vector from that distribution.
Decoder: Generates an image from the sampled latent vector.
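The sampling step is usually written so that gradients can still reach the encoder (the reparameterization trick). The following is a minimal sketch with stand-in tensors in place of real encoder outputs; the 4-dimensional latent size is an arbitrary choice for illustration.

```python
import torch

torch.manual_seed(0)

# Stand-ins for the encoder outputs of one input: mean and log-variance
mu = torch.zeros(4, requires_grad=True)
logvar = torch.zeros(4, requires_grad=True)

# Sampling step: z = mu + eps * std, with eps drawn from a standard normal
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
z = mu + eps * std

# Because the randomness lives in eps, gradients reach mu and logvar
z.sum().backward()
print(mu.grad)      # all ones, since dz/dmu = 1
print(logvar.grad)  # equals 0.5 * eps * std
```

Sampling z directly from a normal distribution would give the same values but would cut the gradient path back to mu and logvar, so the encoder could not be trained through the sample.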

3. Project Goals and Dataset

The goal of this project is to generate realistic face images using GAN and VAE. For this purpose, we will use the CelebA dataset, which contains more than 200,000 celebrity face images and is a common benchmark for face generation with GAN and VAE.

4. Environment Setup

To carry out this project, Python and the PyTorch framework are required. The necessary packages can be installed with pip:

pip install torch torchvision matplotlib
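After installation, a quick check like the one below can confirm that the packages are importable. It is only a convenience sketch using the standard library; it does not verify versions or GPU support.

```python
import importlib.util

# Packages this article relies on
required = ["torch", "torchvision", "matplotlib"]

# find_spec returns None when a top-level package cannot be found
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, installed in status.items():
    print(f"{name}: {'ok' if installed else 'missing'}")
```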

5. Implementing GAN with PyTorch

First, we will implement the GAN model. The implementation consists of the following steps:

Loading the dataset
Defining the Generator and Discriminator
Setting up the training loop
Visualizing the results

5.1 Loading the Dataset

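First, we load the CelebA images from a local directory and prepare them for training. The code assumes the dataset has already been downloaded; ImageFolder expects the image files to sit inside at least one subdirectory of the root path.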

import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

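transform = transforms.Compose([
    transforms.Resize((64, 64)),  # resize every image to 64 x 64
    transforms.ToTensor(),        # convert to a float tensor with values in [0, 1]
])

# The class label returned by ImageFolder is ignored during training
dataset = ImageFolder(root='path_to_celeba', transform=transform)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)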
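Before training, it helps to confirm the batch shape and pixel range. The sketch below uses a hypothetical helper, describe_batch, and a random stand-in dataset so that it runs without CelebA on disk; with the real data, the same helper could be called on dataloader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def describe_batch(loader):
    """Return the shape and value range of the first batch of images."""
    images, _ = next(iter(loader))
    return tuple(images.shape), images.min().item(), images.max().item()

# Stand-in for CelebA: 128 random "images" in [0, 1) with dummy labels
fake_data = TensorDataset(torch.rand(128, 3, 64, 64),
                          torch.zeros(128, dtype=torch.long))
fake_loader = DataLoader(fake_data, batch_size=64, shuffle=True)

shape, lo, hi = describe_batch(fake_loader)
print(shape, lo, hi)
```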

5.2 Defining the Generator and Discriminator

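We define the Generator and Discriminator of GAN. Both are simple fully connected networks. The Generator maps a 100-dimensional noise vector to a 3 x 64 x 64 image with values in [-1, 1] (Tanh output), and the Discriminator maps a flattened image to the probability that it is real.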

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),
            nn.Tanh(),
        )

    def forward(self, z):
        z = self.model(z)
        return z.view(-1, 3, 64, 64)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(3 * 64 * 64, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, img):
        img_flat = img.view(img.size(0), -1)
        return self.model(img_flat)
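A quick shape check can catch layer-size mistakes before training starts. The sketch below restates the same layer stacks as plain nn.Sequential modules (the names gen_layers and disc_layers are illustrative) so that it runs on its own.

```python
import torch
import torch.nn as nn

# Same layer stack as the Generator above, restated so the snippet is standalone
gen_layers = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 3 * 64 * 64), nn.Tanh(),
)
# Same layer stack as the Discriminator above
disc_layers = nn.Sequential(
    nn.Linear(3 * 64 * 64, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

with torch.no_grad():
    z = torch.randn(4, 100)                     # batch of 4 noise vectors
    images = gen_layers(z).view(-1, 3, 64, 64)  # reshape as in Generator.forward
    scores = disc_layers(images.view(4, -1))    # flatten as in Discriminator.forward

print(images.shape, scores.shape)
```

The generated images lie in [-1, 1] because of the final Tanh, and each score lies in [0, 1] because of the final Sigmoid.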

5.3 Setting up the Training Loop

Now we implement the training process for GAN.

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)
discriminator = Discriminator().to(device)

criterion = nn.BCELoss()
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 50
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(dataloader):
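        # ToTensor yields pixels in [0, 1]; rescale to [-1, 1] to match the Generator's Tanh output
        imgs = imgs.to(device) * 2 - 1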
        batch_size = imgs.size(0)

        # Target labels: 1 for real images, 0 for generated ones
        real_labels = torch.ones(batch_size, 1, device=device)
        fake_labels = torch.zeros(batch_size, 1, device=device)

        # Training the Discriminator
        d_optimizer.zero_grad()
        outputs = discriminator(imgs)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        z = torch.randn(batch_size, 100).to(device)
        fake_imgs = generator(z)
        outputs = discriminator(fake_imgs.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()
        
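        # d_loss is kept only for logging; the two backward() calls above already accumulated the gradients
        d_loss = d_loss_real + d_loss_fake
        d_optimizer.step()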

        # Training the Generator
        g_optimizer.zero_grad()
        outputs = discriminator(fake_imgs)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        g_optimizer.step()

    # Report the losses of the last batch in the epoch
    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')
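The call to fake_imgs.detach() in the Discriminator step matters: it stops the Discriminator's loss from sending gradients into the Generator. The toy sketch below illustrates this with tiny stand-in networks (G and D here are single linear layers chosen for brevity, not the models defined earlier).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-ins for the Generator and Discriminator
G = nn.Linear(4, 8)
D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
criterion = nn.BCELoss()

z = torch.randn(2, 4)
fake = G(z)

# Discriminator step: detach() cuts the graph, so G receives no gradient
d_loss_fake = criterion(D(fake.detach()), torch.zeros(2, 1))
d_loss_fake.backward()
g_grad_after_d_step = G.weight.grad  # still None

# Generator step: no detach, so the gradient flows through D into G
g_loss = criterion(D(fake), torch.ones(2, 1))
g_loss.backward()
g_grad_after_g_step = G.weight.grad  # now a tensor

print(g_grad_after_d_step is None, g_grad_after_g_step is not None)
```

The Generator step also adds gradients to the Discriminator's parameters, which is why the training loop calls d_optimizer.zero_grad() at the start of every Discriminator update.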

5.4 Visualizing the Results

We visualize the images generated by the trained Generator.

import matplotlib.pyplot as plt

# Sample 64 noise vectors and generate images without tracking gradients
with torch.no_grad():
    z = torch.randn(64, 100, device=device)
    fake_images = generator(z).cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    # Map the Tanh output from [-1, 1] back to [0, 1] for display
    plt.imshow(fake_images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)
    plt.axis('off')
plt.show()
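As an alternative to 64 separate subplots, the images can be tiled into one large image and shown with a single imshow call. The helper below, to_grid, is a hypothetical name for a small sketch that assumes the input values lie in [-1, 1].

```python
import torch

def to_grid(images, nrow):
    """Tile a batch (N, C, H, W) with values in [-1, 1] into one (C, rows*H, nrow*W) image in [0, 1]."""
    n, c, h, w = images.shape
    rows = n // nrow
    images = (images * 0.5 + 0.5).clamp(0, 1)          # rescale for display
    grid = images[: rows * nrow].reshape(rows, nrow, c, h, w)
    # Reorder to (C, rows, H, nrow, W) and merge the row and column axes
    return grid.permute(2, 0, 3, 1, 4).reshape(c, rows * h, nrow * w)

# Random stand-in for a batch of 64 generated images
batch = torch.rand(64, 3, 64, 64) * 2 - 1
grid = to_grid(batch, nrow=8)
print(grid.shape)
```

The result can then be displayed with plt.imshow(grid.permute(1, 2, 0).numpy()).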

6. Implementing VAE with PyTorch

Now let's implement VAE. Unlike GAN, VAE trains a single encoder-decoder network with a probabilistic objective rather than an adversarial one. The implementation steps of VAE are as follows:

Preparing the dataset
Defining the Encoder and Decoder
Setting up the training loop
Visualizing the results

6.1 Preparing the Dataset

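The dataset is loaded the same way as for GAN, and the same dataloader is reused. ToTensor leaves pixel values in [0, 1], which matches the Sigmoid output of the decoder and is required by the binary cross-entropy reconstruction loss.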

6.2 Defining the Encoder and Decoder

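We define the Encoder and Decoder of VAE. Three stride-2 convolutions reduce the 64 x 64 input to a 64-channel 8 x 8 feature map, which two linear layers turn into the mean and log-variance of a 128-dimensional latent vector. The decoder mirrors this with transposed convolutions and ends in a Sigmoid so that outputs lie in [0, 1].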

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, 128)
        self.fc_logvar = nn.Linear(64 * 8 * 8, 128)
        self.fc_decode = nn.Linear(128, 64 * 8 * 8)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def encode(self, x):
        h = self.encoder(x)
        h = h.view(h.size(0), -1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        z = self.fc_decode(z).view(-1, 64, 8, 8)
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
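The 64 * 8 * 8 input size of the linear layers follows from the convolution arithmetic. The sketch below applies the standard output-size formulas for Conv2d and ConvTranspose2d (dilation 1, no output padding) to the kernel size, stride, and padding used above; the helper names are illustrative.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output size of nn.Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Output size of nn.ConvTranspose2d along one dimension (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

size = 64
encoder_sizes = [size]
for _ in range(3):               # three encoder convolutions
    size = conv_out(size)
    encoder_sizes.append(size)

decoder_sizes = [size]
for _ in range(3):               # three decoder transposed convolutions
    size = conv_transpose_out(size)
    decoder_sizes.append(size)

print(encoder_sizes)  # [64, 32, 16, 8]
print(decoder_sizes)  # [8, 16, 32, 64]
```

With a different input resolution, the same helpers give the feature-map size that the linear layers and the view call in decode would need.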

6.3 Setting up the Training Loop

We implement the training process for VAE. VAE is trained with the sum of two losses: the reconstruction loss, which measures the difference between the original image and the reconstructed image, and the Kullback-Leibler (KL) divergence loss, which measures how far the encoder's latent distribution is from a standard normal prior.
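For a diagonal Gaussian, the KL term has the closed form -0.5 * sum(1 + logvar - mu^2 - exp(logvar)). As a sanity check, the sketch below compares that expression with torch.distributions on random stand-in values for mu and logvar.

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)

# Random stand-ins for encoder outputs: batch of 2, latent size 128
mu = torch.randn(2, 128)
logvar = torch.randn(2, 128)

# Closed-form KL divergence, summed over batch and latent dimensions
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# Same quantity computed by torch.distributions
posterior = Normal(mu, torch.exp(0.5 * logvar))
prior = Normal(torch.zeros_like(mu), torch.ones_like(mu))
kl_reference = kl_divergence(posterior, prior).sum()

print(kl_closed.item(), kl_reference.item())
```

The two values agree up to floating-point error.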

vae = VAE().to(device)
optimizer = optim.Adam(vae.parameters(), lr=0.0002)

num_epochs = 50
for epoch in range(num_epochs):
    for imgs, _ in dataloader:
        imgs = imgs.to(device)

        optimizer.zero_grad()
        reconstructed, mu, logvar = vae(imgs)

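        # Reconstruction loss: pixel-wise binary cross-entropy, summed over the batch
        re_loss = nn.functional.binary_cross_entropy(reconstructed.view(-1, 3 * 64 * 64), imgs.view(-1, 3 * 64 * 64), reduction='sum')
        # KL divergence between N(mu, sigma^2) and the standard normal prior, summed over the batch
        kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = re_loss + kl_loss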

        loss.backward()
        optimizer.step()

    # Report the summed loss of the last batch in the epoch
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
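Because the loss is summed over a batch, the printed value depends on the batch size and reflects only the last batch. One option is to track a per-image average over the whole epoch. The AverageMeter class below is a hypothetical helper, and the loss values are made up for illustration.

```python
class AverageMeter:
    """Tracks a running per-image average of a summed loss."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, summed_loss, batch_size):
        # Accumulate the summed loss and the number of images seen
        self.total += float(summed_loss)
        self.count += batch_size

    @property
    def average(self):
        return self.total / max(self.count, 1)

# Made-up summed losses for three batches of sizes 64, 64, and 32
meter = AverageMeter()
for summed_loss, batch_size in [(6400.0, 64), (6080.0, 64), (3200.0, 32)]:
    meter.update(summed_loss, batch_size)

print(meter.average)  # 98.0
```

In the training loop, this would correspond to calling meter.update(loss.item(), imgs.size(0)) after each batch and printing meter.average at the end of the epoch.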

6.4 Visualizing the Results

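We generate new images by sampling latent vectors from the standard normal prior and passing them through the trained decoder, then visualize the results.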

with torch.no_grad():
    z = torch.randn(64, 128).to(device)
    generated_images = vae.decode(z).cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    plt.imshow(generated_images[i].permute(1, 2, 0).numpy())
    plt.axis('off')
plt.show()

7. Conclusion

In this article, we explored how to generate face images with GAN and VAE in PyTorch. GAN learns to generate increasingly realistic images through competition between the Generator and the Discriminator, whereas VAE learns a latent distribution from which new images can be sampled and decoded. Both approaches play a significant role in image generation and reach their results in different ways.