Recently, in the field of artificial intelligence, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have established themselves as key technologies that significantly improve the quality and efficiency of image generation. In this article, we take a detailed look at the basic concepts of GANs and VAEs, then walk through generating face images with PyTorch.
1. Overview of GAN (Generative Adversarial Networks)
Generative Adversarial Networks (GAN) have a structure where two neural networks, a Generator and a Discriminator, compete and learn from each other. The Generator tries to create images that are similar to real ones, while the Discriminator tries to determine whether the generated images are real or fake. This process helps the Generator learn to create increasingly realistic images by deceiving the Discriminator.
1.1 How GAN Works
GAN consists of two networks as follows:
- Generator: Takes random noise as input and generates images similar to real ones.
- Discriminator: Classifies whether the input image is real or fake.
As training progresses, the Generator produces increasingly convincing images, while the Discriminator gets better at telling real images from fake ones. Training takes the form of a zero-sum (minimax) game: at the ideal equilibrium, the Generator's outputs are indistinguishable from real data and the Discriminator can do no better than guessing.
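Formally, this zero-sum game is the minimax objective introduced in the original GAN paper (Goodfellow et al., 2014):

min_G max_D V(D, G) = E_{x ~ p_data}[ log D(x) ] + E_{z ~ p_z}[ log(1 - D(G(z))) ]

The Discriminator D maximizes V while the Generator G minimizes it. In practice (and in the training loop of Section 5.3), the Generator is instead trained to maximize log D(G(z)), the so-called non-saturating loss, which gives stronger gradients early in training.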
2. Overview of VAE (Variational Autoencoder)
Variational Autoencoders (VAEs) are models that learn a latent space of the data in order to generate new samples. The encoder maps each input to a distribution over a lower-dimensional latent space; a latent vector is sampled from that distribution, and the decoder reconstructs the image from it. The VAE is a probabilistic model: it learns the distribution of the input data and generates new samples based on it.
2.1 Structure of VAE
VAE consists of three main components:
- Encoder: Transforms the input data into latent variables.
- Sampling: Draws a latent vector from the distribution produced by the Encoder (via the reparameterization trick shown after this list).
- Decoder: Generates new images using the sampled latent variables.
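Because a raw random draw is not differentiable, the sampling step uses the reparameterization trick. Given the encoder's mean μ and log-variance log σ², the latent vector is computed as

z = μ + σ · ε,  where ε ~ N(0, I) and σ = exp(0.5 · log σ²)

so gradients can flow back through μ and σ to the encoder. This is exactly what the reparameterize method implements in Section 6.2.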
3. Project Goals and Dataset
The goal of this project is to generate realistic face images using a GAN and a VAE. For this purpose, we will use the CelebA dataset, which contains more than 200,000 celebrity face images and is a common benchmark for face-generation models.
4. Environment Setup
To carry out this project, Python and the PyTorch framework are required. Below is a list of necessary packages:
pip install torch torchvision matplotlib
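As a quick optional check that the installation works and whether a GPU will be used for training:

import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if training can run on a CUDA GPU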
5. Implementing GAN with PyTorch
First, we will implement the GAN model. The implementation consists of the following steps:
- Loading the dataset
- Defining the Generator and Discriminator
- Setting up the training loop
- Visualizing the results
5.1 Loading the Dataset
First, we load the CelebA images from a local folder ('path_to_celeba' is a placeholder for wherever the extracted images are stored; note that ImageFolder expects the images to sit inside at least one subdirectory of root). Because the Generator below ends in Tanh, we also normalize the real images to the same [-1, 1] range.
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    # Scale pixels from [0, 1] to [-1, 1] to match the Generator's Tanh output
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

dataset = ImageFolder(root='path_to_celeba', transform=transform)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
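Alternatively, torchvision bundles a CelebA dataset class that can download the data itself; a minimal sketch (note that the automatic download occasionally fails because of hosting quota limits, in which case a manual download into the root folder is still needed):

from torchvision.datasets import CelebA

# download=True fetches the aligned CelebA images into ./data on first use
celeba = CelebA(root='data', split='train', download=True, transform=transform)
dataloader = DataLoader(celeba, batch_size=64, shuffle=True)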
5.2 Defining the Generator and Discriminator
We define the GAN's Generator and Discriminator as simple fully connected networks.
import torch
import torch.nn as nn
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        # Maps a 100-dimensional noise vector to a 3x64x64 image
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 3 * 64 * 64),
            nn.Tanh(),  # outputs in [-1, 1], matching the normalized real images
        )

    def forward(self, z):
        z = self.model(z)
        return z.view(-1, 3, 64, 64)
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        # Maps a flattened image to a single real/fake probability
        self.model = nn.Sequential(
            nn.Linear(3 * 64 * 64, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability that the input image is real
        )

    def forward(self, img):
        img_flat = img.view(img.size(0), -1)  # flatten to (batch, 3*64*64)
        return self.model(img_flat)
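Before wiring up the training loop, a quick shape check (a throwaway sketch, not part of training) confirms that the two networks fit together:

# 4 noise vectors -> 4 RGB 64x64 images -> 4 real/fake probabilities
z = torch.randn(4, 100)
imgs = Generator()(z)
print(imgs.shape)                   # torch.Size([4, 3, 64, 64])
print(Discriminator()(imgs).shape)  # torch.Size([4, 1])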
5.3 Setting up the Training Loop
Now we implement the training process for GAN.
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = Generator().to(device)
discriminator = Discriminator().to(device)

criterion = nn.BCELoss()
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 50
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(dataloader):
        imgs = imgs.to(device)
        batch_size = imgs.size(0)

        # Labels: 1 for real images, 0 for generated ones
        real_labels = torch.ones(batch_size, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1).to(device)

        # --- Train the Discriminator ---
        d_optimizer.zero_grad()
        outputs = discriminator(imgs)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        z = torch.randn(batch_size, 100).to(device)
        fake_imgs = generator(z)
        # detach() blocks gradients from flowing into the Generator here
        outputs = discriminator(fake_imgs.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()

        d_loss = d_loss_real + d_loss_fake  # combined value, kept for logging
        d_optimizer.step()

        # --- Train the Generator ---
        g_optimizer.zero_grad()
        outputs = discriminator(fake_imgs)
        # Non-saturating loss: push the Discriminator to classify fakes as real
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        g_optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')
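Once training finishes (or periodically during it), saving the weights lets you reuse the Generator without retraining; the file names below are arbitrary:

# Save only the state_dicts, the recommended PyTorch checkpointing pattern
torch.save(generator.state_dict(), 'generator.pth')
torch.save(discriminator.state_dict(), 'discriminator.pth')

# To restore later:
# generator = Generator().to(device)
# generator.load_state_dict(torch.load('generator.pth', map_location=device))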
5.4 Visualizing the Results
We visualize the images generated by the trained Generator.
import matplotlib.pyplot as plt

# Generate a batch of 64 fake faces from random noise
with torch.no_grad():
    z = torch.randn(64, 100).to(device)
    fake_images = generator(z).cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    # Undo the [-1, 1] normalization before displaying
    plt.imshow(fake_images[i].permute(1, 2, 0).numpy() * 0.5 + 0.5)
    plt.axis('off')
plt.show()
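Instead of matplotlib, the same grid can be written to disk in one call with torchvision; normalize=True rescales the Tanh output from [-1, 1] back to a displayable range:

from torchvision.utils import save_image

# Writes an 8x8 grid of the 64 generated samples to fake_faces.png
save_image(fake_images, 'fake_faces.png', nrow=8, normalize=True)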
6. Implementing VAE with PyTorch
Now let’s implement the VAE. Whereas the GAN learns through an adversarial game, the VAE takes a probabilistic approach, optimizing reconstruction quality together with a regularized latent space. The implementation steps are as follows:
- Preparing the dataset
- Defining the Encoder and Decoder
- Setting up the training loop
- Visualizing the results
6.1 Preparing the Dataset
The images are loaded as in the GAN section, with one difference: the Normalize step must be dropped, because the VAE below ends in a Sigmoid and is trained with binary cross-entropy, both of which expect pixel values in [0, 1].
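Concretely, reusing the same placeholder path, the VAE dataloader can be rebuilt without the Normalize step:

vae_transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),  # pixel values in [0, 1], as the BCE loss expects
])

vae_dataset = ImageFolder(root='path_to_celeba', transform=vae_transform)
dataloader = DataLoader(vae_dataset, batch_size=64, shuffle=True)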
6.2 Defining the Encoder and Decoder
We define the Encoder and Decoder of VAE.
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        # Encoder: 3x64x64 image -> 64x8x8 feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
        )
        # Two heads: mean and log-variance of the 128-dim latent distribution
        self.fc_mu = nn.Linear(64 * 8 * 8, 128)
        self.fc_logvar = nn.Linear(64 * 8 * 8, 128)
        self.fc_decode = nn.Linear(128, 64 * 8 * 8)
        # Decoder: mirror of the encoder, back up to 3x64x64
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32x32 -> 64x64
            nn.Sigmoid(),  # outputs in [0, 1], matching the BCE reconstruction loss
        )

    def encode(self, x):
        h = self.encoder(x)
        h = h.view(h.size(0), -1)
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps keeps the sampling step differentiable
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        z = self.fc_decode(z).view(-1, 64, 8, 8)
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
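A quick forward-pass check (again just a sanity sketch) confirms the convolution arithmetic: three stride-2 convolutions reduce a 64x64 input to 8x8, so the flattened feature size is 64 * 8 * 8 = 4096:

x = torch.randn(4, 3, 64, 64)
recon, mu, logvar = VAE()(x)
print(recon.shape)  # torch.Size([4, 3, 64, 64])
print(mu.shape)     # torch.Size([4, 128])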
6.3 Setting up the Training Loop
We now implement the training process for the VAE. The VAE is trained with the sum of two losses: a reconstruction loss (how far the reconstructed image is from the original) and a Kullback-Leibler divergence loss (how far the latent distribution is from the standard normal prior).
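Written out, the loss below is the negative evidence lower bound (ELBO). With a diagonal Gaussian posterior q(z|x) = N(μ, σ²) and a standard normal prior, the KL term has the closed form used in the code:

L = BCE(x̂, x) + D_KL( N(μ, σ²) || N(0, I) ),  where  D_KL = -1/2 · Σ_j ( 1 + log σ_j² - μ_j² - σ_j² )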
vae = VAE().to(device)
optimizer = optim.Adam(vae.parameters(), lr=0.0002)

num_epochs = 50
for epoch in range(num_epochs):
    for imgs, _ in dataloader:
        imgs = imgs.to(device)
        optimizer.zero_grad()
        reconstructed, mu, logvar = vae(imgs)
        # Reconstruction loss: how closely the output matches the input
        re_loss = nn.functional.binary_cross_entropy(
            reconstructed.view(-1, 3 * 64 * 64),
            imgs.view(-1, 3 * 64 * 64),
            reduction='sum',
        )
        # KL divergence between q(z|x) and the standard normal prior
        kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = re_loss + kl_loss
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.1f}')
6.4 Visualizing the Results
We sample random latent vectors from the prior and decode them with the trained VAE to generate new face images.
with torch.no_grad():
    # Decode 64 latent vectors sampled from the standard normal prior
    z = torch.randn(64, 128).to(device)
    generated_images = vae.decode(z).cpu()

plt.figure(figsize=(8, 8))
for i in range(64):
    plt.subplot(8, 8, i + 1)
    plt.imshow(generated_images[i].permute(1, 2, 0).numpy())
    plt.axis('off')
plt.show()
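Since the grid above is sampled from the prior, it is also instructive to look at reconstructions of real images, which are usually sharper; a minimal sketch using the VAE dataloader:

# Top row: 4 real images; bottom row: their VAE reconstructions
with torch.no_grad():
    imgs, _ = next(iter(dataloader))
    recon, _, _ = vae(imgs.to(device))
    recon = recon.cpu()

plt.figure(figsize=(8, 4))
for i in range(4):
    plt.subplot(2, 4, i + 1)
    plt.imshow(imgs[i].permute(1, 2, 0).numpy())
    plt.axis('off')
    plt.subplot(2, 4, i + 5)
    plt.imshow(recon[i].permute(1, 2, 0).numpy())
    plt.axis('off')
plt.show()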
7. Conclusion
In this article, we explored how to generate face images using GAN and VAE leveraging PyTorch. While GAN learns to generate increasingly realistic images through competition between the Generator and Discriminator, VAE learns the distribution of the latent space to generate new images. Both technologies play a significant role in the field of image generation and can produce remarkable results in different ways.