1. Introduction
Advances in artificial intelligence have increased the importance of generative models. A generative model learns the distribution of its training data and produces new samples that resemble it, with GANs (Generative Adversarial Networks) and VAEs (Variational Autoencoders) being two of the most widely used. This article explains how to implement a GAN and a VAE using PyTorch.
2. GAN (Generative Adversarial Networks)
GAN is a model proposed by Ian Goodfellow in 2014, where two neural networks (the generator and the discriminator) compete against each other during training. The generator creates fake data while the discriminator is responsible for distinguishing between real and fake data.
2.1 Structure of GAN
GAN consists of the following structure:
- Generator: Takes random noise as input and produces fake data intended to resemble the real data.
- Discriminator: Reviews the input data to determine whether it is real or fake.
2.2 GAN Training Process
The GAN training process repeats the following steps (the objective behind them is sketched after this list):
- The generator takes random noise as input and creates fake data.
- The discriminator receives both the generated fake data and real data, and outputs the probability that each input is real.
- The generator is updated to minimize its loss, i.e., to make the discriminator classify the fake data as real.
- The discriminator is updated to minimize its loss, i.e., to output a high probability for real data and a low probability for fake data.
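These steps implement the minimax game from the original GAN paper. In LaTeX notation (shown here as background, not part of the code below):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

In the implementation that follows, both networks are trained with binary cross-entropy (BCELoss). Training the generator against "real" labels corresponds to the common non-saturating variant, in which the generator maximizes \log D(G(z)) rather than minimizing \log(1 - D(G(z))).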
2.3 GAN Implementation Code
Below is Python code implementing a simple GAN:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Define Generator class
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 784),
            nn.Tanh()  # outputs in [-1, 1], matching the normalized images
        )

    def forward(self, z):
        return self.model(z)

# Define Discriminator class
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()  # probability that the input is real
        )

    def forward(self, img):
        return self.model(img)

# Data loading: MNIST images scaled to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])
])
mnist = datasets.MNIST('data', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)

# Initialize models
generator = Generator()
discriminator = Discriminator()

# Set loss function and optimization algorithm
criterion = nn.BCELoss()
optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# GAN training
num_epochs = 100
for epoch in range(num_epochs):
    for i, (imgs, _) in enumerate(dataloader):
        # Set labels for real and fake data
        real_labels = torch.ones(imgs.size(0), 1)
        fake_labels = torch.zeros(imgs.size(0), 1)

        # Train discriminator: real images get label 1, generated images label 0
        optimizer_D.zero_grad()
        outputs = discriminator(imgs.view(imgs.size(0), -1))
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        z = torch.randn(imgs.size(0), 100)
        fake_imgs = generator(z)
        outputs = discriminator(fake_imgs.detach())  # detach so no gradients flow to the generator here
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()
        optimizer_D.step()

        # Train generator: fool the discriminator into labeling fake images as real
        optimizer_G.zero_grad()
        outputs = discriminator(fake_imgs)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_G.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item()}, g_loss: {g_loss.item()}')
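After training, the generator alone can produce new images from random noise. Below is a minimal sketch of how sampling might look; the 28x28 reshaping and the matplotlib display are illustrative assumptions, not part of the original code:

import matplotlib.pyplot as plt

generator.eval()
with torch.no_grad():
    z = torch.randn(16, 100)            # 16 random noise vectors
    samples = generator(z)              # outputs in [-1, 1] because of Tanh
    samples = (samples + 1) / 2         # rescale to [0, 1] for display
    samples = samples.view(-1, 28, 28)

# Show the generated digits in a 4x4 grid
fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(samples, axes.flatten()):
    ax.imshow(img.numpy(), cmap='gray')
    ax.axis('off')
plt.show()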
3. VAE (Variational Autoencoder)
VAE is a model proposed by D. P. Kingma and M. Welling in 2013, which generates data in a probabilistic manner. VAE is composed of an encoder and a decoder, where the encoder compresses the data into a latent space, and the decoder reconstructs the data from this latent space.
3.1 Structure of VAE
The main components of VAE are as follows:
- Encoder: Maps input data to the parameters (mean and variance) of a distribution over a latent vector, which is encouraged to follow a standard normal distribution.
- Decoder: Takes the latent vector as input and generates output similar to the original data.
3.2 VAE Training Process
The training process for a VAE is as follows (the full objective is sketched after this list):
- Pass the data through the encoder to obtain the mean and the (log) variance.
- Use the reparameterization trick to sample a latent vector from that distribution.
- Pass the sampled latent vector through the decoder to reconstruct the data.
- Compute the loss: a reconstruction term between the reconstructed data and the original data, plus a KL-divergence term that pulls the latent distribution toward a standard normal.
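Written out, the VAE loss is the negative evidence lower bound (ELBO). In LaTeX notation (background for the code below, following the standard formulation):

\mathcal{L}(x) = -\mathbb{E}_{q(z|x)}[\log p(x|z)] + D_{\mathrm{KL}}\big(q(z|x) \,\|\, p(z)\big), \qquad z = \mu + \sigma \odot \epsilon, \ \epsilon \sim \mathcal{N}(0, I)

With a Gaussian encoder q(z|x) = \mathcal{N}(\mu, \sigma^2) and a standard normal prior p(z), the KL term has the closed form used in the code below:

D_{\mathrm{KL}} = -\tfrac{1}{2} \sum_j \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)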
3.3 VAE Implementation Code
Below is Python code implementing a simple VAE (it reuses the imports and the MNIST dataloader from the GAN example above):
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 400),
            nn.ReLU()
        )
        self.fc_mu = nn.Linear(400, 20)
        self.fc_logvar = nn.Linear(400, 20)
        self.decoder = nn.Sequential(
            nn.Linear(20, 400),
            nn.ReLU(),
            nn.Linear(400, 784),
            nn.Sigmoid()  # outputs in [0, 1]
        )

    def reparametrize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x):
        h1 = self.encoder(x.view(-1, 784))
        mu = self.fc_mu(h1)
        logvar = self.fc_logvar(h1)
        z = self.reparametrize(mu, logvar)
        return self.decoder(z), mu, logvar

# VAE training
vae = VAE()
optimizer = optim.Adam(vae.parameters(), lr=0.001)
criterion = nn.BCELoss(reduction='sum')

num_epochs = 10
for epoch in range(num_epochs):
    for imgs, _ in dataloader:
        optimizer.zero_grad()
        # The dataloader above normalizes images to [-1, 1]; rescale them to
        # [0, 1] because BCELoss requires targets in that range.
        targets = (imgs.view(-1, 784) + 1) / 2
        recon_batch, mu, logvar = vae(targets)
        recon_loss = criterion(recon_batch, targets)
        # Kullback-Leibler divergence
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        loss = recon_loss + kld
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item()}')
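Once the VAE is trained, new digits can be generated by sampling latent vectors from the standard normal prior and passing them through the decoder. A minimal sketch under the same assumptions as the GAN sampling example (28x28 reshaping, matplotlib display):

import matplotlib.pyplot as plt

vae.eval()
with torch.no_grad():
    z = torch.randn(16, 20)                 # sample from the N(0, I) prior
    generated = vae.decoder(z)              # decode to pixel space, values in [0, 1]
    generated = generated.view(-1, 28, 28)

fig, axes = plt.subplots(4, 4, figsize=(4, 4))
for img, ax in zip(generated, axes.flatten()):
    ax.imshow(img.numpy(), cmap='gray')
    ax.axis('off')
plt.show()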
4. Conclusion
GANs and VAEs each have their own strengths and can be applied to a variety of generative tasks. This article has shown how to implement both with PyTorch, covering the principles behind each model and their implementation in code. Generative models like these are used in many fields, such as image generation, style transfer, and data augmentation, and they are likely to keep advancing and to play a significant role in artificial intelligence.