1. Introduction
In recent years, artificial intelligence and deep learning have driven much of the innovation in information technology. Within deep learning, Generative Adversarial Networks (GANs) and
Neural Style Transfer have gained attention as methods for generating and transforming visual content. This course explains the basic concepts of GANs and shows
how to implement both a GAN and Neural Style Transfer using PyTorch.
2. Basic Concepts of GAN
A GAN consists of two neural networks: a Generator and a Discriminator. The Generator produces fake data, while the Discriminator tries to tell real data from fake data.
The two networks are trained against each other. The Generator keeps improving the quality of its output to fool the Discriminator, and the Discriminator learns to better detect the
data created by the Generator.
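This competition is usually formalized as a two-player minimax game. The notation below is the standard one from the GAN literature rather than something defined in this course: D(x) is the Discriminator's estimated probability that x is real, and G(z) is the Generator's output for a noise vector z.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The Discriminator tries to increase V, while the Generator tries to decrease it.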
2.1 Structure of GAN
The GAN training process can be summarized as follows:
- Generate fake images by feeding random noise into the generator.
- Input the generated images and real images into the discriminator.
- The discriminator outputs, for each input image, the probability that it is real.
- The generator improves the fake images based on the feedback from the discriminator.
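The feedback in the last step is commonly expressed as a binary cross-entropy loss. The sketch below uses made-up discriminator outputs (`d_on_real`, `d_on_fake` are hypothetical values, not results from a trained model) to show how the two losses pull in opposite directions. It assumes the generator is scored against the "real" label, which is a common variant of the minimax objective.

```python
import math

def bce(p, label):
    """Binary cross-entropy for one predicted probability p and a 0/1 label."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Hypothetical discriminator outputs (probability that the input is real)
d_on_real = 0.9   # fairly sure a real image is real
d_on_fake = 0.2   # fairly sure a generated image is fake

# Discriminator loss: real images are labeled 1, fake images are labeled 0
d_loss = bce(d_on_real, 1) + bce(d_on_fake, 0)

# Generator loss: the generator wants its fakes to be scored as real (label 1)
g_loss = bce(d_on_fake, 1)

# Suppose the generator improves and the discriminator now outputs 0.6 on fakes:
# the generator loss falls while the discriminator loss rises.
g_loss_better = bce(0.6, 1)
d_loss_worse = bce(d_on_real, 1) + bce(0.6, 0)

print(f"before: d_loss={d_loss:.4f}, g_loss={g_loss:.4f}")
print(f"after:  d_loss={d_loss_worse:.4f}, g_loss={g_loss_better:.4f}")
```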
3. Implementing GAN
Now let’s implement a GAN. In this example, we will build a GAN that generates handwritten digit images, trained on the MNIST dataset.
3.1 Installing Required Libraries
pip install torch torchvision matplotlib
3.2 Loading MNIST Dataset
import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms

# Download and load MNIST dataset
def load_mnist(batch_size):
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    train_dataset = dsets.MNIST(root='./data', train=True, transform=transform, download=True)
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
    return train_loader

# Set batch size to 100
batch_size = 100
train_loader = load_mnist(batch_size)
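The `Normalize((0.5,), (0.5,))` transform above subtracts a mean of 0.5 and divides by a standard deviation of 0.5, which maps pixel values from [0, 1] to [-1, 1]. A minimal sketch of that arithmetic in plain Python follows; the helper names `normalize` and `denormalize` are illustrative and not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Same arithmetic as Normalize((0.5,), (0.5,)) applied to one pixel value."""
    return (pixel - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying a generated image."""
    return value * std + mean

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

This range is what lets a generator ending in `Tanh`, whose outputs also lie in [-1, 1], produce values directly comparable to the training images.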
3.3 Building GAN Model
Now we will define the generator and discriminator models.
import torch.nn as nn

# Define generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()  # Pixel value range for normalized MNIST images: -1 to 1
        )

    def forward(self, z):
        return self.model(z).reshape(-1, 1, 28, 28)  # Reshape to MNIST image format

# Define discriminator model
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output values between 0 and 1
        )

    def forward(self, img):
        return self.model(img)
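Before training, it can help to confirm that the tensor shapes line up. The sketch below uses compact single-layer stand-ins with the same input and output sizes as the models above (an assumption made to keep the example short; they are not the full architectures) and pushes one batch of noise through both.

```python
import torch
import torch.nn as nn

# Stand-ins with the same input/output sizes as the Generator and Discriminator
# (hypothetical, for shape checking only)
generator = nn.Sequential(nn.Linear(100, 784), nn.Tanh(), nn.Unflatten(1, (1, 28, 28)))
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(784, 1), nn.Sigmoid())

z = torch.randn(16, 100)           # batch of 16 noise vectors
fake_imgs = generator(z)           # one 1x28x28 image per noise vector
scores = discriminator(fake_imgs)  # one probability per image

print(fake_imgs.shape, scores.shape)
```

The same check can be run on the real `Generator` and `Discriminator` classes by swapping them in for the stand-ins.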
3.4 GAN Training Process
Now we will implement the training loop for the GAN.
import os
import torchvision.utils as vutils

def train_gan(epochs, train_loader):
    generator = Generator()
    discriminator = Discriminator()
    criterion = nn.BCELoss()
    lr = 0.0002
    beta1 = 0.5
    g_optimizer = torch.optim.Adam(generator.parameters(), lr=lr, betas=(beta1, 0.999))
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, 0.999))
    os.makedirs('output', exist_ok=True)  # save_image does not create missing directories

    for epoch in range(epochs):
        for i, (imgs, _) in enumerate(train_loader):
            # Generate real and fake labels
            real_labels = torch.ones(imgs.size(0), 1)
            fake_labels = torch.zeros(imgs.size(0), 1)

            # Train discriminator on real images
            discriminator.zero_grad()
            outputs = discriminator(imgs)
            d_loss_real = criterion(outputs, real_labels)
            d_loss_real.backward()

            # Train discriminator on fake images (detach so the generator is not updated here)
            z = torch.randn(imgs.size(0), 100)  # Noise vector
            fake_imgs = generator(z)
            outputs = discriminator(fake_imgs.detach())
            d_loss_fake = criterion(outputs, fake_labels)
            d_loss_fake.backward()
            d_loss = d_loss_real + d_loss_fake
            d_optimizer.step()

            # Train generator: it is rewarded when the discriminator labels its images as real
            generator.zero_grad()
            outputs = discriminator(fake_imgs)
            g_loss = criterion(outputs, real_labels)
            g_loss.backward()
            g_optimizer.step()

            if (i + 1) % 100 == 0:
                print(f'Epoch [{epoch + 1}/{epochs}], Step [{i + 1}/{len(train_loader)}], '
                      f'D Loss: {d_loss.item()}, G Loss: {g_loss.item()}')

        # Save generated images every 10 epochs
        if (epoch + 1) % 10 == 0:
            with torch.no_grad():
                fake_imgs = generator(z).detach()
            vutils.save_image(fake_imgs, f'output/fake_images-{epoch + 1}.png', normalize=True)

train_gan(epochs=50, train_loader=train_loader)
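One detail in the loop deserves a closer look: the discriminator step calls `fake_imgs.detach()`, while the generator step does not. The sketch below illustrates the effect with toy linear layers standing in for the two networks (an assumption made to keep it small): after a backward pass through the detached tensor the generator has no gradient, and after a backward pass through the attached tensor it does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
generator = nn.Linear(4, 4)      # toy stand-in for the generator
discriminator = nn.Linear(4, 1)  # toy stand-in for the discriminator

z = torch.randn(8, 4)
fake = generator(z)

# Discriminator step: detach() cuts the graph, so nothing flows back to the generator
discriminator(fake.detach()).mean().backward()
grad_after_d_step = generator.weight.grad  # None, no gradient was computed

# Generator step: no detach(), so gradients flow through the discriminator into the generator
discriminator(fake).mean().backward()
grad_after_g_step = generator.weight.grad  # now a tensor of the same shape as the weight

print(grad_after_d_step, grad_after_g_step.shape)
```

The generator step also leaves gradients on the discriminator's parameters. In the training loop these are cleared by `discriminator.zero_grad()` at the start of the next iteration, before they can affect an update.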
4. Neural Style Transfer
Neural Style Transfer separates the content of an image from its style so that a content image can be re-rendered with the visual characteristics of a style image.
The process is based on Convolutional Neural Networks (CNN) and typically involves the following steps:
- Extract content features from the content image and style features from the style image using a pretrained CNN.
- Optimize a generated image so that it matches both sets of features, producing the final image.
4.1 Installing Required Libraries
pip install Pillow numpy matplotlib
4.2 Preparing the Model
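import torch
import torch.nn as nn
from torchvision import models

class StyleTransferModel(nn.Module):
    def __init__(self):
        super(StyleTransferModel, self).__init__()
        # Pretrained VGG19 feature extractor (torchvision >= 0.13 API;
        # older versions use models.vgg19(pretrained=True) instead)
        self.vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.eval()
        for param in self.vgg.parameters():
            param.requires_grad_(False)  # The network weights stay fixed during style transfer

    def forward(self, x):
        return self.vgg(x)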
4.3 Defining Style and Content Loss
class ContentLoss(nn.Module):
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        self.target = target.detach()  # Prevent gradient calculation for target

    def forward(self, x):
        return nn.functional.mse_loss(x, self.target)

class StyleLoss(nn.Module):
    def __init__(self, target):
        super(StyleLoss, self).__init__()
        self.target = self.gram_matrix(target).detach()  # Calculate Gram matrix

    def gram_matrix(self, x):
        b, c, h, w = x.size()
        features = x.view(b, c, h * w)
        G = torch.bmm(features, features.transpose(1, 2))  # Create Gram matrix
        return G.div(c * h * w)

    def forward(self, x):
        G = self.gram_matrix(x)
        return nn.functional.mse_loss(G, self.target)
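To see why the Gram matrix is a reasonable summary of style, the sketch below repeats the same computation as a standalone function on a small random feature map (random data, chosen only for illustration). It shows that the result has one entry per pair of channels and does not change when the spatial positions are shuffled, so it records which features co-occur rather than where they occur.

```python
import torch

def gram_matrix(x):
    """Channel-by-channel feature correlations, normalized by c * h * w."""
    b, c, h, w = x.size()
    features = x.view(b, c, h * w)
    G = torch.bmm(features, features.transpose(1, 2))
    return G.div(c * h * w)

torch.manual_seed(0)
feature_map = torch.randn(1, 3, 4, 4)  # batch of 1, 3 channels, 4x4 spatial grid
G = gram_matrix(feature_map)

# Shuffle the 16 spatial positions identically in every channel
perm = torch.randperm(16)
shuffled = feature_map.reshape(1, 3, 16)[:, :, perm].reshape(1, 3, 4, 4)
G_shuffled = gram_matrix(shuffled)

print(G.shape)
print(torch.allclose(G, G_shuffled, atol=1e-5))
```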
4.4 Running Style Transfer
Now we will define the style transfer function, which runs the optimization loop. It minimizes a weighted sum of the content and style losses with respect to the pixels of the generated image.
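In symbols, and assuming the weights correspond to the `content_weight` and `style_weight` parameters of the function below, the objective can be sketched as follows, where x denotes the generated image:

```latex
\mathcal{L}_{\text{total}}(x) = \alpha \, \mathcal{L}_{\text{content}}(x) + \beta \, \mathcal{L}_{\text{style}}(x),
\qquad
x^{*} = \arg\min_{x} \mathcal{L}_{\text{total}}(x)
```

Here α plays the role of `content_weight` (default 1) and β the role of `style_weight` (default 1000000). The style weight is typically set much larger because differences between normalized Gram matrices tend to be numerically small.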
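def run_style_transfer(content_img, style_img, model, num_steps=500, style_weight=1000000, content_weight=1):
    # Layer indices in vgg19.features: conv4_2 for content, conv1_1 to conv5_1 for style
    # (a common choice, assumed here)
    content_layers = [21]
    style_layers = [0, 5, 10, 19, 28]

    def extract_features(x):
        # Collect activations at the selected layers
        feats = {}
        for idx, layer in enumerate(model.vgg):
            # Use an out-of-place ReLU so stored activations are not overwritten
            x = torch.relu(x) if isinstance(layer, nn.ReLU) else layer(x)
            if idx in content_layers or idx in style_layers:
                feats[idx] = x
        return feats

    # Build the loss modules from the fixed content and style targets
    with torch.no_grad():
        content_feats = extract_features(content_img)
        style_feats = extract_features(style_img)
    content_losses = {i: ContentLoss(content_feats[i]) for i in content_layers}
    style_losses = {i: StyleLoss(style_feats[i]) for i in style_layers}

    target = content_img.clone().requires_grad_(True)  # Start from the content image
    optimizer = torch.optim.LBFGS([target])  # Each step may evaluate the closure several times

    for step in range(num_steps):
        def closure():
            optimizer.zero_grad()
            feats = extract_features(target)
            # Keep the losses as tensors so that backward() can compute gradients
            style_loss_val = sum(style_losses[i](feats[i]) for i in style_layers)
            content_loss_val = sum(content_losses[i](feats[i]) for i in content_layers)
            total_loss = style_weight * style_loss_val + content_weight * content_loss_val
            total_loss.backward()
            return total_loss
        optimizer.step(closure)
    return target.detach()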
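`torch.optim.LBFGS` differs from optimizers such as Adam in that `step` requires a closure, because the algorithm may need to re-evaluate the loss and its gradient several times per step. The sketch below shows the pattern on a toy one-parameter problem (minimizing (x - 3)^2, chosen only for illustration) rather than on images.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)  # the value being optimized
optimizer = torch.optim.LBFGS([x], lr=1.0)

def closure():
    optimizer.zero_grad()
    # The loss must remain a tensor attached to the graph;
    # converting it to a Python float here would make backward() impossible.
    loss = ((x - 3.0) ** 2).sum()
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)

print(x.item())
```

In style transfer, the image pixels play the role of `x` and the weighted content and style loss plays the role of the quadratic.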
5. Conclusion
In this course, we implemented image generation with a GAN and neural style transfer with a pretrained VGG19 network, both in PyTorch. GANs have set a new standard in image generation, and
neural style transfer makes it possible to create distinctive artistic images by combining the content of one image with the style of another. Both techniques continue to drive progress in deep learning and
can be applied in a variety of fields.