Using PyTorch for GAN Deep Learning: A Probabilistic Generative Model

In this post, we take a closer look at Generative Adversarial Networks (GANs). A GAN is a generative model proposed by Ian Goodfellow in 2014 that uses two neural networks, a Generator and a Discriminator, to generate data. The key idea we focus on is that the two networks compete with each other, which drives the model to generate increasingly convincing data.

1. Basic Structure of GAN

GAN consists of the following two components:

  • Generator: It is responsible for generating new data. It takes random noise as input and outputs data that is similar to real data.
  • Discriminator: It distinguishes whether the given data is real data or data generated by the Generator.

The Generator and Discriminator are trained through the following loss functions:

  • Generator Loss Function: encourages the Discriminator to classify the Generator's output as real data.
  • Discriminator Loss Function: trains the Discriminator to separate real data from data produced by the Generator as reliably as possible. (Both objectives reduce to binary cross-entropy in the vanilla GAN; see the sketch after this list.)
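
As a minimal sketch of how these two objectives map onto binary cross-entropy in PyTorch (the d_real and d_fake tensors here are hypothetical stand-ins for discriminator outputs on real and generated batches):

import torch
import torch.nn as nn

bce = nn.BCELoss()

# Hypothetical discriminator outputs in (0, 1), shape (batch_size, 1)
d_real = torch.sigmoid(torch.randn(4, 1))
d_fake = torch.sigmoid(torch.randn(4, 1))

# Discriminator loss: push D(real) toward 1 and D(fake) toward 0
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# Generator loss: push D(fake) toward 1, i.e. fool the discriminator
loss_g = bce(d_fake, torch.ones_like(d_fake))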

2. Training Process of GAN

The training process of the GAN model consists of the following steps:

  1. Select a random sample from the real dataset.
  2. Generate fake data by inputting random noise into the Generator.
  3. Feed the Discriminator with both real and fake data, calculating their respective probabilities.
  4. Update the Generator and Discriminator based on their respective loss functions.
  5. Repeat this process.

3. Implementing GAN Using PyTorch

Now, let’s implement a simple GAN using PyTorch. In this example, we will implement a GAN model that generates digit images using the MNIST dataset.

3.1 Installing Required Libraries


# Install required libraries
!pip install torch torchvision matplotlib

3.2 Loading and Preprocessing the Dataset


import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Download and preprocess the MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
train_set = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

3.3 Defining Generator and Discriminator Models


import torch.nn as nn

# Define Generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 28 * 28),
            nn.Tanh(),
        )

    def forward(self, x):
        x = self.fc(x)
        return x.view(-1, 1, 28, 28)

# Define Discriminator model
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        return self.fc(x)

3.4 Model Training


# Setting hyperparameters
num_epochs = 200
learning_rate = 0.0002
beta1 = 0.5

# Initialize models
generator = Generator()
discriminator = Discriminator()

# Define loss function and optimization algorithm
criterion = nn.BCELoss()
optimizerG = torch.optim.Adam(generator.parameters(), lr=learning_rate, betas=(beta1, 0.999))
optimizerD = torch.optim.Adam(discriminator.parameters(), lr=learning_rate, betas=(beta1, 0.999))

# Training loop
for epoch in range(num_epochs):
    for i, (data, _) in enumerate(train_loader):
        # Setting labels for real and fake data
        real_labels = torch.ones(data.size(0), 1)
        fake_labels = torch.zeros(data.size(0), 1)

        # Training Discriminator
        optimizerD.zero_grad()
        outputs = discriminator(data)
        lossD_real = criterion(outputs, real_labels)
        lossD_real.backward()

        noise = torch.randn(data.size(0), 100)
        fake_data = generator(noise)
        outputs = discriminator(fake_data.detach())
        lossD_fake = criterion(outputs, fake_labels)
        lossD_fake.backward()
        optimizerD.step()

        # Training Generator
        optimizerG.zero_grad()
        outputs = discriminator(fake_data)
        lossG = criterion(outputs, real_labels)
        lossG.backward()
        optimizerG.step()

    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss D: {lossD_real.item() + lossD_fake.item():.4f}, Loss G: {lossG.item():.4f}')

3.5 Visualizing Results


# Function to visualize generated images
def visualize(generator):
    noise = torch.randn(64, 100)
    fake_data = generator(noise)
    fake_data = fake_data.detach().numpy()
    fake_data = (fake_data + 1) / 2  # Normalize to [0, 1]

    plt.figure(figsize=(8, 8))
    for i in range(fake_data.shape[0]):
        plt.subplot(8, 8, i+1)
        plt.axis('off')
        plt.imshow(fake_data[i][0], cmap='gray')
    plt.show()

# Visualize results
visualize(generator)
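
If you also want to keep generated samples on disk, torchvision's save_image utility writes an image grid in one call. A minimal sketch, assuming fake_data holds the generator's Tanh outputs in [-1, 1] as above:

from torchvision.utils import save_image

noise = torch.randn(64, 100)
fake_data = generator(noise).detach()
# normalize=True rescales the tensor into [0, 1] before writing the grid
save_image(fake_data, 'samples.png', normalize=True)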

4. Applications of GAN

GANs are applied across a variety of tasks:

  • Image Generation: GAN can be used to generate high-quality images.
  • Style Transfer: GAN can be used to transform the style of an image. For instance, it can convert a daytime photo to nighttime.
  • Data Augmentation: GAN can be used to augment datasets by generating new data.

5. Conclusion

In this post, we explored the concept of GANs and a simple implementation using PyTorch. GANs are generative models with many potential applications, and with new variants being proposed continually, learning to use them is a very useful skill.

I hope this post has helped you understand GANs and get started with a practical implementation. I will return with more deep learning topics in the future!

Deep Learning GAN Training with PyTorch: Controller Training

Hello! In this post, we will implement GAN (Generative Adversarial Networks) using PyTorch and explore the training of a controller in detail. GAN consists of two neural networks, the Generator and the Discriminator, that compete against each other to generate realistic data.

1. Basic Structure of GAN

The basic structure of GAN is as follows:

  • Generator: Takes random noise as input and generates fake data.
  • Discriminator: Classifies input data into real and fake data.

The two networks are trained through competition, resulting in the generator creating increasingly realistic data and the discriminator making more accurate classifications.

2. Training Process of GAN

The training process of GAN progresses through the following steps:

  1. Generate fake data by inputting a random noise vector into the generator.
  2. Input the fake data and real data into the discriminator to compute real/fake probabilities.
  3. Train the discriminator based on the loss of the discriminator.
  4. Train the generator based on the loss of the generator.
  5. Repeat steps 1 to 4.

3. Implementing GAN with PyTorch

Now let’s implement GAN using PyTorch. Below is an example of the implementation of the basic GAN structure.

Installing PyTorch

First, we need to install PyTorch. With Python available, it can be installed using the following command:

pip install torch torchvision

Defining the Model

First, we will define the generator and the discriminator.


import torch
import torch.nn as nn
import torch.optim as optim

# Generator
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 28 * 28),
            nn.Tanh()
        )

    def forward(self, x):
        return self.model(x).view(-1, 1, 28, 28)

# Discriminator
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

Defining the Training Function

A function to define the training process is also needed:


def train_gan(generator, discriminator, data_loader, num_epochs=100, learning_rate=0.0002):
    criterion = nn.BCELoss()
    optimizer_g = optim.Adam(generator.parameters(), lr=learning_rate)
    optimizer_d = optim.Adam(discriminator.parameters(), lr=learning_rate)

    for epoch in range(num_epochs):
        for real_data, _ in data_loader:
            batch_size = real_data.size(0)
            real_labels = torch.ones(batch_size, 1)
            fake_labels = torch.zeros(batch_size, 1)

            # Training Discriminator
            optimizer_d.zero_grad()
            outputs = discriminator(real_data)
            d_loss_real = criterion(outputs, real_labels)
            d_loss_real.backward()

            noise = torch.randn(batch_size, 100)
            fake_data = generator(noise)
            outputs = discriminator(fake_data.detach())
            d_loss_fake = criterion(outputs, fake_labels)
            d_loss_fake.backward()

            optimizer_d.step()

            # Training Generator
            optimizer_g.zero_grad()
            outputs = discriminator(fake_data)
            g_loss = criterion(outputs, real_labels)
            g_loss.backward()

            optimizer_g.step()

        print(f'Epoch [{epoch}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item():.4f}, g_loss: {g_loss.item():.4f}')

Preparing the Dataset

We will use the MNIST dataset. Let’s write the code to load the data.


from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
data_loader = DataLoader(dataset, batch_size=64, shuffle=True)

4. Training the GAN

Now that the model and data loader are ready, let’s train the GAN.


generator = Generator()
discriminator = Discriminator()

train_gan(generator, discriminator, data_loader, num_epochs=50)

5. Visualizing Results

After training is complete, let’s visualize the generated images.


import matplotlib.pyplot as plt

def show_generated_images(generator, num_images=25):
    noise = torch.randn(num_images, 100)
    generated_images = generator(noise).detach().cpu().numpy()
    
    plt.figure(figsize=(5, 5))
    for i in range(num_images):
        plt.subplot(5, 5, i + 1)
        plt.imshow(generated_images[i][0], cmap='gray')
        plt.axis('off')
    plt.show()

show_generated_images(generator)

6. Training the Controller

Now we will use the GAN for controller training. Controller training is the process of learning the optimal actions to achieve specific goals in a given environment. Here, we explore how this process can be set up around a GAN.

Using a GAN in controller training is an interesting approach: the GAN's generator produces data for various scenarios, while an evaluation signal, in the sketch below a user-defined loss (the discriminator could also serve this role), scores how well the controller's actions meet the goals.

Below is an example code to train a simple controller using GAN.


# Define the controller network
class Controller(nn.Module):
    def __init__(self):
        super(Controller, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 3)  # For example, the dimension of actions (3D actions)
        )

    def forward(self, x):
        return self.model(x)

# Define the training process
def train_controller(generator, controller, num_epochs=100):
    optimizer_c = optim.Adam(controller.parameters(), lr=0.001)

    for epoch in range(num_epochs):
        noise = torch.randn(64, 100)
        actions = controller(noise)
        
        # Generate reference data with the trained GAN generator
        generated_data = generator(noise)
        
        # Evaluate actions and compute loss
        loss = calculate_loss(generated_data, actions)  # Loss function needs to be user-defined
        optimizer_c.zero_grad()
        loss.backward()
        optimizer_c.step()

        if epoch % 10 == 0:
            print(f'Epoch [{epoch}/{num_epochs}], Controller Loss: {loss.item():.4f}')

# Start training the controller
controller = Controller()
train_controller(generator, controller)
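
The loop above leaves calculate_loss undefined, since it depends on the task. As a purely hypothetical placeholder (not part of the original post), one could score how closely the controller's 3-D actions track simple statistics of the generated data:

def calculate_loss(generated_data, actions):
    # Toy objective: drive each 3-D action toward per-sample statistics
    # (mean, std, max) of the generated images. A real task would use an
    # environment, a reward model, or the discriminator instead.
    flat = generated_data.view(generated_data.size(0), -1)
    target = torch.stack([flat.mean(dim=1), flat.std(dim=1), flat.amax(dim=1)], dim=1)
    return nn.functional.mse_loss(actions, target.detach())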

7. Conclusion

In this post, we explored the process of implementing GAN with PyTorch and training a simple controller based on it. GAN is highly useful for generating data similar to real data and has various potential applications. We have shown that the scope of GAN can be extended through controller training.

Furthermore, GAN can be utilized in various fields beyond image generation, including text and video generation, so consider using this concept to challenge yourself with your own projects!

The Development of GAN Deep Learning Using PyTorch: Progress Over the Last 5 Years

In the world of deep learning, GANs (Generative Adversarial Networks) have emerged as one of the most innovative and fascinating research topics. First proposed by Ian Goodfellow in 2014, GANs enable powerful image generation models through a competitive relationship between a generator and a discriminator. In this article, we will explain the basic concepts of GANs, examine advancements over the past five years, and provide an example of GAN implementation using PyTorch.

1. Basic Concepts of GAN

GAN consists of a generator and a discriminator. The generator creates fake data, while the discriminator determines whether this data is real or fake. The two networks evolve competitively, allowing the generator to produce increasingly realistic data. The goals of GANs are as follows:

  • The generator must produce fake data that mimics the distribution of real data.
  • The discriminator must be able to distinguish between the generated data and real data.

1.1 Mathematical Foundation of GAN

The training of a GAN optimizes the two networks against each other through the following min-max objective:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]

Here, D is the discriminator, G is the generator, x is a sample of real data, and z is a random noise vector. The discriminator maximizes V while the generator minimizes it, so the two networks improve each other through a zero-sum game.
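
Written out in LaTeX, together with the closed-form optimal discriminator for a fixed generator (a standard result from the original 2014 paper):

\[
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
\]
\[
D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)}
\]

where p_g is the distribution induced by the generator. Plugging D* back in shows that, at the optimum, the objective measures (up to constants) the Jensen-Shannon divergence between p_data and p_g.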

2. Recent Developments in GAN

In the past five years, GANs have undergone various modifications and improvements. Below are some of them:

2.1 DCGAN (Deep Convolutional GAN)

DCGAN improved GAN performance by building the generator and discriminator from convolutional networks: transposed convolutions in the generator, strided convolutions in the discriminator, and batch normalization throughout. This made training noticeably more stable and enabled high-quality image generation. A minimal generator sketch follows.
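
Here is a minimal DCGAN-style generator for 28×28 grayscale images, loosely following Radford et al. (2015); the layer sizes are illustrative choices, not the paper's exact configuration:

import torch.nn as nn

dcgan_generator = nn.Sequential(
    # z: (N, 100, 1, 1) -> (N, 128, 7, 7)
    nn.ConvTranspose2d(100, 128, kernel_size=7, stride=1, padding=0),
    nn.BatchNorm2d(128),
    nn.ReLU(True),
    # (N, 128, 7, 7) -> (N, 64, 14, 14)
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(True),
    # (N, 64, 14, 14) -> (N, 1, 28, 28)
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)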

2.2 WGAN (Wasserstein GAN)

WGAN introduced the Wasserstein distance to improve the training stability of GANs. It converges more stably than the original GAN formulation and can produce higher-quality images; a sketch of its critic update follows.
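
Below is a sketch of a single WGAN critic update. The critic replaces the discriminator and has no sigmoid, so it outputs unbounded scores; the clip bound 0.01 and RMSprop learning rate 5e-5 come from the original paper, while the network and data here are stand-ins:

import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

real = torch.randn(64, 784)           # stand-in batch of real samples
fake = torch.randn(64, 784).detach()  # stand-in generator output

# Maximize E[f(real)] - E[f(fake)], i.e. minimize the negation
loss_c = -(critic(real).mean() - critic(fake).mean())
opt_c.zero_grad()
loss_c.backward()
opt_c.step()

# Weight clipping crudely enforces the Lipschitz constraint
for p in critic.parameters():
    p.data.clamp_(-0.01, 0.01)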

2.3 CycleGAN

CycleGAN is used for image-to-image translation problems, for example transforming photographs into a painter's style. Notably, it can learn without paired image sets, relying on a cycle-consistency loss, sketched below.
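
The key ingredient is the cycle-consistency loss: translating A→B→A should reconstruct the original image. A minimal sketch with stand-in translators (a real CycleGAN uses convolutional encoder-decoders; Linear layers keep the example short):

import torch
import torch.nn as nn

G_AB = nn.Linear(64, 64)  # stand-in translator, domain A -> B
G_BA = nn.Linear(64, 64)  # stand-in translator, domain B -> A

real_a = torch.randn(8, 64)
real_b = torch.randn(8, 64)

l1 = nn.L1Loss()
# Round-trips A -> B -> A and B -> A -> B should return the input
cycle_loss = l1(G_BA(G_AB(real_a)), real_a) + l1(G_AB(G_BA(real_b)), real_b)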

2.4 StyleGAN

StyleGAN is a state-of-the-art GAN architecture for generating high-quality images. This model allows for style adjustments during the generation process, enabling the creation of images in various styles.

3. GAN Implementation Using PyTorch

Now, let’s implement a basic GAN using PyTorch. The following code is a simple example of a GAN that generates digit images using the MNIST dataset.

3.1 Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt
import numpy as np
    

3.2 Load Dataset

Load the MNIST dataset and perform transformations.

# Load and transform MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

mnist = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)
    

3.3 Define Generator and Discriminator Models

Define the models for the generator and discriminator.

# Define generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 1024),
            nn.ReLU(True),
            nn.Linear(1024, 784),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.model(z)

# Define discriminator model
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, img):
        return self.model(img.view(img.size(0), -1))
    

3.4 Define Loss Function and Optimizers

Define the loss function and optimizers for training the GAN.

criterion = nn.BCELoss()
generator = Generator()
discriminator = Discriminator()

optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
    

3.5 GAN Training Process

Now, let’s define a function for training the GAN.

def train_gan(num_epochs=50):
    G_losses = []
    D_losses = []
    for epoch in range(num_epochs):
        for i, (imgs, _) in enumerate(dataloader):
            # Real image labels are 1, fake image labels are 0
            real_labels = torch.ones(imgs.size(0), 1)
            fake_labels = torch.zeros(imgs.size(0), 1)

            # Train discriminator
            optimizer_D.zero_grad()
            outputs = discriminator(imgs)
            D_loss_real = criterion(outputs, real_labels)
            D_loss_real.backward()

            z = torch.randn(imgs.size(0), 100)
            fake_imgs = generator(z)
            outputs = discriminator(fake_imgs.detach())
            D_loss_fake = criterion(outputs, fake_labels)
            D_loss_fake.backward()
            optimizer_D.step()

            # Train generator
            optimizer_G.zero_grad()
            outputs = discriminator(fake_imgs)
            G_loss = criterion(outputs, real_labels)
            G_loss.backward()
            optimizer_G.step()

            G_losses.append(G_loss.item())
            D_losses.append(D_loss_real.item() + D_loss_fake.item())

        print(f'Epoch [{epoch}/{num_epochs}], D_loss: {D_loss_fake.item() + D_loss_real.item():.4f}, G_loss: {G_loss.item():.4f}')

    return G_losses, D_losses

3.6 Execute Training and Visualize Results

Run the training and visualize the loss values.

G_losses, D_losses = train_gan(num_epochs=50)

plt.plot(G_losses, label='Generator Loss')
plt.plot(D_losses, label='Discriminator Loss')
plt.title('Losses during Training')
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.legend()
plt.show()
    

4. Conclusion

In this article, we explored the basic concepts of GANs and recent advancements over the past five years, as well as demonstrating an implementation of GAN using PyTorch. GANs continue to evolve and hold a significant place in the field of deep learning. Future research directions are expected to focus on developing more stable training methods and generating high-resolution images.

5. References

  • Goodfellow, I., et al. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems.
  • Brock, A., Donahue, J., & Simonyan, K. (2019). Large Scale GAN Training for High Fidelity Natural Image Synthesis. International Conference on Learning Representations.
  • Karras, T., Laine, S., & Aila, T. (2019). A Style-Based Generator Architecture for Generative Adversarial Networks. IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  • Zhu, J.-Y., et al. (2017). Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. IEEE International Conference on Computer Vision (ICCV).

Using PyTorch for GAN Deep Learning: A First Music Generation RNN

1. Introduction

As artificial intelligence (AI) technology advances, various attempts are being made in the field of music generation. In particular, among deep learning models, Generative Adversarial Networks (GANs) show excellent performance in learning patterns from existing data to generate new data. In this article, we will implement an RNN (Recurrent Neural Network) based music generation model using PyTorch. This model focuses on utilizing the principles of GAN to generate natural music.

2. Overview and Principles of GAN

A GAN (Generative Adversarial Network) consists of two neural networks, namely a Generator and a Discriminator. The Generator tries to create data similar to the real data, while the Discriminator tries to distinguish whether the generated data is real or fake. These two networks compete and learn from each other.

2.1 Structure of GAN

The structure of GAN is as follows:

  • Generator: Takes random noise as input and generates data.
  • Discriminator: Responsible for distinguishing between the generated data and actual data.

This structure enables GAN to generate highly creative data.

2.2 GAN Learning Process

The learning of GAN proceeds in an alternating fashion between the two networks:

  • First, the Generator takes random noise as input and generates fake data.
  • Next, the Discriminator receives both fake and real data and assesses the authenticity of each sample.
  • The Generator learns to make the Discriminator incorrectly judge fake data as real.
  • On the other hand, the Discriminator learns to accurately distinguish fake data.

3. Music Generation Using RNN

Music is sequential data, and RNNs are well suited to handling such sequences. An RNN carries a hidden state from one time step to the next, so information from earlier steps can influence the current output, which makes it a natural fit for generating music sequences.

3.1 Structure of RNN

RNN mainly consists of the following components:

  • Input Layer: The data input at each time step.
  • Hidden Layer: Responsible for retaining information about the previous state.
  • Output Layer: Provides the final output of the model.

3.2 Learning of RNN

The RNN is trained on sequential data using a suitable loss function (cross-entropy is typical for next-note prediction). The loss is backpropagated through time to update the weights, as in the sketch below.
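
As a hypothetical sketch of one such training step for next-note prediction (128 MIDI pitches one-hot encoded, sequence length 32, and random stand-in data; these sizes are illustrative assumptions):

import torch
import torch.nn as nn

NUM_PITCHES, SEQ_LEN, BATCH = 128, 32, 16

rnn = nn.RNN(NUM_PITCHES, 256, batch_first=True)
head = nn.Linear(256, NUM_PITCHES)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(BATCH, SEQ_LEN, NUM_PITCHES)       # stand-in input sequences
target = torch.randint(0, NUM_PITCHES, (BATCH,))   # stand-in next notes

out, _ = rnn(x)                 # out: (BATCH, SEQ_LEN, 256)
logits = head(out[:, -1, :])    # predict the note following each sequence
loss = criterion(logits, target)

optimizer.zero_grad()
loss.backward()                 # backpropagation through time
optimizer.step()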

4. Preparing Music Data

Music data is needed to train the model; the MIDI file format is generally used. The files are parsed into note sequences and preprocessed to fit the model.

4.1 Reading MIDI Files

MIDI files are read and necessary information is extracted using libraries like mido in Python. Now, let’s describe how to extract note information from a MIDI file.

4.2 Data Preprocessing

import mido

def extract_notes(midi_file):
    midi = mido.MidiFile(midi_file)
    notes = []
    
    for track in midi.tracks:
        for message in track:
            if message.type == 'note_on' and message.velocity > 0:
                notes.append(message.note)
    
    return notes

notes = extract_notes('example.mid')
print(notes)

The code above is a function that extracts note information from a MIDI file; each note is represented by its MIDI number. The next step, shown below, is to cut this flat note list into training sequences.
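
A common preprocessing step is to slide a fixed-length window over the note list: each window becomes an input sequence and the note that follows it becomes the target. A minimal sketch (the window length 32 is an arbitrary choice here):

import torch

def make_sequences(notes, seq_len=32):
    # Each window of `seq_len` notes predicts the note that follows it
    inputs, targets = [], []
    for i in range(len(notes) - seq_len):
        inputs.append(notes[i:i + seq_len])
        targets.append(notes[i + seq_len])
    x = torch.tensor(inputs, dtype=torch.long)
    y = torch.tensor(targets, dtype=torch.long)
    # One-hot encode pitches so the RNN input size is 128
    x = torch.nn.functional.one_hot(x, num_classes=128).float()
    return x, y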

5. Model Implementation

In the model implementation phase, GAN and RNN models are constructed using PyTorch. Next, we design the RNN structure and combine it with the GAN structure to define the final music generation model.

5.1 Defining the RNN Model

import torch
import torch.nn as nn

class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])
        return out

This code defines a class for the RNN model. You can set input size, hidden layer size, and output size.

5.2 Defining the GAN Structure

class GAN(nn.Module):
    def __init__(self, generator, discriminator):
        super(GAN, self).__init__()
        self.generator = generator
        self.discriminator = discriminator

    def forward(self, noise):
        generated_data = self.generator(noise)
        validity = self.discriminator(generated_data)
        return validity

Here, we have defined the GAN structure with a generator and a discriminator. The generator takes noise as input to generate data, and the discriminator assesses the validity of this data.

6. Training Process

During the training process, the generator and discriminator networks are trained alternately to improve their respective performances. Here is an example of a training loop.

6.1 Implementing the Training Loop

def train_gan(generator, discriminator, gan, dataloader, num_epochs, device):
    criterion = nn.BCELoss()
    optimizer_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
    optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

    for epoch in range(num_epochs):
        for real_data in dataloader:
            batch_size = real_data.size(0)
            real_data = real_data.to(device)

            # Discriminator training
            optimizer_d.zero_grad()
            noise = torch.randn(batch_size, 100).to(device)
            fake_data = generator(noise)
            validity_real = discriminator(real_data)
            validity_fake = discriminator(fake_data.detach())

            loss_d = criterion(validity_real, torch.ones(batch_size, 1).to(device)) + \
                      criterion(validity_fake, torch.zeros(batch_size, 1).to(device))
            loss_d.backward()
            optimizer_d.step()

            # Generator training
            optimizer_g.zero_grad()
            validity = discriminator(fake_data)
            loss_g = criterion(validity, torch.ones(batch_size, 1).to(device))
            loss_g.backward()
            optimizer_g.step()
        
        print(f"Epoch[{epoch}/{num_epochs}] Loss D: {loss_d.item()}, Loss G: {loss_g.item()}")

This function defines the training process of GAN. It outputs the losses of discriminator and generator for each epoch to monitor the training process.

7. Result Generation

After training is complete, the model can be used to generate new music. The generated music can be saved as a MIDI file.

7.1 Music Generation and Saving

def generate_music(generator, num_samples, device):
    noise = torch.randn(num_samples, 100).to(device)
    generated_music = generator(noise)
    
    # Code to save as MIDI file
    # ...
    
    return generated_music
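
The saving step is left open above. As a hedged sketch of one way to write a note sequence with mido (assuming the generator's output has already been decoded into integer MIDI pitches; that decoding depends on how the training data was encoded):

import mido

def save_notes_as_midi(notes, path, velocity=64, ticks=240):
    # Single-track MIDI file: each note gets a note_on followed by a
    # note_off after `ticks` ticks
    midi = mido.MidiFile()
    track = mido.MidiTrack()
    midi.tracks.append(track)
    for note in notes:
        track.append(mido.Message('note_on', note=int(note), velocity=velocity, time=0))
        track.append(mido.Message('note_off', note=int(note), velocity=0, time=ticks))
    midi.save(path)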

8. Conclusion

In this article, we explored the process of implementing a GAN-based music generation RNN model using PyTorch. By utilizing the principles of GAN and the characteristics of RNN, we explored new possibilities in music generation. Through such models, it will be possible to experimentally generate music, potentially bringing creative changes to the music industry.

Deep Learning with GAN Using PyTorch: A First Deep Neural Network

1. What is GAN?

Generative Adversarial Networks (GANs) are a foundational family of deep learning models proposed by Ian Goodfellow in 2014. A GAN consists of two neural networks, a Generator and a Discriminator: the generator tries to create fake data, while the discriminator attempts to determine whether data is real or fake. The two networks compete against each other during training, hence the term "Adversarial." The process is unsupervised, and the competition drives the generator to produce data that increasingly resembles real data.

2. Structure of GAN

The structure of GAN operates as follows:

  • Generator Network: Takes a random noise vector as input and generates fake images.
  • Discriminator Network: Responsible for distinguishing between real and fake images.
As training proceeds, the generator learns to produce images that the discriminator cannot easily classify, so both networks keep improving in competition with each other.

3. How GAN Works

The training process of GAN consists of the following iterative steps:

  1. Training the Discriminator: The discriminator receives real images and fake images produced by the generator as input and updates its parameters to classify these images correctly.
  2. Training the Generator: The generator's images are scored by the trained discriminator, and the generator updates its parameters so that the discriminator classifies those images as real.
  3. This process is repeated, allowing both networks to become progressively stronger against each other.

4. Implementing GAN in PyTorch

Now, let’s implement GAN in PyTorch. Our goal is to generate digit images using the MNIST dataset.

4.1 Installing Required Libraries

        pip install torch torchvision matplotlib
    

4.2 Preparing the Dataset

The MNIST dataset can be easily retrieved through PyTorch’s torchvision library. The code below shows how to load and preprocess the data.

        
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data transformation settings
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
        
        

4.3 Defining the Generator and Discriminator Networks

Now we define the two networks of GAN. We will create the generator and discriminator using a simple neural network structure.

        
import torch.nn as nn

# Define Generator Network
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 28 * 28),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

# Define Discriminator Network
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.model(img)
        
    

4.4 Setting Loss Function and Optimization Algorithm

The loss function used for both the generator and discriminator is binary cross-entropy loss. We adopt the Adam optimization algorithm.

        
import torch.optim as optim

# Define loss function and optimization algorithm
criterion = nn.BCELoss()
generator = Generator()
discriminator = Discriminator()
optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
        
    

4.5 Implementing the Training Process

Training proceeds as follows:

        
import numpy as np

num_epochs = 50
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(train_loader):
        batch_size = images.size(0)
        
        # Set real and fake labels
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # Train Discriminator
        outputs = discriminator(images)
        d_loss_real = criterion(outputs, real_labels)

        noise = torch.randn(batch_size, 100)
        fake_images = generator(noise)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)

        d_loss = d_loss_real + d_loss_fake
        optimizer_D.zero_grad()
        d_loss.backward()
        optimizer_D.step()

        # Train Generator
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)

        optimizer_G.zero_grad()
        g_loss.backward()
        optimizer_G.step()

        # Output training status
        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{len(train_loader)}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')
        
    

4.6 Visualizing Generated Images

After training is complete, we visualize the images generated.

        
import matplotlib.pyplot as plt

def visualize_generator(num_images):
    noise = torch.randn(num_images, 100)
    with torch.no_grad():
        generated_images = generator(noise)

    plt.figure(figsize=(10, 10))
    for i in range(num_images):
        plt.subplot(5, 5, i + 1)
        plt.imshow(generated_images[i][0].cpu().numpy(), cmap='gray')
        plt.axis('off')
    plt.show()

visualize_generator(25)
        
    

5. Applications of GAN

GANs can be used in many fields beyond image generation, for example style transfer, image restoration, and video generation, and they have attracted significant attention across artificial intelligence. Their continued development keeps revealing new possibilities for generative modeling.

6. Conclusion

In this tutorial, we learned the basics of implementing GANs using PyTorch and understood how GAN operates through actual code. GAN is a technology that will continue to evolve and holds great potential in various applications.
I recommend exploring various modified models based on GAN in the future.