Deep Learning PyTorch Course, Q-Learning

A Deep Dive into Q-Learning Using Deep Learning

1. What is Q-Learning?

Q-Learning is a form of reinforcement learning that enables an agent to learn optimal behavior by interacting with its environment. The core idea of Q-Learning is to maintain a Q-function (often stored as a table) that estimates the value of each possible action in each state. These values help the agent determine the optimal action to take.

Q-Learning is generally based on the Markov Decision Process (MDP) and is composed of the following elements:

  • State (S): The situation the agent is in within the environment.
  • Action (A): The possible actions the agent can take.
  • Reward (R): The score the agent receives for taking a specific action.
  • Value Function (Q): A measure of how good a particular action is in a given state.

2. Q-Learning Algorithm

The Q-Learning algorithm centers on iteratively updating the Q-function. The agent follows the procedure outlined below at each time step:

  1. Select an action based on the current state.
  2. Observe the new state and receive a reward after performing the selected action.
  3. Update the Q function.

The Q function update can be expressed using the following formula:

Q(S, A) ← Q(S, A) + α * [R + γ * max_A' Q(S', A') - Q(S, A)]

Here, α represents the learning rate and γ the discount factor. The learning rate controls how strongly each new experience overrides the current estimate, while the discount factor controls how much future rewards are valued relative to immediate ones.
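As a quick worked example of the update rule: with α = 0.1 and γ = 0.9, if the current estimate is Q(S, A) = 0, the observed reward is R = 1, and the best action in the next state has max_A' Q(S', A') = 2, then the updated value is 0 + 0.1 * (1 + 0.9 * 2 - 0) = 0.28.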

3. Implementing Q-Learning with PyTorch

Now, let’s implement a simple tabular version of Q-Learning. In this example, we will create an environment using OpenAI’s Gym library and train a Q-Learning agent with a NumPy Q-table.

import gym
import numpy as np
import random

# Hyperparameters
LEARNING_RATE = 0.1
DISCOUNT_FACTOR = 0.9
EPISODES = 1000

# Environment setup
# Note: this example assumes the classic Gym API, where env.reset() returns only the
# initial state and env.step() returns (next_state, reward, done, info). Newer
# Gym/Gymnasium releases return additional values (an info dict from reset, and
# separate terminated/truncated flags from step).
env = gym.make('Taxi-v3')
Q_table = np.zeros([env.observation_space.n, env.action_space.n])

def select_action(state, epsilon):
    if random.uniform(0, 1) < epsilon:
        return env.action_space.sample()  # Select random action
    else:
        return np.argmax(Q_table[state])  # Select action with the highest Q value

for episode in range(EPISODES):
    state = env.reset()
    done = False
    epsilon = 1.0 / (episode / 100 + 1)  # Exploration rate

    while not done:
        action = select_action(state, epsilon)
        next_state, reward, done, _ = env.step(action)
        
        # Update Q function
        Q_table[state][action] += LEARNING_RATE * (reward + DISCOUNT_FACTOR * np.max(Q_table[next_state]) - Q_table[state][action])
        
        state = next_state

print("Training Complete")

# Sample Test
state = env.reset()
done = False
while not done:
    action = np.argmax(Q_table[state])  # Select optimal action
    state, reward, done, _ = env.step(action)
    env.render()  # Render environment

4. Advantages and Disadvantages of Q-Learning

The main advantages of Q-Learning are:

  • A simple and easy-to-understand algorithm
  • Operates well in model-free environments

However, it has the following disadvantages:

  • Learning speed may decrease when the state space is large
  • The exploration-exploitation balance can be challenging

© 2023 Deep Learning Blog. All rights reserved.

Deep Learning PyTorch Course, What is an Autoencoder

The autoencoder is a representative unsupervised learning technique in deep learning: a model that compresses input data and then reconstructs it. In this course, we will start with the concept of autoencoders and take a closer look at how to implement them in PyTorch.

1. Concept of Autoencoders

An Autoencoder is a neural network-based unsupervised learning algorithm. It comprises an encoder and a decoder, where the encoder compresses the input data into a latent space and the decoder reconstructs this latent space data back into the original data format.

1.1 Encoder and Decoder

The autoencoder consists of the following two main components:

  • Encoder: Converts the input data into latent variables. In this process, the dimensionality of the input data is reduced while preserving most of the information.
  • Decoder: Reconstructs the original data from the latent variables created by the encoder. The reconstructed data should be as similar as possible to the input data.

1.2 Purpose of Autoencoders

The primary aim of autoencoders is to automatically learn the essential characteristics of input data and compress and reconstruct the data in a way that minimizes information loss. This allows various applications such as data denoising, dimensionality reduction, and generative modeling.

2. Structure of Autoencoders

The structure of an autoencoder can generally be divided into three layers:

  • Input Layer: The layer where the input data enters.
  • Latent Space: The intermediate layer where data is encoded, usually with a lower dimension than the input layer.
  • Output Layer: The layer that outputs the reconstructed data.

3. Implementing Autoencoders in PyTorch

Now that we understand the basic concepts and structure of autoencoders, let’s implement them using PyTorch. In this example, we will use a simple MNIST dataset to encode and decode digit images.

3.1 Installing PyTorch

You can install PyTorch using the following command:

pip install torch torchvision

3.2 Loading the Dataset

We will use the datasets module from the torchvision library to load the MNIST dataset.

import torch
from torchvision import datasets, transforms

# Load and transform MNIST dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Lambda(lambda x: x.view(-1))])
mnist_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
mnist_loader = torch.utils.data.DataLoader(mnist_data, batch_size=64, shuffle=True)

3.3 Defining the Autoencoder Class

Now, let’s create a simple autoencoder class that defines the encoder and decoder.

import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        # Encoder
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128),
            nn.ReLU(True),
            nn.Linear(128, 64),
            nn.ReLU(True))
        
        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(64, 128),
            nn.ReLU(True),
            nn.Linear(128, 28 * 28),
            nn.Sigmoid())
    
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x

3.4 Training the Model

Having prepared the model, we will proceed to training. We will use Mean Squared Error (MSE) as the loss function and Adam as the optimizer.

import torch.optim as optim

# Initialize model, loss function, and optimizer
model = Autoencoder()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    for data in mnist_loader:
        img, _ = data
        # Zero the gradients accumulated from the previous step
        optimizer.zero_grad()
        # Forward pass of the model
        output = model(img)
        loss = criterion(output, img)
        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

3.5 Visualizing the Results

Once training is completed, you can visualize the original images and the reconstructed images to check the results.

import matplotlib.pyplot as plt

# Visualizing the network's output
with torch.no_grad():
    for data in mnist_loader:
        img, _ = data
        output = model(img)
        break

# Comparing original images and reconstructed images
plt.figure(figsize=(9, 2))
for i in range(8):
    # Original image
    plt.subplot(2, 8, i + 1)
    plt.imshow(img[i].view(28, 28), cmap='gray')
    plt.axis('off')
    
    # Reconstructed image
    plt.subplot(2, 8, i + 9)
    plt.imshow(output[i].view(28, 28), cmap='gray')
    plt.axis('off')
plt.show()

4. Use Cases of Autoencoders

Autoencoders can be applied in various fields. Here are some use cases:

  • Dimensionality Reduction: Useful for reducing unnecessary dimensions of data while retaining important information.
  • Denoising: Can be used to remove noise from input data.
  • Anomaly Detection: Learns the patterns of normal data and can identify abnormal data with respect to these patterns (a short sketch follows this list).
  • Data Generation: Can also be used to generate new data.
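As a concrete illustration of the anomaly detection use case, a common approach is to score each sample by its reconstruction error: samples the autoencoder reconstructs poorly are flagged as anomalies. Below is a minimal sketch assuming the trained model and mnist_loader from section 3 above; the threshold is an illustrative placeholder, not a tuned value.

import torch
import torch.nn as nn

# Score samples by per-example reconstruction error
# (assumes the trained `model` and `mnist_loader` from section 3 above)
model.eval()
criterion = nn.MSELoss(reduction='none')
with torch.no_grad():
    img, _ = next(iter(mnist_loader))
    recon = model(img)
    errors = criterion(recon, img).mean(dim=1)  # one reconstruction error per image

threshold = errors.mean() + 3 * errors.std()    # illustrative threshold, not tuned
anomalies = errors > threshold
print(f'{anomalies.sum().item()} of {len(errors)} images flagged as potential anomalies')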

5. Conclusion

Through this course, we have learned the basic concepts, structure, and implementation methods of autoencoders in PyTorch. Autoencoders are powerful tools that can be effectively applied to various problems. In the future, we hope you utilize autoencoders to conduct various experiments.

Deep Learning PyTorch Course, Types of Generative Models

Deep learning has shown remarkable advancements in recent years, significantly impacting various fields. Among them, generative models are gaining attention due to their ability to create data samples. In this article, we will explore various types of generative models, explain how each model works, and provide example code using PyTorch.

What is a Generative Model?

A generative model is a machine learning model that generates new samples from a given data distribution. It can create new data that is similar to the given data but does not exist in the actual data. Generative models are primarily used in various fields such as image generation, text generation, and music generation. The main types of generative models include:

1. Autoencoders

Autoencoders are artificial neural networks that operate by compressing input data and reconstructing the input data from the compressed representation. Autoencoders can generate data through a latent space.

Structure of Autoencoders

Autoencoders can be broadly divided into two parts:

  • Encoder: Maps input data to a latent representation.
  • Decoder: Reconstructs the original data from the latent representation.

Creating an Autoencoder with PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Data preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load MNIST dataset
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)

# Define the autoencoder model
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 64)
        )
        self.decoder = nn.Sequential(
            nn.Linear(64, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = x.view(-1, 784)  # 28*28 = 784
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

# Define model, loss function, and optimizer
model = Autoencoder()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training
num_epochs = 10
for epoch in range(num_epochs):
    for data in train_loader:
        img, _ = data
        optimizer.zero_grad()
        output = model(img)
        # Rescale targets from [-1, 1] (after Normalize) back to [0, 1]
        # so they match the Sigmoid output range expected by BCELoss
        loss = criterion(output, (img.view(-1, 784) + 1) / 2)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
    

The above code is a simple example that trains an autoencoder on MNIST data. The encoder compresses 784 input nodes to 64 latent variables, and the decoder restores them back to 784 outputs.
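Because the encoder compresses each image to 64 values, the trained encoder can be reused directly for dimensionality reduction. Below is a minimal sketch, assuming the model and train_loader defined in the example above.

import torch

# Obtain 64-dimensional latent codes with the trained encoder
# (assumes `model` and `train_loader` from the example above)
model.eval()
with torch.no_grad():
    images, labels = next(iter(train_loader))
    codes = model.encoder(images.view(-1, 784))  # shape: (batch_size, 64)
print(codes.shape)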

2. Generative Adversarial Networks (GANs)

GANs are structured in a way where two neural networks, a generator and a discriminator, learn competitively. The generator creates fake data that resembles real data, and the discriminator determines whether the data is real or fake.

How GANs Work

The training process of GANs proceeds as follows:

  1. The generator takes random noise as input and generates fake images.
  2. The discriminator takes real images and the generated images as input and judges the authenticity of the two types of images.
  3. As the discriminator gets better at telling real images from fake ones, the generator is pushed to produce increasingly realistic images.

Creating a GAN Model with PyTorch

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 784),
            nn.Tanh()
        )

    def forward(self, x):
        return self.model(x)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

# Create model instances
generator = Generator()
discriminator = Discriminator()

# Define loss function and optimizers
criterion = nn.BCELoss()
optimizer_g = optim.Adam(generator.parameters(), lr=0.0002)
optimizer_d = optim.Adam(discriminator.parameters(), lr=0.0002)

# Training process
num_epochs = 100
for epoch in range(num_epochs):
    for data in train_loader:
        real_images, _ = data
        real_labels = torch.ones(real_images.size(0), 1)
        fake_labels = torch.zeros(real_images.size(0), 1)

        # Discriminator training
        optimizer_d.zero_grad()
        outputs = discriminator(real_images.view(-1, 784))
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        noise = torch.randn(real_images.size(0), 100)
        fake_images = generator(noise)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()

        optimizer_d.step()

        # Generator training
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item():.4f}, g_loss: {g_loss.item():.4f}')
    

The above code is a basic example of implementing GANs. The Generator takes a 100-dimensional random noise input and generates a 784-dimensional image, while the Discriminator judges these images.
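After training, new images can be drawn from the generator alone by feeding it fresh noise. Below is a minimal sketch, assuming the trained generator from the code above.

import torch
import matplotlib.pyplot as plt

# Sample new images from the trained generator (assumes `generator` from above)
generator.eval()
with torch.no_grad():
    noise = torch.randn(8, 100)                  # 100-dimensional noise vectors
    images = generator(noise).view(-1, 28, 28)   # reshape the 784 outputs to 28x28
    images = (images + 1) / 2                    # map Tanh range [-1, 1] to [0, 1]

plt.figure(figsize=(8, 1))
for i in range(8):
    plt.subplot(1, 8, i + 1)
    plt.imshow(images[i], cmap='gray')
    plt.axis('off')
plt.show()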

3. Variational Autoencoders (VAEs)

VAEs extend the autoencoder into a proper generative model: they learn a latent distribution over the data, and new, diverse samples can be generated by sampling latent variables from this distribution.

Structure of VAEs

VAEs use variational inference to map input data to a latent space. A VAE consists of an encoder and a decoder: the encoder maps the input data to a mean and variance, a latent vector is sampled from the resulting distribution, and the decoder generates a data point from it.

Creating a VAE Model with PyTorch

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU()
        )
        self.fc_mean = nn.Linear(128, 20)
        self.fc_logvar = nn.Linear(128, 20)
        self.decoder = nn.Sequential(
            nn.Linear(20, 128),
            nn.ReLU(),
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Sigmoid()
        )

    def encode(self, x):
        h = self.encoder(x.view(-1, 784))
        return self.fc_mean(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# Define loss function
def loss_function(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x.view(-1, 784), reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD

# Initialize model and training process
model = VAE()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training process
num_epochs = 10
for epoch in range(num_epochs):
    for data in train_loader:
        img, _ = data
        optimizer.zero_grad()
        recon_batch, mu, logvar = model(img)
        # Rescale targets from [-1, 1] back to [0, 1] so they are valid
        # targets for the binary cross-entropy term in the loss
        loss = loss_function(recon_batch, (img + 1) / 2, mu, logvar)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
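
Because the decoder maps latent vectors back to image space, new digits can be generated after training by sampling from the standard normal prior. Below is a minimal sketch, assuming the trained VAE model from the loop above.

import torch
import matplotlib.pyplot as plt

# Generate new digits by decoding latent vectors drawn from the standard normal prior
# (assumes the trained VAE `model` from the training loop above)
model.eval()
with torch.no_grad():
    z = torch.randn(8, 20)                        # 20 matches the latent dimension above
    samples = model.decoder(z).view(-1, 28, 28)

plt.figure(figsize=(8, 1))
for i in range(8):
    plt.subplot(1, 8, i + 1)
    plt.imshow(samples[i], cmap='gray')
    plt.axis('off')
plt.show()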
    

4. Research Trends and Conclusion

Generative models enable the generation of realistic data, making them applicable in various fields. GANs, VAEs, and autoencoders are widely used in applications such as image generation, video generation, and text generation. Together with other deep learning techniques, these models greatly expand what is possible in data science and artificial intelligence.

As deep learning technologies continue to evolve, generative models are also advancing. Further experiments and research based on the basic concepts and examples covered in this article are necessary.

If you wish to delve deeper into the potential applications of generative models through deep learning, it is recommended to refer to papers or advanced learning materials for more case studies.

Hope this post helps in understanding generative models and appreciating the allure of deep learning.

Deep Learning PyTorch Course, Concept of Generative Models

Welcome to the world of deep learning! Today, we will delve into why generative models are important and how to implement them in PyTorch.

1. What is a Generative Model?

A Generative Model refers to a model that generates new data by modeling a given data distribution. It originates from statistical concepts and aims to understand the distribution from a given dataset and create new samples based on it.

Generative models are broadly divided into two types:

  • Probabilistic Generative Models
  • Deep Generative Models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs)

2. Applications of Generative Models

Generative models are used in various fields:

  • Image Generation: For example, high-resolution images can be generated using GANs.
  • Text Generation: It can be used in natural language processing to automatically write articles on a specific topic.
  • Music Generation: AI can assist in composing new music.
  • Model Training: It can be used as a data augmentation tool to improve the performance of the model.

3. How Generative Models Work

Generative models operate by learning the underlying structure of the data. These models focus on generating new samples that are similar to the data and do so through the following processes.

  1. Data Collection: Sufficiently diverse data must be collected to train the model.
  2. Model Design: Choose a model architecture that can well reflect the characteristics of the data.
  3. Training: Train the model to learn the distribution of the data.
  4. Sampling: Use the trained model to generate new data.

4. Implementing Generative Models in PyTorch

Now, let’s implement a simple generative model using PyTorch. In this section, we will create a simple GAN model.

4.1 Overview of GAN

GAN consists of two neural network models, namely the Generator and the Discriminator. The goal of the generator is to produce fake data that is similar to real data, while the objective of the discriminator is to determine whether the input data is real or fake. The two networks are in a competitive relationship, improving each other’s performance in the process.

4.2 GAN Code Example

Below is an example code for implementing GAN using PyTorch:

    
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Hyperparameters
latent_size = 100
num_epochs = 200
batch_size = 64
learning_rate = 0.0002

# Transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# MNIST dataset
mnist = torchvision.datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
data_loader = torch.utils.data.DataLoader(dataset=mnist, batch_size=batch_size, shuffle=True)

# Generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(latent_size, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 784),
            nn.Tanh()
        )

    def forward(self, x):
        return self.network(x).view(-1, 1, 28, 28)

# Discriminator model
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.network(x.view(-1, 784))

# Initialize models
generator = Generator()
discriminator = Discriminator()

# Loss and optimizer
criterion = nn.BCELoss()
optimizer_g = optim.Adam(generator.parameters(), lr=learning_rate)
optimizer_d = optim.Adam(discriminator.parameters(), lr=learning_rate)

# Training
for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(data_loader):
        # Labels sized to the actual batch (the final batch may be smaller than batch_size)
        real_labels = torch.ones(real_images.size(0), 1)
        fake_labels = torch.zeros(real_images.size(0), 1)

        # Train discriminator
        optimizer_d.zero_grad()
        outputs = discriminator(real_images)
        d_loss_real = criterion(outputs, real_labels)

        z = torch.randn(real_images.size(0), latent_size)
        fake_images = generator(z)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)

        d_loss = d_loss_real + d_loss_fake
        d_loss.backward()
        optimizer_d.step()

        # Train generator
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()

    # Print losses and save generated images
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')

        with torch.no_grad():
            fake_images = generator(z)
            fake_images = fake_images.view(-1, 1, 28, 28)
            grid = torchvision.utils.make_grid(fake_images, normalize=True)
            plt.imshow(grid.detach().numpy().transpose(1, 2, 0))
            plt.show()
    
    

4.3 Code Explanation

The above code shows the implementation of a simple GAN model. Let’s take a closer look at each part:

  • Data Loading: Downloads and normalizes the MNIST dataset.
  • Generator: Takes a random vector of 100 dimensions as input and generates a 28×28 size image.
  • Discriminator: Takes the input image and predicts whether it is real or fake.
  • Training Process: Trains the discriminator and generator alternately. The discriminator learns to distinguish between real and generated images, while the generator learns to produce images that fool the discriminator.

5. Future and Development Direction of Generative Models

Generative models have many possibilities, and their applications are expected to grow in various fields. In particular, deep generative models such as GANs and VAEs have made significant advancements in recent years, and new techniques and architectures for them are continuously being developed.

Moreover, generative models provide innovative opportunities in diverse areas such as healthcare, arts, autonomous driving, and robotics, and ethical and legal issues arising from them are also important factors to consider.

Conclusion

Today, we explored the concept of generative models and a simple GAN implementation using PyTorch. Generative models hold great potential in data generation, data augmentation, and various other fields, and future advancements are expected. Now, we hope you will step into the world of generative models!

© 2023 Deep Learning Institute

Deep Learning PyTorch Course, Variational Autoencoder

Deep learning is a field of machine learning that utilizes neural networks to learn patterns from data. In this article, we will delve deeply into Variational Autoencoder (VAE).

1. What is an Autoencoder?

An autoencoder is an unsupervised learning method that learns to compress input data and then reconstruct it. An autoencoder consists of two parts: an encoder and a decoder.

  • Encoder: Maps input data to a latent space.
  • Decoder: Restores data from the latent space to the original input data.

1.1 The Process of Autoencoder

The training process of an autoencoder minimizes the difference between the input data and the reconstructed output. To do this, a loss function measures the difference between the original input and its reconstruction; the Mean Squared Error (MSE) loss is commonly used.
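In PyTorch this reconstruction objective is typically a single line. Below is a minimal illustration of the MSE reconstruction loss; x and x_hat are hypothetical stand-ins for a batch of inputs and the autoencoder's reconstructions.

import torch
import torch.nn as nn

# Hypothetical batch of flattened inputs and their reconstructions
x = torch.rand(64, 784)
x_hat = torch.rand(64, 784)

# Mean squared difference between input and reconstruction
reconstruction_loss = nn.MSELoss()(x_hat, x)
print(reconstruction_loss.item())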

2. Variational Autoencoder (VAE)

The Variational Autoencoder is an extended model of the traditional autoencoder, aimed at estimating the probability distribution of the input data. VAE, as a generative model, has the ability to generate new data.

2.1 Components of VAE

VAE consists of the following two main components:

  • Latent Variable: When encoding input data, the encoder outputs a mean (μ) and a variance (in practice, the log variance) that parameterize the distribution of the latent variables.
  • Reconstruction Loss: Measures the difference between the output generated by the decoder and the original input.

2.2 Loss Function

VAE’s loss function can be divided into two parts:

  • Reconstruction Loss: Measures the loss between the actual input and the reconstructed input.
  • Kullback-Leibler Divergence: Measures the difference between the latent distribution and the normal distribution.

Definition of VAE Loss Function:

L = E[log p(x|z)] - D_{KL}(q(z|x) || p(z))
    

Where:

  • E[log p(x|z)]: The expected log-likelihood of reconstructing x from the latent variable z (the reconstruction term).
  • D_{KL}: Kullback-Leibler Divergence which measures the difference between two distributions.
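For the common choice where q(z|x) is a Gaussian N(μ, σ²) and the prior p(z) is the standard normal N(0, 1), the KL term has a closed form, which is exactly what appears in the PyTorch implementation below:

D_KL(q(z|x) || p(z)) = -0.5 * Σ (1 + log σ² - μ² - σ²)

In practice, the quantity minimized during training is the negative of L above, i.e. the reconstruction loss plus this KL term.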

3. Implementing VAE with PyTorch

Now that we understand the basic components and loss function of the Variational Autoencoder, let’s implement VAE using PyTorch.

3.1 Install Libraries

pip install torch torchvision matplotlib
    

3.2 Prepare Dataset

We will implement a VAE to recognize handwritten digits using the MNIST dataset. MNIST is a dataset consisting of 28×28 pixel grayscale images.

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1))
])

mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=128, shuffle=True)
    

3.3 Define Model

To construct the Variational Autoencoder model, we define the encoder and decoder classes.

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(input_dim, 400)
        self.fc21 = nn.Linear(400, latent_dim)  # Mean
        self.fc22 = nn.Linear(400, latent_dim)  # Log Variance
        
    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        mu = self.fc21(h1)
        logvar = self.fc22(h1)
        return mu, logvar


class Decoder(nn.Module):
    def __init__(self, latent_dim, output_dim):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(latent_dim, 400)
        self.fc2 = nn.Linear(400, output_dim)
        
    def forward(self, z):
        h2 = torch.relu(self.fc1(z))
        return torch.sigmoid(self.fc2(h2))
    
class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(VAE, self).__init__()
        self.encoder = Encoder(input_dim, latent_dim)
        self.decoder = Decoder(latent_dim, input_dim)
        
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
    
    def forward(self, x):
        mu, logvar = self.encoder(x)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
    

3.4 Define Loss Function

We define the loss function for VAE. Here, we will implement it using PyTorch’s functionalities.

def vae_loss(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
    

3.5 Train the Model

We train the model using a training loop. For each batch, we compute the loss and perform backpropagation to update the weights.

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VAE(784, 20).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(10):
    train_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = data.to(device)
        optimizer.zero_grad()
        recon_batch, mu, logvar = model(data)
        loss = vae_loss(recon_batch, data, mu, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()
    
    print(f'Epoch {epoch+1}, Loss: {train_loss / len(train_loader.dataset)}')
    

3.6 Check Results

Once the training is complete, we can use the model to generate new data and see how similar it is to the training data.

import matplotlib.pyplot as plt

def visualize_results(model, num_images=10):
    with torch.no_grad():
        z = torch.randn(num_images, 20).to(device)
        sample = model.decoder(z).cpu()
        sample = sample.view(num_images, 1, 28, 28)
        
    plt.figure(figsize=(10, 1))
    for i in range(num_images):
        plt.subplot(1, num_images, i + 1)
        plt.imshow(sample[i].squeeze(), cmap='gray')
        plt.axis('off')
    plt.show()

visualize_results(model)
    

4. Conclusion

In this tutorial, we explored the concept of the Variational Autoencoder and how to implement it using PyTorch. VAE has the capability to learn the latent distribution of data and generate new samples, which can be utilized in various generative modeling tasks. This technique can be applied for interesting tasks such as generating images, text, and audio data.

Furthermore, VAE can contribute to the implementation of more powerful and diverse generative models when combined with other generative models like GAN. In particular, VAE helps explore and sample from the latent space of high-dimensional data.
