root, author at 라이브스마트

Deep Learning PyTorch Course, Dynamic Programming

The recent advancements in artificial intelligence and machine learning have been remarkable, with deep learning emerging as one of the most promising fields. Deep learning is a powerful method for learning meaningful patterns from data. In this course, we will cover how to build deep learning models using the PyTorch framework and the dynamic programming techniques involved.

1. What is Dynamic Programming?

Dynamic Programming (DP) is a methodology for solving complex problems by breaking them down into simpler subproblems. Generally, it involves solving large problems by dividing them into smaller ones and then combining the results to obtain a final solution, utilizing memoization or a table to store the results of subproblems.

1.1 Characteristics of Dynamic Programming

Overlapping Subproblems: When the same subproblem is solved multiple times.
Optimal Substructure: The optimal solution of a problem can be constructed from optimal solutions of its subproblems.
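
To make the two characteristics concrete, here is a minimal sketch in plain Python (no PyTorch needed) that counts how often subproblems are solved with and without memoization. The function names and the `calls` counter are illustrative assumptions, not part of the course code.

```python
from functools import lru_cache

# Counters for how many times each version is invoked (illustrative only)
calls = {"naive": 0, "memo": 0}

def fib_naive(n):
    """Plain recursion: the same subproblems are recomputed many times."""
    calls["naive"] += 1
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """Memoized recursion: each subproblem is solved once, then read from the cache."""
    calls["memo"] += 1
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

naive_result = fib_naive(20)
memo_result = fib_memo(20)
print(f"naive: F(20) = {naive_result} after {calls['naive']} calls")
print(f"memo:  F(20) = {memo_result} after {calls['memo']} calls")
```

Both versions rely on optimal substructure (F(n) is built from F(n-1) and F(n-2)); only the memoized one exploits the overlap between subproblems.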

2. Introduction to PyTorch

PyTorch is an open-source machine learning library developed by Facebook, primarily used for deep learning research and prototyping. Its excellent flexibility and performance have led to its widespread usage, supporting tensor operations, automatic differentiation, and GPU acceleration.
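
As a small illustration of tensor operations and automatic differentiation, the sketch below differentiates y = sum(x^2) with respect to x. The values are arbitrary and chosen only for this example.

```python
import torch

# A tensor that tracks gradients
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar function of x built from ordinary tensor operations
y = (x ** 2).sum()

# Automatic differentiation: fills x.grad with dy/dx = 2x
y.backward()
grad = x.grad

print(y.item())
print(grad)
```

Moving the same computation to a GPU only requires creating the tensor on that device, for example with `device="cuda"` when one is available.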

3. Example of Dynamic Programming Using PyTorch

Here, we will explain the basic usage of PyTorch by using a dynamic programming algorithm to compute the Fibonacci sequence.

3.1 Definition of the Fibonacci Sequence

The Fibonacci sequence is defined as F(n) = F(n-1) + F(n-2) (n >= 2), with initial conditions F(0) = 0 and F(1) = 1. Using dynamic programming, we can compute this sequence efficiently.

3.2 Implementation in PyTorch

import torch

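def fibonacci_dynamic(n):
    # Base cases F(0) = 0 and F(1) = 1 need no table (and fib[1] would not exist for n = 0)
    if n < 2:
        return torch.tensor(n, dtype=torch.long)

    # Initialize a tensor to store Fibonacci numbers
    # (torch.long is a 64-bit integer, so results overflow for n > 92)
    fib = torch.zeros(n + 1, dtype=torch.long)
    fib[1] = 1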

    # Fill the tensor using dynamic programming
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]

    return fib[n]

# Example Execution
n = 10
result = fibonacci_dynamic(n)
print(f"Fibonacci number at position {n} is: {result.item()}")

3.3 Code Explanation

The code above demonstrates the process of efficiently calculating the n-th term of the Fibonacci sequence using PyTorch.

Tensor Initialization: A tensor of size n+1 initialized to zero is created using the command torch.zeros(n + 1, dtype=torch.long).
Dynamic Programming Implementation: Each Fibonacci number is calculated and stored through a loop.
Returning the Result: Finally, the n-th Fibonacci number is returned.
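
The table keeps every term, but each step only reads the previous two. As a hedged alternative, the sketch below keeps just two variables in plain Python; the function name is an assumption made for this example. Python integers do not overflow, whereas a torch.long tensor holds 64-bit values and cannot represent terms beyond F(92).

```python
def fibonacci_two_vars(n):
    """Bottom-up Fibonacci that stores only the last two terms."""
    prev, curr = 0, 1  # F(0), F(1)
    for _ in range(n):
        # Slide the window one position forward
        prev, curr = curr, prev + curr
    return prev

print(fibonacci_two_vars(10))
```

This trades the ability to look up earlier terms for constant memory use.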

4. Applications of Dynamic Programming

Dynamic programming is very useful in a variety of algorithmic problems and optimization problems. Notable examples include the Longest Common Subsequence (LCS) problem, Knapsack problem, and Coin Change problem.
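
As one example from this list, the Coin Change problem (fewest coins that sum to a target amount) can be sketched with a one-dimensional table. This is a minimal plain-Python sketch; the function name and the convention of returning -1 for unreachable amounts are assumptions of this example.

```python
def min_coins(coins, amount):
    """Return the fewest coins needed to make `amount`, or -1 if impossible."""
    INF = float("inf")
    # dp[v] = fewest coins that sum to v; dp[0] = 0 by definition
    dp = [0] + [INF] * amount
    for value in range(1, amount + 1):
        for coin in coins:
            # Extend the best solution for (value - coin) by one coin
            if coin <= value and dp[value - coin] + 1 < dp[value]:
                dp[value] = dp[value - coin] + 1
    return dp[amount] if dp[amount] != INF else -1

print(min_coins([1, 2, 5], 11))
```

Each entry is computed once from smaller entries, which is the same overlapping-subproblem pattern as in the Fibonacci table.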

4.1 Longest Common Subsequence (LCS)

This is a problem of finding the longest common subsequence of two strings, which can be effectively solved using dynamic programming.

def lcs(X, Y):
    m = len(X)
    n = len(Y)
    # Integer table, so the returned length is an integer rather than a float
    L = torch.zeros(m + 1, n + 1, dtype=torch.long)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    return L[m][n]

# Example Execution
X = 'AGGTAB'
Y = 'GXTXAYB'
result = lcs(X, Y)
print(f"Length of LCS is: {result.item()}")

4.2 Code Explanation

The code above calculates the length of the longest common subsequence of the two strings X and Y.

Table Initialization: A 2D tensor of size (m+1)x(n+1) is created based on the lengths of the two strings.
Comparison and Update: Each character of the two strings is compared to update the LCS length.
Returning the Result: Finally, the length of the LCS is returned.
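
The table holds more than the length: walking back from the bottom-right cell recovers one longest common subsequence. The following is a minimal sketch using plain Python lists instead of a tensor; the function name is an assumption of this example.

```python
def lcs_string(X, Y):
    """Return one longest common subsequence of X and Y."""
    m, n = len(X), len(Y)
    table = [[0] * (n + 1) for _ in range(m + 1)]

    # Fill the table exactly as in the length-only version
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Backtrack from the bottom-right corner to collect matched characters
    chars = []
    i, j = m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            chars.append(X[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(chars))

print(lcs_string("AGGTAB", "GXTXAYB"))
```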

5. Conclusion

Dynamic programming is a crucial technique in coding interviews and algorithm problems. Implementing it with PyTorch tensors as the table is also a convenient way to practice tensor creation and indexing. Through this course, we have learned the basic principles of dynamic programming and worked through examples in PyTorch. These techniques can be applied to a variety of real-world problems.

Deep Learning PyTorch Course, Principles of GAN Operation

A Generative Adversarial Network (GAN) is a deep learning technique introduced by Ian Goodfellow and his colleagues in 2014. A GAN consists of two neural networks known as the ‘Generator’ and the ‘Discriminator’. These two networks compete with each other as they learn, with the aim of generating high-quality data. In this course, we will explore the mechanism of GAN, its components, the training process, and an implementation example using PyTorch in detail.

1. Basic Structure of GAN

GAN is set up as a competitive structure between two neural networks, namely the Generator and the Discriminator. This structure works as follows:

Generator: Takes a random noise vector as input and generates fake data.
Discriminator: Determines whether the given data is real or fake data created by the Generator.

These two networks are trained simultaneously, with the Generator improving to create fake data that deceives the Discriminator, and the Discriminator improving to distinguish between fake and real data.

2. Mathematical Operating Principle of GAN

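GAN training is a two-player minimax game over the following value function, which the Discriminator tries to maximize and the Generator tries to minimize:


min_G max_D V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]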

Where,

D(x): The output of the Discriminator for the real data x. (Closer to 1 means real data, closer to 0 means fake data)
G(z): The data generated by the Generator through random noise z.
D(G(z)): The probability returned by the Discriminator for the generated data.

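The Discriminator is trained to output values close to 1 for real data and close to 0 for generated data, while the Generator is trained to make D(G(z)) close to 1. As the Discriminator gets better at telling the two apart, the Generator is pushed to produce data that is increasingly similar to real data.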
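
In code, the two log terms are usually computed with binary cross-entropy. The sketch below uses made-up Discriminator outputs (the numbers are assumptions for illustration) to show that the Discriminator's BCE loss is the negative of log D(x) + log(1 - D(G(z))) averaged over the batch, and that labelling fake samples as real gives the Generator loss -log D(G(z)).

```python
import torch
import torch.nn.functional as F

# Hypothetical Discriminator outputs for a batch of two samples
d_real = torch.tensor([0.9, 0.8])  # D(x) on real data
d_fake = torch.tensor([0.2, 0.1])  # D(G(z)) on generated data

# Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))]
value = torch.log(d_real).mean() + torch.log(1 - d_fake).mean()

# Discriminator loss: BCE with label 1 for real and label 0 for fake
d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

# Generator loss: BCE on fake samples with label 1, i.e. -log D(G(z))
g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))

print(value.item(), d_loss.item(), g_loss.item())
```

Minimizing `d_loss` is therefore the same as maximizing the value function with respect to the Discriminator.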

3. Components of GAN

3.1 Generator

The Generator is typically composed of fully connected layers or convolutional layers. It takes a random vector z as input and generates information similar to real data.

3.2 Discriminator

The Discriminator receives input data (either real or generated) to judge whether it is real or fake. This can also be designed as a fully connected or convolutional network.

4. Training Process of GAN

The training of GAN consists of the following steps:

Select real data and sample a random noise vector z.
The Generator takes the noise z as input and creates fake data.
The Discriminator evaluates the real data and the data created by the Generator.
Calculate the Discriminator’s loss and perform backpropagation to update the Discriminator.
Calculate the Generator’s loss and perform backpropagation to update the Generator.

This process is repeated, improving both networks.

5. PyTorch Implementation Example of GAN

The following is a simple example of implementing GAN using PyTorch. Here, we will create a model that generates digit images using the MNIST dataset.

5.1 Install Libraries and Load Dataset


import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

First, we import the necessary libraries and load the MNIST dataset.


# Download and load the MNIST dataset
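transform = transforms.Compose([
    transforms.Resize(28),
    transforms.ToTensor(),
    # Scale pixels from [0, 1] to [-1, 1] to match the Generator's Tanh output
    transforms.Normalize((0.5,), (0.5,)),
])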

mnist = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
dataloader = DataLoader(mnist, batch_size=64, shuffle=True)

5.2 Define the Generator Model

The Generator model takes random noise as input and generates images similar to real ones.


class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 128),
            nn.ReLU(),
            nn.Linear(128, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 28*28),  # MNIST image size
            nn.Tanh()  # Adjusting pixel value range to [-1, 1]
        )

    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

5.3 Define the Discriminator Model

The Discriminator model takes an input image and determines whether it is real or generated.


class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Flatten(),  # Flattening the image shape into one dimension
            nn.Linear(28*28, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid()  # Output probability
        )

    def forward(self, x):
        return self.model(x)

5.4 Define Loss Function and Optimizers


# Create Generator and Discriminator
generator = Generator()
discriminator = Discriminator()

# Loss function
criterion = nn.BCELoss()

# Optimizers
optimizer_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

5.5 GAN Training Loop

Now we define the loop that trains the GAN. For each batch, we first update the Discriminator and then the Generator; the losses of the last batch are printed at the end of every epoch.


num_epochs = 50

for epoch in range(num_epochs):
    for real_images, _ in dataloader:
        batch_size = real_images.size(0)

        # Labels for real images
        real_labels = torch.ones(batch_size, 1)
        # Labels for fake images
        fake_labels = torch.zeros(batch_size, 1)

        # Train Discriminator
        discriminator.zero_grad()
        outputs = discriminator(real_images)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        # Generate fake data
        noise = torch.randn(batch_size, 100)
        fake_images = generator(noise)

        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()

        optimizer_d.step()

        # Train Generator
        generator.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item()}, g_loss: {g_loss.item()}')
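
The call to detach() in the Discriminator step matters: it stops the Discriminator's loss from sending gradients into the Generator. The following minimal sketch shows this with tiny stand-in networks (the layer sizes and variable names are assumptions for illustration, not the models defined above).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-ins for the Generator and Discriminator
gen = nn.Linear(4, 3)
disc = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
criterion = nn.BCELoss()

noise = torch.randn(5, 4)
fake = gen(noise)

# Discriminator step: detach() cuts the graph, so gen receives no gradient
d_loss = criterion(disc(fake.detach()), torch.zeros(5, 1))
d_loss.backward()
grad_after_d_step = gen.weight.grad

# Generator step: no detach(), so the gradient flows back through disc into gen
disc.zero_grad()
g_loss = criterion(disc(fake), torch.ones(5, 1))
g_loss.backward()
grad_after_g_step = gen.weight.grad

print(grad_after_d_step is None, grad_after_g_step.shape)
```

Without detach(), the Discriminator's backward pass would also accumulate gradients in the Generator, which would then be mixed into its next update.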

6. Applications of GAN

GAN can be applied in various fields. Some examples include:

Image generation and transformation
Video generation
Music generation
Data augmentation
Medical image analysis
Style transfer

7. Conclusion

GAN is a highly innovative concept in the field of deep learning, widely used for data generation and transformation. In this course, we explored the basic principles of GAN and a simple implementation using PyTorch. GANs are challenging to work with because the models are complex and training can be unstable, but their potential is tremendous.

I encourage you to learn about various modifications and advanced techniques of GAN and apply them to real-world projects.

Deep Learning PyTorch Course, GAN Implementation

In this course, we will provide an in-depth explanation of how to implement GAN (Generative Adversarial Network) using PyTorch. GAN is a framework for training generative models and is used in various fields such as image generation, style transfer, and data augmentation. The course starts with the basic concepts of GAN, implements each component, and finally helps you understand how GAN works through practical examples.

1. Basic Concepts of GAN

GAN consists of two main components: the Generator and the Discriminator. These two models learn by competing against each other, which is the core of GAN.

1.1 Generator

The role of the generator is to take random noise as input and generate fake data that is similar to real data. This model learns how to mimic real data.

1.2 Discriminator

The discriminator serves to distinguish whether the input data is real data or fake data generated by the generator. This model learns how to differentiate between real and fake data.

1.3 Training Process of GAN

The training of GAN progresses in a way that the generator and discriminator compete against each other. The generator tries to create increasingly better fake data to fool the discriminator, while the discriminator strives to recognize such fake data. As this process repeats, both models progressively improve.

2. Implementing Components of GAN

Now, we will implement the key components necessary to build GAN through coding. Here, we will implement a simple GAN and create a model to generate handwritten digits from the MNIST dataset.

2.1 Setting Up the Environment

First, we will install the necessary libraries and download the MNIST dataset to prepare it.

!pip install torch torchvision matplotlib
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt

2.2 Loading the Dataset

We load the MNIST dataset and perform preprocessing.

# Preparing the dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
mnist = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(mnist, batch_size=64, shuffle=True)
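
Normalize((0.5,), (0.5,)) computes (x - mean) / std for each channel, which maps the [0, 1] range produced by ToTensor to [-1, 1], the same range as a Tanh output. The arithmetic can be checked on a few sample pixel values (chosen arbitrarily for this sketch):

```python
import torch

# Pixel values as ToTensor() would produce them, in [0, 1]
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])
mean, std = 0.5, 0.5

# What Normalize((0.5,), (0.5,)) does to each value
normalized = (pixels - mean) / std

# Inverse mapping, useful before displaying generated images
restored = normalized * std + mean

print(normalized)
print(restored)
```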

2.3 Implementing the Generator Model

The generator is a neural network that takes an input noise vector and transforms it into an image.

import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(100, 256),
            nn.ReLU(),
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Linear(1024, 28 * 28),
            nn.Tanh()
        )
    
    def forward(self, z):
        return self.model(z).view(-1, 1, 28, 28)

2.4 Implementing the Discriminator Model

The discriminator is a model that determines whether the input image is a real image or a fake image.

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid() 
        )
    
    def forward(self, img):
        return self.model(img.view(-1, 28 * 28))

2.5 Initializing the Models

We initialize the generator and discriminator models, define the loss function, and set the optimizer.

generator = Generator()
discriminator = Discriminator()

criterion = nn.BCELoss()
optimizer_gen = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_disc = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

2.6 GAN Training Loop

Next, we will implement the training loop for GAN. We will compute the loss for the generator and the discriminator and update the weights using the optimizer.

def train_gan(num_epochs):
    for epoch in range(num_epochs):
        for i, (imgs, _) in enumerate(dataloader):
            z = torch.randn(imgs.size(0), 100)
            real_labels = torch.ones(imgs.size(0), 1)
            fake_labels = torch.zeros(imgs.size(0), 1)

            # Training the discriminator
            optimizer_disc.zero_grad()
            outputs = discriminator(imgs)
            d_loss_real = criterion(outputs, real_labels)
            d_loss_real.backward()

            fake_imgs = generator(z)
            outputs = discriminator(fake_imgs.detach())
            d_loss_fake = criterion(outputs, fake_labels)
            d_loss_fake.backward()
            optimizer_disc.step()

            # Training the generator
            optimizer_gen.zero_grad()
            outputs = discriminator(fake_imgs)
            g_loss = criterion(outputs, real_labels)
            g_loss.backward()
            optimizer_gen.step()

        if (epoch + 1) % 10 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], d_loss: {d_loss_real.item() + d_loss_fake.item():.4f}, g_loss: {g_loss.item():.4f}')

3. Running GAN

Now, let’s train the GAN and visualize the generated image results.

num_epochs = 100
train_gan(num_epochs)

def show_generated_images(generator, num_images=16):
    z = torch.randn(num_images, 100)
    fake_images = generator(z).detach()
    plt.figure(figsize=(10, 10))
    for i in range(num_images):
        plt.subplot(4, 4, i + 1)
        plt.imshow(fake_images[i][0], cmap='gray')
        plt.axis('off')
    plt.show()

show_generated_images(generator)

4. Conclusion

In this course, we explored the basic concepts of GAN and the process of implementing a simple GAN model using PyTorch. GAN can be applied in various fields such as image generation and style transfer, expanding the possibilities of artificial intelligence. It would also be beneficial to explore more complex variations of GAN based on this course.

This concludes the course on implementing GAN using deep learning with PyTorch. If you have any questions or need more information during the learning process, feel free to ask in the comments!

Deep Learning PyTorch Course, CycleGAN

The advancement of deep learning has opened up possibilities for image transformation and generation models in various fields. Generative Adversarial Networks (GANs) lie at the core of these advancements, and among them, CycleGAN is particularly recognized as a useful model for style transfer.
In this article, I will explain the principles, applications, and implementation process of CycleGAN using Python’s PyTorch library in detail.

1. Overview of CycleGAN

CycleGAN is a model used to learn image transformation between two image domains. This model consists of two generators that convert images from one domain to another and two discriminators that differentiate between the generated images and the real images in their respective domains.
CycleGAN is particularly advantageous when paired training examples are not available, that is, when there is no one-to-one correspondence between images in the two domains. For example, it can be used for tasks such as converting photos to paintings or transforming summer images into winter images.

2. Structure of CycleGAN

The basic structure of CycleGAN consists of four main components.

Generator G: Converts images from domain X to images in domain Y.
Generator F: Converts images from domain Y to images in domain X.
Discriminator D_X: Differentiates between real images from domain X and transformed images generated by F.
Discriminator D_Y: Differentiates between real images from domain Y and transformed images generated by G.

2.1. Loss Function

CycleGAN is trained using several loss functions. The main loss functions include:

Adversarial Loss: Evaluates the performance of the generator based on the discriminator’s ability to distinguish between generated and real images.
Cycle Consistency Loss: Applies the principle that the original image should be reconstructed after transforming from X to Y and then back to X, and likewise from Y to X and back to Y. In other words, it should follow F(G(x)) ≈ x and G(F(y)) ≈ y.
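
The cycle consistency term can be sketched as an L1 distance between each image and its round-trip reconstruction. In the sketch below, the two 1x1 convolutions are stand-ins for the real generators, and the weight `lambda_cycle = 10.0` is an assumption of this example (it is the value used in the CycleGAN paper).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in generators: G maps X -> Y, F_net maps Y -> X
G = nn.Conv2d(3, 3, kernel_size=1)
F_net = nn.Conv2d(3, 3, kernel_size=1)
l1 = nn.L1Loss()

# Random tensors standing in for image batches from the two domains
real_x = torch.rand(2, 3, 8, 8)
real_y = torch.rand(2, 3, 8, 8)

# Forward cycle: x -> G(x) -> F(G(x)) should be close to x
forward_cycle = l1(F_net(G(real_x)), real_x)
# Backward cycle: y -> F(y) -> G(F(y)) should be close to y
backward_cycle = l1(G(F_net(real_y)), real_y)

lambda_cycle = 10.0  # assumed weight of the cycle term
cycle_loss = lambda_cycle * (forward_cycle + backward_cycle)
print(cycle_loss.item())
```

This term is added to the adversarial losses of both generators, so a single backward pass updates G and F together.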

3. Implementing CycleGAN

Now, let’s implement CycleGAN using PyTorch. This process includes data preparation, model definition, setting loss functions and optimization, the training loop, and results visualization.

3.1. Data Preparation

To train CycleGAN, two image domains are needed. We will use ‘summer’ and ‘winter’ image datasets as examples. Popular public datasets such as Apple2Orange and Horse2Zebra can be utilized. The code below shows how to load the datasets.


import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Define data transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

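# Load data. ImageFolder expects the images inside at least one subdirectory
# of each root (for example data/summer/images/...), which it treats as a class.
summer_dataset = ImageFolder(root='data/summer', transform=transform)
winter_dataset = ImageFolder(root='data/winter', transform=transform)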

summer_loader = DataLoader(summer_dataset, batch_size=1, shuffle=True)
winter_loader = DataLoader(winter_dataset, batch_size=1, shuffle=True)

3.2. Model Definition

The original CycleGAN paper builds its generators from residual blocks. To keep the example short, the following code defines a much simpler encoder-decoder generator that downsamples with strided convolutions and upsamples with transposed convolutions.


import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.hidden_layers = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=1, padding=3),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            # Intermediate layers
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
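            # Decoder (output_padding=1 makes each layer exactly double the spatial size)
            nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),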
            nn.Conv2d(64, 3, kernel_size=7, stride=1, padding=3),
        )

    def forward(self, x):
        return self.hidden_layers(x)
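
For the cycle consistency loss to be computable, the generator's output must have the same spatial size as its input. The sketch below applies the standard output-size formulas for Conv2d and ConvTranspose2d (with dilation 1) to the layer settings used above; the helper names are assumptions of this example. It shows that a transposed convolution with kernel 3, stride 2 and padding 1 restores the doubled size only when output_padding=1.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output size of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1

# Encoder path for a 256 x 256 input
size = conv_out(256, kernel=7, stride=1, padding=3)
size = conv_out(size, kernel=3, stride=2, padding=1)
encoded = conv_out(size, kernel=3, stride=2, padding=1)

# Decoder path without output_padding
plain = conv_transpose_out(encoded, 3, 2, 1)
plain = conv_transpose_out(plain, 3, 2, 1)

# Decoder path with output_padding=1
padded = conv_transpose_out(encoded, 3, 2, 1, output_padding=1)
padded = conv_transpose_out(padded, 3, 2, 1, output_padding=1)

print(encoded, plain, padded)
```

The final 7x7 convolution with padding 3 leaves the size unchanged, so the decoder path alone decides whether the output matches the input.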

3.3. Loss Function and Optimization Setup

Now we will set the loss functions and optimization algorithms. We will use the binary cross-entropy loss function for real-fake discrimination and Cycle Consistency Loss.


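device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# G maps X -> Y (summer -> winter); F maps Y -> X
generator_G, generator_F = Generator().to(device), Generator().to(device)

criterion_gan = nn.BCELoss()   # adversarial loss
criterion_cycle = nn.L1Loss()  # cycle consistency loss

# Adam optimizer over the parameters of both generators
gen_params = list(generator_G.parameters()) + list(generator_F.parameters())
optimizer_G = torch.optim.Adam(gen_params, lr=0.0002, betas=(0.5, 0.999))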
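
The training loop in the next subsection calls discriminator_D_Y, but this article does not define a discriminator. The following is a minimal sketch under stated assumptions: a small convolutional classifier that outputs a grid of real/fake probabilities, ending in a Sigmoid so that it is compatible with BCELoss. The layer sizes are illustrative and are not taken from the article or from the CycleGAN paper.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Small convolutional classifier: image -> grid of real/fake probabilities."""

    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),
            nn.Sigmoid(),  # probabilities in (0, 1), as BCELoss expects
        )

    def forward(self, x):
        return self.model(x)

# One discriminator per domain
discriminator_D_X = Discriminator()
discriminator_D_Y = Discriminator()

# Shape check on a random stand-in image
sample = torch.rand(1, 3, 64, 64)
pred = discriminator_D_Y(sample)

# Target labels must have the same shape as the prediction grid
real_labels = torch.ones_like(pred)
print(pred.shape)
```

Because the output is a grid rather than a single value, creating the labels with torch.ones_like or torch.zeros_like keeps them aligned with the prediction for any input size.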

3.4. Training Loop

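In the training loop, we train the model and record loss values. The basic structure can be written as follows. To stay short, it shows only the generator update for the X → Y direction (adversarial loss plus cycle consistency loss) and assumes a discriminator discriminator_D_Y that outputs probabilities; a complete implementation also trains the reverse direction and both discriminators.


num_epochs = 200
for epoch in range(num_epochs):
    for (summer_images, winter_images) in zip(summer_loader, winter_loader):
        # Each loader yields (images, class_indices); keep only the images
        real_A = summer_images[0].to(device)
        real_B = winter_images[0].to(device)

        # Forward cycle: X -> Y -> X
        fake_B = generator_G(real_A)
        cycled_A = generator_F(fake_B)

        # Cycle consistency loss
        loss_cycle = criterion_cycle(cycled_A, real_A)

        # Adversarial loss: G wants D_Y to label fake_B as real (1)
        pred_fake = discriminator_D_Y(fake_B)
        real_labels = torch.ones_like(pred_fake)
        loss_G = criterion_gan(pred_fake, real_labels) + loss_cycle

        # Backpropagation and optimization
        optimizer_G.zero_grad()
        loss_G.backward()
        optimizer_G.step()

    # Record results once per epoch (loss of the last batch)
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss_G.item()}')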

3.5. Results Visualization

After training is complete, we generate some images to visualize the results of CycleGAN and show them to the user. The following code shows how to save and visualize the resulting images.


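import os
import matplotlib.pyplot as plt

# Function to save a single (C, H, W) image tensor as a PNG file
def save_image(tensor, filename):
    image = tensor.detach().cpu().numpy()
    image = image.transpose((1, 2, 0))  # (C, H, W) -> (H, W, C)
    image = image.clip(0, 1)  # the generator output is unbounded, so clamp to [0, 1]
    plt.imsave(filename, (image * 255).astype('uint8'))

os.makedirs('output', exist_ok=True)

# Generate images using the trained generator
with torch.no_grad():
    for i, summer_images in enumerate(summer_loader):
        fake_images = generator_G(summer_images[0].to(device))
        save_image(fake_images[0], f'output/image_{i}.png')  # batch_size is 1
        break  # stop after the first image; remove to convert the whole dataset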

4. Applications of CycleGAN

Besides image transformation and style transfer, CycleGAN can be utilized in various fields. For example, it can be used in medical imaging, video transformation, and fashion design.

4.1. Medical Image Processing

CycleGAN has also been applied to medical images, for example to synthesize an MRI-like image from a patient’s CT scan, which can make it easier for doctors to compare and analyze the two modalities.

4.2. Video Transformation

CycleGAN can be used to transform the style of a video from one to another. For example, it can be used to convert summer landscapes in a real-time video stream to winter settings.

4.3. Fashion Design

CycleGAN can bring innovation to the fashion design field. It can assist designers in simulating and designing clothing in various styles.

5. Conclusion

CycleGAN is a very useful tool in the field of image transformation. This model is suitable for various applications such as video and fashion and plays a crucial role in overcoming limitations in the vision field.
In this article, we explored the basic principles of CycleGAN, its implementation, and the process of result visualization in detail. Future research and advancements are anticipated, and understanding CycleGAN will hopefully greatly aid in future developments.

Deep Learning PyTorch Course, DCGAN

In this course, we will take a closer look at DCGAN (Deep Convolutional GAN), a variant of the Generative Adversarial Network (GAN) that builds both networks from convolutional layers. DCGAN is a model specialized for image generation tasks.

1. Understanding GAN

GAN consists of two neural networks: a Generator and a Discriminator. The Generator generates fake data that resembles real data, while the Discriminator distinguishes between real and fake data. These two networks compete and learn from each other, with the Generator increasingly generating more realistic data.

1.1 Basic Concept of GAN

The learning process of GAN occurs as follows:

1. The Generator G takes a random noise vector z as input and generates a fake image G(z).
2. The Discriminator D takes both a real image x and the generated image G(z) as input and outputs the probabilities of each being real/fake.
3. The Generator learns to mislead D into thinking the fake image is real, while the Discriminator learns to accurately distinguish real images.

2. Concept of DCGAN

DCGAN extends GAN to deep convolutional networks. DCGAN uses convolutional layers to learn a spatial hierarchy for better performance in image generation tasks. DCGAN has the following structural features:

Uses strided convolutions for downsampling (and transposed convolutions for upsampling) instead of traditional pooling layers.
Applies Batch Normalization to stabilize learning.
Uses the ReLU activation function in the Generator, with Tanh in its output layer, and Leaky ReLU in the Discriminator.
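
The first two features can be illustrated with a quick shape check. In this sketch (channel counts and batch size are arbitrary assumptions), a strided convolution halves the spatial size just as a pooling layer would, but with learnable weights, and a transposed convolution with the same settings doubles it again.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 1, 28, 28)  # a batch of MNIST-sized inputs

# Fixed downsampling with pooling
pooled = nn.MaxPool2d(kernel_size=2)(x)

# Learnable downsampling with a strided convolution, followed by
# batch normalization and Leaky ReLU as in a DCGAN discriminator block
strided_block = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.LeakyReLU(0.2),
)
down = strided_block(x)

# Learnable upsampling with a transposed convolution, as in the generator
up = nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1)(down)

print(pooled.shape, down.shape, up.shape)
```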

2.1 Structure of DCGAN

The structure of DCGAN is as follows:

Generator G:
- Input: Random noise vector z
- Layers: Several transposed convolution layers with batch normalization and ReLU activation function
- Output: Generated image
Discriminator D:
- Input: Image (real or generated)
- Layers: Several convolution layers with batch normalization and Leaky ReLU activation function
- Output: Probability of being real/fake

3. Python Implementation of DCGAN

Now, we will implement DCGAN in Python with PyTorch, which runs the same code on a GPU when one is available. The following code establishes the basic structure of DCGAN.

3.1 Installing Required Libraries

!pip install torch torchvision

3.2 Loading the Dataset

In this example, we will use the MNIST dataset to generate handwritten digits. We will proceed to load and preprocess the data.


import torch
import torchvision
import torchvision.transforms as transforms

# Dataset transformation definition: convert to tensor, then normalize pixels to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)

3.3 Defining the Generator and Discriminator

Now we will implement the Generator and Discriminator models. As explained earlier, the Generator uses transposed convolution layers to generate images, while the Discriminator uses convolution layers to discriminate images.


import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            # (100, 1, 1) -> (256, 7, 7)
            nn.ConvTranspose2d(100, 256, 7, 1, 0, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            # (256, 7, 7) -> (128, 14, 14)
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            # (128, 14, 14) -> (1, 28, 28), matching the MNIST image size
            nn.ConvTranspose2d(128, 1, 4, 2, 1, bias=False),
            nn.Tanh()
        )

    def forward(self, input):
        return self.model(input)

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            # (1, 28, 28) -> (128, 14, 14)
            nn.Conv2d(1, 128, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2),
            # (128, 14, 14) -> (256, 7, 7)
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2),
            # (256, 7, 7) -> (1, 1, 1): one probability per image
            nn.Conv2d(256, 1, 7, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.model(input)

3.4 Model Initialization

We will instantiate the Generator and Discriminator models and define the loss function and optimization algorithm. Here, we will use binary cross-entropy loss and the Adam optimizer.


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Instantiate models
generator = Generator().to(device)
discriminator = Discriminator().to(device)

# Define loss function and optimizer
criterion = nn.BCELoss()
optimizerG = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerD = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

3.5 Training Loop

We will now train DCGAN. For each batch we update the Discriminator and then the Generator, and at the end of each epoch we print the losses of the last batch to check that training is progressing.


num_epochs = 50
for epoch in range(num_epochs):
    for i, (images, _) in enumerate(train_loader):
        # Prepare training data
        images = images.to(device)

        # Define labels
        batch_size = images.size(0)
        labels = torch.full((batch_size,), 1.0, device=device)  # Labels for real images (float, as BCELoss requires)
        noise = torch.randn(batch_size, 100, 1, 1, device=device)  # Input noise for the Generator

        # ------------------- Discriminator Training -------------------
        optimizerD.zero_grad()

        # Loss for real images
        output = discriminator(images).view(-1)
        lossD_real = criterion(output, labels)
        lossD_real.backward()

        # Generate fake images and calculate loss
        fake_images = generator(noise)
        labels.fill_(0)  # Labels for fake images
        output = discriminator(fake_images.detach()).view(-1)
        lossD_fake = criterion(output, labels)
        lossD_fake.backward()

        # Optimize Discriminator
        optimizerD.step()

        # ------------------- Generator Training -------------------
        optimizerG.zero_grad()
        labels.fill_(1)  # The Generator wants to classify fake images as real
        output = discriminator(fake_images).view(-1)
        lossG = criterion(output, labels)
        lossG.backward()
        optimizerG.step()

    # Output results
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss D: {lossD_real.item() + lossD_fake.item()}, Loss G: {lossG.item()}')

3.6 Visualizing Results

After the training, generated images can be visualized to check the results. For example, we can use matplotlib to output some sample images.


import matplotlib.pyplot as plt

def show_generated_images(num_images=25):
    noise = torch.randn(num_images, 100, 1, 1, device=device)
    with torch.no_grad():
        generated_images = generator(noise).cpu().detach().numpy()
    generated_images = (generated_images + 1) / 2  # Convert to [0, 1] range

    plt.figure(figsize=(10, 10))
    for i in range(num_images):
        plt.subplot(5, 5, i + 1)
        plt.imshow(generated_images[i][0], cmap='gray')
        plt.axis('off')
    plt.show()

show_generated_images()

4. Conclusion

In this course, we explored the theory and implementation process of DCGAN. GAN holds great potential in generative modeling, and DCGAN demonstrates particularly strong performance in the field of image generation. We encourage you to train the model on your own datasets to experience the training process directly.

Challenge yourself with various image generation tasks using DCGAN!