Deep Learning PyTorch Course, Concept of Generative Models

Welcome to the world of deep learning! Today, we will delve into why generative models are important and how to implement them in PyTorch.

1. What is a Generative Model?

A Generative Model refers to a model that generates new data by modeling a given data distribution. It originates from statistical concepts and aims to understand the distribution from a given dataset and create new samples based on it.

Generative models are broadly divided into two types:

  • Probabilistic Generative Models
  • Deep Generative Models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs)

2. Applications of Generative Models

Generative models are used in various fields:

  • Image Generation: For example, high-resolution images can be generated using GANs.
  • Text Generation: Natural language models can automatically write articles on a specific topic.
  • Music Generation: AI can assist in composing new music.
  • Data Augmentation: Generated samples can be used to augment training data and improve model performance.

3. How Generative Models Work

Generative models operate by learning the underlying structure of the data. They focus on producing new samples that resemble the training data, typically through the following steps.

  1. Data Collection: Sufficiently diverse data must be collected to train the model.
  2. Model Design: Choose a model architecture that can well reflect the characteristics of the data.
  3. Training: Train the model to learn the distribution of the data.
  4. Sampling: Use the trained model to generate new data.

4. Implementing Generative Models in PyTorch

Now, let’s implement a simple generative model using PyTorch. In this section, we will create a simple GAN model.

4.1 Overview of GAN

GAN consists of two neural network models, namely the Generator and the Discriminator. The goal of the generator is to produce fake data that is similar to real data, while the objective of the discriminator is to determine whether the input data is real or fake. The two networks are in a competitive relationship, improving each other’s performance in the process.
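This adversarial game is commonly summarized by the minimax objective introduced in the original GAN paper, where the discriminator D tries to maximize and the generator G tries to minimize the following value function:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]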

4.2 GAN Code Example

Below is an example code for implementing GAN using PyTorch:

    
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Hyperparameters
latent_size = 100
num_epochs = 200
batch_size = 64
learning_rate = 0.0002

# Transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# MNIST dataset
mnist = torchvision.datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
data_loader = torch.utils.data.DataLoader(dataset=mnist, batch_size=batch_size, shuffle=True)

# Generator model
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(latent_size, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 784),
            nn.Tanh()
        )

    def forward(self, x):
        return self.network(x).view(-1, 1, 28, 28)

# Discriminator model
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(784, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.network(x.view(-1, 784))

# Initialize models
generator = Generator()
discriminator = Discriminator()

# Loss and optimizer
criterion = nn.BCELoss()
optimizer_g = optim.Adam(generator.parameters(), lr=learning_rate)
optimizer_d = optim.Adam(discriminator.parameters(), lr=learning_rate)

# Training
for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(data_loader):
        # Labels sized to the current batch (the last batch may be smaller than batch_size)
        batch = real_images.size(0)
        real_labels = torch.ones(batch, 1)
        fake_labels = torch.zeros(batch, 1)

        # Train discriminator
        optimizer_d.zero_grad()
        outputs = discriminator(real_images)
        d_loss_real = criterion(outputs, real_labels)

        z = torch.randn(batch, latent_size)
        fake_images = generator(z)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)

        d_loss = d_loss_real + d_loss_fake
        d_loss.backward()
        optimizer_d.step()

        # Train generator
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()

    # Print losses and save generated images
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], d_loss: {d_loss.item():.4f}, g_loss: {g_loss.item():.4f}')

        with torch.no_grad():
            fake_images = generator(z)
            fake_images = fake_images.view(-1, 1, 28, 28)
            grid = torchvision.utils.make_grid(fake_images, normalize=True)
            plt.imshow(grid.detach().numpy().transpose(1, 2, 0))
            plt.show()
    
    

4.3 Code Explanation

The above code shows the implementation of a simple GAN model. Let’s take a closer look at each part:

  • Data Loading: Downloads and normalizes the MNIST dataset.
  • Generator: Takes a random vector of 100 dimensions as input and generates a 28×28 size image.
  • Discriminator: Takes the input image and predicts whether it is real or fake.
  • Training Process: Trains the discriminator and generator alternately. The discriminator learns to distinguish between real and generated images, while the generator learns to produce images that fool the discriminator.
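After training finishes, new digits can be sampled directly from the generator. The short sketch below assumes the generator, latent_size, torchvision, and plt objects defined in the code above:

# Sample a grid of new images from random noise (inference only)
generator.eval()
with torch.no_grad():
    z = torch.randn(16, latent_size)
    samples = generator(z)                                    # shape: (16, 1, 28, 28)
    grid = torchvision.utils.make_grid(samples, nrow=4, normalize=True)
    plt.imshow(grid.numpy().transpose(1, 2, 0))
    plt.axis('off')
    plt.show()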

5. Future and Development Direction of Generative Models

Generative models have many possibilities, and their applications are expected to grow in various fields. In particular, deep generative models such as GANs and VAEs have made significant advancements in recent years, and new techniques and architectures for them are continuously being developed.

Moreover, generative models provide innovative opportunities in diverse areas such as healthcare, arts, autonomous driving, and robotics, and ethical and legal issues arising from them are also important factors to consider.

Conclusion

Today, we explored the concept of generative models and a simple GAN implementation using PyTorch. Generative models hold great potential in data generation, data augmentation, and various other fields, and future advancements are expected. Now, we hope you will step into the world of generative models!

© 2023 Deep Learning Institute

Deep Learning PyTorch Course, Variational Autoencoder

Deep learning is a field of machine learning that utilizes neural networks to learn patterns from data. In this article, we will delve deeply into Variational Autoencoder (VAE).

1. What is an Autoencoder?

An autoencoder is an unsupervised learning model that learns to compress input data and then reconstruct it. An autoencoder consists of two parts: an encoder and a decoder.

  • Encoder: Maps input data to a latent space.
  • Decoder: Restores data from the latent space to the original input data.

1.1 The Process of Autoencoder

The training process of an autoencoder proceeds in a way that reduces the difference between the input data and the output data. To do this, a loss function is used to measure the difference between the actual output and the predicted output. The Mean Squared Error (MSE) loss function is commonly used.
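For illustration, here is a minimal (hypothetical) autoencoder for 784-dimensional inputs trained with the MSE reconstruction loss; the layer sizes and the dummy batch are arbitrary choices for this sketch:

import torch
import torch.nn as nn

# Minimal autoencoder: 784 -> 32 -> 784, trained to reproduce its own input
autoencoder = nn.Sequential(
    nn.Linear(784, 32),   # encoder: compress to a 32-dimensional latent code
    nn.ReLU(),
    nn.Linear(32, 784),   # decoder: reconstruct the input
    nn.Sigmoid()
)
criterion = nn.MSELoss()

x = torch.rand(16, 784)               # dummy batch standing in for real data
reconstruction = autoencoder(x)
loss = criterion(reconstruction, x)   # difference between input and reconstruction
loss.backward()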

2. Variational Autoencoder (VAE)

The Variational Autoencoder extends the traditional autoencoder by modeling the probability distribution of the input data. As a generative model, a VAE can generate new data.

2.1 Components of VAE

VAE consists of the following two main components:

  • Latent Variable: When encoding input data, the encoder outputs the mean (μ) and standard deviation (σ) to estimate the distribution of the latent variables.
  • Reconstruction Loss: Measures the difference between the output generated by the decoder and the original input.

2.2 Loss Function

VAE’s loss function can be divided into two parts:

  • Reconstruction Loss: Measures the loss between the actual input and the reconstructed input.
  • Kullback-Leibler Divergence: Measures the difference between the latent distribution and the normal distribution.

Definition of VAE Loss Function:

L = E[log p(x|z)] - D_{KL}(q(z|x) || p(z))
    

Where:

  • E[log p(x|z)]: The expected log-likelihood of reconstructing the input x from the latent variable z (the reconstruction term).
  • D_{KL}: Kullback-Leibler Divergence which measures the difference between two distributions.

3. Implementing VAE with PyTorch

Now that we understand the basic components and loss function of the Variational Autoencoder, let’s implement VAE using PyTorch.

3.1 Install Libraries

pip install torch torchvision matplotlib
    

3.2 Prepare Dataset

We will implement a VAE to recognize handwritten digits using the MNIST dataset. MNIST is a dataset consisting of 28×28 pixel grayscale images.

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1))
])

mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=128, shuffle=True)
    

3.3 Define Model

To construct the Variational Autoencoder model, we define the encoder and decoder classes.

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(Encoder, self).__init__()
        self.fc1 = nn.Linear(input_dim, 400)
        self.fc21 = nn.Linear(400, latent_dim)  # Mean
        self.fc22 = nn.Linear(400, latent_dim)  # Log Variance
        
    def forward(self, x):
        h1 = torch.relu(self.fc1(x))
        mu = self.fc21(h1)
        logvar = self.fc22(h1)
        return mu, logvar


class Decoder(nn.Module):
    def __init__(self, latent_dim, output_dim):
        super(Decoder, self).__init__()
        self.fc1 = nn.Linear(latent_dim, 400)
        self.fc2 = nn.Linear(400, output_dim)
        
    def forward(self, z):
        h2 = torch.relu(self.fc1(z))
        return torch.sigmoid(self.fc2(h2))
    
class VAE(nn.Module):
    def __init__(self, input_dim, latent_dim):
        super(VAE, self).__init__()
        self.encoder = Encoder(input_dim, latent_dim)
        self.decoder = Decoder(latent_dim, input_dim)
        
    def reparameterize(self, mu, logvar):
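        # Reparameterization trick: z = mu + sigma * eps keeps the sample differentiable with respect to mu and logvar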
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std
    
    def forward(self, x):
        mu, logvar = self.encoder(x)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
    

3.4 Define Loss Function

We define the loss function for the VAE. Note that the code minimizes the negative of L above: the binary cross-entropy reconstruction term plus the KL divergence.

def vae_loss(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
    

3.5 Train the Model

We train the model using a training loop. For each batch, we compute the loss and perform backpropagation to update the weights.

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VAE(784, 20).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(10):
    train_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = data.to(device)
        optimizer.zero_grad()
        recon_batch, mu, logvar = model(data)
        loss = vae_loss(recon_batch, data, mu, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()
    
    print(f'Epoch {epoch+1}, Loss: {train_loss / len(train_loader.dataset)}')
    

3.6 Check Results

Once the training is complete, we can use the model to generate new data and see how similar it is to the training data.

import matplotlib.pyplot as plt

def visualize_results(model, num_images=10):
    with torch.no_grad():
        z = torch.randn(num_images, 20).to(device)
        sample = model.decoder(z).cpu()
        sample = sample.view(num_images, 1, 28, 28)
        
    plt.figure(figsize=(10, 1))
    for i in range(num_images):
        plt.subplot(1, num_images, i + 1)
        plt.imshow(sample[i].squeeze(), cmap='gray')
        plt.axis('off')
    plt.show()

visualize_results(model)
    

4. Conclusion

In this tutorial, we explored the concept of the Variational Autoencoder and how to implement it using PyTorch. VAE has the capability to learn the latent distribution of data and generate new samples, which can be utilized in various generative modeling tasks. This technique can be applied for interesting tasks such as generating images, text, and audio data.

Furthermore, VAE can contribute to the implementation of more powerful and diverse generative models when combined with other generative models like GAN. In particular, VAE helps explore and sample from the latent space of high-dimensional data.


Deep Learning PyTorch Course, Bellman Optimality Equation

As the combination of deep learning and reinforcement learning continues to advance, the Bellman Optimality Equation has become one of the core concepts in reinforcement learning. In this post, we will discuss the basic principles of the Bellman Optimality Equation, how to implement it using deep learning, and provide code examples using PyTorch.

1. Understanding the Bellman Optimality Equation

The Bellman Optimality Equation defines the value of each state of a Markov Decision Process (MDP) under the assumption that the optimal action is always chosen. It is used when the goal is to maximize the expected sum of future rewards.

1.1 Markov Decision Process (MDP)

An MDP consists of the following four elements:

  • S: State space
  • A: Action space
  • P: Transition probability
  • R: Reward function

1.2 Bellman Equation

The Bellman Equation expresses the value of a state s, assuming the optimal action is chosen in that state, as follows:

V(s) = max_a [R(s,a) + γ * Σ P(s'|s,a) * V(s')]

Where:

  • V(s) is the value of state s
  • a is the possible action
  • γ is the discount factor (0 ≤ γ < 1)
  • P(s'|s,a) is the probability of transitioning to the next state s' after taking action a in state s
  • R(s,a) is the reward of taking action a in the current state
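As a concrete illustration, the short sketch below performs a single Bellman optimality backup for one state of a tiny, made-up MDP; every number (transition probabilities, rewards, current value estimates) is an arbitrary example value:

# One Bellman optimality backup for a toy 2-state MDP (illustrative values only)
gamma = 0.9
V = {'s1': 0.0, 's2': 5.0}                      # current value estimates
R = {('s1', 'a1'): 1.0, ('s1', 'a2'): 0.0}      # R(s, a)
P = {('s1', 'a1'): {'s1': 0.7, 's2': 0.3},      # P(s' | s, a)
     ('s1', 'a2'): {'s1': 0.1, 's2': 0.9}}

# V(s1) = max_a [ R(s1, a) + gamma * sum_s' P(s'|s1, a) * V(s') ]
V_s1 = max(
    R[('s1', a)] + gamma * sum(p * V[s_next] for s_next, p in P[('s1', a)].items())
    for a in ('a1', 'a2')
)
print(V_s1)  # a1 gives 1 + 0.9*1.5 = 2.35, a2 gives 0 + 0.9*4.5 = 4.05, so V(s1) becomes 4.05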

2. The Bellman Optimum Equation and Deep Learning

When combining deep learning with reinforcement learning, techniques such as Q-learning are mainly used to approximate the Bellman Equation. Here, the Q-function represents the expected cumulative reward of taking a specific action in a specific state and acting optimally afterwards.

2.1 Bellman Equation of Q-learning

In the case of Q-learning, the Bellman Equation is expressed as follows:

Q(s,a) = R(s,a) + γ * max_a' Q(s',a')
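In tabular Q-learning, this relationship becomes an incremental update with a learning rate α: Q(s,a) ← Q(s,a) + α [r + γ * max_a' Q(s',a') - Q(s,a)]. A minimal sketch with made-up states and values:

# One tabular Q-learning update on an observed transition (all values are illustrative)
alpha, gamma = 0.1, 0.99
Q = {('s', 'left'): 0.0, ('s', 'right'): 0.0,
     ('s_next', 'left'): 1.0, ('s_next', 'right'): 2.0}

s, a, r, s2 = 's', 'right', 0.5, 's_next'        # observed transition (s, a, r, s')
td_target = r + gamma * max(Q[(s2, b)] for b in ('left', 'right'))
Q[(s, a)] += alpha * (td_target - Q[(s, a)])     # Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))
print(Q[(s, a)])  # 0 + 0.1 * (0.5 + 0.99*2.0 - 0) = 0.248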

3. Implementing the Bellman Equation with Python and PyTorch

In this section, we will look at how to implement a simple Q-learning agent using PyTorch.

3.1 Preparing the Environment

First, we need to install the required libraries. The following libraries are necessary:

pip install torch numpy gym

3.2 Defining the Q-Network

Next, we will define the Q-network, which will be implemented using a neural network from PyTorch.

import torch
import torch.nn as nn
import numpy as np

class QNetwork(nn.Module):
    def __init__(self, state_dim, action_dim):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_dim, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, action_dim)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

3.3 Defining the Agent Class

Now we will define the agent class that will perform the Q-learning algorithm.

class Agent:
    def __init__(self, state_dim, action_dim, learning_rate=0.001, gamma=0.99):
        self.action_dim = action_dim
        self.gamma = gamma
        self.q_network = QNetwork(state_dim, action_dim)
        self.optimizer = torch.optim.Adam(self.q_network.parameters(), lr=learning_rate)

    def choose_action(self, state, epsilon):
        if np.random.rand() < epsilon:  # explore
            return np.random.choice(self.action_dim)
        else:  # exploit
            state_tensor = torch.FloatTensor(state)
            with torch.no_grad():
                q_values = self.q_network(state_tensor)
            return torch.argmax(q_values).item()

    def learn(self, state, action, reward, next_state, done):
        state_tensor = torch.FloatTensor(state)
        next_state_tensor = torch.FloatTensor(next_state)

        q_values = self.q_network(state_tensor)
        # Bellman target: r + gamma * max_a' Q(s', a'); no gradient should flow through the target
        with torch.no_grad():
            target = reward + (1 - done) * self.gamma * torch.max(self.q_network(next_state_tensor))

        loss = nn.MSELoss()(q_values[action], target)

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

3.4 Defining the Training Process

Now we will define the process of training the agent. We will set up a simple environment using OpenAI’s Gym library.

import gym

def train_agent(episodes=1000):
    env = gym.make('CartPole-v1')
    agent = Agent(state_dim=4, action_dim=2)

    for episode in range(episodes):
        state = env.reset()  # classic Gym API (gym < 0.26); newer versions return (obs, info)
        done = False
        total_reward = 0
        epsilon = max(0.1, 1.0 - episode / 500)  # epsilon-greedy: decays linearly from 1.0 down to 0.1 over the first 500 episodes

        while not done:
            action = agent.choose_action(state, epsilon)
            next_state, reward, done, _ = env.step(action)  # classic 4-tuple step API (newer gym versions return 5 values)
            agent.learn(state, action, reward, next_state, done)
            state = next_state
            total_reward += reward

        print(f'Episode: {episode}, Total Reward: {total_reward}')

    env.close()

# Start training
train_agent()

4. Result Analysis and Conclusion

After training is complete, you can visualize how well the agent performs in the CartPole environment. Throughout the training process, you can observe how the agent behaves and improves its performance. The concept of following the optimal path highlighted by the Bellman Optimality Equation becomes even more powerful when used in conjunction with deep learning.

In this tutorial, we understood the concept of the Bellman Optimality Equation and explored how to implement a Q-learning agent using PyTorch. The Bellman Equation is a fundamental principle of reinforcement learning and is crucial in various application areas. We hope this will greatly aid you in your future journey in deep learning and reinforcement learning.

This article has been written to help understand deep learning and reinforcement learning. We hope it has been helpful with various examples.

Deep Learning PyTorch Course, Bellman Expectation Equation

The development of deep learning and reinforcement learning has brought innovative changes to many fields. Among these, the Bellman Expectation Equation is a crucial component of reinforcement learning. In this process, we will delve into the concept of the Bellman Expectation Equation, its mathematical background, and how to implement it using PyTorch.

1. What is the Bellman Expectation Equation?

The Bellman Expectation Equation is a formula used in dynamic programming that defines the value of a given state. It represents the expected cumulative reward obtained when the agent acts according to a given policy (the rule for selecting actions).

The Bellman Expectation Equation is expressed as follows:


V^\pi(s) = \mathbb{E}_\pi \left[ r_t + \gamma V^\pi(s_{t+1}) | s_t = s \right]

Here, V^\pi(s) is the expected value under policy \pi at state s, r_t is the reward at time t, \gamma is the discount factor, and s_{t+1} is the next state.
The Bellman Expectation Equation is extremely useful for evaluating a policy, which in turn is the basis for comparing policies and finding the optimal one.
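Expanding the expectation over the policy \pi(a|s) and the transition probabilities P(s'|s,a) gives the form most often used in practice:

V^\pi(s) = \sum_{a} \pi(a|s) \left[ R(s,a) + \gamma \sum_{s'} P(s'|s,a) V^\pi(s') \right]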

2. Key Concepts of the Bellman Expectation Equation

To understand the Bellman Expectation Equation, the following basic concepts are necessary:

2.1 States and Actions

In reinforcement learning, a State describes the situation the agent is currently in, while an Action is one of the choices the agent can make in that state. These two elements are essential for the agent to interact with the environment.

2.2 Policy

A Policy is the rule that determines which action the agent will take in a specific state. A policy can be defined probabilistically, and the optimal policy selects the action that yields the maximum expected reward in a given state.

2.3 Reward

A Reward is the feedback received from the environment when the agent selects a specific action. Rewards serve as a criterion for evaluating how well the agent is achieving its goals.

3. Geometric Interpretation of the Bellman Expectation Equation

Interpreted intuitively, the Bellman Expectation Equation says that the value of a state is the average, over the actions the policy may choose, of the immediate reward plus the discounted value of the next state. In other words, it computes the expected return of acting according to the policy from that state onward.
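To make this concrete, here is a minimal sketch of iterative policy evaluation on a tiny, made-up MDP under a uniform random policy. All states, transitions, and rewards below are illustrative assumptions, not the environment used in the next section:

import numpy as np

# Tiny 3-state MDP (illustrative): P[s][a] = (next_state, probability); rewards depend on the state entered
P = {0: {0: (0, 1.0), 1: (1, 1.0)},
     1: {0: (0, 1.0), 1: (2, 1.0)},
     2: {0: (2, 1.0), 1: (2, 1.0)}}
R = [0.0, 1.0, 10.0]
gamma = 0.9
policy = {s: [0.5, 0.5] for s in P}      # uniform random policy pi(a|s)

# Iterative policy evaluation: repeatedly apply the Bellman expectation backup
V = np.zeros(3)
for _ in range(100):
    V_new = np.zeros(3)
    for s in P:
        for a, pi_a in enumerate(policy[s]):
            s_next, prob = P[s][a]
            V_new[s] += pi_a * prob * (R[s_next] + gamma * V[s_next])
    V = V_new

print(V)   # approximate V^pi under the uniform random policy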

4. Implementing the Bellman Expectation Equation in PyTorch

Now, let’s explore how to implement the Bellman Expectation Equation using PyTorch. A simple example would be to train an agent using OpenAI’s Gym library and apply the Bellman Expectation Equation through this.

4.1. Setting Up the Environment

First, install the necessary libraries and set up the environment. OpenAI Gym is a library that provides various reinforcement learning environments.


!pip install gym
!pip install torch
!pip install matplotlib

4.2. Implementing the Bellman Expectation Equation

The example below implements an MDP (Markov Decision Process) environment with a simple table state space and applies the Bellman Expectation Equation.


import numpy as np
import torch

class SimpleMDP:
    def __init__(self):
        self.states = [0, 1, 2]
        self.actions = [0, 1]  # 0: left, 1: right
        self.transition_probs = {
            0: {0: (0, 0.8), 1: (1, 0.2)},
            1: {0: (0, 0.3), 1: (2, 0.7)},
            2: {0: (2, 1.0), 1: (2, 1.0)},
        }
        self.rewards = [0, 1, 10]  # rewards for each state
        self.gamma = 0.9           # discount factor

    def get_next_state(self, state, action):
        next_state, prob = self.transition_probs[state][action]
        return next_state, prob

    def get_reward(self, state):
        return self.rewards[state]

    def value_iteration(self, theta=1e-6):
        V = np.zeros(len(self.states))  # initialize state values
        while True:
            delta = 0
            for s in self.states:
                v = V[s]
                # For each action, compute the expected return of its transition, then keep the best action
                V[s] = max(
                    prob * (self.get_reward(next_state) + self.gamma * V[next_state])
                    for next_state, prob in (self.get_next_state(s, a) for a in self.actions)
                )
                delta = max(delta, abs(v - V[s]))
            if delta < theta:
                break
        return V

# Initialize the MDP environment and perform value iteration
mdp_environment = SimpleMDP()
values = mdp_environment.value_iteration()
print("State values:", values)

4.3. Code Explanation

In the above code, the SimpleMDP class defines the states, actions, and transition probabilities of a simple Markov decision process. It uses the value iteration algorithm to update the value of each state. The algorithm calculates the expected reward for the next state for all possible actions at each state and selects the maximum value among them.

5. Experiments and Results

After running the value iteration above, the state values converge to approximately the following:


State values: [ 12.8  70.  100. ]

These results represent the maximum expected discounted return the agent can obtain from each state (up to the numerical tolerance theta). State 2, which repeatedly yields the highest reward, therefore has by far the largest value.

6. Conclusion

In this lecture, we covered the theoretical background of the Bellman Expectation Equation as well as how to practically implement it using PyTorch. The Bellman Expectation Equation is a fundamental formula in reinforcement learning, essential for optimizing agent behavior in various environments.

We hope you continue to explore and practice various techniques and theories in reinforcement learning. May all who have taken their first steps into the world of deep learning and reinforcement learning achieve great results through the Bellman Expectation Equation.

Deep Learning PyTorch Course, Principles of Monte Carlo Tree Search

In the field of deep learning and artificial intelligence, various algorithms exist for problem solving. One of them, Monte Carlo Tree Search (MCTS), is a widely used algorithm for decision-making in uncertain environments. In this article, we will deeply explain the principles of MCTS and provide an implementation example using PyTorch.

Overview of Monte Carlo Tree Search

MCTS is an algorithm utilized in various fields such as game theory, optimization problems, and robotics, which simulates situations and makes decisions based on the results. The core idea of MCTS is to explore the tree through random sampling. In other words, it tests various actions possible from a specific state and evaluates how good each action is to determine the optimal action.

Four Stages of MCTS

  1. Selection: Starting from the root, repeatedly choose a child node according to a selection criterion such as UCB1 (see the formula below) until a node that is not fully expanded is reached.
  2. Expansion: Add a new child node for an untried action of the selected node. This node represents the state reached after performing that action.
  3. Simulation: From the expanded node, play the game to the end by selecting actions at random and evaluate the result.
  4. Backpropagation: Propagate the simulation result back up toward the root, updating the win and visit counts of the nodes along the path.
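During the selection stage, a widely used criterion is the UCB1 score, which balances exploiting children that have won often and exploring children that have been visited rarely; the same formula is used in the ucb1 method of the implementation below:

UCB1(i) = \frac{w_i}{n_i} + c \sqrt{\frac{\ln N}{n_i}}

Here, w_i is the number of wins recorded for child i, n_i is its visit count, N is the visit count of its parent, and c is an exploration constant (commonly \sqrt{2} \approx 1.41).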

Combining with Deep Learning

MCTS can perform the basic stages using simple rule-based methods, but it can exhibit even stronger performance when combined with deep learning. For example, deep learning can be used to predict the value of actions or more accurately evaluate the value of states. This is particularly effective in complex environments.
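As a rough sketch of that idea, a small PyTorch value network could be trained to score board positions and then used to replace (or shorten) the random simulations. The class below is an illustrative assumption only, not part of the MCTS implementation that follows:

import torch
import torch.nn as nn

# Hypothetical value network: maps a flattened 3x3 board to a score in [-1, 1]
class ValueNetwork(nn.Module):
    def __init__(self):
        super(ValueNetwork, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(9, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Tanh()   # estimated value of the position from player 1's perspective
        )

    def forward(self, board):
        return self.net(board.view(-1, 9).float())

# During the simulation stage, one could evaluate a position with this network
# instead of playing random moves to the end, e.g.:
# value = value_net(torch.tensor(game.get_state()))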

Implementing MCTS with PyTorch

Now, let’s implement Monte Carlo Tree Search using PyTorch. We will use a simple Tic-Tac-Toe game as an example.

Setting Up the Environment

First, we will install the required libraries:

pip install torch numpy

Building the Game Environment

We will build a basic environment for the Tic-Tac-Toe game:

import numpy as np

class TicTacToe:
    def __init__(self):
        self.board = np.zeros((3, 3), dtype=int)
        self.current_player = 1

    def reset(self):
        self.board.fill(0)
        self.current_player = 1

    def available_actions(self):
        return np.argwhere(self.board == 0)

    def take_action(self, action):
        self.board[action[0], action[1]] = self.current_player
        self.current_player = 3 - self.current_player  # Switch between players

    def is_winner(self, player):
        return any(np.all(self.board[i, :] == player) for i in range(3)) or \
               any(np.all(self.board[:, j] == player) for j in range(3)) or \
               np.all(np.diag(self.board) == player) or \
               np.all(np.diag(np.fliplr(self.board)) == player)

    def is_full(self):
        return np.all(self.board != 0)

    def get_state(self):
        return self.board.copy()

Implementing MCTS

Now we will implement the MCTS algorithm. The code below shows a basic implementation of the four stages described above.

import copy
import random

class MCTSNode:
    def __init__(self, state, action=None, parent=None):
        self.state = state          # deep copy of the game at this node
        self.action = action        # the action that led to this node (None for the root)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.wins = 0
        # Untried actions remaining at this node (empty if the game is already decided)
        if state.is_winner(1) or state.is_winner(2):
            self.untried_actions = []
        else:
            self.untried_actions = [tuple(a) for a in state.available_actions()]

    def ucb1(self, exploration_constant=1.41):
        if self.visits == 0:
            return float("inf")
        return self.wins / self.visits + exploration_constant * np.sqrt(np.log(self.parent.visits) / self.visits)

def mcts(game, iterations):
    root_node = MCTSNode(copy.deepcopy(game))

    for _ in range(iterations):
        node = root_node

        # Selection: descend through fully expanded nodes using UCB1
        while not node.untried_actions and node.children:
            node = max(node.children, key=lambda n: n.ucb1())

        # Expansion: add a child for one untried action (if the node is not terminal)
        if node.untried_actions:
            action = node.untried_actions.pop(random.randrange(len(node.untried_actions)))
            next_state = copy.deepcopy(node.state)
            next_state.take_action(action)
            node.children.append(MCTSNode(next_state, action=action, parent=node))
            node = node.children[-1]

        # Simulation: play random moves from this node until the game ends
        state = copy.deepcopy(node.state)
        while not state.is_full() and not state.is_winner(1) and not state.is_winner(2):
            action = tuple(random.choice(list(state.available_actions())))
            state.take_action(action)

        reward = 1 if state.is_winner(1) else 0  # Player 1 is the maximizer

        # Backpropagation: update visit and win counts along the path back to the root
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent

    # The recommended move is the action of the most visited child of the root
    return max(root_node.children, key=lambda n: n.visits).action

Running the Game

Finally, let’s execute the actual game using MCTS.

def play_game():
    game = TicTacToe()
    game.reset()

    while not game.is_full():
        if game.current_player == 1:
            action = mcts(game, iterations=1000)  # pass the game object so MCTS can simulate on copies of it
        else:
            available_actions = game.available_actions()
            action = tuple(random.choice(list(available_actions)))

        game.take_action(action)
        print(game.get_state())

        if game.is_winner(1):
            print("Player 1 wins!")
            return
        elif game.is_winner(2):
            print("Player 2 wins!")
            return

    print("Draw!")

play_game()

Conclusion

In this article, we examined the principles of Monte Carlo Tree Search and how to implement it using PyTorch. MCTS is a powerful tool for modeling decision-making processes, particularly in uncertain environments. We hope this simple Tic-Tac-Toe example helped in understanding the basic flow of MCTS. We encourage you to study the applications of MCTS in more complex games or problems in the future.