Deep Learning PyTorch Course: Implementing a Tic-Tac-Toe Game with Monte Carlo Tree Search

This article, part of our deep learning PyTorch course, walks through implementing a Tic-Tac-Toe game AI with the Monte Carlo Tree Search (MCTS) algorithm. We will focus on how MCTS works and how an AI can use it to play Tic-Tac-Toe; the game-playing code itself needs only NumPy, and the conclusion sketches how a PyTorch network could extend the approach.

Tic-Tac-Toe Game Overview

Tic-Tac-Toe is a game played on a 3×3 grid in which two players take turns placing X or O. The first player to place three of their marks in a row, column, or diagonal wins.

Step 1: Environment Setup

To follow this tutorial, you need to install the necessary packages. The main libraries are listed below; the MCTS implementation itself only needs NumPy, while torch and matplotlib are used elsewhere in the course and in the extension sketched at the end.

pip install torch numpy matplotlib
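
To confirm the installation, you can print each library's version (any reasonably recent versions should work for this tutorial):

import torch
import numpy as np
import matplotlib

print("torch:", torch.__version__)
print("numpy:", np.__version__)
print("matplotlib:", matplotlib.__version__)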

Step 2: Game Environment Implementation

First, we implement the Tic-Tac-Toe game environment. We define the game's rules and create a class that represents the board state. Note the copy() method: MCTS will need it later to explore candidate moves without modifying the real game.


import numpy as np

class TicTacToe:
    def __init__(self):
        self.board = np.zeros((3, 3), dtype=int)  # 0: empty, 1: X, -1: O
        self.current_player = 1  # 1: X's turn, -1: O's turn

    def reset(self):
        self.board = np.zeros((3, 3), dtype=int)
        self.current_player = 1

    def make_move(self, row, col):
        if self.board[row, col] == 0:
            self.board[row, col] = self.current_player
            self.current_player *= -1
            return True
        return False  # Invalid move: the cell is already occupied

    def copy(self):
        # Deep copy of the game state; MCTS uses this to expand and
        # simulate without mutating the original game.
        new_game = TicTacToe()
        new_game.board = self.board.copy()
        new_game.current_player = self.current_player
        return new_game

    def check_winner(self):
        for player in [1, -1]:
            for row in range(3):
                if np.all(self.board[row, :] == player):  # Check rows
                    return player
            for col in range(3):
                if np.all(self.board[:, col] == player):  # Check columns
                    return player
            if np.all(np.diag(self.board) == player) or np.all(np.diag(np.fliplr(self.board)) == player):
                return player
        return None if np.any(self.board == 0) else 0  # None: game still ongoing, 0: draw
        
    def display(self):
        symbols = {1: 'X', -1: 'O', 0: ' '}
        for row in self.board:
            print("|".join(symbols[x] for x in row))
            print("-" * 5)
        print("\n")

# Game Test
game = TicTacToe()
game.make_move(0, 0)
game.display()
game.make_move(1, 1)
game.display()
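
If everything works, the test above prints boards like the following (X at the top left, then O in the center):

X| | 
-----
 | | 
-----
 | | 
-----

X| | 
-----
 |O| 
-----
 | | 
-----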
        

Step 3: Monte Carlo Tree Search (MCTS) Algorithm

MCTS is a search algorithm for decision-making under uncertainty: it estimates the value of each move by sampling many random playouts. Each iteration consists of the following four steps (a standalone sketch of the UCB1 score used in step 1 follows the list):

  1. Selection: Starting from the root, repeatedly pick the child with the highest UCB1 score until a leaf node is reached.
  2. Expansion: Add child nodes to the leaf, one for each untried legal move.
  3. Simulation: Play random moves from the new node until the game ends (a rollout).
  4. Backpropagation: Propagate the result back up the tree, updating each ancestor's visit and win counts.
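
The UCB1 score is wins/visits + c * sqrt(ln(parent visits) / visits), with c = sqrt(2) in the implementation below. Here is a quick standalone sanity check of the formula; the sample counts are made up purely for illustration:

import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited nodes get infinite priority so every child is tried once.
    if visits == 0:
        return float('inf')
    # Exploitation term (average reward) plus exploration bonus.
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

# A child with a lower win rate but fewer visits can still score higher:
print(ucb1(wins=6, visits=10, parent_visits=30))  # ~1.42
print(ucb1(wins=1, visits=3, parent_visits=30))   # ~1.84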

MCTS Class Implementation


import random

import numpy as np  # Already imported above; repeated so this block runs standalone

class MCTSNode:
    def __init__(self, state, parent=None, move=None):
        self.state = state  # Game state at this node
        self.parent = parent
        self.move = move  # The (row, col) move that produced this state
        self.children = []  # Child nodes
        self.wins = 0  # Accumulated reward from simulations through this node
        self.visits = 0  # Number of visits

    def ucb1(self):
        if self.visits == 0:
            return float('inf')  # Unvisited nodes are always explored first
        # Exploitation (win rate) plus exploration bonus, with c = sqrt(2)
        return self.wins / self.visits + np.sqrt(2 * np.log(self.parent.visits) / self.visits)

class MCTS:
    def __init__(self, iterations):
        self.iterations = iterations

    def search(self, game):
        root = MCTSNode(state=game.copy())

        for _ in range(self.iterations):
            node = self.select(root)
            # Simulate on a copy so the states stored in the tree stay intact
            winner = self.simulate(node.state.copy())
            self.backpropagate(node, winner)

        # Recommend the move of the most-visited child of the root
        return max(root.children, key=lambda child: child.visits).move

    def select(self, node):
        # Descend the tree, always following the child with the highest UCB1 score
        while node.children:
            node = max(node.children, key=lambda child: child.ucb1())
        # Expand a leaf that has been visited before, unless the game is over
        if node.visits > 0 and node.state.check_winner() is None:
            for move in self.get_valid_moves(node.state):
                child_state = node.state.copy()
                child_state.make_move(move[0], move[1])
                node.children.append(MCTSNode(state=child_state, parent=node, move=move))
        return random.choice(node.children) if node.children else node

    def simulate(self, state):
        # Rollout: play uniformly random moves until the game ends
        while True:
            winner = state.check_winner()
            if winner is not None:
                return winner
            move = random.choice(self.get_valid_moves(state))
            state.make_move(move[0], move[1])

    def backpropagate(self, node, winner):
        while node is not None:
            node.visits += 1
            # Credit the result to the player who made the move into this node,
            # i.e. the opposite of the player to move in the node's state.
            if winner == -node.state.current_player:
                node.wins += 1
            elif winner == 0:
                node.wins += 0.5  # Score a draw as half a win
            node = node.parent

    def get_valid_moves(self, state):
        return [(row, col) for row in range(3) for col in range(3) if state.board[row, col] == 0]

# MCTS Usage Example: ask the AI for the best move in the current position
mcts = MCTS(iterations=1000)
move = mcts.search(game)
print("AI's choice:", move)  # a (row, col) tuple, e.g. (0, 1)
        

Step 4: Implementing the Game Between AI and User

Now let's implement a game between the user and the AI using the completed MCTS. The user plays X and moves first; the AI plays O.


def play_game():
    game = TicTacToe()
    mcts = MCTS(iterations=1000)
    while True:
        game.display()
        if game.current_player == 1:  # User's turn (X)
            row, col = map(int, input("Enter the row and column numbers (0, 1, or 2): ").split())
            if not game.make_move(row, col):
                print("That cell is already taken. Try again.")
                continue
        else:  # AI's turn (O)
            print("AI is choosing...")
            move = mcts.search(game)
            game.make_move(move[0], move[1])
            print(f"AI chose the position: {move}")

        winner = game.check_winner()
        if winner is not None:
            game.display()
            if winner == 1:
                print("Congratulations! You won!")
            elif winner == -1:
                print("AI won!")
            else:
                print("It's a draw!")
            break

play_game()
        

Conclusion

In this tutorial, we implemented a simple Tic-Tac-Toe AI using Monte Carlo Tree Search. Pure MCTS with random rollouts is already strong enough for a game this small, and the same structure extends naturally to deep learning: AlphaZero-style systems replace the random simulation with a PyTorch policy/value network that evaluates positions directly. Implementing even this simple AI can be technically challenging, but it is a rewarding exercise, and we hope to build a more complete agent with these techniques in a later part of the course.
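
As a pointer toward that next step, here is a minimal sketch of the kind of PyTorch network such an agent could use; the class name, layer sizes, and board encoding are illustrative assumptions, not code from this tutorial:

import torch
import torch.nn as nn

class TicTacToeNet(nn.Module):
    # Illustrative network: maps a flattened 3x3 board (entries -1/0/1)
    # to log-probabilities over the 9 cells and a value estimate in [-1, 1].
    def __init__(self, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(9, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, 9)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, x):
        h = self.body(x)
        policy = torch.log_softmax(self.policy_head(h), dim=-1)
        value = torch.tanh(self.value_head(h))
        return policy, value

# Example: evaluate an empty board (a batch of one flattened 3x3 grid)
net = TicTacToeNet()
board = torch.zeros(1, 9)
log_policy, value = net(board)
print(log_policy.shape, value.shape)  # torch.Size([1, 9]) torch.Size([1, 1])

In an AlphaZero-style loop, the policy output would guide the selection step and the value output would replace the random rollout in simulate().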