This article explains how to implement a Tic-Tac-Toe AI using the Monte Carlo Tree Search (MCTS) algorithm in Python, with PyTorch set up as groundwork for later deep-learning extensions. We will primarily understand how MCTS works and how an AI can use it to play Tic-Tac-Toe.
Tic-Tac-Toe Game Overview
Tic-Tac-Toe is a game played on a 3×3 square grid where two players take turns placing X or O. The player who manages to place three of their marks in a row, column, or diagonal wins the game.
Step 1: Environment Setup
To follow this tutorial, you need to install the necessary packages. Here are the main libraries required.
pip install torch numpy matplotlib
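Only NumPy is strictly used by the code in this tutorial; torch and matplotlib are installed for later experiments. A quick sanity check that the core dependency imports correctly:

```python
import numpy as np

# Confirm NumPy is importable before moving on
print("numpy version:", np.__version__)
```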
Step 2: Game Environment Implementation
First, we implement the Tic-Tac-Toe game environment. We must define the game’s rules and create a class to represent the state.
import numpy as np

class TicTacToe:
    def __init__(self):
        self.board = np.zeros((3, 3), dtype=int)  # 0: empty, 1: X, -1: O
        self.current_player = 1  # 1: X's turn, -1: O's turn

    def reset(self):
        self.board = np.zeros((3, 3), dtype=int)
        self.current_player = 1

    def copy(self):
        # Deep-copy the state so MCTS can explore moves without mutating the real game
        new_game = TicTacToe()
        new_game.board = self.board.copy()
        new_game.current_player = self.current_player
        return new_game

    def make_move(self, row, col):
        if self.board[row, col] == 0:
            self.board[row, col] = self.current_player
            self.current_player *= -1

    def check_winner(self):
        for player in [1, -1]:
            for row in range(3):
                if np.all(self.board[row, :] == player):  # Check rows
                    return player
            for col in range(3):
                if np.all(self.board[:, col] == player):  # Check columns
                    return player
            if np.all(np.diag(self.board) == player) or np.all(np.diag(np.fliplr(self.board)) == player):
                return player
        return None if np.any(self.board == 0) else 0  # None: game ongoing, 0: draw

    def display(self):
        symbols = {1: 'X', -1: 'O', 0: ' '}
        for row in self.board:
            print("|".join(symbols[x] for x in row))
            print("-" * 5)
        print("\n")
# Game Test
game = TicTacToe()
game.make_move(0, 0)
game.display()
game.make_move(1, 1)
game.display()
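The winner check above leans on two NumPy helpers that may be unfamiliar: `np.diag` extracts the main diagonal, and `np.fliplr` mirrors the board left-to-right so the same call picks out the anti-diagonal. A standalone illustration (the sample board is made up for the demo):

```python
import numpy as np

# A finished board where O (-1) occupies the anti-diagonal
board = np.array([
    [ 1,  0, -1],
    [ 1, -1,  0],
    [-1,  0,  1],
])

print(np.diag(board))             # main diagonal: [ 1 -1  1]
print(np.diag(np.fliplr(board)))  # anti-diagonal: [-1 -1 -1]
print(np.all(np.diag(np.fliplr(board)) == -1))  # True -> O has won
```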
Step 3: Monte Carlo Tree Search (MCTS) Algorithm
MCTS is a search algorithm for decision-making under uncertainty: it builds a game tree incrementally, guided by random simulations. Each iteration of the algorithm consists of four steps:
- Selection: Select a node from the current tree.
- Expansion: Expand possible actions from the selected node.
- Simulation: Play the game from the expanded node to obtain the result.
- Backpropagation: Update the information for the parent node with the result.
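The Selection step is typically driven by the UCB1 formula, which balances exploitation (average win rate) against exploration (trying rarely visited moves). A minimal standalone version of the formula used by the `ucb1` method in the implementation:

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    # Unvisited nodes get infinite priority so every child is tried at least once
    if visits == 0:
        return float('inf')
    # Exploitation term (win rate) + exploration term (uncertainty bonus)
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

print(ucb1(0, 0, 10))             # inf
print(round(ucb1(6, 10, 20), 3))  # 1.374
```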
MCTS Class Implementation
import random

class MCTSNode:
    def __init__(self, state, parent=None, move=None):
        self.state = state    # Game state after `move` was played
        self.parent = parent
        self.move = move      # The (row, col) move that led to this state
        self.children = []    # Child nodes
        self.wins = 0         # Wins for the player who made `move`
        self.visits = 0       # Number of visits

    def ucb1(self):
        if self.visits == 0:
            return float('inf')  # Select nodes that have not been visited before
        return self.wins / self.visits + np.sqrt(2 * np.log(self.parent.visits) / self.visits)

class MCTS:
    def __init__(self, iterations):
        self.iterations = iterations

    def search(self, game):
        root = MCTSNode(state=game.copy())
        for _ in range(self.iterations):
            node = self.select(root)
            winner = self.simulate(node.state)
            self.backpropagate(node, winner)
        # Play the move of the most-visited child
        return max(root.children, key=lambda child: child.visits).move

    def select(self, node):
        while node.children:
            node = max(node.children, key=lambda child: child.ucb1())
        # Expand only nodes that were already visited and are not terminal
        if node.visits > 0 and node.state.check_winner() is None:
            for action in self.get_valid_moves(node.state):
                child_state = node.state.copy()
                child_state.make_move(action[0], action[1])
                node.children.append(MCTSNode(state=child_state, parent=node, move=action))
        return random.choice(node.children) if node.children else node

    def simulate(self, state):
        state = state.copy()  # Simulate on a copy so the tree's states stay intact
        while True:
            winner = state.check_winner()
            if winner is not None:
                return winner
            move = random.choice(self.get_valid_moves(state))
            state.make_move(move[0], move[1])

    def backpropagate(self, node, winner):
        while node is not None:
            node.visits += 1
            # Credit the win to the player who made the move into this node
            if winner == -node.state.current_player:
                node.wins += 1
            node = node.parent

    def get_valid_moves(self, state):
        return [(row, col) for row in range(3) for col in range(3) if state.board[row, col] == 0]
# MCTS Usage Example
mcts = MCTS(iterations=1000)
move = mcts.search(game)
print("AI's choice:", move)
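The Simulation step can also be sketched in isolation: from any position, play uniformly random moves until someone wins or the board fills up. The `rollout_winner` helper below is illustrative only, not part of the tutorial's classes:

```python
import random
import numpy as np

def rollout_winner(board, player):
    """Randomly play out a position; return 1 (X wins), -1 (O wins), or 0 (draw)."""
    board = board.copy()  # never mutate the caller's board
    while True:
        # Check for a winner before each move
        for p in (1, -1):
            for i in range(3):
                if np.all(board[i, :] == p) or np.all(board[:, i] == p):
                    return p
            if np.all(np.diag(board) == p) or np.all(np.diag(np.fliplr(board)) == p):
                return p
        empties = list(zip(*np.where(board == 0)))
        if not empties:
            return 0  # board full, draw
        row, col = random.choice(empties)
        board[row, col] = player
        player = -player

# From an empty board the result varies from run to run
print(rollout_winner(np.zeros((3, 3), dtype=int), 1))
```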
Step 4: Implementing the Game Between AI and User
Now, let’s implement a game between the user and the AI using the completed MCTS.
def play_game():
    game = TicTacToe()
    while True:
        game.display()
        if game.current_player == 1:  # User's turn
            row, col = map(int, input("Enter the row and column numbers (0, 1, or 2): ").split())
            game.make_move(row, col)
        else:  # AI's turn
            print("AI is choosing...")
            move = mcts.search(game)
            game.make_move(move[0], move[1])
            print(f"AI chose the position: {move}")
        winner = game.check_winner()
        if winner is not None:
            game.display()
            if winner == 1:
                print("Congratulations! You won!")
            elif winner == -1:
                print("AI won!")
            else:
                print("It's a draw!")
            break

play_game()
Conclusion
In this tutorial, we implemented a simple Tic-Tac-Toe AI using Monte Carlo Tree Search. Plain MCTS with random rollouts already plays competently on a 3×3 board, and while the implementation can be technically challenging, it is a rewarding exercise. A natural next step is to guide the search with a neural network trained in PyTorch, as in AlphaZero-style agents, and to develop a more complete AI using various algorithms and techniques.