Deep Learning PyTorch Course, Bidirectional RNN Structure

As deep learning technology advances, the demand for processing sequence data keeps growing. The RNN (Recurrent Neural Network) is one of the representative architectures for handling such data. In this article, we will take a closer look at the concept of the Bidirectional RNN and how to implement it in PyTorch.

1. Understanding RNN (Recurrent Neural Network)

An RNN is a neural network with a cyclic structure designed to process sequence data (e.g., text, time series). While a conventional feed-forward network receives an input once and produces an output, an RNN carries a hidden state from previous time steps and uses it to update the current state. This allows the RNN to learn the temporal dependencies of a sequence.

1.1. Basic Structure of RNN

A single RNN cell looks much like an ordinary neuron, but it is applied repeatedly over time and feeds its hidden state back into itself. Below is a representation of the information flow of a single RNN cell:

        h(t-1)
          |
        (W_hh)
          |
          v
 x(t) --> (tanh) --> h(t)

In this structure, h(t-1) is the hidden state from the previous time step, and it is used to compute the current hidden state h(t). The weight matrix W_hh transforms the previous hidden state, while a second weight matrix W_xh transforms the current input x(t), giving the update h(t) = tanh(W_xh · x(t) + W_hh · h(t-1) + b).
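To make this update concrete, here is a minimal sketch of a single RNN step written directly with tensor operations. The sizes and variable names (in_dim, hid_dim, W_xh, W_hh, b_h) are illustrative assumptions and are independent of the model built later in this article:

python
import torch

# Illustrative sizes (assumptions, not used later in the article)
in_dim, hid_dim = 8, 16

W_xh = torch.randn(hid_dim, in_dim)   # input-to-hidden weights
W_hh = torch.randn(hid_dim, hid_dim)  # hidden-to-hidden weights
b_h = torch.zeros(hid_dim)            # bias

x_t = torch.randn(in_dim)        # input at time step t
h_prev = torch.zeros(hid_dim)    # hidden state h(t-1)

# h(t) = tanh(W_xh · x(t) + W_hh · h(t-1) + b)
h_t = torch.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
print(h_t.shape)  # torch.Size([16])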

1.2. Limitations of RNN

A plain RNN runs into "memory limitations" when processing long sequences: because of vanishing gradients, information from early inputs tends to be lost by the time the end of the sequence is reached. To address this, gated structures such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) have been developed.
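In PyTorch these gated variants are available as nn.LSTM and nn.GRU, and they accept the same kind of input as nn.RNN. The sketch below only demonstrates the call signatures, using arbitrary example sizes:

python
import torch
import torch.nn as nn

# Arbitrary example sizes for demonstration
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
gru = nn.GRU(input_size=8, hidden_size=16, num_layers=1, batch_first=True)

x = torch.randn(4, 10, 8)       # (batch, seq_length, input_size)
lstm_out, (h_n, c_n) = lstm(x)  # an LSTM also returns a cell state c_n
gru_out, g_n = gru(x)           # a GRU returns only the hidden state

print(lstm_out.shape, gru_out.shape)  # torch.Size([4, 10, 16]) torch.Size([4, 10, 16])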

2. Bidirectional RNN (Bi-directional RNN)

Bidirectional RNN is a structure that can process sequences in two directions. This means that it can obtain information from both the past (forward) and the future (backward). This structure operates as follows.

2.1. Basic Idea of Bidirectional RNN

A Bidirectional RNN uses two RNN layers: one processes the input sequence in the forward direction, while the other processes it in the backward direction. Below is a simple illustration of the structure:

  Forward RNN:  --> h_f(t-1) --> h_f(t) --> h_f(t+1) -->   (reads the sequence left to right)
  Backward RNN: <-- h_b(t-1) <-- h_b(t) <-- h_b(t+1) <--   (reads the sequence right to left)

                     h_f(t)   h_b(t)
                        \       /
                        (merge)
                           |
                        output(t)

The forward RNN and the backward RNN each process the input, and their two hidden states are combined (typically by concatenation) to produce the final output at each time step. This way, the network can make use of information from the entire sequence.
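In PyTorch this merging is done by concatenation: with bidirectional=True, the output at each time step stacks the forward and backward hidden states, so its feature dimension is twice the hidden size. A small sketch with arbitrary example sizes:

python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
x = torch.randn(4, 10, 8)  # (batch, seq_length, input_size)
out, h_n = rnn(x)

print(out.shape)  # torch.Size([4, 10, 32]) -> 2 * hidden_size per time step
print(h_n.shape)  # torch.Size([2, 4, 16])  -> one final hidden state per direction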

3. Implementing Bidirectional RNN with PyTorch

Now, let’s implement a Bidirectional RNN using PyTorch. In this example, we will use a short text string as data and build a model that predicts the next character using the Bidirectional RNN.

3.1. Importing Required Libraries

python
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

3.2. Preparing the Data

The input data will be a simple string, and we will predict the character that follows each subsequence. The string is split into overlapping windows of consecutive characters, each paired with the character that comes right after it; since the RNN expects a numeric vector at every time step, the character indices are also one-hot encoded. Below is the data preparation code:

python
# Setting data and character set
data = "hello deep learning with pytorch"
chars = sorted(list(set(data)))
char_to_index = {ch: ix for ix, ch in enumerate(chars)}
index_to_char = {ix: ch for ix, ch in enumerate(chars)}

# Hyperparameters
seq_length = 5
input_size = len(chars)
hidden_size = 128
num_layers = 2
output_size = len(chars)

# Creating dataset
inputs = []
targets = []
for i in range(len(data) - seq_length):
    inputs.append([char_to_index[ch] for ch in data[i:i + seq_length]])
    targets.append(char_to_index[data[i + seq_length]])

inputs = np.array(inputs)
targets = np.array(targets)

# One-hot encode the inputs: shape (num_samples, seq_length, input_size)
inputs_one_hot = np.eye(input_size, dtype=np.float32)[inputs]
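As a quick, purely optional sanity check, the prepared arrays should have the following shapes:

python
# Optional check of the prepared dataset
print(inputs.shape)          # (num_samples, seq_length) - character indices
print(inputs_one_hot.shape)  # (num_samples, seq_length, input_size) - one-hot vectors
print(targets.shape)         # (num_samples,) - index of the next character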

3.3. Defining the Bidirectional RNN Model

Now, let’s define the Bidirectional RNN model. In PyTorch, we can create RNN layers using nn.RNN() or nn.LSTM(). Here, we will use nn.RNN():

python
class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # Bidirectional RNN layer
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size * 2, output_size) # Considering both directions, hidden_size * 2
        
    def forward(self, x):
        # Pass data through the RNN; out has shape (batch, seq_length, hidden_size * 2)
        out, _ = self.rnn(x)
        # Keep only the output of the last time step
        out = out[:, -1, :]
        
        # Generate the final output
        out = self.fc(out)
        return out
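Before training, it can help to verify the output shape with a dummy batch. This check is purely illustrative and is not needed for the rest of the tutorial:

python
# Optional sanity check: run a dummy batch of zeros through the model
test_model = BiRNN(input_size, hidden_size, output_size, num_layers)
dummy = torch.zeros(3, seq_length, input_size)  # a batch of 3 one-hot sequences
print(test_model(dummy).shape)                  # torch.Size([3, output_size])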

3.4. Training the Model

Having defined the model, let’s implement the training process. Because the dataset is tiny, we simply train on the whole dataset at once each epoch, using CrossEntropyLoss as the loss function and Adam as the optimizer:

python
# Setting hyperparameters
num_epochs = 200
learning_rate = 0.01

# Initializing model, loss function, and optimizer
model = BiRNN(input_size, hidden_size, output_size, num_layers)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(num_epochs):
    # Convert the one-hot encoded data to tensors
    x_batch = torch.tensor(inputs_one_hot)             # (num_samples, seq_length, input_size)
    y_batch = torch.tensor(targets, dtype=torch.long)

    # Zero gradients
    optimizer.zero_grad()

    # Model prediction
    outputs = model(x_batch)
    
    # Calculate loss
    loss = criterion(outputs, y_batch)
    
    # Backpropagation and weight update
    loss.backward()
    optimizer.step()

    if (epoch+1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

3.5. Evaluating the Model

After training, let’s switch the model to evaluation mode and use it to predict the next character for a given input sequence:

python
def predict_next_char(model, input_seq):
    model.eval()  # Switch to evaluation mode
    with torch.no_grad():
        # One-hot encode the input sequence: shape (1, seq_length, input_size)
        indices = [char_to_index[ch] for ch in input_seq]
        input_tensor = torch.tensor(np.eye(input_size, dtype=np.float32)[indices]).unsqueeze(0)
        output = model(input_tensor)
        _, predicted_index = torch.max(output, 1)
    return index_to_char[predicted_index.item()]

# Prediction test
test_seq = "hello"
predicted_char = predict_next_char(model, test_seq)
print(f'Input sequence: {test_seq} Predicted next character: {predicted_char}')
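The same function can also be used to generate several characters in a row by repeatedly appending each prediction and feeding the last seq_length characters back in. This is a small illustrative loop built on the objects defined above:

python
# Generate a few characters by sliding the window over the model's own predictions
generated = "hello"
for _ in range(10):
    next_char = predict_next_char(model, generated[-seq_length:])
    generated += next_char
print(generated)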

4. Conclusion

In this article, we explored the concept of the Bidirectional RNN and how to implement it using PyTorch. A Bidirectional RNN is a powerful structure that can use information from both the past and the future, which makes it useful in many sequence-processing tasks such as natural language processing (NLP), since it can capture the patterns and dependencies of sequence data more effectively than a unidirectional RNN.

We will continue to explore various deep learning techniques and architectures, and I hope this article will greatly assist you in your deep learning studies!