Deep Learning PyTorch Course: Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNNs) are deep learning models with powerful capabilities for processing sequence data. In this course, we will start with the fundamental concepts of RNNs and provide a detailed explanation of how to implement them using PyTorch.

1. Overview of Recurrent Neural Networks (RNNs)

Recurrent Neural Networks (RNNs) are a type of neural network architecture designed so that information from earlier time steps can influence the computation at the current step. They are primarily used for processing sequence data (e.g., natural language text, time series data). Traditional feed-forward neural networks assume that inputs are independent of one another, whereas RNNs can learn dependencies over time.

In the basic structure of an RNN, the input at each time step is fed into the model together with the hidden state from the previous time step. This recurrent connection allows the network to carry information forward along the sequence.

2. Structure of RNNs

The basic structure of an RNN is as follows:

  • Input layer: Takes in sequence data.
  • Hidden layer: Maintains a hidden state that is carried from one time step to the next, forming the network's recurrent connections.
  • Output layer: Provides the final prediction results.

[Figure: basic RNN structure]

Mathematical Representation: The RNN update at time step t is expressed as follows:

h_t = f(W_hh h_{t-1} + W_xh x_t + b_h)

y_t = W_hy h_t + b_y

where x_t is the input, h_t is the hidden state, y_t is the output, and f is a nonlinear activation function such as tanh.
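
To make these equations concrete, here is a minimal sketch (with arbitrary, illustrative dimensions) that computes a single recurrent update by hand in PyTorch, choosing f = tanh:

import torch

input_dim, hidden_dim = 4, 3  # arbitrary sizes for illustration

# Weight matrices and bias from the update equation
W_xh = torch.randn(hidden_dim, input_dim)
W_hh = torch.randn(hidden_dim, hidden_dim)
b_h = torch.zeros(hidden_dim)

x_t = torch.randn(input_dim)      # input at time step t
h_prev = torch.zeros(hidden_dim)  # hidden state from time step t-1

# h_t = f(W_hh h_{t-1} + W_xh x_t + b_h), with f = tanh
h_t = torch.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
print(h_t.shape)  # torch.Size([3])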

3. Limitations of RNNs

Traditional RNNs struggle to learn dependencies over long sequences. The main cause is the vanishing gradient problem: gradients shrink as they are propagated back through many time steps. Variants such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) were proposed to address this, as sketched below.
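
In PyTorch, swapping the vanilla RNN layer for one of these variants is essentially a one-line change. The snippet below is for illustration only (the dimensions match the model defined later in this post); note that nn.LSTM additionally returns a cell state:

import torch.nn as nn

emb_dim, hidden_dim = 100, 256  # same sizes used in the model below

rnn = nn.RNN(emb_dim, hidden_dim)    # vanilla RNN: returns (output, hidden)
lstm = nn.LSTM(emb_dim, hidden_dim)  # LSTM: returns (output, (hidden, cell))
gru = nn.GRU(emb_dim, hidden_dim)    # GRU: returns (output, hidden)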

4. Implementing RNN with PyTorch

In this section, we will implement a simple RNN model using PyTorch. We will use the famous IMDB movie review dataset to classify the sentiment of movie reviews as positive or negative.

4.1 Loading and Preprocessing Data

We will use the torchtext library to load and preprocess the IMDB data. Note that the Field/BucketIterator interface used below is torchtext's legacy API; in recent torchtext releases it has been moved or removed, so an older version (or torchtext.legacy) is required to run this code as written.


import torch
from torchtext.datasets import IMDB
from torchtext.data import Field, LabelField, BucketIterator

# Tokenize reviews with spaCy and keep the length of each sequence for packing later
TEXT = Field(tokenize='spacy', include_lengths=True)
# LabelField produces a single float label (0/1) per example for binary classification
LABEL = LabelField(dtype=torch.float)

train_data, test_data = IMDB.splits(TEXT, LABEL)
TEXT.build_vocab(train_data, max_size=25000)
LABEL.build_vocab(train_data)

# BucketIterator groups examples of similar length; sorting within each batch is
# needed because pack_padded_sequence expects sequences in decreasing length order
train_iterator, test_iterator = BucketIterator.splits(
    (train_data, test_data),
    batch_size=64,
    sort_within_batch=True)

The code above defines fields for the text and labels, loads the IMDB dataset, builds the vocabularies, and creates bucketed iterators for the training and test sets.
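
As an optional sanity check (not required for the rest of the post), you can inspect the vocabularies and one batch. The exact numbers depend on your torchtext version, but with max_size=25000 the text vocabulary should contain 25,002 entries (25,000 words plus the special <unk> and <pad> tokens):

print(f'Text vocabulary size: {len(TEXT.vocab)}')   # 25002
print(f'Label mapping: {dict(LABEL.vocab.stoi)}')   # e.g. {'neg': 0, 'pos': 1}
print(TEXT.vocab.itos[:10])                         # most frequent tokens

# batch.text is a (token_ids, lengths) tuple because include_lengths=True
batch = next(iter(train_iterator))
text, text_lengths = batch.text
print(text.shape, text_lengths.shape)               # [seq_len, batch_size], [batch_size]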

4.2 Defining the RNN Model

Next, we define the RNN model by subclassing PyTorch’s nn.Module.


import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_dim, emb_dim, hidden_dim, output_dim):
        super().__init__()
        
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, text, text_lengths):
        # text: [seq_len, batch_size]
        embedded = self.dropout(self.embedding(text))
        # Pack the padded batch so the RNN skips padding tokens
        # (pack_padded_sequence expects the lengths on the CPU)
        packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths.cpu())
        packed_output, hidden = self.rnn(packed_embedded)
        # The per-time-step outputs are not needed here; classify from the last hidden state
        # hidden: [1, batch_size, hidden_dim]
        return self.fc(hidden.squeeze(0))

This model takes the vocabulary size (input dimension), embedding dimension, hidden dimension, and output dimension as arguments. It consists of an embedding layer, an RNN layer, a dropout layer, and a linear output layer, and it classifies each review from the final hidden state.
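
As a quick sanity check, it can be useful to count the trainable parameters once the model is instantiated in the next section. A small helper like this (not part of the original code) does the job:

def count_parameters(model):
    # Sum the number of elements of every trainable tensor in the model
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# After creating the model in the next section:
# print(f'{count_parameters(model):,} trainable parameters')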

4.3 Training the Model

Next, we will look at the training process. We use binary cross-entropy with logits as the loss function and Adam as the optimizer.


import torch.optim as optim

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = RNN(len(TEXT.vocab), 100, 256, 1)  # vocab size, emb_dim, hidden_dim, output_dim
model = model.to(device)

optimizer = optim.Adam(model.parameters())
criterion = nn.BCEWithLogitsLoss()  # combines a sigmoid with binary cross-entropy
criterion = criterion.to(device)

def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    
    for batch in iterator:
        text, text_lengths = batch.text
        # Move data to the same device as the model (the lengths stay on the CPU for packing)
        text = text.to(device)
        labels = batch.label.to(device)
        
        optimizer.zero_grad()
        predictions = model(text, text_lengths).squeeze(1)
        
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)

The train function iterates over the batches provided by the iterator for one epoch and returns the average loss.
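
Loss alone is not very intuitive for a binary classifier, so you may also want to track accuracy. The helper below is a common addition (it is not used in the code above); you would call it next to criterion and accumulate an epoch_acc in the same way as epoch_loss:

def binary_accuracy(predictions, labels):
    # Convert raw logits to 0/1 predictions and compare with the true labels
    rounded = torch.round(torch.sigmoid(predictions))
    correct = (rounded == labels).float()
    return correct.sum() / len(correct)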

4.4 Evaluating the Model

It is also necessary to define a function to evaluate the model. You can evaluate it using the following code.


def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    
    # Disable gradient tracking during evaluation
    with torch.no_grad():
        for batch in iterator:
            text, text_lengths = batch.text
            text = text.to(device)
            labels = batch.label.to(device)
            
            predictions = model(text, text_lengths).squeeze(1)
            loss = criterion(predictions, labels)
            epoch_loss += loss.item()
        
    return epoch_loss / len(iterator)

The evaluate function computes the average loss over the evaluation data without updating the model’s weights.
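
After training, you may also want to classify an individual review. The following is a minimal sketch, assuming the spaCy English model (en_core_web_sm here) is installed and matches the tokenizer used by TEXT:

import spacy

nlp = spacy.load('en_core_web_sm')

def predict_sentiment(model, sentence):
    model.eval()
    tokens = [tok.text for tok in nlp.tokenizer(sentence)]
    indexed = [TEXT.vocab.stoi[t] for t in tokens]   # unknown words map to <unk>
    lengths = torch.LongTensor([len(indexed)])       # lengths stay on the CPU
    tensor = torch.LongTensor(indexed).unsqueeze(1).to(device)  # shape [seq_len, 1]
    prediction = torch.sigmoid(model(tensor, lengths))
    return prediction.item()  # close to 1 -> positive, close to 0 -> negative

# Example:
# predict_sentiment(model, "This film was absolutely wonderful!")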

4.5 Training and Evaluation Loop

Finally, we write a training and evaluation loop to train the model. For simplicity, the test set is used here in place of a separate validation set.


N_EPOCHS = 5

for epoch in range(N_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion)
    valid_loss = evaluate(model, test_iterator, criterion)

    print(f'Epoch: {epoch+1:02}, Train Loss: {train_loss:.3f}, Valid Loss: {valid_loss:.3f}')

This loop trains the model according to the given number of epochs and outputs the training loss and validation loss at each epoch.
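
One common extension of this loop, shown below as a sketch, is to keep a copy of the weights whenever the validation loss improves (the file name rnn-imdb.pt is arbitrary):

best_valid_loss = float('inf')

for epoch in range(N_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion)
    valid_loss = evaluate(model, test_iterator, criterion)

    # Save the weights with the lowest validation loss seen so far
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'rnn-imdb.pt')

    print(f'Epoch: {epoch+1:02}, Train Loss: {train_loss:.3f}, Valid Loss: {valid_loss:.3f}')

# Later, restore the best weights with:
# model.load_state_dict(torch.load('rnn-imdb.pt'))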

5. Conclusion

In this course, we covered the basic concepts of Recurrent Neural Networks (RNNs) and how to implement such a model using PyTorch. RNNs are effective for processing sequence data, but they struggle with long sequences, so variant models such as LSTM and GRU are worth considering. Building on this foundation, experimenting with other kinds of sequence data is also a good next step.

This blog post will be a useful resource for those who are building a foundation in deep learning and machine learning. Continue experimenting with various models!