Deep Learning PyTorch Course: Recurrent Neural Networks

1. Introduction

Deep learning is a branch of artificial intelligence that uses artificial neural networks to learn patterns from data and make predictions. In this lecture, we will take a closer look at the concept of Recurrent Neural Networks (RNNs) and how to implement RNN models using PyTorch.

2. What is a Recurrent Neural Network?

Recurrent Neural Networks (RNNs) are a type of neural network designed to process sequential data. Whereas a typical feed-forward network takes a fixed-size input and processes it in a single pass, an RNN maintains an internal hidden state that carries information from previous steps and influences the current output. This makes RNNs particularly useful in fields like Natural Language Processing (NLP).

2.1 Structure of RNN

The basic structure of an RNN is as follows. At each time step, the input \( x_t \) is processed along with the previous hidden state \( h_{t-1} \) to generate a new hidden state \( h_t \). This can be expressed with the following formula:

    h_t = f(W_h * h_{t-1} + W_x * x_t)
    

Here, \( f \) is the activation function, \( W_h \) is the weight of the hidden state, and \( W_x \) is the weight of the input.
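
To make the formula concrete, the following short sketch computes a single recurrent step in plain PyTorch, taking \( f \) to be tanh. The sizes and randomly initialized weights are made up purely for illustration:

    import torch

    hidden_size, input_size = 4, 3
    W_h = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights
    W_x = torch.randn(hidden_size, input_size)   # input-to-hidden weights

    h_prev = torch.zeros(hidden_size)            # previous hidden state h_{t-1}
    x_t = torch.randn(input_size)                # current input x_t

    # One recurrent step: h_t = f(W_h h_{t-1} + W_x x_t), with f = tanh
    h_t = torch.tanh(W_h @ h_prev + W_x @ x_t)
    print(h_t.shape)  # torch.Size([4])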

2.2 Advantages and Disadvantages of RNN

RNNs are strong at processing sequence data, but they exhibit challenges in learning from long sequences due to issues like vanishing gradients or exploding gradients. To overcome these problems, improved architectures like Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) are used.
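
In PyTorch, nn.LSTM and nn.GRU expose the same constructor arguments as nn.RNN, so they can usually be swapped in with little code change. The sizes below are arbitrary and only meant as a sketch:

    import torch.nn as nn

    # All three layers share the (input_size, hidden_size) interface
    rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
    lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)  # also returns a cell state
    gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)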

3. Implementing RNN Using PyTorch

Now, let’s implement a basic RNN model using PyTorch. In this example, we will tackle a simple natural language processing task: predicting the word that follows each word in a sentence.

3.1 Preparing the Data

First, we will import the necessary libraries and prepare the data. For this example, we will use simple sentences.

    import torch
    import torch.nn as nn
    import numpy as np
    from sklearn.preprocessing import OneHotEncoder

    # Data preparation: build a vocabulary and index mappings from the sentences
    sentences = ['I ate rice', 'I like apples', 'I code']
    vocab = sorted(set(' '.join(sentences).split()))
    word_to_index = {word: i for i, word in enumerate(vocab)}
    index_to_word = {i: word for i, word in enumerate(vocab)}
    

The code above extracts the unique words from the sentences and assigns an integer index to each one.
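
For the three example sentences, word_to_index maps each of the six unique words to an integer; since the vocabulary is sorted, the mapping comes out as follows:

    print(word_to_index)
    # {'I': 0, 'apples': 1, 'ate': 2, 'code': 3, 'like': 4, 'rice': 5}

Next, let’s convert each word into a one-hot vector.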

    # One-hot encoding
    # Note: recent versions of scikit-learn use sparse_output instead of sparse
    ohe = OneHotEncoder(sparse_output=False, categories=[list(range(len(vocab)))])
    X = []
    y = []

    # Build (current word, next word) index pairs from each sentence
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - 1):
            X.append(word_to_index[tokens[i]])
            y.append(word_to_index[tokens[i + 1]])

    X = np.array(X).reshape(-1, 1)
    y = np.array(y).reshape(-1, 1)

    # Fixing the categories to the full vocabulary keeps both encodings at len(vocab) columns
    X_onehot = ohe.fit_transform(X)
    y_onehot = ohe.fit_transform(y)
    
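As a quick check, we can look at the shapes of the resulting arrays; with the three example sentences there are five (current word, next word) pairs and six vocabulary words:

    print(X_onehot.shape)  # (5, 6): five word pairs, six vocabulary words
    print(y.shape)         # (5, 1): the index of the next word for each pair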

3.2 Building the RNN Model

Now let’s build the RNN model. In PyTorch, an RNN can be implemented using the nn.RNN class.

    class RNNModel(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super(RNNModel, self).__init__()
            self.hidden_size = hidden_size
            self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            # Initial hidden state: (num_layers, batch_size, hidden_size)
            h0 = torch.zeros(1, x.size(0), self.hidden_size)
            out, _ = self.rnn(x, h0)
            # Use the hidden state of the last time step for classification
            out = self.fc(out[:, -1, :])
            return out
    
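As an optional sanity check (not required for the rest of the example), a dummy batch can be passed through the model to confirm that the output has one score per vocabulary word; the sizes below simply mirror those used in the next section:

    dummy_model = RNNModel(input_size=6, hidden_size=5, output_size=6)
    dummy_input = torch.randn(5, 1, 6)     # (batch, sequence length, input_size)
    print(dummy_model(dummy_input).shape)  # torch.Size([5, 6])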

3.3 Training the Model

After creating the model, we set up the loss function and optimizer, and then run the training loop.

    input_size = len(vocab)   # size of the one-hot input vectors
    hidden_size = 5
    output_size = len(vocab)  # one score per vocabulary word

    model = RNNModel(input_size, hidden_size, output_size)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    num_epochs = 1000
    for epoch in range(num_epochs):
        model.train()
        optimizer.zero_grad()

        X_tensor = torch.Tensor(X_onehot).view(-1, 1, input_size)
        y_tensor = torch.Tensor(y).long().view(-1)

        outputs = model(X_tensor)
        loss = criterion(outputs, y_tensor)
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 100 == 0:
            print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')
    
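Since the training data never changes between epochs, the input tensors can also be built once before the loop; the variant below is only a small efficiency tweak and produces the same result:

    # Equivalent training loop with the tensors created once up front
    X_tensor = torch.Tensor(X_onehot).view(-1, 1, input_size)
    y_tensor = torch.Tensor(y).long().view(-1)

    for epoch in range(num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(X_tensor), y_tensor)
        loss.backward()
        optimizer.step()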

3.4 Evaluating the Model

After training is complete, we will evaluate the model. The following function predicts the next word for a new input word.

    def predict_next_word(model, current_word):
        model.eval()
        with torch.no_grad():
            # Encode the input word the same way as the training data
            input_index = word_to_index[current_word]
            input_onehot = ohe.transform([[input_index]])
            input_tensor = torch.Tensor(input_onehot).view(-1, 1, input_size)
            output = model(input_tensor)
            # Pick the word with the highest score
            next_word_index = torch.argmax(output).item()
            return index_to_word[next_word_index]

    # Prediction
    next_word = predict_next_word(model, 'I')
    print(f"Next word prediction: {next_word}")
    
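Building on predict_next_word, a short greedy loop can chain predictions to generate several words in a row; generate_sequence is a hypothetical helper added here for illustration:

    def generate_sequence(model, start_word, length=3):
        # Repeatedly feed the predicted word back in as the next input (greedy decoding)
        result = [start_word]
        for _ in range(length):
            result.append(predict_next_word(model, result[-1]))
        return ' '.join(result)

    print(generate_sequence(model, 'I'))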

4. Conclusion

In this lecture, we explored the concept of Recurrent Neural Networks (RNNs) and how to implement a basic RNN model using PyTorch. RNNs are powerful tools for processing sequence data, but variations like LSTM or GRU may be required for long sequences.

4.1 Future Directions for RNN

RNNs are only the most basic form of sequence model; more recently, architectures such as the Transformer have taken center stage in natural language processing. Moving on to these more powerful models requires an understanding of a wide range of deep learning techniques and architectures.

4.2 Additional Learning Resources

If you want a deeper understanding of recurrent neural networks, the following resources are recommended:

  • Deep Learning Book: “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
  • PyTorch Official Documentation
  • Deep Learning courses on Coursera

5. References

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • Pereyra, G., et al. (2017). Dealing with the curse of dimensionality in RNNs.
  • Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.