Deep learning has become an essential technology in the fields of data science and artificial intelligence today.
In this course, we will discuss in depth the key artificial neural network structures for processing sequence data: RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), and GRU (Gated Recurrent Unit), and compare the performance of each model.
1. Understanding RNN (Recurrent Neural Network)
RNN is a type of neural network designed to process data that arrives in sequence. Unlike traditional feed-forward networks, an RNN can learn the temporal dependencies of sequence data by feeding its previous hidden state back in together with the current input.
1.1. RNN Structure
The basic structure of an RNN is as follows:
h_t = f(W_hh * h_{t-1} + W_xh * x_t)
Here, h_t is the current hidden state, h_{t-1} is the previous hidden state, x_t is the current input, W_hh and W_xh are weight matrices, and f is the activation function (typically tanh).
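To make the recurrence concrete, here is a minimal one-step sketch in PyTorch. The tensor sizes and the random weights are arbitrary values chosen purely for illustration:
import torch

# Illustrative sizes (assumptions for this sketch)
input_size, hidden_size = 3, 5
x_t = torch.randn(input_size)        # current input x_t
h_prev = torch.zeros(hidden_size)    # previous hidden state h_{t-1}

W_xh = torch.randn(hidden_size, input_size)   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights

# h_t = f(W_hh * h_{t-1} + W_xh * x_t), with f = tanh
h_t = torch.tanh(W_hh @ h_prev + W_xh @ x_t)
In a full RNN this single step is simply repeated for every element of the sequence, carrying h_t forward as the next h_{t-1}.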
1.2. Limitations of RNN
RNN struggles with the long-term dependency problem: in long sequences, the gradients that flow back through many time steps tend to vanish (or explode), so the network finds it difficult to retain information from far in the past.
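This can be observed directly. In the sketch below (a plain nn.RNN with arbitrary illustrative sizes), the gradient of the final output with respect to the earliest input is typically many orders of magnitude smaller than with respect to the most recent input:
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)
x = torch.randn(1, 200, 1, requires_grad=True)   # one long sequence of 200 steps

out, _ = rnn(x)
out[:, -1, :].sum().backward()                   # backpropagate from the last output

print(x.grad[0, 0].norm().item())    # gradient w.r.t. the first time step (usually tiny)
print(x.grad[0, -1].norm().item())   # gradient w.r.t. the last time step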
2. Introduction to LSTM (Long Short-Term Memory)
LSTM is a structure devised to overcome the limitations of RNN, demonstrating strong performance in learning long sequence data.
2.1. LSTM Structure
LSTM selectively remembers and forgets information through a cell state and gate mechanisms. The basic LSTM equations are as follows:
f_t = σ(W_f * [h_{t-1}, x_t] + b_f) // Forget gate
i_t = σ(W_i * [h_{t-1}, x_t] + b_i) // Input gate
o_t = σ(W_o * [h_{t-1}, x_t] + b_o) // Output gate
C_t = f_t * C_{t-1} + i_t * tanh(W_c * [h_{t-1}, x_t] + b_c) // Cell state update
h_t = o_t * tanh(C_t) // Final output
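The following is a minimal one-step sketch of these equations in PyTorch. The weight shapes, the concatenation [h_{t-1}, x_t], and the random initialization are illustrative assumptions, not a particular library's implementation:
import torch

input_size, hidden_size = 3, 5
x_t = torch.randn(input_size)
h_prev = torch.zeros(hidden_size)
c_prev = torch.zeros(hidden_size)

# One weight matrix and bias per gate, acting on [h_{t-1}, x_t]
def gate_params():
    return torch.randn(hidden_size, hidden_size + input_size), torch.zeros(hidden_size)

W_f, b_f = gate_params()
W_i, b_i = gate_params()
W_o, b_o = gate_params()
W_c, b_c = gate_params()

hx = torch.cat([h_prev, x_t])                            # [h_{t-1}, x_t]
f_t = torch.sigmoid(W_f @ hx + b_f)                      # forget gate
i_t = torch.sigmoid(W_i @ hx + b_i)                      # input gate
o_t = torch.sigmoid(W_o @ hx + b_o)                      # output gate
c_t = f_t * c_prev + i_t * torch.tanh(W_c @ hx + b_c)    # cell state update
h_t = o_t * torch.tanh(c_t)                              # final output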
2.2. Advantages of LSTM
Because the cell state provides a nearly linear path along which gradients can flow, LSTM can maintain the flow of information even across long sequences, which makes it a strong choice for sequence modeling tasks.
3. Introduction to GRU (Gated Recurrent Unit)
GRU is a simplified variant of LSTM that achieves comparable performance with fewer parameters.
3.1. GRU Structure
The basic GRU equations are as follows:
z_t = σ(W_z * [h_{t-1}, x_t] + b_z) // Update gate
r_t = σ(W_r * [h_{t-1}, x_t] + b_r) // Reset gate
h_t = (1 - z_t) * h_{t-1} + z_t * tanh(W_h * [r_t * h_{t-1}, x_t] + b_h) // Final output
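As with the LSTM, here is a minimal one-step sketch of these equations in PyTorch, again with arbitrary illustrative sizes and random weights:
import torch

input_size, hidden_size = 3, 5
x_t = torch.randn(input_size)
h_prev = torch.zeros(hidden_size)

W_z = torch.randn(hidden_size, hidden_size + input_size); b_z = torch.zeros(hidden_size)
W_r = torch.randn(hidden_size, hidden_size + input_size); b_r = torch.zeros(hidden_size)
W_h = torch.randn(hidden_size, hidden_size + input_size); b_h = torch.zeros(hidden_size)

hx = torch.cat([h_prev, x_t])                                       # [h_{t-1}, x_t]
z_t = torch.sigmoid(W_z @ hx + b_z)                                 # update gate
r_t = torch.sigmoid(W_r @ hx + b_r)                                 # reset gate
h_cand = torch.tanh(W_h @ torch.cat([r_t * h_prev, x_t]) + b_h)     # candidate state
h_t = (1 - z_t) * h_prev + z_t * h_cand                             # final output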
3.2. Advantages of GRU
GRU maintains performance comparable to LSTM while requiring fewer parameters, so it can be trained with fewer resources and is computationally more efficient, as the parameter-count sketch below illustrates.
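One way to see the size difference concretely is to count the parameters of the corresponding PyTorch layers; the hidden size below is an arbitrary example value:
import torch.nn as nn

def num_params(module):
    return sum(p.numel() for p in module.parameters())

hidden = 64
print(num_params(nn.RNN(1, hidden)))   # one set of weights
print(num_params(nn.LSTM(1, hidden)))  # four gate sets, roughly 4x the RNN
print(num_params(nn.GRU(1, hidden)))   # three gate sets, roughly 3x the RNN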
4. Practical Comparison of RNN, LSTM, and GRU Performance
Now, we will implement RNN, LSTM, and GRU models using PyTorch and compare their performance. We will proceed with a simple time series prediction problem.
4.1. Data Preparation
The code below generates simple time series data.
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
# Generate time series data
def create_dataset(seq, time_step=1):
    X, Y = [], []
    for i in range(len(seq) - time_step - 1):
        X.append(seq[i:(i + time_step)])
        Y.append(seq[i + time_step])
    return np.array(X), np.array(Y)
# Time series data
data = np.sin(np.arange(0, 100, 0.1))
time_step = 10
X, Y = create_dataset(data, time_step)
# Convert to PyTorch tensors
X = torch.FloatTensor(X).view(-1, time_step, 1)
Y = torch.FloatTensor(Y)
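As a quick sanity check, we can print the shapes of the resulting tensors; with the settings above the dataset contains 989 windows of 10 steps each:
print(X.shape)  # torch.Size([989, 10, 1]) - (samples, time_step, features)
print(Y.shape)  # torch.Size([989])        - next value for each window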
4.2. Model Implementation
Now we will implement each model. The RNN model is as follows:
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(RNNModel, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.rnn(x)
        out = self.fc(out[:, -1, :])  # use only the last time step's hidden state
        return out
# Initialize model
rnn_model = RNNModel(input_size=1, hidden_size=5)
Next, let’s implement the LSTM model:
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LSTMModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = self.fc(out[:, -1, :])
        return out
# Initialize model
lstm_model = LSTMModel(input_size=1, hidden_size=5)
Finally, we will implement the GRU model:
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(GRUModel, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        out = self.fc(out[:, -1, :])
        return out
# Initialize model
gru_model = GRUModel(input_size=1, hidden_size=5)
4.3. Model Training
We will train the models and compare their performance.
def train_model(model, X_train, Y_train, num_epochs=100, learning_rate=0.01):
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    for epoch in range(num_epochs):
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train)
        loss = criterion(outputs, Y_train.view(-1, 1))
        loss.backward()
        optimizer.step()
    return model
# Train models
rnn_trained = train_model(rnn_model, X, Y)
lstm_trained = train_model(lstm_model, X, Y)
gru_trained = train_model(gru_model, X, Y)
4.4. Performance Evaluation
We will evaluate the performance of each model.
def evaluate_model(model, X_test):
    model.eval()
    with torch.no_grad():
        predictions = model(X_test)
    return predictions
# Predictions
rnn_predictions = evaluate_model(rnn_trained, X)
lstm_predictions = evaluate_model(lstm_trained, X)
gru_predictions = evaluate_model(gru_trained, X)
# Visualization of results
plt.figure(figsize=(12, 8))
plt.plot(Y.numpy(), label='True')
plt.plot(rnn_predictions.numpy(), label='RNN Predictions')
plt.plot(lstm_predictions.numpy(), label='LSTM Predictions')
plt.plot(gru_predictions.numpy(), label='GRU Predictions')
plt.legend()
plt.show()
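Beyond the plot, a simple numerical comparison can be made with the same MSE criterion used during training (the exact values will vary from run to run because of random initialization):
criterion = nn.MSELoss()
for name, preds in [('RNN', rnn_predictions), ('LSTM', lstm_predictions), ('GRU', gru_predictions)]:
    mse = criterion(preds, Y.view(-1, 1)).item()
    print(f'{name} MSE: {mse:.6f}')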
5. Conclusion
In this course, we covered the basic concepts of RNN, LSTM, and GRU, implemented each model, and compared their performance to understand their characteristics. RNN is the most basic form, while LSTM and GRU are more powerful variants that can be chosen based on specific needs. It is important to select the model appropriate to the problem at hand.