The advancement of deep learning has been driven by innovations in network architectures, including Recurrent Neural Networks (RNNs). In particular, the Gated Recurrent Unit (GRU) is a simple yet powerful type of RNN that performs well on time series data and in Natural Language Processing (NLP). In this article, we take a detailed look at the structure and operating principles of the GRU, along with code examples using PyTorch.
1. What is GRU?
GRU is a variant of the recurrent neural network proposed by Kyunghyun Cho et al. in 2014, and it shares many similarities with Long Short-Term Memory (LSTM). However, the GRU has a simpler structure with fewer parameters, which makes computation cheaper and training faster. It uses two gates to control the flow of information: the update gate and the reset gate.
2. Structure of GRU
The structure of GRU is composed as follows:
- Input (x): The input vector at the current time step
- Hidden State (h): The hidden state vector carried over from the previous time step
- Update Gate (z): Determines how much of the new candidate state to incorporate and how much of the existing state to keep
- Reset Gate (r): Determines how much of the previous state to ignore when computing the candidate state
- Candidate State (h~): The candidate state used to compute the new hidden state
3. Mathematical Representation of GRU
The main equations of GRU are as follows:
z_t = σ(W_z * x_t + U_z * h_{t-1})
r_t = σ(W_r * x_t + U_r * h_{t-1})
h~_t = tanh(W_h * x_t + U_h * (r_t * h_{t-1}))
h_t = (1 - z_t) * h_{t-1} + z_t * h~_t
Where:
- σ is the sigmoid function
- tanh is the hyperbolic tangent function
- W and U represent the weight matrices
- t denotes the current time step, and t-1 denotes the previous time step
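To make these equations concrete, below is a minimal NumPy sketch of a single GRU step. The weight matrices are random stand-ins (not trained values) and the bias terms are omitted, matching the simplified equations above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_h, U_h):
    """One GRU step following the equations above (biases omitted)."""
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)               # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)               # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev))    # candidate state
    h_t = (1 - z_t) * h_prev + z_t * h_cand               # new hidden state
    return h_t

# Toy dimensions and randomly initialized weights, just to run one step
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W_z, W_r, W_h = (rng.standard_normal((hidden_size, input_size)) for _ in range(3))
U_z, U_r, U_h = (rng.standard_normal((hidden_size, hidden_size)) for _ in range(3))

x_t = rng.standard_normal(input_size)
h_prev = np.zeros(hidden_size)
print(gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_h, U_h))
```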
4. Advantages of GRU
GRU has the following advantages:
- The structure is relatively simple, making it easy to experiment with and apply.
- It requires fewer parameters than LSTM and computes faster (see the quick check below).
- It delivers performance similar to LSTM across various scenarios.
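As a rough illustration of the parameter savings (the layer sizes below are arbitrary, not tuned values), you can compare the parameter counts of PyTorch's nn.GRU and nn.LSTM with identical dimensions:

```python
import torch.nn as nn

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)

count = lambda m: sum(p.numel() for p in m.parameters())
print("GRU parameters: ", count(gru))   # three gate blocks of weights
print("LSTM parameters:", count(lstm))  # four gate blocks of weights
```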
5. Implementing GRU with PyTorch
Now let’s implement the GRU model using PyTorch. In the example below, we will create a simple time series prediction model.
5.1 Data Preparation
For a quick example, we will use the values of the sine function as time series data. The model will learn to predict the next value based on the previous sequence values.
```python
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Generate time series data
def generate_data(seq_length):
    x = np.linspace(0, 100, seq_length)
    y = np.sin(x) + np.random.normal(scale=0.1, size=seq_length)  # Adding noise
    return y

# Convert data into sequences
def create_sequences(data, seq_length):
    sequences = []
    labels = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i:i + seq_length])
        labels.append(data[i + seq_length])
    return np.array(sequences), np.array(labels)

# Generate and prepare data
data = generate_data(200)
seq_length = 10
X, y = create_sequences(data, seq_length)

# Check the data
print("X shape:", X.shape)
print("y shape:", y.shape)
```
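With 200 data points and a window length of 10, create_sequences produces 190 (sequence, label) pairs, so the printed shapes should be X: (190, 10) and y: (190,).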
5.2 Defining the GRU Model
To define the GRU model, we will create a GRUModel class that inherits from PyTorch’s nn.Module.
```python
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(GRUModel, self).__init__()
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.gru(x)
        out = self.fc(out[:, -1, :])  # Use only the last time step's output
        return out

# Initialize the model
input_size = 1    # Input data dimension
hidden_size = 16  # Size of the hidden state in the GRU
model = GRUModel(input_size, hidden_size)
```
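Before training, a quick sanity check with an arbitrary dummy batch can confirm the expected input and output shapes:

```python
# Dummy batch: 4 sequences of length 10, one feature per time step
dummy = torch.randn(4, 10, 1)   # (batch_size, seq_length, input_size)
print(model(dummy).shape)       # expected: torch.Size([4, 1])
```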
5.3 Model Training
To train the model, we will define the loss function and optimization algorithm, and implement the training loop.
```python
# Loss function and optimization algorithm
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Convert data to tensors
X_tensor = torch.FloatTensor(X).unsqueeze(-1)  # (batch_size, seq_length, input_size)
y_tensor = torch.FloatTensor(y).unsqueeze(-1)  # (batch_size, 1)

# Train the model
num_epochs = 200
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_tensor)
    loss = criterion(outputs, y_tensor)
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 20 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
```
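The loop above updates the model on the full dataset at once, which is fine for this small example. For larger datasets, one option (not part of the original example) is to iterate over mini-batches with a TensorDataset and DataLoader, sketched below:

```python
from torch.utils.data import TensorDataset, DataLoader

# Wrap the tensors in a dataset and iterate in mini-batches of 32
dataset = TensorDataset(X_tensor, y_tensor)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for epoch in range(num_epochs):
    model.train()
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()
```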
5.4 Model Evaluation and Prediction
After training the model, we will visualize the prediction results.
```python
# Evaluate the model
model.eval()
with torch.no_grad():
    predicted = model(X_tensor).numpy()

# Visualize prediction results
plt.figure(figsize=(12, 5))
plt.plot(data, label='Original Data')
plt.plot(np.arange(seq_length, len(predicted) + seq_length), predicted, label='Predicted', color='red')
plt.legend()
plt.show()
```
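In addition to the visual comparison, a simple numerical summary can be useful; for example, the mean squared error of the one-step-ahead predictions on the training data:

```python
# Mean squared error of the one-step-ahead predictions
mse = np.mean((predicted.squeeze() - y) ** 2)
print(f"Training MSE: {mse:.4f}")
```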
6. Conclusion
In this tutorial, we explored the basic structure and operating principles of the Gated Recurrent Unit (GRU) and walked through implementing a GRU model with PyTorch. The GRU is a simple yet widely applicable model, commonly used in areas such as Natural Language Processing and time series prediction.
In the future, we hope to continue research on optimizing deep learning models by utilizing GRU in various ways.
7. References
- Cho, K., van Merriënboer, B., Gulcehre, C., et al. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation.
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to Sequence Learning with Neural Networks.