Hugging Face Transformers Tutorial: Loading Pre-trained Models

1. Introduction

Deep learning has achieved remarkable results, especially in the field of natural language processing (NLP). At the center of these advances are pre-trained models. Hugging Face provides a powerful library called Transformers that makes it easy to use them. In this course, we will learn in detail how to load pre-trained models with Hugging Face’s Transformers library.

2. What is the Hugging Face Transformers Library?

The Hugging Face Transformers library provides a wide range of natural language processing (NLP) models, including BERT, GPT, RoBERTa, T5, and several others. With this library, developers can easily load pre-trained language models and perform various NLP tasks on top of them.

3. Environment Setup

Before we get started, we need to install the required libraries. You can install the basic libraries using the command below.

pip install transformers torch

Here, transformers is the Hugging Face library, and torch is the
PyTorch framework. If you want to use TensorFlow instead of PyTorch, you can install it in place of torch, as shown below.
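For example, a minimal TensorFlow setup would look like this (the Transformers API is largely the same with either backend):

pip install transformers tensorflow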

4. Loading Pre-trained Models

Now let’s load a pre-trained model. For example, we can obtain contextual vector representations of text using the BERT model. Below is how to load the BERT model in Python.

from transformers import BertTokenizer, BertModel

# Load BERT model's tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Example sentence
sentence = "Hugging Face is creating a tool that democratizes AI."

# Tokenize the sentence and convert it to input vectors
inputs = tokenizer(sentence, return_tensors='pt')
outputs = model(**inputs)

# Check the output
print(outputs)

In the code above, the BertTokenizer class converts the input sentence into the tensor format that the BERT model can understand. The BertModel class loads the actual model, and passing the transformed input through it produces the output.
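If you are curious about what the tokenizer actually produces, you can inspect the encoded inputs. This is a small sketch that prints the tensor keys and the tokens recovered from the example sentence above.

# Inspect the tokenizer output: a dictionary of tensors
print(inputs.keys())  # input_ids, token_type_ids, attention_mask

# Convert the input IDs back to readable tokens (includes [CLS] and [SEP])
print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist()))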

5. Analyzing Output Results

The outputs variable in the code above contains two main pieces of information:

  • last_hidden_state: The last hidden state, showing the vector representation of each token.
  • pooler_output: A vector summarizing the entire input sequence, mainly used for classification tasks.

The vector representation of each token is very useful for downstream NLP tasks. The hidden state output for each token can be accessed as shown below.

# Accessing the last hidden state
hidden_states = outputs.last_hidden_state
print(hidden_states.shape)  # (batch size, sequence length, hidden dimension)
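The pooled summary vector can be accessed in the same way; a short sketch:

# Accessing the pooled output (one summary vector per input sequence)
pooled_output = outputs.pooler_output
print(pooled_output.shape)  # (batch size, hidden dimension)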

6. Using Various Pre-trained Models

Hugging Face supports many other models in addition to BERT, so you can choose the one that suits your task. The usage of models like GPT-2, RoBERTa, and T5 is quite similar. For example, if you want to use the GPT-2 model, you can load it as follows.

from transformers import GPT2Tokenizer, GPT2Model

# Load GPT-2 model's tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

# Example sentence
sentence = "Hugging Face has become a leader in NLP."

# Tokenize the sentence and convert it to input vectors
inputs = tokenizer(sentence, return_tensors='pt')
outputs = model(**inputs)

print(outputs)
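Because every model follows the same pattern, you can also use the Auto classes, which resolve the right tokenizer and model architecture from the checkpoint name. A minimal sketch using RoBERTa as an example:

from transformers import AutoTokenizer, AutoModel

# The Auto classes pick the correct classes based on the checkpoint name
tokenizer = AutoTokenizer.from_pretrained('roberta-base')
model = AutoModel.from_pretrained('roberta-base')

inputs = tokenizer("Hugging Face makes model loading simple.", return_tensors='pt')
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)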

7. Training Your Own Model

In addition to using pre-trained models as-is, you can also fine-tune them on your own dataset. This process involves the following steps:

  1. Data preparation and preprocessing
  2. Loading a pre-trained model
  3. Setting up the loss function and optimizer for training
  4. Training the model

7.1 Data Preparation and Preprocessing

Data can be prepared in a format such as a CSV file, which then needs to be loaded and preprocessed.

import pandas as pd

# Load the dataset
data = pd.read_csv('data.csv')
print(data.head())  # Check the first 5 rows of the dataset
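As a preprocessing sketch, assuming the CSV has 'text' and 'label' columns (hypothetical column names; adjust them to your dataset), you can tokenize the texts in batch with the BERT tokenizer loaded earlier and turn the labels into a tensor.

import torch

# 'text' and 'label' are assumed column names for this example
texts = data['text'].tolist()
labels = torch.tensor(data['label'].tolist())

# Tokenize the whole dataset with padding and truncation
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')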

7.2 Loading a Pre-trained Model

You can load the model in the way explained above.
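For a classification task, you would typically load a model with a task-specific head rather than the bare BertModel. A minimal sketch, assuming a binary classification task:

from transformers import BertForSequenceClassification

# num_labels=2 is an assumption for a binary classification task
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)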

7.3 Setting Up Loss Function and Optimizer

For model training, you need to set up a loss function and an optimizer. For example, you can use the AdamW optimizer together with the CrossEntropyLoss loss function.

import torch
from torch.optim import AdamW  # AdamW is provided by torch.optim

optimizer = AdamW(model.parameters(), lr=5e-5)  # Set the learning rate
loss_fn = torch.nn.CrossEntropyLoss()

7.4 Training the Model

You can train the model using the preprocessed data along with the configured loss function and optimizer.
Typically, you set the number of epochs and iterate to optimize the model.

num_epochs = 3  # Number of training epochs (example value)

for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()                   # Reset gradients
    outputs = model(**inputs)               # Forward pass on the preprocessed inputs
    loss = loss_fn(outputs.logits, labels)  # Calculate loss from the classification logits
    loss.backward()                         # Backpropagation
    optimizer.step()                        # Update weights

8. Conclusion

Through this course, we have learned how to use Hugging Face’s Transformers library to load pre-trained models and perform various tasks based on them. This library is a powerful tool in the field of natural language processing, providing a consistent and easy-to-use API across many models. You are now equipped to use Hugging Face’s Transformers in your own projects.

This article is part of a deep learning course using Hugging Face Transformers. For more lessons, please refer to the related materials.