Hugging Face Transformers Tutorial, Loading Pre-trained BERT for Ensemble Training

Today, we will learn how to use BERT (Bidirectional Encoder Representations from Transformers), one of the most widely used NLP models, in ensemble training. Along the way, we will explain how to load a pre-trained BERT model using the Hugging Face Transformers library and build an ensemble model on top of it.

How does BERT work?

BERT understands context by reading text bidirectionally: for each token, it simultaneously considers the words to its left and to its right, which lets it capture word meaning more deeply. BERT is pre-trained in a self-supervised way on large amounts of text data (via masked language modeling and next-sentence prediction) and can then be fine-tuned for various downstream tasks.
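
As a quick illustration of this bidirectional behaviour, the fill-mask pipeline below (a minimal sketch; it only assumes the bert-base-uncased checkpoint can be downloaded) asks BERT to predict a masked word using the context on both sides:

from transformers import pipeline

# BERT predicts the masked token from both the left and right context
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The capital of France is [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))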

Installing the Hugging Face Library

The Hugging Face Transformers library makes it easy to use various pre-trained models like BERT. To install the library, you can run the command below:

pip install transformers torch

Loading a Pre-trained BERT Model

Now we will load a pre-trained BERT model using the Hugging Face Transformers library. The code below is a simple example that loads the BERT model and its tokenizer.


from transformers import BertTokenizer, BertModel

# Loading BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Test sentence
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors='pt')

# Inputting into BERT model and getting output
outputs = model(**inputs)
print(outputs)
    

Code Explanation

  • from transformers import BertTokenizer, BertModel: Imports BERT tokenizer and model from Hugging Face’s Transformers.
  • tokenizer = BertTokenizer.from_pretrained('bert-base-uncased'): Loads a pre-trained BERT tokenizer.
  • model = BertModel.from_pretrained('bert-base-uncased'): Loads a pre-trained BERT model.
  • inputs = tokenizer(text, return_tensors='pt'): Tokenizes the input sentence and converts it to a PyTorch tensor.
  • outputs = model(**inputs): Passes the encoded inputs through the model and returns the outputs; the structure of the encoded inputs is shown in the sketch below.
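
For reference, here is a small sketch of what the encoded inputs look like, reusing the tokenizer and text from the example above:

# The tokenizer returns a dictionary of tensors: input_ids, token_type_ids, attention_mask
for key, value in inputs.items():
    print(key, value.shape)
print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0]))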

Overview of Ensemble Training

Ensemble learning improves final prediction performance by combining the predictions of multiple models, so the strengths of the individual models complement one another and the results become more reliable. Common ensemble techniques include bagging, boosting, and simple voting or averaging of predictions; this tutorial uses the latter.
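
As a toy illustration of the voting idea used later in this tutorial (the numbers below are made up and are not outputs of any trained model), hard voting simply takes the majority class across models:

import torch

# Hypothetical 0/1 predictions from three classifiers for four samples
preds = torch.tensor([[1, 0, 1, 0],
                      [1, 1, 1, 0],
                      [0, 0, 1, 0]])
majority = (preds.sum(dim=0) > preds.shape[0] // 2).int()
print(majority)  # tensor([1, 0, 1, 0], dtype=torch.int32)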

Building Ensemble Models with BERT

Now let’s see how to construct an ensemble model using the BERT model. We will train multiple BERT models and combine their predictions to derive the final prediction.

Model Overview

We will construct the ensemble model with the following structure:

  • Creating and training multiple BERT models
  • Collecting the prediction results of each model
  • Combining the prediction results to generate the final prediction

Preparing the Dataset

We will use a simple text classification problem, for example email spam filtering. First, we will prepare a small toy dataset as follows.


import pandas as pd

# Creating an example dataset
data = {'text': ["Free money now", "Hello friend, how are you?", "Limited time offer", "Nice to see you"],
        'label': [1, 0, 1, 0]}  # 1: Spam, 0: Regular mail
df = pd.DataFrame(data)
    

Training the Model and Performing Ensemble

Now let’s proceed to train each BERT model. The trained models will be saved for ensemble purposes.


from sklearn.model_selection import train_test_split
import torch

# Splitting the data
train_texts, test_texts, train_labels, test_labels = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)

# Preparing data for BERT model
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True, return_tensors='pt')
test_encodings = tokenizer(list(test_texts), truncation=True, padding=True, return_tensors='pt')

class BERTClassifier(torch.nn.Module):
    def __init__(self):
        super(BERTClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, 2)  # 2 classes (spam, non-spam)

    def forward(self, input_ids, attention_mask):
        # Use the pooled [CLS] representation as the sentence embedding
        pooled_output = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(pooled_output)

# Declaring models and one optimizer per model
model1 = BERTClassifier()
model2 = BERTClassifier()  # Second ensemble member
optimizer1 = torch.optim.Adam(model1.parameters(), lr=5e-5)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=5e-5)

# Simple training loop (full batch, since the toy dataset is tiny)
labels_tensor = torch.tensor(train_labels.values)
model1.train()
for epoch in range(3):  # 3 epochs
    optimizer1.zero_grad()
    outputs = model1(input_ids=train_encodings['input_ids'], attention_mask=train_encodings['attention_mask'])
    loss = torch.nn.CrossEntropyLoss()(outputs, labels_tensor)
    loss.backward()
    optimizer1.step()
    print(f'Epoch {epoch + 1}, Loss: {loss.item()}')
    # Train model2 in the same way, using optimizer2

# Saving the model
torch.save(model1.state_dict(), 'bert_model1.pth')
torch.save(model2.state_dict(), 'bert_model2.pth')
    

Performing Predictions and Ensemble Results

Once model training is complete, we can combine the prediction results of each model to generate the final prediction values.


# Defining the prediction function
def predict(model, encodings):
    model.eval()
    with torch.no_grad():
        outputs = model(input_ids=encodings['input_ids'], attention_mask=encodings['attention_mask'])
    return torch.argmax(outputs, dim=1)

# Loading models
model1.load_state_dict(torch.load('bert_model1.pth'))
model2.load_state_dict(torch.load('bert_model2.pth'))

# Individual model predictions
preds_model1 = predict(model1, test_encodings)
preds_model2 = predict(model2, test_encodings)

# Ensemble prediction: average the hard 0/1 labels from both models
final_preds = (preds_model1 + preds_model2) / 2
final_preds = (final_preds > 0.5).int()  # With two models, the ensemble predicts 1 only when both agree (ties go to 0)
print(f'Final Prediction: {final_preds}')
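
Averaging the hard 0/1 labels as above means the ensemble only predicts spam when both models agree. A common alternative is soft voting, which averages the class probabilities before taking the argmax; here is a sketch that reuses the models and test encodings defined above:

import torch.nn.functional as F

def predict_proba(model, encodings):
    model.eval()
    with torch.no_grad():
        logits = model(input_ids=encodings['input_ids'], attention_mask=encodings['attention_mask'])
    return F.softmax(logits, dim=1)

# Soft voting: average the predicted probabilities of both models
avg_probs = (predict_proba(model1, test_encodings) + predict_proba(model2, test_encodings)) / 2
soft_preds = torch.argmax(avg_probs, dim=1)
print(f'Soft-voting prediction: {soft_preds}')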
    

Conclusion

Today, we explored how to load a pre-trained BERT model using the Hugging Face Transformers library and examined a simple ensemble training method based on it. BERT shows excellent performance in complex natural language processing tasks, and we can expect even better performance when utilizing ensemble techniques. In the future, consider applying these techniques to various tasks to achieve better results.

Hugging Face Transformers Course, Source-Language Tokenization with M2M100

With the development of deep learning, Natural Language Processing (NLP) has undergone significant changes. In particular, Hugging Face’s transformers library has established itself as a powerful tool for NLP tasks. In this course, we will introduce the multilingual translation and tokenization process using the M2M100 model.

Overview of the M2M100 Model

The M2M100 (Many-to-Many multilingual translation) model supports direct translation between any pair of 100 languages. Existing translation models often used an indirect approach that first translated the source language into an intermediate language (e.g., English) and then into the target language. M2M100 overcomes this limitation by translating directly between languages, significantly improving translation efficiency across many language pairs.

What is Tokenization?

Tokenization is the process of dividing the input text into smaller units called tokens. Each token is then mapped to a unique index in the model's vocabulary. Tokenization is an essential step in NLP and must happen before text data is fed into the model.

Environment Setup

Before proceeding with the course, you need to install the required libraries. Specifically, we will install transformers and torch. You can install them with the following command:

        pip install transformers torch
    

Loading the Tokenizer

To load the tokenizer for the M2M100 model, we will use the M2M100Tokenizer class provided by the transformers library.

        
import torch
from transformers import M2M100Tokenizer

# Load the tokenizer for the M2M100 model
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
        
    

Tokenization Process

Now we are ready to tokenize the text. Below is an example of tokenizing the sentence ‘Hello, everyone!’.

        
# Input text (set the source language first; M2M100 prepends a source-language token)
tokenizer.src_lang = "en"
text = "Hello, everyone!"

# Tokenizing the text
encoded_input = tokenizer(text, return_tensors="pt")

# Output tokens and indices
print("Tokenized tokens:", tokenizer.convert_ids_to_tokens(encoded_input['input_ids'][0]))
print("Token indices:", encoded_input['input_ids'])
        
    

Tokenization Result

The output generated after running the above code shows how the input text has been tokenized and indexed. You can check the actual values of the tokens using the convert_ids_to_tokens method.

Multilingual Translation

Using the tokenized data, we can perform multilingual translation. I will show you an example of translating Korean to English using the M2M100 model.

        
from transformers import M2M100ForConditionalGeneration

# Load the M2M100 model
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

# Korean source text (set the source language before tokenizing)
tokenizer.src_lang = "ko"
text = "안녕하세요, 여러분!"  # "Hello, everyone!" in Korean
encoded_input = tokenizer(text, return_tensors="pt")

# Translation: force the decoder to start with the English target-language token
translated_tokens = model.generate(**encoded_input, forced_bos_token_id=tokenizer.get_lang_id("en"))

# Translation result
translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
print("Translation result:", translated_text[0])
        
    

Interpretation of the Translation Result

You can check whether the Korean sentence has been accurately translated into English by running the code above. The generate method produces the translated token sequence from the encoded input, and forced_bos_token_id tells the model which target language to generate.
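
The same pattern works for any of the supported target languages. For example (a small sketch reusing the model, tokenizer, and encoded_input from above, with German chosen arbitrarily as the target):

# Translate the same source sentence into German instead of English
translated_de = model.generate(**encoded_input, forced_bos_token_id=tokenizer.get_lang_id("de"))
print(tokenizer.batch_decode(translated_de, skip_special_tokens=True)[0])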

Conclusion

In this course, we explored the multilingual tokenization and translation process using Hugging Face’s M2M100 model. The progress in the field of natural language processing will continue, and using such tools will enable better communication across various languages. I hope that interest and research in NLP and deep learning will continue in the future.

Hugging Face Transformers Tutorial, Sample Image Dataset

Recently, the Hugging Face library has been widely used in both natural language processing (NLP) and computer vision (CV) fields within artificial intelligence and deep learning. In this article, we will explain how to process image datasets and train models using the Hugging Face Transformers library, and we will explore this in detail with example code.

1. What are Hugging Face and Transformers?

Hugging Face develops the Transformers library, which provides various pre-trained models for natural language processing and makes it easy to use models like BERT, GPT-2, and T5. More recently, image models such as Vision Transformer (ViT) and CLIP have been added, so the library also performs strongly on computer vision tasks.

2. Required Packages and Environment Setup

Before using the Hugging Face Transformers library, you must first install the required packages. You can easily install them with the following code.

pip install transformers torchvision torch

3. Sample Image Dataset

In this tutorial, we will use the CIFAR-10 dataset as sample data. CIFAR-10 consists of 60,000 32×32 color images distributed across 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck). This dataset is very suitable for image classification problems.

3.1 Loading the Dataset

We will use the torchvision library in Python to load the CIFAR-10 dataset and split it into training and validation sets.

import torch
import torchvision
import torchvision.transforms as transforms

# Set data transformations: resize CIFAR-10's 32x32 images to the 224x224 input size
# expected by the google/vit-base-patch16-224 checkpoint, then convert and normalize
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load training and validation datasets
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

3.2 Data Preprocessing

The loaded dataset goes through a transformation pipeline: the images are resized to 224×224 (the input size the ViT checkpoint expects), converted to tensors with ToTensor(), and normalized by Normalize() with a per-channel mean and standard deviation of 0.5, which matches the checkpoint's preprocessing.
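
As a quick sanity check (a small sketch using the trainloader defined above), you can inspect one preprocessed mini-batch:

# Fetch one mini-batch: images are (batch, channels, height, width)
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([4, 3, 224, 224]) after the resize above
print(labels)        # class indices in the range 0-9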

4. Building the Vision Transformer (ViT) Model

Now we will build the ViT model to classify images from the CIFAR-10 dataset. The model definition can be easily implemented using the Hugging Face Transformers library.

from transformers import ViTForImageClassification, ViTFeatureExtractor

# Initialize ViT feature extractor and model
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
# The checkpoint ships with a 1000-class ImageNet head, so ignore_mismatched_sizes=True
# replaces it with a freshly initialized 10-class head
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224',
                                                  num_labels=10,
                                                  ignore_mismatched_sizes=True)

The above code initializes the Vision Transformer model. The `num_labels` parameter sets the number of output classes (10 for CIFAR-10), and `ignore_mismatched_sizes=True` allows the pre-trained weights to load even though the original 1000-class classification head is replaced with a newly initialized 10-class head.

5. Model Training

To train the model, we need to define the loss function and optimization algorithm. In this case, we will use the CrossEntropyLoss loss function and the Adam optimizer.

import torch.optim as optim

# Define loss function and optimization algorithm
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Set device for model training
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Train the model
for epoch in range(10):  # Train for 10 epochs
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward + backward + optimize
        outputs = model(inputs).logits
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 2000 == 1999:  # Print every 2000 mini-batches
            print(f"[{epoch + 1}, {i + 1}] loss: {running_loss / 2000:.3f}")
            running_loss = 0.0

The above code represents the process of training the model according to the number of epochs and mini-batches. It calculates the loss for each batch and updates the weights through backpropagation.

6. Model Evaluation

To evaluate the trained model, we will use the test dataset. The method for evaluating the model’s accuracy is as follows.

# Switch the model to evaluation mode before measuring accuracy
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = model(images).logits
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct / total:.2f}%')

The above code calculates the model’s accuracy on the test dataset. It compares the predicted values against the actual labels for each image to compute the accuracy.
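
If you also want per-class accuracy, here is a small sketch reusing the testloader and the CIFAR-10 class names listed in section 3:

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
class_correct = [0] * 10
class_total = [0] * 10

with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        predicted = torch.argmax(model(images).logits, dim=1)
        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            class_correct[label.item()] += int(label.item() == pred.item())

for name, c, t in zip(classes, class_correct, class_total):
    print(f'{name}: {100 * c / t:.1f}%')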

7. Conclusion

In this article, we explored how to process CIFAR-10 data using the Hugging Face Transformers library and how to build a Vision Transformer model to classify images. Utilizing the Hugging Face library allows for easily constructing complex models and optimizing performance with various datasets. We encourage you to continue exploring deep learning use cases with diverse models and datasets in the future.

If you have any questions or need additional information, please feel free to leave a comment.

Hugging Face Transformers Tutorial: Loading Pre-trained Models

1. Introduction

The advancement of deep learning has achieved remarkable results, especially in the field of natural language processing (NLP). At the center of these advancements are pre-trained models. Hugging Face provides a powerful library called Transformers that makes it easier to use these pre-trained models. In this course, we will learn in detail how to load pre-trained models using Hugging Face's Transformers library.

2. What is the Hugging Face Transformers Library?

The Hugging Face Transformers library provides various natural language processing (NLP) models, including BERT, GPT, RoBERTa, T5, and several others. With this library, developers can easily load pre-trained language models and perform various NLP tasks based on them.

3. Environment Setup

Before we get started, we need to install the required libraries. You can install the basic libraries using the command below.

pip install transformers torch

Here, transformers is the Hugging Face library, and torch is the PyTorch framework. If you want to use TensorFlow instead of PyTorch, you can install TensorFlow.

4. Loading Pre-trained Models

Now let’s load a pre-trained model. For example, we can understand the meaning of text using the BERT model. Below is a way to load the BERT model using Python code.

from transformers import BertTokenizer, BertModel

# Load BERT model's tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Example sentence
sentence = "Hugging Face is creating a tool that democratizes AI."

# Tokenize the sentence and convert it to input vectors
inputs = tokenizer(sentence, return_tensors='pt')
outputs = model(**inputs)

# Check the output
print(outputs)

In the code above, the BertTokenizer class converts the input sentence into a format that the BERT model can understand. The BertModel class loads the actual model and passes the transformed input through it to generate the output.

5. Analyzing Output Results

The outputs variable in the code above contains two main pieces of information:

  • last_hidden_state: The last hidden state, showing the vector representation of each token.
  • pooler_output: A vector summarizing the entire input sequence, mainly used for classification tasks.

The per-token vector representations are very useful for many natural language processing tasks. The hidden state produced for each token can be accessed as shown below.

# Accessing the last hidden state
hidden_states = outputs.last_hidden_state
print(hidden_states.shape)  # (batch size, sequence length, hidden dimension)
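
Similarly, the pooled summary vector can be inspected from the same outputs object:

# Accessing the pooled [CLS] representation
pooled = outputs.pooler_output
print(pooled.shape)  # (batch size, hidden dimension)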

6. Using Various Pre-trained Models

Hugging Face supports several other models in addition to BERT, so users can choose the model best suited to their task. Models like GPT-2, RoBERTa, and T5 are used in much the same way. For example, if you want to use the GPT-2 model, you can load it as follows.

from transformers import GPT2Tokenizer, GPT2Model

# Load GPT-2 model's tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

# Example sentence
sentence = "Hugging Face has become a leader in NLP."

# Tokenize the sentence and convert it to input vectors
inputs = tokenizer(sentence, return_tensors='pt')
outputs = model(**inputs)

print(outputs)

7. Training Your Own Model

In addition to obtaining pre-trained models, users can also fine-tune models for their datasets. This process involves the following steps:

  1. Data preparation and preprocessing
  2. Loading a pre-trained model
  3. Setting up the loss function and optimizer for training
  4. Training the model

7.1 Data Preparation and Preprocessing

Data can be prepared in a format such as a CSV file, and a series of processes are required to load and preprocess it.

import pandas as pd

# Load the dataset
data = pd.read_csv('data.csv')
print(data.head())  # Check the first 5 rows of the dataset

7.2 Loading a Pre-trained Model

You can load the model in the way explained above.

7.3 Setting Up Loss Function and Optimizer

For model training, the loss function and optimizer need to be set. For example, you can use the AdamW optimizer and the CrossEntropyLoss loss function.

import torch
from torch.optim import AdamW  # PyTorch's AdamW (the AdamW bundled with transformers is deprecated)

optimizer = AdamW(model.parameters(), lr=5e-5)  # Set the learning rate
loss_fn = torch.nn.CrossEntropyLoss()

7.4 Training the Model

You can train the model using the preprocessed data along with the configured loss function and optimizer. Typically, you set the number of epochs and iterate to optimize the model.

# Schematic training loop: assumes `inputs` and `labels` come from the preprocessed
# dataset and that `model` has a classification head (e.g. BertForSequenceClassification),
# so that `outputs.logits` is available
for epoch in range(num_epochs):
    model.train()
    outputs = model(**inputs)               # Model's output
    loss = loss_fn(outputs.logits, labels)  # Calculate loss
    loss.backward()                         # Backpropagation
    optimizer.step()                        # Update weights
    optimizer.zero_grad()                   # Reset gradients

8. Conclusion

Through this course, we have learned how to use Hugging Face's Transformers library to load pre-trained models and perform various tasks based on them. The library is a powerful tool for natural language processing, offering a consistent and easy-to-use API for working with many different models. You are now equipped to use Hugging Face Transformers in your own projects.

This article is part of a deep learning course using Hugging Face Transformers. For more lessons, please refer to the related materials.

Hugging Face Transformers Tutorial, Classification Accuracy

The advancement of deep learning and natural language processing (NLP) is one of the key elements driving today’s technological innovation. While there are several libraries and frameworks available, the Hugging Face Transformers library is particularly designed to be intuitive and user-friendly. This article will discuss how to build a document classification model using Hugging Face’s Transformers and evaluate the model’s performance.

1. What is Hugging Face Transformers?

The Hugging Face Transformers library supports various model architectures and makes it easy to use pre-trained models. Transformers are models that have revolutionized natural language processing, based on the Attention Mechanism. These models are pre-trained on large datasets and can be fine-tuned for specific tasks.

2. Installing Required Libraries

We will install the necessary libraries to use Hugging Face Transformers. The primary libraries we will use are transformers, torch, and datasets. Use the following command to install them:

!pip install transformers torch datasets

3. Preparing the Dataset

We will prepare the dataset for document classification. Here, we will use the AG News dataset. AG News is a dataset for news article classification, which has four classes:

  • World
  • Sports
  • Business
  • Science/Technology

Running the following code will download the dataset, which already comes with separate training and test splits.

from datasets import load_dataset

dataset = load_dataset("ag_news")
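
To confirm what was loaded, here is a quick check on the dataset object returned above:

print(dataset)              # DatasetDict with 'train' and 'test' splits
print(dataset['train'][0])  # One example: a news text and its label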

4. Data Preprocessing

After loading the data, we need to separate the texts and labels and perform the necessary preprocessing. The following code shows the process of checking sample data and labels.

train_texts = dataset['train']['text']
train_labels = dataset['train']['label']

test_texts = dataset['test']['text']
test_labels = dataset['test']['label']

print("Sample news article:", train_texts[0])
print("Label:", train_labels[0])

5. Preparing the Model and Tokenizer

Now, we will load the pre-trained model and tokenizer using the transformers library. Here, we will use the BertForSequenceClassification model.

from transformers import BertTokenizer, BertForSequenceClassification

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=4)

6. Data Tokenization

We tokenize each text with the BERT tokenizer so it can be fed to the model for document classification. The following code pads every example to the maximum length and truncates longer texts so that batches have a uniform shape.

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

7. Training the Model

We use the Trainer class to train the model. The Trainer automatically handles training and evaluation. The following code includes the setup and preparation process for training.

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test']
)

trainer.train()

8. Evaluating the Model

After training the model, we can measure its performance on the test set. We will use scikit-learn's accuracy_score to calculate accuracy.

import numpy as np
from sklearn.metrics import accuracy_score

predictions, label_ids, _ = trainer.predict(tokenized_datasets['test'])
preds = np.argmax(predictions, axis=1)

accuracy = accuracy_score(label_ids, preds)
print("Classification accuracy:", accuracy)

9. Conclusion

We learned how to load a dataset and perform text classification using a pre-trained model with Hugging Face Transformers. Through this process, we saw the usefulness of transformer models in natural language processing tasks. Additionally, further performance improvements can be achieved by trying hyperparameter tuning or various models.
