Using Hugging Face Transformers: Loading a Pre-trained BERT Model for Multi-class Classification

BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing model proposed by Google that utilizes a bidirectional Transformer architecture for contextual understanding. BERT can be applied to various natural language processing tasks through pre-training and fine-tuning stages. In this tutorial, we will introduce how to load a pre-trained BERT model using the Hugging Face Transformers library to solve a multi-class classification problem.

1. Environment Setup

This tutorial requires the following libraries:

  • transformers
  • torch (PyTorch)
  • scikit-learn
  • tqdm
  • numpy
  • pandas

You can install the required libraries using the following command:

!pip install transformers torch scikit-learn tqdm numpy pandas
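
To confirm that the installation worked, you can print the library versions; the exact versions you see will depend on when you install.

import transformers
import torch

# A quick sanity check that the imports resolve; version numbers will vary.
print('transformers:', transformers.__version__)
print('torch:', torch.__version__)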

2. Preparing the Data

First, we need to prepare a dataset for the multi-class classification problem. As an example, let’s create a simple dataframe with text and labels.

import pandas as pd

data = {
    'text': [
        'I like natural language processing.',
        'PyTorch and TensorFlow are popular.',
        'Deep learning is a field of machine learning.',
        'Conversational AI is gaining a lot of attention.',
        'Text classification is an important task.'
    ],
    'label': [0, 1, 1, 2, 0]
}

df = pd.DataFrame(data)
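
Before moving on, it helps to glance at the dataframe and the label distribution; this toy dataset has five sentences spread across three classes (0, 1, 2).

# Inspect the toy dataset and its label distribution.
print(df)
print(df['label'].value_counts())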

3. Data Preprocessing

Prepare the data in the format required by the BERT model. We use the BERT Tokenizer to tokenize the text and generate input IDs and attention masks.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization and generating input IDs and attention masks
def encode_data(text):
    return tokenizer(text, padding='max_length', truncation=True, max_length=128, return_tensors='pt')

# Encode each text once and collect the input IDs and attention masks
encodings = [encode_data(text) for text in df['text']]
encoded_texts = [enc['input_ids'] for enc in encodings]
attention_masks = [enc['attention_mask'] for enc in encodings]
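
To get a feel for what the tokenizer produces, inspect the first encoded sentence. With max_length=128, each sentence becomes 128 token IDs: the text wrapped in [CLS] and [SEP] markers, followed by [PAD] tokens.

# Look at what the model will receive for the first sentence.
sample = encodings[0]
print(sample['input_ids'].shape)       # torch.Size([1, 128])
print(sample['attention_mask'].shape)  # torch.Size([1, 128])

# Convert the IDs back to tokens to see the special tokens and padding.
print(tokenizer.convert_ids_to_tokens(sample['input_ids'][0].tolist())[:10])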

4. Splitting the Dataset

We split the data into training and validation sets. Here, we will use 80% of the data for training and 20% for validation.

from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    df['text'],
    df['label'],
    test_size=0.2,
    random_state=42
)
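
A quick check confirms the split sizes; with only five samples, a 20% test size leaves a single example for validation.

# Verify the split sizes.
print(len(X_train), 'training samples')  # 4
print(len(X_val), 'validation samples')  # 1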

5. Creating Data Loaders

Using PyTorch’s DataLoader, we create data loaders for batch processing.

import torch
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    def __init__(self, texts, labels):
        self.texts = texts
        self.labels = labels

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = encode_data(self.texts[idx])
        return {
            'input_ids': text['input_ids'].squeeze(),
            'attention_mask': text['attention_mask'].squeeze(),
            'labels': torch.tensor(self.labels[idx])
        }

train_dataset = TextDataset(X_train.tolist(), y_train.tolist())
val_dataset = TextDataset(X_val.tolist(), y_val.tolist())

train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=2, shuffle=False)
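
Fetching one batch from the loader is a quick way to confirm that the tensors have the shapes the model expects.

# Pull a single batch and check the tensor shapes.
batch = next(iter(train_loader))
print(batch['input_ids'].shape)       # torch.Size([2, 128])
print(batch['attention_mask'].shape)  # torch.Size([2, 128])
print(batch['labels'].shape)          # torch.Size([2])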

6. Loading the Model

Load the pre-trained BERT model from Hugging Face’s Transformers library. BertForSequenceClassification attaches a classification head on top of BERT, and we set num_labels=3 to match the three classes in our data.

from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
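
The classification head is a single linear layer on top of the pooled BERT output, and you can inspect it directly. On very small datasets, one optional shortcut is to freeze the encoder and train only this head; the snippet below sketches that option but it is not used in the rest of this tutorial.

# The head maps BERT's 768-dimensional pooled output to 3 logits.
print(model.classifier)         # Linear(in_features=768, out_features=3, bias=True)
print(model.config.num_labels)  # 3

# Optional (not used below): freeze the BERT encoder so only the head is trained.
# for param in model.bert.parameters():
#     param.requires_grad = False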

7. Training the Model

To train the model, we set up an optimizer and write a simple training loop. When labels are passed to BertForSequenceClassification, it computes the cross-entropy loss internally, so no separate loss function is needed.

from torch.optim import AdamW
from tqdm import tqdm

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

optimizer = AdamW(model.parameters(), lr=1e-5)

# Model training
model.train()
for epoch in range(3):  # Number of epochs
    epoch_loss = 0.0
    for batch in tqdm(train_loader):
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()

    print(f'Epoch {epoch+1} Average Loss: {epoch_loss / len(train_loader):.4f}')
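
On larger datasets, AdamW is often paired with a learning-rate schedule with warmup. This is optional for the toy example above; a minimal sketch using transformers’ get_linear_schedule_with_warmup looks like this:

from transformers import get_linear_schedule_with_warmup

num_training_steps = 3 * len(train_loader)  # epochs * batches per epoch
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,  # often set to roughly 10% of the total steps
    num_training_steps=num_training_steps
)

# Inside the training loop, call scheduler.step() right after optimizer.step().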

8. Validation and Performance Evaluation

We evaluate the model’s performance using the validation data. Here we measure the accuracy.

model.eval()
correct = 0
total = 0

with torch.no_grad():
    for batch in val_loader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask)
        _, predicted = torch.max(outputs.logits, dim=1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = correct / total
print(f'Accuracy: {accuracy:.2f}') 
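
With the fine-tuned model, predicting the class of a new sentence follows the same pattern. The sentence below is just an illustrative example; map the predicted index back to whatever your labels mean.

# Predict the class of a new sentence.
model.eval()
new_text = 'Machine translation is an interesting problem.'  # illustrative example
encoded = encode_data(new_text)

with torch.no_grad():
    outputs = model(
        encoded['input_ids'].to(device),
        attention_mask=encoded['attention_mask'].to(device)
    )

predicted_label = torch.argmax(outputs.logits, dim=1).item()
print(f'Predicted label: {predicted_label}')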

9. Conclusion

In this tutorial, we learned how to load and fine-tune a pre-trained BERT model for a multi-class classification problem using the Hugging Face Transformers library. BERT delivers strong performance on a wide range of natural language processing tasks. In real projects, plan for several rounds of experimentation and hyperparameter tuning to reach good results. Transformer models are advancing rapidly, so continuous learning is worthwhile.

If you have any further questions, feel free to ask!