Recent advances in deep learning have been remarkable in the fields of artificial intelligence and natural language processing. In particular, the Hugging Face Transformers library provides easy access to a wide range of natural language processing (NLP) models. In this practical session, we will explore how to use the BART (Bidirectional and Auto-Regressive Transformers) model to process text and convert the results into NumPy arrays.
Introduction to BART Model
BART is a model developed by Facebook AI Research that demonstrates excellent performance in text generation and summarization tasks. BART adopts an encoder-decoder structure: the encoder builds a representation of the input text, and the decoder generates new text conditioned on that representation. This design makes BART effective across a wide range of NLP tasks, including summarization, translation, and question answering.
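Once the library is installed (see the next section), you can inspect this encoder-decoder layout directly from the checkpoint's configuration. Below is a minimal sketch; the values noted in the comments apply to the facebook/bart-large checkpoint.
from transformers import BartConfig
# Load only the configuration of the pretrained checkpoint
config = BartConfig.from_pretrained('facebook/bart-large')
print("Encoder layers:", config.encoder_layers)  # 12 for bart-large
print("Decoder layers:", config.decoder_layers)  # 12 for bart-large
print("Hidden size:", config.d_model)            # 1024 for bart-large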
Installing Hugging Face Transformers
To use the Hugging Face Transformers library, you first need to install it. The model examples later in this tutorial also require a deep learning backend such as PyTorch, so install both with the command below.
pip install transformers torch
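To confirm that the installation succeeded, you can print the installed library version:
python -c "import transformers; print(transformers.__version__)"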
Using BART Tokenizer
To use BART, let’s initialize the tokenizer and walk through the process of tokenizing text. The following code is an example of tokenizing a simple sentence with the BART tokenizer.
from transformers import BartTokenizer
# Initialize BART tokenizer
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
# Input text
text = "Deep learning is the technology of the future."
# Tokenize the text
tokens = tokenizer(text)
print(tokens)
Tokenization Result
In the above code, the BART tokenizer converts the input text into numerical token IDs. The result behaves like a dictionary and contains the input_ids along with an attention_mask that marks which positions hold real tokens.
Output (the exact IDs shown are illustrative and depend on the tokenizer's vocabulary):
{'input_ids': [0, 10024, 327, 1311, 1346, 231, 1620, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1]}
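To see which subword pieces these IDs stand for, you can map them back to strings with the tokenizer:
# Map the token IDs back to their subword strings
print(tokenizer.convert_ids_to_tokens(tokens['input_ids']))
# The special tokens <s> and </s> mark the start and end of the sequence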
Converting to NumPy Array
We will convert the tokenization results into NumPy arrays, a convenient numerical format for further processing. NumPy is usually pulled in as a dependency of transformers; if it is missing, install it with pip install numpy. Then import it:
import numpy as np
Code Example
The following example code covers the entire process:
import numpy as np
from transformers import BartTokenizer
# Initialize BART tokenizer
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
# Input text
text = "Deep learning is the technology of the future."
# Tokenize the text
tokens = tokenizer(text)
# Convert input_ids to NumPy array
input_ids_np = np.array(tokens['input_ids'])
attention_mask_np = np.array(tokens['attention_mask'])
# Output
print("Input IDs:", input_ids_np)
print("Attention Mask:", attention_mask_np)
Checking Results
When the above code is executed, you will see that the tokenization results have been converted into NumPy arrays. These arrays hold the numerical representation of the text; in the next section we turn them into tensors and feed them to the model.
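As a shortcut, the tokenizer can also return NumPy arrays directly, skipping the manual np.array() calls. Note that with return_tensors='np' the arrays come back batched, with shape (1, sequence_length):
# Ask the tokenizer for NumPy arrays directly
encoded = tokenizer(text, return_tensors='np')
print(encoded['input_ids'].shape)       # (1, sequence_length)
print(encoded['attention_mask'].shape)  # (1, sequence_length)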
Inputting into the Model
You can now feed the numerical arrays into the model to perform text generation or other NLP tasks, such as summarization. One caveat: the PyTorch version of BART expects torch tensors rather than NumPy arrays, so we convert them before calling generate(). Note also that facebook/bart-large is a pretrained base model; for higher-quality summaries, the checkpoint facebook/bart-large-cnn, which is fine-tuned for summarization, is typically used. Below is an example that generates text from the tokenized input.
import torch
from transformers import BartForConditionalGeneration
# Load the BART model (for genuine summarization, the fine-tuned checkpoint 'facebook/bart-large-cnn' is commonly used)
model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
# generate() expects PyTorch tensors, so convert the NumPy arrays and add a batch dimension
input_ids_pt = torch.from_numpy(input_ids_np).unsqueeze(0)
attention_mask_pt = torch.from_numpy(attention_mask_np).unsqueeze(0)
# Input to the model
output_sequences = model.generate(input_ids_pt, attention_mask=attention_mask_pt)
# Decode the result
summary = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print("Generated Summary:", summary)
Model Output
Running the code above prints the generated text. Keep in mind that the base facebook/bart-large checkpoint was pretrained as a denoising autoencoder, so for a short input it will largely reproduce the sentence; a summarization-tuned checkpoint such as facebook/bart-large-cnn produces genuinely condensed output.
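Generation can also be steered with a few standard keyword arguments of generate(). The sketch below uses beam search with a length cap; the parameter values are illustrative, not tuned.
# Beam search with an explicit length budget (illustrative values)
output_sequences = model.generate(
    input_ids_pt,
    attention_mask=attention_mask_pt,
    num_beams=4,          # keep 4 candidate sequences during decoding
    max_length=60,        # cap the length of the generated sequence
    early_stopping=True,  # stop once all beams have finished
)
print(tokenizer.decode(output_sequences[0], skip_special_tokens=True))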
Conclusion
In this tutorial, we learned how to load the BART model through the Hugging Face Transformers library, use its tokenizer, and convert the results into NumPy arrays. This process builds a foundational understanding of natural language processing and basic skills in working with pretrained models. We hope you apply models like BART to a variety of text processing tasks in the future.
Based on the knowledge gained from this tutorial, we hope you take on more projects and expand your understanding of deep learning. Thank you.