Hugging Face Transformers Tutorial: Mobile BERT Inference Using the Last Hidden Layer

In recent years, deep learning-based models have gained popularity and made significant advances in the field of Natural Language Processing (NLP). Among them, Hugging Face's transformer models are popular due to their ease of use and strong performance. In particular, Mobile BERT is a lightweight version of the BERT model, designed to run efficiently in mobile environments. In this tutorial, we will introduce how to extract the output of the last hidden layer using the Mobile BERT model.

1. What is Mobile BERT?

Mobile BERT is a lightweight version of BERT released by Google. BERT is built entirely from Transformer encoder layers, and Mobile BERT compresses this encoder stack with bottleneck structures and knowledge distillation from a larger teacher model so that it can run on a variety of mobile devices. As a result, Mobile BERT has roughly 4 times fewer parameters than BERT-base while applying several techniques to improve computational efficiency.

2. Installing the Hugging Face Library

To use Hugging Face transformer models, you first need to install the required libraries. You can install them with the command below.

pip install transformers torch
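To confirm that the installation succeeded, a quick check like the following works (the versions printed will depend on your environment; GPU availability is optional for this tutorial):

import torch
import transformers

# Print installed versions to confirm the setup
print(transformers.__version__)
print(torch.__version__)
print(torch.cuda.is_available())  # True only if a CUDA-capable GPU is available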

3. Loading the Mobile BERT Model

Once the libraries are installed, you can load the Mobile BERT model and its tokenizer. Here is a basic code snippet.

from transformers import MobileBertTokenizer, MobileBertModel

# Load Mobile BERT model and tokenizer
tokenizer = MobileBertTokenizer.from_pretrained('google/mobilebert-uncased')
model = MobileBertModel.from_pretrained('google/mobilebert-uncased')
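As an optional sanity check, you can inspect the model configuration and parameter count; for the google/mobilebert-uncased checkpoint you should see a hidden size of 512 and roughly 25 million parameters (the exact numbers come from the checkpoint you load):

# Inspect the model configuration and size
print(model.config.hidden_size)        # 512 for google/mobilebert-uncased
print(model.config.num_hidden_layers)  # 24

num_params = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {num_params:,}")  # roughly 25 million

# Switch to evaluation mode for inference (disables dropout)
model.eval()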

4. Preprocessing Input Data

Input to the Mobile BERT model is plain text, which the tokenizer converts into the tensor format the model expects. Here is how to preprocess an input sentence.

# Define input sentence
input_text = "Try using Hugging Face's transformer!"

# Tokenize the sentence and convert to indices
inputs = tokenizer(input_text, return_tensors='pt')
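To see what the tokenizer actually produced, you can print the returned fields; the exact token sequence depends on the WordPiece vocabulary, but it will include the special [CLS] and [SEP] tokens:

# Inspect the tokenizer output
print(inputs['input_ids'])       # tensor of token indices, including [CLS] and [SEP]
print(inputs['attention_mask'])  # 1 for real tokens, 0 for padding
print(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist()))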

5. Inference via the Model

Once preprocessing is complete, the inputs can be passed to the Mobile BERT model to obtain the output of the last hidden layer. Running the forward pass inside torch.no_grad() disables gradient tracking, which saves memory during inference.

import torch

with torch.no_grad():
    outputs = model(**inputs)

# The last hidden state is available as outputs.last_hidden_state (equivalently outputs[0])
last_hidden_states = outputs.last_hidden_state
print(last_hidden_states.shape)  # (batch size, sequence length, hidden size)
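The same call works for a batch of sentences; here is a minimal sketch (the example sentences are only illustrative), where padding=True aligns the sentences to the same length and the attention mask marks the padded positions:

# Batch of sentences; padding aligns them to the same length
sentences = ["Try using Hugging Face's transformer!", "Mobile BERT is lightweight."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    batch_outputs = model(**batch)

print(batch_outputs.last_hidden_state.shape)  # (2, max sequence length, hidden size)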

6. Interpreting Results

The output of the last hidden layer is returned as a 3-dimensional tensor. The first dimension is the batch size, the second is the sequence length (the number of tokens, including the special [CLS] and [SEP] tokens), and the third is the hidden size of the model. For example, with a batch size of 1, a sequence length of 10, and Mobile BERT's hidden size of 512, the shape of the output will be (1, 10, 512).

7. Application Example: Extracting Embedding Vectors

The output of the last hidden layer can be used as contextual embedding vectors for each token. These vectors can be utilized for various NLP tasks.

# Extract the embedding vector of the first token ([CLS])
word_embedding = last_hidden_states[0, 0, :]  # (512,)
print(word_embedding.shape)
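Beyond single-token vectors, one common (though not the only) approach is to mean-pool the token embeddings into a single sentence-level vector. A minimal sketch, using the attention mask so that padded positions are ignored:

# Mean-pool the token embeddings, weighted by the attention mask
mask = inputs['attention_mask'].unsqueeze(-1)        # (1, sequence length, 1)
summed = (last_hidden_states * mask).sum(dim=1)      # (1, hidden size)
sentence_embedding = summed / mask.sum(dim=1)        # (1, hidden size)
print(sentence_embedding.shape)

Such pooled vectors can be compared with cosine similarity for tasks like semantic search or clustering.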

8. Summary

In this post, we explored how to use the Mobile BERT model with the Hugging Face transformer library. We covered data preprocessing, running inference, and obtaining the output of the last hidden layer. These methods can be employed in a wide range of NLP applications and are widely used in both research and industry.
