Using Hugging Face Transformers: Check the Image After Preprocessing

Data preprocessing is central to deep learning, especially for high-dimensional data such as images. This tutorial explains how to use the transformer models provided by Hugging Face and shows how to inspect an image directly after preprocessing. It works through an image classification problem, with the Python code needed for each step.

1. What is Hugging Face?

Hugging Face provides libraries for natural language processing (NLP) and other machine learning tasks. In particular, it offers implementations of transformer models and a variety of pretrained models, helping researchers and developers to easily utilize these models.

2. Importance of Preprocessing

Data preprocessing has a significant impact on model performance. For image data, it typically includes the following tasks:

Resizing: Consistently resizes images of various dimensions to the same size.
Normalization: Rescales pixel values to a common range, such as 0 to 1, and often standardizes each channel with a mean and standard deviation.
Data Augmentation: Allows the model to learn from a more diverse set of data.
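
As a rough illustration of the augmentation idea, the sketch below implements a random horizontal flip in plain Python on a toy grid of pixel values. The helper names (hflip, random_hflip) and the toy data are made up for this example. In practice, torchvision offers ready-made transforms such as transforms.RandomHorizontalFlip.

```python
import random

def hflip(image):
    """Mirror a 2-D grid of pixel values left to right."""
    return [list(reversed(row)) for row in image]

def random_hflip(image, p=0.5, rng=random):
    """Flip the image with probability p; otherwise return it unchanged."""
    return hflip(image) if rng.random() < p else image

# A toy 2x3 "image" of grayscale pixel values (hypothetical data)
toy_image = [[10, 20, 30],
             [40, 50, 60]]

flipped = hflip(toy_image)
print(flipped)

# During training, each call may or may not flip the image,
# so the model sees slightly different inputs from epoch to epoch
augmented = random_hflip(toy_image, p=0.5)
print(augmented)
```

Because the flip is applied at random, the same source image yields more than one training example without any extra data collection.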

3. Environment Setup

This tutorial requires the following libraries. Run the command below if they are not already installed.

pip install transformers torch torchvision pillow matplotlib

4. Loading and Preprocessing Images

Now, let’s write the code to load and preprocess an image. The code below uses PIL (Python Imaging Library, provided by the Pillow package) to load the image, torchvision to transform it, and matplotlib to display the result.

from PIL import Image
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Load image
image_path = 'your_image_path.jpg'  # Please specify the image path
# Convert to RGB so grayscale or RGBA files also have exactly 3 channels
image = Image.open(image_path).convert('RGB')

# Define preprocessing tasks
transform = transforms.Compose([
    transforms.Resize((256, 256)),        # Resize
    transforms.ToTensor(),                 # Convert to tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Normalize
])

# Apply preprocessing
processed_image = transform(image)

# Visualize the transformed image
# Undo the normalization (x * 0.5 + 0.5) and move channels last (H, W, C) for matplotlib
plt.imshow((processed_image * 0.5 + 0.5).permute(1, 2, 0).numpy())
plt.axis('off')
plt.show()

5. Explanation of Preprocessing

Each preprocessing step used in the code above serves the following roles:

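transforms.Resize((256, 256)): Resizes the image to 256×256 so that every input has the same shape. The target size should match what the model expects; the ViT checkpoint used in the next section takes 224×224 inputs, which its own preprocessor handles.
transforms.ToTensor(): Converts the PIL image to a PyTorch tensor with shape (channels, height, width) and scales pixel values from the 0 to 255 range to the 0 to 1 range.
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)): Subtracts a mean of 0.5 and divides by a standard deviation of 0.5 in each channel, mapping values from the 0 to 1 range to the -1 to 1 range. Centering the inputs this way generally helps training converge.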
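
To make the arithmetic concrete, here is a minimal plain-Python sketch of what happens to a single 8-bit pixel value: scaling to the 0 to 1 range (as ToTensor does for 8-bit images), normalizing with mean 0.5 and standard deviation 0.5, and undoing the normalization for display. The function names are illustrative only and are not part of torchvision.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel value (0..255) to the range 0..1."""
    return pixel / 255.0

def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula used by Normalize."""
    return (value - mean) / std

def denormalize(value, mean=0.5, std=0.5):
    """Invert normalize so the value can be displayed again."""
    return value * std + mean

# Trace the darkest, a mid-gray, and the brightest pixel through each step
for pixel in (0, 128, 255):
    scaled = to_unit_range(pixel)
    normed = normalize(scaled)
    restored = denormalize(normed)
    print(f"pixel={pixel:3d}  scaled={scaled:.4f}  "
          f"normalized={normed:+.4f}  restored={restored:.4f}")
```

The denormalize step is the same operation as the "/ 2 + 0.5" applied before plt.imshow in the preprocessing code: without it, the negative values would fall outside the range matplotlib expects for float images.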

6. Using Hugging Face Transformer Models

You can easily use pretrained transformer models through the Hugging Face library. In the next step, we classify the image with a Vision Transformer (ViT), letting the preprocessor that ships with the checkpoint handle resizing and normalization.

from transformers import ViTForImageClassification, ViTImageProcessor
import torch

# Load model and image processor
# This checkpoint is fine-tuned on ImageNet-1k, so it includes a trained
# classification head; the '-in21k' variant provides no fine-tuned head
model_name = 'google/vit-base-patch16-224'
model = ViTForImageClassification.from_pretrained(model_name)
processor = ViTImageProcessor.from_pretrained(model_name)

# Preprocess the image (resize, rescale, normalize) into model inputs
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

print(f'Predicted class index: {predicted_class_idx}')

7. Verifying Results

In the code above, we classified the image using the ViT (Vision Transformer) model. The model outputs one logit per class, and the index of the largest logit is the predicted class. To interpret that index, convert it to the corresponding label name for the dataset the model was trained on.

# Look up the label name in the model's configuration
# (id2label maps class indices to label strings; or define your own list)
labels = model.config.id2label
predicted_label = labels[predicted_class_idx]

print(f'Predicted label: {predicted_label}')
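
A single predicted label hides how confident the model is. The sketch below, written in plain Python with made-up logits and a made-up label mapping, shows one way to turn logits into probabilities with softmax and list the top candidates. The names softmax, top_k, example_logits, and example_id2label are assumptions for illustration. With the real model, the same idea can be applied to the logits tensor using torch.softmax and torch.topk.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label mapping, for illustration only
example_logits = [1.2, 4.5, 0.3, 3.1]
example_id2label = {0: "cat", 1: "dog", 2: "car", 3: "fox"}

for label, prob in top_k(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Looking at the top few probabilities makes it easier to spot cases where the model is torn between similar classes.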

8. Summary and Conclusion

In this tutorial, we explored how to preprocess an image, check the result visually, and classify the image using Hugging Face Transformers. Data preprocessing is essential to building a good model, and you can extend the code provided here to suit your own dataset. The Hugging Face library makes it straightforward to work with a wide range of pretrained models and data.

I hope this tutorial leads to a deeper understanding! Feel free to adjust the code to your needs and try various experiments.