Using Hugging Face Transformers: Setting TrainingArguments

In deep learning and natural language processing (NLP), Hugging Face’s Transformers library is an essential tool. In this course, we explain the TrainingArguments class used by Hugging Face’s Trainer API in detail, show how to configure it, and provide working code examples.

What is TrainingArguments?

The TrainingArguments class defines the hyperparameters and settings used for model training. Through this class you configure arguments that control training, evaluation, and logging behavior.

Main Parameters of TrainingArguments

  • output_dir: The directory path where model checkpoints will be saved.
  • num_train_epochs: Sets how many times to iterate through the entire training dataset.
  • per_device_train_batch_size: The batch size to use per device (e.g., GPU).
  • learning_rate: Sets the learning rate.
  • evaluation_strategy: Sets the evaluation strategy. For example, options like “epoch” or “steps” are available.
  • logging_dir: The directory path where log files will be saved.
  • weight_decay: Applies regularization using weight decay.
  • save_total_limit: Limits the maximum number of checkpoints to be saved.

Setting Up TrainingArguments

Now let’s actually set up the parameters needed for training using TrainingArguments. The example code below shows how to use the class, and the role of each parameter is explained afterwards.

Python Example Code

from transformers import TrainingArguments

# Create TrainingArguments object
training_args = TrainingArguments(
    output_dir='./results',                       # Directory path to save checkpoints
    num_train_epochs=3,                           # Number of epochs to train
    per_device_train_batch_size=16,               # Batch size to use on each device
    per_device_eval_batch_size=64,                # Batch size to use for evaluation
    learning_rate=2e-5,                           # Learning rate
    evaluation_strategy="epoch",                  # Evaluation strategy
    logging_dir='./logs',                         # Directory to save log files
    weight_decay=0.01,                            # Weight decay
    save_total_limit=2                            # Maximum number of saved checkpoints
)

print(training_args)

Code Explanation

The code above is an example of creating a TrainingArguments object. Let’s take a closer look at each parameter:

  • output_dir='./results': Specifies the folder where the model checkpoints will be saved after training.
  • num_train_epochs=3: Trains the model by iterating through the entire dataset 3 times.
  • per_device_train_batch_size=16: Uses a batch of 16 samples for training on each device.
  • per_device_eval_batch_size=64: Processes 64 samples in a batch for evaluation on each device.
  • learning_rate=2e-5: Sets the initial learning rate; the Trainer’s default linear scheduler decays it over the course of training.
  • evaluation_strategy="epoch": Configures the model to be evaluated after each epoch ends.
  • logging_dir='./logs': Specifies the directory where training logs are written.
  • weight_decay=0.01: Applies a weight decay coefficient of 0.01 to help prevent overfitting.
  • save_total_limit=2: Limits the maximum number of checkpoints being saved to 2.
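
After creating the object, it can be useful to check what was actually configured, including the many defaults you did not set explicitly. The short snippet below is an illustrative sketch that assumes the training_args object defined above; individual settings are exposed as attributes, and to_json_string() serializes the full configuration.

# Individual settings are exposed as attributes on the object
print(training_args.learning_rate)
print(training_args.per_device_train_batch_size)

# Serialize the full configuration (including defaults) for logging or reproducibility
print(training_args.to_json_string())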

Integrating TrainingArguments with the Trainer API

After setting the training parameters, you can use the Trainer API to train your model. Below is an example showing how to integrate the Trainer class with TrainingArguments.

from transformers import Trainer, TrainingArguments, AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Prepare training and evaluation datasets (example is omitted)
train_dataset = ...
eval_dataset = ...

# Create Trainer object
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)

# Train the model
trainer.train()

Code Explanation

The code above performs the following steps:

  • Loads the BERT model for classification tasks using AutoModelForSequenceClassification.
  • Also loads the appropriate tokenizer using AutoTokenizer.
  • The train_dataset and eval_dataset placeholders (...) stand in for the actual training and evaluation datasets, which you need to prepare and assign yourself; a minimal sketch of one way to do this follows after this list.
  • Creates a Trainer object, which takes the model, training arguments, training dataset, and evaluation dataset.
  • Finally, calls trainer.train() to start the model training.
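
Dataset preparation itself is outside the scope of this example, but as a rough illustration of one possible approach, the sketch below builds tiny tokenized datasets with the Hugging Face datasets library. The toy texts and labels are invented for demonstration, and the snippet reuses the tokenizer loaded above; in practice you would tokenize your own training and validation splits.

from datasets import Dataset

# Toy data for illustration only -- replace with your own corpus and a proper validation split
raw_train = Dataset.from_dict({"text": ["great movie", "terrible plot"], "label": [1, 0]})
raw_eval = Dataset.from_dict({"text": ["decent acting"], "label": [1]})

def tokenize(batch):
    # Convert raw text into the input_ids / attention_mask tensors the model expects
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = raw_train.map(tokenize, batched=True)
eval_dataset = raw_eval.map(tokenize, batched=True)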

Common Configurations for TrainingArguments

TrainingArguments offers many more arguments than the ones above; let’s look at a few commonly used configurations:

1. Gradient Accumulation

If memory limitations make it difficult to train with large batches, you can use gradient accumulation. The effective batch size is the per-device batch size multiplied by the number of accumulation steps (and by the number of devices); for example, with a per-device batch size of 8 and gradients accumulated over 4 steps, the effective batch size on a single GPU is 32, as in the code below.

training_args = TrainingArguments(
    output_dir='./results',         # Directory to save checkpoints
    per_device_train_batch_size=8,  # Batch size per device
    gradient_accumulation_steps=4,  # Accumulate gradients over 4 steps before each optimizer update
)

2. Mixed Precision Training

If your GPU supports mixed precision training, it can accelerate training and reduce memory usage. In that case, add the fp16=True setting.

training_args = TrainingArguments(
    output_dir='./results',  # Directory to save checkpoints
    fp16=True,               # Enable mixed precision (float16) training
)

3. Early Stopping

You can configure early stopping to avoid unnecessary training when performance stops improving. This is done by passing an EarlyStoppingCallback to the Trainer, as shown below; the TrainingArguments settings it depends on are covered after the snippet.

from transformers import EarlyStoppingCallback

trainer = Trainer(
    ...
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # Stop after 3 evaluations with no improvement
)
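
EarlyStoppingCallback relies on the Trainer’s best-model tracking, so the accompanying TrainingArguments must enable regular evaluation and load_best_model_at_end, along with a metric to monitor. The configuration below is an illustrative sketch; the concrete values are assumptions, not the only valid choices.

training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",        # Evaluate every epoch so the callback has metrics to compare
    save_strategy="epoch",              # Must match evaluation_strategy when loading the best model
    load_best_model_at_end=True,        # Required for early stopping to track the best checkpoint
    metric_for_best_model="eval_loss",  # Metric the callback monitors (lower is better for loss)
)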

Conclusion

In this course, we thoroughly explained how to set up the TrainingArguments class in Hugging Face’s Transformers library. You can optimize model training through various hyperparameters.

To train deep learning models more effectively, it is important to make good use of the various parameters in TrainingArguments. We hope you find the optimal hyperparameters through experimentation, continuously improving the model’s performance.

If you have any further questions or would like to know more, please leave a comment, and we will be happy to respond.

© 2023 Hugging Face Transformers Course