Hugging Face Transformers Tutorial: Loading a Pretrained BERT Model for Ensemble Training

Today we'll learn how to use BERT (Bidirectional Encoder Representations from Transformers), one of the most widely used deep learning models in natural language processing, for ensemble training. We'll use the Hugging Face Transformers library to load a pretrained BERT model and then build an ensemble model on top of it.

How Does BERT Work?

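BERT reads context in both directions at once: when it encodes a word, it attends to the words on both its left and its right in the input sentence. This lets it build context-dependent representations of each word's meaning. BERT is first pretrained on large amounts of unlabeled text with self-supervised objectives (masked language modeling and next sentence prediction) and can then be fine-tuned for a wide range of downstream tasks.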
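To make the masked language modeling objective concrete, here is a simplified sketch of how pretraining inputs are corrupted. It follows the 80/10/10 rule described in the BERT paper: of the tokens selected for prediction, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. The helper `mask_tokens` is hypothetical, and the per-token sampling and whitespace tokenization are simplifications of what real pretraining code does.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels); labels[i] is None where nothing is predicted."""
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # token not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK  # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens, mask_prob=0.5, rng=random.Random(0))
print(corrupted)
print(labels)
```

Because BERT sees the whole corrupted sentence, it can use words on both sides of a masked position to predict it, which is what makes the learned representations bidirectional.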

Installing the Hugging Face Library

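The Hugging Face Transformers library makes it easy to use BERT and many other pretrained models. Install it, together with PyTorch and the pandas and scikit-learn packages used later in this tutorial, with the following command:

pip install transformers torch pandas scikit-learn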
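After installing, you can check which versions ended up in your environment. This sketch uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and pandas and scikit-learn are listed because later snippets in this tutorial import them.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(["transformers", "torch", "pandas", "scikit-learn"]).items():
    print(f"{name}: {version or 'not installed'}")
```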

Loading a Pretrained BERT Model

Now let's load a pretrained BERT model with the Hugging Face Transformers library. The code below loads the BERT model and its tokenizer.


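import torch
from transformers import BertTokenizer, BertModel

# Load the BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Test sentence
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors='pt')

# Feed the sentence to BERT and get its outputs (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, 768)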

Code Explanation

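from transformers import BertTokenizer, BertModel: imports the BERT tokenizer and model classes from Hugging Face Transformers.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased'): loads the pretrained BERT tokenizer (downloaded from the Hugging Face Hub on first use, then cached).
model = BertModel.from_pretrained('bert-base-uncased'): loads the pretrained BERT model.
inputs = tokenizer(text, return_tensors='pt'): tokenizes the input sentence and converts it to PyTorch tensors.
outputs = model(**inputs): passes the inputs to the model and returns its outputs.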
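The model returns two tensors that matter for this tutorial: `last_hidden_state`, one contextual vector per token, and `pooler_output`, one vector per sequence derived from the [CLS] position. The sketch below shows their shapes using a tiny, randomly initialized BERT built from `BertConfig`, so it runs without downloading weights. The configuration sizes and token IDs are illustrative assumptions; bert-base-uncased itself uses a hidden size of 768, 12 layers, 12 attention heads, and a 30,522-token vocabulary.

```python
import torch
from transformers import BertConfig, BertModel

# Tiny illustrative configuration (assumed sizes, not bert-base-uncased)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertModel(config)
model.eval()  # disable dropout

# Hypothetical token IDs for a batch of two sequences of length 6;
# the last two positions of the second sequence are padding.
input_ids = torch.tensor([[2, 11, 12, 13, 14, 3],
                          [2, 21, 22, 3, 0, 0]])
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                               [1, 1, 1, 1, 0, 0]])

with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

# One contextual vector per token: (batch, seq_len, hidden_size)
print(outputs.last_hidden_state.shape)  # torch.Size([2, 6, 32])
# One vector per sequence ([CLS] vector passed through a dense layer and tanh)
print(outputs.pooler_output.shape)      # torch.Size([2, 32])
# Tuple-style indexing also works: outputs[1] is the pooler output
print(torch.equal(outputs[1], outputs.pooler_output))  # True
```

The classifier later in this tutorial builds on `pooler_output` as its sentence representation.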

Ensemble Training Overview

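Ensemble learning combines the predictions of several models to improve final predictive performance. By pooling the strengths of different models, an ensemble can produce more reliable predictions than any single model. There are several ensemble techniques; bagging and boosting are among the most widely used.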
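Bagging trains each ensemble member on a bootstrap resample of the training data: a sample of the same size drawn with replacement, so some examples repeat and others (the member's "out-of-bag" examples) are left out. A minimal pure-Python sketch of the resampling step follows; `bootstrap_indices` is a hypothetical helper. Boosting, which trains models sequentially so that later models focus on examples earlier ones handled poorly, is not shown.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices in [0, n) with replacement: one bagging resample."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(42)
n_examples = 10
n_models = 3

# Each ensemble member would be trained only on its own resample.
resamples = [bootstrap_indices(n_examples, rng) for _ in range(n_models)]
for i, idx in enumerate(resamples):
    out_of_bag = sorted(set(range(n_examples)) - set(idx))
    print(f"model {i}: train on {sorted(idx)}, out-of-bag {out_of_bag}")
```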

Building an Ensemble with BERT

Let's now look at how to build an ensemble from BERT models. We will train several BERT-based classifiers and combine their predictions into a final prediction.

Model Overview

We will build the ensemble in three steps:

Create and train several BERT models
Collect each model's predictions
Combine the predictions into a final prediction
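The last two steps amount to collecting one row of predicted labels per model and combining each column into a single label. A minimal sketch of the combining step using hard majority voting, with hypothetical predictions; `majority_vote` is an illustrative helper, not a library function. With binary labels, using an odd number of models avoids ties.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """predictions_per_model[m][i] is model m's predicted label for example i."""
    n_examples = len(predictions_per_model[0])
    final = []
    for i in range(n_examples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        final.append(votes.most_common(1)[0][0])  # label with the most votes
    return final

# Hypothetical labels from three models on four emails (1 = spam, 0 = not spam)
preds = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
print(majority_vote(preds))  # [1, 0, 1, 0]
```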

Preparing the Dataset

We will use a simple text classification problem, such as email spam filtering. First, we prepare a small toy dataset as follows:


import pandas as pd

# Create a toy example dataset
data = {'text': ["Free money now", "Hello friend, how are you?", "Limited time offer", "Nice to see you"],
        'label': [1, 0, 1, 0]}  # 1: spam, 0: not spam
df = pd.DataFrame(data)
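Feeding the entire training set to the model in one batch only works for a toy dataset like this one. With a real spam corpus, tokenized data is commonly wrapped in a PyTorch `Dataset` so a `DataLoader` can serve shuffled mini-batches. A minimal sketch follows; `EncodedTextDataset` is a hypothetical name, and random token IDs stand in for real tokenizer output.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class EncodedTextDataset(Dataset):
    """Pairs tokenizer output (a dict of tensors) with integer labels."""

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = torch.tensor(list(labels), dtype=torch.long)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One example: each encoding field sliced at idx, plus its label
        item = {key: val[idx] for key, val in self.encodings.items()}
        item["labels"] = self.labels[idx]
        return item

# Dummy encodings standing in for tokenizer(..., padding=True, return_tensors='pt')
encodings = {
    "input_ids": torch.randint(0, 100, (4, 6)),
    "attention_mask": torch.ones(4, 6, dtype=torch.long),
}
dataset = EncodedTextDataset(encodings, [1, 0, 1, 0])
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for batch in loader:
    # The default collate function stacks each dict field into a batch tensor
    print(batch["input_ids"].shape, batch["labels"])
```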

Training the Models and Building the Ensemble

Next, we train each BERT model. The trained models are saved so they can be reused for the ensemble.


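from sklearn.model_selection import train_test_split
import torch
from transformers import BertModel

# Split the data (df and tokenizer come from the previous snippets).
# stratify keeps one spam and one non-spam email in the tiny test set.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    df['text'], df['label'], test_size=0.5, random_state=42, stratify=df['label'])

# Tokenize the texts for BERT
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True, return_tensors='pt')
test_encodings = tokenizer(list(test_texts), truncation=True, padding=True, return_tensors='pt')
train_label_tensor = torch.tensor(train_labels.values, dtype=torch.long)

class BERTClassifier(torch.nn.Module):
    def __init__(self, num_labels=2):  # 2 classes (spam, not spam)
        super().__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = torch.nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        # pooler_output: [CLS]-based sentence vector, shape (batch, hidden_size)
        pooled = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(pooled)  # logits, shape (batch, num_labels)

def train_model(model, encodings, labels, epochs=3, lr=5e-5):
    # Simple full-batch training loop; each model gets its own optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        optimizer.zero_grad()
        logits = model(input_ids=encodings['input_ids'], attention_mask=encodings['attention_mask'])
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        print(f'Epoch {epoch + 1}, Loss: {loss.item():.4f}')

# Create and train each ensemble member separately
members = []
for seed in (0, 1):
    torch.manual_seed(seed)  # different classifier-head initialization per member
    member = BERTClassifier()
    train_model(member, train_encodings, train_label_tensor)
    members.append(member)
model1, model2 = members

# Save the trained models
torch.save(model1.state_dict(), 'bert_model1.pth')
torch.save(model2.state_dict(), 'bert_model2.pth')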
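Because every ensemble member starts from the same pretrained BERT weights, the members differ mainly in their randomly initialized classification heads, dropout, and the data each one sees. Fixing a different random seed per member makes that difference deliberate and reproducible. The sketch below isolates the head; `make_head` is a hypothetical helper, and 768 matches bert-base's hidden size.

```python
import torch

def make_head(seed, hidden_size=768, num_labels=2):
    """Build a linear classification head whose initial weights depend on `seed`."""
    torch.manual_seed(seed)
    return torch.nn.Linear(hidden_size, num_labels)

head_a = make_head(seed=0)
head_b = make_head(seed=1)
head_a_again = make_head(seed=0)

print(torch.equal(head_a.weight, head_b.weight))        # False: different seeds
print(torch.equal(head_a.weight, head_a_again.weight))  # True: same seed
```

If the members end up too similar, the ensemble adds little; training each member on a different bootstrap resample (bagging) is another common way to increase diversity.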

Prediction and Ensemble Results

Once training is complete, we can combine the predictions of the individual models into a final ensemble prediction.


# Return class probabilities so the ensemble can average them (soft voting)
def predict_proba(model, encodings):
    model.eval()  # disable dropout
    with torch.no_grad():
        logits = model(input_ids=encodings['input_ids'], attention_mask=encodings['attention_mask'])
    return torch.softmax(logits, dim=1)  # shape (batch, num_labels)

# Load the saved weights
model1.load_state_dict(torch.load('bert_model1.pth'))
model2.load_state_dict(torch.load('bert_model2.pth'))

# Per-model class probabilities
probs_model1 = predict_proba(model1, test_encodings)
probs_model2 = predict_proba(model2, test_encodings)

# Ensemble prediction: average the probabilities, then take the most likely class
ensemble_probs = (probs_model1 + probs_model2) / 2
final_preds = torch.argmax(ensemble_probs, dim=1)
print(f'Final predictions: {final_preds.tolist()}')
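With only two models, averaging hard 0/1 labels and thresholding at 0.5 maps every split vote (one model says 1, the other 0) to class 0, no matter how confident the models are. Averaging the class probabilities first (soft voting) lets the more confident model decide. The sketch below compares both on hypothetical P(spam) values; the function names are illustrative.

```python
# Hypothetical P(spam) outputs of two models on three test emails
p_model1 = [0.95, 0.60, 0.20]
p_model2 = [0.45, 0.30, 0.10]

def hard_vote_two(p1, p2, threshold=0.5):
    """Threshold each model first, average the 0/1 labels, then threshold again."""
    out = []
    for a, b in zip(p1, p2):
        mean_label = (int(a > threshold) + int(b > threshold)) / 2
        out.append(int(mean_label > 0.5))  # a 1-vs-0 split gives exactly 0.5 -> 0
    return out

def soft_vote(p1, p2, threshold=0.5):
    """Average the probabilities first, then threshold once."""
    return [int((a + b) / 2 > threshold) for a, b in zip(p1, p2)]

print(hard_vote_two(p_model1, p_model2))  # [0, 0, 0]
print(soft_vote(p_model1, p_model2))      # [1, 0, 0]
```

On the first email, model 1 is very confident it is spam while model 2 is only mildly confident it is not; hard-label averaging discards that difference, soft voting keeps it.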

Conclusion

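In this tutorial, we used the Hugging Face Transformers library to load a pretrained BERT model and walked through a simple ensemble training setup built on it. BERT performs well on many complex natural language processing tasks, and ensembling several fine-tuned models can often improve performance further. Try applying these techniques to other tasks to get better results.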