{"id":36071,"date":"2024-11-01T09:45:30","date_gmt":"2024-11-01T09:45:30","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36071"},"modified":"2024-11-01T09:45:30","modified_gmt":"2024-11-01T09:45:30","slug":"hugging-face-transformers-tutorial-bert-classification-fine-tuning","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36071\/","title":{"rendered":"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning"},"content":{"rendered":"<p><body><\/p>\n<p>Deep learning has driven rapid progress in Natural Language Processing (NLP). In particular, Hugging Face&#8217;s Transformers library provides many pre-trained models that researchers and developers can load and fine-tune with a few lines of code. This tutorial explains how to perform text classification using the BERT (Bidirectional Encoder Representations from Transformers) model.<\/p>\n<h2>1. What is BERT?<\/h2>\n<p>BERT is a pre-trained language model released by Google in 2018. Its defining feature is that it is &#8216;bidirectional&#8217;: the representation of each token is conditioned on the words both to its left and to its right. Because the same word receives a different representation in different contexts, BERT generally outperforms static word embedding techniques on tasks that depend on context.<\/p>\n<h2>2. Introduction to Hugging Face Transformers Library<\/h2>\n<p>Hugging Face&#8217;s Transformers library is a Python library that provides a common interface to many transformer models, including BERT. It is widely used in NLP and supports fine-tuning pre-trained models for specific tasks.<\/p>\n<h3>2.1 Installing<\/h3>\n<p>The examples in this tutorial also use PyTorch, pandas, and scikit-learn, and the Trainer API relies on the accelerate package in recent versions of Transformers. The following pip command installs all of them:<\/p>\n<pre><code>pip install transformers torch accelerate pandas scikit-learn<\/code><\/pre>\n<h2>3. Classifying Text with BERT<\/h2>\n<p>In this tutorial, we will build a model that classifies movie reviews from the IMDB dataset as positive or negative. 
The dataset has the following structure:<\/p>\n<ul>\n<li>Text: Movie review<\/li>\n<li>Label: Positive (1) or Negative (0)<\/li>\n<\/ul>\n<h3>3.1 Preparing the Dataset<\/h3>\n<p>First, we load the dataset and convert the sentiment strings to integer labels.<\/p>\n<pre><code>\nimport pandas as pd\nfrom sklearn.model_selection import train_test_split\n\n# The raw IMDB dataset is distributed as a tar archive at this URL.\nurl = \"https:\/\/ai.stanford.edu\/~amaas\/data\/sentiment\/aclImdb_v1.tar.gz\"\n\n# This simple example assumes the reviews have already been converted to a CSV\n# file with a 'review' column and a 'sentiment' column ('positive' or 'negative').\ndata = pd.read_csv(\"imdb_reviews.csv\")\n\n# Map the sentiment strings to integer labels: positive = 1, negative = 0\ndata['label'] = data['sentiment'].apply(lambda x: 1 if x == 'positive' else 0)\n\n# Hold out 20% of the reviews for validation\ntrain_texts, val_texts, train_labels, val_labels = train_test_split(data['review'], data['label'], test_size=0.2)\n<\/code><\/pre>\n<h3>3.2 BERT Tokenization<\/h3>\n<p>To convert the text into the input format BERT expects, we use a tokenizer. The tokenizer splits each review into subword tokens, maps them to vocabulary IDs, and pads or truncates the sequences so that they can be batched.<\/p>\n<pre><code>\nfrom transformers import BertTokenizer\n\n# Initialize BERT tokenizer\ntokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n\n# Function to convert to BERT input format.\n# padding=True pads to the longest sequence; truncation=True cuts reviews\n# that exceed the model's maximum input length.\ndef encode(texts):\n    return tokenizer(texts.tolist(), padding=True, truncation=True, return_tensors='pt')\n\ntrain_encodings = encode(train_texts)\nval_encodings = encode(val_texts)\n<\/code><\/pre>\n<h3>3.3 Creating the Dataset<\/h3>\n<p>Wrap the encodings produced by the tokenizer, together with the labels, in a PyTorch Dataset so that the Trainer can iterate over them.<\/p>\n<pre><code>\nimport torch\n\nclass IMDbDataset(torch.utils.data.Dataset):\n    def __init__(self, encodings, labels):\n        self.encodings = encodings\n        self.labels = labels\n\n    def __getitem__(self, idx):\n        # Select the idx-th row of every encoding tensor (input_ids, attention_mask, ...)\n        item = {key: val[idx] for key, val in self.encodings.items()}\n        # The model expects the target under the 'labels' key\n        item['labels'] = torch.tensor(self.labels[idx])\n        return item\n\n    def __len__(self):\n        return len(self.labels)\n\ntrain_dataset = IMDbDataset(train_encodings, 
train_labels.values)\nval_dataset = IMDbDataset(val_encodings, val_labels.values)\n<\/code><\/pre>\n<h2>4. Defining the BERT Model<\/h2>\n<p>Now we load the pre-trained BERT model from Hugging Face&#8217;s Transformers library with a classification head on top. Setting <code>num_labels=2<\/code> creates a two-class output layer, which is newly initialized and learned during fine-tuning.<\/p>\n<pre><code>\nfrom transformers import BertForSequenceClassification\n\n# Load pre-trained BERT with a two-class classification head\nmodel = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)\n<\/code><\/pre>\n<h2>5. Training the Model<\/h2>\n<p>We can use the Trainer API to train the model. It handles the training loop, including batching, optimization, and checkpointing, so we only need to supply the model, the datasets, and the training arguments.<\/p>\n<pre><code>\nfrom transformers import Trainer, TrainingArguments\n\n# Configure training hyperparameters and output locations\ntraining_args = TrainingArguments(\n    output_dir='.\/results',\n    num_train_epochs=3,\n    per_device_train_batch_size=8,\n    per_device_eval_batch_size=8,\n    warmup_steps=500,\n    weight_decay=0.01,\n    logging_dir='.\/logs',\n)\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=train_dataset,\n    eval_dataset=val_dataset,\n)\n\n# Start training\ntrainer.train()\n<\/code><\/pre>\n<h2>6. Evaluating the Model<\/h2>\n<p>Evaluate the trained model on the validation set to check its performance. Because no <code>compute_metrics<\/code> function was passed to the Trainer, <code>evaluate()<\/code> reports the evaluation loss and runtime statistics rather than accuracy.<\/p>\n<pre><code>\n# Returns a dictionary of evaluation results\nmetrics = trainer.evaluate()\nprint(metrics)\n<\/code><\/pre>\n<h2>Conclusion<\/h2>\n<p>In this tutorial, we learned how to fine-tune BERT for text classification with the Hugging Face Transformers library. BERT performs well on a wide range of NLP tasks, and starting from a pre-trained model can yield good results even with a small amount of data. 
I hope you will utilize BERT in various NLP projects in the future.<\/p>\n<h2>References<\/h2>\n<ul>\n<li><a href=\"https:\/\/huggingface.co\/docs\/transformers\/index\">Hugging Face Transformers Documentation<\/a><\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1810.04805\">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding<\/a><\/li>\n<\/ul>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the advancement of deep learning, many innovations have occurred in the field of Natural Language Processing (NLP). In particular, Hugging Face&#8217;s Transformers library provides several powerful pre-trained models, allowing researchers and developers to easily utilize natural language processing models. This course will detail how to perform text classification using the BERT (Bidirectional Encoder Representations &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36071\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36071","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36071\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" 
\/>\n<meta property=\"og:title\" content=\"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"With the advancement of deep learning, many innovations have occurred in the field of Natural Language Processing (NLP). In particular, Hugging Face&#8217;s Transformers library provides several powerful pre-trained models, allowing researchers and developers to easily utilize natural language processing models. This course will detail how to perform text classification using the BERT (Bidirectional Encoder Representations &hellip; \ub354 \ubcf4\uae30 &quot;Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36071\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:45:30+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36071\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36071\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Hugging Face Transformers Tutorial, BERT Classification 
Fine-Tuning\",\"datePublished\":\"2024-11-01T09:45:30+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36071\/\"},\"wordCount\":398,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36071\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36071\/\",\"name\":\"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:45:30+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36071\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36071\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36071\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hugging Face Transformers Tutorial, BERT Classification 
Fine-Tuning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36071\/","og_locale":"ko_KR","og_type":"article","og_title":"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"With the advancement of deep learning, many innovations have occurred in the field of Natural Language Processing (NLP). In particular, Hugging Face&#8217;s Transformers library provides several powerful pre-trained models, allowing researchers and developers to easily utilize natural language processing models. This course will detail how to perform text classification using the BERT (Bidirectional Encoder Representations &hellip; \ub354 \ubcf4\uae30 \"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning\"","og_url":"https:\/\/atmokpo.com\/w\/36071\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:45:30+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36071\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36071\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Hugging Face Transformers Tutorial, BERT Classification 
Fine-Tuning","datePublished":"2024-11-01T09:45:30+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36071\/"},"wordCount":398,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36071\/","url":"https:\/\/atmokpo.com\/w\/36071\/","name":"Hugging Face Transformers Tutorial, BERT Classification Fine-Tuning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:45:30+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36071\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36071\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36071\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Hugging Face Transformers Tutorial, BERT Classification 
Fine-Tuning"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36071","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":
[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36071"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36071\/revisions"}],"predecessor-version":[{"id":36072,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36071\/revisions\/36072"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36071"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36071"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36071"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}