<h1>Transformers Tutorial with Hugging Face, IMDB Dataset</h1>
<p>Published 2024-11-01 · https://atmokpo.com/w/36141/</p>
<p>Hello! Today we will take a detailed look at how to train a sentiment analysis model on the IMDB dataset using Hugging Face's Transformers library, one of the most widely used toolkits in natural language processing. We will go through the entire process: data preparation, model training, evaluation, and prediction.</p>
<h2>1. Introduction</h2>
<p>The IMDB dataset contains movie reviews and is widely used for the task of classifying whether a given review is positive (1) or negative (0). It ships with 25,000 labeled training reviews and 25,000 labeled test reviews, each a piece of free-form natural language text. A deep learning model learns to read this text and classify its sentiment.</p>
<h2>2. Environment Setup</h2>
<p>First, install the required libraries: <strong>transformers</strong> itself, plus <strong>torch</strong> (the model backend) and <strong>datasets</strong> (for loading the data).</p>
<pre><code>!pip install transformers torch datasets</code></pre>
<h2>3. Loading the Dataset</h2>
<p>We use the <strong>datasets</strong> library to load the IMDB dataset:</p>
<pre><code>from datasets import load_dataset

dataset = load_dataset("imdb")
print(dataset)</code></pre>
<p>This downloads the IMDB dataset and prints its structure, from which you can check the sizes of the training and test splits (the dataset also ships a 50,000-review unlabeled "unsupervised" split).</p>
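<p>To make the printed structure concrete, here is a tiny stand-in written with plain Python dicts (not the real <code>DatasetDict</code>, and the review texts are invented): each split maps a name to examples carrying a <code>text</code> string and an integer <code>label</code>, with 0 = negative and 1 = positive.</p>

```python
# Toy stand-in for the DatasetDict returned by load_dataset("imdb").
# The real "train" and "test" splits each hold 25,000 examples;
# these sample reviews are invented for illustration.
dataset_layout = {
    "train": [
        {"text": "A moving, beautifully shot film.", "label": 1},
        {"text": "Dull plot and wooden acting.", "label": 0},
    ],
    "test": [
        {"text": "One of the best thrillers in years.", "label": 1},
    ],
}

def split_sizes(ds):
    """Number of examples in each split."""
    return {name: len(rows) for name, rows in ds.items()}

def positive_fraction(rows):
    """Share of examples labeled positive (label == 1)."""
    return sum(r["label"] for r in rows) / len(rows)

print(split_sizes(dataset_layout))                 # {'train': 2, 'test': 1}
print(positive_fraction(dataset_layout["train"]))  # 0.5
```

<p>The real dataset exposes the same access pattern: <code>dataset["train"][0]</code> returns one such dict with <code>text</code> and <code>label</code> keys.</p>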
<h2>4. Data Preprocessing</h2>
<p>The text must be converted into a form the model understands. Typical preprocessing steps are:</p>
<ul>
<li>Removing unnecessary characters</li>
<li>Lowercasing</li>
<li>Tokenization</li>
</ul>
<p>With Hugging Face Transformers, a BERT tokenizer handles these steps for us; the <code>bert-base-uncased</code> tokenizer lowercases text internally, so no manual cleaning is needed. The code below sets up the tokenizer and encodes the training data, keeping an attention mask alongside the token ids so the model can ignore padding positions:</p>
<pre><code>from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode_review(review):
    # Pad/truncate every review to 512 tokens; returns a 1-D tensor of token ids.
    return tokenizer(review, padding="max_length", truncation=True,
                     max_length=512, return_tensors="pt")["input_ids"][0]

# Preprocess the reviews from the training data
train_encodings = {"input_ids": [], "attention_mask": [], "label": []}
for review, label in zip(dataset["train"]["text"], dataset["train"]["label"]):
    ids = encode_review(review)
    train_encodings["input_ids"].append(ids)
    # 1 for real tokens, 0 for [PAD] positions
    train_encodings["attention_mask"].append((ids != tokenizer.pad_token_id).long())
    train_encodings["label"].append(label)</code></pre>
<h2>5. Splitting the Dataset</h2>
<p>Next we wrap the encodings in a PyTorch <code>Dataset</code> and carve a validation set out of the training data with <code>random_split</code>, so the model can later be evaluated on examples it did not train on:</p>
<pre><code>import torch
from torch.utils.data import random_split

class IMDBDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        return {"input_ids": self.encodings["input_ids"][idx],
                "attention_mask": self.encodings["attention_mask"][idx],
                "labels": torch.tensor(self.labels[idx])}

    def __len__(self):
        return len(self.labels)

full_dataset = IMDBDataset(train_encodings, train_encodings["label"])

# Hold out 10% of the training data for validation
val_size = len(full_dataset) // 10
train_dataset, val_dataset = random_split(
    full_dataset, [len(full_dataset) - val_size, val_size])</code></pre>
<h2>6. Model Setup</h2>
<p>Now we need to set up the model. We can fine-tune a pretrained BERT model for sentiment analysis.</p>
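<p><code>BertForSequenceClassification</code> adds a small, randomly initialized linear classification head on top of the pretrained encoder; <code>num_labels=2</code> sets that head to produce one logit per sentiment class. The head's final step reduces to a linear map plus argmax, sketched here with invented toy numbers (real bert-base pooled outputs have 768 dimensions, not 3):</p>

```python
# Toy pooled encoder output (real bert-base outputs have 768 dims).
hidden = [0.2, -0.5, 0.7]

# Toy classification head: one weight row and bias per label (num_labels=2).
# All values below are invented for illustration.
weights = [[0.1, 0.3, -0.2],
           [-0.4, 0.2, 0.5]]
biases = [0.05, -0.05]

def classify(h, W, b):
    """Linear head followed by argmax: returns the predicted label index."""
    logits = [sum(wi * hi for wi, hi in zip(row, h)) + bias
              for row, bias in zip(W, b)]
    return max(range(len(logits)), key=logits.__getitem__)

print(classify(hidden, weights, biases))  # 1 (positive)
```

<p>Fine-tuning trains this head together with (by default) the whole encoder on the labeled reviews.</p>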
<p>The code below loads a pretrained BERT model with a fresh two-class classification head:</p>
<pre><code>from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)</code></pre>
<h2>7. Training</h2>
<p>The Hugging Face <code>Trainer</code> handles the training loop for us, including the optimizer (AdamW by default) and the loss function, once we supply a few hyperparameters:</p>
<pre><code>from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=8,
    logging_dir='./logs',
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    # eval_dataset=val_dataset,  # a held-out validation IMDBDataset;
    #                            # required for trainer.evaluate() below
)

trainer.train()</code></pre>
<h2>8. Evaluation</h2>
<p>To measure performance on held-out data, pass a validation dataset to the <code>Trainer</code> (the <code>eval_dataset</code> argument above) and call <code>evaluate()</code>. By default this reports the evaluation loss; to report accuracy as well, supply a <code>compute_metrics</code> function to the <code>Trainer</code>.</p>
<pre><code>eval_result = trainer.evaluate()
print(eval_result)</code></pre>
<h2>9. Prediction</h2>
<p>After training is completed, the model can classify new reviews. Note that we switch the model to evaluation mode and disable gradient tracking for inference:</p>
<pre><code>def predict_review(review):
    inputs = tokenizer(review, padding="max_length", truncation=True,
                       max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.argmax(logits, dim=-1).item()

sample_review = "This movie was fantastic! I loved it."
predicted_label = predict_review(sample_review)
print(f"Predicted label for the review: {predicted_label}")  # 1: positive, 0: negative</code></pre>
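<p>One practical addition to the evaluation step: <code>trainer.evaluate()</code> reports only the loss unless the <code>Trainer</code> is given a <code>compute_metrics</code> function. The real callback receives an <code>EvalPrediction</code> holding NumPy arrays of logits and labels, but its core logic is just argmax-and-compare, sketched here without any dependencies:</p>

```python
def accuracy_from_logits(logits, labels):
    """Argmax each row of logits and compare with the true labels."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Toy logits for four reviews, two classes (0 = negative, 1 = positive).
logits = [[2.0, -1.0],   # predicted 0, true 0 -> correct
          [0.1, 0.9],    # predicted 1, true 1 -> correct
          [-0.5, 1.5],   # predicted 1, true 0 -> wrong
          [1.2, 0.3]]    # predicted 0, true 1 -> wrong
labels = [0, 1, 0, 1]
print(accuracy_from_logits(logits, labels))  # 0.5
```

<p>Wrapped as a <code>compute_metrics</code> function returning a dict such as <code>{"accuracy": ...}</code>, this makes <code>evaluate()</code> include accuracy in its results.</p>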
<h2>10. Conclusion</h2>
<p>In this tutorial, we walked through the entire process of building a movie review sentiment analysis model on the IMDB dataset with Hugging Face Transformers: loading the data, preprocessing, model training, evaluation, and prediction. I hope this made the overall flow of deep-learning-based text classification clear. The Hugging Face library is a powerful toolkit, so do try it on other NLP tasks as well.</p>
<p>Thank you!</p>