{"id":36221,"date":"2024-11-01T09:46:45","date_gmt":"2024-11-01T09:46:45","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36221"},"modified":"2024-11-01T09:46:45","modified_gmt":"2024-11-01T09:46:45","slug":"hugging-face-transformers-course-preprocessing-with-regular-expressions","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36221\/","title":{"rendered":"Hugging Face Transformers Course, Preprocessing with Regular Expressions"},"content":{"rendered":"<p><body><\/p>\n<p>Deep learning is now used in many fields, and in Natural Language Processing (NLP) the Hugging Face Transformers library has made it easy to use a wide range of pretrained models. In this course, we explain how to preprocess text with regular expressions and then use the cleaned text in a document classification (sentiment analysis) example built with Hugging Face Transformers.<\/p>\n<h2>1. What is Hugging Face Transformers?<\/h2>\n<p>Hugging Face Transformers is a Python library that provides pretrained deep learning models commonly used in Natural Language Processing (NLP), including well-known architectures such as BERT, GPT-2, and T5. Its consistent API for loading, fine-tuning, and running these models makes them easy to access and use, which is why the library is widely used by data scientists and researchers.<\/p>\n<h2>2. The Importance of Regular Expressions and Preprocessing<\/h2>\n<p>Regular expressions are a concise tool for finding or transforming specific patterns in strings. Applying them before text reaches the model, for example to remove unwanted characters or normalize case, can make the data cleaner and more consistent. Because preprocessing directly affects model performance, it deserves careful attention: pretrained transformer tokenizers are designed to handle raw text, so overly aggressive cleaning can also discard useful signals such as punctuation.<\/p>\n<h2>3. Environment Setup<\/h2>\n<p>First, install Hugging Face Transformers and the other libraries used in this course. 
Run the command below to install them. The <code>re<\/code> module used for regular expressions is part of Python&#8217;s standard library, so it should not be listed in the pip command. The code in later sections also needs PyTorch, scikit-learn, matplotlib, and seaborn, and recent versions of Transformers require accelerate for the <code>Trainer<\/code>.<\/p>\n<pre><code>pip install transformers accelerate torch scikit-learn pandas matplotlib seaborn<\/code><\/pre>\n<h2>4. Preparing the Data<\/h2>\n<p>In this example, we use a tiny toy dataset for sentiment analysis: five sentences, each labeled as positive (1) or negative (0). It is large enough to demonstrate the pipeline but far too small to train a useful model.<\/p>\n<pre><code>import pandas as pd\n\ndata = {\n    \"text\": [\n        \"This product is really good!\",\n        \"Not great. I was very disappointed.\",\n        \"It's not a bad product.\",\n        \"I hope for a refund.\",\n        \"It really exceeded my expectations!\",\n    ],\n    \"label\": [1, 0, 1, 0, 1]  # 1: positive, 0: negative\n}\n\ndf = pd.DataFrame(data)\nprint(df)<\/code><\/pre>\n<h2>5. Data Preprocessing Using Regular Expressions<\/h2>\n<p>Next, we clean the text with regular expressions. The function below converts the text to lowercase, removes every character other than lowercase Latin letters, Hangul syllables, and whitespace (so punctuation and digits are dropped), and collapses any remaining runs of whitespace.<\/p>\n<pre><code>import re\n\ndef preprocess_text(text):\n    # Convert to lowercase\n    text = text.lower()\n    # Keep only a-z, Hangul syllables (\uac00-\ud7a3), and whitespace;\n    # punctuation, digits, and other symbols are removed\n    text = re.sub(r'[^a-z\uac00-\ud7a3\\s]', '', text)\n    # Collapse repeated whitespace and trim both ends\n    text = re.sub(r'\\s+', ' ', text).strip()\n    return text\n\ndf['cleaned_text'] = df['text'].apply(preprocess_text)\nprint(df[['text', 'cleaned_text']])<\/code><\/pre>\n<h2>6. Training the Model Using Hugging Face Transformers<\/h2>\n<p>After preprocessing, we fine-tune a transformer model for sentiment analysis. 
The example below fine-tunes <code>bert-base-uncased<\/code> with the <code>Trainer<\/code> API. It continues from the previous steps, so <code>df<\/code> and its <code>cleaned_text<\/code> column must already exist.<\/p>\n<pre><code>from transformers import BertTokenizer, BertForSequenceClassification\nfrom transformers import Trainer, TrainingArguments\nimport torch\nfrom sklearn.model_selection import train_test_split\n\n# Split the data (with 5 rows, test_size=0.2 leaves 4 rows for training and 1 for testing)\nX_train, X_test, y_train, y_test = train_test_split(df['cleaned_text'], df['label'], test_size=0.2, random_state=42)\n\n# Load the BERT tokenizer and a model with a 2-class classification head\ntokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\nmodel = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)\n\n# Tokenize the data; without return_tensors the encodings hold plain Python lists,\n# which __getitem__ below converts to tensors one example at a time\ntrain_encodings = tokenizer(X_train.tolist(), padding=True, truncation=True)\ntest_encodings = tokenizer(X_test.tolist(), padding=True, truncation=True)\n\n# Wrap the encodings and labels in a PyTorch dataset that Trainer can iterate over\nclass TextDataset(torch.utils.data.Dataset):\n    def __init__(self, encodings, labels):\n        self.encodings = encodings\n        self.labels = labels\n\n    def __getitem__(self, idx):\n        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}\n        item['labels'] = torch.tensor(self.labels[idx])\n        return item\n\n    def __len__(self):\n        return len(self.labels)\n\n# Prepare the dataset\ntrain_dataset = TextDataset(train_encodings, y_train.tolist())\ntest_dataset = TextDataset(test_encodings, y_test.tolist())\n\n# Set training arguments\ntraining_args = TrainingArguments(\n    output_dir='.\/results',\n    num_train_epochs=3,\n    per_device_train_batch_size=4,\n    per_device_eval_batch_size=4,\n    warmup_steps=0,  # no warmup: this toy run has only a few optimizer steps\n    weight_decay=0.01,\n    logging_dir='.\/logs',\n)\n\n# Define the trainer\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=train_dataset,\n    eval_dataset=test_dataset,\n)\n\n# Train the model\ntrainer.train()<\/code><\/pre>\n<h2>7. Model Evaluation<\/h2>\n<p>Once training is complete, you can evaluate the model&#8217;s performance. 
Compute the accuracy and plot a confusion matrix to see which classes the model confuses. With a single test example, these numbers only show that the pipeline runs; a meaningful evaluation requires a larger held-out set.<\/p>\n<pre><code>from sklearn.metrics import accuracy_score, confusion_matrix\nimport matplotlib.pyplot as plt\nimport seaborn as sns\n\n# Predict on the test set and take the highest-scoring class for each example\npredictions = trainer.predict(test_dataset)\npreds = predictions.predictions.argmax(-1)\n\n# Calculate accuracy\naccuracy = accuracy_score(y_test, preds)\nprint(f'Accuracy: {accuracy:.2f}')\n\n# Build the confusion matrix; labels=[0, 1] keeps it 2x2 even when a class is\n# missing from y_test or preds, so it always matches the two tick labels\ncm = confusion_matrix(y_test, preds, labels=[0, 1])\nplt.figure(figsize=(8, 6))\nsns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'])\nplt.ylabel('Actual')\nplt.xlabel('Predicted')\nplt.title('Confusion Matrix')\nplt.show()<\/code><\/pre>\n<h2>8. Conclusion<\/h2>\n<p>In this course, we built a basic sentiment analysis pipeline with the Hugging Face Transformers library: we cleaned the text with regular expressions and then fine-tuned BERT on the result. Because the dataset here is tiny, it cannot show how much the preprocessing helps. To measure that, compare models trained with and without a given cleaning step on a realistically sized dataset. A good next step is to apply the same pipeline to a larger labeled dataset or to other natural language processing tasks.<\/p>\n<p>Thank you!<\/p>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the recent advancements in artificial intelligence and machine learning, deep learning technologies are being utilized in many fields. In particular, in the field of Natural Language Processing (NLP), the Hugging Face Transformers library has made it easy to use various models. 
In this course, we will explain in detail the data preprocessing techniques using &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36221\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Hugging Face Transformers Course, Preprocessing with Regular Expressions&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36221","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hugging Face Transformers Course, Preprocessing with Regular Expressions - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36221\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hugging Face Transformers Course, Preprocessing with Regular Expressions - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"With the recent advancements in artificial intelligence and machine learning, deep learning technologies are being utilized in many fields. In particular, in the field of Natural Language Processing (NLP), the Hugging Face Transformers library has made it easy to use various models. 
In this course, we will explain in detail the data preprocessing techniques using &hellip; \ub354 \ubcf4\uae30 &quot;Hugging Face Transformers Course, Preprocessing with Regular Expressions&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36221\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:46:45+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36221\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36221\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Hugging Face Transformers Course, Preprocessing with Regular Expressions\",\"datePublished\":\"2024-11-01T09:46:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36221\/\"},\"wordCount\":390,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36221\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36221\/\",\"name\":\"Hugging Face Transformers Course, Preprocessing with Regular Expressions - 
\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:46:45+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36221\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36221\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36221\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hugging Face Transformers Course, Preprocessing with Regular Expressions\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1
abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hugging Face Transformers Course, Preprocessing with Regular Expressions - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36221\/","og_locale":"ko_KR","og_type":"article","og_title":"Hugging Face Transformers Course, Preprocessing with Regular Expressions - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"With the recent advancements in artificial intelligence and machine learning, deep learning technologies are being utilized in many fields. In particular, in the field of Natural Language Processing (NLP), the Hugging Face Transformers library has made it easy to use various models. 
In this course, we will explain in detail the data preprocessing techniques using &hellip; \ub354 \ubcf4\uae30 \"Hugging Face Transformers Course, Preprocessing with Regular Expressions\"","og_url":"https:\/\/atmokpo.com\/w\/36221\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:46:45+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"4\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36221\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36221\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Hugging Face Transformers Course, Preprocessing with Regular Expressions","datePublished":"2024-11-01T09:46:45+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36221\/"},"wordCount":390,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36221\/","url":"https:\/\/atmokpo.com\/w\/36221\/","name":"Hugging Face Transformers Course, Preprocessing with Regular Expressions - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:46:45+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36221\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36221\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36221\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Hugging Face Transformers Course, Preprocessing with Regular 
Expressions"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36221","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":
[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36221"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36221\/revisions"}],"predecessor-version":[{"id":36222,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36221\/revisions\/36222"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36221"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}