{"id":36119,"date":"2024-11-01T09:45:55","date_gmt":"2024-11-01T09:45:55","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36119"},"modified":"2024-11-01T09:45:55","modified_gmt":"2024-11-01T09:45:55","slug":"hands-on-course-on-hugging-face-transformers-loading-mlm-pipeline-with-distilbert","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36119\/","title":{"rendered":"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT"},"content":{"rendered":"<p><body><\/p>\n<article>\n<header>\n<p>October 1, 2023 | Deep Learning<\/p>\n<\/header>\n<section>\n<h2>1. Introduction<\/h2>\n<p>As text data has grown in importance in natural language processing, a wide range of deep learning models has been developed. The <strong>Hugging Face<\/strong> <strong>Transformers<\/strong> library is a widely used library for working with these models across many NLP tasks. In this course, we will discuss how to build a masked language modeling (MLM) pipeline using the <strong>DistilBERT<\/strong> model.<\/p>\n<\/section>\n<section>\n<h2>2. Introduction to Hugging Face Transformers<\/h2>\n<p>The Hugging Face Transformers library provides pretrained models such as BERT, GPT-2, and T5 that perform well on many NLP tasks. Its API makes it easy to load and use these models.<\/p>\n<ul>\n<li><strong>Easy API:<\/strong> You can load models and tokenizers with a few lines of code.<\/li>\n<li><strong>Diverse Models:<\/strong> You can use various state-of-the-art models such as BERT, GPT, and T5.<\/li>\n<li><strong>Community Support:<\/strong> The library has an active community and is updated continuously.<\/li>\n<\/ul>\n<\/section>\n<section>\n<h2>3. What is DistilBERT?<\/h2>\n<p><strong>DistilBERT<\/strong> is a lightweight version of the BERT model that is 60% faster and has 40% fewer parameters than the original BERT model. 
Despite its smaller size, it retains performance close to that of BERT, which makes it practical for real-world use.<\/p>\n<p>This model has been used successfully in several NLP tasks, particularly those that depend on contextual understanding.<\/p>\n<\/section>\n<section>\n<h2>4. Understanding the MLM (Masked Language Modeling) Pipeline<\/h2>\n<p>MLM is the task of predicting hidden (masked) words from their context, for example, predicting the word that fits the masked position in &#8220;I like [MASK].&#8221; This technique is one of the ways BERT and its derivative models are trained.<\/p>\n<p>The main advantage of MLM is that it helps the model learn varied patterns of language, which improves its natural language understanding performance.<\/p>\n<\/section>\n<section>\n<h2>5. Loading the DistilBERT Model<\/h2>\n<p>Now let&#8217;s load the DistilBERT model and build a simple MLM pipeline. First, we will install the required libraries.<\/p>\n<pre>\n                <code>pip install transformers torch<\/code>\n            <\/pre>\n<h3>5.1 Loading DistilBERT Model and Tokenizer<\/h3>\n<p>We will load the DistilBERT model and tokenizer using the Hugging Face Transformers library. You can use the following code for this purpose.<\/p>\n<pre>\n                <code>\nfrom transformers import DistilBertTokenizer, DistilBertForMaskedLM\nimport torch\n\n# Load DistilBERT tokenizer and model\ntokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')\nmodel = DistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')\n                <\/code>\n            <\/pre>\n<p>This code loads the DistilBERT model and its corresponding tokenizer. The tokenizer converts text into token IDs, the vocabulary indices that the model takes as input.<\/p>\n<\/section>\n<section>\n<h2>6. Implementing the MLM Pipeline<\/h2>\n<p>Now, let&#8217;s implement MLM with an example. 
First, we will prepare an input sentence containing the `[MASK]` token and then run a model prediction.<\/p>\n<pre>\n                <code>\n# Input sentence\ninput_text = \"I like [MASK].\"\n\n# Tokenization\ninput_ids = tokenizer.encode(input_text, return_tensors='pt')\n\n# Prediction (no gradients needed for inference)\nwith torch.no_grad():\n    outputs = model(input_ids)\n    predictions = outputs.logits\n\n# Position of the masked token in the sequence\nmasked_index = torch.where(input_ids == tokenizer.mask_token_id)[1]\n# Highest-scoring vocabulary ID at the masked position\npredicted_index = predictions[0, masked_index].argmax(dim=-1)\n\n# Predicted word\npredicted_token = tokenizer.decode(predicted_index)\nprint(f\"Predicted word: {predicted_token}\")\n                <\/code>\n            <\/pre>\n<p>The code above tokenizes the input sentence, runs it through the model, and prints the token with the highest score at the masked position.<\/p>\n<\/section>\n<section>\n<h2>7. Analyzing Results<\/h2>\n<p>In the example above, for the sentence &#8220;I like [MASK].&#8221;, the script prints the model&#8217;s top prediction in the form `Predicted word: {predicted_token}`. For example, if the predicted token were &#8220;apples&#8221;, the completed sentence would read &#8220;I like apples.&#8221; The actual prediction depends on the model.<\/p>\n<p>Based on these results, you can evaluate the model&#8217;s performance or think about how it could be applied to real data.<\/p>\n<\/section>\n<section>\n<h2>8. Conclusion<\/h2>\n<p>In this course, we explored how to implement an MLM pipeline using the DistilBERT model from the Hugging Face Transformers library. The same steps of tokenizing input, running the model, and decoding predictions apply to many other tasks in natural language processing.<\/p>\n<p>We hope you continue your learning on various models and tasks. Thank you!<\/p>\n<\/section>\n<footer>\n<p>\u00a9 2023 Your Blog Name. All rights reserved.<\/p>\n<\/footer>\n<\/article>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>October 1, 2023 | Deep Learning 1. 
Introduction Recently, with the increasing importance of text data in the field of natural language processing, various deep learning models are being developed. Among them, the Hugging Face Transformers library is a well-known library used for various NLP tasks. In this course, we will discuss how to build &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36119\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36119","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36119\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"October 1, 2023 | Deep Learning 1. Introduction Recently, with the increasing importance of text data in the field of natural language processing, various deep learning models are being developed. Among them, the Hugging Face Transformers library is a well-known library used for various NLP tasks. 
In this course, we will discuss how to build &hellip; \ub354 \ubcf4\uae30 &quot;Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36119\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:45:55+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36119\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36119\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT\",\"datePublished\":\"2024-11-01T09:45:55+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36119\/\"},\"wordCount\":524,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36119\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36119\/\",\"name\":\"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT - 
\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:45:55+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36119\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36119\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36119\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0
efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36119\/","og_locale":"ko_KR","og_type":"article","og_title":"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"October 1, 2023 | Deep Learning 1. Introduction Recently, with the increasing importance of text data in the field of natural language processing, various deep learning models are being developed. Among them, the Hugging Face Transformers library is a well-known library used for various NLP tasks. 
In this course, we will discuss how to build &hellip; \ub354 \ubcf4\uae30 \"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT\"","og_url":"https:\/\/atmokpo.com\/w\/36119\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:45:55+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36119\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36119\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT","datePublished":"2024-11-01T09:45:55+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36119\/"},"wordCount":524,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36119\/","url":"https:\/\/atmokpo.com\/w\/36119\/","name":"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with DistilBERT - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:45:55+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36119\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36119\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36119\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Hands-on Course on Hugging Face Transformers, Loading MLM Pipeline with 
DistilBERT"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36119","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[
{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36119"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36119\/revisions"}],"predecessor-version":[{"id":36120,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36119\/revisions\/36120"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}