{"id":36063,"date":"2024-11-01T09:45:25","date_gmt":"2024-11-01T09:45:25","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36063"},"modified":"2024-11-01T09:45:25","modified_gmt":"2024-11-01T09:45:25","slug":"using-hugging-face-transformers-course-bert-cls-tokens-document-vector-representation-function-and-bert-preprocessing","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36063\/","title":{"rendered":"Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing"},"content":{"rendered":"<p><body><\/p>\n<p>BERT (Bidirectional Encoder Representations from Transformers) is one of the most widely used models in deep learning and natural language processing. This course explains how the [CLS] token of a BERT model can serve as a document vector representation and how to preprocess text for BERT with the Hugging Face Transformers library.<\/p>\n<h2>1. What is BERT?<\/h2>\n<p>BERT is a natural language processing (NLP) model released by Google in 2018 and built on the encoder of the Transformer architecture. It learns the relationships between the words of an input sentence bidirectionally, so each word&#8217;s representation reflects both its left and its right context. As a result, BERT performs strongly on a wide range of natural language processing tasks.<\/p>\n<h2>2. 
Characteristics of BERT<\/h2>\n<ul>\n<li><strong>Bidirectionality:<\/strong> BERT does not read the sentence in two separate passes. Its self-attention layers let every token attend to the tokens on both its left and its right at once, so each word is encoded in its full context.<\/li>\n<li><strong>Large-scale Pre-training:<\/strong> BERT is pre-trained on a large unlabeled corpus (BooksCorpus and English Wikipedia) with the masked language modeling and next sentence prediction objectives, which lets it learn general linguistic patterns before fine-tuning.<\/li>\n<li><strong>[CLS] Token:<\/strong> Every BERT input sequence starts with a special token called [CLS], and the final hidden state of this token is commonly used as a summary representation of the whole input.<\/li>\n<\/ul>\n<h2>3. BERT Preprocessing Steps<\/h2>\n<p>Before text can be passed to BERT, it must be converted into the numeric format the model expects. This section explains the basic preprocessing steps.<\/p>\n<h3>3.1. Input Sequence Processing<\/h3>\n<p>The data to be input into the BERT model is preprocessed in the following steps:<\/p>\n<ol>\n<li>Text Tokenization: The BERT tokenizer splits the input text into WordPiece subword tokens and adds the special [CLS] and [SEP] tokens.<\/li>\n<li>Index Transformation: Each token is mapped to its integer ID in the model&#8217;s vocabulary.<\/li>\n<li>Attention Mask Generation: An attention mask is created that marks each position as an actual token (1) or padding (0).<\/li>\n<li>Segment ID Generation: If the input is a sentence pair, a segment ID (<code>token_type_ids<\/code>) marks whether each token belongs to the first or the second sentence.<\/li>\n<\/ol>\n<h3>3.2. 
BERT Tokenization Example<\/h3>\n<p>The following Python code shows how to preprocess an input sentence for BERT with Hugging Face&#8217;s Transformers library:<\/p>\n<pre><code>\nimport torch\nfrom transformers import BertTokenizer\n\n# Initialize BERT tokenizer\ntokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\n\n# Example sentence\ntext = \"Deep learning is a field of artificial intelligence.\"\n\n# Tokenize and return PyTorch tensors ([CLS] and [SEP] are added automatically)\ninputs = tokenizer(text, return_tensors=\"pt\")\n\n# Output results\nprint(\"Input IDs:\", inputs['input_ids'])\nprint(\"Attention Mask:\", inputs['attention_mask'])\n    <\/code><\/pre>\n<h2>4. Document Vector Representation Using the [CLS] Token<\/h2>\n<p>The final hidden state of the [CLS] token in BERT&#8217;s output is commonly used as a single vector summarizing the input document. For <code>bert-base-uncased<\/code>, this vector has 768 dimensions. It is typically passed to a classifier in tasks such as document classification and sentiment analysis, where the model is fine-tuned so that the [CLS] vector captures the information needed for the prediction.<\/p>\n<h3>4.1. Example Using BERT Model<\/h3>\n<p>The following example extracts the vector representation of the [CLS] token from the BERT model output, reusing <code>torch<\/code> and <code>inputs<\/code> from the previous example:<\/p>\n<pre><code>\nfrom transformers import BertModel\n\n# Initialize BERT model\nmodel = BertModel.from_pretrained('bert-base-uncased')\n\n# Pass input data to the model (no gradients needed for inference)\nwith torch.no_grad():\n    outputs = model(**inputs)\n\n# Extract vector of [CLS] token (first sequence in the batch, first token)\ncls_vector = outputs.last_hidden_state[0][0]\n\n# Output results\nprint(\"CLS Vector:\", cls_vector)\n    <\/code><\/pre>\n<h2>5. 
Complete Code Example<\/h2>\n<p>The following complete example combines the preprocessing and the extraction of the [CLS] token&#8217;s vector representation:<\/p>\n<pre><code>\nimport torch\nfrom transformers import BertTokenizer, BertModel\n\n# Initialize BERT tokenizer and model\ntokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\nmodel = BertModel.from_pretrained('bert-base-uncased')\n\n# Example sentence\ntext = \"Deep learning is a field of artificial intelligence.\"\n\n# Text tokenization\ninputs = tokenizer(text, return_tensors=\"pt\")\n\n# Run a forward pass without computing gradients\nwith torch.no_grad():\n    outputs = model(**inputs)\n\n# Extract vector of [CLS] token\ncls_vector = outputs.last_hidden_state[0][0]\n\nprint(\"Input IDs:\", inputs['input_ids'])\nprint(\"Attention Mask:\", inputs['attention_mask'])\nprint(\"CLS Vector:\", cls_vector)\n    <\/code><\/pre>\n<h2>6. Conclusion<\/h2>\n<p>In this course, we walked through the BERT preprocessing steps and extracted the vector representation of the [CLS] token using the Hugging Face Transformers library. The [CLS] vector provides a compact representation of a document that works well as input to a classifier, and fine-tuned BERT models achieve competitive performance on many natural language processing tasks. We hope you will build on this with further hands-on practice with BERT.<\/p>\n<footer>\n<p>If you found this article helpful, please share this blog!<\/p>\n<\/footer>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the field of deep learning and natural language processing, BERT (Bidirectional Encoder Representations from Transformers) has achieved innovative results and has become an essential tool among many researchers and developers. 
In this course, we will explain in detail the document vector representation function using the [CLS] token based on the BERT model and the &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36063\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36063","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Using Hugging Face Transformers Course, BERT [CLS] Token&#039;s Document Vector Representation Function and BERT Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36063\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Hugging Face Transformers Course, BERT [CLS] Token&#039;s Document Vector Representation Function and BERT Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"In the field of deep learning and natural language processing, BERT (Bidirectional Encoder Representations from Transformers) has achieved innovative results and has become an essential tool among many researchers and developers. 
In this course, we will explain in detail the document vector representation function using the [CLS] token based on the BERT model and the &hellip; \ub354 \ubcf4\uae30 &quot;Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36063\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:45:25+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36063\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36063\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing\",\"datePublished\":\"2024-11-01T09:45:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36063\/\"},\"wordCount\":536,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36063\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36063\/\",\"name\":\"Using Hugging Face 
Transformers Course, BERT [CLS] Token's Document Vector Representation Function and BERT Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:45:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36063\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36063\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36063\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/lo
go\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Using Hugging Face Transformers Course, BERT [CLS] Token's Document Vector Representation Function and BERT Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36063\/","og_locale":"ko_KR","og_type":"article","og_title":"Using Hugging Face Transformers Course, BERT [CLS] Token's Document Vector Representation Function and BERT Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"In the field of deep learning and natural language processing, BERT (Bidirectional Encoder Representations from Transformers) has achieved innovative results and has become an essential tool among many researchers and developers. 
In this course, we will explain in detail the document vector representation function using the [CLS] token based on the BERT model and the &hellip; \ub354 \ubcf4\uae30 \"Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing\"","og_url":"https:\/\/atmokpo.com\/w\/36063\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:45:25+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36063\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36063\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing","datePublished":"2024-11-01T09:45:25+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36063\/"},"wordCount":536,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36063\/","url":"https:\/\/atmokpo.com\/w\/36063\/","name":"Using Hugging Face Transformers Course, BERT [CLS] Token's Document Vector Representation Function and BERT Preprocessing - 
\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:45:25+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36063\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36063\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36063\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Using Hugging Face Transformers Course, BERT [CLS] Token&#8217;s Document Vector Representation Function and BERT Preprocessing"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/imag
e\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36063","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36063"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36063\/revisions"}],"predecessor-version":[{"id":36064,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36063\/revisions\/36064"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36063"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36063"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36063"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}