{"id":36197,"date":"2024-11-01T09:46:33","date_gmt":"2024-11-01T09:46:33","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36197"},"modified":"2024-11-01T09:46:33","modified_gmt":"2024-11-01T09:46:33","slug":"using-hugging-face-transformers-moderna-pfizer-covid-19-vaccine-bert-cls-vector-extraction","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36197\/","title":{"rendered":"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction"},"content":{"rendered":"<p><body><\/p>\n<p>With the advancement of deep learning and natural language processing (NLP), many companies are exploring various methods to analyze text data. Among these, BERT (Bidirectional Encoder Representations from Transformers) has established itself as an innovative model for deeply understanding the meaning of text data. In this course, we will cover how to extract the [CLS] vector from texts related to Moderna and Pfizer Covid-19 vaccines using Hugging Face&#8217;s <strong>Transformers<\/strong> library.<\/p>\n<h2>1. Introduction to the BERT Model<\/h2>\n<p>BERT is a pre-trained language model developed by Google that captures the context of a given sentence and can be used for a wide range of natural language processing tasks. Its key characteristics are:<\/p>\n<ul>\n<li><strong>Bidirectional:<\/strong> BERT attends to the words on both the left and the right of each position, so it can interpret a word in relation to all of its surrounding words.<\/li>\n<li><strong>Transformer:<\/strong> BERT is based on the Transformer architecture and learns the relationships between all words in a sentence through the self-attention mechanism.<\/li>\n<li><strong>[CLS] Token:<\/strong> A special [CLS] token is always prepended to the input sequence. The final hidden state of this token serves as an aggregate representation of the whole sequence and is the standard input to classification heads.<\/li>\n<\/ul>\n<h2>2. 
Installing the Hugging Face Transformers Library<\/h2>\n<p>The Hugging Face Transformers library provides pre-trained models and tokenizers for natural language processing tasks. Install it with pip, together with PyTorch and the scikit-learn and Matplotlib packages used for the visualization in Section 5.1:<\/p>\n<pre><code>pip install transformers torch scikit-learn matplotlib<\/code><\/pre>\n<h2>3. Data Preparation<\/h2>\n<p>Next, we prepare the documents related to Moderna and Pfizer. Simple example sentences are used here; a real application would collect far more data.<\/p>\n<pre><code>texts = [\n    \"The Moderna Covid-19 vaccine showed an efficacy of 94.1%.\",\n    \"The efficacy of the Pfizer vaccine was reported to be 95%.\",\n    \"Both the Moderna and Pfizer vaccines use mRNA technology.\"\n]<\/code><\/pre>\n<h2>4. Loading the BERT Model and Extracting Vectors<\/h2>\n<p>After loading the BERT model and tokenizer, we extract the [CLS] vector for each input sentence.<\/p>\n<pre><code>\nfrom transformers import BertTokenizer, BertModel\nimport torch\n\n# Load BERT model and tokenizer\ntokenizer = BertTokenizer.from_pretrained('bert-base-uncased')\nmodel = BertModel.from_pretrained('bert-base-uncased')\nmodel.eval()  # ensure inference mode (dropout disabled)\n\n# Input text\ntexts = [\n    \"The Moderna Covid-19 vaccine showed an efficacy of 94.1%.\",\n    \"The efficacy of the Pfizer vaccine was reported to be 95%.\",\n    \"Both the Moderna and Pfizer vaccines use mRNA technology.\"\n]\n\n# Extract [CLS] vectors\ncls_vectors = []\nfor text in texts:\n    # The tokenizer adds [CLS] at position 0 automatically\n    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True)\n    with torch.no_grad():  # gradients are not needed for feature extraction\n        outputs = model(**inputs)\n    cls_vector = outputs.last_hidden_state[0, 0]  # [CLS] vector, shape (768,)\n    cls_vectors.append(cls_vector.numpy())\n<\/code><\/pre>\n<h2>5. Result Analysis<\/h2>\n<p>Running the code above extracts one [CLS] vector per sentence. For bert-base-uncased, each vector has 768 dimensions; it encodes the sentence as a point in a high-dimensional space and can be used as input to subsequent NLP tasks.<\/p>\n<h3>5.1. 
Example of Vector Visualization<\/h3>\n<p>The extracted vectors can be visualized or clustered to analyze the similarity between sentences. The example below projects them onto two principal components with PCA and plots each sentence.<\/p>\n<pre><code>\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.decomposition import PCA\n\n# Reduce vectors to 2 dimensions\npca = PCA(n_components=2)\nreduced_vectors = pca.fit_transform(np.array(cls_vectors))\n\n# Visualization\nplt.figure(figsize=(10, 6))\nfor i, text in enumerate(texts):\n    plt.scatter(reduced_vectors[i, 0], reduced_vectors[i, 1])\n    plt.annotate(text, (reduced_vectors[i, 0], reduced_vectors[i, 1]))\nplt.title('BERT [CLS] Vectors Visualization')\nplt.xlabel('PC1')\nplt.ylabel('PC2')\nplt.grid()\nplt.show()\n<\/code><\/pre>\n<h2>6. Conclusion<\/h2>\n<p>In this course, we covered how to extract [CLS] vectors from texts related to the Moderna and Pfizer vaccines using the BERT model from the Hugging Face Transformers library. These vectors provide a foundation for representing the meaning of text data and for using it in downstream NLP applications.<\/p>\n<p>The same technique can be applied in many fields, such as the analysis of research papers and public opinion. Later, we will address further applications that build on these vectors, such as classification and sentiment analysis.<\/p>\n<h2>7. 
References<\/h2>\n<ul>\n<li><a href=\"https:\/\/huggingface.co\/transformers\/\" target=\"_blank\" rel=\"noopener\">Hugging Face Transformers Documentation<\/a><\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1810.04805\" target=\"_blank\" rel=\"noopener\">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding<\/a><\/li>\n<li><a href=\"https:\/\/pytorch.org\/\" target=\"_blank\" rel=\"noopener\">PyTorch Documentation<\/a><\/li>\n<\/ul>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the advancement of deep learning and natural language processing (NLP), many companies are exploring various methods to analyze text data. Among these, BERT (Bidirectional Encoder Representations from Transformers) has established itself as an innovative model for deeply understanding the meaning of text data. In this course, we will cover how to extract the [CLS] &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36197\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36197","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36197\/\" \/>\n<meta 
property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"With the advancement of deep learning and natural language processing (NLP), many companies are exploring various methods to analyze text data. Among these, BERT (Bidirectional Encoder Representations from Transformers) has established itself as an innovative model for deeply understanding the meaning of text data. In this course, we will cover how to extract the [CLS] &hellip; \ub354 \ubcf4\uae30 &quot;Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36197\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:46:33+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36197\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36197\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Using Hugging Face 
Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction\",\"datePublished\":\"2024-11-01T09:46:33+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36197\/\"},\"wordCount\":456,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36197\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36197\/\",\"name\":\"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:46:33+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36197\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36197\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36197\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector 
Extraction\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36197\/","og_locale":"ko_KR","og_type":"article","og_title":"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"With the advancement of deep learning and natural language processing (NLP), many companies are exploring various methods to analyze text data. Among these, BERT (Bidirectional Encoder Representations from Transformers) has established itself as an innovative model for deeply understanding the meaning of text data. In this course, we will cover how to extract the [CLS] &hellip; \ub354 \ubcf4\uae30 \"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction\"","og_url":"https:\/\/atmokpo.com\/w\/36197\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:46:33+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36197\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36197\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector 
Extraction","datePublished":"2024-11-01T09:46:33+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36197\/"},"wordCount":456,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36197\/","url":"https:\/\/atmokpo.com\/w\/36197\/","name":"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector Extraction - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:46:33+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36197\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36197\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36197\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Using Hugging Face Transformers, Moderna Pfizer Covid-19 Vaccine BERT [CLS] Vector 
Extraction"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36197","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[
{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36197"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36197\/revisions"}],"predecessor-version":[{"id":36198,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36197\/revisions\/36198"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36197"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36197"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36197"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}