{"id":36653,"date":"2024-11-01T09:50:20","date_gmt":"2024-11-01T09:50:20","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36653"},"modified":"2024-11-01T11:52:21","modified_gmt":"2024-11-01T11:52:21","slug":"deep-learning-pytorch-course-transformer-attention","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36653\/","title":{"rendered":"Deep Learning PyTorch Course, Transformer Attention"},"content":{"rendered":"<p><body><\/p>\n<p>Deep learning has become a key technology that has brought innovations to the field of artificial intelligence (AI) in recent years. Among various deep learning models, the Transformer has shown outstanding performance in the field of Natural Language Processing (NLP) and has attracted the attention of many researchers. In this article, we will provide an in-depth explanation of the Transformer architecture and attention mechanism using the PyTorch framework, along with practical code examples.<\/p>\n<h2>1. What is a Transformer?<\/h2>\n<p>The Transformer is a model proposed by researchers including Vaswani from Google in 2017, designed to overcome the limitations of traditional Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs). The Transformer can process the entire input sequence at once, making parallelization easier and allowing it to learn longer dependencies.<\/p>\n<h3>1.1 Structure of the Transformer<\/h3>\n<p>The Transformer consists of two main components: the encoder and the decoder. The encoder takes in the input sequence, and the decoder generates the output sequence based on the encoder&#8217;s output. The key part here is the attention mechanism.<\/p>\n<h2>2. Attention Mechanism<\/h2>\n<p>Attention is a mechanism that allows focusing on specific parts of the input sequence. In other words, each word (or input vector) computes weights based on its relationships with other words to extract information. 
Attention fundamentally involves three elements: the Query, the Key, and the Value.</p>
<h3>2.1 Attention Score</h3>
<p>The attention score is the dot product between a query and a key, scaled by the square root of the key dimension. It indicates how strongly each word in the input sequence influences the current word.</p>
<h3>2.2 Softmax Function</h3>
<p>The attention scores are normalized with the softmax function to obtain the weights. This guarantees that every weight lies between 0 and 1 and that the weights sum to 1.</p>
<h3>2.3 Attention Operation</h3>
<p>Once the weights are computed, they are multiplied with the Values, and the weighted Values are summed to produce the final attention output.</p>
<h2>3. Implementing the Transformer with PyTorch</h2>
<p>Now, let&#8217;s implement the Transformer and the attention mechanism using PyTorch. The code below is an example of a basic attention module.</p>
<h3>3.1 Installing Required Libraries</h3>
<pre><code>!pip install torch torchvision</code></pre>
<h3>3.2 Implementing the Attention Class</h3>
<pre><code>
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledDotProductAttention(nn.Module):
    def forward(self, query, key, value, mask=None):
        # Scaled dot product between the query and the key
        scores = torch.matmul(query, key.transpose(-2, -1)) / (key.size(-1) ** 0.5)

        # If a mask is provided, blank out masked positions before softmax
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)

        # Normalize the scores into weights with softmax
        attn_weights = F.softmax(scores, dim=-1)

        # The attention output is the weighted sum of the values
        output = torch.matmul(attn_weights, value)
        return output, attn_weights
</code></pre>
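<p>As a quick sanity check of the module above, the following self-contained example (the tensor shapes are illustrative, not from the original) runs self-attention on random tensors and confirms that each row of attention weights sums to 1:</p>

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same module as in section 3.2, repeated here so the example is self-contained.
class ScaledDotProductAttention(nn.Module):
    def forward(self, query, key, value, mask=None):
        scores = torch.matmul(query, key.transpose(-2, -1)) / (key.size(-1) ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attn_weights = F.softmax(scores, dim=-1)
        return torch.matmul(attn_weights, value), attn_weights

# Illustrative shapes (an assumption): batch of 2, 5 tokens, 16-dim embeddings.
attn = ScaledDotProductAttention()
x = torch.randn(2, 5, 16)
output, weights = attn(x, x, x)  # self-attention: query = key = value

print(output.shape)        # torch.Size([2, 5, 16]) -- same shape as the input
print(weights.shape)       # torch.Size([2, 5, 5])  -- one weight per token pair
print(weights.sum(dim=-1)) # every row sums to 1 thanks to softmax
```

Note that the output keeps the input shape, while the weight matrix is square in the sequence length: each token attends to every token, including itself.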
<h3>3.3 Implementing the Transformer Encoder</h3>
<pre><code>
class TransformerEncoder(nn.Module):
    def __init__(self, embed_size, heads, num_layers, dropout):
        super(TransformerEncoder, self).__init__()
        self.embed_size = embed_size
        self.heads = heads  # kept for the interface; this simplified encoder uses single-head attention
        self.num_layers = num_layers

        self.attention = ScaledDotProductAttention()
        self.linear = nn.Linear(embed_size, embed_size)  # simplified position-wise feed-forward layer
        self.dropout = nn.Dropout(dropout)
        self.norm1 = nn.LayerNorm(embed_size)
        self.norm2 = nn.LayerNorm(embed_size)

    def forward(self, x, mask=None):
        # Note: for simplicity, the same sublayer weights are reused across all layers
        for _ in range(self.num_layers):
            # Self-attention sublayer with residual connection and layer normalization
            attention_output, _ = self.attention(x, x, x, mask)
            x = self.norm1(x + self.dropout(attention_output))
            # Feed-forward sublayer with residual connection and layer normalization
            x = self.norm2(x + self.dropout(self.linear(x)))
        return x
</code></pre>
<h2>4. Model Training and Evaluation</h2>
<p>With the Transformer encoder implemented, we now explain how to train and evaluate the model on real data.</p>
<h3>4.1 Data Preparation</h3>
<p>To train the model, we first need to prepare training data. Typically, this is sequence data such as text.</p>
<h3>4.2 Model Initialization</h3>
<pre><code>
embed_size = 256  # Embedding dimension
heads = 8         # Number of attention heads
num_layers = 6    # Number of encoder layers
dropout = 0.1     # Dropout rate

model = TransformerEncoder(embed_size, heads, num_layers, dropout)
</code></pre>
<h3>4.3 Setting the Loss Function and Optimizer</h3>
<pre><code>
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
loss_fn = nn.CrossEntropyLoss()
</code></pre>
<h3>4.4 Training Loop</h3>
<pre><code>
num_epochs = 10  # Number of training epochs

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for batch in train_loader:  # train_loader yields dicts with 'input', 'mask', and 'target'
        optimizer.zero_grad()
        output = model(batch['input'], batch['mask'])
        # Flatten batch and sequence dimensions for the cross-entropy loss
        # (here the embedding dimension doubles as the class dimension, a simplification)
        loss = loss_fn(output.view(-1, output.size(-1)), batch['target'].view(-1))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch: {epoch+1}, Loss: {total_loss/len(train_loader):.4f}")
</code></pre>
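<p>The training loop above assumes each batch carries a <code>mask</code> entry. How that mask is built depends on the data pipeline; one common approach, sketched below under the assumption that token id 0 denotes padding (the helper name and pad id are illustrative, not from the original), marks real tokens with 1 and padding with 0:</p>

```python
import torch

# Hypothetical helper (an assumption, not from the original): build a padding
# mask from token ids, assuming id 0 is the padding token.
def make_padding_mask(token_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    # (batch, seq_len) -> (batch, 1, seq_len): 1 for real tokens, 0 for padding.
    # The middle dimension broadcasts over the query positions of the
    # (batch, seq_len, seq_len) attention scores, so every query ignores padding keys.
    return (token_ids != pad_id).long().unsqueeze(1)

token_ids = torch.tensor([[5, 7, 9, 0, 0],
                          [3, 2, 0, 0, 0]])
mask = make_padding_mask(token_ids)
print(mask.shape)  # torch.Size([2, 1, 5])
print(mask[0])     # tensor([[1, 1, 1, 0, 0]])
```

Positions where the mask is 0 are filled with -1e9 before the softmax in <code>ScaledDotProductAttention</code>, so they receive effectively zero attention weight.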
<h3>4.5 Evaluation and Testing</h3>
<p>After training completes, we evaluate the model to measure its performance. Typically, metrics such as accuracy, precision, and recall are computed on held-out test data.</p>
<h2>5. Conclusion</h2>
<p>In this article, we explained the Transformer architecture and the attention mechanism and demonstrated how to implement them using PyTorch. The Transformer is a useful building block for high-performance natural language processing models and is applied in many fields. Because performance varies considerably with the training data and hyperparameters, it is important to find a good combination through experimentation.</p>
<p>The Transformer continues to drive innovation in NLP and is expected to keep evolving through ongoing research. In the next article, we will cover use cases of Transformer models in natural language processing. We appreciate your interest.</p>
<footer>
<p>&#169; 2023 Deep Learning Research Institute. All Rights Reserved.</p>
</footer>