{"id":36491,"date":"2024-11-01T09:48:54","date_gmt":"2024-11-01T09:48:54","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36491"},"modified":"2024-11-01T11:52:58","modified_gmt":"2024-11-01T11:52:58","slug":"deep-learning-pytorch-course-deep-q-learning","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36491\/","title":{"rendered":"Deep Learning PyTorch Course, Deep Q-Learning"},"content":{"rendered":"<p><body><\/p>\n<h2>1. Introduction<\/h2>\n<p>\n        Deep Q-Learning is one of the most important algorithms in the field of Reinforcement Learning.<br \/>\n        It uses deep neural networks to teach agents to select optimal actions. In this tutorial, we will explore the fundamental concepts necessary to implement and understand the deep Q-learning algorithm using the PyTorch library.\n    <\/p>\n<h2>2. Basics of Reinforcement Learning<\/h2>\n<p>\n        Reinforcement Learning is a method by which an agent learns to maximize its cumulative reward by interacting with an environment.<br \/>\n        The agent observes the current state, selects one of the available actions, and receives a reward while the environment moves to a new state.<br \/>\n        This process consists of the following components.\n    <\/p>\n<ul>\n<li><strong>State (s)<\/strong>: The current situation of the environment as observed by the agent.<\/li>\n<li><strong>Action (a)<\/strong>: A choice the agent can make in a given state.<\/li>\n<li><strong>Reward (r)<\/strong>: The scalar feedback the agent receives after taking an action.<\/li>\n<li><strong>Policy (\u03c0)<\/strong>: The strategy for selecting actions in a given state.<\/li>\n<\/ul>\n<h2>3. Q-Learning Algorithm<\/h2>\n<p>\n        Q-Learning is a reinforcement learning algorithm in which the agent learns the expected cumulative (discounted) reward of taking a specific action in a specific state.<br \/>\n        The key to Q-Learning is updating the Q-value. 
The Q-value represents the expected long-term reward for a state-action pair and is updated using the following rule, which is derived from the Bellman equation.\n    <\/p>\n<p>\n<strong>Q(s, a) \u2190 Q(s, a) + \u03b1[r + \u03b3 max<sub>a&#8217;<\/sub> Q(s&#8217;, a&#8217;) &#8722; Q(s, a)]<\/strong>\n<\/p>\n<p>\n        Here, \u03b1 is the learning rate, \u03b3 is the discount factor, r is the reward, s is the current state, s&#8217; is the next state, and a&#8217; ranges over the actions available in s&#8217;.<br \/>\n        Q-Learning typically stores Q-values in a table; however, when the state space is large or continuous, a table becomes impractical,<br \/>\n        so we approximate the Q-values with a neural network instead.\n    <\/p>\n<h2>4. Deep Q-Learning (DQN)<\/h2>\n<p>\n        Deep Q-Learning is a method that uses deep neural networks to approximate Q-values.<br \/>\n        DQN has the following key components.\n    <\/p>\n<ul>\n<li><strong>Experience Replay<\/strong>: Stores the agent&#8217;s past transitions (state, action, reward, next state) in a buffer and trains on random samples from it.<\/li>\n<li><strong>Target Network<\/strong>: A separate copy of the Q-network that is used to compute the learning targets and whose weights are synchronized with the main network only periodically.<\/li>\n<\/ul>\n<p>\n        Together, these two techniques reduce the correlation between training samples and keep the learning targets from shifting at every step, which makes training more stable.\n    <\/p>\n<h2>5. Setting Up the Environment<\/h2>\n<p>\n        Now, let&#8217;s install the necessary packages to implement DQN using Python and PyTorch.<br \/>\n        We will install the required libraries using pip as shown below.\n    <\/p>\n<pre>\n        <code>\n            pip install torch torchvision numpy matplotlib gym\n        <\/code>\n    <\/pre>\n<h2>6. Implementing DQN<\/h2>\n<p>\n        Below is the basic skeleton of the DQN class and the environment setup code. 
We will use the CartPole environment provided by OpenAI&#8217;s Gym as a simple example.\n    <\/p>\n<h3>6.1 Defining the DQN Class<\/h3>\n<pre>\n        <code>\n            import torch\n            import torch.nn as nn\n            import torch.optim as optim\n            import numpy as np\n            import random\n            \n            class DQN(nn.Module):\n                # Fully connected network that maps a state to one Q-value per action\n                def __init__(self, state_size, action_size):\n                    super().__init__()\n                    self.fc1 = nn.Linear(state_size, 128)\n                    self.fc2 = nn.Linear(128, 128)\n                    self.fc3 = nn.Linear(128, action_size)\n\n                def forward(self, x):\n                    x = torch.relu(self.fc1(x))\n                    x = torch.relu(self.fc2(x))\n                    return self.fc3(x)\n        <\/code>\n    <\/pre>\n<h3>6.2 Setting Up the Environment and Hyperparameters<\/h3>\n<pre>\n        <code>\n            import gym\n            \n            # Setting up the environment and hyperparameters\n            # The code below assumes the API of gym 0.26 or later\n            env = gym.make('CartPole-v1')\n            state_size = env.observation_space.shape[0]\n            action_size = env.action_space.n\n            learning_rate = 0.001\n            gamma = 0.99            # discount factor\n            epsilon = 1.0           # initial exploration rate\n            epsilon_decay = 0.995\n            epsilon_min = 0.01\n            num_episodes = 1000\n            replay_memory = []      # experience replay buffer\n            replay_memory_size = 2000\n        <\/code>\n    <\/pre>\n<h3>6.3 Training Loop<\/h3>\n<pre>\n        <code>\n            def train_dqn():\n                # epsilon is reassigned below, so declare it global before its first use\n                global epsilon\n\n                model = DQN(state_size, action_size)\n                optimizer = optim.Adam(model.parameters(), lr=learning_rate)\n                criterion = nn.MSELoss()\n\n                for episode in range(num_episodes):\n                    # gym 0.26 and later: reset() returns (observation, info)\n                    state, _ = env.reset()\n                    state = np.reshape(state, [1, state_size])\n                    done = False\n                    total_reward = 0\n\n                    while not done:\n                        # Epsilon-greedy action selection\n                        if np.random.rand() <= epsilon:\n                            action = np.random.randint(action_size)\n                        else:\n                            q_values = model(torch.FloatTensor(state))\n                            action = torch.argmax(q_values).item()\n\n                        # gym 0.26 and later: step() returns five values\n                        next_state, reward, terminated, truncated, _ = env.step(action)\n                        done = terminated or truncated\n                        total_reward += reward\n                        next_state = np.reshape(next_state, [1, state_size])\n\n                        # Penalize only a real failure (terminated), not the time-limit cutoff\n                        if terminated:\n                            reward = -1\n\n                        replay_memory.append((state, action, reward, next_state, terminated))\n                        if len(replay_memory) > replay_memory_size:\n                            replay_memory.pop(0)\n\n                        # Once enough transitions are stored, train on a random minibatch\n                        if len(replay_memory) > 32:\n                            minibatch = random.sample(replay_memory, 32)\n                            for m_state, m_action, m_reward, m_next_state, m_terminated in minibatch:\n                                target = m_reward\n                                if not m_terminated:\n                                    target += gamma * torch.max(model(torch.FloatTensor(m_next_state))).item()\n                                # Copy the current predictions (no gradient) and overwrite only the taken action\n                                target_f = model(torch.FloatTensor(m_state)).detach().clone()\n                                target_f[0, m_action] = target\n                                optimizer.zero_grad()\n                                loss = criterion(model(torch.FloatTensor(m_state)), target_f)\n                                loss.backward()\n                                optimizer.step()\n\n                        state = next_state\n\n                    # Decay the exploration rate once per episode\n                    if epsilon > epsilon_min:\n                        epsilon *= epsilon_decay\n\n                    print(f\"Episode: {episode}\/{num_episodes}, Total Reward: {total_reward}\")\n\n            train_dqn()\n        <\/code>\n    <\/pre>\n<h2>7. Results and Conclusion<\/h2>\n<p>\n        Because DQN replaces the Q-table with a neural network, it can be applied to problems whose state spaces are too large or continuous for tabular Q-Learning.<br \/>\n        In this code example, we trained a simplified DQN agent on the CartPole environment: it uses experience replay but, to keep the code short, no separate target network.<br \/>\n        As training progresses and epsilon decays, the total reward per episode should generally increase, although individual runs can fluctuate.\n    <\/p>\n<p>\n        Possible next steps include adding a target network, processing each minibatch as a single batched tensor, tuning the hyperparameters,<br \/>\n        and experimenting with more complex environments.<br \/>\n        We hope that the content covered in this tutorial helps enhance your understanding of deep learning and reinforcement learning!\n    <\/p>\n<h2>8. References<\/h2>\n<ul>\n<li>Mnih, V. et al. (2013). <em>Playing Atari with Deep Reinforcement Learning.<\/em><\/li>\n<li>Lillicrap, T. P., Hunt, J. J., Pritzel, A., et al. (2015). <em>Continuous Control with Deep Reinforcement Learning.<\/em><\/li>\n<\/ul>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1. Introduction Deep Q-Learning is one of the most important algorithms in the field of Reinforcement Learning. It uses deep neural networks to teach agents to select optimal actions. In this tutorial, we will explore the fundamental concepts necessary to implement and understand the deep Q-learning algorithm using the PyTorch library. 2. 
Basics of Reinforcement &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36491\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Deep Learning PyTorch Course, Deep Q-Learning&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-36491","post","type-post","status-publish","format-standard","hentry","category-pytorch-study"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Deep Learning PyTorch Course, Deep Q-Learning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36491\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Learning PyTorch Course, Deep Q-Learning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"1. Introduction Deep Q-Learning is one of the most important algorithms in the field of Reinforcement Learning. It uses deep neural networks to teach agents to select optimal actions. In this tutorial, we will explore the fundamental concepts necessary to implement and understand the deep Q-learning algorithm using the PyTorch library. 2. 
Basics of Reinforcement &hellip; \ub354 \ubcf4\uae30 &quot;Deep Learning PyTorch Course, Deep Q-Learning&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36491\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:48:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-01T11:52:58+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36491\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36491\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Deep Learning PyTorch Course, Deep Q-Learning\",\"datePublished\":\"2024-11-01T09:48:54+00:00\",\"dateModified\":\"2024-11-01T11:52:58+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36491\/\"},\"wordCount\":484,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"PyTorch Study\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36491\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36491\/\",\"name\":\"Deep Learning PyTorch Course, Deep Q-Learning - 
\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:48:54+00:00\",\"dateModified\":\"2024-11-01T11:52:58+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36491\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36491\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36491\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Deep Learning PyTorch Course, Deep Q-Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b
3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Learning PyTorch Course, Deep Q-Learning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36491\/","og_locale":"ko_KR","og_type":"article","og_title":"Deep Learning PyTorch Course, Deep Q-Learning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"1. Introduction Deep Q-Learning is one of the most important algorithms in the field of Reinforcement Learning. It uses deep neural networks to teach agents to select optimal actions. In this tutorial, we will explore the fundamental concepts necessary to implement and understand the deep Q-learning algorithm using the PyTorch library. 2. 
Basics of Reinforcement &hellip; \ub354 \ubcf4\uae30 \"Deep Learning PyTorch Course, Deep Q-Learning\"","og_url":"https:\/\/atmokpo.com\/w\/36491\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:48:54+00:00","article_modified_time":"2024-11-01T11:52:58+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"4\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36491\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36491\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Deep Learning PyTorch Course, Deep Q-Learning","datePublished":"2024-11-01T09:48:54+00:00","dateModified":"2024-11-01T11:52:58+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36491\/"},"wordCount":484,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["PyTorch Study"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36491\/","url":"https:\/\/atmokpo.com\/w\/36491\/","name":"Deep Learning PyTorch Course, Deep Q-Learning - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:48:54+00:00","dateModified":"2024-11-01T11:52:58+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36491\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36491\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36491\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Deep Learning PyTorch Course, Deep 
Q-Learning"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[
{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36491"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36491\/revisions"}],"predecessor-version":[{"id":36492,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36491\/revisions\/36492"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36491"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36491"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}