<h1>Deep Learning PyTorch Course, Markov Decision Process</h1>
<p>The Markov Decision Process (MDP) is the mathematical framework underlying reinforcement learning. An MDP models how an agent chooses optimal actions in a given environment. In this post, we will examine the concept of an MDP and how to implement one using PyTorch.</p>
<h2>1. Overview of the Markov Decision Process (MDP)</h2>
<p>An MDP consists of the following components:</p>
<ul>
<li><strong>State space (S)</strong>: the set of all states the agent can be in.</li>
<li><strong>Action space (A)</strong>: the set of all actions the agent can take in a given state.</li>
<li><strong>Transition probabilities (P)</strong>: the probability of moving to each next state, given the current state and action.</li>
<li><strong>Reward function (R)</strong>: the reward the agent receives for taking a given action in a given state.</li>
<li><strong>Discount factor (γ)</strong>: a value that weighs future rewards against present ones, so that future rewards count for less than immediate rewards.</li>
</ul>
<h2>2. Mathematical Modeling of an MDP</h2>
<p>Formally, an MDP is defined by its state space, action space, transition probabilities, reward function, and discount factor:</p>
<ul>
<li>MDP = (S, A, P, R, γ)</li>
</ul>
<p>Let's look at each component in more detail.</p>
<h3>State Space (S)</h3>
<p>The state space is the set of all states the agent can be in.
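As a quick illustration of the tuple (S, A, P, R, γ), a toy two-state MDP can be written down as plain Python data. Every state, probability, and reward below is invented purely for illustration:

```python
# A toy 2-state MDP written out as the tuple (S, A, P, R, gamma).
S = [0, 1]
A = [0, 1]                      # e.g., action 0 = "stay", action 1 = "move"
# P[(s, a)] = list of (next_state, probability) pairs
P = {(0, 0): [(0, 1.0)], (0, 1): [(1, 0.9), (0, 0.1)],
     (1, 0): [(1, 1.0)], (1, 1): [(0, 0.9), (1, 0.1)]}
R = {(0, 1): 1.0}               # reward 1 for attempting to move out of state 0
gamma = 0.9

def expected_next_value(s, a, V):
    """One Bellman backup term: R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')."""
    return R.get((s, a), 0.0) + gamma * sum(p * V[s2] for s2, p in P[(s, a)])

V = {0: 0.0, 1: 0.0}            # a value estimate per state, initialized to zero
print(expected_next_value(0, 1, V))  # prints 1.0
```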
For example, in the game of Go, the state space consists of all possible board configurations.</p>
<h3>Action Space (A)</h3>
<p>The action space contains all actions available to the agent in its current state. In Go, for instance, an action is placing a stone at a particular position.</p>
<h3>Transition Probabilities (P)</h3>
<p>The transition probabilities describe how likely each next state is, given the current state and the chosen action. Mathematically:</p>
<pre><code>P(s', r | s, a)</code></pre>
<p>Here, <code>s'</code> is the next state, <code>r</code> is the reward, <code>s</code> is the current state, and <code>a</code> is the chosen action.</p>
<h3>Reward Function (R)</h3>
<p>The reward function specifies the reward the agent receives for taking a particular action in a particular state. Rewards are what define the agent's goal.</p>
<h3>Discount Factor (γ)</h3>
<p>The discount factor <code>γ (0 ≤ γ &lt; 1)</code> determines how strongly future rewards count toward present value. The closer <code>γ</code> is to 0, the more the agent focuses on immediate rewards; the closer it is to 1, the more it weighs long-term rewards.</p>
<h2>3. An MDP Example</h2>
<p>Now that we understand the concept, let's apply it to a reinforcement learning problem. We will build and train a reinforcement learning agent on a simple MDP.</p>
<h3>3.1 A Simple Grid World</h3>
<p>The grid world is a 4×4 grid of cells. The agent occupies one cell at a time and can move with four actions (up, down, left, right).
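To make the discount factor concrete, here is a minimal sketch computing the discounted return G = r₀ + γ·r₁ + γ²·r₂ + … for a reward sequence (the rewards are invented for illustration):

```python
def discounted_return(rewards, gamma):
    """Fold the reward sequence from the back: G_t = r_t + gamma * G_{t+1}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A reward of 1 arriving two steps in the future is worth gamma**2 = 0.81 today.
print(discounted_return([0, 0, 1], 0.9))
```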
The agent's goal is to reach the bottom-right cell (the goal state).</p>
<h4>States and Actions</h4>
<p>In this grid world:</p>
<ul>
<li>States: the grid cells, numbered 0 to 15 (4×4 grid)</li>
<li>Actions: up (0), down (1), left (2), right (3)</li>
</ul>
<h4>Rewards</h4>
<p>The agent receives a reward of +1 for reaching the goal state and 0 everywhere else.</p>
<h2>4. Implementing the MDP with PyTorch</h2>
<p>Now let's implement the reinforcement learning agent using PyTorch, based on the Q-learning algorithm.</p>
<h3>4.1 Environment Initialization</h3>
<p>First, a class for the grid world:</p>
<pre><code>import numpy as np

class GridWorld:
    def __init__(self, grid_size=4):
        self.grid_size = grid_size
        self.state = 0                                # current cell index
        self.goal_state = grid_size * grid_size - 1   # bottom-right cell
        self.actions = [0, 1, 2, 3]                   # up, down, left, right
        self.rewards = np.zeros(grid_size * grid_size)
        self.rewards[self.goal_state] = 1             # reward for reaching the goal

    def reset(self):
        self.state = 0  # starting cell
        return self.state

    def step(self, action):
        x, y = divmod(self.state, self.grid_size)
        if action == 0 and x &gt; 0:                     # up
            x -= 1
        elif action == 1 and x &lt; self.grid_size - 1:  # down
            x += 1
        elif action == 2 and y &gt; 0:                   # left
            y -= 1
        elif action == 3 and y &lt; self.grid_size - 1:  # right
            y += 1
        self.state = x * self.grid_size + y
        return self.state, self.rewards[self.state]
</code></pre>
<h3>4.2 Implementing the Q-learning Algorithm</h3>
<p>We will train the agent with Q-learning.
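Before the neural-network version, it helps to see the core update rule, Q(s,a) ← Q(s,a) + α·[r + γ·maxₐ′ Q(s′,a′) − Q(s,a)], in a self-contained tabular sketch of the same grid world. The hyperparameters (α, ε, episode count) are illustrative assumptions, not tuned values:

```python
import numpy as np

# Tabular Q-learning on the 4x4 grid world described above.
GRID = 4
N_STATES, N_ACTIONS = GRID * GRID, 4
GOAL = N_STATES - 1

def step(state, action):
    """Same dynamics as the grid world: 0=up, 1=down, 2=left, 3=right."""
    x, y = divmod(state, GRID)
    if action == 0 and x > 0:
        x -= 1
    elif action == 1 and x < GRID - 1:
        x += 1
    elif action == 2 and y > 0:
        y -= 1
    elif action == 3 and y < GRID - 1:
        y += 1
    next_state = x * GRID + y
    return next_state, (1.0 if next_state == GOAL else 0.0)

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.5, 0.99, 0.1   # illustrative hyperparameters

for _ in range(500):
    s = 0
    for _ in range(100):
        if rng.random() < eps:       # epsilon-greedy exploration...
            a = int(rng.integers(N_ACTIONS))
        else:                        # ...with random tie-breaking when greedy
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s2, r = step(s, a)
        # Core update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
        # with no bootstrap term at the terminal (goal) state.
        bootstrap = 0.0 if s2 == GOAL else gamma * Q[s2].max()
        Q[s, a] += alpha * (r + bootstrap - Q[s, a])
        if s2 == GOAL:
            break
        s = s2

print(Q[0].argmax())  # greedy first move from the start state (1=down or 3=right)
```

The same update appears inside the network-based trainer later; there, the table lookup Q[s, a] is replaced by a forward pass through a small MLP.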
Here is the code:</p>
<pre><code>import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super(QNetwork, self).__init__()
        self.fc1 = nn.Linear(state_size, 24)
        self.fc2 = nn.Linear(24, 24)
        self.fc3 = nn.Linear(24, action_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)

def train_agent(episodes, max_steps, gamma=0.99, epsilon=0.1):
    env = GridWorld()
    state_size = env.grid_size * env.grid_size
    action_size = len(env.actions)

    q_network = QNetwork(state_size, action_size)
    optimizer = optim.Adam(q_network.parameters(), lr=0.001)
    criterion = nn.MSELoss()

    for episode in range(episodes):
        state = env.reset()
        total_reward = 0
        for step in range(max_steps):
            state_tensor = torch.eye(state_size)[state]  # one-hot encoding of the state
            q_values = q_network(state_tensor)

            # epsilon-greedy action selection
            if np.random.rand() &lt; epsilon:
                action = np.random.choice(env.actions)
            else:
                action = int(torch.argmax(q_values).item())
            next_state, reward = env.step(action)
            total_reward += reward

            # TD target: no bootstrap term at the terminal (goal) state
            next_state_tensor = torch.eye(state_size)[next_state]
            if next_state == env.goal_state:
                target = torch.tensor(reward, dtype=torch.float32)
            else:
                target = reward + gamma * torch.max(q_network(next_state_tensor)).detach()
            loss = criterion(q_values[action], target)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if next_state == env.goal_state:
                break

            state = next_state
        print(f"Episode {episode+1}: Total Reward = {total_reward}")
</code></pre>
<h2>5. Conclusion</h2>
<p>In this post, we explored the Markov Decision Process (MDP) and how to implement it with PyTorch.
MDP is the foundational framework of reinforcement learning, and understanding it is essential for solving real reinforcement-learning problems. I hope this practice gives you deeper insight into MDPs and reinforcement learning.</p>
<p>Beyond this, I encourage you to explore more complex MDP problems and learning algorithms. Using tools like PyTorch, try implementing various environments, training agents, and building your own reinforcement-learning models.</p>
<footer>
<p>I hope this post was helpful. If you have any questions, please leave a comment!</p>
</footer>