{"id":36625,"date":"2024-11-01T09:50:05","date_gmt":"2024-11-01T09:50:05","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36625"},"modified":"2024-11-01T11:52:28","modified_gmt":"2024-11-01T11:52:28","slug":"deep-learning-pytorch-course-preprocessing-stemming","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36625\/","title":{"rendered":"Deep Learning PyTorch Course, Preprocessing, Stemming"},"content":{"rendered":"<p><body><\/p>\n<p>Deep learning is a technology used to create predictive models by learning from vast amounts of data. The performance of deep learning models is heavily influenced by the quality and quantity of the data, making data preprocessing a very important process. In this course, we will explore the preprocessing of text data used in deep learning and stemming, a frequently used technique in natural language processing. Additionally, we will implement this through practical example code using Python and the PyTorch library.<\/p>\n<h2>1. Data Preprocessing<\/h2>\n<p>Data preprocessing is the process of refining and processing raw data, which can enhance the learning performance of the model. The preprocessing of text data consists of the following steps:<\/p>\n<ol>\n<li>Data collection: Methods for collecting actual data (crawling, API, etc.).<\/li>\n<li>Data cleansing: Removing unnecessary characters, standardizing case, handling duplicate data.<\/li>\n<li>Tokenization: Splitting text into words or sentences.<\/li>\n<li>Stemming and Lemmatization: Transforming the form of words to their base form.<\/li>\n<li>Indexing: Converting text data into numerical format.<\/li>\n<\/ol>\n<h3>1.1 Data Collection<\/h3>\n<p>Data collection is the first step in natural language processing (NLP), and data can be collected through various methods. 
For example, news articles can be obtained through web scraping, or data can be collected via public APIs.<\/p>\n<h3>1.2 Data Cleansing<\/h3>\n<p>Data cleansing is the process of removing noise from raw data to create clean data. Typical actions in this step include removing HTML tags, stripping unnecessary symbols, and standardizing case.<\/p>\n<h4>Python Example: Data Cleansing<\/h4>\n<pre><code class=\"language-python\">\nimport re\n\ndef clean_text(text):\n    # Remove HTML tags\n    text = re.sub(r'&lt;.*?&gt;', '', text)\n    # Keep only ASCII letters, digits, Hangul syllables, and whitespace\n    text = re.sub(r'[^a-zA-Z0-9\uac00-\ud7a3\\s]', '', text)\n    # Standardize case\n    text = text.lower()\n    return text\n\nsample_text = \"&lt;h1&gt;Hello, this is a deep learning course!!&lt;\/h1&gt; Starting data cleansing.\"\ncleaned_text = clean_text(sample_text)\nprint(cleaned_text)  # hello this is a deep learning course starting data cleansing\n<\/code><\/pre>\n<h2>2. Stemming and Lemmatization<\/h2>\n<p>Stemming and lemmatization are two common ways to normalize word forms in natural language processing. Stemming applies heuristic rules that strip affixes from a word to reduce it to a stem; the Porter stemmer used in this course strips suffixes. In contrast, lemmatization uses a vocabulary and the word's part of speech to return its dictionary form (lemma).<\/p>\n<h3>2.1 Stemming<\/h3>\n<p>Stemming shortens words so that related forms map to the same stem. The stem is not always a dictionary word: the Porter stemmer, for example, reduces \"easily\" to \"easili\". In Python, stemming can be implemented with libraries such as NLTK.<\/p>\n<h4>Python Example: Stemming<\/h4>\n<pre><code class=\"language-python\">\nfrom nltk.stem import PorterStemmer\n\nstemmer = PorterStemmer()\n\nwords = [\"running\", \"runner\", \"ran\", \"easily\", \"fairly\"]\n# Apply the Porter stemming rules to each word\nstems = [stemmer.stem(word) for word in words]\nprint(stems)\n<\/code><\/pre>\n<h3>2.2 Lemmatization<\/h3>\n<p>Lemmatization converts words into their appropriate base form based on their part of speech. 
Because the result is a valid dictionary form, it preserves the word's meaning better than a stem does.<\/p>\n<h4>Python Example: Lemmatization<\/h4>\n<pre><code class=\"language-python\">\nimport nltk\nfrom nltk.stem import WordNetLemmatizer\n\n# Download the WordNet data once before first use\n# (some NLTK versions also require 'omw-1.4')\nnltk.download('wordnet')\n\nlemmatizer = WordNetLemmatizer()\n\nwords = [\"running\", \"runner\", \"ran\", \"easily\", \"fairly\"]\n# pos='v' treats every word as a verb; pass the actual part of speech for other word classes\nlemmas = [lemmatizer.lemmatize(word, pos='v') for word in words]\nprint(lemmas)\n<\/code><\/pre>\n<h2>3. Applying Preprocessing in PyTorch<\/h2>\n<p>PyTorch is a deep learning framework that represents data as tensors. Text preprocessing can be wrapped in a PyTorch <code>Dataset<\/code> so that each sample is cleaned and stemmed when it is loaded for model training.<\/p>\n<h4>Python Example: Data Preprocessing in PyTorch<\/h4>\n<pre><code class=\"language-python\">\nimport torch\nfrom torch.utils.data import Dataset, DataLoader\nfrom nltk.stem import PorterStemmer\n\n# clean_text() is the function defined in the data cleansing example (section 1.2)\n\nclass TextDataset(Dataset):\n    def __init__(self, texts):\n        self.texts = texts\n        self.stemmer = PorterStemmer()\n\n    def __len__(self):\n        return len(self.texts)\n\n    def __getitem__(self, index):\n        text = self.texts[index]\n        # Clean the raw text, then stem each whitespace-separated token\n        cleaned_text = clean_text(text)\n        tokens = [self.stemmer.stem(token) for token in cleaned_text.split()]\n        return ' '.join(tokens)\n\n# Sample data\ntexts = [\n    \"I am feeling very good today.\",\n    \"Deep learning is truly an interesting topic.\"\n]\n\ndataset = TextDataset(texts)\n# With string samples, the default collate function returns each batch as a list of strings\ndataloader = DataLoader(dataset, batch_size=2)\n\nfor data in dataloader:\n    print(data)\n<\/code><\/pre>\n<h2>4. Conclusion<\/h2>\n<p>Data preprocessing is essential to the performance of deep learning models because it directly improves the quality of the data they learn from. Stemming and lemmatization are two important normalization techniques in natural language processing. We encourage you to apply the methods introduced in this course to real data and use them when training deep learning models.<\/p>\n<footer>\n<p>\u00a9 2023. 
Author of the Deep Learning Course.<\/p>\n<\/footer>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deep learning is a technology used to create predictive models by learning from vast amounts of data. The performance of deep learning models is heavily influenced by the quality and quantity of the data, making data preprocessing a very important process. In this course, we will explore the preprocessing of text data used in deep &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36625\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Deep Learning PyTorch Course, Preprocessing, Stemming&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[149],"tags":[],"class_list":["post-36625","post","type-post","status-publish","format-standard","hentry","category-pytorch-study"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Deep Learning PyTorch Course, Preprocessing, Stemming - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36625\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Deep Learning PyTorch Course, Preprocessing, Stemming - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"Deep learning is a technology used to create predictive models by learning from vast amounts of data. 
The performance of deep learning models is heavily influenced by the quality and quantity of the data, making data preprocessing a very important process. In this course, we will explore the preprocessing of text data used in deep &hellip; \ub354 \ubcf4\uae30 &quot;Deep Learning PyTorch Course, Preprocessing, Stemming&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36625\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:50:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-01T11:52:28+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36625\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36625\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Deep Learning PyTorch Course, Preprocessing, Stemming\",\"datePublished\":\"2024-11-01T09:50:05+00:00\",\"dateModified\":\"2024-11-01T11:52:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36625\/\"},\"wordCount\":441,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"PyTorch 
Study\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36625\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36625\/\",\"name\":\"Deep Learning PyTorch Course, Preprocessing, Stemming - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:50:05+00:00\",\"dateModified\":\"2024-11-01T11:52:28+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36625\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36625\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36625\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Deep Learning PyTorch Course, Preprocessing, Stemming\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ub
e0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Deep Learning PyTorch Course, Preprocessing, Stemming - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36625\/","og_locale":"ko_KR","og_type":"article","og_title":"Deep Learning PyTorch Course, Preprocessing, Stemming - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"Deep learning is a technology used to create predictive models by learning from vast amounts of data. The performance of deep learning models is heavily influenced by the quality and quantity of the data, making data preprocessing a very important process. 
In this course, we will explore the preprocessing of text data used in deep &hellip; \ub354 \ubcf4\uae30 \"Deep Learning PyTorch Course, Preprocessing, Stemming\"","og_url":"https:\/\/atmokpo.com\/w\/36625\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:50:05+00:00","article_modified_time":"2024-11-01T11:52:28+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36625\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36625\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Deep Learning PyTorch Course, Preprocessing, Stemming","datePublished":"2024-11-01T09:50:05+00:00","dateModified":"2024-11-01T11:52:28+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36625\/"},"wordCount":441,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["PyTorch Study"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36625\/","url":"https:\/\/atmokpo.com\/w\/36625\/","name":"Deep Learning PyTorch Course, Preprocessing, Stemming - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:50:05+00:00","dateModified":"2024-11-01T11:52:28+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36625\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36625\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36625\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Deep Learning 
PyTorch Course, Preprocessing, Stemming"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36625","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-jso
n\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36625"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36625\/revisions"}],"predecessor-version":[{"id":36626,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36625\/revisions\/36626"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36625"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36625"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36625"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}