{"id":31755,"date":"2024-11-01T09:02:32","date_gmt":"2024-11-01T09:02:32","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=31755"},"modified":"2024-11-01T11:48:24","modified_gmt":"2024-11-01T11:48:24","slug":"3-information-extraction-from-text-data","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/31755\/","title":{"rendered":"3. Information Extraction from Text Data"},"content":{"rendered":"\n<p>Regular expressions are effectively used in natural language processing (NLP) and data analysis. For example, they can be utilized to search for specific keywords in customer feedback data or to extract numerical and currency information from financial data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re\n\n# Customer feedback example\nfeedback = \"The service at our bank was fantastic. I was especially impressed by the kindness of Agent Kim. Thank you!\"\n\n# Extract the sentence that mentions 'Agent Kim':\n# non-terminator characters around the name, up to the closing . ! or ?\nagent_pattern = r\"&#91;^.!?]*Agent Kim&#91;^.!?]*&#91;.!?]\"\nagent_feedback = re.search(agent_pattern, feedback)\n\nif agent_feedback:\n    print(agent_feedback.group().strip())  # I was especially impressed by the kindness of Agent Kim.<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Cautions When Using Regular Expressions<\/h2>\n\n\n\n<p>Regular expressions are very powerful tools, but improper use can lead to performance issues. In particular, patterns with nested or overlapping quantifiers can trigger heavy backtracking, causing CPU usage to spike on certain inputs. To keep patterns efficient, keep the following points in mind:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Use the simplest patterns possible and avoid unnecessary grouping.<\/li>\n\n\n\n<li>Use non-greedy quantifiers (e.g., <code>.*?<\/code>) where you want the shortest match; in some patterns this also reduces backtracking.<\/li>\n\n\n\n<li>For simple fixed-string tasks, prefer string methods (e.g., <code>str.find()<\/code>, <code>str.replace()<\/code>) over regular expressions.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Debugging Regular Expressions<\/h3>\n\n\n\n<p>When writing regular expressions, unexpected results often occur. 
To address this, you can use online regex testing tools. These tools show visually which parts of the text each part of a pattern matches, so you can identify and correct issues quickly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Extended Features of Regular Expressions<\/h3>\n\n\n\n<p>Beyond basic matching, the Python <code>re<\/code> module supports flags that change how a pattern is interpreted. For example, flags can make matching case-insensitive or adjust how a pattern behaves on multi-line strings:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>re.IGNORECASE<\/code>: Matches without regard to letter case.<\/li>\n\n\n\n<li><code>re.MULTILINE<\/code>: Makes <code>^<\/code> and <code>$<\/code> match at the start and end of each line, not only at the start and end of the whole string.<\/li>\n\n\n\n<li><code>re.DOTALL<\/code>: Makes the dot (<code>.<\/code>) match any character, including newlines.<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>import re\n\n# Multi-line string\nmultiline_text = \"\"\"first line\nsecond line\nthird line\"\"\"\n\n# With re.MULTILINE, ^ matches at the start of every line\nmultiline_pattern = r\"^second\"  # Matches 'second' only at the start of a line\n\n# Result of the match\nmatches = re.findall(multiline_pattern, multiline_text, re.MULTILINE)\nprint(matches)  # &#91;'second']<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>In this lecture, we explored various ways to use regular expressions in Python. Regular expressions are a powerful tool for string manipulation and can be applied in many fields. I hope the practical examples have shown you how useful regular expressions can be. 
Regular expressions may seem complex and difficult when you first encounter them, but once you learn to read and apply patterns, they become a highly efficient tool.<\/p>\n\n\n\n<p>As you become more familiar with regular expressions through practice and repetition, you&#8217;ll acquire a powerful skill for solving complex string processing problems. I hope this lecture has helped you build a foundation in Python regular expressions.<\/p>\n\n\n\n<p>Keep working through exercises and examples to become comfortable with regular expressions and to strengthen your data processing and analysis skills!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Regular expressions are effectively used in natural language processing (NLP) and data analysis. For example, they can be utilized to search for specific keywords in customer feedback data or to extract numerical and currency information from financial data. Cautions When Using Regular Expressions Regular expressions are very powerful tools, but improper use can lead to &hellip; <a href=\"https:\/\/atmokpo.com\/w\/31755\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;3. Information Extraction from Text Data&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[98],"tags":[95],"class_list":["post-31755","post","type-post","status-publish","format-standard","hentry","category--en","tag--en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>3. 
Information Extraction from Text Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/31755\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"3. Information Extraction from Text Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"Regular expressions are effectively used in natural language processing (NLP) and data analysis. For example, they can be utilized to search for specific keywords in customer feedback data or to extract numerical and currency information from financial data. Cautions When Using Regular Expressions Regular expressions are very powerful tools, but improper use can lead to &hellip; \ub354 \ubcf4\uae30 &quot;3. Information Extraction from Text Data&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/31755\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:02:32+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-01T11:48:24+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"2\ubd84\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/31755\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/31755\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"3. Information Extraction from Text Data\",\"datePublished\":\"2024-11-01T09:02:32+00:00\",\"dateModified\":\"2024-11-01T11:48:24+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/31755\/\"},\"wordCount\":350,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"keywords\":[\"\ud30c\uc774\uc36c\uac15\uc88c\"],\"articleSection\":[\"Python Study\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/31755\/\",\"url\":\"https:\/\/atmokpo.com\/w\/31755\/\",\"name\":\"3. Information Extraction from Text Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:02:32+00:00\",\"dateModified\":\"2024-11-01T11:48:24+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/31755\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/31755\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/31755\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"3. 
Information Extraction from Text Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"3. 
Information Extraction from Text Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/31755\/","og_locale":"ko_KR","og_type":"article","og_title":"3. Information Extraction from Text Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"Regular expressions are effectively used in natural language processing (NLP) and data analysis. For example, they can be utilized to search for specific keywords in customer feedback data or to extract numerical and currency information from financial data. Cautions When Using Regular Expressions Regular expressions are very powerful tools, but improper use can lead to &hellip; \ub354 \ubcf4\uae30 \"3. Information Extraction from Text Data\"","og_url":"https:\/\/atmokpo.com\/w\/31755\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:02:32+00:00","article_modified_time":"2024-11-01T11:48:24+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"2\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/31755\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/31755\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"3. 
Information Extraction from Text Data","datePublished":"2024-11-01T09:02:32+00:00","dateModified":"2024-11-01T11:48:24+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/31755\/"},"wordCount":350,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"keywords":["\ud30c\uc774\uc36c\uac15\uc88c"],"articleSection":["Python Study"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/31755\/","url":"https:\/\/atmokpo.com\/w\/31755\/","name":"3. Information Extraction from Text Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:02:32+00:00","dateModified":"2024-11-01T11:48:24+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/31755\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/31755\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/31755\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"3. 
Information Extraction from Text Data"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/31755","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\
/wp\/v2\/posts"}],"about":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=31755"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/31755\/revisions"}],"predecessor-version":[{"id":31756,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/31755\/revisions\/31756"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=31755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=31755"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=31755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}