{"id":36211,"date":"2024-11-01T09:46:41","date_gmt":"2024-11-01T09:46:41","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36211"},"modified":"2024-11-01T09:46:41","modified_gmt":"2024-11-01T09:46:41","slug":"using-hugging-face-transformers-wikipedia-english-keyword-search","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36211\/","title":{"rendered":"Using Hugging Face Transformers, Wikipedia English Keyword Search"},"content":{"rendered":"<p><body><\/p>\n<article>\n<p>The Hugging Face Transformers library has established itself as a powerful tool in the fields of deep learning and natural language processing (NLP). In this course, we will explain how to use the Hugging Face Transformers library along with the Wikipedia API to search for relevant documents on Wikipedia based on a given keyword.<\/p>\n<h2>1. What is Hugging Face Transformers?<\/h2>\n<p>Hugging Face is a platform that provides libraries for training, running inference with, and deploying natural language processing models. The Transformers library makes it easy to use pre-trained models and is compatible with PyTorch and TensorFlow. It supports a range of NLP tasks, such as text classification, question answering, and text generation.<\/p>\n<h2>2. Introduction to the Wikipedia API<\/h2>\n<p>Wikipedia is an open online encyclopedia that provides information on a wide range of topics. Its API lets you retrieve that information programmatically: you can look up Wikipedia pages by keyword and fetch their content from code.<\/p>\n<h2>3. Installing Required Libraries<\/h2>\n<p>Use the command below to install the libraries needed for this task. The <code>transformers<\/code> and <code>wikipedia-api<\/code> packages provide the Hugging Face library and the Wikipedia API client. The code in this post also imports <code>torch<\/code> and <code>scikit-learn<\/code>, so install them as well.<\/p>\n<pre><code>pip install transformers wikipedia-api torch scikit-learn<\/code><\/pre>\n<h2>4. 
Choosing a Hugging Face Model<\/h2>\n<p>We will use a pre-trained model to evaluate the relevance of documents. For example, we can use the <code>distilbert-base-uncased<\/code> model. It is a smaller, distilled variant of BERT; here we use it to obtain document embeddings and measure the similarity between two documents.<\/p>\n<h2>5. Code Explanation<\/h2>\n<p>Now we will write the Python code step by step, explaining each part as we go.<\/p>\n<h3>5.1 Importing Required Libraries<\/h3>\n<pre><code>\nimport wikipediaapi\nfrom transformers import AutoTokenizer, AutoModel\nimport torch\n        <\/code><\/pre>\n<h3>5.2 Preparing the Model and Tokenizer<\/h3>\n<p>Next, initialize the model and tokenizer using Transformers.<\/p>\n<pre><code>\n# Initialize Hugging Face model and tokenizer\nmodel_name = 'distilbert-base-uncased'\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModel.from_pretrained(model_name)\n        <\/code><\/pre>\n<h3>5.3 Implementing the Wikipedia Search Function<\/h3>\n<p>Define a function that looks up the Wikipedia page whose title matches the keyword and returns its text, or <code>None<\/code> if no such page exists.<\/p>\n<pre><code>\ndef search_wikipedia(keyword):\n    # Recent versions of wikipedia-api require a descriptive user agent;\n    # the value below is a placeholder, replace it with your own.\n    wiki_wiki = wikipediaapi.Wikipedia(\n        user_agent='wiki-keyword-search-example (you@example.com)',\n        language='en',\n    )\n    # page() fetches the article by title; it does not run a full-text search\n    page = wiki_wiki.page(keyword)\n    if page.exists():\n        return page.text\n    return None\n        <\/code><\/pre>\n<h3>5.4 Creating Document Embeddings<\/h3>\n<p>Create a function that generates an embedding for the retrieved document.<\/p>\n<pre><code>\ndef create_embedding(text):\n    # Only the first 512 tokens are kept; longer articles are truncated\n    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=512)\n    with torch.no_grad():\n        outputs = model(**inputs)\n    # Mean-pool the token embeddings into a single vector per document\n    return outputs['last_hidden_state'].mean(dim=1)\n        <\/code><\/pre>\n<h3>5.5 Finding Relevant Documents for the Keyword<\/h3>\n<p>Retrieve the Wikipedia page for a given keyword and generate its embedding.<\/p>\n<pre><code>\nkeyword = \"Deep 
Learning\"\nwiki_text = search_wikipedia(keyword)\n\nif wiki_text:\n    embedding = create_embedding(wiki_text)\n    print(\"Title:\", keyword)\n    print(\"Content Embedding:\", embedding)\nelse:\n    print(\"Could not find a Wikipedia page for the given keyword.\")\n        <\/code><\/pre>\n<h2>6. Running the Code and Results<\/h2>\n<p>Running the code above prints the keyword and the embedding of the corresponding Wikipedia article, a tensor of shape (1, 768) for <code>distilbert-base-uncased<\/code>. This embedding can later be used to calculate the similarity with other documents.<\/p>\n<h2>7. Calculating Similarity<\/h2>\n<p>You can also compare the article with other documents, which lets you explore topics related to the input keyword. Let&#8217;s compute the cosine similarity between two embeddings.<\/p>\n<pre><code>\nfrom sklearn.metrics.pairwise import cosine_similarity\n\n# Generate a second embedding and calculate the similarity\nother_keyword = \"Machine Learning\"\nother_wiki_text = search_wikipedia(other_keyword)\n\n# Both pages must exist; otherwise one of the embeddings is undefined\nif wiki_text and other_wiki_text:\n    other_embedding = create_embedding(other_wiki_text)\n    similarity_score = cosine_similarity(embedding.numpy(), other_embedding.numpy())\n    print(f\"Similarity between {keyword} and {other_keyword}:\", similarity_score[0][0])\nelse:\n    print(\"Could not find a Wikipedia page for one of the keywords.\")\n        <\/code><\/pre>\n<h2>8. Conclusion<\/h2>\n<p>In this course, we learned how to use the Hugging Face Transformers library and the Wikipedia API to retrieve an article for a given keyword, generate an embedding of its content, and evaluate its similarity with other documents. The same approach can be applied to building search engines, recommendation systems, and information extraction.<\/p>\n<h2>9. Next Steps<\/h2>\n<p>Now, building on this basic structure, try to implement additional features. 
For instance, consider retrieving multiple documents and clustering them, or creating a user interface that lets users search for keywords easily. Explore other Hugging Face models and the Wikipedia API to add more functionality.<\/p>\n<h2>10. References<\/h2>\n<p>\n            &#8211; <a href=\"https:\/\/huggingface.co\/transformers\/\">Hugging Face Transformers Documentation<\/a><br \/>\n            &#8211; <a href=\"https:\/\/wikipediaapi.readthedocs.io\/en\/latest\/\">Wikipedia API Documentation<\/a>\n<\/p>\n<\/article>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Hugging Face Transformers library has established itself as a powerful tool in the fields of deep learning and natural language processing (NLP). In this course, we will explain how to use the Hugging Face Transformers library along with the Wikipedia API to search for relevant documents on Wikipedia based on a given keyword. 1. &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36211\/\" class=\"more-link\">Read more<span class=\"screen-reader-text\"> &#8220;Using Hugging Face Transformers, Wikipedia English Keyword Search&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36211","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Using Hugging Face Transformers, Wikipedia English Keyword Search - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36211\/\" \/>\n<meta 
property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Hugging Face Transformers, Wikipedia English Keyword Search - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"The Hugging Face Transformers library has established itself as a powerful tool in the fields of deep learning and natural language processing (NLP). In this course, we will explain how to use the Hugging Face Transformers library along with the Wikipedia API to search for relevant documents on Wikipedia based on a given keyword. 1. &hellip; \ub354 \ubcf4\uae30 &quot;Using Hugging Face Transformers, Wikipedia English Keyword Search&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36211\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:46:41+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36211\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36211\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Using Hugging Face Transformers, Wikipedia English Keyword 
Search\",\"datePublished\":\"2024-11-01T09:46:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36211\/\"},\"wordCount\":534,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36211\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36211\/\",\"name\":\"Using Hugging Face Transformers, Wikipedia English Keyword Search - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:46:41+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36211\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36211\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36211\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Hugging Face Transformers, Wikipedia English Keyword 
Search\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Using Hugging Face Transformers, Wikipedia English Keyword Search - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36211\/","og_locale":"ko_KR","og_type":"article","og_title":"Using Hugging Face Transformers, Wikipedia English Keyword Search - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"The Hugging Face Transformers library has established itself as a powerful tool in the fields of deep learning and natural language processing (NLP). In this course, we will explain how to use the Hugging Face Transformers library along with the Wikipedia API to search for relevant documents on Wikipedia based on a given keyword. 1. &hellip; \ub354 \ubcf4\uae30 \"Using Hugging Face Transformers, Wikipedia English Keyword Search\"","og_url":"https:\/\/atmokpo.com\/w\/36211\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:46:41+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"4\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36211\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36211\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Using Hugging Face Transformers, Wikipedia English Keyword Search","datePublished":"2024-11-01T09:46:41+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36211\/"},"wordCount":534,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging 
Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36211\/","url":"https:\/\/atmokpo.com\/w\/36211\/","name":"Using Hugging Face Transformers, Wikipedia English Keyword Search - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:46:41+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36211\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36211\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36211\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Using Hugging Face Transformers, Wikipedia English Keyword Search"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138
fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36211","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36211"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36211\/revisions"}],"predecessor-version":[{"id":36212,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36211\/revisions\/36212"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36211"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36211"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36211"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}