Hugging Face Transformers tutorial: retrieving Pfizer COVID-19 text from Wikipedia

Fetching Pfizer COVID-19 Wikipedia Text

In this tutorial, we will fetch information about Pfizer’s COVID-19 vaccine from Wikipedia with the wikipedia-api package and then summarize it with the Hugging Face Transformers library. The tutorial assumes basic familiarity with Python and natural language processing (NLP), and shows how to use Hugging Face’s library comfortably from Python.

1. Environment Setup

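First, install the necessary libraries. The command below installs transformers, PyTorch (torch, which transformers pipelines need as a backend), and wikipedia-api. The leading ! runs the command from a Jupyter or Colab notebook cell; drop it when running pip in a terminal.

!pip install transformers torch wikipedia-api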
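If a later step fails with an import error, a quick check like the following can tell you which package is missing. It is a small standard-library sketch, not part of either library. Note that import names differ from pip names (wikipedia-api is imported as wikipediaapi), and listing torch assumes you chose PyTorch as the backend.

```python
import importlib.util

# Import names (not pip names) of the packages used in this tutorial.
# Assumption: PyTorch ("torch") is the backend for the transformers pipeline.
REQUIRED_MODULES = ["transformers", "torch", "wikipediaapi"]


def missing_modules(names):
    """Return the module names that cannot be found on this Python interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```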

2. Importing Libraries

Next, import the libraries. transformers provides pretrained NLP models and the high-level pipeline() helper, and wikipedia-api (imported as wikipediaapi) is a Python wrapper around the Wikipedia API.

import wikipediaapi
from transformers import pipeline
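The wikipedia-api package saves you from building HTTP requests yourself. For comparison, here is a rough standard-library sketch of requesting a plain-text article extract directly from the MediaWiki Action API, using its prop=extracts module (from the TextExtracts extension). The function names build_extract_url and fetch_extract are hypothetical, and this is not necessarily how wikipedia-api works internally.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build a MediaWiki Action API URL requesting the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",    # TextExtracts module
        "explaintext": 1,      # plain text instead of HTML
        "redirects": 1,        # follow redirects such as alternative spellings
        "titles": title,
        "format": "json",
        "formatversion": 2,    # "pages" is returned as a list, not a dict keyed by page ID
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_extract(title, user_agent):
    """Download the plain-text extract, or return None if the page does not exist."""
    request = urllib.request.Request(
        build_extract_url(title), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    page = data["query"]["pages"][0]
    if page.get("missing"):
        return None
    return page.get("extract", "")
```

For example, fetch_extract("Pfizer–BioNTech COVID-19 vaccine", user_agent) returns the article text, or None if the title does not exist; user_agent should be your own descriptive string with contact details, as Wikipedia asks of API clients.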

3. Fetching Information from Wikipedia

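Now, let’s fetch the Wikipedia article about Pfizer’s COVID-19 vaccine with wikipediaapi. Recent versions of wikipedia-api require a descriptive user agent when creating the client, so pass one that identifies your project.

# The user agent below is a placeholder: replace it with your own project name and contact address
wiki_wiki = wikipediaapi.Wikipedia(user_agent="PfizerSummaryTutorial/0.1 (you@example.com)", language="en")
# Exact title of the English Wikipedia article
page = wiki_wiki.page("Pfizer–BioNTech COVID-19 vaccine")

if page.exists():
    print(page.text[:1000])  # Print the first 1,000 characters
else:
    print("The page does not exist.")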

Code Explanation

The code above requests the English Wikipedia article about the Pfizer COVID-19 vaccine. If the page exists, it prints the first 1,000 characters, which lets you check that you retrieved the content you expected before processing it further.
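The first 1,000 characters show only the introduction. To see how the rest of the article is organized, you can walk the page’s section tree: wikipediaapi pages expose a sections list whose items have a title, their own text, and nested sections. The helper name section_outline is hypothetical.

```python
def section_outline(sections, depth=0):
    """Return section titles, indented by nesting depth, walking the tree depth-first."""
    lines = []
    for section in sections:
        lines.append("  " * depth + section.title)
        # Recurse into subsections, one indentation level deeper
        lines.extend(section_outline(section.sections, depth + 1))
    return lines
```

With the page fetched above, print("\n".join(section_outline(page.sections))) prints an indented table of contents.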

4. Summarizing the Text

The article is long, so let’s condense it with a summarization model from the Hugging Face transformers library. Summarization models accept only a limited number of input tokens (1,024 for the BART-based default model), so longer input must be truncated or split before it reaches the model.

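# Pinning the model (the pipeline's default summarization checkpoint) keeps results reproducible
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# truncation=True cuts the input to the model's maximum length instead of failing on long text,
# so only the beginning of the article is summarized
summary = summarizer(page.text, max_length=130, min_length=30, do_sample=False, truncation=True)

print("Summary:")
for s in summary:
    print(s["summary_text"])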

Code Explanation

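This code runs text summarization through the Hugging Face “summarization” pipeline. max_length and min_length bound the length of the generated summary in tokens (not words or characters), and do_sample=False turns off random sampling so the same input produces the same summary.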
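Because the model reads a limited number of tokens at once, a single call covers at most the start of a long article. One common workaround, sketched below, is to split the text into chunks, summarize each chunk, and join the partial summaries. The helpers chunk_words and summarize_long are hypothetical, and the 500-word chunk size is an assumption meant to stay under the roughly 1,024-token limit of BART-style models for typical English text; it is not a guarantee, so keep truncation=True as a safety net.

```python
def chunk_words(text, max_words=500):
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long(summarizer, text, max_words=500, **generate_kwargs):
    """Summarize each chunk separately and join the partial summaries with newlines.

    `summarizer` is expected to behave like a transformers summarization pipeline:
    called with a string, it returns a list of dicts with a "summary_text" key.
    """
    partial_summaries = []
    for chunk in chunk_words(text, max_words):
        result = summarizer(chunk, **generate_kwargs)
        partial_summaries.append(result[0]["summary_text"])
    return "\n".join(partial_summaries)
```

Usage with the pipeline created above: print(summarize_long(summarizer, page.text, max_length=130, min_length=30, do_sample=False, truncation=True)). Splitting on words ignores sentence boundaries; splitting on sentences, or counting tokens with the model’s tokenizer, gives cleaner chunks. If the joined result is still too long, you can summarize it once more.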

5. Conclusion

In this tutorial, we fetched information about Pfizer’s COVID-19 vaccine with the Wikipedia API and summarized it with Hugging Face Transformers. We hope this has given you a glimpse of what natural language processing can do. The same techniques apply to many other domains and can serve as building blocks for your own projects.

6. Next Steps

Next, try other natural language processing tasks such as sentiment analysis, question answering, and document classification. The Hugging Face Hub hosts many pretrained models; browse it to find ones that suit your task.

Thank you!