{"id":36175,"date":"2024-11-01T09:46:23","date_gmt":"2024-11-01T09:46:23","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36175"},"modified":"2024-11-01T09:46:23","modified_gmt":"2024-11-01T09:46:23","slug":"hugging-face-transformers-tutorial-loading-the-wav2vec2-pre-trained-model","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36175\/","title":{"rendered":"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model"},"content":{"rendered":"<p><body><\/p>\n<p>In recent years, deep learning has driven major advances in natural language processing (NLP) and automatic speech recognition (ASR). One of the most influential approaches to speech recognition is the <strong>Wav2Vec2<\/strong> model. Through the Hugging Face Transformers library, you can load a pre-trained Wav2Vec2 checkpoint and transcribe speech with only a few lines of code. In this article, I will explain how Wav2Vec2 works, how to load a pre-trained model, and how to convert speech to text with a simple example.<\/p>\n<h2>What is Wav2Vec2?<\/h2>\n<p>Wav2Vec2 is a speech recognition model developed by Facebook AI Research (FAIR). It first learns speech representations from large amounts of unlabeled audio using <strong>self-supervised learning<\/strong>, and is then fine-tuned on a smaller amount of transcribed speech. Instead of relying on hand-crafted features such as spectrograms, the model extracts features directly from the raw waveform. Speech recognition with a fine-tuned Wav2Vec2 model typically involves the following steps:<\/p>\n<ol>\n<li>The raw audio is converted into Wav2Vec2&#8217;s input format: a normalized 16 kHz mono waveform.<\/li>\n<li>A convolutional feature encoder followed by a Transformer turns the waveform into a sequence of contextualized feature vectors, roughly one every 20 ms.<\/li>\n<li>A classification head trained with Connectionist Temporal Classification (CTC) maps each feature vector to character probabilities, which are decoded into text.<\/li>\n<\/ol>\n<h2>What is the Hugging Face Transformers Library?<\/h2>\n<p>The Hugging Face Transformers library provides easy access to state-of-the-art pre-trained models for natural language processing, speech, and computer vision. 
It provides a large collection of pre-trained checkpoints that can be downloaded and loaded with a single call, and speech recognition models such as Wav2Vec2 are available through the same interface.<\/p>\n<h2>Installing the Required Libraries<\/h2>\n<p>First, install the necessary libraries. The command below installs <code>transformers<\/code>, <code>torch<\/code>, and <code>torchaudio<\/code>, which we use to load audio files:<\/p>\n<pre><code>pip install transformers torch torchaudio<\/code><\/pre>\n<h2>Loading the Pre-Trained Wav2Vec2 Model<\/h2>\n<p>Now, let&#8217;s write the code to load the pre-trained Wav2Vec2 model. The following example converts an audio file to text step by step.<\/p>\n<h3>1. Importing the Libraries<\/h3>\n<pre><code>from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor\nimport torch<\/code><\/pre>\n<h3>2. Initializing the Processor and Model<\/h3>\n<p>To use the Wav2Vec2 model, we first initialize the processor and the model. The processor combines a feature extractor, which normalizes the raw waveform into model inputs, with a tokenizer, which converts the predicted token IDs back into text. The model, <code>facebook\/wav2vec2-base-960h<\/code>, is a Wav2Vec2 base model with a CTC head fine-tuned on 960 hours of English LibriSpeech audio.<\/p>\n<pre><code># Initialize the processor (feature extractor + tokenizer) and the model\nprocessor = Wav2Vec2Processor.from_pretrained(\"facebook\/wav2vec2-base-960h\")\nmodel = Wav2Vec2ForCTC.from_pretrained(\"facebook\/wav2vec2-base-960h\")<\/code><\/pre>\n<h3>3. Loading the Audio File<\/h3>\n<p>We use the <code>torchaudio<\/code> library to load the WAV file. Because the model was trained on 16 kHz mono audio, we mix multi-channel recordings down to a single channel and resample the waveform if its sampling rate differs from 16 kHz.<\/p>\n<pre><code>import torchaudio\nimport torchaudio.functional as F\n\n# Audio file path\nfile_path = \"path\/to\/your\/audio.wav\"\n# Load the audio file: waveform has shape (channels, samples)\nwaveform, sample_rate = torchaudio.load(file_path)\n# Mix multi-channel audio down to mono\nwaveform = waveform.mean(dim=0)\n# Resample to the 16 kHz rate the model expects\nif sample_rate != 16000:\n    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16000)\naudio_input = waveform.numpy()<\/code><\/pre>\n<h3>4. Converting Speech to Text<\/h3>\n<p>Once the waveform is in the format the model expects, we pass it through the processor and the model. The model outputs a score (logit) for every vocabulary token at each audio frame; taking the most likely token per frame and letting the tokenizer merge repeated tokens and drop CTC blank tokens (greedy CTC decoding) produces the transcription. 
Write the following code to perform this process:<\/p>\n<pre><code># Preprocess the waveform into model input\ninput_values = processor(audio_input, sampling_rate=16000, return_tensors=\"pt\").input_values\n\n# Run inference without tracking gradients\nwith torch.no_grad():\n    logits = model(input_values).logits\n\n# Pick the most likely token ID per frame, then decode the IDs to text\npredicted_ids = torch.argmax(logits, dim=-1)\ntranscription = processor.batch_decode(predicted_ids)[0]<\/code><\/pre>\n<h3>5. Outputting the Results<\/h3>\n<p>Finally, print the transcribed text to check the result:<\/p>\n<pre><code>print(\"Transcription:\", transcription)<\/code><\/pre>\n<h2>Summary of the Entire Code<\/h2>\n<p>Putting everything together, below is the complete code for converting an audio file to text:<\/p>\n<pre><code>from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor\nimport torchaudio\nimport torchaudio.functional as F\nimport torch\n\n# Initialize the processor (feature extractor + tokenizer) and the model\nprocessor = Wav2Vec2Processor.from_pretrained(\"facebook\/wav2vec2-base-960h\")\nmodel = Wav2Vec2ForCTC.from_pretrained(\"facebook\/wav2vec2-base-960h\")\n\n# Audio file path\nfile_path = \"path\/to\/your\/audio.wav\"\n# Load the audio file, mix it down to mono, and resample it to 16 kHz\nwaveform, sample_rate = torchaudio.load(file_path)\nwaveform = waveform.mean(dim=0)\nif sample_rate != 16000:\n    waveform = F.resample(waveform, orig_freq=sample_rate, new_freq=16000)\naudio_input = waveform.numpy()\n\n# Preprocess the waveform into model input\ninput_values = processor(audio_input, sampling_rate=16000, return_tensors=\"pt\").input_values\n\n# Run inference without tracking gradients\nwith torch.no_grad():\n    logits = model(input_values).logits\n\n# Pick the most likely token ID per frame, then decode the IDs to text\npredicted_ids = torch.argmax(logits, dim=-1)\ntranscription = processor.batch_decode(predicted_ids)[0]\n\n# Output the results\nprint(\"Transcription:\", transcription)<\/code><\/pre>\n<h2>Conclusion<\/h2>\n<p>With a pre-trained Wav2Vec2 model, you can build a working speech recognizer in a few lines of code, without training a model yourself or dealing with low-level acoustic details. Keep in mind that <code>facebook\/wav2vec2-base-960h<\/code> was fine-tuned on 16 kHz English read speech; for other languages or domains, choose a checkpoint fine-tuned on matching data. 
I hope you have learned the basics of installing the Wav2Vec2 model and converting audio files through this tutorial. I will return with more deep learning tutorials and information in the future. Thank you!<\/p>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the field of deep learning, natural language processing (NLP) and automatic speech recognition (ASR) have made significant advancements in recent years. Among them, one of the most innovative approaches for speech recognition is the Wav2Vec2 model. This model can be easily used through the Hugging Face Transformers library and effectively processes speech data by &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36175\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36175","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36175\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" 
content=\"In the field of deep learning, natural language processing (NLP) and automatic speech recognition (ASR) have made significant advancements in recent years. Among them, one of the most innovative approaches for speech recognition is the Wav2Vec2 model. This model can be easily used through the Hugging Face Transformers library and effectively processes speech data by &hellip; \ub354 \ubcf4\uae30 &quot;Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36175\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:46:23+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36175\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36175\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model\",\"datePublished\":\"2024-11-01T09:46:23+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36175\/\"},\"wordCount\":535,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging 
Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36175\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36175\/\",\"name\":\"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:46:23+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36175\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36175\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36175\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc
2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36175\/","og_locale":"ko_KR","og_type":"article","og_title":"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"In the field of deep learning, natural language processing (NLP) and automatic speech recognition (ASR) have made significant advancements in recent years. Among them, one of the most innovative approaches for speech recognition is the Wav2Vec2 model. 
This model can be easily used through the Hugging Face Transformers library and effectively processes speech data by &hellip; \ub354 \ubcf4\uae30 \"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model\"","og_url":"https:\/\/atmokpo.com\/w\/36175\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:46:23+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"4\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36175\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36175\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model","datePublished":"2024-11-01T09:46:23+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36175\/"},"wordCount":535,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36175\/","url":"https:\/\/atmokpo.com\/w\/36175\/","name":"Hugging Face Transformers Tutorial: Loading the Wav2Vec2 Pre-trained Model - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:46:23+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36175\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36175\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36175\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Hugging Face Transformers Tutorial: Loading 
the Wav2Vec2 Pre-trained Model"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36175","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2
\/posts"}],"about":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36175"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36175\/revisions"}],"predecessor-version":[{"id":36176,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36175\/revisions\/36176"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36175"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36175"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36175"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}