{"id":36103,"date":"2024-11-01T09:45:47","date_gmt":"2024-11-01T09:45:47","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=36103"},"modified":"2024-11-01T09:45:47","modified_gmt":"2024-11-01T09:45:47","slug":"in-depth-course-on-hugging-face-transformers-clip-preprocessing","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/36103\/","title":{"rendered":"In-depth Course on Hugging Face Transformers, CLIP Preprocessing"},"content":{"rendered":"<p><body><\/p>\n<p>One of the latest trends in deep learning is the emergence of multi-modal models. In particular, OpenAI&#8217;s CLIP (Contrastive Language\u2013Image Pretraining) is a model that learns the relationship between images and text, which lets it handle tasks such as zero-shot image classification and image-text retrieval. In this article, we will explore how to use the CLIP model through the Hugging Face Transformers library, focusing on its preprocessing steps.<\/p>\n<h2>1. Introduction to the CLIP Model<\/h2>\n<p>The CLIP model is trained on image-text pairs with a contrastive objective, enabling it to measure the similarity between a given text description and an image. Because it learns from natural-language descriptions rather than a fixed set of labels, it can be applied to new tasks zero-shot, without fine-tuning on a task-specific dataset.<\/p>\n<h2>2. CLIP Preprocessing Steps<\/h2>\n<p>To use the CLIP model, the input images and text must be preprocessed appropriately. The preprocessing steps consist of the following:<\/p>\n<ol>\n<li>Loading and resizing the image<\/li>\n<li>Normalizing the image<\/li>\n<li>Tokenizing the text<\/li>\n<\/ol>\n<h3>2.1 Loading and Resizing the Image<\/h3>\n<p>The image input to the model must be resized to a consistent size. Typically, CLIP checkpoints expect images of size 224&#215;224. The Python PIL library can be used to load and resize images.<\/p>\n<h3>2.2 Image Normalization<\/h3>\n<p>The pixel values of the image must also be normalized with the same statistics that were used during training. 
The CLIP model typically uses <code>mean=[0.48145466, 0.4578275, 0.40821073]<\/code> and <code>std=[0.26862954, 0.26130258, 0.27577711]<\/code> for per-channel (RGB) normalization.<\/p>\n<h3>2.3 Text Tokenization<\/h3>\n<p>The text must be encoded using the model&#8217;s pretrained tokenizer. CLIP uses a byte pair encoding (BPE) tokenizer to convert the text into integer token IDs.<\/p>\n<h2>3. Code Example<\/h2>\n<p>Now let&#8217;s implement the above preprocessing steps in Python code. In this example, we will use the Hugging Face <code>transformers<\/code> library and the <code>PIL<\/code> library. The <code>CLIPProcessor<\/code> class bundles the image processor and the tokenizer, so resizing, normalization, and tokenization are handled for us. First, we will install the necessary libraries.<\/p>\n<pre><code>pip install transformers torch torchvision pillow requests<\/code><\/pre>\n<h3>3.1 Image Preprocessing Code<\/h3>\n<pre><code>\nfrom PIL import Image\nimport requests\nfrom transformers import CLIPProcessor\n\n# Load CLIPProcessor (image processor + tokenizer)\nprocessor = CLIPProcessor.from_pretrained(\"openai\/clip-vit-base-patch16\")\n\n# Image URL\nimage_url = \"https:\/\/example.com\/image.jpg\"  # Change to a valid image URL.\nimage = Image.open(requests.get(image_url, stream=True).raw)\n\n# Preprocess the image: resize, center-crop, rescale, and normalize\ninputs = processor(images=image, return_tensors=\"pt\")\nprint(inputs)\n    <\/code><\/pre>\n<h3>3.2 Text Preprocessing Code<\/h3>\n<pre><code>\n# Text input\ntext = \"A label describing the image\"\n# Tokenize the text; padding aligns lengths when several texts are batched\ntext_inputs = processor(text=[text], return_tensors=\"pt\", padding=True)\nprint(text_inputs)\n    <\/code><\/pre>\n<h2>4. Model Prediction<\/h2>\n<p>Once the image and text are preprocessed, you can input them into the model for prediction. 
You can proceed with the prediction using Hugging Face&#8217;s <code>CLIPModel<\/code> as follows.<\/p>\n<pre><code>\nimport torch\nfrom transformers import CLIPModel\n\n# Load CLIP model\nmodel = CLIPModel.from_pretrained(\"openai\/clip-vit-base-patch16\")\n\n# Run a forward pass on the image and text without tracking gradients\nwith torch.no_grad():\n    outputs = model(**inputs, **text_inputs)\n\n# Similarity scores between the image and text\nlogits_per_image = outputs.logits_per_image  # shape: (num_images, num_texts)\nlogits_per_text = outputs.logits_per_text    # shape: (num_texts, num_images)\nprint(logits_per_image)\nprint(logits_per_text)\n    <\/code><\/pre>\n<h2>5. Conclusion<\/h2>\n<p>In this post, we explored the preprocessing steps for the CLIP model using Hugging Face&#8217;s transformers. Preprocessing images and text the same way as during training is essential for the model to produce meaningful similarity scores. With these steps, you can use the CLIP model to score how well a text description matches an image.<\/p>\n<h2>6. References<\/h2>\n<ul>\n<li><a href=\"https:\/\/huggingface.co\/docs\/transformers\/model_doc\/clip\">Hugging Face CLIP Documentation<\/a><\/li>\n<li><a href=\"https:\/\/openai.com\/research\/clip\">OpenAI CLIP Paper<\/a><\/li>\n<\/ul>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the latest trends in deep learning is the emergence of various multi-modal models. In particular, OpenAI&#8217;s CLIP (Contrastive Language\u2013Image Pretraining) model is a very powerful methodology that learns the relationship between images and text, allowing it to perform various tasks. 
In this article, we will explore how to utilize the CLIP model through &hellip; <a href=\"https:\/\/atmokpo.com\/w\/36103\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;In-depth Course on Hugging Face Transformers, CLIP Preprocessing&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[108],"tags":[],"class_list":["post-36103","post","type-post","status-publish","format-standard","hentry","category---en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>In-depth Course on Hugging Face Transformers, CLIP Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/36103\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"In-depth Course on Hugging Face Transformers, CLIP Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"One of the latest trends in deep learning is the emergence of various multi-modal models. In particular, OpenAI&#8217;s CLIP (Contrastive Language\u2013Image Pretraining) model is a very powerful methodology that learns the relationship between images and text, allowing it to perform various tasks. 
In this article, we will explore how to utilize the CLIP model through &hellip; \ub354 \ubcf4\uae30 &quot;In-depth Course on Hugging Face Transformers, CLIP Preprocessing&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/36103\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:45:47+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/36103\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36103\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"In-depth Course on Hugging Face Transformers, CLIP Preprocessing\",\"datePublished\":\"2024-11-01T09:45:47+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36103\/\"},\"wordCount\":374,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Using Hugging Face\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/36103\/\",\"url\":\"https:\/\/atmokpo.com\/w\/36103\/\",\"name\":\"In-depth Course on Hugging Face Transformers, CLIP Preprocessing - 
\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:45:47+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/36103\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/36103\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/36103\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"In-depth Course on Hugging Face Transformers, CLIP Preprocessing\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\
",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"In-depth Course on Hugging Face Transformers, CLIP Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/36103\/","og_locale":"ko_KR","og_type":"article","og_title":"In-depth Course on Hugging Face Transformers, CLIP Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"One of the latest trends in deep learning is the emergence of various multi-modal models. In particular, OpenAI&#8217;s CLIP (Contrastive Language\u2013Image Pretraining) model is a very powerful methodology that learns the relationship between images and text, allowing it to perform various tasks. 
In this article, we will explore how to utilize the CLIP model through &hellip; \ub354 \ubcf4\uae30 \"In-depth Course on Hugging Face Transformers, CLIP Preprocessing\"","og_url":"https:\/\/atmokpo.com\/w\/36103\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:45:47+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"3\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/36103\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/36103\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"In-depth Course on Hugging Face Transformers, CLIP Preprocessing","datePublished":"2024-11-01T09:45:47+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/36103\/"},"wordCount":374,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Using Hugging Face"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/36103\/","url":"https:\/\/atmokpo.com\/w\/36103\/","name":"In-depth Course on Hugging Face Transformers, CLIP Preprocessing - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:45:47+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/36103\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/36103\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/36103\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"In-depth Course on Hugging Face Transformers, CLIP 
Preprocessing"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about
":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=36103"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36103\/revisions"}],"predecessor-version":[{"id":36104,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/36103\/revisions\/36104"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=36103"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=36103"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=36103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}