<h1>Lecture on Using Hugging Face Transformers, CLIP Inference</h1>
<p><em>Published 2024-11-01</em></p>
<p>Recently, the CLIP (Contrastive Language-Image Pretraining) model has been attracting attention in artificial intelligence. Because CLIP learns the relationship between natural language and images, it can be used in a wide range of applications. In this lecture, we take a closer look at how to measure the similarity between images and texts using the CLIP model.</p>
<h2>What is the CLIP Model?</h2>
<p>CLIP was developed by OpenAI and trained on a large collection of text-image pairs gathered from the web. The model maps images and texts into a shared embedding space and uses the distance (similarity) between the two embeddings to find the text that best describes a given image or, conversely, the image most relevant to a given text.</p>
<h2>How CLIP Works</h2>
<p>CLIP consists of two main components:</p>
<ol>
<li><strong>Image encoder:</strong> takes an input image and transforms it into a vector that represents the image.</li>
<li><strong>Text encoder:</strong> takes an input text and produces a vector that represents the text.</li>
</ol>
<p>The two encoders have different architectures, but both map into a vector space of the same dimension. The relationship between an image and a text is then assessed through the cosine similarity of their two vectors.</p>
<h2>Example of Using the CLIP Model</h2>
<p>Now let&#8217;s put the CLIP model to work.</p>
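<p>Before running the real model, the mechanics just described can be sketched in plain Python. This is a toy illustration, not CLIP itself: the short vectors stand in for the real embeddings (512-dimensional for the ViT-B checkpoints), and the captions and numbers are invented for the example.</p>

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Turn raw similarity scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 4-dimensional "embeddings" (hypothetical values; real CLIP embeddings
# come from the trained image and text encoders).
image_embedding = [0.2, 0.9, 0.1, 0.4]
text_embeddings = {
    "a photo of a cat": [0.1, 0.8, 0.2, 0.5],
    "a stock market chart": [0.9, 0.1, 0.7, 0.0],
}

# Score every candidate caption against the image, then convert to probabilities.
captions = list(text_embeddings)
scores = [cosine_similarity(image_embedding, text_embeddings[c]) for c in captions]
probs = softmax(scores)

# The caption with the highest similarity "describes" the image best.
best = captions[max(range(len(captions)), key=lambda i: scores[i])]
for caption, p in zip(captions, probs):
    print(f"{caption}: {p:.3f}")
```

<p>The real model additionally multiplies the cosine similarities by a learned temperature (its <code>logit_scale</code> parameter) before the softmax, but the ranking logic is the same.</p>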
<p>Below is example code showing how to download and use the CLIP model with Hugging Face&#8217;s Transformers library.</p>
<pre><code>import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

# Load the CLIP model and its processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

# Define the image and candidate texts
image_path = 'path_to_your_image.jpg'  # Path to the image to be processed
texts = ["a photo of a dog", "a photo of a cat"]  # Candidate descriptions

# Open the image
image = Image.open(image_path)

# Preprocess the inputs
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the image and each candidate text
logits_per_image = outputs.logits_per_image
probs = logits_per_image.softmax(dim=1)  # Convert logits to probabilities

for text, prob in zip(texts, probs[0]):
    print(f"Probability that the image matches '{text}': {prob.item():.4f}")
</code></pre>
<h3>Code Explanation</h3>
<ul>
<li><code>torch</code>: the PyTorch library used to run the model.</li>
<li><code>CLIPProcessor</code>: the preprocessing tool that prepares CLIP&#8217;s inputs.</li>
<li><code>CLIPModel</code>: loads and runs the actual CLIP model.</li>
<li>The image file path and the candidate descriptions should be adapted by the user.</li>
<li>The softmax is taken over the candidate texts, so at least two candidates are needed for meaningful probabilities; with a single text the probability is always 1.0.</li>
<li>Different image-text pairs can be tested one after another.</li>
</ul>
<h2>Interpreting the Results</h2>
<p>Running the code above prints, for each candidate text, the probability that it matches the given image. A higher value indicates a better match between the image and the description.</p>
<h2>Various Applications</h2>
<p>The CLIP model can be applied in many fields.</p>
<p>Here are some examples:</p>
<ul>
<li><strong>Image search:</strong> enter a keyword to retrieve images related to it.</li>
<li><strong>Content filtering:</strong> filter inappropriate content based on what an image depicts.</li>
<li><strong>Social media:</strong> classify user-uploaded images by matching them against hashtags or descriptions.</li>
</ul>
<h2>Conclusion</h2>
<p>CLIP is a powerful tool for understanding the interaction between images and text, and Hugging Face&#8217;s Transformers library makes it easy to use. As more data and improved algorithms are combined in the future, CLIP&#8217;s performance will continue to improve.</p>
<p>I hope this lecture has helped you understand the basic concepts of the CLIP model along with a practical example of its use. If you have any questions or feedback, please feel free to leave a comment!</p>
<footer>
<p>&#169; 2023 &#8211; Hugging Face Transformers Utilization Course.</p>
</footer>