{"id":35769,"date":"2024-11-01T09:42:26","date_gmt":"2024-11-01T09:42:26","guid":{"rendered":"http:\/\/atmokpo.com\/w\/?p=35769"},"modified":"2024-11-01T11:10:56","modified_gmt":"2024-11-01T11:10:56","slug":"machine-learning-and-deep-learning-algorithm-trading-generating-doc2vec-input-from-yelp-sentiment-data","status":"publish","type":"post","link":"https:\/\/atmokpo.com\/w\/35769\/","title":{"rendered":"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data"},"content":{"rendered":"<p><body><\/p>\n<p>Machine learning and deep learning are now widely used in financial markets. This article shows how to turn Yelp review text into doc2vec document vectors for sentiment analysis and how those vectors can feed a trading algorithm.<\/p>\n<h2>1. Importance of Machine Learning and Deep Learning<\/h2>\n<p>Machine learning and deep learning models are well suited to analyzing and making predictions from large datasets. In financial trading in particular, market data alone is often not enough: unstructured data such as social media posts, news, and reviews can also influence price movements. Models built on such data can support trading decisions.<\/p>\n<h2>2. Introduction to Yelp Sentiment Data<\/h2>\n<p>Yelp is a platform where users review restaurants and other businesses; each review includes text, a rating, and user information. Sentiment analysis of Yelp data can reveal patterns in positive and negative reviews, which can then be tested as predictive indicators for stock prices.<\/p>\n<h2>3. Introduction and Necessity of doc2vec<\/h2>\n<p>Doc2vec is a technique that represents the meaning of a document as a fixed-length vector learned from the words it contains. It builds on word embedding methods such as word2vec and generates a unique vector for each document. 
This vectorization gives downstream machine learning models dense numeric features in place of raw text, which often improves their performance.<\/p>\n<h3>3.1 Structure of the doc2vec Model<\/h3>\n<p>Doc2vec has two main variants: Distributed Bag of Words (DBOW) and Distributed Memory (DM). DBOW ignores word order and context: it trains each document&#8217;s vector to predict words sampled from that document. DM combines the document vector with the vectors of the surrounding context words to predict a target word, so the document vector supplies information that the local context lacks.<\/p>\n<h2>4. Data Collection<\/h2>\n<p>Yelp review data can be collected through an API or by web scraping. The example below gathers reviews using Python&#8217;s <code>requests<\/code> library and <code>BeautifulSoup<\/code>. The <code>comment<\/code> class name is a placeholder; the selector must match the page&#8217;s actual markup.<\/p>\n<pre><code>import requests\nfrom bs4 import BeautifulSoup\n\ndef fetch_yelp_reviews(business_id):\n    # Download the business page and fail fast on HTTP errors\n    url = f'https:\/\/www.yelp.com\/biz\/{business_id}'\n    response = requests.get(url, timeout=10)\n    response.raise_for_status()\n    soup = BeautifulSoup(response.text, 'html.parser')\n    # 'comment' is a placeholder class; adjust it to the page's real markup\n    reviews = soup.find_all('p', class_='comment')\n    return [review.get_text(strip=True) for review in reviews]\n\nbusiness_id = \"example-business-id\"\nreviews = fetch_yelp_reviews(business_id)\nprint(reviews)<\/code><\/pre>\n<h2>5. Data Preprocessing<\/h2>\n<p>The collected reviews must be preprocessed before they are passed to the doc2vec model. 
This includes text cleaning, tokenization, stopword removal, and stemming.<\/p>\n<pre><code>import nltk\nfrom nltk.corpus import stopwords\nfrom nltk.stem import PorterStemmer\n\nnltk.download('stopwords')\n# Tokenizer data for word_tokenize (recent NLTK releases use 'punkt_tab')\nnltk.download('punkt')\nstop_words = set(stopwords.words('english'))\nstemmer = PorterStemmer()\n\ndef preprocess_reviews(reviews):\n    processed_reviews = []\n    for review in reviews:\n        # Lowercase and tokenize, then drop punctuation and stopwords and stem\n        tokens = nltk.word_tokenize(review.lower())\n        filtered_tokens = [stemmer.stem(word) for word in tokens if word.isalnum() and word not in stop_words]\n        processed_reviews.append(filtered_tokens)\n    return processed_reviews\n\ncleaned_reviews = preprocess_reviews(reviews)\nprint(cleaned_reviews)<\/code><\/pre>\n<h2>6. Training the doc2vec Model<\/h2>\n<p>Next, we train the doc2vec model on the preprocessed reviews, using the Doc2Vec class from Gensim.<\/p>\n<pre><code>from gensim.models.doc2vec import Doc2Vec, TaggedDocument\n\n# Tag each tokenized review with its index so the model learns one vector per review\ndocuments = [TaggedDocument(words=review, tags=[str(i)]) for i, review in enumerate(cleaned_reviews)]\n\n# min_count=2 drops words that appear only once in the corpus\nmodel = Doc2Vec(vector_size=50, min_count=2, epochs=40)\nmodel.build_vocab(documents)\nmodel.train(documents, total_examples=model.corpus_count, epochs=model.epochs)\n\n# Infer a vector for a new, already-preprocessed document\nvector = model.infer_vector(['great', 'food'])\nprint(vector)<\/code><\/pre>\n<h2>7. Designing Trading Strategies<\/h2>\n<p>The document vectors produced by the trained doc2vec model can then be used to design trading strategies. For instance, they can serve as features in a return prediction model based on sentiment indices.<\/p>\n<h3>7.1 Structure of the Prediction Model<\/h3>\n<p>A trading model generally has the following structure:<\/p>\n<ul>\n<li>Data collection and preprocessing<\/li>\n<li>Feature vector generation (including document vectors)<\/li>\n<li>Model training (regression or classification model)<\/li>\n<li>Model evaluation and optimization<\/li>\n<li>Real-time trading execution<\/li>\n<\/ul>\n<h2>8. 
Model Evaluation<\/h2>\n<p>The trained model&#8217;s performance should be evaluated on a held-out test dataset. Commonly used metrics include RMSE and MAPE for regression models and accuracy for classification models.<\/p>\n<pre><code>from sklearn.metrics import mean_squared_error\nfrom math import sqrt\n\n# Compare predicted values with actual values\ny_true = [2.5, 3.0, 4.5] # Actual values\ny_pred = [2.0, 3.5, 4.0] # Predicted values\n\nrmse = sqrt(mean_squared_error(y_true, y_pred))\nprint(f'RMSE: {rmse}')<\/code><\/pre>\n<h2>9. Conclusion<\/h2>\n<p>This tutorial walked through generating doc2vec vectors from Yelp sentiment data for use as input to machine learning and deep learning models. Such vectors can serve as an additional signal for algorithmic trading, alongside market data. Readers can use the same steps as a starting point for building and tuning their own trading algorithms.<\/p>\n<h2>10. References<\/h2>\n<ul>\n<li>Le, Q. &amp; Mikolov, T. (2014). Distributed Representations of Sentences and Documents. <i>ICML<\/i>.<\/li>\n<li>Gensim Documentation. (n.d.). <a href=\"https:\/\/radimrehurek.com\/gensim\/\" target=\"_blank\" rel=\"noopener\">Gensim<\/a>.<\/li>\n<li>NLTK Documentation. (n.d.). <a href=\"http:\/\/www.nltk.org\/\" target=\"_blank\" rel=\"noopener\">NLTK<\/a>.<\/li>\n<\/ul>\n<h2>11. Additional Exercises<\/h2>\n<p>As further practice, try collecting Yelp data, generating vectors with doc2vec, and designing trading algorithms on top of them.<\/p>\n<ul>\n<li>Collect and compare data from different business categories<\/li>\n<li>Tune hyperparameters to improve the prediction model<\/li>\n<li>Test and optimize the model with real-time data included<\/li>\n<\/ul>\n<p>Thank you!<\/p>\n<p><\/body><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, machine learning and deep learning play a vital role in financial markets. 
This article focuses on processing Yelp review data using doc2vec, which plays an important role in sentiment analysis, and applying it to trading algorithms. 1. Importance of Machine Learning and Deep Learning Machine learning and deep learning technologies demonstrate outstanding performance in &hellip; <a href=\"https:\/\/atmokpo.com\/w\/35769\/\" class=\"more-link\">\ub354 \ubcf4\uae30<span class=\"screen-reader-text\"> &#8220;Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-35769","post","type-post","status-publish","format-standard","hentry","category-deep-learning-automated-trading"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.2 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/atmokpo.com\/w\/35769\/\" \/>\n<meta property=\"og:locale\" content=\"ko_KR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"og:description\" content=\"Today, machine learning and deep learning play a vital role in financial markets. 
This article focuses on processing Yelp review data using doc2vec, which plays an important role in sentiment analysis, and applying it to trading algorithms. 1. Importance of Machine Learning and Deep Learning Machine learning and deep learning technologies demonstrate outstanding performance in &hellip; \ub354 \ubcf4\uae30 &quot;Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data&quot;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/atmokpo.com\/w\/35769\/\" \/>\n<meta property=\"og:site_name\" content=\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\" \/>\n<meta property=\"article:published_time\" content=\"2024-11-01T09:42:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-01T11:10:56+00:00\" \/>\n<meta name=\"author\" content=\"root\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:site\" content=\"@bebubo4\" \/>\n<meta name=\"twitter:label1\" content=\"\uae00\uc4f4\uc774\" \/>\n\t<meta name=\"twitter:data1\" content=\"root\" \/>\n\t<meta name=\"twitter:label2\" content=\"\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04\" \/>\n\t<meta name=\"twitter:data2\" content=\"4\ubd84\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/atmokpo.com\/w\/35769\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/35769\/\"},\"author\":{\"name\":\"root\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\"},\"headline\":\"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment 
Data\",\"datePublished\":\"2024-11-01T09:42:26+00:00\",\"dateModified\":\"2024-11-01T11:10:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/atmokpo.com\/w\/35769\/\"},\"wordCount\":589,\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"articleSection\":[\"Deep learning Automated trading\"],\"inLanguage\":\"ko-KR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/atmokpo.com\/w\/35769\/\",\"url\":\"https:\/\/atmokpo.com\/w\/35769\/\",\"name\":\"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"isPartOf\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#website\"},\"datePublished\":\"2024-11-01T09:42:26+00:00\",\"dateModified\":\"2024-11-01T11:10:56+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/atmokpo.com\/w\/35769\/#breadcrumb\"},\"inLanguage\":\"ko-KR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/atmokpo.com\/w\/35769\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/atmokpo.com\/w\/35769\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\ud648\",\"item\":\"https:\/\/atmokpo.com\/w\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment 
Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/atmokpo.com\/w\/#website\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/atmokpo.com\/w\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ko-KR\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/atmokpo.com\/w\/#organization\",\"name\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\",\"url\":\"https:\/\/atmokpo.com\/w\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"contentUrl\":\"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png\",\"width\":400,\"height\":400,\"caption\":\"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8\"},\"image\":{\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/bebubo4\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7\",\"name\":\"root\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ko-KR\",\"@id\":\"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g\",\"caption\":\"root\"},\"sameAs\":[\"http:\/\/atmokpo.com\/w\"],\"url\":\"https:\/\/atmokpo.com\/w\/author\/root\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/atmokpo.com\/w\/35769\/","og_locale":"ko_KR","og_type":"article","og_title":"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","og_description":"Today, machine learning and deep learning play a vital role in financial markets. This article focuses on processing Yelp review data using doc2vec, which plays an important role in sentiment analysis, and applying it to trading algorithms. 1. Importance of Machine Learning and Deep Learning Machine learning and deep learning technologies demonstrate outstanding performance in &hellip; \ub354 \ubcf4\uae30 \"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data\"","og_url":"https:\/\/atmokpo.com\/w\/35769\/","og_site_name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","article_published_time":"2024-11-01T09:42:26+00:00","article_modified_time":"2024-11-01T11:10:56+00:00","author":"root","twitter_card":"summary_large_image","twitter_creator":"@bebubo4","twitter_site":"@bebubo4","twitter_misc":{"\uae00\uc4f4\uc774":"root","\uc608\uc0c1 \ub418\ub294 \ud310\ub3c5 \uc2dc\uac04":"4\ubd84"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/atmokpo.com\/w\/35769\/#article","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/35769\/"},"author":{"name":"root","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7"},"headline":"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment 
Data","datePublished":"2024-11-01T09:42:26+00:00","dateModified":"2024-11-01T11:10:56+00:00","mainEntityOfPage":{"@id":"https:\/\/atmokpo.com\/w\/35769\/"},"wordCount":589,"publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"articleSection":["Deep learning Automated trading"],"inLanguage":"ko-KR"},{"@type":"WebPage","@id":"https:\/\/atmokpo.com\/w\/35769\/","url":"https:\/\/atmokpo.com\/w\/35769\/","name":"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment Data - \ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","isPartOf":{"@id":"https:\/\/atmokpo.com\/w\/#website"},"datePublished":"2024-11-01T09:42:26+00:00","dateModified":"2024-11-01T11:10:56+00:00","breadcrumb":{"@id":"https:\/\/atmokpo.com\/w\/35769\/#breadcrumb"},"inLanguage":"ko-KR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/atmokpo.com\/w\/35769\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/atmokpo.com\/w\/35769\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\ud648","item":"https:\/\/atmokpo.com\/w\/en\/"},{"@type":"ListItem","position":2,"name":"Machine Learning and Deep Learning Algorithm Trading, Generating doc2vec Input from Yelp Sentiment 
Data"}]},{"@type":"WebSite","@id":"https:\/\/atmokpo.com\/w\/#website","url":"https:\/\/atmokpo.com\/w\/","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","description":"","publisher":{"@id":"https:\/\/atmokpo.com\/w\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/atmokpo.com\/w\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ko-KR"},{"@type":"Organization","@id":"https:\/\/atmokpo.com\/w\/#organization","name":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8","url":"https:\/\/atmokpo.com\/w\/","logo":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/","url":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","contentUrl":"https:\/\/atmokpo.com\/w\/wp-content\/uploads\/2024\/11\/logo.png","width":400,"height":400,"caption":"\ub77c\uc774\ube0c\uc2a4\ub9c8\ud2b8"},"image":{"@id":"https:\/\/atmokpo.com\/w\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/bebubo4"]},{"@type":"Person","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/91b6b3b138fbba0efb4ae64b1abd81d7","name":"root","image":{"@type":"ImageObject","inLanguage":"ko-KR","@id":"https:\/\/atmokpo.com\/w\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/708197b41fc6435a7ce22d951b25d4a47e9e904270cb1f04682d4f025066f80c?s=96&d=mm&r=g","caption":"root"},"sameAs":["http:\/\/atmokpo.com\/w"],"url":"https:\/\/atmokpo.com\/w\/author\/root\/"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/35769","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts"}],"about":[{"href
":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/comments?post=35769"}],"version-history":[{"count":1,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/35769\/revisions"}],"predecessor-version":[{"id":35770,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/posts\/35769\/revisions\/35770"}],"wp:attachment":[{"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/media?parent=35769"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/categories?post=35769"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/atmokpo.com\/w\/wp-json\/wp\/v2\/tags?post=35769"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}