Machine Learning and Deep Learning Algorithmic Trading: Scraping Yahoo Finance Data with yfinance

Modern financial markets increasingly rely on data-driven decision-making. Advances in machine learning and deep learning have changed how trading strategies are developed and optimized. In this course, we will explore in detail how to scrape financial data from Yahoo Finance using the yfinance library and how to train machine learning and deep learning models on it.

1. Importance of Machine Learning and Deep Learning in Trading

Machine learning and deep learning have established themselves as powerful tools for analyzing data and making predictions. The following approaches are used to build models that can predict price movements of stocks, options, and other financial products:

Supervised Learning: Learns from past data and price movements to predict future prices.
Unsupervised Learning: Explores potential trading opportunities by clustering data or discovering patterns.
Reinforcement Learning: An agent interacts with the environment and optimizes strategies through rewards.

2. Installing and Basic Usage of the yfinance Library

yfinance is an open-source Python library that provides access to Yahoo Finance data. With a few function calls, it retrieves stock prices, trading volumes, dividends, and other financial data.

2.1 Installing the Library

pip install yfinance

2.2 Basic Data Retrieval

Now, let’s look at a basic code snippet to retrieve financial data using yfinance.

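import pandas as pd
import yfinance as yf

# Download stock data based on ticker symbol.
# auto_adjust=False keeps the separate 'Adj Close' column described below.
ticker = 'AAPL'
data = yf.download(ticker, start='2020-01-01', end='2023-01-01', auto_adjust=False)

# Recent yfinance versions return two-level columns (field, ticker); flatten them.
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
print(data.head())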

2.3 Data Description

The code above downloads daily stock data for Apple Inc. (AAPL) from January 1, 2020, up to but not including January 1, 2023, because the end date is exclusive. The data consists of the following columns:

Open: Opening price
High: Highest price
Low: Lowest price
Close: Closing price
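Adj Close: Closing price adjusted for splits and dividends (absent when yfinance auto-adjusts prices, which is the default in recent versions)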
Volume: Trading volume
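
To make the column layout concrete, here is a minimal sketch that builds a small frame with the same six columns and derives two simple quantities from it. The prices are invented for illustration and are not real AAPL quotes; the names prices, daily_return, and range_pct are assumptions, not part of yfinance.

```python
import pandas as pd

# Hypothetical prices with the same columns as the download above; not real quotes.
dates = pd.date_range("2022-01-03", periods=5, freq="B")
prices = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 102.0, 101.5, 103.0],
        "High": [101.5, 102.5, 103.0, 103.5, 104.0],
        "Low": [99.5, 100.5, 101.0, 101.0, 102.5],
        "Close": [101.0, 102.0, 101.5, 103.0, 103.5],
        "Adj Close": [100.0, 101.0, 100.5, 102.0, 102.5],
        "Volume": [1_000_000, 1_200_000, 900_000, 1_100_000, 950_000],
    },
    index=dates,
)

# Daily percentage change of the adjusted close; the first row has no prior day.
prices["daily_return"] = prices["Adj Close"].pct_change()

# Intraday range as a fraction of the close, a simple per-row sanity check.
prices["range_pct"] = (prices["High"] - prices["Low"]) / prices["Close"]

print(prices[["Adj Close", "daily_return", "range_pct"]])
```

The same two lines of arithmetic should apply unchanged to a frame returned by yf.download, provided its columns carry these names.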

3. Data Preprocessing for Building Machine Learning Models

Before the data can be fed into machine learning models, it needs to be preprocessed. The main steps are described below.

3.1 Handling Missing Values

Missing values can degrade the model’s performance, so it’s important to check for and handle them first.

# Check for missing values
print(data.isnull().sum())

# Remove missing values
data = data.dropna()
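
Dropping rows is not the only option. The sketch below compares it with forward-filling, which carries the last known price forward. Forward-filling is an alternative that the text above does not use, and the series here is synthetic, so treat this as an illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with two gaps; values are invented for illustration.
dates = pd.date_range("2022-01-03", periods=6, freq="B")
close = pd.Series(
    [100.0, np.nan, 102.0, 103.0, np.nan, 105.0], index=dates, name="Close"
)

# Option 1: drop rows with missing prices (the approach used in the text).
dropped = close.dropna()

# Option 2 (assumption, not from the original): carry the last known price forward.
filled = close.ffill()

print(f"missing before: {close.isna().sum()}")
print(f"rows after dropna: {len(dropped)}")
print(f"rows after ffill: {len(filled)}")
```

Dropping keeps only observed prices but shortens the series, while forward-filling keeps every date at the cost of repeating stale values.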

3.2 Feature Engineering

Additional features can be created for price prediction. For example, technical indicators such as moving averages or volatility can be included.

data['SMA_20'] = data['Close'].rolling(window=20).mean()
data['SMA_50'] = data['Close'].rolling(window=50).mean()
# Rolling windows leave NaN in the first rows; drop them before training
data = data.dropna()
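
The text mentions volatility as a possible feature, but the code above only builds moving averages. A minimal sketch of a volatility feature follows, assuming volatility is measured as the 20-day rolling standard deviation of daily returns. The prices are a synthetic random walk, and the column names volatility_20 and volatility_20_ann are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk closing prices standing in for data['Close'].
rng = np.random.default_rng(0)
dates = pd.date_range("2022-01-03", periods=120, freq="B")
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=120))), index=dates)

features = pd.DataFrame({"Close": close})
features["return"] = features["Close"].pct_change()

# 20-day rolling standard deviation of daily returns.
features["volatility_20"] = features["return"].rolling(window=20).std()

# Annualize, assuming 252 trading days per year.
features["volatility_20_ann"] = features["volatility_20"] * np.sqrt(252)

# Rolling windows leave NaN in the warm-up rows, so drop them before modeling.
features = features.dropna()
print(features.tail())
```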

3.3 Splitting Training Set and Test Set

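To train and evaluate the model, split the data into training and test sets; an 80:20 split is common. Because prices form a time series, the split below keeps the rows in chronological order (shuffle=False), so the model is never trained on data from after the test period.

from sklearn.model_selection import train_test_split

# Define features and labels
X = data[['SMA_20', 'SMA_50']]
y = data['Close']

# Split into training set (first 80%) and test set (last 20%) without shuffling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)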
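
The features and label above come from the same day, so the model estimates the current close rather than a future one. If the goal is to predict a future price, one option is to shift the label by one day. The sketch below does this on synthetic prices and splits the rows by position; the frame, target, and split_at names are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices standing in for the downloaded data.
rng = np.random.default_rng(1)
dates = pd.date_range("2021-01-04", periods=300, freq="B")
frame = pd.DataFrame(
    {"Close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=300)))}, index=dates
)
frame["SMA_20"] = frame["Close"].rolling(window=20).mean()
frame["SMA_50"] = frame["Close"].rolling(window=50).mean()

# Hypothetical label: tomorrow's close, so features at day t predict day t+1.
frame["target"] = frame["Close"].shift(-1)

# Drop the SMA warm-up rows and the final row, which has no next-day close.
frame = frame.dropna()

# Chronological 80:20 split: the first 80% of rows train, the last 20% test.
split_at = int(len(frame) * 0.8)
train, test = frame.iloc[:split_at], frame.iloc[split_at:]

X_train, y_train = train[["SMA_20", "SMA_50"]], train["target"]
X_test, y_test = test[["SMA_20", "SMA_50"]], test["target"]
print(len(train), len(test), train.index.max() < test.index.min())
```

Every training date precedes every test date, which mirrors how the model would be used in practice: fitted on the past and applied to the future.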

4. Choosing and Training a Machine Learning Model

Now it’s time to select and train a machine learning model based on the data. There are various machine learning algorithms; we will use a linear regression model.

4.1 Model Selection: Linear Regression

from sklearn.linear_model import LinearRegression

# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

4.2 Model Evaluation

To evaluate the performance of the trained model, we can use the test set to check the model’s predictions.

from sklearn.metrics import mean_squared_error

# Predictions
y_pred = model.predict(X_test)

# Calculate MSE
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')  # Output mean squared error
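
MSE alone is hard to interpret because it is expressed in squared price units. The sketch below adds RMSE, MAE, and R², and compares the model against a naive baseline that predicts each day's price as the previous day's price. All values are synthetic, and y_model is only a stand-in for real model output; the report helper is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic prices and predictions, invented for illustration only.
rng = np.random.default_rng(2)
y_true = 100 + np.cumsum(rng.normal(0, 1, size=200))
y_model = y_true + rng.normal(0, 2, size=200)  # stand-in for model.predict(X_test)


def report(actual, predicted):
    """Return MSE, RMSE, MAE, and R^2 for one set of predictions."""
    mse = mean_squared_error(actual, predicted)
    return {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(actual, predicted)),
        "r2": float(r2_score(actual, predicted)),
    }


# Score both on the same days; the naive forecast needs one prior observation.
model_scores = report(y_true[1:], y_model[1:])
naive_scores = report(y_true[1:], y_true[:-1])

for name, scores in [("model", model_scores), ("naive", naive_scores)]:
    print(name, {key: round(value, 3) for key, value in scores.items()})
```

A model that cannot beat the naive baseline on held-out data is unlikely to add value, however small its MSE looks in isolation.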

5. Building a Deep Learning Model

Deep learning models can capture more complex, nonlinear patterns than linear regression, which makes them useful when the relationship between the features and the price is not linear. Let’s build a simple neural network using Keras.

5.1 Installing Keras

Keras is bundled with TensorFlow, so installing the tensorflow package is enough.

pip install tensorflow

5.2 Designing the Deep Learning Model

A multilayer perceptron (MLP) model can be constructed to predict stock prices.

from tensorflow import keras
from tensorflow.keras import layers

# Define the model (named mlp so it does not overwrite the linear regression model)
mlp = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])

# Compile the model
mlp.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
mlp.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1)
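
Neural networks usually train more reliably when the inputs are on a comparable scale, and raw price-level features are not. One common approach, sketched here as an assumption rather than as part of the original workflow, is to standardize the features with scikit-learn's StandardScaler, fitting it on the training rows only. The feature matrix is synthetic.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Synthetic features on a price-like scale; stand-ins for SMA_20 and SMA_50.
X = rng.normal(loc=150.0, scale=20.0, size=(250, 2))

# Chronological split by position, as in the earlier sections.
split_at = int(len(X) * 0.8)
X_train, X_test = X[:split_at], X[split_at:]

scaler = StandardScaler()
# The mean and standard deviation come from the training rows only.
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test rows; do not refit on test data.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

The scaled arrays would then replace X_train and X_test when fitting and evaluating the network. Fitting the scaler on the full data set would leak test-period information into training.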

6. Result Analysis and Visualization

Plotting the predictions against the actual prices makes the model’s errors easier to analyze. The example below uses matplotlib (seaborn works as well) and plots y_pred, the linear regression predictions from section 4.2.

6.1 Visualization Comparing Predicted and Actual Values

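import matplotlib.pyplot as plt
import pandas as pd

# Align actual and predicted values by date and sort chronologically
results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred}, index=y_test.index).sort_index()

# Visualizing actual and predicted values
plt.figure(figsize=(14,7))
plt.plot(results.index, results['Actual'], color='blue', label='Actual Price')
plt.plot(results.index, results['Predicted'], color='red', label='Predicted Price')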
plt.title('Stock Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

7. Conclusion and Future Directions

In this course, we collected financial data with the yfinance library and trained machine learning and deep learning models on it. These techniques can serve as building blocks for an algorithmic trading system; regularly collecting new data and retraining the models helps keep them aligned with current market conditions.

7.1 Learning Tasks

Try applying other machine learning algorithms (e.g., Random Forest or SVM).
Add various features and compare model performance.
Perform hyperparameter tuning to improve deep learning models.
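
As a starting point for the first task, the sketch below fits three scikit-learn regressors on the same chronological split and compares their test MSE. The data is synthetic with a known linear relationship, and the hyperparameters (n_estimators=50, C=10.0) are arbitrary choices for illustration, not tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

# Synthetic features and target, invented for illustration only.
rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Chronological-style split by position: first 80% train, last 20% test.
split_at = int(n * 0.8)
X_train, X_test = X[:split_at], X[split_at:]
y_train, y_test = y[:split_at], y[split_at:]

candidates = {
    "LinearRegression": LinearRegression(),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "SVR": SVR(kernel="rbf", C=10.0),
}

# Fit each candidate on the training rows and score it on the held-out rows.
scores = {}
for name, estimator in candidates.items():
    estimator.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, estimator.predict(X_test))

for name, mse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {mse:.4f}")
```

On real price data, the ranking may differ from what this synthetic example shows, so each candidate should be compared on the same held-out period.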

7.2 Closing Remarks

You now have a basic understanding of algorithmic trading with machine learning and deep learning, and you are ready to collect more data through yfinance and practice on your own. From here, try exploring more advanced techniques. Thank you!