Bitcoin Price Prediction Using Machine Learning
Bitcoin has established itself as one of the most popular assets in the financial market in recent years.
Many investors aim to leverage the volatility of Bitcoin’s price to generate profits.
In this course, we will learn how to predict the short-term price movements of Bitcoin using deep learning and machine learning techniques.
In particular, we will focus on the process of predicting Bitcoin prices using regression models.
1. Data Preparation
The dataset used for predicting Bitcoin prices mainly contains, for each time period, the open, high, low, and close (OHLC) prices together with the trading volume.
Real-time and historical data can be collected through APIs such as those of the exchange Binance or the market-data aggregator CoinMarketCap.
In this course, we use historical price data saved in a CSV file.
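import numpy as np
import pandas as pd
# Read historical price data previously saved to a CSV file (e.g., exported from the Binance API).
# The file is assumed to contain Date, Open, High, Low, Close, and Volume columns.
df = pd.read_csv('bitcoin_price.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Make sure rows are in chronological order; the time-series split below relies on it.
df.sort_index(inplace=True)
# Keep only the columns the model uses.
data = df[['Open', 'High', 'Low', 'Close', 'Volume']]
data.head()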
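The loading code assumes the CSV file already exists. One way to produce it is sketched below, assuming Binance's public `/api/v3/klines` candlestick endpoint, whose rows start with open time (milliseconds), open, high, low, close, and volume, with numbers sent as strings. The helper names `fetch_daily_klines` and `klines_to_frame` are hypothetical, and the endpoint details should be checked against Binance's current API documentation.

```python
import json
import urllib.request

import pandas as pd

# Assumed public Binance REST endpoint for candlestick ("kline") data.
KLINES_URL = "https://api.binance.com/api/v3/klines"


def fetch_daily_klines(symbol="BTCUSDT", limit=500):
    """Download up to `limit` daily candles for `symbol` (requires network access)."""
    url = f"{KLINES_URL}?symbol={symbol}&interval=1d&limit={limit}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def klines_to_frame(rows):
    """Convert raw kline rows into the Date/Open/High/Low/Close/Volume layout used above."""
    # Each row is assumed to start with: open time (ms), open, high, low, close, volume, ...
    frame = pd.DataFrame(
        [row[:6] for row in rows],
        columns=["Date", "Open", "High", "Low", "Close", "Volume"],
    )
    frame["Date"] = pd.to_datetime(frame["Date"], unit="ms")
    # Prices and volume arrive as strings, so convert them to floats.
    value_cols = ["Open", "High", "Low", "Close", "Volume"]
    frame[value_cols] = frame[value_cols].astype(float)
    return frame


# Usage (network required):
# klines_to_frame(fetch_daily_klines()).to_csv("bitcoin_price.csv", index=False)
```

Keeping the download and the conversion in separate functions makes the conversion easy to check offline with a few hand-written rows.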
2. Data Preprocessing
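Data preprocessing is crucial for the performance of machine learning models.
Typical steps include handling missing values, scaling features, and merging data from different sources.
Because prices form a time series, past prices carry information about future prices. We therefore turn the data into sliding windows: each input sample holds the features of the previous time_step periods, and the target is the closing price of the period that follows.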
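The sliding-window idea can be seen on a tiny made-up price series. The hypothetical helper `make_windows` below works on a single column; the course's `create_dataset` does the same thing on all five scaled columns and uses Close as the target.

```python
import numpy as np


def make_windows(series, time_step):
    """Pair each run of `time_step` consecutive values with the value that follows it."""
    X, y = [], []
    for i in range(len(series) - time_step):
        X.append(series[i:i + time_step])  # inputs: the previous time_step values
        y.append(series[i + time_step])    # target: the next value
    return np.array(X), np.array(y)


prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # made-up prices
X, y = make_windows(prices, time_step=3)
# X rows: [10, 11, 12], [11, 12, 13], [12, 13, 14]
# y:      [13, 14, 15]
```

A series of length n yields n - time_step complete windows, so every observation after the first window is used once as a target.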
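# Handling missing values: carry the last known value forward,
# then drop any leading rows that have no earlier value to carry.
data = data.ffill().dropna()
# Data normalization: scale each column to [0, 1].
# Fit the scaler on the first 80% of rows only, so test-period statistics do not leak into training.
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
train_rows = int(len(data) * 0.8)
scaler.fit(data.iloc[:train_rows])
scaled_data = scaler.transform(data)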
# Creating sequential data: each sample holds time_step consecutive rows of all features,
# and the target is the scaled Close price of the row that follows the window.
def create_dataset(dataset, time_step=1):
    X, y = [], []
    for i in range(len(dataset) - time_step):
        X.append(dataset[i:(i + time_step), :])
        y.append(dataset[i + time_step, 3])  # column 3 is Close
    return np.array(X), np.array(y)
# Setting time step
time_step = 10
X, y = create_dataset(scaled_data, time_step)
# Split chronologically (no shuffling): first 80% of samples for training, the rest for testing.
train_size = int(len(X) * 0.8)
X_train, X_test = X[0:train_size], X[train_size:len(X)]
y_train, y_test = y[0:train_size], y[train_size:len(y)]
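Min-max scaling maps each column to x' = (x - min) / (max - min), and its inverse is x = x' * (max - min) + min. The NumPy sketch below mirrors that arithmetic (it is not scikit-learn's implementation) on made-up [Close, Volume] rows. It learns min and max from the training rows only and inverts a single column, which is what is needed later to turn scaled Close predictions back into prices. The helper names are hypothetical.

```python
import numpy as np


def fit_minmax(train):
    """Learn per-column minimum and range from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    return col_min, col_range


def scale(values, col_min, col_range):
    # x' = (x - min) / (max - min), applied column by column
    return (values - col_min) / col_range


def unscale_column(scaled, col_min, col_range, col):
    # Inverse for a single column: x = x' * (max - min) + min
    return scaled * col_range[col] + col_min[col]


# Made-up rows of [Close, Volume]; the first 3 rows play the role of the training period
raw = np.array([[100.0, 5.0], [110.0, 7.0], [120.0, 6.0], [130.0, 8.0]])
split = 3
col_min, col_range = fit_minmax(raw[:split])
scaled = scale(raw, col_min, col_range)
# Close column scales to [0.0, 0.5, 1.0, 1.5]: the test row falls outside [0, 1],
# which is expected when prices move beyond the training range.
restored = unscale_column(scaled[:, 0], col_min, col_range, col=0)
```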
3. Model Building
We will use LSTM (Long Short-Term Memory) networks to learn from the time series data.
LSTM is a type of recurrent neural network (RNN) whose gated memory cell lets it retain information across many time steps, which makes it well suited to learning patterns in time series data.
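To make the gating concrete, the sketch below runs one LSTM cell over a single 10-step window using the standard textbook equations: input gate i, forget gate f, and output gate o use a sigmoid, the candidate memory g uses tanh, c_t = f * c_{t-1} + i * g, and h_t = o * tanh(c_t). The weights are random and the window is made up; this illustrates the mechanism and is not Keras's internal implementation or weight layout.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: input at this step, shape (n_in,)
    h, c: previous hidden and cell state, shape (n_units,)
    W: (n_in, 4*n_units), U: (n_units, 4*n_units), b: (4*n_units,)
    Gate order in the stacked weights (a choice made for this sketch): input, forget, candidate, output.
    """
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g       # forget part of the old memory, add new candidate memory
    h_new = o * np.tanh(c_new)  # expose a gated view of the memory as the output
    return h_new, c_new


rng = np.random.default_rng(0)
n_in, n_units, time_step = 5, 4, 10  # 5 features (OHLCV) and a 10-step window
W = rng.normal(scale=0.1, size=(n_in, 4 * n_units))
U = rng.normal(scale=0.1, size=(n_units, 4 * n_units))
b = np.zeros(4 * n_units)

window = rng.random((time_step, n_in))  # one made-up, already-scaled input window
h = np.zeros(n_units)
c = np.zeros(n_units)
for x_t in window:  # the same weights are reused at every time step
    h, c = lstm_step(x_t, h, c, W, U, b)
```

With return_sequences=True, a Keras LSTM layer passes the h from every step to the next layer; otherwise only the final h is passed on.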
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))  # Predicted (scaled) Close price for the next period
model.compile(optimizer='adam', loss='mean_squared_error')
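A quick sanity check on the architecture is its parameter count. Each LSTM layer has 4 gates, each with an input kernel, a recurrent kernel, and a bias, giving 4 * (units * (input_dim + units) + units) parameters; Dropout layers add none. The arithmetic below assumes Keras's default bias settings and should match the totals reported by model.summary() for the stack above. Note that the count does not depend on time_step.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


n_features = 5  # Open, High, Low, Close, Volume
layers = [
    ("lstm_1", lstm_params(n_features, 50)),  # 4 * (50 * 55 + 50)
    ("lstm_2", lstm_params(50, 50)),          # 4 * (50 * 100 + 50)
    ("lstm_3", lstm_params(50, 50)),
    ("dense", dense_params(50, 1)),
]
total = sum(count for _, count in layers)
for name, count in layers:
    print(f"{name}: {count}")
print(f"total: {total}")  # total: 51651
```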
4. Model Training
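Now, let’s train the model.
Rather than relying on a fixed number of epochs, we hold out the last 10% of the training samples for validation and stop once the validation loss stops improving, keeping the best weights seen.
# Model training
from keras.callbacks import EarlyStopping
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.1, callbacks=[early_stop])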
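One common way to decide how many epochs are "sufficient" is early stopping with a patience rule: stop once the validation loss has failed to improve for `patience` consecutive epochs. Keras provides this as the `EarlyStopping` callback; the plain-Python sketch below only mirrors the idea (it is not Keras's source), and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (stop_epoch, best_epoch) under a simple patience rule.

    Training stops once the validation loss has not improved on the best value
    by more than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0  # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # never triggered: ran all epochs


# Made-up validation losses: improvement stalls after epoch 2
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.69]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

Here training would stop at epoch 5, and restoring the best weights would return the model to its state after epoch 2, even though epoch 6 would have been slightly better. That is the usual trade-off of choosing a patience value.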
5. Model Evaluation
We will evaluate the trained model on the test dataset (the last 20% of samples, which the model did not see during training).
To measure predictive performance, we use RMSE (Root Mean Squared Error), computed in price units after undoing the scaling.
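RMSE is sqrt(mean((predicted - actual)^2)), so it is expressed in the same units as the prices. On its own, an RMSE value is hard to judge. A simple reference point, added here and not part of the original course, is a persistence baseline that predicts each period's close to equal the previous close. If the model's test RMSE is not lower than the baseline's on the same test days, the model has learned little beyond "tomorrow looks like today". The prices below are made up.

```python
import numpy as np


def rmse(predicted, actual):
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Made-up closing prices
closes = np.array([100.0, 102.0, 101.0, 105.0, 107.0])

# Persistence baseline: predict that each close equals the previous close
baseline_pred = closes[:-1]
targets = closes[1:]
print(rmse(baseline_pred, targets))  # sqrt((4 + 1 + 16 + 4) / 4) = 2.5
```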
import numpy as np
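# Predictions (in scaled units)
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
# Inverse scaling: the scaler expects all 5 columns, so place the values in the Close
# column (index 3) of a zero matrix, invert, and read the Close column back.
def invert_close(values, close_idx=3):
    padded = np.zeros((len(values), scaler.n_features_in_))
    padded[:, close_idx] = np.ravel(values)
    return scaler.inverse_transform(padded)[:, close_idx]
train_predict = invert_close(train_predict)
test_predict = invert_close(test_predict)
y_train_actual = invert_close(y_train)
y_test_actual = invert_close(y_test)
# Calculating RMSE in price units
train_rmse = np.sqrt(np.mean((train_predict - y_train_actual) ** 2))
test_rmse = np.sqrt(np.mean((test_predict - y_test_actual) ** 2))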
print(f'Train RMSE: {train_rmse}')
print(f'Test RMSE: {test_rmse}')
6. Visualization of Prediction Results
Finally, we plot the actual and predicted closing prices over time.
A visual comparison complements the RMSE: it shows, for example, whether the predictions follow turning points or merely lag behind the actual prices.
import matplotlib.pyplot as plt
# Visualization
plt.figure(figsize=(14, 5))
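# Each target y[i] is the Close of row i + time_step, so the dates are shifted by time_step.
train_dates = data.index[time_step:time_step + len(y_train)]
test_dates = data.index[time_step + len(y_train):time_step + len(y_train) + len(y_test)]
plt.plot(train_dates, data['Close'].values[time_step:time_step + len(y_train)], label='Actual Price (Train)', color='blue')
plt.plot(test_dates, data['Close'].values[time_step + len(y_train):time_step + len(y_train) + len(y_test)], label='Actual Price (Test)', color='green')
plt.plot(train_dates, train_predict, label='Predicted Price (Train)', color='red')
plt.plot(test_dates, test_predict, label='Predicted Price (Test)', color='orange')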
plt.title('Bitcoin Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
Conclusion
In this course, we learned how to build a Bitcoin price prediction model using deep learning and machine learning.
Using an LSTM model, we learned patterns from past price data and used them to predict future closing prices.
Comparing this model against alternatives, such as other architectures, additional input features, or a simple baseline, is a natural next step toward better predictive performance.
When building an automated trading system for Bitcoin, price prediction is one important component, and this process can help inform investment decisions.