In recent years, Bitcoin has attracted many investors, partly because of its large and rapid price swings. This interest has driven the development of Bitcoin price prediction models based on machine learning and deep learning. This course covers how to use the ARIMA (AutoRegressive Integrated Moving Average) model, a classical statistical method, to forecast the Bitcoin price time series.
1. Overview of the ARIMA Model
The ARIMA model is widely used to capture patterns in time series data and forecast future values. ARIMA consists of the following three components:
- AR (AutoRegressive) part: Models the current value as a linear combination of the series' own past values.
- I (Integrated) part: Differences the series (replaces each value with its change from the previous value) to remove trends and make it stationary.
- MA (Moving Average) part: Models the current value using past forecast errors (shocks).
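To make the three components concrete, the following sketch simulates a series with one AR term, one round of differencing, and one MA term using NumPy alone. The coefficients `phi` and `theta`, the noise scale, and the helper name `simulate_arima_111` are arbitrary assumptions for illustration: the AR and MA parts build a stationary series, and a cumulative sum (the inverse of differencing) adds the integrated part.

```python
import numpy as np

def simulate_arima_111(n, phi=0.5, theta=0.3, sigma=1.0, seed=0):
    """Hypothetical helper: simulate an ARIMA(1, 1, 1) series for illustration."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)   # white-noise shocks (forecast errors)
    w = np.zeros(n)                   # stationary ARMA(1, 1) part, i.e. the differenced series
    for t in range(1, n):
        # AR part: phi * previous value; MA part: theta * previous shock
        w[t] = phi * w[t - 1] + eps[t] + theta * eps[t - 1]
    # I part: integrate (cumulatively sum) the stationary series once, so d = 1
    return np.cumsum(w)

y = simulate_arima_111(500)   # non-stationary level series, like a price
dy = np.diff(y)               # differencing once recovers the ARMA(1, 1) part
print(y.std(), dy.std())
```

Differencing `y` once gives back the stationary component, which is exactly what the I part of ARIMA undoes before fitting the AR and MA terms.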
ARIMA models are expressed in the form ARIMA(p, d, q), where p is the number of autoregressive terms (lags of the series), d is the number of times the series is differenced, and q is the number of moving average terms (lags of the forecast error).
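For reference, one standard way to write ARIMA(p, d, q) uses the lag operator L (defined by L y_t = y_{t-1}), AR coefficients phi_i, MA coefficients theta_j, a constant c, and white-noise errors epsilon_t:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```

For d = 1, the factor (1 - L) y_t = y_t - y_{t-1} is the day-over-day price change, so the model is an ARMA(p, q) model fitted to the changes rather than to the price levels.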
2. Collecting Bitcoin Price Time Series Data
To collect Bitcoin price data, several data provider APIs can be used. In this example, we will use the yfinance library to collect the data. First, install the libraries used throughout this course:
```bash
pip install yfinance pandas matplotlib statsmodels scikit-learn
```
Example Code for Data Collection
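```python
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
# Fetch daily Bitcoin data (the end date is exclusive, so the last row is 2023-09-29)
btc_data = yf.download('BTC-USD', start='2020-01-01', end='2023-09-30')
# Recent yfinance versions may return 'Close' as a one-column DataFrame; squeeze() turns it into a Series
btc_data['Close'].squeeze().plot(title='Bitcoin Closing Prices', fontsize=14)
plt.show()
```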
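ARIMA assumes evenly spaced observations. The sketch below, which uses a small synthetic series rather than downloaded data, shows one way to enforce a daily frequency with pandas `asfreq('D')` and fill any missing days by carrying the last price forward. Whether forward-filling is appropriate for your data is an assumption, and `to_daily` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def to_daily(series: pd.Series) -> pd.Series:
    """Hypothetical helper: reindex to a daily frequency and forward-fill gaps."""
    daily = series.asfreq('D')            # inserts NaN for every missing calendar day
    n_missing = int(daily.isna().sum())
    if n_missing:
        print(f'Filling {n_missing} missing day(s) by forward fill')
    return daily.ffill()                  # carry the last known price forward

# Synthetic example: 10 daily prices with 2023-01-04 removed
idx = pd.date_range('2023-01-01', periods=10, freq='D')
prices = pd.Series(np.linspace(100.0, 109.0, 10), index=idx).drop(pd.Timestamp('2023-01-04'))
daily = to_daily(prices)
print(daily.loc['2023-01-03':'2023-01-05'])
```

With the downloaded data, this would be applied as `to_daily(btc_data['Close'].squeeze())`; a series with an explicit frequency also lets statsmodels attach dates to its forecasts.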
3. Preprocessing Time Series Data
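Before applying the ARIMA model, check whether the series is stationary, that is, whether its mean and variance stay roughly constant over time. Plot the series and run a formal test such as the ADF (Augmented Dickey-Fuller) test, whose null hypothesis is that the series has a unit root (is non-stationary). A p-value below a chosen threshold, such as 0.05, rejects the null and suggests the series is stationary.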
Example Code for Stationarity Test
```python
from statsmodels.tsa.stattools import adfuller
import matplotlib.pyplot as plt

# ADF test function: prints the test statistic, p-value and critical values
def adf_test(series):
    result = adfuller(series.dropna(), autolag='AIC')
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))

# Perform ADF test on closing prices and on their first difference
adf_test(btc_data['Close'].squeeze())
adf_test(btc_data['Close'].squeeze().diff())
```
4. Training the ARIMA Model
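Price levels are usually non-stationary, so the raw closing prices typically fail the ADF test while their first difference (the day-to-day change) passes. ARIMA handles this through the d parameter, which is why d = 1 is used below. ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) plots of the differenced series then help choose the remaining orders: a PACF that cuts off after lag p suggests p AR terms, and an ACF that cuts off after lag q suggests q MA terms.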
Example Code for ACF and PACF Plot Generation
```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
# ACF and PACF plots of the first-differenced closing prices
diff_close = btc_data['Close'].squeeze().diff().dropna()
plt.figure(figsize=(12, 6))
plt.subplot(121)
plot_acf(diff_close, ax=plt.gca(), lags=30)
plt.subplot(122)
plot_pacf(diff_close, ax=plt.gca(), lags=30)
plt.show()
```
Example Code for Training the ARIMA Model
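```python
from statsmodels.tsa.arima.model import ARIMA
# Create ARIMA model: order=(5, 1, 0) means 5 AR lags, first differencing, no MA terms.
# These values are illustrative; choose p, d, q from the ADF test and ACF/PACF plots.
model = ARIMA(btc_data['Close'].squeeze(), order=(5, 1, 0))
model_fit = model.fit()
# Model summary
print(model_fit.summary())
```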
5. Prediction and Result Visualization
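After training, use the model to forecast future prices and plot them alongside the historical data. The forecast covers dates after the last downloaded observation, so the plot alone cannot show accuracy; the forecast must be compared with actual prices the model did not see (see Section 6).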
Example Code for Prediction and Visualization
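```python
import numpy as np
# Forecast prices for the next 30 days
forecast = model_fit.forecast(steps=30)
# Build a daily date index starting the day after the last observation
forecast_index = pd.date_range(start=btc_data.index[-1] + pd.Timedelta(days=1), periods=30, freq='D')
# Use the raw values so pandas does not realign the forecast's own index to the new one
forecast_series = pd.Series(np.asarray(forecast), index=forecast_index)
# Plot the historical closing prices together with the forecast
plt.figure(figsize=(10, 6))
plt.plot(btc_data['Close'].squeeze(), label='Actual Prices')
plt.plot(forecast_series, label='Forecasted Prices', color='red')
plt.title('Bitcoin Price Forecast')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()
```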
6. Evaluating Model Performance
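To evaluate prediction performance, use a metric such as RMSE (Root Mean Squared Error): the square root of the mean squared difference between forecast and actual prices, expressed in the same unit as the prices (USD). The forecast from Section 5 covers dates after the downloaded data, so it cannot be scored directly. Instead, hold out the last 30 days, refit the model on the earlier data, and compare its 30-day forecast with the held-out prices.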
Example Code for Calculating RMSE
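```python
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA
import numpy as np
# Hold out the last 30 days as a test set and refit the same model on the rest
prices = btc_data['Close'].squeeze()
train, test = prices.iloc[:-30], prices.iloc[-30:]
eval_fit = ARIMA(train, order=(5, 1, 0)).fit()
eval_forecast = eval_fit.forecast(steps=len(test))
rmse = np.sqrt(mean_squared_error(test.to_numpy(), np.asarray(eval_forecast)))  # in USD
print(f'RMSE: {rmse:.2f}')
```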
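RMSE is easier to interpret next to a simple baseline. The sketch below computes RMSE directly from its definition, sqrt(mean((actual - predicted)^2)), and compares a forecast with a naive baseline that repeats the last training price. The arrays are made-up numbers for illustration, and `rmse` is a hypothetical helper name.

```python
import numpy as np

def rmse(actual, predicted) -> float:
    """Root mean squared error: square root of the mean squared difference."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Illustrative numbers only (USD)
train_last = 100.0
actual = np.array([101.0, 103.0, 102.0, 106.0])
model_forecast = np.array([100.5, 101.5, 102.5, 103.5])
naive_forecast = np.full_like(actual, train_last)  # repeat the last observed price

print(f'model RMSE: {rmse(actual, model_forecast):.3f}')  # 1.500
print(f'naive RMSE: {rmse(actual, naive_forecast):.3f}')  # 3.536
```

If the ARIMA forecast does not beat this naive baseline on the held-out days, the model is adding little beyond the most recent price.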
Conclusion
ARIMA is a useful, interpretable baseline for analyzing and forecasting the Bitcoin price time series. However, its performance depends on the quality of the data, the choice of p, d, and q, and external factors such as news or regulation that a model built only on past prices cannot anticipate. Combining it with other machine learning and deep learning methods may further improve prediction performance.