1. Background Description
Stock prices form a volatile time series shaped by many factors. These fall broadly into external factors, such as economic and political conditions, and factors internal to the company. Since stock markets first emerged, researchers have studied price volatility with a wide range of methods. As statistical methods and machine learning have become widespread, they are increasingly applied to stock prediction, for example through neural networks, decision trees, support vector machines, and logistic regression.
XGBoost was proposed by Tianqi Chen in 2016 as an efficient implementation of gradient boosted decision trees (GBDT), and it is reported to offer low computational complexity, fast running speed, and high accuracy. GBDT can improve stock prediction results on time series data, but it runs relatively slowly, so this article uses the XGBoost model in search of a prediction method that is both fast and accurate. The XGBoost model is trained on historical stock data to predict the closing price, the predicted values are compared with the true values, and the effectiveness of the model for stock price prediction is finally assessed through an evaluation operator.
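The comparison of true and predicted values described above usually comes down to a few regression metrics. The following is a minimal sketch in plain Python, assuming the metrics of interest are MSE, RMSE, MAE, and R² (the same quantities that `mean_squared_error`, `mean_absolute_error`, and `r2_score` report). The function name `regression_metrics` and the toy values are hypothetical and are not taken from the dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when the true values are constant
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Toy values for illustration only, not real closing prices.
actual = [2.0, 2.5, 3.0, 3.5]
predicted = [2.1, 2.4, 3.2, 3.3]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Lower MSE, RMSE, and MAE indicate smaller errors, while an R² close to 1 means the predictions explain most of the variance in the true values.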
The dataset contains historical data for the stock with code 510050.SH from 2005 to 2020, collected with a web crawler. The following table shows the stock's market performance over multiple trading days, with the main fields including:
These fields record the stock's daily price movements and trading activity, and are used in the subsequent analysis and prediction of its trend.
2. Comparison of the algorithm implementation in Python code and Sentosa_DSML Community Edition
(1) Data reading
1. Implementation in Python code
Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import xgboost as xgb
Data reading
dataset = pd.read_csv('20_year_FD.csv')
print(dataset.head())
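Because the later features are rolling statistics, it can help to confirm that the rows are in chronological order and free of missing values right after loading. The following is a small sketch under stated assumptions: the column name `date` is hypothetical (only `close` is confirmed by the code in this article), and the inline CSV contains made-up toy rows standing in for the real file.

```python
import io
import pandas as pd

# Made-up rows for illustration; the real file's columns and values may differ.
csv_text = """date,close
2020-01-03,3.0
2020-01-01,1.0
2020-01-02,2.0
"""

# Parse the (assumed) date column as datetimes instead of plain strings
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Sort chronologically so that rolling windows look backwards in time
sample = sample.sort_values("date").reset_index(drop=True)

# Count missing values per column before engineering features
missing = sample.isna().sum()

print(sample)
print(missing)
```

If the real file is already sorted by trading day, the sort is a no-op, so it is cheap to keep as a safeguard.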
2. Implementation in Sentosa_DSML Community Edition
First, use the text operator to read the stock dataset from a local file.
(2) Feature Engineering
1. Implementation in Python code
def calculate_moving_averages(dataset, windows):
    # Add one simple moving average column of the closing price per window size
    for window in windows:
        column_name = f'MA{window}'
        dataset[column_name] = dataset['close'].rolling(window=window).mean()
    # Round the closing price and the moving averages to 3 decimal places
    columns = ['close'] + [f'MA{window}' for window in windows]
    dataset[columns] = dataset[columns].round(3)
    return dataset

windows = [5, 7, 30]
dataset = calculate_moving_averages(dataset, windows)
print(dataset[['close', 'MA5', 'MA7', 'MA30']].head())
plt.figure(figsize=(14, 7))
plt.plot(dataset['close'], label='Close Price', color='blue')
plt.plot(dataset['MA5'], label='5-Day Moving Average', color='red', linestyle='--')
plt.plot(dataset['MA7'], label='7-Day Moving Average', color='green', linestyle='--')
plt.plot(dataset['MA30'], label='30-Day Moving Average', color='orange', linestyle='--')
plt.title('Close Price and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
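One detail of `rolling(window=...).mean()` is worth seeing on a small example: the first `window - 1` rows have no full window and come back as `NaN`, so with a 30-day window the first 29 rows of `MA30` are empty. The sketch below uses made-up toy prices, and the `min_periods=1` variant is shown only as one possible alternative, not as what this article does.

```python
import pandas as pd

# Toy prices for illustration only
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default behaviour: NaN until a full 3-value window is available
ma3 = prices.rolling(window=3).mean()

# Alternative: average over whatever values are available so far
ma3_partial = prices.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"price": prices, "MA3": ma3, "MA3_partial": ma3_partial}))
```

Whether to keep the leading `NaN` rows, drop them with `dropna()`, or use partial windows is a modeling choice; partial windows avoid gaps but make the earliest averages less smooth than the later ones.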
Compute the absolute difference between the closing price and each moving average to observe how far the price deviates from its average level.
def calculate_deviation(dataset, ma_column):
    # Absolute gap between the closing price and the given moving average
    deviation_column = f'deviation_{ma_column}'
    dataset[deviation_column] = abs(dataset['close'] - dataset[ma_column])
    return dataset
dataset = calculate_deviation(dataset, 'MA5')
dataset = calculate_deviation(dataset, 'MA7')
dataset = calculate_deviation(dataset, 'MA30')
plt.figure(figsize=(10, 6))
plt.plot(dataset['deviation_MA5'], label='Deviation from MA5')
plt.plot(dataset['deviation_MA7'], label=