【Chapter 5】Machine Learning Regression Cases_Stock Price Prediction Analysis-XGBoost

1.Background Description

  Stock prices form a volatile time series influenced by many factors. The factors that affect the stock market mainly include economic conditions, political conditions, and factors internal to each company. Since the emergence of the stock market, researchers have used a variety of methods to study the volatility of stock prices. As statistical methods and machine learning have become widespread, more and more people apply machine learning and other prediction methods to stocks, including neural networks, decision trees, support vector machines, and logistic regression.
  XGBoost was proposed by Tianqi Chen in 2016 and is known for efficient computation, fast training, and high accuracy. It is an efficient implementation of gradient boosted decision trees (GBDT). GBDT can improve stock prediction results on time series data, but it trains relatively slowly, so the XGBoost model is used here in search of a prediction method that is both fast and accurate. The XGBoost model is fitted to historical stock data to predict the closing price, the predicted values are compared with the true values, and an evaluation operator is then used to assess how effective the model is at stock price prediction.
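  The final evaluation step can be sketched in a few lines. The snippet below is a minimal illustration, not the article's evaluation code: it computes the mean squared error, root mean squared error, mean absolute error, and R² by hand with NumPy. The function name `regression_metrics` and the four toy prices are made up for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    mae = float(np.mean(np.abs(residuals)))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot  # undefined when y_true is constant
    return {"mse": mse, "rmse": mse ** 0.5, "mae": mae, "r2": r2}

# Toy values invented for illustration (not taken from the dataset)
true_close = [2.0, 2.5, 3.0, 3.5]
predicted_close = [2.1, 2.4, 3.2, 3.3]
metrics = regression_metrics(true_close, predicted_close)
print(metrics)
```

  Lower MSE, RMSE, and MAE mean the predictions sit closer to the true prices, while an R² closer to 1 means the model explains more of the variance in the closing price.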
  The dataset contains historical data for the stock with code 510050.SH from 2005 to 2020, collected with a web crawler. The following table shows the stock's market performance over multiple trading days, with the main fields including:

  These fields record the stock's daily price movements and trading activity, and they are used for the subsequent analysis and prediction of its trend.

2.Comparison of algorithm implementation between Python code and Sentosa_DSML community edition

(1) Data reading

1、Implementation in Python code
  Import the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import rcParams
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import xgboost as xgb

  Data reading

dataset = pd.read_csv('20_year_FD.csv')
print(dataset.head())
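  Before building features on time series data, it can help to confirm that the rows are in chronological order and that no values are missing. The sketch below shows one way to do this on a small in-memory sample. The column names (`date`, `open`, `high`, `low`, `close`, `volume`) and all values are assumptions made for illustration; the real file's schema may differ.

```python
import io
import pandas as pd

# Hypothetical three-row sample, deliberately out of date order.
# Column names and values are invented, not taken from 20_year_FD.csv.
sample_csv = io.StringIO(
    "date,open,high,low,close,volume\n"
    "2020-01-03,1.10,1.20,1.00,1.15,300\n"
    "2020-01-02,1.00,1.10,0.90,1.05,200\n"
    "2020-01-06,1.15,1.25,1.10,1.20,250\n"
)

# Parse the date column as datetimes instead of plain strings
sample = pd.read_csv(sample_csv, parse_dates=["date"])

# Rolling features assume chronological order, so sort by date first
sample = sample.sort_values("date").reset_index(drop=True)

# Count missing values per column before any feature engineering
missing_per_column = sample.isna().sum()
print(sample)
print(missing_per_column)
```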

2、Implementation of Sentosa_DSML Community Edition

  First, use the text operator to read the stock dataset from a local file.

(2) Feature Engineering

1、Implementation in Python code

def calculate_moving_averages(dataset, windows):
    # Add one simple moving average column of the closing price per window size
    for window in windows:
        column_name = f'MA{window}'
        dataset[column_name] = dataset['close'].rolling(window=window).mean()
    # Round the closing price and the moving averages to three decimal places
    columns = ['close'] + [f'MA{window}' for window in windows]
    dataset[columns] = dataset[columns].round(3)
    return dataset

windows = [5, 7, 30]
dataset = calculate_moving_averages(dataset, windows)

print(dataset[['close', 'MA5', 'MA7', 'MA30']].head())

plt.figure(figsize=(14, 7))
plt.plot(dataset['close'], label='Close Price', color='blue')
plt.plot(dataset['MA5'], label='5-Day Moving Average', color='red', linestyle='--')
plt.plot(dataset['MA7'], label='7-Day Moving Average', color='green', linestyle='--')
plt.plot(dataset['MA30'], label='30-Day Moving Average', color='orange', linestyle='--')
plt.title('Close Price and Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
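  One detail of the rolling mean used above is worth seeing on a tiny example: with a window of size n, the first n-1 rows have no complete window, so pandas returns NaN for them by default. The sketch below uses a made-up five-value series, and it also shows the `min_periods=1` option, which is an alternative not used in the article's code.

```python
import pandas as pd

# Toy prices invented for illustration
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default behaviour: the first window-1 entries are NaN
ma3 = prices.rolling(window=3).mean()

# Alternative: average over however many values are available so far
ma3_partial = prices.rolling(window=3, min_periods=1).mean()

print(ma3.tolist())          # leading NaN values, then 2.0, 3.0, 4.0
print(ma3_partial.tolist())  # 1.0, 1.5, 2.0, 3.0, 4.0
```

  For the windows used here (5, 7, and 30), this means the first 4, 6, and 29 rows of the corresponding moving average columns are NaN, which is why `head()` shows missing values in those columns.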


Compute the absolute difference between the closing price and each moving average to observe how far the price deviates from its recent average.

def calculate_deviation(dataset, ma_column):
    # Absolute gap between the closing price and the given moving average
    deviation_column = f'deviation_{ma_column}'
    dataset[deviation_column] = abs(dataset['close'] - dataset[ma_column])
    return dataset

dataset = calculate_deviation(dataset, 'MA5')
dataset = calculate_deviation(dataset, 'MA7')
dataset = calculate_deviation(dataset, 'MA30')
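The deviation calculation can be checked by hand on a small frame. The sketch below uses four made-up closing prices and a 2-day window (a window size chosen only to keep the arithmetic short; the article uses 5, 7, and 30).

```python
import pandas as pd

# Toy closing prices invented for illustration
toy = pd.DataFrame({"close": [10.0, 12.0, 11.0, 13.0]})

# 2-day moving average: NaN, 11.0, 11.5, 12.0
toy["MA2"] = toy["close"].rolling(window=2).mean()

# Absolute deviation from the moving average: NaN, 1.0, 0.5, 1.0
toy["deviation_MA2"] = (toy["close"] - toy["MA2"]).abs()

print(toy)
```

Because the absolute value is taken, the deviation column records only the size of the gap, not whether the price is above or below the average. The NaN in the first row carries over from the moving average.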

plt.figure(figsize=(10, 6))
plt.plot(dataset['deviation_MA5'], label='Deviation from MA5')
plt.plot(dataset['deviation_MA7'], label=