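Background
1. Problems you may run into when teaching yourself finance.
2. Exploring the framework needed to build a quantitative trading setup.
3. Exploring strategies.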
Framework comparison
Aspect | First Script | Second Script | Third Script |
---|---|---|---|
Data Source | AkShare API | AkShare API | AkShare API |
Models Used | SVC | Logistic Regression, GaussianNB, SVC, Random Forest, MLP | Logistic Regression, GaussianNB, SVC, Random Forest, MLP |
Feature Engineering | Lagged returns, direction as features | Lagged returns, direction as features | Lagged returns, direction as features |
Evaluation Metrics | Accuracy Score, Classification Report, Confusion Matrix | Total Returns, Annual Volatility, Accuracy Score | Total Returns, Annual Volatility, Accuracy Score |
Backtesting Strategy | Uses predictions to decide full buy/sell strategy, no shorting | Uses multiple models for predictions, reports on highest accuracy | Uses best-performing model (SVM) for backtest strategy |
Performance Visualization | Performance comparison with benchmark using Pyfolio | Uses Pyfolio for detailed strategy evaluation and plotting | Detailed plotting using Pyfolio for various metrics |
Code Organization | Functions for data download, feature creation, train/test split | More modular with functions for models and evaluation | Highly modular, structured with dedicated functions |
Key Libraries | AkShare, sklearn, backtrader, pyfolio | AkShare, sklearn, backtrader, pyfolio | AkShare, sklearn, backtrader, pyfolio |
Handling of NaN Values | Dropped NaNs before feature creation | Filled NaNs with 0 before conversion to int | Filled NaNs with 0 before conversion to int |
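The table only names the techniques the three scripts share. The following is a minimal sketch of that common pipeline using only pandas and NumPy: lagged returns and direction as features, a long-only rule that is fully invested or in cash (no shorting), and total return, annualized volatility, and accuracy as metrics. Several things here are assumptions, not taken from the scripts: the price series is synthetic rather than downloaded through AkShare; 5 lags, log returns, and 252 trading days per year are illustrative choices; the helper names `make_features`, `long_only_backtest`, and `summarize` are hypothetical; and the "model" is a naive placeholder that predicts today's direction equals yesterday's, standing in for the SVC or other scikit-learn classifiers the scripts actually use.

```python
import numpy as np
import pandas as pd


def make_features(close: pd.Series, lags: int = 5) -> pd.DataFrame:
    """Hypothetical helper: lagged log returns and lagged up/down direction as features."""
    df = pd.DataFrame({'close': close})
    df['returns'] = np.log(df['close'] / df['close'].shift(1))
    # +1 for an up day, -1 for a down day, 0 if unchanged
    df['direction'] = np.sign(df['returns'])
    for lag in range(1, lags + 1):
        df[f'lag_{lag}'] = df['returns'].shift(lag)
        df[f'dir_{lag}'] = df['direction'].shift(lag)
    # Drop the leading rows whose lagged features are incomplete (as in the first script)
    return df.dropna()


def long_only_backtest(log_returns: pd.Series, predicted_direction: pd.Series) -> pd.Series:
    """Fully invested when the prediction is 'up', in cash otherwise (no shorting)."""
    position = (predicted_direction > 0).astype(int)
    return position * log_returns


def summarize(strategy: pd.Series, predicted: pd.Series, actual: pd.Series,
              periods_per_year: int = 252) -> dict:
    """Total return, annualized volatility, and directional accuracy."""
    return {
        'total_return': float(np.exp(strategy.sum()) - 1),
        'annual_volatility': float(strategy.std() * np.sqrt(periods_per_year)),
        'accuracy': float((np.sign(predicted) == actual).mean()),
    }


# Synthetic prices standing in for AkShare data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
dates = pd.bdate_range('2023-01-02', periods=300)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 300))), index=dates)

features = make_features(close)
# Placeholder "model": today repeats yesterday's direction.
# In the scripts this would be the prediction of SVC or another classifier.
predicted = features['dir_1']
strategy = long_only_backtest(features['returns'], predicted)
stats = summarize(strategy, predicted, features['direction'])
print(stats)
```

Because `dir_1` on day t only uses information up to day t-1, the signal does not look ahead; swapping in a real classifier means replacing `predicted` with its out-of-sample predictions on the test period.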
Basic concepts
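Stock prices can be adjusted forward (qfq) or backward (hfq) to account for dividends, bonus shares, and share transfers. In quantitative analysis, backward-adjusted prices are continuous across these events and their historical values are not revised when a new corporate action occurs, which makes them better suited for computing a stock's true returns and for long-term investment analysis. Forward-adjusted prices keep the latest price equal to the actual market price, which makes it easy to compare today's price with history; however, after many dividends and bonus issues the earliest adjusted prices can become very small or even negative (when dividends are subtracted rather than applied proportionally), so forward-adjusted data are used less often in quantitative strategies.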
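import pandas as pd
# Illustrative DataFrame: date, raw close, cash dividend per share, and shares transferred per share.
# Each corporate action is assumed to take effect on its row's date (the ex-date).
data = {
    'date': ['2024-08-01', '2024-08-02', '2024-08-03'],
    'close_price': [100, 90, 95],  # raw close
    'dividend_per_share': [0, 2, 0],  # cash dividend per share
    'share_transfer_per_share': [0, 0, 0.5]  # shares transferred per share
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
# Ex-rights reference price on the ex-date: (previous close - dividend) / (1 + transfer ratio)
prev_close = df['close_price'].shift(1)
ref_price = (prev_close - df['dividend_per_share']) / (1 + df['share_transfer_per_share'])
# Single-day ratio: 1 on days without a corporate action (and on the first day)
df['ratio'] = (prev_close / ref_price).fillna(1)
# Cumulative adjustment factor
df['adjust_factor'] = df['ratio'].cumprod()
# Backward-adjusted (hfq) price: the first price is unchanged, later prices are scaled up
df['backward_adjusted_price'] = df['close_price'] * df['adjust_factor']
# Forward-adjusted (qfq) price: the latest price is unchanged, earlier prices are scaled down
df['forward_adjusted_price'] = df['close_price'] * df['adjust_factor'] / df['adjust_factor'].iloc[-1]
# Show the result
print(df)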
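To see why adjusted prices matter for returns, the sketch below recomputes a backward-adjusted series from the same illustrative numbers and compares daily returns on raw and adjusted closes. It assumes proportional (multiplicative) adjustment, in which the dividend is effectively treated as reinvested at the ex-rights reference price; data vendors may use a different convention, so the exact figures are illustrative only.

```python
import pandas as pd

# Same illustrative numbers as the example above (hypothetical data)
df = pd.DataFrame({
    'close_price': [100.0, 90.0, 95.0],
    'dividend_per_share': [0.0, 2.0, 0.0],
    'share_transfer_per_share': [0.0, 0.0, 0.5],
}, index=pd.to_datetime(['2024-08-01', '2024-08-02', '2024-08-03']))

# Proportional adjustment: ratio = previous close / ex-rights reference price
prev_close = df['close_price'].shift(1)
ref_price = (prev_close - df['dividend_per_share']) / (1 + df['share_transfer_per_share'])
adjust_factor = (prev_close / ref_price).fillna(1).cumprod()
hfq = df['close_price'] * adjust_factor

comparison = pd.DataFrame({
    # Raw returns treat ex-date price drops (dividends, share transfers) as losses
    'raw_return': df['close_price'].pct_change(),
    # Adjusted returns account for the dividend and the extra shares received
    'adjusted_return': hfq.pct_change(),
})
print(comparison)
```

On the share-transfer day the raw close only rises from 90 to 95 (about +5.6%), while a holder of one share now owns 1.5 shares worth 142.5, a gain of about 58.3%; only the adjusted series reflects that.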
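import akshare as ak
import pandas as pd
# Fetch daily history for an A-share stock with backward adjustment (hfq).
# stock_zh_a_hist takes a 6-digit code without an exchange suffix; '000001' is Ping An Bank (平安银行), not the Shanghai Composite Index.
# Index levels are not adjusted for dividends; for the Shanghai Composite use ak.index_zh_a_hist(symbol='000001', period='daily', ...) with no adjust argument.
stock_zh_a_hist_df = ak.stock_zh_a_hist(symbol='000001',
                                        period='daily',
                                        start_date='20200101',
                                        end_date='20240829',
                                        adjust='hfq')
# The result is already a DataFrame; use the date column ('日期') as the index
df = stock_zh_a_hist_df.copy()
df['日期'] = pd.to_datetime(df['日期'])
df = df.set_index('日期')
# Show the first rows
print(df.head())
# For forward-adjusted (qfq) data, change the adjust argument to 'qfq'
# stock_zh_a_hist_df = ak.stock_zh_a_hist(symbol='000001', adjust='qfq')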
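If adjustment is purely proportional, a forward-adjusted series is just the backward-adjusted series rescaled so that its last value equals the latest raw close: qfq_t = hfq_t × raw_T / hfq_T. The sketch below shows this with a hypothetical helper `hfq_to_qfq`; the input is the illustrative hfq series from the dividend/share-transfer example. This is an assumption-laden shortcut, since the data vendor behind AkShare may use a different adjustment method, so the result may not match what `adjust='qfq'` returns exactly.

```python
import pandas as pd


def hfq_to_qfq(hfq_close: pd.Series, latest_raw_close: float) -> pd.Series:
    """Hypothetical helper: rescale hfq prices so the last value equals the latest raw close.

    Assumes proportional (multiplicative) adjustment, i.e. qfq_t = hfq_t * raw_T / hfq_T.
    """
    return hfq_close * (latest_raw_close / hfq_close.iloc[-1])


# Illustrative hfq closes for the 100 / 90 / 95 example (dividend 2, then 0.5 shares per share)
hfq = pd.Series([100.0, 90 * 100 / 98, 95 * 100 / 98 * 1.5])
qfq = hfq_to_qfq(hfq, latest_raw_close=95.0)
print(qfq.round(2))  # values ≈ 65.33, 60.00, 95.00

# With AkShare data this might look like (not executed here):
# qfq_close = hfq_to_qfq(df['收盘'], latest_raw_close)
```

Because the two series differ only by a constant factor, their daily returns are identical; the choice between them then mainly affects price levels and whether history is revised as new corporate actions arrive.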
References
https://akshare.akfamily.xyz/introduction.html
https://github.com/akfamily/akshare