Chapter 5: Traditional Machine Learning and Deep Learning Strategies

Source: DataQuant (数宽网), www.dataquant.cn

I. Overview

The workflow for applying machine learning strategies in quantitative trading can be summarized as follows:

  1. Data acquisition and processing: obtain the relevant data and process it so that it meets the requirements of the machine learning model.

  2. Feature engineering: select effective features and engineer them, including data preprocessing (standardization, handling of missing values and outliers, etc.), feature selection, and feature construction.

  3. Model selection and training: choose a machine learning model that fits the strategy, then train it on the constructed features. Model choice and parameter tuning must account for the actual strategy requirements and the market environment.

  4. Model evaluation: evaluate the model's performance on backtest data. This step requires well-designed evaluation metrics and iterative tuning of the model.

  5. Strategy optimization: adjust and optimize the strategy based on the evaluation results.

  6. Trade execution: finally, apply the strategy in live trading, make trading decisions from the model's output, and execute the corresponding trade plan.

Overall, applying machine learning in quantitative trading runs through data acquisition and processing, feature engineering, model selection and training, model evaluation, strategy optimization, and trade execution. Note that this is not a one-shot pipeline: it has to be iterated and refined continuously.

II. Samples, Features, and Labels

Each sample is a single-day snapshot of one stock. The features start from a set of basic indicators. The label is whether the stock is likely to be up at tomorrow's close, or at the close of a trading day next week. A basic sketch follows:
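Below is a minimal sketch of how such samples and labels might be assembled, assuming daily bars in a pandas DataFrame; the column names ('date', 'code', 'close') and the helper itself are illustrative, not the project's actual API:

import pandas as pd

def build_samples(bars: pd.DataFrame) -> pd.DataFrame:
    """bars: one row per stock per day, with columns ['date', 'code', 'close']."""
    df = bars.sort_values(['code', 'date']).copy()
    g = df.groupby('code')['close']
    # basic indicator features, computed per stock
    df['ret_1d'] = g.pct_change()  # 1-day return
    df['ma5_gap'] = df['close'] / g.transform(lambda s: s.rolling(5).mean()) - 1  # distance from the 5-day MA
    # label: 1 if tomorrow's close is above today's close, else 0
    df['y'] = (g.shift(-1) > df['close']).astype(int)
    return df.dropna()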

III. Feature Mining

Expanding the number of features

To improve model performance, we can expand the feature count as follows:

Define feature groups: before training, expand each feature group into concrete features. For example, MACD can be kept as a time series of type array<float> containing the past N days of values. This preserves the time-series information while widening the feature dimension.
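A small sketch of this expansion, under the assumption that each stock's frame carries a macd column; the lag-column naming is illustrative:

def expand_feature_group(df, col='macd', n_days=10):
    """Unroll the past n_days of `col` into concrete lag columns, e.g. macd_lag_1..macd_lag_10."""
    for k in range(1, n_days + 1):
        df[f'{col}_lag_{k}'] = df[col].shift(k)
    return df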

Mining new features

In quantitative trading, we usually look for and construct new features along the following dimensions (see the sketch after this list):

  1. Momentum indicators: e.g., the Relative Strength Index (RSI) and moving-average slope, used to capture trend and momentum in asset prices.

  2. Mean-reversion indicators: e.g., the historical price mean and standard deviation, used to measure how far the price has deviated from its long-run mean.

  3. Volatility indicators: e.g., historical volatility (HV) and implied volatility (IV), used to gauge price volatility and risk.

  4. Financial indicators: including the price-to-earnings ratio (P/E), price-to-book ratio (P/B), and dividend yield, used to assess a company's financial health and value.

  5. Liquidity indicators: e.g., trading volume, turnover, and liquidity ratios, used to measure market liquidity and trading activity.

  6. Price indicators: e.g., the high, low, open, and close, used to capture the trading range and market sentiment.

  7. Macro indicators: e.g., GDP growth, inflation, and interest rates, used to assess the macroeconomy's impact on the market.

  8. Market-wide indicators: e.g., total market volume, market breadth (the ratio of advancing to declining stocks), and sentiment indexes, used to capture the overall market trend and investor mood.
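As a concrete sketch, here is one common way to compute two features from this list: a 14-day RSI (the simple rolling-mean variant; other conventions exist) and a daily market-breadth ratio. The pre_close column is an assumption about the input layout:

import pandas as pd

def rsi(close: pd.Series, n: int = 14) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()     # average gain over the window
    loss = (-delta.clip(upper=0)).rolling(n).mean()  # average loss over the window
    return 100 - 100 / (1 + gain / loss)

def market_breadth(day_df: pd.DataFrame) -> float:
    """Ratio of advancing to declining stocks in one day's cross-section."""
    up = (day_df['close'] > day_df['pre_close']).sum()
    down = (day_df['close'] < day_df['pre_close']).sum()
    return up / max(down, 1)  # guard against a zero-decliner day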

IV. Traditional Machine Learning

Model selection

Among traditional machine learning models, XGBoost performs well here.

from xgboost import XGBClassifier

def train_XGBClassifier(X_train, y_train):
    clf = XGBClassifier(
        # learning_rate=0.1,
        n_estimators=1000,            # number of boosting trees
        # max_depth=6,                # maximum tree depth
        # min_child_weight=1,         # minimum leaf weight
        # gamma=0.,                   # penalty on the number of leaf nodes
        # subsample=0.8,              # build each tree on a random 80% of the samples
        objective='binary:logistic',  # binary classification
        # scale_pos_weight=1,         # compensate for class imbalance
        random_state=50,              # random seed
        n_jobs=2
    )
    clf.fit(X_train, y_train)
    return clf
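A quick usage sketch on synthetic data, just to show the shape of predict_proba's output (column 0 is the down probability, column 1 the up probability):

import numpy as np

X = np.random.randn(200, 8)
y = np.random.randint(0, 2, size=200)
clf = train_XGBClassifier(X, y)
proba = clf.predict_proba(X[:1])  # shape (1, 2): [[p_down, p_up]]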

Main flow

# resolve each calendar date to a trading day (rolled backward when needed)
start_dt = self.re_self_trade_day(market_type, '2018-09-01')
predict_start_dt = self.re_self_trade_day(market_type, '2020-01-01')
end_dt = self.re_self_trade_day(market_type, '2023-07-21')
train_dt_len = 280
# get the stock pool
pool_stock_list, pool_stock_dict = self.get_pool_stock_list(market_type)
# get the list of trading days
trade_day_list = self.find_trade_day_between(market_type, start_dt, end_dt)
# get fundamentals (financial-report data)
funda_df = self._fundaService.find_funda_index(market_type, trade_day_list, pool_stock_dict.keys())
# get macro data
macro_df = self._marketService.find_macro_index(market_type, trade_day_list, start_dt, end_dt)
# get index data
market_index_df = self._marketService.find_market_index(market_type, trade_day_list, start_dt, end_dt)
# build the training samples
all_df = self.gen_sample(market_type, start_dt, end_dt, pool_stock_list, trade_day_list, funda_df, macro_df, market_index_df)
# start training
self.train_predict(market_type, all_df, pool_stock_dict, predict_start_dt, train_dt_len, end_dt)

Generating samples

def gen_sample(self, market_type, start_dt, end_dt, pool_stock_list, trade_day_list, funda_df, macro_df, market_index_df):
    # fetch each stock's base daily data (the main table)
    base_sample_dict = self.download_by_period(market_type, pool_stock_list, start_dt, end_dt, period='day')
    # left join the per-stock technical indicators
    tech_df_dict = self.algo_tech_index(market_type, base_sample_dict)
    # label: compare next Friday's close with today's close; 1 if higher, 0 otherwise
    week_stock_data_dict = self.down_week_stock_data_dict(market_type, pool_stock_list, start_dt, end_dt)
    self.algo_label(market_type, end_dt, week_stock_data_dict, tech_df_dict)
    # clean up: drop each stock's first 30 rows, which are NaN because of the EMA30
    for k, v_df in tech_df_dict.items():
        v_df.drop(list(v_df.head(30).index), inplace=True)  # inplace=True mutates the original df
    all_df = pd.concat(tech_df_dict.values())
    all_df = all_df.sort_values(by=['date', 'code'])
    # left join fundamentals (funda_df), macro data (macro_df), and index data (market_index_df)
    all_df = pd.merge(all_df, funda_df, on=['date', 'code'], how='left')
    all_df = pd.merge(all_df, market_index_df, on=['date'], how='left')
    all_df = pd.merge(all_df, macro_df, on=['date'], how='left')
    all_df.fillna(0, inplace=True)  # remaining NaNs become 0, so no rows are left to drop
    return all_df
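algo_label itself is not shown in the post; the following is a hedged sketch of the rule described in the comment above (label 1 when the close on the last trading day of next week beats today's close), for a single stock's daily frame:

import pandas as pd

def algo_label_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """df: one stock's daily bars with columns ['date', 'close'], sorted by date."""
    df = df.copy()
    wk = pd.to_datetime(df['date']).dt.to_period('W')          # calendar week of each row
    next_week_last = df.groupby(wk)['close'].last().shift(-1)  # next week's final close, keyed by week
    # rows in the final week map to NaN and therefore get label 0
    df['y'] = (wk.map(next_week_last) > df['close']).astype(int)
    return df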

Training and prediction

def train_predict(self, market_type, param_df, pool_stock_dict, start_predict_dt, train_dt_len, end_dt):
    base_df = param_df.copy()
    df_y = base_df['y']
    y = df_y.values
    df = base_df.drop(['code', 'y'], axis=1)
    scaler = preprocessing.StandardScaler().fit(df)
    X = scaler.transform(df)  # standardize the features
    df_index = list(df.index)
    all_date_list = list(OrderedDict.fromkeys(df_index))  # de-duplicated list of dates
    start_predict_dt_i = all_date_list.index(start_predict_dt)  # index of the first prediction date
    train_start_dt = all_date_list[start_predict_dt_i - train_dt_len]  # derive the training start date
    train_start_index = df_index.index(train_start_dt)  # row index where training starts
    end_dt_cur_week_list = self.get_cur_week_trade_day_list(market_type, end_dt)
    end_dt_cur_week_firstday = end_dt_cur_week_list[0]['dt']  # first trading day of the week containing the final prediction date
    self._logger.info("start train ......")
    is_first, cycle = 0, 0
    for i in range(start_predict_dt_i, len(all_date_list)):
        predict_date = all_date_list[i]
        predict_i = df_index.index(predict_date)
        if i == len(all_date_list) - 1:  # last date in the sample
            next_predict_date = -1
            next_predict_i = len(df_index)
        else:
            next_predict_date = all_date_list[i + 1]
            next_predict_i = df_index.index(next_predict_date)

        # training
        cur_week_trade_day_list = self.get_cur_week_trade_day_list(market_type, predict_date)  # this week's trading days
        if len(cur_week_trade_day_list) == 0:
            continue
        trade_last_date = cur_week_trade_day_list[-1]['dt']  # last trading day of this week
        if predict_date == trade_last_date:  # on the last day of the week, the training set includes last week
            cur_week_firstday = cur_week_trade_day_list[0]['dt']  # first trading day of this week
            if end_dt_cur_week_firstday == cur_week_firstday:  # train once per week
                train_end_dt = cur_week_firstday  # training ends at this week's first day
                train_end_i = df_index.index(train_end_dt)
                X_train = X[train_start_index:train_end_i, :]  # train_end_i is exclusive, so the training data ends with last week
                y_train = y[train_start_index:train_end_i]
                clf = SKlearn.train_XGBClassifier(X_train, y_train)  # train the model
                self._logger.info("friday train step start_dt=" + train_start_dt + ", end_dt=" + train_end_dt + ", window, start_index=" + str(train_start_index) +
                                  ", end_index=" + str(train_end_i) + ", size=" + str(train_end_i - train_start_index) + " ......")
        else:  # on other weekdays, the training set only reaches the week before last
            if predict_date == end_dt:
                before_week_firstday = self.get_before_week_trade_firstday(market_type, trade_last_date)
                before_week_firstday = before_week_firstday['dt']
                train_end_dt = before_week_firstday
                train_end_i = df_index.index(train_end_dt)
                X_train = X[train_start_index:train_end_i, :]
                y_train = y[train_start_index:train_end_i]
                if is_first == 0:  # first training run
                    clf = SKlearn.train_XGBClassifier(X_train, y_train)  # train the model
                    self._logger.info("first train step start_dt=" + train_start_dt + ", end_dt=" + train_end_dt + ", window, start_index=" + str(train_start_index) +
                                      ", end_index=" + str(train_end_i) + ", size=" + str(train_end_i - train_start_index) + " ......")
                    is_first = 1
        cycle += 1

        # slide the training-window start forward
        if predict_date == trade_last_date:
            train_start_index += (next_predict_i - predict_i) * cycle
            cycle = 0
        train_start_dt = base_df.iloc[train_start_index].name

        # prediction
        step_dt = None
        for j in range(predict_i, next_predict_i):  # predict day by day
            origin_row = base_df.iloc[j]
            dt = origin_row.name
            step_dt = dt
            if dt != end_dt:  # only the final date is scored and persisted
                continue
            code = origin_row['code']
            cn_name = pool_stock_dict[code]['cn_name']
            X_test = X[j:j + 1, :]
            y_predict_proba = clf.predict_proba(X_test)  # model prediction
            down_score = round(y_predict_proba[0][0], 6)  # probability of a fall
            up_score = round(y_predict_proba[0][1], 6)  # probability of a rise
            score_dict = {'market_type': market_type, 'dt': dt, 'code': code, 'cn_name': cn_name,
                          'down_score': down_score, 'up_score': up_score}
            self.save_or_update_score(score_dict)  # persist the score

        if dt != end_dt:
            continue
        self._logger.info("predict step dt=" + step_dt + ",window predict_i=" + str(predict_i) + ", end_predict_i=" + str(next_predict_i)
                          + ", size=" + str(next_predict_i - predict_i) + " end ......")
    self._logger.info("end train and predict......")

V. Deep Learning

Model selection

LSTMs are well suited to modeling and predicting sequential dependencies in time-series data, which is exactly the situation in stock markets.

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size=66, hidden_layer_size=256, num_layers=3, output_size=2):
        """
        LSTM for a binary classification task.
        :param input_size: dimensionality of the input features
        :param hidden_layer_size: size of the hidden state
        :param num_layers: number of stacked LSTM layers
        :param output_size: number of output classes
        """
        super().__init__()
        self.hidden_layer_size = hidden_layer_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_layer_size, num_layers=num_layers, dropout=0.1)
        self.linear = nn.Linear(hidden_layer_size, output_size)
        # self.sigmoid = nn.Sigmoid()
        # note: the training code pairs this output with nn.CrossEntropyLoss, which applies
        # log-softmax internally; adding Softmax here as well dampens the gradients, but it is
        # kept because the prediction path reads the output directly as [p_down, p_up]
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input_x):
        input_x = input_x.view(len(input_x), 1, -1)  # (seq_len, batch=1, input_size)
        hidden_cell1 = (torch.zeros(self.num_layers, input_x.shape[1], self.hidden_layer_size),  # shape: (n_layers, batch, hidden_size)
                        torch.zeros(self.num_layers, input_x.shape[1], self.hidden_layer_size))
        lstm_out, (h_n, h_c) = self.lstm(input_x, hidden_cell1)
        # lstm_out, (h_n, h_c) = self.lstm(input_x)  # equivalent: a zero initial state is the default
        linear_out = self.linear(lstm_out.view(len(input_x), -1))  # with batch=1 this equals self.linear(lstm_out[:, -1, :])
        x = self.softmax(linear_out)
        return x
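A quick shape check of the forward pass (a sketch; 19 just mirrors the batch size used in the training code below):

model = LSTM(input_size=66)
out = model(torch.randn(19, 66))
print(out.shape)  # torch.Size([19, 2]); each row is [p_down, p_up] and sums to 1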

Main flow

Because deep learning rests on large-scale samples, sample generation and model training have to be run as separate steps, and training has to combine a full (base) pass with incremental updates. The flow is therefore split into several stages:

  1. Generate the samples

  2. Train the base model

  3. Day-level incremental training

  4. Predict

Generating samples

def gen_base_sample(self, market_type, start_dt, end_dt, file_name):
    pool_stock_list, pool_stock_dict = self.get_pool_stock_list(market_type)
    trade_day_list = self.find_trade_day_between(market_type, start_dt, end_dt)
    # this DL flow uses a gen_sample variant that also returns the stock dict
    all_df, pool_stock_dict = self.gen_sample(market_type, start_dt, end_dt, pool_stock_list, pool_stock_dict, trade_day_list)
    all_df = all_df.dropna()  # remove rows with missing values (dropna returns a new frame)
    all_df.to_csv(ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + "_" + file_name + ".csv")
    # record the sample window alongside the CSV
    json_dict = {'start_dt': start_dt, 'end_dt': end_dt}
    json_rs = json.dumps(json_dict)
    json_file = ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + "_" + file_name + "_conf.json"
    with open(json_file, 'w') as file:
        file.write(json_rs)

Training the base model

def train_base_model(self, market_type, p_start_train_dt, p_end_train_dt, base_sample_name, base_model_name):
    # (smoke and normal runs read the same sample path)
    sample_path = ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + "_" + base_sample_name + ".csv"
    df = read_csv(sample_path)
    df.set_index('date', inplace=True)
    # derive the complement label: no_y = 1 - y, giving one-hot [no_y, y] targets
    no_y_list = []
    for y_value in df['y'].values:
        no_y_list.append(1 - y_value)
    df['no_y'] = no_y_list
    # slice out the training window
    df_index = list(df.index)
    first_index = df_index[0]
    if p_start_train_dt < first_index:  # clamp the requested start to the first available date
        p_start_train_dt = first_index
    train_start_index = df_index.index(p_start_train_dt)
    train_end_index = df_index.index(p_end_train_dt)
    train_sample_df = df.iloc[train_start_index:train_end_index, :]
    Y = train_sample_df[['no_y', 'y']]
    train_sample_df = train_sample_df.drop(['code', 'no_y', 'y', 'y3', 'y4'], axis=1)
    scaler = preprocessing.StandardScaler().fit(train_sample_df)
    X = scaler.transform(train_sample_df)  # standardize the features
    X_train = self.get_tensor_from_np(X).float()
    y_train = self.get_tensor_from_df(Y).float()

    train_loader = Data.DataLoader(
        dataset=Data.TensorDataset(X_train, y_train),  # wraps tensors sharing the same first dimension
        batch_size=19,  # samples per batch
        shuffle=False   # no shuffling, to preserve the chronological order of the samples
    )
    # the training trio: loss, optimizer, epochs
    # model = MLP(input_size=len(train_sample_df.columns))
    model = LSTM(input_size=len(train_sample_df.columns))  # model
    # loss_function = nn.BCELoss()  # loss for a sigmoid output
    loss_function = nn.CrossEntropyLoss()  # softmax cross-entropy loss
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer
    epochs = 350
    writer = SummaryWriter(log_dir=ConfigUtils._global_workdir + os.sep + 'tb_log')
    # start training
    model.train()
    log_step_interval = 100  # logging interval in steps
    for i in range(epochs):
        total_loss, total_acc = 0, 0
        for step, (seq, labels) in enumerate(train_loader):
            # for hard class labels, rounding works in the binary case: torch.round(y_pred)
            y_pred = model(seq)  # forward pass; one [p_down, p_up] row per sample in the batch
            loss = loss_function(y_pred, labels)  # cross-entropy between the predicted and target distributions
            acc = self.accuracy(y_pred, labels)  # average accuracy over this batch
            optimizer.zero_grad()  # zero the gradients of all optimized parameters
            loss.backward()  # backpropagate to compute the gradients
            optimizer.step()  # apply the optimizer update
            global_iter_num = i * len(train_loader) + step + 1
            if global_iter_num % log_step_interval == 0:
                print("train global_step:{}, loss:{:.4}, accuracy:{:.4}".format(global_iter_num, loss.item(), acc))
                # log loss and accuracy against the global iteration count
                writer.add_scalar("basemodel_train_loss", loss.item(), global_step=global_iter_num)
                writer.add_scalar("basemodel_train_acc", acc, global_step=global_iter_num)
            total_loss += loss.item()
            total_acc += acc
        writer.add_scalar("basemodel_train_total_loss", total_loss, i)
        writer.add_scalar("basemodel_train_total_acc", total_acc, i)
        print("train epochs:[{}/{}], total_loss:{:.4}, total_accuracy:{:.4}".format(i + 1, epochs, total_loss, total_acc))
        if i % 10 == 0:  # checkpoint every 10 epochs
            torch.save({'state_dict': model.state_dict()}, ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + '_' + base_model_name + '.pkl')
            self._logger.info('epoch %d round,save model' % i)
    torch.save({'state_dict': model.state_dict()}, ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + '_' + base_model_name + '.pkl')
    # save the conf file recording the trained window
    json_dict = {'start_dt': p_start_train_dt, 'end_dt': p_end_train_dt}
    json_rs = json.dumps(json_dict)
    json_file = ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + '_' + base_model_name + "_conf.json"
    with open(json_file, 'w') as file:
        file.write(json_rs)
    self._logger.info("end train base model ......")
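The helpers get_tensor_from_np, get_tensor_from_df, and accuracy are not shown in the post; below is a minimal sketch of what they plausibly do, inferred from how they are called above (as methods of the same class):

import numpy as np
import torch

def get_tensor_from_np(self, arr: np.ndarray) -> torch.Tensor:
    return torch.from_numpy(arr)

def get_tensor_from_df(self, df) -> torch.Tensor:
    return torch.from_numpy(df.values)

def accuracy(self, y_pred: torch.Tensor, labels: torch.Tensor) -> float:
    # both tensors are (batch, 2); compare the predicted argmax class with the one-hot target's class
    return (y_pred.argmax(dim=1) == labels.argmax(dim=1)).float().mean().item()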

Incremental training and prediction

def increament_train_eval(self, market_type, start_predict_dt, sample_file_name, model_file_name):
    # load data
    sample_path = ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + "_" + sample_file_name + ".csv"
    param_df = read_csv(sample_path)
    param_df.set_index('date', inplace=True)
    pool_stock_list, pool_stock_dict = self.get_pool_stock_list(market_type)

    version = 3
    base_df = param_df.copy()
    # derive the complement label, as in train_base_model
    no_y_list = []
    for y_value in base_df['y'].values:
        no_y_list.append(1 - y_value)
    base_df['no_y'] = no_y_list
    y_df = base_df[['no_y', 'y']]
    df = base_df.drop(['code', 'no_y', 'y', 'y3', 'y4'], axis=1)
    scaler = preprocessing.StandardScaler().fit(df)
    x_narray = scaler.transform(df)  # standardize the features
    df_index = list(df.index)
    all_date_list = list(OrderedDict.fromkeys(df_index))

    self._logger.info("start backtrace data train and eval ......")
    writer = SummaryWriter(log_dir=ConfigUtils._global_workdir + os.sep + 'tb_log')
    start_predict_dt_i = all_date_list.index(start_predict_dt)
    predict_date = all_date_list[start_predict_dt_i]
    predict_i = df_index.index(predict_date)
    model_conf_file_path = ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + "_" + model_file_name + "_conf.json"
    model_conf = self.load_json_file(model_conf_file_path)
    model_start_dt = model_conf['start_dt']
    train_start_dt = model_conf['end_dt']  # incremental training resumes where the base model stopped
    train_start_i = df_index.index(train_start_dt)
    # catch up on new data
    cur_week_trade_day_list = self.get_cur_week_trade_day_list(market_type, predict_date)
    trade_last_date = cur_week_trade_day_list[-1]['dt']  # last trading day of this week
    if trade_last_date == predict_date:  # last day of the week: the training set includes last week, train once
        cur_week_firstday_date = cur_week_trade_day_list[0]['dt']
        train_end_dt = cur_week_firstday_date
        train_end_i = df_index.index(train_end_dt)
    else:  # other weekdays: the training set only reaches the week before last
        before_week_firstday_date = self.get_before_week_trade_firstday(market_type, trade_last_date)
        before_week_firstday_date = before_week_firstday_date['dt']
        train_end_dt = before_week_firstday_date
        train_end_i = df_index.index(train_end_dt)

    # load the model
    model = LSTM(input_size=len(df.columns))  # model
    model_online_file_path = ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + '_' + model_file_name + '.pkl'
    checkpoint = torch.load(model_online_file_path)
    model.load_state_dict(checkpoint['state_dict'])

    # incremental training
    if train_start_i < train_end_i:
        X_train = self.get_tensor_from_np(x_narray[train_start_i:train_end_i, :]).float()
        y_train = self.get_tensor_from_df(y_df[train_start_i:train_end_i]).float()
        train_loader = Data.DataLoader(
            dataset=Data.TensorDataset(X_train, y_train),  # wraps tensors sharing the same first dimension
            batch_size=19,  # samples per batch
            shuffle=False   # no shuffling, to preserve the chronological order of the samples
        )
        loss_function = nn.CrossEntropyLoss()  # loss
        optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer
        epochs = 200
        model.train()
        log_step_interval = 10  # logging interval in steps
        for t in range(epochs):
            total_loss, total_acc = 0, 0
            for step, (seq, label_list) in enumerate(train_loader):
                y_pred = model(seq)  # forward pass; one [p_down, p_up] row per sample in the batch
                acc = self.accuracy(y_pred, label_list)  # average accuracy over this batch
                loss = loss_function(y_pred, label_list)  # cross-entropy between the predicted and target distributions
                optimizer.zero_grad()  # zero the gradients of all optimized parameters
                loss.backward()  # backpropagate to compute the gradients
                optimizer.step()  # apply the optimizer update
                global_iter_num = t * len(train_loader) + step + 1
                if global_iter_num % log_step_interval == 0:
                    print("online train global_step:{}, loss:{:.4}, accuracy:{:.4}".format(global_iter_num, loss.item(), acc))
                    # log loss and accuracy against the global iteration count
                    writer.add_scalar("online_train_loss", loss.item(), global_step=global_iter_num)
                    writer.add_scalar("online_train_acc", acc, global_step=global_iter_num)
                total_loss += loss.item()
                total_acc += acc
            writer.add_scalar("online_train_total_loss", total_loss, t)
            writer.add_scalar("online_train_total_acc", total_acc, t)
            print("online train epochs:{}, total_loss:{:.4}, total_accuracy:{:.4}".format(t, total_loss, total_acc))
        # optional: checkpoint the incrementally trained model and update the conf file
        #     if t % 10 == 0:
        #         torch.save({'state_dict': model.state_dict()}, ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + '_' + model_file_name + '_daily.pkl')
        #         print('online epoch %d round,save model' % t)
        # torch.save({'state_dict': model.state_dict()}, ConfigUtils._global_dbdir + os.sep + 'dl' + os.sep + market_type + '_' + model_file_name + '_daily.pkl')
        # json_dict = {'start_dt': model_start_dt, 'end_dt': train_end_dt}
        # json_rs = json.dumps(json_dict)
        # json_file = ConfigUtils._global_dbdir + os.sep + market_type + '_' + model_file_name + "_conf.json"
        # with open(json_file, 'w') as file:
        #     file.write(json_rs)
    # prediction
    model.eval()
    if start_predict_dt_i == len(all_date_list) - 1:  # last date in the sample
        next_predict_date = -1
        next_predict_i = len(df_index)
    else:
        next_predict_date = all_date_list[start_predict_dt_i + 1]
        next_predict_i = df_index.index(next_predict_date)

    X_eval = self.get_tensor_from_np(x_narray[predict_i:next_predict_i, :]).float()
    y_eval = self.get_tensor_from_df(y_df[predict_i:next_predict_i]).float()
    eval_loader = Data.DataLoader(
        dataset=Data.TensorDataset(X_eval, y_eval),
        batch_size=19,
        shuffle=False
    )
    pred_list, label_list = [], []
    step_dt = None
    for step, (x, y) in enumerate(eval_loader):
        pred = model(x)
        sub_pred_list = pred.data.tolist()
        pred_list.extend(sub_pred_list)
        label_list.extend(y.tolist())
    # persist the prediction results to the DB
    for j in range(len(pred_list)):
        origin_row = base_df.iloc[predict_i + j]
        dt = origin_row.name
        step_dt = dt
        code = origin_row['code']
        cn_name = pool_stock_dict[code]['cn_name']
        pred_value = pred_list[j]
        down_score = round(pred_value[0], 6)  # probability of a fall
        up_score = round(pred_value[1], 6)    # probability of a rise
        scoreDO = {'market_type': market_type, 'dt': dt, 'code': code, 'cn_name': cn_name, 'down_score': down_score, 'up_score': up_score, 'version': version}
        self.save_or_update_score(scoreDO)
    self._logger.info("predict and train step dt=" + step_dt + ",window start_i=" + str(predict_i) + ", end_i=" + str(next_predict_i)
                      + ",size=" + str(next_predict_i - predict_i) + " end ......")
