Understanding the sequence_length parameter of TensorFlow's stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn

For my graduation project I have been using the stack_bidirectional_dynamic_rnn API, and I spent two or three days unsure whether the sequence_length argument really needs to be passed in. After reading the relevant source code yesterday, I now have a reasonable picture.
Reading the source shows that the function internally calls bidirectional_dynamic_rnn in a for loop over the stacked layers. Here is the relevant source (link):

def stack_bidirectional_dynamic_rnn(cells_fw,
                                    cells_bw,
                                    inputs,
                                    initial_states_fw=None,
                                    initial_states_bw=None,
                                    dtype=None,
                                    sequence_length=None,
                                    parallel_iterations=None,
                                    time_major=False,
                                    scope=None):
  """
  ...
  Args:
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences.
      ...
"""
 ...
  states_fw = []
  states_bw = []
  prev_layer = inputs

  with vs.variable_scope(scope or "stack_bidirectional_rnn"):
    for i, (cell_fw, cell_bw) in enumerate(zip(cells_fw, cells_bw)):
      initial_state_fw = None
      initial_state_bw = None
      if initial_states_fw:
        initial_state_fw = initial_states_fw[i]
      if initial_states_bw:
        initial_state_bw = initial_states_bw[i]

      with vs.variable_scope("cell_%d" % i):
        outputs, (state_fw, state_bw) = rnn.bidirectional_dynamic_rnn(
            cell_fw,
            cell_bw,
            prev_layer,
            initial_state_fw=initial_state_fw,
            initial_state_bw=initial_state_bw,
            sequence_length=sequence_length,
            parallel_iterations=parallel_iterations,
            dtype=dtype,
            time_major=time_major)
        # Concat the outputs to create the new input.
        prev_layer = array_ops.concat(outputs, 2)
      states_fw.append(state_fw)
      states_bw.append(state_bw)

  return prev_layer, tuple(states_fw), tuple(states_bw)

As the docstring says, sequence_length is optional. But when I actually debugged, I found that if sequence_length is not passed, then in the returned output (a [max_time, layers_output] slice per example) the time steps beyond a sequence's actual length still hold non-trivial values: the RNN keeps stepping forward over the padding. During training, this can therefore affect the final prediction.
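As an aside, if you do want to pass it, one common pattern for deriving sequence_length from a zero-padded batch of shape [batch_size, max_time, dim] looks roughly like this (a minimal sketch, assuming the padded time steps are all-zero vectors; length_from_padding is just an illustrative helper name):

import tensorflow as tf

def length_from_padding(inputs):
    # 1.0 for every time step that has at least one non-zero feature, 0.0 otherwise
    used = tf.sign(tf.reduce_max(tf.abs(inputs), axis=2))
    # summing over the time axis gives each sequence's true length
    return tf.cast(tf.reduce_sum(used, axis=1), tf.int32)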

Now let's look at how bidirectional_dynamic_rnn (link) handles the sequence_length parameter; note that this function internally calls dynamic_rnn.

def bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None,
                              initial_state_fw=None, initial_state_bw=None,
                              dtype=None, parallel_iterations=None,
                              swap_memory=False, time_major=False, scope=None):
  """
  ...
  Args:
    sequence_length: (optional) An int32/int64 vector, size `[batch_size]`,
      containing the actual lengths for each of the sequences in the batch.
      If not provided, all batch entries are assumed to be full sequences; and
      time reversal is applied from time `0` to `max_time` for each sequence.
  ...
  """
  with vs.variable_scope(scope or "bidirectional_rnn"):
    # Forward direction
    with vs.variable_scope("fw") as fw_scope:
      output_fw, output_state_fw = dynamic_rnn(
          cell=cell_fw, inputs=inputs, sequence_length=sequence_length,
          initial_state=initial_state_fw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=fw_scope)

In the forward direction, sequence_length is simply passed straight through to dynamic_rnn.
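As background on what dynamic_rnn itself does with this argument: once a batch element's sequence has ended, its output is zeroed out and its state is simply carried forward unchanged. Roughly sketched per time step (illustrative only, not the actual TensorFlow source; it pretends the state is a single tensor rather than an LSTMStateTuple):

import tensorflow as tf

def masked_step(t, inputs_t, state, sequence_length, cell):
    # one time step, simplified: t is a scalar step index,
    # inputs_t is [batch, input_dim], state is [batch, state_dim]
    new_output, new_state = cell(inputs_t, state)
    finished = t >= sequence_length                        # [batch] bool
    # past the true length: emit zeros and copy the previous state through
    output = tf.where(finished, tf.zeros_like(new_output), new_output)
    state = tf.where(finished, state, new_state)
    return output, state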

Now for the key part: in the backward direction, the inputs must first be reversed before dynamic_rnn is run.

    # Backward direction
    if not time_major:
      time_axis = 1
      batch_axis = 0
    else:
      time_axis = 0
      batch_axis = 1

    def _reverse(input_, seq_lengths, seq_axis, batch_axis):
      if seq_lengths is not None:
        return array_ops.reverse_sequence(
            input=input_, seq_lengths=seq_lengths,
            seq_axis=seq_axis, batch_axis=batch_axis)
      else:
        return array_ops.reverse(input_, axis=[seq_axis])

    with vs.variable_scope("bw") as bw_scope:

      def _map_reverse(inp):
        return _reverse(
            inp,
            seq_lengths=sequence_length,
            seq_axis=time_axis,
            batch_axis=batch_axis)

      inputs_reverse = nest.map_structure(_map_reverse, inputs)
      tmp, output_state_bw = dynamic_rnn(
          cell=cell_bw, inputs=inputs_reverse, sequence_length=sequence_length,
          initial_state=initial_state_bw, dtype=dtype,
          parallel_iterations=parallel_iterations, swap_memory=swap_memory,
          time_major=time_major, scope=bw_scope)

In the backward direction, reversing the inputs is done by _map_reverse, i.e. by the _reverse function, and this is exactly where sequence_length is handled. If the parameter is not supplied, the whole sequence is flipped with array_ops.reverse, so for variable-length inputs the padded zeros end up at the front and are fed into the backward RNN first. If sequence_length is supplied, the inputs are flipped with array_ops.reverse_sequence instead.

Let's first look at the difference between the two reversal ops:

import tensorflow as tf
import numpy as np

with tf.Session() as sess:
    # a has shape [3, 4, 2]; the first two examples are zero-padded
    a = np.array([[[1, 2], [2, 1], [4, 3], [0, 0]],
                  [[2, 1], [3, 4], [0, 0], [0, 0]],
                  [[3, 5], [1, 3], [4, 6], [5, 2]]])

    seq_length = [3, 2, 4]
    # reverse only the first seq_length[i] steps of each example
    # (time axis 1, batch axis 0); the padded zeros stay at the end
    b = tf.reverse_sequence(a, seq_length, 1, 0)
    # reverse every example over the whole time axis, padding included
    c = tf.reverse(a, axis=[1])
    print(sess.run(b))
    print(sess.run(c))
    
Output:
[[[4 3]
  [2 1]
  [1 2]
  [0 0]]
 [[3 4]
  [2 1]
  [0 0]
  [0 0]]
 [[5 2]
  [4 6]
  [1 3]
  [3 5]]]
[[[0 0]
  [4 3]
  [2 1]
  [1 2]]
 [[0 0]
  [0 0]
  [3 4]
  [2 1]]
 [[5 2]
  [4 6]
  [1 3]
  [3 5]]]

You can see that reverse_sequence flips only the actual-length portion and leaves the padded zeros at the end, whereas reverse flips everything.

So, a first conclusion: with respect to sequence_length, what stack_bidirectional_dynamic_rnn / bidirectional_dynamic_rnn do is choose between two different reversal ops for the backward input: reverse_sequence when it is given, and reverse when it is not.

So what effect does passing (or not passing) sequence_length actually have on the output of stack_bidirectional_dynamic_rnn and bidirectional_dynamic_rnn?

In my view, there are two ways the output can potentially be affected:

  • In the forward pass, an example shorter than max_time keeps being stepped through after its actual length is reached, computing unnecessary, meaningless values over the zero-padded time steps and lengthening training;

  • Because the RNN is bidirectional, in the backward pass the reversed input starts with the padded zero time steps, so the computation there is useless work until the real data is reached.

What still needs to be figured out is whether passing sequence_length affects the accuracy of the computed results.

Starting from the basics, let's first look at how sequence_length affects the output of dynamic_rnn:

with tf.Session() as sess:
    X = np.random.randn(2, 10, 8)
    # the second example has an actual length of 6
    X[1, 6:] = 0.0
    # X[1, :6] = 0.0
    X_lengths = [10, 6]

    cell = tf.nn.rnn_cell.LSTMCell(num_units=5)
    outputs, last_states = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        sequence_length=X_lengths,
        inputs=X)
    # same cell, but without sequence_length
    outputs1, last_states1 = tf.nn.dynamic_rnn(
        cell=cell,
        dtype=tf.float64,
        inputs=X)
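    # --- continuation sketch, not from the original experiment: run both graphs
    # --- and compare them; assumes a recent TF 1.x where reusing the same cell
    # --- instance across two dynamic_rnn calls shares its weights.
    sess.run(tf.global_variables_initializer())
    o, s, o1, s1 = sess.run([outputs, last_states, outputs1, last_states1])
    # with sequence_length: steps past the true length are zero-filled and the
    # final state is the state at the last valid step (here time index 5)
    print(np.all(o[1, 6:] == 0))          # expected: True
    print(np.allclose(s.h[1], o[1, 5]))   # expected: True
    # without sequence_length: the LSTM keeps stepping over the padded zeros,
    # so the later outputs are generally non-zero and the final state differs
    print(np.all(o1[1, 6:] == 0))         # expected: False in general

In short, and consistent with the dynamic_rnn docs: when sequence_length is supplied, the outputs past each example's true length are zero-filled and the state is copied through, so the padding cannot influence the result; when it is omitted, the padded zeros are treated as real input and both the per-step outputs and the final state of the shorter example change. The same reasoning carries over to bidirectional_dynamic_rnn and stack_bidirectional_dynamic_rnn, which in my view makes passing sequence_length the safer choice for variable-length batches.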