[Software Engineering Applications and Practice] lingvo Study Notes

License: CC 4.0 BY-SA

Article tags:

#deep learning #lstm #machine learning #python #tensorflow

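This article describes how the LSTM RNN layer is parallelized: the input projection is computed across all time steps at once to speed up training. It explains the purpose and implementation of the FPropWithProjectedInput method and compares it with the regular FProp method.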

2021SC@SDUSC

lingvo.core.lstm_frnn_layer.py

An LSTM RNN layer in which the LSTM cell is fused into FRNN so that the input projections can run in parallel.

class LSTMCellExt

Extends the LSTM-based cell classes with extra methods for parallelization.

Methods

def FPropWithProjectedInput(self, theta, state0, inputs)

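Purpose: runs one LSTM step on an input that has already been projected, which lets the input projection be parallelized across time steps to speed up training.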

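Parameters:

theta: NestedMap of layer weights. Note that, for performance reasons, it is expected to hold separate weight tensors for the input projection and the hidden-state projection, under the keys 'wm_i' (input) and 'wm_h' (hidden state).
state0: NestedMap with the same structure as the return value of self.zero_state().
inputs: NestedMap with the following fields:
- proj_inputs: tensor of shape [batch, 4 * hidden_dim]
- padding: tensor of shape [batch, 1]
- reset_mask: tensor of shape [batch, 1]
Returns:
- state1: NestedMap with the same structure as state0.
- extras: NestedMap of intermediate results for backpropagation.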

Source:

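def FPropWithProjectedInput(self, theta, state0, inputs):
    # Optionally reset the state; this works on a deep copy of state0.
    if self.params.reset_cell_state:
      state0_modified = self._ResetState(state0.DeepCopy(), inputs)
    else:
      state0_modified = state0
    # Mix the already-projected input with the (possibly reset) previous state.
    xmw = self._MixWithProjectedInput(theta, state0_modified,
                                      inputs.proj_inputs)
    # Copy the inputs and expose proj_inputs under the act key for _Gates.
    gates_input = inputs.copy()
    gates_input.act = [inputs.proj_inputs]
    state1 = self._Gates(xmw, theta, state0_modified, gates_input)
    # The extras returned here are an empty NestedMap.
    return state1, py_utils.NestedMap()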

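Because the method only moves the input projection out of the per-step loop so that it can be parallelized across time steps, the following two ways of running the cell, with FProp and with FPropWithProjectedInput, are equivalent: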

    >>> inputs = <a tensor of [T, B, D]>
    >>> paddings = tf.zeros([T, B])
    >>> theta = cell.theta
    >>> state = cell.zero_state(theta, B)

Using FProp():

    >>> for i in range(T):
    ...  state, _ = cell.FProp(theta, inputs[i, :, :], paddings, state)

Using FPropWithProjectedInput():

    >>> proj_inputs = cell.ProjectInputSequence(theta, inputs)
    >>> for i in range(T):
    ...  state, _ = cell.FPropWithProjectedInput(
    ...    theta, proj_inputs[i, :, :], paddings, state)
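The equivalence can be checked numerically with a small NumPy sketch. This is not lingvo code: the gate order (i, j, f, o), the place where the bias is added, and the names sigmoid and gates are assumptions made for illustration, and padding and reset handling are left out. The only point is that projecting all T inputs with wm_i up front gives the same states as projecting inside the loop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gates(xmw, c_prev):
    """Apply LSTM gate nonlinearities to pre-activations of shape [B, 4 * H].

    The gate order (i, j, f, o) is an assumption made for this sketch.
    """
    i, j, f, o = np.split(xmw, 4, axis=-1)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(j)
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
T, B, D, H = 5, 3, 4, 6
inputs = rng.normal(size=(T, B, D))
wm = rng.normal(size=(D + H, 4 * H))   # joint weight for [x, h]
b = rng.normal(size=(4 * H,))
wm_i, wm_h = wm[:D], wm[D:]            # input part and hidden-state part

# (a) Project the input inside the loop, one time step at a time.
h_a, c_a = np.zeros((B, H)), np.zeros((B, H))
for t in range(T):
    xmw = np.concatenate([inputs[t], h_a], axis=-1) @ wm + b
    h_a, c_a = gates(xmw, c_a)

# (b) Project all time steps at once, then run only the recurrent part.
proj_inputs = inputs @ wm_i            # [T, B, 4 * H], one batched matmul
h_b, c_b = np.zeros((B, H)), np.zeros((B, H))
for t in range(T):
    xmw = proj_inputs[t] + h_b @ wm_h + b
    h_b, c_b = gates(xmw, c_b)

equivalent = np.allclose(h_a, h_b) and np.allclose(c_a, c_b)
```

In variant (b) the loop body only contains the matmul with wm_h, which is the part that truly depends on the previous step.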

About LSTMCell

Reference article

BasicLSTMCell source

def build(self, inputs_shape):
    if inputs_shape[-1] is None:
      raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                       % inputs_shape)
    input_depth = inputs_shape[-1]
    h_depth = self._num_units
    self._kernel = self.add_variable(
        _WEIGHTS_VARIABLE_NAME,
        shape=[input_depth + h_depth, 4 * self._num_units])
    self._bias = self.add_variable(
        _BIAS_VARIABLE_NAME,
        shape=[4 * self._num_units],
        initializer=init_ops.zeros_initializer(dtype=self.dtype))

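The build function creates the kernel variable with shape [input_depth + h_depth, 4 * self._num_units] and a zero-initialized bias of shape [4 * self._num_units].

Input side: input_depth is the dimension of the input x_t, and h_depth (equal to _num_units) is the dimension of the previous hidden state h_{t-1}.
Output side: 4 * self._num_units is the combined width of the four gate transforms, each of width _num_units, so a single matrix W holds the weights of all four.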
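The shapes can be traced with a NumPy stand-in for the kernel and bias. This is only a sketch: the random values and the variable names are made up for illustration, and the split order i, j, f, o follows the convention used in BasicLSTMCell.

```python
import numpy as np

batch_size, input_depth, num_units = 5, 30, 20
rng = np.random.default_rng(0)

x_t = rng.normal(size=(batch_size, input_depth))   # input at step t
h_prev = np.zeros((batch_size, num_units))         # h_{t-1}

# Same shapes as the variables created in build().
kernel = rng.normal(size=(input_depth + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)

# One matmul produces the pre-activations of all four gate transforms.
gate_inputs = np.concatenate([x_t, h_prev], axis=1) @ kernel + bias

# Split the last axis into four blocks of width num_units.
i, j, f, o = np.split(gate_inputs, 4, axis=1)
```

Using one wide kernel instead of four separate matrices means a single matrix multiplication per step.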

Building an LSTM

Once the cell has been created, there are at least two ways to build the RNN:

tf.nn.dynamic_rnn

import tensorflow as tf  # TensorFlow 1.x API

batch_size = 5
time_step = 7
depth = 30
num_units = 20
inputs = tf.Variable(tf.random_normal([batch_size, time_step, depth]))
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
outputs, output_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
# outputs: [batch_size, time_step, num_units]
# output_state: LSTMStateTuple (c, h), each of shape [batch_size, num_units]

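inputs has shape [batch_size, time_step, depth].
outputs has shape [batch_size, time_step, num_units] and collects the output of every time step. To get the output of the last time step, use tf.transpose(outputs, [1, 0, 2])[-1].
output_state is an LSTMStateTuple (c, h); stacked, it has shape [2, batch_size, num_units].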
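The indexing used to pick the last time step can be tried out without TensorFlow. In the sketch below a NumPy array stands in for the outputs tensor (the values are arbitrary); it also shows that slicing the time axis directly gives the same result as transposing first.

```python
import numpy as np

batch_size, time_step, num_units = 5, 7, 20

# Stand-in for the batch-major outputs of dynamic_rnn: [batch, time, units].
outputs = np.arange(
    batch_size * time_step * num_units, dtype=np.float32
).reshape(batch_size, time_step, num_units)

# Move time to the front, then take the last time step.
last_via_transpose = np.transpose(outputs, (1, 0, 2))[-1]

# Equivalent: index the time axis directly.
last_via_slice = outputs[:, -1, :]

same_result = np.array_equal(last_via_transpose, last_via_slice)
```

Both expressions give an array of shape [batch_size, num_units].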

tf.nn.static_rnn

inputs = tf.unstack(inputs, axis=1)
cell = tf.nn.rnn_cell.BasicLSTMCell(20)
outputs1, output_state_fw1 = tf.nn.static_rnn(cell, inputs, dtype=tf.float32)

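After tf.unstack, inputs is a list of time_step tensors, each of shape [batch_size, depth], i.e. [time_step, batch_size, depth] overall.
outputs1 is likewise a list of time_step tensors of shape [batch_size, num_units], one h per time step. To get the output of the last time step, use outputs1[-1].
output_state_fw1 is an LSTMStateTuple (c, h); stacked, it has shape [2, batch_size, num_units].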

class LstmFRNN(base_layer.BaseLayer)

It exploits the parallelism of the input projection across time steps and is usually faster than combining LayerNormalizedLSTMCellLean with FRNN.

Methods

def FProp(self, theta, inputs, paddings, state0=None, segment_id=None)

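Purpose: computes the LSTM forward pass.

Parameters:

theta: NestedMap object containing the weights of this layer and its child layers.
inputs: a single tensor, or a tuple of tensors whose cardinality equals rnn_cell.inputs_arity. For every input tensor, the first dimension is time, the second is batch, and the third is depth.
paddings: a tensor. The first dimension is time, the second is batch, and the third is 1.
state0: if not None, the initial RNN state as a NestedMap. Defaults to the cell's zero state.
segment_id: must not be None when p.packed_input is set; it is used to build the reset mask.

Source: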

def FProp(self, theta, inputs, paddings, state0=None, segment_id=None):
    p = self.params
    assert isinstance(self.cell, rnn_cell.RNNCell)

    if not isinstance(inputs, (list, tuple)):
      inputs = [inputs]

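    # Slicing wm into wm_i and wm_h outside the loop gives about a 20% speedup
    # over the regular LSTM baseline.
    # Keeping the slicing inside the loop gives only a < 3% speedup.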
    cell_theta = theta.cell.copy()
    num_input_nodes = p.cell.num_input_nodes
    cell_theta['wm_i'] = cell_theta.wm[:num_input_nodes, :]
    cell_theta['wm_h'] = cell_theta.wm[num_input_nodes:, :]
    tf.logging.vlog(1, 'cell_theta: %r', cell_theta)
    if p.packed_input:
      assert segment_id is not None
      reset_mask = rnn_layers.GeneratePackedInputResetMask(
          segment_id, is_reverse=False)
      reset_mask = py_utils.HasShape(reset_mask, tf.shape(paddings))
    else:
      reset_mask = tf.zeros_like(paddings)

    if p.reverse:
      inputs = [tf.reverse(x, [0]) for x in inputs]
      paddings = tf.reverse(paddings, [0])
      reset_mask = tf.reverse(reset_mask, [0])

    if not state0:
      batch_size = py_utils.GetShape(paddings)[1]
      state0 = self.cell.zero_state(cell_theta, batch_size)

    # [T, B, H]
    proj_inputs = self.cell.ProjectInputSequence(cell_theta,
                                                 py_utils.NestedMap(act=inputs))
    proj_inputs = py_utils.NestedMap(
        proj_inputs=proj_inputs, padding=paddings, reset_mask=reset_mask)

    acc_state, final_state = recurrent.Recurrent(
        theta=cell_theta,
        state0=state0,
        inputs=proj_inputs,
        cell_fn=self.cell.FPropWithProjectedInput,
        cell_type=self.cell.layer_type,
        accumulator_layer=self,
        allow_implicit_capture=p.allow_implicit_capture)

    act = self.cell.GetOutput(acc_state)
    if p.reverse:
      act = tf.reverse(act, [0])
    return act, final_state
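The handling of p.reverse in FProp follows a common pattern: flip the inputs along the time axis, run the ordinary left-to-right recurrence, then flip the accumulated outputs back. The sketch below illustrates only this pattern. The function forward_scan and its running-sum recurrence are hypothetical stand-ins for recurrent.Recurrent and the LSTM cell, not lingvo code.

```python
import numpy as np

def forward_scan(xs):
    """Toy left-to-right recurrence: state_t = state_{t-1} + x_t.

    Returns the per-step states (time-major) and the final state.
    """
    state = np.zeros_like(xs[0])
    acc = []
    for x in xs:
        state = state + x
        acc.append(state)
    return np.stack(acc), state

T, B, D = 4, 2, 3
inputs = np.arange(T * B * D, dtype=np.float64).reshape(T, B, D)

# Reverse along time (axis 0), run the forward recurrence, reverse back.
rev_inputs = inputs[::-1]
acc_state, final_state = forward_scan(rev_inputs)
act = acc_state[::-1]

# act[t] now depends on inputs[t], inputs[t + 1], ..., inputs[T - 1].
expected = np.cumsum(inputs[::-1], axis=0)[::-1]
matches = np.allclose(act, expected)
```

Only the accumulated outputs are flipped back; the final state is returned as is, which matches the source above, where final_state is not reversed.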

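About deep and shallow copies in Python

1. Shallow copy

A shallow copy of an object creates another object with the same value, but the corresponding nested sub-objects inside the two objects are all the very same objects. Put simply, the outer container is copied and the contents are not: the nested elements of both objects have the same id.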

import copy

L1 = [1, [1, 2, 3], 6]

# Four ways to make a shallow copy; each yields [1, [1, 2, 3], 6].
L2 = L1.copy()      # list.copy()
L2 = L1[:]          # slicing
L2 = list(L1)       # the list() constructor
L2 = copy.copy(L1)  # copy() from the standard-library copy module

# Printing the ids of L1 and L2 shows that only the outer list was copied:
# L2 is a new object with the same value as L1.
# The elements inside L1 and L2 have identical ids, i.e. they are the same objects.
print('L1_id: %d' % id(L1))        # L1_id: 140024932419056
print('L2_id: %d' % id(L2))        # L2_id: 140024932419456
print('L1[1]_id: %d' % id(L1[1]))  # L1[1]_id: 140024932419376
print('L2[1]_id: %d' % id(L2[1]))  # L2[1]_id: 140024932419376
print('id_L1[2] %d' % id(L1[2]))   # id_L1[2] 9466624
print('id_L2[2] %d' % id(L2[2]))   # id_L2[2] 9466624
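The practical consequence of sharing the nested list can be seen by mutating it. A minimal sketch (the variable names on the left are made up for illustration):

```python
import copy

L1 = [1, [1, 2, 3], 6]
L2 = copy.copy(L1)

outer_is_new = L1 is not L2        # the outer lists are different objects
inner_is_shared = L1[1] is L2[1]   # the nested list is one shared object

# Mutating the shared nested list is visible through both outer lists.
L1[1][1] = 5
seen_through_copy = L2[1][1]

# Rebinding a top-level slot of L1 does not touch L2.
L1[0] = 100
top_level_in_copy = L2[0]
```

After these statements L2 is [1, [1, 5, 3], 6]: the nested change shows through, the top-level change does not.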

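2. Deep copy

A deep copy of an object creates another object with the same value, and in addition none of the corresponding nested mutable sub-objects inside the two objects are the same object. Put simply, both the outer container and its contents are copied.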

Deep copy method:

Call deepcopy() from the standard-library copy module.

import copy

L1 = [1, [1, 2, 3], 6]
L2 = copy.deepcopy(L1)  # [1, [1, 2, 3], 6]

# The ids of L1 and L2 show that the outer list was copied: L2 is a new object with the same value as L1.
# For the nested mutable object, L1[1] and L2[1] have different ids.
# For the nested immutable object, L1[2] and L2[2] have the same id, i.e. they are the same object.
print('L1_id: %d' % id(L1))        # L1_id: 139984573203792
print('L2_id: %d' % id(L2))        # L2_id: 139984573203952
print('L1[1]_id: %d' % id(L1[1]))  # L1[1]_id: 139984573203472
print('L2[1]_id: %d' % id(L2[1]))  # L2[1]_id: 139984573204512
print('id_L1[2] %d' % id(L1[2]))   # id_L1[2] 9466624
print('id_L2[2] %d' % id(L2[2]))   # id_L2[2] 9466624

# With a deep copy, changing L1[1][1] to 5 does not affect the value of L2[1][1].
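The last comment can be turned into a short check. A minimal sketch (the variable names on the left are made up for illustration):

```python
import copy

L1 = [1, [1, 2, 3], 6]
L2 = copy.deepcopy(L1)

inner_is_copied = L1[1] is not L2[1]   # the nested list was duplicated
immutable_is_shared = L1[2] is L2[2]   # ints are immutable, so they are shared

# Mutating the nested list of L1 leaves the deep copy untouched.
L1[1][1] = 5
value_in_original = L1[1][1]
value_in_copy = L2[1][1]
```

Here L1 becomes [1, [1, 5, 3], 6] while L2 stays [1, [1, 2, 3], 6].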
