DL Study Notes [19]: The Modules in the nn Package


License: CC 4.0 BY-SA

Tags: #deep learning #module


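This post walks through the basic concepts of building networks with the nn package of Torch (Lua), including forward propagation, backpropagation and parameter updates, with examples of parameter sharing and type conversion.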

Reference:

https://github.com/torch/nn/blob/master/doc/module.md#nn.Module.updateOutput

Module

[output] forward(input)

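Takes an input and computes the corresponding output of the module.

The default forward simply calls updateOutput(input) and returns the result, which is also stored in the module's output field; custom modules should override updateOutput rather than forward. forward does not change the learnable parameters (weight, bias), and backward assumes that forward has already been called with the same input.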
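A minimal sketch, assuming Torch7 with the nn package installed, that checks the relationship between forward and updateOutput on an nn.Linear (the variable names are only for illustration):

```lua
require 'nn'

torch.manualSeed(0)
mlp = nn.Linear(100, 10)
x = torch.randn(100)

-- forward(x) runs updateOutput(x) and returns the module's output field
out = mlp:forward(x)
print(torch.pointer(out) == torch.pointer(mlp.output))   -- true: the very same tensor

-- calling updateOutput directly recomputes the same values
saved = out:clone()          -- copy first: output is overwritten in place
mlp:updateOutput(x)
print(saved:dist(mlp.output))                            -- ~0
```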

[gradInput] backward(input, gradOutput)

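Called after forward, with the same input; computes the gradients of the module and returns gradInput.

It calls two functions: updateGradInput(input, gradOutput) [computes the gradient w.r.t. the module's input] and accGradParameters(input, gradOutput, scale) [accumulates the gradients w.r.t. the module's parameters].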
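The two steps inside backward can also be called by hand. This sketch (assuming Torch7 + nn, with a random gradOutput standing in for the gradient from the layer above) checks that both paths give the same gradInput and gradWeight:

```lua
require 'nn'

torch.manualSeed(0)
lin = nn.Linear(4, 3)
x = torch.randn(4)
gradOutput = torch.randn(3)   -- placeholder gradient from the layer above

-- path 1: backward does both steps (scale defaults to 1)
lin:zeroGradParameters()
lin:forward(x)
gradInput1 = lin:backward(x, gradOutput):clone()
gradWeight1 = lin.gradWeight:clone()

-- path 2: the same two steps called explicitly
lin:zeroGradParameters()
lin:forward(x)
gradInput2 = lin:updateGradInput(x, gradOutput)
lin:accGradParameters(x, gradOutput, 1)

print(gradInput1:dist(gradInput2), gradWeight1:dist(lin.gradWeight))  -- both ~0
```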

zeroGradParameters()

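Zeroes the gradients accumulated by accGradParameters(input, gradOutput, scale). Call it before backward so that the gradients of the current iteration are not added on top of those from earlier iterations.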
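A small sketch of why zeroGradParameters matters: calling backward twice without zeroing adds the second gradient onto the first (assumes Torch7 + nn; the random inputs are placeholders):

```lua
require 'nn'

torch.manualSeed(0)
lin = nn.Linear(4, 3)
x = torch.randn(4)
gradOutput = torch.randn(3)   -- placeholder gradient from the layer above

lin:zeroGradParameters()
lin:forward(x)
lin:backward(x, gradOutput)
once = lin.gradWeight:clone()

-- a second backward without zeroing accumulates into gradWeight
lin:backward(x, gradOutput)
accumulated = lin.gradWeight:clone()
print(accumulated:dist(once * 2))   -- ~0: the gradient was added twice

-- zeroGradParameters resets the accumulated gradients
lin:zeroGradParameters()
print(lin.gradWeight:norm())        -- 0
```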

updateParameters(learningRate)

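Updates the parameters. Call it after backward to apply the accumulated gradients to the network, i.e. parameters = parameters - learningRate * gradParameters.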
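To see what updateParameters computes, this sketch compares it with the update written out by hand, weight - learningRate * gradWeight (assumes Torch7 + nn; learningRate = 0.1 is an arbitrary choice):

```lua
require 'nn'

torch.manualSeed(0)
lin = nn.Linear(4, 3)
x = torch.randn(4)
gradOutput = torch.randn(3)
learningRate = 0.1

lin:zeroGradParameters()
lin:forward(x)
lin:backward(x, gradOutput)

-- the expected result, computed by hand before the update
expected = lin.weight - lin.gradWeight * learningRate

lin:updateParameters(learningRate)
print(lin.weight:dist(expected))   -- ~0 (up to rounding)
```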

accUpdateGradParameters(input, gradOutput, learningRate)

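[You can mostly skip this one.] It combines the two steps above (accumulating the gradients and updating the parameters) into a single call. The gradients are not stored for later use, and the default implementation relies on a simple trick that may not be valid for custom modules.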
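For a standard layer such as nn.Linear, the one-call version should end up at the same weights as backward followed by updateParameters, while leaving gradWeight untouched. A sketch under the same assumptions (Torch7 + nn, arbitrary inputs and learning rate):

```lua
require 'nn'

torch.manualSeed(0)
linA = nn.Linear(4, 3)
linB = linA:clone()           -- independent deep copy with identical weights
x = torch.randn(4)
gradOutput = torch.randn(3)
learningRate = 0.1

-- two-step version: accumulate the gradients, then update
linA:zeroGradParameters()
linA:forward(x)
linA:backward(x, gradOutput)
linA:updateParameters(learningRate)

-- one-step version: the gradient goes straight into the weights
linB:zeroGradParameters()
linB:forward(x)
linB:accUpdateGradParameters(x, gradOutput, learningRate)

print(linA.weight:dist(linB.weight))   -- ~0 (up to rounding)
print(linB.gradWeight:norm())          -- 0: nothing was accumulated in gradWeight
```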

share(mlp,s1,s2,...,sn)

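Shares the named parameters (s1, ..., sn, e.g. 'weight', 'bias') with mlp: modifying the parameters of one module also changes the corresponding parameters of the other.

Example: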

require 'nn'

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- we change the bias of the first
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])

clone(mlp,...)

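Creates a deep copy of the module; if arguments are given, it then calls share(...) with them on the copy. In other words, it is equivalent to copying the module and then sharing the listed parameters.

Example: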

require 'nn'

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));


-- make a copy that shares the weights and biases
mlp2=mlp1:clone('weight','bias');

-- we change the bias of the first mlp
mlp1:get(1).bias[1]=99;

-- and see that the second one's bias has also changed..
print(mlp2:get(1).bias[1])
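For contrast, clone() with no arguments makes a deep copy that shares nothing. A short sketch in the style of the example above (assumes Torch7 + nn):

```lua
require 'nn'

mlp1 = nn.Sequential()
mlp1:add(nn.Linear(100, 10))

shared = mlp1:clone('weight', 'bias')   -- deep copy, then share weight and bias
copy = mlp1:clone()                     -- deep copy only: nothing is shared

mlp1:get(1).bias[1] = 99

print(shared:get(1).bias[1])   -- 99
print(copy:get(1).bias[1])     -- still the old (random) value, not 99
```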

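The docs show two ways of sharing in a training loop, which are easy to mix up.

In the first case only weight and bias are shared, not gradWeight and gradBias. The parameters really are shared (changing one changes the other; they are not merely equal at initialization), but each module accumulates its own gradients during backward. Because updateParameters goes over the modules one by one, each module subtracts its own gradient from the same shared weights, so the total update uses the sum of both gradients.

In the second case getParameters() flattens all the parameters into one tensor and all the gradients into another, and the two flat tensors must line up element by element, so the gradients have to be shared too. Both modules then accumulate into the same gradWeight/gradBias during backward, and params:add(-learningRate, gradParams) applies the summed gradient in one step. That zeroGradParameters() clears these gradients at the start of the next iteration is intended: each update should only use the gradients from its own backward pass.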

Case 1:

-- our optimization procedure will iterate over the modules, so only share
-- the parameters
mlp = nn.Sequential()
linear = nn.Linear(2,2)
linear_clone = linear:clone('weight','bias') -- clone sharing the parameters
mlp:add(linear)
mlp:add(linear_clone)
function gradUpdate(mlp, x, y, criterion, learningRate) 
  local pred = mlp:forward(x)
  local err = criterion:forward(pred, y)
  local gradCriterion = criterion:backward(pred, y)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  mlp:updateParameters(learningRate)
end

Case 2:

-- our optimization procedure will use all the parameters at once, because
-- it requires the flattened parameters and gradParameters Tensors. Thus,
-- we need to share both the parameters and the gradParameters
mlp = nn.Sequential()
linear = nn.Linear(2,2)
-- need to share the parameters and the gradParameters as well
linear_clone = linear:clone('weight','bias','gradWeight','gradBias')
mlp:add(linear)
mlp:add(linear_clone)
params, gradParams = mlp:getParameters()
function gradUpdate(mlp, x, y, criterion, learningRate, params, gradParams)
  local pred = mlp:forward(x)
  local err = criterion:forward(pred, y)
  local gradCriterion = criterion:backward(pred, y)
  mlp:zeroGradParameters()
  mlp:backward(x, gradCriterion)
  -- adds the gradients to all the parameters at once
  params:add(-learningRate, gradParams)
end
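To check the two cases against each other, this sketch runs one training step with each procedure from identical starting weights and compares the results. It assumes Torch7 + nn, an nn.MSECriterion as an arbitrary loss, and random placeholder data; the two should agree up to floating-point rounding:

```lua
require 'nn'

torch.manualSeed(0)
base = nn.Linear(2, 2)          -- common starting weights for both cases
x = torch.randn(2)
y = torch.randn(2)
criterion = nn.MSECriterion()
learningRate = 0.1

-- Case 1: share weight/bias only, update module by module
lin1 = base:clone()
mlp1 = nn.Sequential():add(lin1):add(lin1:clone('weight', 'bias'))

mlp1:zeroGradParameters()
pred = mlp1:forward(x)
criterion:forward(pred, y)
mlp1:backward(x, criterion:backward(pred, y))
mlp1:updateParameters(learningRate)

-- Case 2: share the gradients as well, update through the flattened tensors
lin2 = base:clone()
mlp2 = nn.Sequential():add(lin2):add(lin2:clone('weight', 'bias', 'gradWeight', 'gradBias'))
params, gradParams = mlp2:getParameters()

mlp2:zeroGradParameters()
pred = mlp2:forward(x)
criterion:forward(pred, y)
mlp2:backward(x, criterion:backward(pred, y))
params:add(-learningRate, gradParams)

print(lin1.weight:dist(lin2.weight), lin1.bias:dist(lin2.bias))   -- both ~0
```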

type(type[, tensorCache])

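Converts all the parameters of a module to the given type (for example 'torch.FloatTensor'). Tensors shared within a network stay shared after the conversion; to keep the sharing between separate modules (or tensors), convert them together with nn.utils.recursiveType, as in the example below.

Example: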

require 'nn'

-- make an mlp
mlp1=nn.Sequential();
mlp1:add(nn.Linear(100,10));

-- make a second mlp
mlp2=nn.Sequential();
mlp2:add(nn.Linear(100,10));

-- the second mlp shares the bias of the first
mlp2:share(mlp1,'bias');

-- mlp1 and mlp2 will be converted to float, and will share bias
-- note: tensors can be provided as inputs as well as modules
nn.utils.recursiveType({mlp1, mlp2}, 'torch.FloatTensor')

The following three are shortcuts for common types:

float([tensorCache])

Convenience method for calling module:type('torch.FloatTensor'[, tensorCache])

double([tensorCache])

Convenience method for calling module:type('torch.DoubleTensor'[, tensorCache])

cuda([tensorCache])

Convenience method for calling module:type('torch.CudaTensor'[, tensorCache])
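A minimal sketch of the float() shortcut (assumes Torch7 + nn; cuda() is left out because it needs a GPU and the cunn package). Note that the input has to be converted to the same type as the module:

```lua
require 'nn'

mlp = nn.Sequential()
mlp:add(nn.Linear(100, 10))

mlp:float()                        -- same as mlp:type('torch.FloatTensor')
print(mlp:get(1).weight:type())    -- torch.FloatTensor

-- inputs must be converted to the matching type as well
x = torch.randn(100):float()
out = mlp:forward(x)
print(out:type())                  -- torch.FloatTensor
```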

State Variables

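These state variables are useful for looking inside a module. The object pointer is never supposed to change, but its contents (including its size, for a Tensor) do change. In general they are Tensors, although some special modules, such as table layers, hold something else.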

output

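The output of the module, computed by the last call to forward(input).

gradInput

The gradient with respect to the module's input, computed by the last call to backward(input, gradOutput).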
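A small sketch of the "pointer stays, contents change" rule using nn.Linear (assumes Torch7 + nn): the tensor returned by forward is the module's own output field, so a second forward call overwrites it in place, even when the size changes. Use :clone() if you need to keep a result:

```lua
require 'nn'

torch.manualSeed(0)
lin = nn.Linear(100, 10)

out1 = lin:forward(torch.randn(100))      -- single sample: 10 outputs
size1 = out1:nElement()
out2 = lin:forward(torch.randn(3, 100))   -- batch of 3: 3x10 outputs

-- same tensor object, but its contents and size have changed
print(torch.pointer(out1) == torch.pointer(out2))   -- true
print(size1, out1:size(1), out1:size(2))            -- 10  3  10
```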

Parameters and gradients w.r.t parameters

These are the parameters we train; some layers (nn.ReLU, for instance) have no parameters at all.

[{weights}, {gradWeights}] parameters()

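Returns two tables: one with the learnable parameter tensors (e.g. weight and bias) and one with the corresponding gradient tensors (gradWeight, gradBias).

Custom modules that store their learnable parameters in tensors other than the standard weight and bias fields should override this function.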
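A sketch of what parameters() returns for a small network (assumes Torch7 + nn; the layer sizes are arbitrary):

```lua
require 'nn'

mlp = nn.Sequential()
mlp:add(nn.Linear(100, 10))
mlp:add(nn.ReLU())           -- has no parameters, contributes nothing
mlp:add(nn.Linear(10, 2))

params, gradParams = mlp:parameters()
print(#params)               -- 4: weight and bias of each Linear
for i = 1, #params do
  -- each gradient tensor has the same shape as its parameter tensor
  print(i, params[i]:nElement(), params[i]:isSameSizeAs(gradParams[i]))
end
```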

[flatParameters, flatGradParameters] getParameters()

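Returns all the weights in a single flat tensor and all the gradWeights in another; every weight and gradWeight becomes a view into these tensors. Because this changes the storage of every parameter, call it only once per network. Custom modules should not override it (override parameters() instead, which getParameters() calls).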
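A sketch of getParameters() on a two-layer network (assumes Torch7 + nn; sizes are arbitrary). The flat tensor is a view of the modules' weights, so writing to it changes the network:

```lua
require 'nn'

mlp = nn.Sequential()
mlp:add(nn.Linear(100, 10))
mlp:add(nn.Linear(10, 2))

-- call once: afterwards every weight/bias is a view into flatParams
flatParams, flatGradParams = mlp:getParameters()
print(flatParams:nElement())      -- 1032 = 100*10 + 10 + 10*2 + 2

-- writing to the flat tensor changes the modules' own tensors
flatParams:fill(0)
print(mlp:get(1).weight:norm())   -- 0
```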

training() VS evaluate()

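training() sets the mode of the Module (and its sub-modules) to train=true, and evaluate() sets it to train=false. This is useful for modules like Dropout or BatchNormalization that behave differently during training and evaluation.

Networks containing such modules need the right mode set before they are used.

For example, during training Dropout randomly zeroes some of its inputs, but when evaluating the network's performance all of them are kept.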
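A sketch of the mode switch with nn.Dropout (assumes Torch7 + nn and the default, scaling "v2" Dropout, whose evaluation mode passes the input through unchanged):

```lua
require 'nn'

torch.manualSeed(0)
drop = nn.Dropout(0.5)
x = torch.ones(1000)

drop:training()                       -- train = true
trainOut = drop:forward(x):clone()
print(trainOut:eq(0):sum() > 0)       -- true: some inputs were zeroed (near-certain for 1000 inputs)

drop:evaluate()                       -- train = false
evalOut = drop:forward(x)
print(evalOut:dist(x))                -- 0: input passed through unchanged
```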

findModules(typename)

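Finds all modules of the given type (e.g. 'nn.Linear') in the network. It returns a flattened list of the matching modules, plus a list of the containers that hold each of them.

The docs use it in two ways: to look at the outputs of every module of a given type, or to find the modules of one type and swap them for another type through the returned containers.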
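A sketch of findModules on a small network (assumes Torch7 + nn): it returns the matching modules and, for each, the container it sits in; after a forward pass their outputs can be inspected:

```lua
require 'nn'

model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.ReLU())
model:add(nn.Linear(20, 2))

linears, containers = model:findModules('nn.Linear')
print(#linears)                                               -- 2
-- containers[i] is the container holding linears[i]
print(torch.pointer(containers[1]) == torch.pointer(model))   -- true

-- inspect the output of each matching module after a forward pass
model:forward(torch.randn(10))
for i, m in ipairs(linears) do
  print(i, m.output:nElement())                               -- 20, then 2
end
```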

listModules()

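Returns all the modules in a network (the module objects themselves, not just their names).

Returns a flattened list of modules, including container modules (which will be listed first), self, and any other component modules. "Flattened" means that nested containers are unrolled into one flat Lua table instead of a nested structure; each container appears right before the modules inside it.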
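A sketch showing what "flattened" means for a nested network (assumes Torch7 + nn):

```lua
require 'nn'

inner = nn.Sequential():add(nn.ReLU()):add(nn.Linear(20, 2))
outer = nn.Sequential():add(nn.Linear(10, 20)):add(inner)

mods = outer:listModules()
for i, m in ipairs(mods) do
  print(i, torch.typename(m))
end
-- 1  nn.Sequential   (outer itself)
-- 2  nn.Linear
-- 3  nn.Sequential   (inner, listed before its children)
-- 4  nn.ReLU
-- 5  nn.Linear
```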

clearState()

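Clears intermediate states such as output and gradInput. Useful before saving (serializing) a network, or when running low on memory.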
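A sketch of clearState() (assumes Torch7 + nn): after a forward pass the output tensors hold data, and after clearState() they are empty:

```lua
require 'nn'

mlp = nn.Sequential():add(nn.Linear(100, 10)):add(nn.Tanh())
mlp:forward(torch.randn(100))
print(mlp.output:nElement())          -- 10

mlp:clearState()                      -- clears the container and its children
print(mlp.output:nElement())          -- 0
print(mlp:get(1).output:nElement())   -- 0
```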

apply(function)

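Calls the given function on the module itself and then on all of its child modules; the function receives the module as its argument.

This is how training() and evaluate() are implemented.

For example, the following sets train = true in every module of model: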

model:apply(function(module)
   module.train = true
end)
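A sketch of apply() (assumes Torch7 + nn) that counts the modules it visits and switches each one to evaluation mode by hand:

```lua
require 'nn'

model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Dropout(0.5))
model:add(nn.Linear(20, 2))

-- apply visits the model itself first, then every child module
count = 0
model:apply(function(module)
  count = count + 1
  module.train = false        -- roughly what evaluate() does
end)
print(count)                  -- 4: the Sequential plus its 3 children
print(model:get(2).train)     -- false
```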

replace(function)

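Replaces or removes modules: the function is called on every module, and whatever it returns takes that module's place. Returning nn.Identity() effectively removes a module, since Identity passes its input through unchanged.

Example (replace every Dropout with an Identity):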

model:replace(function(module)
   if torch.typename(module) == 'nn.Dropout' then
      return nn.Identity()
   else
      return module
   end
end)
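A sketch that checks the effect of the replace example above (assumes Torch7 + nn, in a version recent enough to provide replace). Assigning the return value is a precaution in case the top-level module itself gets replaced:

```lua
require 'nn'

model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Dropout(0.5))
model:add(nn.Linear(20, 2))

model = model:replace(function(module)
  if torch.typename(module) == 'nn.Dropout' then
    return nn.Identity()
  else
    return module
  end
end)

print(#model:findModules('nn.Dropout'))    -- 0
print(#model:findModules('nn.Identity'))   -- 1
print(torch.typename(model:get(2)))        -- nn.Identity
```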