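PyTorch Lesson 6: The torch.nn Package in Detail (2): Building Network Structures

This article walks through the key layers used in deep learning models: convolution, pooling, activation, linear, normalization, recurrent, dropout, and sparse layers. For each layer it covers the purpose, the parameters, the tensor shapes, and a usage example.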


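Topics in this lesson:

1 Convolution layers

2 Pooling layers

3 Non-linear activation layers

4 Linear layers

5 Normalization layers

6 Recurrent layers

7 Dropout layers

8 Sparse layers

9 Vision layers

10 Multi-GPU layers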

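1 Convolution layers
1.1 1-D convolution
Class:
class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
Parameters:
in_channels (int) - number of channels in the input signal

out_channels (int) - number of channels produced by the convolution

kernel_size (int or tuple) - size of the convolution kernel

stride (int or tuple, optional) - stride of the convolution

padding (int or tuple, optional) - number of layers of zeros added to each side of the input

dilation (int or tuple, optional) - spacing between kernel elements

groups (int, optional) - number of blocked connections from input channels to output channels; it controls how inputs and outputs are connected. With groups=1, every output channel is convolved over all input channels. With groups=2, the layer behaves like two convolution layers side by side: each one sees half of the input channels and produces half of the output channels, and the two outputs are then concatenated.

bias (bool, optional) - if bias=True, a learnable bias is added to the output

Shape:
Input: (N, C_in, L_in)

Output: (N, C_out, L_out)

The output length follows from the input length:
$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$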

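Variables:
Variables are the objects the model learns during training. A convolution layer has two:

weight (tensor) - the convolution weights, of size (out_channels, in_channels / groups, kernel_size)

bias (tensor) - the convolution bias, of size (out_channels)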
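To make the effect of groups on the learnable variables concrete, here is a minimal sketch in plain Python. The helper names conv1d_param_shapes and count_params are made up for illustration and are not part of torch.nn; the channel counts 16 and 32 are also just an assumption chosen so that both are divisible by 2.

```python
def conv1d_param_shapes(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Return (weight_shape, bias_shape) for a 1-D convolution layer."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    weight_shape = (out_channels, in_channels // groups, kernel_size)
    bias_shape = (out_channels,) if bias else None
    return weight_shape, bias_shape


def count_params(weight_shape, bias_shape):
    """Total number of learnable scalars in the layer."""
    total = 1
    for dim in weight_shape:
        total *= dim
    return total + (bias_shape[0] if bias_shape else 0)


dense_w, dense_b = conv1d_param_shapes(16, 32, 3)
grouped_w, grouped_b = conv1d_param_shapes(16, 32, 3, groups=2)
print(dense_w, count_params(dense_w, dense_b))
print(grouped_w, count_params(grouped_w, grouped_b))
```

With groups=2 the weight tensor is half as large, because each of the two side-by-side convolutions only connects half of the input channels to half of the output channels.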

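Example:
The following builds a 1-D convolution layer and shows how the dimensions change from input to output.

import torch
import torch.nn as nn
import torch.autograd as autograd  # legacy import; since PyTorch 0.4 tensors no longer need the Variable wrapper

# Build a convolution layer; in_channels is 16 and must match the channel count of the input
conv = nn.Conv1d(16, 33, 3, stride=2)

# Build the input: 20 samples, 16 channels each, every channel a 1-D vector of length 50
input = torch.randn(20, 16, 50)

# Forward pass: still 20 samples, the channel count becomes 33,
# and with kernel_size=3 and stride=2 each channel has length 24
output = conv(input)
print(output.size())

Output:
torch.Size([20, 33, 24])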
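The output length in the 1-D example can be checked by hand with the shape formula. Below is a small sketch in plain Python; conv_output_length is a hypothetical helper written for this article, not a PyTorch function.

```python
def conv_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one dimension (floor division)."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (l_in + 2 * padding - effective_kernel) // stride + 1


# Conv1d(16, 33, 3, stride=2) applied to an input of length 50
length_1d = conv_output_length(50, kernel_size=3, stride=2)
print(length_1d)

# The same formula is applied independently to every spatial dimension,
# e.g. a height of 50 with kernel 3, stride 2, padding 4, dilation 3
height_2d = conv_output_length(50, kernel_size=3, stride=2, padding=4, dilation=3)
print(height_2d)
```

Because the formula acts on one dimension at a time, the same helper covers 2-D and 3-D convolutions when it is called once per dimension.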
1.2 2-D convolution
Class:
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
The difference between 2-D and 1-D convolution is whether each channel of the input holds 2-D or 1-D data. Image inputs are normally 2-D, of size height x width.

Parameters:
in_channels (int) - number of channels in the input signal

out_channels (int) - number of channels produced by the convolution

kernel_size (int or tuple) - size of the convolution kernel

stride (int or tuple, optional) - stride of the convolution

padding (int or tuple, optional) - number of layers of zeros added to each side of the input

dilation (int or tuple, optional) - spacing between kernel elements

groups (int, optional) - number of blocked connections from input channels to output channels

bias (bool, optional) - if bias=True, a learnable bias is added to the output

In the 2-D case, kernel_size, stride, padding, and dilation can each be a single int, used for both dimensions, or a tuple of two ints giving the values for height and width respectively.

Shape:
Input: (N, C_in, H_in, W_in)

Output: (N, C_out, H_out, W_out)

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$

Variables:
weight (tensor) - the convolution weights, of size (out_channels, in_channels / groups, kernel_size[0], kernel_size[1])

bias (tensor) - the convolution bias, of size (out_channels)

Example:
# Build a 2-D convolution layer; an int stride applies the same value (here 2) to both height and width
conv = nn.Conv2d(16, 33, 3, stride=2)

# Tuples set height and width separately
conv = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))

# Build the input: 16 channels, each a 50 x 100 matrix
input = torch.randn(20, 16, 50, 100)

# Forward pass; note how the output dimensions change
output = conv(input)
print(output.size())

Output:
torch.Size([20, 33, 26, 100])
1.3 3-D convolution
Class:
class torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)
Shape:
Input: (N, C_in, D_in, H_in, W_in)

Output: (N, C_out, D_out, H_out, W_out)

$$D_{out} = \left\lfloor \frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1 \right\rfloor$$

Example:
The parameters and variables are the same as in the 1-D and 2-D cases.

Because the layer is 3-D, kernel_size, stride, padding, and dilation can each be a single int or a tuple of three ints (depth, height, width).

An example:

# With square kernels and equal stride
m = nn.Conv3d(16, 33, 3, stride=2)

# Non-square kernels and unequal stride, with padding
m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0))

input = torch.randn(20, 16, 10, 50, 100)

output = m(input)

print(output.size())

Output:
torch.Size([20, 33, 8, 50, 99])
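1.4 Transposed convolution (deconvolution)
Class:
Matching the 1-D, 2-D, and 3-D convolution layers, transposed convolution also comes in 1-D, 2-D, and 3-D versions. The parameters are the same across the three; only the names differ:

class torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)

class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)

class torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1)

Parameters:
The parameters are the same for all three classes. One caveat: depending on the kernel size, the last few columns of the input may be lost, because the operation is a valid cross-correlation rather than a full cross-correlation. It is up to the user to add suitable padding.

in_channels (int) - number of channels in the input signal

out_channels (int) - number of channels produced by the convolution

kernel_size (int or tuple) - size of the convolution kernel

stride (int or tuple, optional) - stride of the convolution

padding (int or tuple, optional) - amount of zero-padding associated with each side of the input; in a transposed convolution it reduces the output size by 2 x padding

output_padding (int or tuple, optional) - additional size added to one side of the output

dilation (int or tuple, optional) - spacing between kernel elements

groups (int, optional) - number of blocked connections from input channels to output channels

bias (bool, optional) - if bias=True, a learnable bias is added to the output

kernel_size, stride, padding, output_padding, and dilation can each be a single int, used for every dimension, or a tuple with one int per dimension: two ints (height, width) for the 2-D layer, three ints (depth, height, width) for the 3-D layer.

Variables:
The variables are the same as well.

weight (tensor) - the convolution weights, of size (in_channels, out_channels / groups, kernel_size)

bias (tensor) - the convolution bias, of size (out_channels)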

Shape:
The formulas below assume the default dilation=1.

1-D:

Input: (N, C_in, L_in)
Output: (N, C_out, L_out)
$$L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{kernel\_size} + \text{output\_padding}$$
2-D:

Input: (N, C_in, H_in, W_in)
Output: (N, C_out, H_out, W_out)
$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0] + \text{output\_padding}[0]$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1] + \text{output\_padding}[1]$$

3-D:

Input: (N, C_in, D_in, H_in, W_in)
Output: (N, C_out, D_out, H_out, W_out)
$$D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0] + \text{output\_padding}[0]$$
$$H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1] + \text{output\_padding}[1]$$
$$W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{kernel\_size}[2] + \text{output\_padding}[2]$$

Example:
A 3-D example:

# With square kernels and equal stride
m = nn.ConvTranspose3d(16, 33, 3, stride=2)

# Non-square kernels and unequal stride, with padding
m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2))

input = torch.randn(20, 16, 10, 50, 100)

output = m(input)
print(output.size())

Output:
torch.Size([20, 33, 21, 46, 97])
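The relationship between a convolution and its transposed counterpart can be sketched with the two shape formulas. The helpers below are hypothetical, written in plain Python for this article, and only reason about sizes, not about tensor values.

```python
def conv_output_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one dimension."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_output_length(l_in, kernel_size, stride=1, padding=0,
                                 output_padding=0, dilation=1):
    """Output length of a transposed convolution along one dimension."""
    return ((l_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Sizes for ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2))
# applied to an input whose spatial size is (10, 50, 100)
sizes = tuple(
    conv_transpose_output_length(l, k, stride=s, padding=p)
    for l, k, s, p in zip((10, 50, 100), (3, 5, 2), (2, 1, 1), (0, 4, 2))
)
print(sizes)

# With stride 2, inputs of length 21 and 22 both convolve down to length 10,
# so the inverse is ambiguous; output_padding selects the longer one.
print(conv_output_length(21, 3, stride=2), conv_output_length(22, 3, stride=2))
print(conv_transpose_output_length(10, 3, stride=2, output_padding=1))
```

This is why output_padding exists: it does not add zeros around the data, it only resolves which of several possible output sizes the transposed layer should produce.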
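2 Pooling layers
By the way values are aggregated, pooling layers divide into max pooling and average pooling.

By the way the operation is carried out, they divide into ordinary pooling, fractional pooling, power-average pooling, and adaptive pooling.

By dimensionality, they divide into 1-D, 2-D, and 3-D.

2.1 Max pooling
As with convolution, ordinary max pooling comes in 1-D, 2-D, and 3-D versions. Apart from the names, the parameters are the same.

Class:
class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
class torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Parameters:
kernel_size (int or tuple) - size of the max pooling window
stride (int or tuple, optional) - step by which the window moves. Default: kernel_size
padding (int or tuple, optional) - implicit padding added to each side of the input (in current PyTorch the padded values are negative infinity, so they never win the max)
dilation (int or tuple, optional) - controls the spacing between elements inside the window
return_indices - if True, the indices of the maxima are returned along with the outputs; useful for unpooling later
ceil_mode - if True, the output size is computed with ceil instead of the default floor
In the 2-D and 3-D cases, kernel_size, stride, padding, and dilation can each be a single int, used for every dimension of the window, or a tuple with one int per dimension: (height, width) for 2-D, (depth, height, width) for 3-D.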

Shape:
1-D:

Input: (N, C, L_in)
Output: (N, C, L_out)
$$L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$$
2-D:

Input: (N, C, H_in, W_in)
Output: (N, C, H_out, W_out)
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$

3-D:

Input: (N, C, D_in, H_in, W_in)
Output: (N, C, D_out, H_out, W_out)
$$D_{out} = \left\lfloor \frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1 \right\rfloor$$

Example:
import torch
import torch.nn as nn

# 1. Create a 1-D max pooling layer
p1 = nn.MaxPool1d(3, stride=2)

# Create an input
input = torch.randn(20, 16, 50)

# Forward pass
output = p1(input)
print(output.size())

# 2. Create a 2-D max pooling layer
p2 = nn.MaxPool2d((3, 2), stride=(2, 1))

# Create an input
input = torch.randn(20, 16, 50, 32)

# Forward pass
output = p2(input)
print(output.size())

# 3. Create a 3-D max pooling layer
p3 = nn.MaxPool3d((3, 2, 1), stride=(2, 1, 1))

# Create an input
input = torch.randn(20, 16, 50, 32, 20)

# Forward pass
output = p3(input)
print(output.size())

Output:
torch.Size([20, 16, 24])
torch.Size([20, 16, 24, 31])
torch.Size([20, 16, 24, 31, 20])
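The effect of ceil_mode on the pooled size can be sketched in plain Python. pool_output_length is a hypothetical helper, and the extra check in the ceil branch is my understanding of the adjustment PyTorch makes so that the last window starts inside the input or its left padding; treat it as an assumption rather than a specification.

```python
import math


def pool_output_length(l_in, kernel_size, stride=None, padding=0,
                       dilation=1, ceil_mode=False):
    """Output length of a pooling layer along one dimension."""
    stride = kernel_size if stride is None else stride  # default stride is kernel_size
    numerator = l_in + 2 * padding - dilation * (kernel_size - 1) - 1
    if not ceil_mode:
        return numerator // stride + 1
    out = math.ceil(numerator / stride) + 1
    # Assumed rule: drop the last window if it would start past the input.
    if (out - 1) * stride >= l_in + padding:
        out -= 1
    return out


floor_len = pool_output_length(50, 3, stride=2)
ceil_len = pool_output_length(50, 3, stride=2, ceil_mode=True)
print(floor_len, ceil_len)
```

With floor, the trailing element that does not fill a whole window is ignored; with ceil, it gets a partial window of its own, so the output is one element longer here.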
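Max pooling has a counterpart, max unpooling.

MaxUnpool computes a partial inverse of MaxPool. It is only partial because the non-maximal values are lost during max pooling. MaxUnpool1d takes the output of MaxPool1d, including the indices of the maxima, and computes a partial inverse in which all non-maximal values are set to zero.

MaxPool1d can map several input sizes to the same output size, so the inversion can be ambiguous. To handle this, the desired output size can be passed as the extra argument output_size in the forward call. See the inputs and the example below.

Unpooling is also available in three dimensionalities:

Class:
class torch.nn.MaxUnpool1d(kernel_size, stride=None, padding=0)
class torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)
class torch.nn.MaxUnpool3d(kernel_size, stride=None, padding=0)
Parameters:
kernel_size (int or tuple) - size of the max pooling window
stride (int or tuple, optional) - step by which the max pooling window moved. Default: kernel_size
padding (int or tuple, optional) - padding that was added to the input
Inputs:
input: the tensor to invert
indices: the indices returned by the matching MaxPool layer
output_size: a torch.Size specifying the target output size (optional)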
Shape:
1-D:

Input: (N, C, H_in)
Output: (N, C, H_out)
$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$

The output size can also be set with output_size.
2-D:

Input: (N, C, H_in, W_in)
Output: (N, C, H_out, W_out)
$$H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$

$$W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]$$

The output size can also be set with output_size.

3-D:

Input: (N, C, D_in, H_in, W_in)
Output: (N, C, D_out, H_out, W_out)
$$D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0]$$

$$H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1]$$

$$W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{kernel\_size}[2]$$
Example:
Using 1-D pooling as the example:

# Create a pooling layer that also returns the indices of the maxima
p = nn.MaxPool1d(2, stride=2, return_indices=True)

# Create the matching unpooling layer
up = nn.MaxUnpool1d(2, stride=2)

# Create an input of shape 1 x 1 x 8
input = torch.Tensor([[[1, 2, 3, 4, 5, 6, 7, 8]]])

# Pooling
output, indices = p(input)

# Unpooling
up_output = up(output, indices)
print(up_output)

Output:
tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]])

# output_size can be used to fix the size of the result
up_output = up(output, indices, output_size=input.size())
print(up_output)

Output:
tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]])
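What the pool and unpool pair does to the values can be reproduced on a plain Python list. This is only a sketch of the idea for the 1-D, no-padding case; the function names are invented for illustration and are not the PyTorch implementation.

```python
def max_pool1d_with_indices(x, kernel_size, stride):
    """Return the window maxima and the positions they came from."""
    values, indices = [], []
    for start in range(0, len(x) - kernel_size + 1, stride):
        window = x[start:start + kernel_size]
        offset = max(range(kernel_size), key=lambda i: window[i])
        values.append(window[offset])
        indices.append(start + offset)
    return values, indices


def max_unpool1d(values, indices, output_size):
    """Scatter the maxima back to their positions; everything else is zero."""
    out = [0.0] * output_size
    for value, index in zip(values, indices):
        out[index] = value
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
values, indices = max_pool1d_with_indices(x, kernel_size=2, stride=2)
restored = max_unpool1d(values, indices, output_size=len(x))
print(values, indices)
print(restored)
```

The non-maximal entries 1, 3, 5, 7 cannot be recovered, which is why unpooling is only a partial inverse.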
2.2 Average pooling
Class:
class torch.nn.AvgPool1d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)
class torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)
class torch.nn.AvgPool3d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)
Parameters:
kernel_size (int or tuple) - size of the pooling window
stride (int or tuple, optional) - step by which the pooling window moves. Default: kernel_size
padding (int or tuple, optional) - number of layers of zeros added to each side of the input
ceil_mode - if True, the output size is computed with ceil instead of the default floor
count_include_pad - if True, the zeros added by padding are included when computing the average
Unlike max pooling, the average pooling layers have no dilation parameter.
Shape:
See the max pooling layers.

Example:
A 1-D example; for the others, see the max pooling layers.

import torch
import torch.nn as nn

# 1. Create a 1-D average pooling layer
p1 = nn.AvgPool1d(3, stride=2)

# Create an input
input = torch.randn(20, 16, 50)

# Forward pass
output = p1(input)
print(output.size())

Output:
torch.Size([20, 16, 24])
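The count_include_pad flag is easiest to understand on a tiny input. The sketch below is plain Python with a hypothetical avg_pool1d helper; it assumes zero padding on both sides and a window that never lies entirely inside the padding.

```python
def avg_pool1d(x, kernel_size, stride, padding=0, count_include_pad=True):
    """1-D average pooling over a list, with optional zero padding."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window = padded[start:start + kernel_size]
        if count_include_pad:
            divisor = kernel_size
        else:
            # Count only the positions that overlap the real input.
            first_real = max(start, padding)
            last_real = min(start + kernel_size, padding + len(x))
            divisor = last_real - first_real
        out.append(sum(window) / divisor)
    return out


x = [2.0, 4.0, 6.0, 8.0]
with_pad = avg_pool1d(x, kernel_size=2, stride=2, padding=1)
without_pad = avg_pool1d(x, kernel_size=2, stride=2, padding=1, count_include_pad=False)
print(with_pad)
print(without_pad)
```

Only the border windows differ: when the padded zeros are counted they pull the average down, and when they are excluded the border values are averaged over the real elements alone.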
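2.3 Fractional max pooling
Applies 2-D fractional max pooling over an input signal. For the details of fractional max pooling, see the paper: https://arxiv.org/abs/1412.6071

Max pooling is applied over kH x kW regions, with a random step size determined by the target output size. The number of output features equals the number of input features.

Class:
class torch.nn.FractionalMaxPool2d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)
Parameters:
kernel_size (int or tuple) - size of the max pooling window. Either a single number k (for a k x k window) or a tuple (kH, kW)
output_size - target size of the output image. Either a tuple (oH, oW) or a single number oH for an oH x oH output
output_ratio - sets the output size as a fraction of the input size; a number, or tuple of numbers, in the range (0, 1)
return_indices - default False. If True, the indices are returned along with the outputs; useful for nn.MaxUnpool2d
Example:
# The output size can be given as exact values
m = nn.FractionalMaxPool2d(3, output_size=(13, 12))

# The output size can also be given as a ratio of the input size
m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5))

input = torch.randn(20, 16, 50, 32)
output = m(input)

print(output.size())

Output:
torch.Size([20, 16, 25, 16])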
2.4 Power-average pooling
Applies 2-D power-average pooling over an input signal. For each window X the output is:
$$f(X) = \left( \sum_{x \in X} x^{p} \right)^{1/p}$$

As p goes to infinity, this is equivalent to max pooling.

At p=1, it is sum pooling, which is proportional to average pooling.

Class:
class torch.nn.LPPool2d(norm_type, kernel_size, stride=None, ceil_mode=False)
Types accepted by kernel_size and stride:

int: the pooling window has equal height and width
tuple of two numbers: the first is the window height, the second the width
Parameters:
norm_type: the exponent p in the formula above
kernel_size: size of the pooling window
stride: step by which the window moves. Default: kernel_size
ceil_mode: if True, the output size is computed with ceil instead of floor
Shape:
Input: (N, C, H_in, W_in)
Output: (N, C, H_out, W_out)
$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$
$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$
LPPool2d takes no padding or dilation arguments, so padding = 0 and dilation = 1 here, and the expressions reduce to floor((H_in - kernel_size[0]) / stride[0] + 1) and floor((W_in - kernel_size[1]) / stride[1] + 1).

Example:
# Power-2 pool of a square window of size=3, stride=2
m = nn.LPPool2d(2, 3, stride=2)

# Pool of a non-square window with power 1.2
m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1))

input = torch.randn(20, 16, 50, 32)
output = m(input)

print(output.size())

Output:
torch.Size([20, 16, 24, 31])
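The two limiting cases of power-average pooling can be checked numerically on a single window. lp_pool below is a hypothetical plain-Python helper that applies the formula to a list of non-negative values.

```python
def lp_pool(window, p):
    """Power-average (Lp) pooling of one window: (sum of x^p) ^ (1/p)."""
    return sum(v ** p for v in window) ** (1.0 / p)


window = [1.0, 2.0, 3.0]
sum_like = lp_pool(window, 1)    # p = 1: plain sum of the window
euclidean = lp_pool(window, 2)   # p = 2: Euclidean norm of the window
max_like = lp_pool(window, 50)   # large p: dominated by the largest value
print(sum_like, euclidean, max_like)
```

At p=1 the result is the sum, which is the window size times the average; as p grows the largest element dominates and the result approaches the maximum.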
2.5 Adaptive pooling
2.5.1 Adaptive max pooling
Applies 1-D or 2-D adaptive max pooling over an input signal. For an input of any size, the output size can be fixed to a chosen value, while the number of features (channels) stays the same between input and output.

Class:
class torch.nn.AdaptiveMaxPool1d(output_size, return_indices=False)
class torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)
Parameters:
output_size: target size of the output signal
return_indices: if True, the indices are returned along with the outputs; useful for nn.MaxUnpool1d and nn.MaxUnpool2d. Default: False
Example:
# 1-D, target size 5
m = nn.AdaptiveMaxPool1d(5)
input = torch.randn(1, 64, 8)
output = m(input)

print(output.size())

Output:
torch.Size([1, 64, 5])

# 2-D, target size (5, 7)
m = nn.AdaptiveMaxPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)

# 2-D, target size (7, 7); this replaces the layer and input defined just above
m = nn.AdaptiveMaxPool2d(7)
input = torch.randn(1, 64, 10, 9)

output = m(input)

print(output.size())

Output:
torch.Size([1, 64, 7, 7])
2.5.2 Adaptive average pooling
Adaptive average pooling works like adaptive max pooling, but takes a single parameter:

output_size: target size of the output signal
Example:
# Target output size of 5 x 7
m = nn.AdaptiveAvgPool2d((5, 7))
input = torch.randn(1, 64, 8, 9)

# Target output size of 7 x 7 (square); this replaces the layer and input defined just above
m = nn.AdaptiveAvgPool2d(7)
input = torch.randn(1, 64, 10, 9)

output = m(input)

print(output.size())

Output:
torch.Size([1, 64, 7, 7])
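Adaptive pooling has no kernel_size or stride because the windows are derived from the input and output sizes. The sketch below shows one common way to derive them, which as far as I know matches PyTorch's rule (start = floor(i * in / out), end = ceil((i + 1) * in / out)); treat that as an assumption. The function names are invented for illustration.

```python
def adaptive_pool_windows(input_size, output_size):
    """Half-open [start, end) window for every output position."""
    windows = []
    for i in range(output_size):
        start = (i * input_size) // output_size
        end = -((-(i + 1) * input_size) // output_size)  # ceiling division
        windows.append((start, end))
    return windows


def adaptive_max_pool1d(x, output_size):
    """Max over each derived window."""
    return [max(x[start:end]) for start, end in adaptive_pool_windows(len(x), output_size)]


windows = adaptive_pool_windows(8, 5)
pooled = adaptive_max_pool1d([3, 1, 4, 1, 5, 9, 2, 6], 5)
print(windows)
print(pooled)
```

The windows may overlap and may differ in length, which is how an input of length 8 can be mapped to exactly 5 outputs.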
3 Non-linear activation layers
Class | Parameters | Formula
class torch.nn.ReLU(inplace=False) | inplace: whether to do the operation in place | $\text{ReLU}(x) = \max(0, x)$
class torch.nn.ReLU6(inplace=False) | inplace: whether to do the operation in place | $\text{ReLU6}(x) = \min(\max(0, x), 6)$
class torch.nn.ELU(alpha=1.0, inplace=False) | alpha: the alpha value in the ELU formula; inplace: whether to do the operation in place | $f(x) = \max(0, x) + \min(0, \alpha (e^{x} - 1))$
class torch.nn.PReLU(num_parameters=1, init=0.25) | num_parameters: number of a values to learn, default 1; init: initial value of a, default 0.25 | $\text{PReLU}(x) = \max(0, x) + a \min(0, x)$
class torch.nn.Threshold(threshold, value, inplace=False) | threshold: the threshold; value: replacement for inputs that do not exceed the threshold; inplace: whether to do the operation in place | $y = x$ if $x > \text{threshold}$, otherwise $y = \text{value}$
class torch.nn.Sigmoid | none | $f(x) = 1 / (1 + e^{-x})$
class torch.nn.Tanh | none | $f(x) = (e^{x} - e^{-x}) / (e^{x} + e^{-x})$
class torch.nn.LogSigmoid | none | $\text{LogSigmoid}(x) = \log(1 / (1 + e^{-x}))$
class torch.nn.Softplus(beta=1, threshold=20) | beta: the beta value of the Softplus function; threshold: inputs above this value use the linear function instead | $f(x) = \frac{1}{\beta} \log(1 + e^{\beta x})$
class torch.nn.Softshrink(lambd=0.5) | lambd: the lambda value of the Softshrink function, default 0.5 | $f(x) = x - \lambda$ if $x > \lambda$; $f(x) = x + \lambda$ if $x < -\lambda$; $f(x) = 0$ otherwise
class torch.nn.Softsign | none | $f(x) = x / (1 + \lvert x \rvert)$
class torch.nn.Tanhshrink | none | $\text{Tanhshrink}(x) = x - \tanh(x)$
class torch.nn.Softmin | dim: the dimension along which the values are normalized | $f_i(x) = e^{-x_i - \text{shift}} / \sum_j e^{-x_j - \text{shift}}$, $\text{shift} = \max_i x_i$
class torch.nn.Softmax | dim: the dimension along which the values are normalized | $f_i(x) = e^{x_i - \text{shift}} / \sum_j e^{x_j - \text{shift}}$, $\text{shift} = \max_i x_i$
class torch.nn.LogSoftmax | dim: the dimension along which the values are normalized | $f_i(x) = \log(e^{x_i} / a)$, $a = \sum_j e^{x_j}$
Here is one example; the other activations are used the same way.

# Create an activation module; dim=1 normalizes across each row
m = nn.Softmax(dim=1)

# Create the input
input = torch.randn(2, 3)

print(input)
print(m(input))

Output (the input is random, so the values differ from run to run):
tensor([[ 1.7255, -0.2483, -0.4758],
        [ 0.2217,  1.4740, -1.6893]])
tensor([[ 0.8003,  0.1112,  0.0886],
        [ 0.2152,  0.7529,  0.0318]])

Calling nn.Softmax() without dim relies on an implicit dimension choice, which is deprecated and raises a UserWarning asking for an explicit dim argument.
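The printed softmax values can be reproduced by hand with the shifted formula from the table. This is a small sketch using only the standard library; softmax here is a local helper, not the torch.nn module, and the inputs are the rounded numbers shown in the output above.

```python
import math


def softmax(xs):
    """Softmax with the max-shift trick for numerical stability."""
    shift = max(xs)
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


rows = [[1.7255, -0.2483, -0.4758],
        [0.2217, 1.4740, -1.6893]]
probs = [softmax(row) for row in rows]
for row in probs:
    print([round(p, 4) for p in row])
```

Subtracting the maximum does not change the result, since the factor cancels between numerator and denominator; it only keeps the exponentials from overflowing. Each row sums to 1.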
4 Linear layers
Class:
class torch.nn.Linear(in_features, out_features, bias=True)
Purpose:
Applies a linear transformation to the input: y = x A^T + b

Parameters:
in_features - size of each input sample
out_features - size of each output sample
bias - if False, the layer does not learn a bias. Default: True
Shape:
Input: (N, in_features)
Output: (N, out_features)
Variables:
weight - the learnable weights of the module, of shape (out_features x in_features)
bias - the learnable bias of the module, of shape (out_features)
# Create a linear layer module
m = nn.Linear(20, 30)

# Create the input
input = torch.randn(128, 20)

# Linear transformation
output = m(input)

print(output.size())

Output:
torch.Size([128, 30])
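For a single sample, the linear transformation is a weighted sum per output feature. The sketch below spells it out on plain lists; the weights, bias, and input are made-up numbers, and linear is a local helper rather than the torch.nn module.

```python
def linear(x, weight, bias):
    """y_i = sum_j weight[i][j] * x[j] + bias[i] for one input sample x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# weight has shape (out_features=2, in_features=3), bias has shape (2,)
weight = [[1.0, 0.0, -1.0],
          [0.5, 0.5, 0.5]]
bias = [0.1, -0.2]
x = [2.0, 4.0, 6.0]

y = linear(x, weight, bias)
print(y)

# Number of learnable scalars in nn.Linear(20, 30): weights plus biases
n_params = 20 * 30 + 30
print(n_params)
```

Each row of the weight matrix produces one output feature, which is why the weight shape is (out_features x in_features).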
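5 Normalization layers
Class:
class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)
class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)
class torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True)
Purpose:
Applies Batch Normalization over a mini-batch of inputs:
$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \times \gamma + \beta$$

The mean and standard deviation are computed per feature over each mini-batch. gamma and beta are learnable parameter vectors of size C, where C is the number of input features.

During training, the layer computes the mean and variance of every batch it sees and keeps a running average of them. The default momentum of the running average is 0.1.

During evaluation, the mean and variance accumulated in training are used to normalize the data.

Parameters:
num_features: number of features in the expected input.
1-D: the expected input has size 'batch_size x num_features [x width]'

2-D: the expected input has size 'batch_size x num_features x height x width'

3-D: the expected input has size 'batch_size x num_features x depth x height x width'

eps: value added to the denominator for numerical stability, so that it cannot reach or approach 0. Default: 1e-5.
momentum: momentum used for the running mean and running variance. Default: 0.1.
affine: a boolean; when True, the layer has learnable affine parameters (gamma and beta).
Shape:
The output has the same shape as the input.

Example:
# With learnable parameters
m = nn.BatchNorm3d(100)

# Without learnable parameters
m = nn.BatchNorm3d(100, affine=False)

input = torch.randn(20, 100, 35, 45, 10)
output = m(input)
print(output.size())

Output:
torch.Size([20, 100, 35, 45, 10])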
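The normalization and the running average can be sketched for a single feature in plain Python. The helper names are hypothetical, and the sketch only tracks the running mean; it is an illustration of the idea under those simplifications, not the library's implementation.

```python
def batch_norm_one_feature(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)  # biased variance
    normed = [(v - mean) / (var + eps) ** 0.5 * gamma + beta for v in batch]
    return normed, mean, var


def update_running(running, batch_stat, momentum=0.1):
    """Exponential moving average used for the running statistics."""
    return (1 - momentum) * running + momentum * batch_stat


batch = [1.0, 2.0, 3.0, 4.0]
normed, mean, var = batch_norm_one_feature(batch)
running_mean = update_running(0.0, mean)
print(normed)
print(mean, var, running_mean)
```

After normalization the batch has mean 0 and a variance close to 1. With momentum 0.1, each new batch contributes 10% to the running mean that is later used in evaluation mode.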
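6 Recurrent layers
6.1 Recurrent networks
Three of the most commonly used recurrent networks are provided: the plain RNN, LSTM, and GRU.

Class:
class torch.nn.RNN(*args, **kwargs)
class torch.nn.LSTM(*args, **kwargs)
class torch.nn.GRU(*args, **kwargs)
Parameters:
RNN:

input_size - number of features in the input x.
hidden_size - number of features in the hidden state.
num_layers - number of recurrent layers.
nonlinearity - the non-linearity to use, tanh or relu. Default: tanh.
bias - if False, the layer does not use the bias weights b_ih and b_hh. Default: True.
batch_first - if True, the input tensor has shape [batch_size, time_step, feature], and so does the output.
dropout - if non-zero, a dropout layer is applied to the output of every layer except the last.
bidirectional - if True, the network becomes a bidirectional RNN. Default: False.
LSTM:

Same as the RNN parameters, without nonlinearity.
GRU:

Same as LSTM.
Input: (input, h_0)
input (seq_len, batch, input_size): tensor holding the features of the input sequence. It can also be a padded variable-length sequence that has been packed; see torch.nn.utils.rnn.pack_padded_sequence() for details.

h_0 (num_layers * num_directions, batch, hidden_size): tensor holding the initial hidden state. For LSTM, the second argument is a tuple (h_0, c_0), where c_0 has the same shape and holds the initial cell state.

Output: (output, h_n)
output (seq_len, batch, hidden_size * num_directions): tensor holding the output features of the last layer of the RNN. If the input is a packed sequence, the output is a packed sequence as well.
h_n (num_layers * num_directions, batch, hidden_size): tensor holding the hidden state at the last time step. For LSTM, the second output is a tuple (h_n, c_n).
RNN model parameters:
weight_ih_l[k] - learnable input-hidden weights of layer k, of shape (hidden_size, input_size) for the first layer of a plain RNN.

weight_hh_l[k] - learnable hidden-hidden weights of layer k, of shape (hidden_size, hidden_size)

bias_ih_l[k] - learnable input-hidden bias of layer k, of shape (hidden_size)

bias_hh_l[k] - learnable hidden-hidden bias of layer k, of shape (hidden_size)

LSTM and GRU stack the weights of their gates along the first dimension, so that dimension is 4 * hidden_size for LSTM and 3 * hidden_size for GRU.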

Example:
Using GRU as the example; the other two work the same way:

# Create a GRU network with input_size=10, hidden_size=20, num_layers=2
rnn = nn.GRU(10, 20, 2)

# Build the two inputs
# 3 samples, each a sequence of length 5, each element with 10 features
input = torch.randn(5, 3, 10)
# 2 GRU layers, 3 samples, hidden size 20
h0 = torch.randn(2, 3, 20)

# Forward pass; there are two outputs
output, hn = rnn(input, h0)
print(output.size())
print(hn.size())

Output:
torch.Size([5, 3, 20])
torch.Size([2, 3, 20])
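6.2 Recurrent cells
Section 6.1 builds a complete, possibly multi-layer recurrent network in one call. This section covers building a single recurrent cell, which can be applied step by step in a for loop. In other words, a full recurrent network is the same cell applied across the time dimension.

Class:
class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')
class torch.nn.LSTMCell(input_size, hidden_size, bias=True)
class torch.nn.GRUCell(input_size, hidden_size, bias=True)
Parameters:
input_size - number of features in the input x.

hidden_size - number of features in the hidden state.

bias - if False, the cell does not use bias weights. Default: True.

nonlinearity - the non-linearity to use, tanh or relu. Default: tanh. LSTMCell and GRUCell do not have this parameter.

Input: input, hidden
RNN and GRU:

input (batch, input_size): tensor holding the input features.

hidden (batch, hidden_size): tensor holding the initial hidden state.

LSTM: hidden is replaced by the following two inputs, passed as a tuple (h_0, c_0):

h_0 (batch, hidden_size): tensor holding the initial hidden state for each element in the batch

c_0 (batch, hidden_size): tensor holding the initial cell state for each element in the batch

Output: h'
RNN and GRU:

h' (batch, hidden_size): the hidden state at the next time step.
LSTM:

h_1 (batch, hidden_size): the hidden state at the next time step.
c_1 (batch, hidden_size): the cell state at the next time step.
Variables:
weight_ih - learnable input-hidden weights, of shape (hidden_size, input_size) for RNNCell.

weight_hh - learnable hidden-hidden weights, of shape (hidden_size, hidden_size) for RNNCell

bias_ih - learnable input-hidden bias, of shape (hidden_size) for RNNCell

bias_hh - learnable hidden-hidden bias, of shape (hidden_size) for RNNCell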

Example:
Using GRU as the example; the other two work the same way:

# Build a GRUCell with input_size=10, hidden_size=20
rnn = nn.GRUCell(10, 20)

# Build the inputs
# Sequence length 6, batch_size=3, input_size=10
input = torch.randn(6, 3, 10)
# batch_size=3, hidden_size=20
hx = torch.randn(3, 20)

output = []
for i in range(6):
    # Hidden state after time step i
    hx = rnn(input[i], hx)
    output.append(hx)

print(len(output))

Output:
6
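What a single cell computes at each step can be written out for the simplest case, the tanh RNN cell: h' = tanh(W_ih x + b_ih + W_hh h + b_hh). The sketch below uses plain lists for one sample; the sizes and the constant weight values are arbitrary assumptions, and rnn_cell is a local helper, not nn.RNNCell.

```python
import math


def rnn_cell(x, h, w_ih, w_hh, b_ih, b_hh):
    """One step of a tanh RNN cell for a single sample."""
    new_h = []
    for i in range(len(h)):
        pre = b_ih[i] + b_hh[i]
        pre += sum(w_ih[i][j] * x[j] for j in range(len(x)))
        pre += sum(w_hh[i][j] * h[j] for j in range(len(h)))
        new_h.append(math.tanh(pre))
    return new_h


input_size, hidden_size = 2, 3
w_ih = [[0.1] * input_size for _ in range(hidden_size)]   # (hidden_size, input_size)
w_hh = [[0.2] * hidden_size for _ in range(hidden_size)]  # (hidden_size, hidden_size)
b_ih = [0.0] * hidden_size
b_hh = [0.0] * hidden_size

sequence = [[1.0, 0.5], [0.0, -1.0], [2.0, 2.0], [0.3, 0.3]]  # 4 time steps
h = [0.0] * hidden_size
outputs = []
for x in sequence:
    h = rnn_cell(x, h, w_ih, w_hh, b_ih, b_hh)  # the hidden state is fed back in
    outputs.append(h)

print(len(outputs), outputs[-1])
```

The loop has the same structure as the GRUCell example: the cell is called once per time step, and the hidden state it returns becomes its input at the next step.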
7 Dropout layers
Class:
class torch.nn.Dropout(p=0.5, inplace=False) for inputs of any shape; zeroes individual elements
class torch.nn.Dropout2d(p=0.5, inplace=False) for 2-D feature maps; zeroes entire channels
class torch.nn.Dropout3d(p=0.5, inplace=False) for 3-D feature maps; zeroes entire channels
Parameters:
p - probability of an element being zeroed. Default: 0.5
inplace - if True, the operation is done in place. Default: False
Shape:
Dropout:

Input: any. The input can have any shape.
Output: same. The output has the same shape as the input.
Dropout2d:

Input: (N, C, H, W)
Output: (N, C, H, W) (same shape as the input)
Dropout3d:

Input: (N, C, D, H, W)
Output: (N, C, D, H, W) (same shape as the input)
# Create a Dropout module
m = nn.Dropout(p=0.2)

input = torch.randn(20, 16)
output = m(input)
print(output.size())

Output:
torch.Size([20, 16])

# Create a Dropout2d module
m = nn.Dropout2d(p=0.2)

input = torch.randn(20, 16, 32, 32)
output = m(input)
print(output.size())

Output:
torch.Size([20, 16, 32, 32])

# Create a Dropout3d module
m = nn.Dropout3d(p=0.2)

input = torch.randn(20, 16, 4, 32, 32)
output = m(input)
print(output.size())

Output:
torch.Size([20, 16, 4, 32, 32])
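Dropout changes values, not shapes, which the size printouts above cannot show. The sketch below is a plain-Python version of element-wise (inverted) dropout: in training each element is zeroed with probability p and the survivors are scaled by 1 / (1 - p); in evaluation the input passes through unchanged. The function and its training flag are illustrative assumptions, not the nn.Dropout API.

```python
import random


def dropout(xs, p=0.5, training=True, rng=random):
    """Element-wise inverted dropout on a list of numbers."""
    if not training or p == 0.0:
        return list(xs)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in xs]


rng = random.Random(0)  # fixed seed so the run is repeatable
ones = [1.0] * 1000
train_out = dropout(ones, p=0.2, training=True, rng=rng)
eval_out = dropout(ones, p=0.2, training=False)

dropped = sum(1 for v in train_out if v == 0.0)
print(dropped, sum(train_out) / len(train_out))
```

The scaling keeps the expected value of each element the same in training and evaluation, so no rescaling is needed at inference time.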
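8 Sparse layers
Class:
class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)
Purpose:
A simple lookup table that stores embeddings for a dictionary of fixed size.

This module is often used to store word embeddings and retrieve them by index. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Parameters:
num_embeddings (int) - size of the embedding dictionary
embedding_dim (int) - size of each embedding vector
padding_idx (int, optional) - if given, the output is filled with zeros wherever this index appears
max_norm (float, optional) - if given, embeddings are renormalized so that their norm is less than this value
norm_type (float, optional) - the p of the p-norm computed for the max_norm option
scale_grad_by_freq (boolean, optional) - if True, gradients are scaled by the frequency of the words in the mini-batch
sparse (boolean, optional) - if True, the gradient with respect to weight is a sparse tensor
Variables:
weight (Tensor) - the learnable weights of the module, of shape (num_embeddings, embedding_dim)
Shape:
Input: LongTensor (N, W), N = mini-batch size, W = number of indices to look up per sample
Output: (N, W, embedding_dim)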
# Create an embedding module for 10 words, each with a vector of length 3
embedding = nn.Embedding(10, 3)

# Create a batch with two samples, each holding 4 indices
input = torch.LongTensor([[1, 2, 3, 4], [5, 6, 7, 8]])

input_emb = embedding(input)
print(input_emb)

Output (the weights are randomly initialized, so the values differ from run to run):
tensor([[[ 0.1388,  1.0344,  0.4986],
         [ 1.2887, -0.2868,  1.8511],
         [-0.2473,  0.3659, -2.0664],
         [ 0.4521, -0.3340,  1.0321]],

        [[ 1.0713,  0.8976, -0.1969],
         [-0.4481, -0.7756,  0.5349],
         [ 2.1492,  1.2860,  1.2949],
         [ 1.1719, -1.3687, -1.8749]]])

# Example with padding_idx
embedding = nn.Embedding(10, 3, padding_idx=0)

input = torch.LongTensor([[0, 2, 0, 5]])

print(embedding(input))

Output:
tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 1.0288,  1.4577, -0.4938],
         [ 0.0000,  0.0000,  0.0000],
         [ 1.5563, -1.6282, -0.2595]]])
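The lookup behaviour, including padding_idx, can be mimicked with a list of rows. TinyEmbedding below is a hypothetical stand-in written in plain Python; it uses Gaussian initialization from the standard library as an assumption and has no gradients or training.

```python
import random


class TinyEmbedding:
    """A lookup table: row i of `weight` is the vector for index i."""

    def __init__(self, num_embeddings, embedding_dim, padding_idx=None, seed=0):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
                       for _ in range(num_embeddings)]
        if padding_idx is not None:
            # The padding row is all zeros.
            self.weight[padding_idx] = [0.0] * embedding_dim

    def __call__(self, batch):
        # batch has shape (N, W); the result has shape (N, W, embedding_dim)
        return [[self.weight[index] for index in sample] for sample in batch]


embedding = TinyEmbedding(10, 3, padding_idx=0)
vectors = embedding([[0, 2, 0, 5]])
print(len(vectors), len(vectors[0]), len(vectors[0][0]))
print(vectors[0][0], vectors[0][2])
```

Both occurrences of index 0 return the same all-zero row, matching the zero rows in the padding_idx output above, while the same non-padding index always returns the same stored vector.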
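---------------------
Author: AI算法-王小草
Source: CSDN
Original article: https://blog.youkuaiyun.com/sinat_33761963/article/details/84704810
Copyright notice: This is an original article by the blogger. Please include a link to the original post when reposting.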