PyTorch Default Parameter Initialization
The code below is taken from the PyTorch source.
The parameter layers in PyTorch (Linear, Conv2d, BatchNorm, etc.) can be used right after being defined in __init__, without any manual initialization, because PyTorch applies a default initialization to each of them. This post goes through the source code to see what those defaults are for the different layers.
Initialization functions
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
kaiming_uniform_ initializes the tensor from a uniform distribution, sampling from $U(-\text{bound}, \text{bound})$, where
$$\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan\_in}}}$$
When the tensor is two-dimensional, fan_in is tensor.size(1), i.e. the dimensionality of the input vector.
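As a quick sanity check (a minimal sketch; the 128×64 tensor size is arbitrary), one can initialize a 2D tensor with kaiming_uniform_ and compare its largest absolute entry against the bound from the formula above:
import math
import torch
import torch.nn.init as init

w = torch.empty(128, 64)      # fan_in = w.size(1) = 64
init.kaiming_uniform_(w)      # defaults: a=0, mode='fan_in', nonlinearity='leaky_relu'

a = 0
fan_in = w.size(1)
bound = math.sqrt(6.0 / ((1 + a ** 2) * fan_in))
print(bound, w.abs().max().item())   # the max should stay below the bound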
def kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    with torch.no_grad():
        return tensor.normal_(0, std)
kaiming_normal_ initializes the tensor by sampling from $\mathcal{N}(0, \text{std})$, where
$$\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan\_in}}}$$
Likewise, fan_in is tensor.size(1) when the tensor is two-dimensional. Note that both formulas above assume the default values of mode and nonlinearity.
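Similarly (a minimal sketch, sizes arbitrary), the empirical standard deviation of a tensor initialized with kaiming_normal_ should come out close to $\sqrt{2/\text{fan\_in}}$:
import math
import torch
import torch.nn.init as init

w = torch.empty(4096, 512)    # fan_in = 512
init.kaiming_normal_(w)       # defaults: a=0, mode='fan_in', nonlinearity='leaky_relu'

expected_std = math.sqrt(2.0 / w.size(1))
print(expected_std, w.std().item())   # the two values should roughly agree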
Initialization of Linear
Linear's built-in initialization function is:
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
W is sampled from $U(-\text{bound}, \text{bound})$. Since reset_parameters calls kaiming_uniform_ with $a = \sqrt{5}$, the bound $\sqrt{\frac{6}{(1+5)\times\text{fan\_in}}}$ simplifies to
$$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$$
Here fan_in is the size of W's second dimension, i.e. the dimensionality of the input vector that the Linear layer acts on.
The bias is also sampled from $U(-\text{bound}, \text{bound})$, with the same bound as for W.
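The same can be checked directly on an nn.Linear instance (a minimal sketch; in_features=20 and out_features=30 are arbitrary):
import math
import torch.nn as nn

layer = nn.Linear(20, 30)                 # fan_in = 20
bound = 1 / math.sqrt(layer.in_features)
print(bound)                              # 1/sqrt(20) ≈ 0.2236
print(layer.weight.abs().max().item())    # < bound
print(layer.bias.abs().max().item())      # < bound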
Initialization of Conv
Taking the 2D case as an example, the parameters of a convolutional layer actually form a four-dimensional tensor:
if transposed:
    self.weight = Parameter(torch.Tensor(
        in_channels, out_channels // groups, *kernel_size))
else:
    self.weight = Parameter(torch.Tensor(
        out_channels, in_channels // groups, *kernel_size))
if bias:
    self.bias = Parameter(torch.Tensor(out_channels))
else:
    self.register_parameter('bias', None)
For example, for a convolution with 3 input channels, 64 output channels and kernel size 3, the weight is a 64×3×3×3 tensor, and it is initialized like this:
def reset_parameters(self):
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
As with Linear, the default is kaiming_uniform_ with $a = \sqrt{5}$, so the weight is sampled from $U(-\text{bound}, \text{bound})$, where
$$\text{bound} = \sqrt{\frac{1}{\text{fan\_in}}}$$
fan_in is computed as follows:
num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = 1
if tensor.dim() > 2:
    receptive_field_size = tensor[0][0].numel()
fan_in = num_input_fmaps * receptive_field_size
fan_out = num_output_fmaps * receptive_field_size
That is,
$$\text{fan\_in} = \text{in\_channels} \times \text{kernel\_size}^2$$
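For example (a sketch that calls the internal helper _calculate_fan_in_and_fan_out seen in the Linear code above; being private, it may change across versions), a Conv2d with 3 input channels, 64 output channels and a 3×3 kernel gives fan_in = 3 × 3 × 3 = 27:
import torch.nn as nn
import torch.nn.init as init

conv = nn.Conv2d(3, 64, kernel_size=3)
fan_in, fan_out = init._calculate_fan_in_and_fan_out(conv.weight)
print(fan_in, fan_out)   # 27 = 3 * 3 * 3, 576 = 64 * 3 * 3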
Initialization of BatchNorm
def reset_parameters(self):
    self.reset_running_stats()
    if self.affine:
        init.uniform_(self.weight)
        init.zeros_(self.bias)
The weight is initialized from $U(0, 1)$ and the bias is initialized to 0.
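A quick check (a minimal sketch assuming the reset_parameters shown above; note that newer PyTorch versions may instead initialize the BatchNorm weight to all ones):
import torch.nn as nn

bn = nn.BatchNorm2d(16)                                # num_features=16 is arbitrary
print(bn.weight.min().item(), bn.weight.max().item())  # values lie in [0, 1)
print((bn.bias == 0).all().item())                     # True: bias starts at zero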
Network initialization
The initialization strategy also differs across the built-in network models.
ResNet
After defining its layers, the official PyTorch ResNet code manually initializes the different layer types in its __init__ method:
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
First, the convolutional layers. Unlike the default convolution initialization $U(-\text{bound}, \text{bound})$ with $\text{bound} = \sqrt{\dfrac{1}{\text{in\_channels} \times \text{kernel\_size}^2}}$, here mode is fan_out, nonlinearity is relu, and the initialization function is kaiming_normal_, so the weights are sampled from $\mathcal{N}(0, \text{std})$, where
$$\text{std} = \sqrt{\dfrac{2}{\text{fan\_out}}}, \qquad \text{fan\_out} = \text{out\_channels} \times \text{kernel\_size}^2$$
The bias of the convolutional layers is not mentioned in this loop, so it keeps the default initialization. The weights of BatchNorm and GroupNorm are set to 1 and their biases to 0, unlike the default, where the weight is sampled uniformly from [0, 1) and the bias is 0. Linear layers are not mentioned either, so they also keep the default initialization.
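For instance (a sketch with arbitrary channel counts), the empirical standard deviation of a conv weight initialized in this ResNet style should be close to $\sqrt{2/\text{fan\_out}}$:
import math
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, bias=False)
nn.init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')

fan_out = conv.out_channels * conv.kernel_size[0] * conv.kernel_size[1]
print(math.sqrt(2.0 / fan_out), conv.weight.std().item())   # should roughly agree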
VGG
The official PyTorch initialization for VGG is as follows:
def _initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.BatchNorm2d):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, 0, 0.01)
            nn.init.constant_(m.bias, 0)
The convolutional layers are initialized the same way as in ResNet, except that their bias is set to 0. The BatchNorm layers are initialized the same way as in ResNet. For Linear layers, the weight is drawn from $\mathcal{N}(0, 0.01)$ (standard deviation 0.01) and the bias is set to 0.
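The VGG-style Linear initialization can likewise be reproduced on its own (a minimal sketch; the 4096→1000 sizes are arbitrary):
import torch.nn as nn

fc = nn.Linear(4096, 1000)
nn.init.normal_(fc.weight, 0, 0.01)   # weight ~ N(0, std=0.01)
nn.init.constant_(fc.bias, 0)         # bias = 0
print(fc.weight.std().item())         # close to 0.01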
In summary, this post looked at the default parameter initialization of the various PyTorch layers (Linear, Conv2d, BatchNorm), i.e. the He (Kaiming) uniform and normal initializations, and at the custom initialization strategies used by ResNet and VGG.