PyTorch Parameter Initialization Methods

This article gives a detailed overview of the weight initialization methods available in PyTorch, covering uniform and normal fills as well as dedicated techniques such as Xavier and He initialization, with concrete examples for each.

torch.nn.init

torch.nn.init.calculate_gain(nonlinearity, param=None)

Return the recommended gain value for the given nonlinearity function. The values are as follows:

| nonlinearity | gain |
| --- | --- |
| linear | $1$ |
| conv{1,2,3}d | $1$ |
| sigmoid | $1$ |
| tanh | $5/3$ |
| relu | $\sqrt{2}$ |
| leaky_relu | $\sqrt{2 / (1 + \text{negative\_slope}^2)}$ |
Parameters:
  • nonlinearity – the nonlinear function (nn.functional name)
  • param – optional parameter for the nonlinear function

Examples

>>> gain = nn.init.calculate_gain('leaky_relu')
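
In newer PyTorch releases the in-place initializers carry a trailing underscore (e.g. `nn.init.xavier_uniform_`); a short sketch under that assumption, showing how the `param` argument passes the leaky_relu slope and how the returned gain is typically fed into an initializer:

```python
import torch
import torch.nn as nn

# Gain for leaky ReLU with an illustrative negative slope of 0.2
gain = nn.init.calculate_gain('leaky_relu', param=0.2)  # sqrt(2 / (1 + 0.2**2))

# Feed the gain into a Xavier-style initializer (in-place, underscore API)
w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=gain)
```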

torch.nn.init.uniform(tensor, a=0, b=1)

Fills the input Tensor or Variable with values drawn from the uniform distribution $U(a, b)$.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • a – the lower bound of the uniform distribution
  • b – the upper bound of the uniform distribution

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.uniform(w)

torch.nn.init.normal(tensor, mean=0, std=1)

Fills the input Tensor or Variable with values drawn from the normal distribution $N(\text{mean}, \text{std})$.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • mean – the mean of the normal distribution
  • std – the standard deviation of the normal distribution

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.normal(w)

torch.nn.init.constant(tensor, val)

Fills the input Tensor or Variable with the value val.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • val – the value to fill the tensor with

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.constant(w, 0.3)

torch.nn.init.eye(tensor)

Fills the 2-dimensional input Tensor or Variable with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.

Parameters: tensor – a 2-dimensional torch.Tensor or autograd.Variable

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.eye(w)

torch.nn.init.dirac(tensor)

Fills the {3, 4, 5}-dimensional input Tensor or Variable with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible.

Parameters: tensor – a {3, 4, 5}-dimensional torch.Tensor or autograd.Variable

Examples

>>> w = torch.Tensor(3, 16, 5, 5)
>>> nn.init.dirac(w)
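
As an illustrative check of the identity-preserving behaviour (a minimal sketch, assuming the current in-place name `dirac_` and an equal number of input and output channels), a Dirac-initialized convolution with zero bias should pass its input through unchanged:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 16, kernel_size=5, padding=2)
nn.init.dirac_(conv.weight)   # a single 1 at the kernel centre per matched channel pair
nn.init.zeros_(conv.bias)     # remove the bias offset

x = torch.randn(1, 16, 8, 8)
print(torch.allclose(conv(x), x, atol=1e-6))  # expected: True
```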

torch.nn.init.xavier_uniform(tensor, gain=1)

Fills the input Tensor or Variable with values according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from $U(-a, a)$ where $a = \text{gain} \times \sqrt{2 / (\text{fan\_in} + \text{fan\_out})} \times \sqrt{3}$. Also known as Glorot initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • gain – an optional scaling factor

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.xavier_uniform(w, gain=nn.init.calculate_gain('relu'))
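
As a worked check of the bound above (a sketch assuming a 3×5 weight, so fan_in = 5 and fan_out = 3, and the current in-place name `xavier_uniform_`): with gain = 1 the bound is $a = \sqrt{2/8} \times \sqrt{3} \approx 0.866$, and every sampled value should fall inside $[-a, a]$.

```python
import math
import torch
import torch.nn as nn

w = torch.empty(3, 5)          # fan_in = 5, fan_out = 3
nn.init.xavier_uniform_(w)

a = math.sqrt(2.0 / (5 + 3)) * math.sqrt(3.0)   # ≈ 0.866
print(w.abs().max().item() <= a)                # expected: True
```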

torch.nn.init.xavier_normal(tensor, gain=1)

Fills the input Tensor or Variable with values according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from $N(0, \text{std})$ where $\text{std} = \text{gain} \times \sqrt{2 / (\text{fan\_in} + \text{fan\_out})}$. Also known as Glorot initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • gain – an optional scaling factor

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.xavier_normal(w)

torch.nn.init.kaiming_uniform(tensor, a=0, mode='fan_in')

Fills the input Tensor or Variable with values according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from $U(-\text{bound}, \text{bound})$ where $\text{bound} = \sqrt{2 / ((1 + a^2) \times \text{fan\_in})} \times \sqrt{3}$. Also known as He initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • a – the negative slope of the rectifier used after this layer (0 for ReLU by default)
  • mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing fan_in preserves the magnitude of the variance of the weights in the forward pass. Choosing fan_out preserves the magnitudes in the backwards pass.

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.kaiming_uniform(w, mode='fan_in')

torch.nn.init.kaiming_normal(tensor, a=0, mode='fan_in')

Fills the input Tensor or Variable with values according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from $N(0, \text{std})$ where $\text{std} = \sqrt{2 / ((1 + a^2) \times \text{fan\_in})}$. Also known as He initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • a – the negative slope of the rectifier used after this layer (0 for ReLU by default)
  • mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing fan_in preserves the magnitude of the variance of the weights in the forward pass. Choosing fan_out preserves the magnitudes in the backwards pass.

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.kaiming_normal(w, mode='fan_out')
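
In practice He initialization is usually applied to every convolutional layer of a model rather than to a single tensor. A minimal sketch under that assumption, using the current in-place name `kaiming_normal_`, a small hypothetical network, and `fan_out` mode as is common for ReLU nets:

```python
import torch.nn as nn

# Hypothetical two-layer convolutional network
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)

for m in model.modules():
    if isinstance(m, nn.Conv2d):
        # Match the initialization to the ReLU that follows each convolution
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
```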

torch.nn.init.orthogonal(tensor, gain=1)

Fills the input Tensor or Variable with a (semi) orthogonal matrix, as described in “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks” - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable, where n >= 2
  • gain – optional scaling factor

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.orthogonal(w)
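
A quick sanity check (a sketch, assuming the current in-place name `orthogonal_` and a 3×5 weight): the rows come out orthonormal, so $W W^\top$ should be close to the 3×3 identity.

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.orthogonal_(w)

# With fewer rows than columns the rows are orthonormal
print(torch.allclose(w @ w.t(), torch.eye(3), atol=1e-5))  # expected: True
```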

torch.nn.init.sparse(tensor, sparsity, std=0.01)

Fills the 2D input Tensor or Variable as a sparse matrix, where the non-zero elements will be drawn from the normal distribution $N(0, 0.01)$, as described in “Deep learning via Hessian-free optimization” - Martens, J. (2010).

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • sparsity – The fraction of elements in each column to be set to zero
  • std – the standard deviation of the normal distribution used to generate the non-zero values

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.sparse(w, sparsity=0.1)
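
A short look at what the call produces (a sketch, assuming the current in-place name `sparse_` and a 100×5 weight): with sparsity=0.1 roughly 10% of each column is zeroed, and the surviving entries are small normal values with the given std.

```python
import torch
import torch.nn as nn

w = torch.empty(100, 5)
nn.init.sparse_(w, sparsity=0.1, std=0.01)

print((w == 0).float().mean(dim=0))   # fraction of zeros per column, roughly 0.1
print(w[w != 0].std())                # roughly 0.01
```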


### PyTorch Parameter Initialization Tutorial

In PyTorch, parameter initialization is an important part of model training: a sensible initialization speeds up convergence and improves model performance. The common methods and how to apply them are summarized below.

#### 1. Built-in initialization methods

PyTorch provides a range of initialization functions through the `torch.nn.init` module:

1. **Zero initialization** – set parameters to zero, typically used for bias terms:

```python
torch.nn.init.zeros_(tensor)
```

2. **Constant initialization** – set parameters to a fixed value:

```python
torch.nn.init.constant_(tensor, 0.1)
```

3. **Normal initialization** – draw parameters from a Gaussian distribution with a given mean (`mean`) and standard deviation (`std`):

```python
torch.nn.init.normal_(tensor, mean=0.0, std=0.01)
```

4. **Uniform initialization** – sample uniformly from the interval $[a, b]$:

```python
torch.nn.init.uniform_(tensor, a=0.0, b=1.0)
```

#### 2. Advanced initialization methods

1. **Xavier/Glorot initialization** – suited to linear layers with activations such as Tanh or Sigmoid; the variance is scaled so that input and output variances match:

$$W \sim U\left(-\sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}},\ \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}\right)$$

PyTorch implementation:

```python
torch.nn.init.xavier_uniform_(tensor)  # uniform variant
torch.nn.init.xavier_normal_(tensor)   # normal variant
```

2. **He/Kaiming initialization** – tailored to ReLU activations; the variance is scaled by the input dimension $n_{\text{in}}$:

$$W \sim N\left(0,\ \sigma^2\right), \quad \sigma = \sqrt{\frac{2}{n_{\text{in}}}}$$

PyTorch implementation:

```python
torch.nn.init.kaiming_uniform_(tensor, mode='fan_in', nonlinearity='relu')  # uniform variant
torch.nn.init.kaiming_normal_(tensor, mode='fan_in', nonlinearity='relu')   # normal variant
```

#### 3. Custom initialization

Traverse the model's layers and apply a different strategy to each layer type:

```python
def init_weights(m):
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)
    elif isinstance(m, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

model = MyModel()
model.apply(init_weights)  # recursively applies the initialization function
```

#### 4. Initialization in transfer learning

Use the parameters of a pretrained model as the starting values:

```python
pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyCustomModel()
model.load_state_dict(pretrained_model.state_dict(), strict=False)  # load matching parameters only
```

#### 5. Notes

1. **Match the activation function** – ReLU pairs well with He initialization, Tanh with Xavier initialization.
2. **Distinguish parameter types** – weights and biases usually use different strategies.
3. **Watch the scale** – an initialization range that is too large can cause exploding gradients; one that is too small can cause vanishing gradients.