PyTorch Parameter Initialization Methods

This article gives a detailed overview of the weight initialization methods available in PyTorch, covering uniform and normal fills as well as dedicated techniques such as Xavier and He initialization, with concrete examples for each.

torch.nn.init

torch.nn.init.calculate_gain(nonlinearity, param=None)

Return the recommended gain value for the given nonlinearity function. The values are as follows:

| nonlinearity | gain |
| --- | --- |
| linear | $1$ |
| conv{1,2,3}d | $1$ |
| sigmoid | $1$ |
| tanh | $5/3$ |
| relu | $\sqrt{2}$ |
| leaky_relu | $\sqrt{2 / (1 + \text{negative\_slope}^2)}$ |
Parameters:
  • nonlinearity – the nonlinear function (nn.functional name)
  • param – optional parameter for the nonlinear function

Examples

>>> gain = nn.init.calculate_gain('leaky_relu')
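
In newer PyTorch releases the in-place initializers carry a trailing underscore (e.g. `nn.init.xavier_uniform_`); a short sketch under that assumption, showing how the `param` argument passes the leaky_relu slope and how the returned gain is typically fed into an initializer:

```python
import torch
import torch.nn as nn

# Gain for leaky ReLU with an illustrative negative slope of 0.2
gain = nn.init.calculate_gain('leaky_relu', param=0.2)  # sqrt(2 / (1 + 0.2**2))

# Feed the gain into a Xavier-style initializer (in-place, underscore API)
w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=gain)
```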

torch.nn.init.uniform(tensor, a=0, b=1)

Fills the input Tensor or Variable with values drawn from the uniform distribution $U(a, b)$.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • a – the lower bound of the uniform distribution
  • b – the upper bound of the uniform distribution

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.uniform(w)

torch.nn.init.normal(tensor, mean=0, std=1)

Fills the input Tensor or Variable with values drawn from the normal distribution $N(\text{mean}, \text{std})$.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • mean – the mean of the normal distribution
  • std – the standard deviation of the normal distribution

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.normal(w)

torch.nn.init.constant(tensor, val)

Fills the input Tensor or Variable with the value val.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • val – the value to fill the tensor with

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.constant(w, 0.3)

torch.nn.init.eye(tensor)

Fills the 2-dimensional input Tensor or Variable with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible.

Parameters: tensor – a 2-dimensional torch.Tensor or autograd.Variable

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.eye(w)

torch.nn.init.dirac(tensor)

Fills the {3, 4, 5}-dimensional input Tensor or Variable with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible.

Parameters: tensor – a {3, 4, 5}-dimensional torch.Tensor or autograd.Variable

Examples

>>> w = torch.Tensor(3, 16, 5, 5)
>>> nn.init.dirac(w)
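
As an illustrative check of the identity-preserving behaviour (a minimal sketch, assuming the current in-place name `dirac_` and an equal number of input and output channels), a Dirac-initialized convolution with zero bias should pass its input through unchanged:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 16, kernel_size=5, padding=2)
nn.init.dirac_(conv.weight)   # a single 1 at the kernel centre per matched channel pair
nn.init.zeros_(conv.bias)     # remove the bias offset

x = torch.randn(1, 16, 8, 8)
print(torch.allclose(conv(x), x, atol=1e-6))  # expected: True
```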

torch.nn.init.xavier_uniform(tensor, gain=1)

Fills the input Tensor or Variable with values according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from $U(-a, a)$ where $a = \text{gain} \times \sqrt{2 / (\text{fan\_in} + \text{fan\_out})} \times \sqrt{3}$. Also known as Glorot initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • gain – an optional scaling factor

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.xavier_uniform(w, gain=nn.init.calculate_gain('relu'))
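
As a worked check of the bound above (a sketch assuming a 3×5 weight, so fan_in = 5 and fan_out = 3, and the current in-place name `xavier_uniform_`): with gain = 1 the bound is $a = \sqrt{2/8} \times \sqrt{3} \approx 0.866$, and every sampled value should fall inside $[-a, a]$.

```python
import math
import torch
import torch.nn as nn

w = torch.empty(3, 5)          # fan_in = 5, fan_out = 3
nn.init.xavier_uniform_(w)

a = math.sqrt(2.0 / (5 + 3)) * math.sqrt(3.0)   # ≈ 0.866
print(w.abs().max().item() <= a)                # expected: True
```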

torch.nn.init.xavier_normal(tensor, gain=1)

Fills the input Tensor or Variable with values according to the method described in “Understanding the difficulty of training deep feedforward neural networks” - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from $N(0, \text{std})$ where $\text{std} = \text{gain} \times \sqrt{2 / (\text{fan\_in} + \text{fan\_out})}$. Also known as Glorot initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • gain – an optional scaling factor

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.xavier_normal(w)

torch.nn.init.kaiming_uniform(tensor, a=0, mode='fan_in')

Fills the input Tensor or Variable with values according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from $U(-\text{bound}, \text{bound})$ where $\text{bound} = \sqrt{2 / ((1 + a^2) \times \text{fan\_in})} \times \sqrt{3}$. Also known as He initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • a – the negative slope of the rectifier used after this layer (0 for ReLU by default)
  • mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing fan_in preserves the magnitude of the variance of the weights in the forward pass. Choosing fan_out preserves the magnitudes in the backwards pass.

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.kaiming_uniform(w, mode='fan_in')

torch.nn.init.kaiming_normal(tensor, a=0, mode='fan_in')

Fills the input Tensor or Variable with values according to the method described in “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification” - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from $N(0, \text{std})$ where $\text{std} = \sqrt{2 / ((1 + a^2) \times \text{fan\_in})}$. Also known as He initialisation.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • a – the negative slope of the rectifier used after this layer (0 for ReLU by default)
  • mode – either ‘fan_in’ (default) or ‘fan_out’. Choosing fan_in preserves the magnitude of the variance of the weights in the forward pass. Choosing fan_out preserves the magnitudes in the backwards pass.

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.kaiming_normal(w, mode='fan_out')
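
In practice He initialization is usually applied to every convolutional layer of a model rather than to a single tensor. A minimal sketch under that assumption, using the current in-place name `kaiming_normal_`, a small hypothetical network, and `fan_out` mode as is common for ReLU nets:

```python
import torch.nn as nn

# Hypothetical two-layer convolutional network
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
)

for m in model.modules():
    if isinstance(m, nn.Conv2d):
        # Match the initialization to the ReLU that follows each convolution
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
```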

torch.nn.init.orthogonal(tensor, gain=1)

Fills the input Tensor or Variable with a (semi) orthogonal matrix, as described in “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks” - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened.

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable, where n >= 2
  • gain – optional scaling factor

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.orthogonal(w)
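
A quick sanity check (a sketch, assuming the current in-place name `orthogonal_` and a 3×5 weight): the rows come out orthonormal, so $W W^\top$ should be close to the 3×3 identity.

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.orthogonal_(w)

# With fewer rows than columns the rows are orthonormal
print(torch.allclose(w @ w.t(), torch.eye(3), atol=1e-5))  # expected: True
```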

torch.nn.init.sparse(tensor, sparsity, std=0.01)

Fills the 2D input Tensor or Variable as a sparse matrix, where the non-zero elements will be drawn from the normal distribution $N(0, 0.01)$, as described in “Deep learning via Hessian-free optimization” - Martens, J. (2010).

Parameters:
  • tensor – an n-dimensional torch.Tensor or autograd.Variable
  • sparsity – The fraction of elements in each column to be set to zero
  • std – the standard deviation of the normal distribution used to generate the non-zero values

Examples

>>> w = torch.Tensor(3, 5)
>>> nn.init.sparse(w, sparsity=0.1)
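
A short look at what the call produces (a sketch, assuming the current in-place name `sparse_` and a 100×5 weight): with sparsity=0.1 roughly 10% of each column is zeroed, and the surviving entries are small normal values with the given std.

```python
import torch
import torch.nn as nn

w = torch.empty(100, 5)
nn.init.sparse_(w, sparsity=0.1, std=0.01)

print((w == 0).float().mean(dim=0))   # fraction of zeros per column, roughly 0.1
print(w[w != 0].std())                # roughly 0.01
```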


### PyTorch Parameter Initialization Tutorial

In PyTorch, parameter initialization is an important part of model training: a sensible initialization speeds up convergence and improves model performance. The common methods and how to apply them are summarized below.

#### 1. Built-in initialization methods

PyTorch provides a range of initialization functions through the `torch.nn.init` module:

1. **Zero initialization** – set parameters to zero, typically used for bias terms:

```python
torch.nn.init.zeros_(tensor)
```

2. **Constant initialization** – set parameters to a fixed value:

```python
torch.nn.init.constant_(tensor, 0.1)
```

3. **Normal initialization** – draw parameters from a Gaussian distribution with a given mean (`mean`) and standard deviation (`std`):

```python
torch.nn.init.normal_(tensor, mean=0.0, std=0.01)
```

4. **Uniform initialization** – sample uniformly from the interval $[a, b]$:

```python
torch.nn.init.uniform_(tensor, a=0.0, b=1.0)
```

#### 2. Advanced initialization methods

1. **Xavier/Glorot initialization** – suited to linear layers with activations such as Tanh or Sigmoid; the variance is scaled so that input and output variances match:

$$W \sim U\left(-\sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}},\ \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}\right)$$

PyTorch implementation:

```python
torch.nn.init.xavier_uniform_(tensor)  # uniform variant
torch.nn.init.xavier_normal_(tensor)   # normal variant
```

2. **He/Kaiming initialization** – tailored to ReLU activations; the variance is scaled by the input dimension $n_{\text{in}}$:

$$W \sim N\left(0,\ \sigma^2\right), \quad \sigma = \sqrt{\frac{2}{n_{\text{in}}}}$$

PyTorch implementation:

```python
torch.nn.init.kaiming_uniform_(tensor, mode='fan_in', nonlinearity='relu')  # uniform variant
torch.nn.init.kaiming_normal_(tensor, mode='fan_in', nonlinearity='relu')   # normal variant
```

#### 3. Custom initialization

Traverse the model's layers and apply a different strategy to each layer type:

```python
def init_weights(m):
    if isinstance(m, torch.nn.Linear):
        torch.nn.init.xavier_normal_(m.weight)
        if m.bias is not None:
            torch.nn.init.zeros_(m.bias)
    elif isinstance(m, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

model = MyModel()
model.apply(init_weights)  # recursively applies the initialization function
```

#### 4. Initialization in transfer learning

Use the parameters of a pretrained model as the starting values:

```python
pretrained_model = torchvision.models.resnet18(pretrained=True)
model = MyCustomModel()
model.load_state_dict(pretrained_model.state_dict(), strict=False)  # load matching parameters only
```

#### 5. Notes

1. **Match the activation function** – ReLU pairs well with He initialization, Tanh with Xavier initialization.
2. **Distinguish parameter types** – weights and biases usually use different strategies.
3. **Watch the scale** – an initialization range that is too large can cause exploding gradients; one that is too small can cause vanishing gradients.