nn.init

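The nn.init module is PyTorch's module for initializing the parameters of neural network models. It can be used to initialize a model's weights, biases, and other parameters, which helps the model converge faster and reach better performance.
All functions in torch.nn.init are intended for initializing neural network parameters, so they all run in torch.no_grad() mode and are not tracked by autograd.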
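As a minimal sketch of what running under torch.no_grad() means in practice, the snippet below initializes a small nn.Linear layer (the layer sizes are arbitrary choices for illustration, not from the original post) and checks that the parameters are changed in place without any autograd history being recorded.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # sizes chosen only for illustration

# Initialize weight and bias in place; no explicit torch.no_grad() block is needed.
returned = nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

# The returned tensor shares its storage with the parameter that was passed in.
same_storage = returned.data_ptr() == layer.weight.data_ptr()

# The parameter still requires grad, and the initialization left no
# autograd history on it.
still_trainable = layer.weight.requires_grad
has_history = layer.weight.grad_fn is not None

print(same_storage, still_trainable, has_history)
```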

torch.nn.init.calculate_gain(nonlinearity, param=None)

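Returns the recommended gain value for the given nonlinearity.

To implement self-normalizing neural networks, use nonlinearity='linear' instead of nonlinearity='selu'. This gives the initial weights a variance of 1/N, which is necessary to induce a stable fixed point in the forward pass. By contrast, the default gain for SELU sacrifices the normalization effect in exchange for more stable gradient flow in rectangular layers.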

Parameters:

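nonlinearity (str): name of the nonlinear function (an nn.functional name) whose gain is computed, for example relu, tanh, or sigmoid
param (float or None, optional): optional parameter of the nonlinear function. For leaky_relu, param is the negative slope.

Returns:

gain (float): the gain for the given nonlinear function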

Examples:

gain = nn.init.calculate_gain('leaky_relu', 0.2)  # leaky_relu with negative_slope=0.2
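To make the returned values concrete, here is a short sketch that compares calculate_gain against the gains listed in the PyTorch documentation for a few common nonlinearities. The variable names are illustrative only.

```python
import math
import torch.nn as nn

# Documented gains: linear -> 1, relu -> sqrt(2), tanh -> 5/3, selu -> 3/4.
gain_linear = nn.init.calculate_gain('linear')
gain_relu = nn.init.calculate_gain('relu')
gain_tanh = nn.init.calculate_gain('tanh')
gain_selu = nn.init.calculate_gain('selu')

# For leaky_relu the gain is sqrt(2 / (1 + negative_slope^2)).
slope = 0.2
gain_leaky = nn.init.calculate_gain('leaky_relu', slope)
expected_leaky = math.sqrt(2.0 / (1.0 + slope ** 2))

print(gain_linear, gain_relu, gain_tanh, gain_selu, gain_leaky)
```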

torch.nn.init.uniform_(tensor, a=0.0, b=1.0, generator=None)

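Fills the input tensor with values drawn from the uniform distribution $U(a, b)$.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
a (float): the lower bound of the uniform distribution
b (float): the upper bound of the uniform distribution

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

Calling nn.init.uniform_(w) modifies w in place: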

import torch
import torch.nn as nn

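w = torch.empty(3, 5)      # uninitialized memory, so the contents are arbitrary
print(w)
res = nn.init.uniform_(w)  # fills w in place with samples from U(0, 1)
print(w)
print(res)                 # res holds the same values as w
# Sample output (the values differ from run to run):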
# tensor([[1.0194e-38, 1.0469e-38, 1.0010e-38, 6.4286e-39, 9.9184e-39],
#         [8.4490e-39, 1.0102e-38, 9.0919e-39, 1.0102e-38, 8.9082e-39],
#         [8.4489e-39, 1.0102e-38, 1.0561e-38, 9.4592e-39, 1.0102e-38]])
# tensor([[0.2565, 0.3630, 0.4908, 0.5529, 0.4225],
#         [0.9944, 0.0088, 0.4067, 0.4673, 0.5027],
#         [0.0211, 0.9172, 0.5605, 0.1267, 0.0560]])
# tensor([[0.2565, 0.3630, 0.4908, 0.5529, 0.4225],
#         [0.9944, 0.0088, 0.4067, 0.4673, 0.5027],
#         [0.0211, 0.9172, 0.5605, 0.1267, 0.0560]])

torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)

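Fills the input tensor with values drawn from the normal distribution $N(\text{mean}, \text{std}^2)$.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
mean (float): the mean of the normal distribution
std (float): the standard deviation of the normal distribution

Returns:

Tensor: the same tensor that was passed in; it is modified in place.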

Examples:

w = torch.empty(3, 5)
nn.init.normal_(w)

torch.nn.init.constant_(tensor, val)

Fills the input tensor with the scalar value val.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
val (float): the value to fill the tensor with

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.constant_(w, 0.5)

torch.nn.init.ones_(tensor)

Fills the input tensor with the scalar value 1.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.ones_(w)

torch.nn.init.zeros_(tensor)

Fills the input tensor with the scalar value 0.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.zeros_(w)

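torch.nn.init.eye_(tensor)

Fills the 2-dimensional input tensor with the identity matrix.
Preserves the identity of the inputs in linear layers, where as many inputs as possible are preserved.

Parameters:

tensor (Tensor): a 2-dimensional torch.Tensor

Returns:

Tensor: the same 2-dimensional tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.eye_(w)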
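The identity-preserving behavior can be checked with a small sketch. It assumes an nn.Linear layer with 5 inputs and 3 outputs (sizes picked to match the 3x5 weight above) and a zeroed bias, so that the first three input features should pass through unchanged.

```python
import torch
import torch.nn as nn

layer = nn.Linear(5, 3)        # weight has shape (3, 5)
nn.init.eye_(layer.weight)     # ones on the main diagonal, zeros elsewhere
nn.init.zeros_(layer.bias)     # remove the bias so only the weight acts

x = torch.randn(2, 5)          # a batch of 2 illustrative inputs
y = layer(x)

# Output feature i equals input feature i for i < 3; inputs 3 and 4 are dropped.
passes_through = torch.allclose(y, x[:, :3])
print(passes_through)
```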

torch.nn.init.dirac_(tensor, groups=1)

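Fills the {3, 4, 5}-dimensional input tensor with the Dirac delta function.
Preserves the identity of the inputs in convolutional layers, where as many input channels as possible are preserved. When groups > 1, each group of channels preserves identity.

Parameters:

tensor (Tensor): a {3, 4, 5}-dimensional torch.Tensor
groups (int): the number of groups in the convolutional layer (default: 1)

Returns:

Tensor: the same tensor that was passed in; it is modified in place.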

Examples:

w = torch.empty(3, 16, 5, 5)
nn.init.dirac_(w)
w = torch.empty(3, 24, 5, 5)
nn.init.dirac_(w, 3)
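As a rough check of the identity property for the first example above, the sketch below applies the Dirac-initialized weight with torch.nn.functional.conv2d. The input size (8x8) and padding=2 are assumptions chosen so that the 5x5 kernel keeps the spatial size unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

w = torch.empty(3, 16, 5, 5)       # (out_channels, in_channels, kH, kW)
nn.init.dirac_(w)                  # a single 1 at the kernel center of w[i, i]

x = torch.randn(1, 16, 8, 8)       # illustrative input with 16 channels
y = F.conv2d(x, w, padding=2)      # padding=2 keeps the 8x8 spatial size

# Each of the 3 output channels reproduces the matching input channel.
identity_kept = torch.allclose(y, x[:, :3], atol=1e-5)
print(identity_kept)
```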

torch.nn.init.xavier_uniform_(tensor, gain=1.0, generator=None)

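Fills the input tensor using a Xavier uniform distribution.
Following the method described by Glorot, X. and Bengio, Y. in "Understanding the difficulty of training deep feedforward neural networks", values are drawn from a uniform distribution to fill the input tensor. The resulting tensor has values sampled from $U(-a, a)$, where: $$a=\text{gain}\times\sqrt{\frac{6}{\text{fan}_\text{in}+\text{fan}_\text{out}}}$$

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
gain (float, optional): an optional scaling factor

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu'))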
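The bound $a$ can be recomputed by hand as a sanity check. This sketch relies on the fact that for a 2-D weight of shape (rows, columns), fan_out is the number of rows and fan_in the number of columns; the small slack in the comparison is only there to absorb float32 rounding.

```python
import math
import torch
import torch.nn as nn

w = torch.empty(3, 5)                      # fan_out = 3, fan_in = 5
gain = nn.init.calculate_gain('relu')
nn.init.xavier_uniform_(w, gain=gain)

fan_out, fan_in = w.shape
a = gain * math.sqrt(6.0 / (fan_in + fan_out))

# Every sampled value should lie within [-a, a].
within_bound = w.abs().max().item() <= a + 1e-6
print(a, within_bound)
```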

torch.nn.init.xavier_normal_(tensor, gain=1.0, generator=None)

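Fills the input tensor using a Xavier normal distribution.
Following the method described by Glorot, X. and Bengio, Y. in "Understanding the difficulty of training deep feedforward neural networks", values are drawn from a normal distribution to fill the input tensor. The resulting tensor has values sampled from $N(0,\text{std}^2)$, where the standard deviation is: $$\mathrm{std}=\mathrm{gain}\times\sqrt{\frac{2}{\mathrm{fan}_\mathrm{in}+\mathrm{fan}_\mathrm{out}}}$$

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
gain (float, optional): an optional scaling factor

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.xavier_normal_(w)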
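To see the standard deviation formula at work, the following sketch initializes a larger tensor (400x600, a size chosen only so that the empirical estimate is stable) and compares the measured standard deviation with the predicted one.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)                 # fixed seed so the run is repeatable
w = torch.empty(400, 600)            # fan_out = 400, fan_in = 600
nn.init.xavier_normal_(w)            # gain defaults to 1.0

fan_out, fan_in = w.shape
expected_std = math.sqrt(2.0 / (fan_in + fan_out))
empirical_std = w.std().item()

# The two values should agree closely, up to sampling noise.
print(expected_std, empirical_std)
```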

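torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)

Fills the input tensor using a Kaiming uniform distribution.
Following the method described by He, K. et al. (2015) in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification", values are drawn from a uniform distribution to fill the input tensor. The resulting tensor has values sampled from $U(-\mathrm{bound},\mathrm{bound})$, where: $$\mathrm{bound}=\mathrm{gain}\times\sqrt{\frac{3}{\mathrm{fan\_mode}}}$$

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
a (float, optional): the negative slope of the leaky ReLU, used to compute the gain (only used with 'leaky_relu')
mode (str, optional): either fan_in or fan_out. fan_in preserves the magnitude of the variance of the weights in the forward pass; fan_out preserves the magnitudes in the backward pass. Defaults to fan_in.
nonlinearity (str, optional): the name of the nonlinear function (an nn.functional name). Using relu or leaky_relu is recommended. Defaults to leaky_relu.

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')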
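The bound for this example can be reproduced by hand. The sketch below assumes mode='fan_in', so fan_mode is the number of columns of the 2-D weight, and also shows that the default arguments (a=0 with 'leaky_relu') yield the same gain as 'relu'.

```python
import math
import torch
import torch.nn as nn

w = torch.empty(3, 5)                        # fan_in = 5, fan_out = 3
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')

gain = nn.init.calculate_gain('relu')        # sqrt(2)
fan_in = w.shape[1]
bound = gain * math.sqrt(3.0 / fan_in)

# Every sample should lie within [-bound, bound] (slack for float32 rounding).
within_bound = w.abs().max().item() <= bound + 1e-6

# leaky_relu with negative slope 0 has gain sqrt(2 / (1 + 0)) = sqrt(2) as well.
default_gain = nn.init.calculate_gain('leaky_relu', 0)
print(bound, within_bound, default_gain)
```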

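torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu', generator=None)

Fills the input tensor using a Kaiming normal distribution.
The resulting tensor has values sampled from $N(0,\text{std}^2)$, where: $$\mathrm{std}=\frac{\mathrm{gain}}{\sqrt{\mathrm{fan\_mode}}}$$
Also known as He initialization.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
a (float, optional): the negative slope of the leaky ReLU, used to compute the gain (only used with 'leaky_relu')
mode (str, optional): either fan_in or fan_out. fan_in preserves the magnitude of the variance of the weights in the forward pass; fan_out preserves the magnitudes in the backward pass. Defaults to fan_in.
nonlinearity (str, optional): the name of the nonlinear function (an nn.functional name). Using relu or leaky_relu is recommended. Defaults to leaky_relu.

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')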
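A quick empirical check of the formula, assuming mode='fan_out' as in the example above. The 400x600 shape is an arbitrary choice, large enough for the measured standard deviation to be a stable estimate.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)                 # fixed seed so the run is repeatable
w = torch.empty(400, 600)            # fan_out = 400, fan_in = 600
nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu')

gain = nn.init.calculate_gain('relu')        # sqrt(2)
fan_out = w.shape[0]
expected_std = gain / math.sqrt(fan_out)
empirical_std = w.std().item()

# The two values should agree closely, up to sampling noise.
print(expected_std, empirical_std)
```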

torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)

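Fills the input tensor with values drawn from a truncated normal distribution.
The values are effectively drawn from the normal distribution $N(\text{mean}, \text{std}^2)$, with values outside $[a, b]$ redrawn until they fall within the bounds. The method used for generating the random values works best when $a \leq \text{mean} \leq b$.

Parameters:

tensor (Tensor): an N-dimensional torch.Tensor
mean (float, optional): the mean of the normal distribution
std (float, optional): the standard deviation of the normal distribution
a (float, optional): the minimum cutoff value
b (float, optional): the maximum cutoff value

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.trunc_normal_(w)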
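The effect of the cutoffs can be observed on a larger sample. This sketch uses the default cutoffs a=-2 and b=2 on a 1000x100 tensor (an arbitrary size) and checks that no value escapes the interval; because the tails are cut off, the measured standard deviation also ends up below the std of the underlying normal distribution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # fixed seed so the run is repeatable
w = torch.empty(1000, 100)
nn.init.trunc_normal_(w, mean=0.0, std=1.0, a=-2.0, b=2.0)

smallest = w.min().item()
largest = w.max().item()
empirical_std = w.std().item()       # smaller than 1.0 because the tails are removed

print(smallest, largest, empirical_std)
```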

torch.nn.init.orthogonal_(tensor, gain=1, generator=None)

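Fills the input tensor with a (semi-)orthogonal matrix.
The input tensor must have at least 2 dimensions; for tensors with more than 2 dimensions, the trailing dimensions are flattened.

Parameters:

tensor (Tensor): an n-dimensional torch.Tensor, where n >= 2
gain (float, optional): an optional scaling factor

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.orthogonal_(w)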
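For a 3x5 tensor, semi-orthogonal means that the three rows are orthonormal, so multiplying the matrix by its transpose should give the 3x3 identity. A minimal check, with a tolerance chosen to absorb float32 rounding:

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
nn.init.orthogonal_(w)               # gain defaults to 1

# Rows are orthonormal: W @ W^T should be the 3x3 identity matrix.
gram = w @ w.T
is_semi_orthogonal = torch.allclose(gram, torch.eye(3), atol=1e-5)
print(is_semi_orthogonal)
```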

torch.nn.init.sparse_(tensor, sparsity, std=0.01, generator=None)

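Fills the 2-D input tensor as a sparse matrix.
The non-zero elements are drawn from the normal distribution $N(0, \text{std}^2)$, where std defaults to 0.01.

Parameters:

tensor (Tensor): a 2-dimensional torch.Tensor
sparsity (float): the fraction of elements in each column to be set to zero
std (float, optional): the standard deviation of the normal distribution used to generate the non-zero values

Returns:

Tensor: the same tensor that was passed in; it is modified in place.

Examples:

w = torch.empty(3, 5)
nn.init.sparse_(w, sparsity=0.1)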
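To illustrate the per-column meaning of sparsity, the sketch below uses a 10x4 tensor with sparsity=0.5 (values chosen for illustration so that exactly half of each column is zeroed) and counts the zeros in every column.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # fixed seed so the run is repeatable
w = torch.empty(10, 4)               # 10 rows, 4 columns
nn.init.sparse_(w, sparsity=0.5)     # std defaults to 0.01

# Half of the 10 entries in each column are set to zero.
zeros_per_column = (w == 0).sum(dim=0).tolist()

# The remaining entries are small, since they come from a normal
# distribution with standard deviation 0.01.
largest_magnitude = w.abs().max().item()
print(zeros_per_column, largest_magnitude)
```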

http://jiqingjiang.github.io/p/edda2ce1/

Author: Jiqing

Published: August 10, 2024
