GraphormerLayer

class dgl.nn.pytorch.gt.GraphormerLayer(feat_size, hidden_size, num_heads, attn_bias_type='add', norm_first=False, dropout=0.1, attn_dropout=0.1, activation=ReLU())

The Graphormer layer introduced in Do Transformers Really Perform Bad for Graph Representation?

Parameters

  • feat_size (int) – Feature size.
  • hidden_size (int) – Hidden size of the feed-forward layers.
  • num_heads (int) – Number of attention heads, by which feat_size must be divisible.
  • attn_bias_type (str) – The type of attention bias used to modify attention. Can be 'add' or 'mul'. Default: 'add' (see the construction sketch after this list).
  • norm_first (bool, optional) – If True, apply layer normalization before the attention and feed-forward operations; otherwise, apply it afterwards. Default: False.
  • dropout (float, optional) – Dropout probability. Default: 0.1.
  • attn_dropout (float, optional) – Attention dropout probability. Default: 0.1.
  • activation (callable activation layer, optional) – Activation function. Default: nn.ReLU().
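
A minimal construction sketch with non-default options; the concrete sizes and the GELU activation are illustrative choices, not values prescribed by DGL:

import torch.nn as nn
from dgl.nn import GraphormerLayer

# Pre-norm variant with a multiplicative attention bias (illustrative sizes).
layer = GraphormerLayer(
    feat_size=256,          # must be divisible by num_heads
    hidden_size=1024,       # hidden size of the feed-forward layers
    num_heads=8,
    attn_bias_type="mul",   # multiply the bias into the attention scores instead of adding it
    norm_first=True,        # LayerNorm before attention and feed-forward
    dropout=0.1,
    attn_dropout=0.1,
    activation=nn.GELU(),   # any callable activation module
)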

forward(nfeat, attn_bias=None, attn_mask=None)

Parameters
- nfeat (torch.Tensor) – A 3D input tensor of shape (batch_size, N, feat_size), where N is the maximum number of nodes.
- attn_bias (torch.Tensor, optional) – The attention bias used to modify attention. Shape: (batch_size, N, N, num_heads).
- attn_mask (torch.Tensor, optional) – The attention mask used to avoid computing invalid positions, where invalid positions are indicated by True values. Shape: (batch_size, N, N). Note: for rows corresponding to non-existent nodes, make sure at least one entry is set to False, otherwise the softmax produces NaN (see the mask sketch after this block).
Returns
y – The output tensor. Shape: (batch_size, N, feat_size)

Return type
torch.Tensor
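
As a sketch only: one way to build such a padding mask for a batch whose graphs have fewer than N nodes. The num_valid_nodes tensor is a hypothetical per-graph count, not part of the DGL API; the diagonal is forced to False so that padded rows keep one valid entry and the softmax cannot return NaN.

import torch as th

batch_size, N = 2, 5
num_valid_nodes = th.tensor([5, 3])  # hypothetical per-graph node counts

# True marks padded (invalid) key positions for every query row.
key_padding = th.arange(N)[None, :] >= num_valid_nodes[:, None]        # (batch_size, N)
attn_mask = key_padding[:, None, :].expand(batch_size, N, N).clone()   # (batch_size, N, N)

# Keep at least one False entry per row (here the diagonal), so that rows of
# non-existent nodes are not all True and the softmax does not return NaN.
attn_mask[:, th.arange(N), th.arange(N)] = False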

Source code

import torch.nn as nn

from .biased_mha import BiasedMHA


class GraphormerLayer(nn.Module):
    def __init__(
        self,
        feat_size,
        hidden_size,
        num_heads,
        attn_bias_type="add",
        norm_first=False,
        dropout=0.1,
        attn_dropout=0.1,
        activation=nn.ReLU(),
    ):
        super().__init__()

        self.norm_first = norm_first

        # Multi-head attention with an optional additive/multiplicative bias.
        self.attn = BiasedMHA(
            feat_size=feat_size,
            num_heads=num_heads,
            attn_bias_type=attn_bias_type,
            attn_drop=attn_dropout,
        )
        # Position-wise feed-forward network.
        self.ffn = nn.Sequential(
            nn.Linear(feat_size, hidden_size),
            activation,
            nn.Dropout(p=dropout),
            nn.Linear(hidden_size, feat_size),
            nn.Dropout(p=dropout),
        )

        self.dropout = nn.Dropout(p=dropout)
        self.attn_layer_norm = nn.LayerNorm(feat_size)
        self.ffn_layer_norm = nn.LayerNorm(feat_size)

    def forward(self, nfeat, attn_bias=None, attn_mask=None):
        # Attention sub-layer with residual connection; LayerNorm is applied
        # before (pre-norm) or after (post-norm) depending on norm_first.
        residual = nfeat
        if self.norm_first:
            nfeat = self.attn_layer_norm(nfeat)
        nfeat = self.attn(nfeat, attn_bias, attn_mask)
        nfeat = self.dropout(nfeat)
        nfeat = residual + nfeat
        if not self.norm_first:
            nfeat = self.attn_layer_norm(nfeat)

        # Feed-forward sub-layer with residual connection.
        residual = nfeat
        if self.norm_first:
            nfeat = self.ffn_layer_norm(nfeat)
        nfeat = self.ffn(nfeat)
        nfeat = residual + nfeat
        if not self.norm_first:
            nfeat = self.ffn_layer_norm(nfeat)
        return nfeat
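
Because the output has the same shape as the input (batch_size, N, feat_size), layers can be stacked back to back. The wrapper below is an illustrative sketch (GraphormerEncoder is not a DGL class); it simply reuses the same attention bias and mask in every layer.

import torch.nn as nn
from dgl.nn import GraphormerLayer

class GraphormerEncoder(nn.Module):  # illustrative wrapper, not part of DGL
    def __init__(self, feat_size, hidden_size, num_heads, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            GraphormerLayer(feat_size, hidden_size, num_heads)
            for _ in range(num_layers)
        )

    def forward(self, nfeat, attn_bias=None, attn_mask=None):
        # The same bias and mask are reused by every layer in the stack.
        for layer in self.layers:
            nfeat = layer(nfeat, attn_bias, attn_mask)
        return nfeat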

Example

Example 1:

import torch as th
from dgl.nn import GraphormerLayer

batch_size = 16
num_nodes = 100
feat_size = 512
num_heads = 8
nfeat = th.rand(batch_size, num_nodes, feat_size)
bias = th.rand(batch_size, num_nodes, num_nodes, num_heads)
net = GraphormerLayer(feat_size=feat_size, hidden_size=2048, num_heads=num_heads)
out = net(nfeat, bias)
