Transformer Code Architecture
When the Transformer model is built, EncoderDecoder() takes 5 arguments:
[encoder] [decoder] [input embeddings + position] [output embeddings + position] [generator]
EncoderDecoder(encoder, decoder, src_embed, tgt_embed, generator)
model = EncoderDecoder(
Encoder(EncoderLayer(d_model, c(attn), c(ff), dropout), N),
Decoder(DecoderLayer(d_model, c(attn), c(attn), c(ff), dropout), N),
nn.Sequential(Embeddings(d_model, src_vocab), c(position)),
nn.Sequential(Embeddings(d_model, tgt_vocab), c(position)),
Generator(d_model, tgt_vocab)
)
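The wrapper class behind this call can be sketched as follows. This is a minimal, hedged version following the structure the notes describe (the Annotated Transformer); the exact forward signature and mask handling are assumptions:

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Generic encoder-decoder skeleton: the five constructor arguments
    match the five components listed above (a sketch, not the full code)."""
    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder      # Encoder(layer, N)
        self.decoder = decoder      # Decoder(layer, N)
        self.src_embed = src_embed  # input embeddings + positional encoding
        self.tgt_embed = tgt_embed  # output embeddings + positional encoding
        self.generator = generator  # linear + softmax over the target vocab

    def forward(self, src, tgt, src_mask, tgt_mask):
        # Encode the source, then decode conditioned on the encoder memory.
        memory = self.encoder(self.src_embed(src), src_mask)
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
```

Note that the generator is stored but not called in forward; in the original code it is applied separately during loss computation and decoding.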
encoder: Encoder(layer, N)
N is the number of layers, i.e. how many times this layer is stacked
Encoder(EncoderLayer(d_model, c(attn), c(ff), dropout), N)
layer: EncoderLayer(size, self_attn, feed_forward, dropout)
EncoderLayer(d_model, c(attn), c(ff), dropout)
d_model: the feature dimension (how many features each token has)
c(): deepcopy (each copy gets its own parameters)
attn: MultiHeadedAttention(h, d_model)
ff: PositionwiseFeedForward(d_model, d_ff, dropout)
decoder: Decoder(layer, N)
Decoder(DecoderLayer(d_model, c(attn), c(attn), c(ff), dropout), N)
layer: DecoderLayer(size, self_attn, src_attn, feed_forward, dropout)
DecoderLayer(d_model, c(attn), c(attn), c(ff), dropout)
d_model: the feature dimension (how many features each token has)
c(): deepcopy (each copy gets its own parameters)
attn: MultiHeadedAttention(h, d_model)
ff: PositionwiseFeedForward(d_model, d_ff, dropout)
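The key difference from the encoder layer is the second attention: the first c(attn) becomes self-attention over the target, the second becomes source attention over the encoder memory. A simplified sketch (residual/norm placement and attention signatures are assumptions, condensed from the sublayer-connection structure of the original):

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Sketch: target self-attention, then source attention over the
    encoder output, then a position-wise feed-forward sublayer."""
    def __init__(self, size, self_attn, src_attn, feed_forward, dropout):
        super().__init__()
        self.size = size
        self.self_attn = self_attn        # first c(attn): attends to the target
        self.src_attn = src_attn          # second c(attn): attends to the memory
        self.feed_forward = feed_forward  # c(ff)
        self.norm1 = nn.LayerNorm(size)
        self.norm2 = nn.LayerNorm(size)
        self.norm3 = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, memory, src_mask, tgt_mask):
        # Each sublayer is wrapped in a residual connection.
        x = x + self.dropout(self.self_attn(self.norm1(x), tgt_mask))
        x = x + self.dropout(self.src_attn(self.norm2(x), memory, src_mask))
        return x + self.dropout(self.feed_forward(self.norm3(x)))
```

tgt_mask prevents a position from attending to later target positions; src_mask masks source padding.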
input embedding + positional encoding
nn.Sequential(Embeddings(d_model, src_vocab), c(position))
Embeddings(d_model, src_vocab)
Embeddings(d_model, vocab)
d_model: the feature dimension (how many features each token has)
vocab: the number of words in the vocabulary
Positional Encoding
PositionalEncoding(d_model, dropout, max_len)
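The nn.Sequential above chains token embedding with the sinusoidal positional encoding. A sketch of both modules, following the Annotated Transformer (the sqrt(d_model) scaling and the sin/cos layout are from the paper; defaults are assumptions):

```python
import math
import torch
import torch.nn as nn

class Embeddings(nn.Module):
    """Token embedding scaled by sqrt(d_model)."""
    def __init__(self, d_model, vocab):
        super().__init__()
        self.lut = nn.Embedding(vocab, d_model)
        self.d_model = d_model

    def forward(self, x):
        return self.lut(x) * math.sqrt(self.d_model)

class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position signals to the embeddings."""
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)  # even indices: sin
        pe[:, 1::2] = torch.cos(position * div_term)  # odd indices: cos
        self.register_buffer("pe", pe.unsqueeze(0))   # not a learned parameter

    def forward(self, x):
        # Add the encoding for the first seq_len positions.
        return self.dropout(x + self.pe[:, : x.size(1)])
```

Usage mirrors the model construction above: nn.Sequential(Embeddings(d_model, src_vocab), PositionalEncoding(d_model, dropout)).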
output embedding + positional encoding
nn.Sequential(Embeddings(d_model, tgt_vocab), c(position))
Embeddings(d_model, tgt_vocab)
Embeddings(d_model, vocab)
d_model: the feature dimension (how many features each token has)
vocab: the number of words in the vocabulary
Positional Encoding
PositionalEncoding(d_model, dropout, max_len)
generator (linear + softmax)
Generator(d_model, vocab)
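The generator projects the d_model-dimensional decoder output to vocabulary size and normalizes it. A sketch following the Annotated Transformer, which uses log-softmax rather than plain softmax (convenient with NLL/KL losses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Final linear projection to vocab size, then log-softmax."""
    def __init__(self, d_model, vocab):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> (batch, seq_len, vocab) log-probs
        return F.log_softmax(self.proj(x), dim=-1)
```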