Hand-Rolling a Transformer in Code (Simplified Version): A Reproduction

Article link: https://blog.youkuaiyun.com/qq_50374797/article/details/139475028

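Environment Setup

1. Create the environment

2. Install the packages

torch and numpy (math is part of the Python standard library and needs no installation)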
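Before running the rest of the code, it can help to confirm that the packages are importable in the environment you just created. The following is a minimal sketch that uses only the standard library; the helper name find_missing is hypothetical and not part of the original article.

```python
import importlib.util

# Packages the article imports; `math` ships with Python itself.
REQUIRED = ["math", "torch", "numpy"]


def find_missing(packages):
    """Return the names in `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
print("missing packages:", missing or "none")
```

Any name printed here would need to be installed into the active environment before continuing.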

Code Walkthrough

1. Basic setup

1. Imports

import math
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data

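2. Basic settings

# Use CUDA when it is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Train for 50 epochs
epochs = 50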

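2. Defining and processing the dataset

1. Define a hand-written dataset

A training set, sentences, is defined with three Chinese-to-English sentence pairs. Each entry has three parts: enc_input (the encoder input, i.e. the Chinese sentence), dec_input (the decoder input: the start symbol S followed by the English sentence), and dec_output (the decoder target: the English sentence followed by the end symbol E).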

# Training set
sentences = [
    # The Chinese and English sentences need not contain the same number of tokens
    # enc_input                dec_input           dec_output
    ['我 有 一 个 好 朋 友 P', 'S I have a good friend .', 'I have a good friend . E'],
    ['我 有 零 个 女 朋 友 P', 'S I have zero girl friend .', 'I have zero girl friend . E'],
    ['我 有 一 个 男 朋 友 P', 'S I have a boy friend .', 'I have a boy friend . E']
]
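The relationship between dec_input and dec_output is a one-token shift: at every position the decoder sees the tokens so far and is trained to predict the next one. The sketch below checks that property on the training set; the helper is_shifted_pair is a hypothetical name introduced only for illustration.

```python
sentences = [
    ['我 有 一 个 好 朋 友 P', 'S I have a good friend .', 'I have a good friend . E'],
    ['我 有 零 个 女 朋 友 P', 'S I have zero girl friend .', 'I have zero girl friend . E'],
    ['我 有 一 个 男 朋 友 P', 'S I have a boy friend .', 'I have a boy friend . E'],
]


def is_shifted_pair(dec_input, dec_output):
    """True if dec_output is dec_input shifted left by one token, ending in E."""
    dec_in = dec_input.split()
    dec_out = dec_output.split()
    return (
        dec_in[0] == 'S'
        and dec_out[-1] == 'E'
        and dec_in[1:] == dec_out[:-1]
    )


all_shifted = all(is_shifted_pair(d_in, d_out) for _, d_in, d_out in sentences)
print(all_shifted)

# Input token at each step paired with the token the decoder should predict
pairs = list(zip(sentences[0][1].split(), sentences[0][2].split()))
print(pairs[0], pairs[-1])
```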

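Separate vocabularies, src_vocab and tgt_vocab, are built for Chinese (the source language) and English (the target language).
Each vocabulary is a dictionary that maps a word to a unique index. The reverse index-to-word mappings src_idx2word and idx2word are built as well, and the vocabulary sizes are stored in src_vocab_size and tgt_vocab_size.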

# Chinese and English words get separate vocabularies
# The padding symbol P should map to index zero
src_vocab = {'P': 0, '我': 1, '有': 2, '一': 3,
             '个': 4, '好': 5, '朋': 6, '友': 7, '零': 8, '女': 9, '男': 10}
src_idx2word = {i: w for i, w in enumerate(src_vocab)}
src_vocab_size = len(src_vocab)

tgt_vocab = {'P': 0, 'I': 1, 'have': 2, 'a': 3, 'good': 4,
             'friend': 5, 'zero': 6, 'girl': 7,  'boy': 8, 'S': 9, 'E': 10, '.': 11}
idx2word = {i: w for i, w in enumerate(tgt_vocab)}
tgt_vocab_size = len(tgt_vocab)
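As a quick check of the two mappings, a sentence can be encoded to indices and decoded back. This is a minimal sketch: encode and decode are hypothetical helpers, not functions from the article. It also inverts the vocabulary explicitly with items(); the enumerate form used above gives the same result here only because the indices in tgt_vocab are listed in order from 0.

```python
tgt_vocab = {'P': 0, 'I': 1, 'have': 2, 'a': 3, 'good': 4,
             'friend': 5, 'zero': 6, 'girl': 7, 'boy': 8, 'S': 9, 'E': 10, '.': 11}

# Explicit inversion: does not depend on the order in which keys were inserted
idx2word = {index: word for word, index in tgt_vocab.items()}


def encode(sentence, vocab):
    """Map a whitespace-separated sentence to a list of vocabulary indices."""
    return [vocab[word] for word in sentence.split()]


def decode(indices, index_to_word):
    """Map a list of indices back to a whitespace-separated sentence."""
    return ' '.join(index_to_word[i] for i in indices)


ids = encode('S I have a good friend .', tgt_vocab)
print(ids)
print(decode(ids, idx2word))
```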

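src_len and tgt_len: the maximum sentence lengths for the source and target languages.

d_model: the dimension of the embedding vectors in the model, which is also the dimension of the positional encoding.

d_ff: the dimension of the hidden layer in the feed-forward network.

d_k and d_v: the per-head dimensions in the attention mechanism. d_k is the dimension of the queries (Q) and keys (K), and d_v is the dimension of the values (V).

n_layers: the number of layers in the encoder and in the decoder.

n_heads: the number of heads in multi-head attention.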

src_len = 8  # enc_input max sequence length (length of the source sentence)
tgt_len = 7  # dec_input(=dec_output) max sequence length

# Transformer Parameters
d_model = 512  # Embedding size (dimension of the token embedding and the positional encoding)
# FeedForward dimension (hidden layer between the two linear layers, 512->2048->512;
# the linear layers do feature extraction). A projection layer follows at the very end.
d_ff = 2048
d_k = d_v = 64  # dimension of K(=Q), V (Q and K must share a dimension; d_v is set equal to d_k for convenience)
n_layers = 6  # number of Encoder and Decoder layers (number of blocks)
n_heads = 8  # number of heads in Multi-Head Attention
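These values are not independent. In the standard Transformer, the outputs of all heads are concatenated, so n_heads * d_v is normally chosen to equal d_model. The sketch below checks that relationship and counts the weights in the two feed-forward linear layers (biases ignored); the variable names concat_width and ffn_weights are introduced only for illustration.

```python
d_model = 512
d_ff = 2048
d_k = d_v = 64
n_heads = 8

# Width of the concatenated per-head outputs
concat_width = n_heads * d_v
print(concat_width, concat_width == d_model)

# Weights in the two feed-forward linear layers: 512 -> 2048 and 2048 -> 512
ffn_weights = d_model * d_ff + d_ff * d_model
print(ffn_weights)
```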

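2. Processing the dataset

The function make_data converts the words in each sentence into the corresponding index sequence, which can then be used to train the machine translation model.

It takes one argument, sentences, a list in which each entry is itself a list of three strings: the Chinese sentence (encoder input), the English sentence with the start symbol (decoder input), and the English sentence with the end symbol (decoder output).

It first initializes three empty lists, enc_inputs, dec_inputs, and dec_outputs, to hold the converted index sequences. It then iterates over every entry in sentences and converts each word into its index, using the src_vocab and tgt_vocab vocabularies defined earlier, which map every word to a unique integer.

For each entry, the code does the following:

enc_input: converts each word of the Chinese sentence into its index in src_vocab and stores the result as a list.
dec_input: converts each word of the English sentence with the start symbol into its index in tgt_vocab and stores the result as a list.
dec_output: converts each word of the English sentence (without the start symbol, but with the end symbol) into its index in tgt_vocab and stores the result as a list.

Each converted list is then added to enc_inputs, dec_inputs, and dec_outputs with extend. Finally, the function returns three torch.LongTensor objects that hold all of the index sequences and can be fed to a PyTorch model.

After the call make_data(sentences), enc_inputs, dec_inputs, and dec_outputs hold the index sequences used to train the Transformer for machine translation.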

def make_data(sentences):
    """Convert word sequences into index sequences."""
    enc_inputs, dec_inputs, dec_outputs = [], [], []
    for i in range(len(sentences)):
        # Each result is wrapped in an outer list so that extend() adds one row
        enc_input = [[src_vocab[n] for n in sentences[i][0].split()]]
        dec_input = [[tgt_vocab[n] for n in sentences[i][1].split()]]
        dec_output = [[tgt_vocab[n] for n in sentences[i][2].split()]]

        # Final value: [[1, 2, 3, 4, 5, 6, 7, 0], [1, 2, 8, 4, 9, 6, 7, 0], [1, 2, 3, 4, 10, 6, 7, 0]]
        enc_inputs.extend(enc_input)
        # Final value: [[9, 1, 2, 3, 4, 5, 11], [9, 1, 2, 6, 7, 5, 11], [9, 1, 2, 3, 8, 5, 11]]
        dec_inputs.extend(dec_input)
        # Final value: [[1, 2, 3, 4, 5, 11, 10], [1, 2, 6, 7, 5, 11, 10], [1, 2, 3, 8, 5, 11, 10]]
        dec_outputs.extend(dec_output)

    return torch.LongTensor(enc_inputs), torch.LongTensor(dec_inputs), torch.LongTensor(dec_outputs)


enc_inputs, dec_inputs, dec_outputs = make_data(sentences)
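Building a tensor from a nested list requires every row to have the same length. The three training sentences already satisfy this, so make_data needs no padding step. If you add sentences of different lengths, they would have to be padded first. The following is a minimal sketch of one way to do that; pad_tokens is a hypothetical helper that is not part of the original code, and it assumes the padding symbol P from the vocabularies above.

```python
src_vocab = {'P': 0, '我': 1, '有': 2, '一': 3,
             '个': 4, '好': 5, '朋': 6, '友': 7, '零': 8, '女': 9, '男': 10}
src_len = 8


def pad_tokens(tokens, max_len, pad_token='P'):
    """Right-pad a token list with pad_token up to max_len (hypothetical helper)."""
    if len(tokens) > max_len:
        raise ValueError(f"sequence of {len(tokens)} tokens exceeds max_len={max_len}")
    return tokens + [pad_token] * (max_len - len(tokens))


# A shorter source sentence that is not in the original training set
tokens = '我 有 朋 友'.split()
padded = pad_tokens(tokens, src_len)
ids = [src_vocab[t] for t in padded]
print(padded)
print(ids)
```

Because P maps to index 0, the padded positions can later be recognised by comparing the index sequence against zero.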

3. Building the Transformer model

1. Full code

# ====================================================================================================
# Transformer model

class PositionalEncoding(nn.Module):
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)