Untitled

Latest recommended article published on 2025-04-24 19:54:05

A_zwu

Tags: notes

Article link: https://blog.youkuaiyun.com/Az_Wu/article/details/147242937

1. Import the required libraries

import jieba
import torch
import torch.nn as nn
import torch.optim as optim

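import jieba
- Purpose: a Chinese word-segmentation library that splits a Chinese sentence into individual words (tokens).
- Details: jieba combines a statistical model with rules and supports precise mode (the default for jieba.lcut), full mode, and search-engine mode. This code uses precise mode, which suits natural language processing tasks.
- Why use it: unlike English, Chinese has no spaces between words, so a sentence such as "人工智能正在改变我们的生活方式。" has to be segmented into ["人工智能", "正在", "改变", "我们", "的", "生活方式", "。"] before further processing.
import torch
- Purpose: PyTorch is the deep learning framework; it provides tensor computation, automatic differentiation, and GPU acceleration.
- Details: it is the core dependency of this code. Building, training, and running the model all rely on PyTorch's dynamic computation graph.
import torch.nn as nn
- Purpose: PyTorch's neural network module, which provides building blocks such as the embedding layer (nn.Embedding), the LSTM (nn.LSTM), and the linear layer (nn.Linear).
- Details: the classes in nn generally inherit from nn.Module, which handles the forward pass and parameter management.
import torch.optim as optim
- Purpose: provides optimization algorithms such as Adam and SGD for updating model parameters.
- Details: optim.Adam is an adaptive-learning-rate optimizer that combines the strengths of momentum and RMSProp and works well for training deep models.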

2. Define the example sentence pairs

sentences = [
    ("人工智能正在改变我们的生活方式。",
     "Artificial intelligence is changing our way of life."),
    ("今天的天气很好。",
     "The weather is nice today."),
    ("我喜欢读书和学习。",
     "I like reading and studying.")
]

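Purpose: this is the training data, three Chinese-English sentence pairs from which the model learns to translate Chinese into English.
Data structure: sentences is a list of tuples; the first element of each tuple is the Chinese sentence (a string) and the second is its English translation (a string).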

3. Tokenization

chinese_sentences = [jieba.lcut(sent[0]) for sent in sentences]
english_sentences = [sent[1].split() for sent in sentences]

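chinese_sentences = [jieba.lcut(sent[0]) for sent in sentences]
- Purpose: segment each Chinese sentence into a list of words.
- Details:
  - sent[0]: the Chinese half of each pair.
  - jieba.lcut: cuts a string into a list of words. For example, "人工智能正在改变我们的生活方式。" becomes ["人工智能", "正在", "改变", "我们", "的", "生活方式", "。"] (the exact segmentation can vary with the jieba dictionary version).
  - List comprehension: iterates over sentences and applies jieba.lcut to each Chinese sentence, producing a nested list: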
```
[
    ["人工智能", "正在", "改变", "我们", "的", "生活方式", "。"],
    ["今天", "的", "天气", "很", "好", "。"],
    ["我", "喜欢", "读书", "和", "学习", "。"]
]
```
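english_sentences = [sent[1].split() for sent in sentences]
- Purpose: split each English sentence on whitespace into a list of words.
- Details:
  - sent[1]: the English half of each pair.
  - split(): with no arguments, splits on any run of whitespace. For example, "Artificial intelligence is changing our way of life." becomes ["Artificial", "intelligence", "is", "changing", "our", "way", "of", "life."]. Note that punctuation stays attached to the preceding word ("life.").
  - The result is a nested list: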
```
[
    ["Artificial", "intelligence", "is", "changing", "our", "way", "of", "life."],
    ["The", "weather", "is", "nice", "today."],
    ["I", "like", "reading", "and", "studying."]
]
```
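One side effect of split() that is visible in the lists above is that punctuation stays glued to the last word ("life.", "today."), so "today." and "today" would become different vocabulary entries. The following is a minimal sketch of one possible alternative, assuming a simple regex tokenizer is acceptable for the task. The helper name tokenize_en is hypothetical and not part of the original code.

```python
import re

def tokenize_en(text):
    """Hypothetical helper: split text into word tokens and single punctuation marks."""
    # \w+ matches a run of letters/digits/underscores; [^\w\s] matches one punctuation character
    return re.findall(r"\w+|[^\w\s]", text)

sentence = "The weather is nice today."
tokens_split = sentence.split()        # whitespace split, as in the original code
tokens_regex = tokenize_en(sentence)   # punctuation becomes its own token

print(tokens_split)
print(tokens_regex)
```

Whether separating punctuation helps depends on the data; with only three training sentences the difference is cosmetic, but on a larger corpus it usually keeps the vocabulary smaller.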

4. Build the vocabulary

def build_vocab(sentences):
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for sent in sentences:
        for word in sent:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

chinese_vocab = build_vocab(chinese_sentences)
english_vocab = build_vocab(english_sentences)

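def build_vocab(sentences):
- Purpose: build a vocabulary for the given list of sentences, mapping each unique word to an integer index.
- Parameters:
  - sentences: a nested list in which each element is a tokenized sentence (a list of words).
- Details:
  - Initialization:
    - vocab is a dictionary that starts with 4 predefined special tokens:
      - "<pad>" (index 0): padding token, used to extend short sequences.
      - "<sos>" (index 1): start-of-sequence token, marking where a source or target sequence begins.
      - "<eos>" (index 2): end-of-sequence token, marking where a sequence ends.
      - "<unk>" (index 3): unknown token, used for out-of-vocabulary words.
  - Iteration:
    - Outer loop for sent in sentences: iterates over the sentences.
    - Inner loop for word in sent: iterates over the words in a sentence.
    - if word not in vocab: checks whether the word is already in the vocabulary.
    - vocab[word] = len(vocab): if it is not, adds it and assigns the next free index (the current vocabulary size).
  - Return value: the completed vocabulary dictionary.
- Example:
  - For chinese_sentences, it may produce: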
```
{
    "<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3,
    "人工智能": 4, "正在": 5, "改变": 6, "我们": 7, "的": 8,
    "生活方式": 9, "。": 10, "今天": 11, "天气": 12, "很": 13,
    "好": 14, "我": 15, "喜欢": 16, "读书": 17, "和": 18, "学习": 19
}
```
  - For english_sentences, it may produce:
```
{
    "<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3,
    "Artificial": 4, "intelligence": 5, "is": 6, "changing": 7,
    "our": 8, "way": 9, "of": 10, "life.": 11, "The": 12, "weather": 13,
    "nice": 14, "today.": 15, "I": 16, "like": 17, "reading": 18,
    "and": 19, "studying.": 20
}
```
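chinese_vocab = build_vocab(chinese_sentences)
- Calls build_vocab on the Chinese tokenization result to produce the Chinese vocabulary.
english_vocab = build_vocab(english_sentences)
- Calls build_vocab on the English tokenization result to produce the English vocabulary.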
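To see the index assignment concretely, the sketch below reruns build_vocab on the tokenized English sentences from section 3. They are hard-coded here only so that the snippet runs on its own.

```python
def build_vocab(sentences):
    # Special tokens occupy indices 0-3; real words start at 4
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for sent in sentences:
        for word in sent:
            if word not in vocab:
                vocab[word] = len(vocab)  # next free index
    return vocab

english_sentences = [
    ["Artificial", "intelligence", "is", "changing", "our", "way", "of", "life."],
    ["The", "weather", "is", "nice", "today."],
    ["I", "like", "reading", "and", "studying."],
]
english_vocab = build_vocab(english_sentences)

print(len(english_vocab))                          # vocabulary size, later used as OUTPUT_DIM
print(english_vocab["is"], english_vocab["The"])   # "is" is only indexed once
```

Because "is" appears in two sentences but is inserted only the first time it is seen, the indices stay contiguous from 0 to len(vocab) - 1, which is what nn.Embedding expects.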

5. Convert sentences to index sequences

def sentence_to_indices(sentences, vocab):
    return [[vocab["<sos>"]] + [vocab.get(word, vocab["<unk>"]) for word in sent] + [vocab["<eos>"]]
            for sent in sentences]

chinese_indices = sentence_to_indices(chinese_sentences, chinese_vocab)
english_indices = sentence_to_indices(english_sentences, english_vocab)

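def sentence_to_indices(sentences, vocab):
- Purpose: convert tokenized sentences to index sequences, adding <sos> at the start and <eos> at the end.
- Parameters:
  - sentences: the list of tokenized sentences.
  - vocab: the matching vocabulary dictionary.
- Details:
  - Inner list comprehension: [vocab.get(word, vocab["<unk>"]) for word in sent]
    - Iterates over each word in the sentence.
    - vocab.get(word, vocab["<unk>"]): looks up the word's index and falls back to the <unk> index (3) if the word is missing.
    - For example, ["人工智能", "正在"] may produce [4, 5].
  - Concatenation:
    - [vocab["<sos>"]]: prepends the <sos> index (1).
    - + [vocab.get(...)]: the word indices go in the middle.
    - + [vocab["<eos>"]]: appends the <eos> index (2).
  - Outer list comprehension: applies the steps above to every sentence and returns a nested list.
- Example:
  - Chinese sentence ["人工智能", "正在", "改变", "我们", "的", "生活方式", "。"] -> [1, 4, 5, 6, 7, 8, 9, 10, 2].
  - English sentence ["Artificial", "intelligence", "is", "changing", "our", "way", "of", "life."] -> [1, 4, 5, 6, 7, 8, 9, 10, 11, 2].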

chinese_indices = sentence_to_indices(chinese_sentences, chinese_vocab)

Example result:

[
    [1, 4, 5, 6, 7, 8, 9, 10, 2],    # 人工智能正在改变我们的生活方式。
    [1, 11, 8, 12, 13, 14, 10, 2],   # 今天的天气很好。
    [1, 15, 16, 17, 18, 19, 10, 2]   # 我喜欢读书和学习。
]

english_indices = sentence_to_indices(english_sentences, english_vocab)

Example result:

[
    [1, 4, 5, 6, 7, 8, 9, 10, 11, 2],   # Artificial intelligence is changing our way of life.
    [1, 12, 13, 6, 14, 15, 2],          # The weather is nice today.
    [1, 16, 17, 18, 19, 20, 2]          # I like reading and studying.
]
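The <unk> fallback only matters for words that were never seen while building the vocabulary, which the three training sentences cannot show. A small sketch with a toy vocabulary (toy_vocab and the word "游泳" are illustrative assumptions, not part of the original data):

```python
def sentence_to_indices(sentences, vocab):
    return [[vocab["<sos>"]] + [vocab.get(w, vocab["<unk>"]) for w in sent] + [vocab["<eos>"]]
            for sent in sentences]

# Toy vocabulary, for illustration only
toy_vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3, "我": 4, "喜欢": 5, "读书": 6}

known = [["我", "喜欢", "读书"]]
unseen = [["我", "喜欢", "游泳"]]   # "游泳" is not in toy_vocab

known_ids = sentence_to_indices(known, toy_vocab)
unseen_ids = sentence_to_indices(unseen, toy_vocab)

print(known_ids)    # every word has its own index
print(unseen_ids)   # the unseen word is replaced by the <unk> index 3
```

This is the same path a user-typed sentence takes at inference time, so any word outside the three training sentences reaches the model as <unk>.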

6. Sequence padding

max_len = max(max(len(sent) for sent in chinese_indices), max(len(sent) for sent in english_indices))
chinese_padded = [sent + [chinese_vocab["<pad>"]] * (max_len - len(sent)) for sent in chinese_indices]
english_padded = [sent + [english_vocab["<pad>"]] * (max_len - len(sent)) for sent in english_indices]

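max_len = max(max(len(sent) for sent in chinese_indices), max(len(sent) for sent in english_indices))
- Purpose: find the maximum length over all sequences, which is the length to pad to.
- Details:
  - len(sent) for sent in chinese_indices: the length of each Chinese sequence (9, 8, 8).
  - max(...): the longest Chinese sequence (9).
  - The longest English sequence is computed the same way (10).
  - max(9, 10): the larger of the two, so max_len = 10.
- Why: all sequences must have the same length before they can be stacked into a tensor and processed as a batch.
chinese_padded = [sent + [chinese_vocab["<pad>"]] * (max_len - len(sent)) for sent in chinese_indices]
- Purpose: pad the Chinese sequences with <pad> until they reach max_len.
- Details:
  - [chinese_vocab["<pad>"]]: a one-element list holding the <pad> index (0).
  - (max_len - len(sent)): the number of padding positions needed. A sequence of length 9, for example, needs 1 <pad>.
  - *: list multiplication, which repeats <pad> that many times.
  - sent + ...: appends the padding to the original sequence.
  - List comprehension: pads every sequence.
- Example:
  - [1, 4, 5, 6, 7, 8, 9, 10, 2] (length 9) -> [1, 4, 5, 6, 7, 8, 9, 10, 2, 0].
  - Result: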
```
[
    [1, 4, 5, 6, 7, 8, 9, 10, 2, 0],
    [1, 11, 8, 12, 13, 14, 10, 2, 0, 0],
    [1, 15, 16, 17, 18, 19, 10, 2, 0, 0]
]
```
english_padded = [sent + [english_vocab["<pad>"]] * (max_len - len(sent)) for sent in english_indices]
- Purpose: pad the English sequences with <pad>.
- Example:
  - [1, 12, 13, 6, 14, 15, 2] (length 7) -> [1, 12, 13, 6, 14, 15, 2, 0, 0, 0].
  - Result:
```
[
    [1, 4, 5, 6, 7, 8, 9, 10, 11, 2],
    [1, 12, 13, 6, 14, 15, 2, 0, 0, 0],
    [1, 16, 17, 18, 19, 20, 2, 0, 0, 0]
]
```
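PyTorch also ships a helper that performs this padding directly on tensors. The sketch below is an alternative to the list-based code above, not what the original uses: it pads the English index sequences with torch.nn.utils.rnn.pad_sequence and derives a boolean mask of the real (non-padding) positions. PAD_IDX and pad_mask are names introduced here for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

english_indices = [
    [1, 4, 5, 6, 7, 8, 9, 10, 11, 2],
    [1, 12, 13, 6, 14, 15, 2],
    [1, 16, 17, 18, 19, 20, 2],
]
PAD_IDX = 0  # same value as english_vocab["<pad>"]

# pad_sequence pads every tensor to the longest one in the list
padded = pad_sequence(
    [torch.tensor(s, dtype=torch.long) for s in english_indices],
    batch_first=True,        # result shape: [batch_size, seq_len]
    padding_value=PAD_IDX,
)
pad_mask = padded.ne(PAD_IDX)  # True where a real token sits

print(padded.shape)
print(pad_mask.sum(dim=1))     # number of real tokens per sentence
```

Note that pad_sequence pads to the longest sequence in the list it receives, so source and target batches may end up with different lengths. That is fine for this model, because the encoder and decoder process their sequences separately.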

7. Convert to tensors

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
src_data = torch.tensor(chinese_padded, dtype=torch.long).to(device)  # [batch_size, seq_len]
trg_data = torch.tensor(english_padded, dtype=torch.long).to(device)  # [batch_size, seq_len]

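device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- Purpose: use the GPU ("cuda") if one is available, otherwise fall back to the CPU.
- Details:
  - torch.cuda.is_available(): returns a boolean indicating whether CUDA can be used.
  - torch.device: creates a device object that specifies where tensors and the model are placed.
src_data = torch.tensor(chinese_padded, dtype=torch.long).to(device)
- Purpose: convert the padded Chinese sequences to a PyTorch tensor and move it to the selected device.
- Details:
  - torch.tensor(chinese_padded): converts the nested list to a tensor of shape [batch_size, seq_len], here [3, 10].
  - dtype=torch.long: 64-bit integers, the type nn.Embedding expects for indices.
  - .to(device): moves the tensor to the GPU or CPU.
- Result: src_data is a [3, 10] tensor holding 3 Chinese sequences of length 10.
trg_data = torch.tensor(english_padded, dtype=torch.long).to(device)
- Purpose: convert the padded English sequences to a tensor.
- Details: same as above; the shape is [3, 10].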

8. Define the encoder (Encoder)

class Encoder(nn.Module):
    def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, src):
        embedded = self.dropout(self.embedding(src))
        outputs, (hidden, cell) = self.lstm(embedded)
        return outputs, hidden, cell

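class Encoder(nn.Module):
- Purpose: the encoder turns the source-language (Chinese) sequence into hidden representations that serve as context for the decoder.
- Inheritance: nn.Module is PyTorch's base class; it provides parameter management and the forward-pass interface.
def __init__(self, input_dim, emb_dim, hid_dim, n_layers, dropout):
- Parameters:
  - input_dim: size of the input vocabulary (length of the Chinese vocabulary).
  - emb_dim: word embedding dimension (128); each word is mapped to a dense vector.
  - hid_dim: LSTM hidden dimension (256); controls model capacity.
  - n_layers: number of LSTM layers (2); stacked layers can capture more complex dependencies.
  - dropout: dropout rate (0.3), used for regularization to reduce overfitting.
- Components:
  - self.embedding = nn.Embedding(input_dim, emb_dim)
    - Converts indices to embedding vectors; the shape goes from [seq_len, batch_size] to [seq_len, batch_size, emb_dim].
  - self.lstm = nn.LSTM(emb_dim, hid_dim, n_layers, dropout=dropout)
    - The LSTM consumes the embeddings and produces hidden states. dropout is applied to the output of every layer except the last.
  - self.dropout = nn.Dropout(dropout)
    - Applies dropout after the embedding layer, randomly zeroing some elements of the embedding vectors.
def forward(self, src):
- Input:
  - src: source sequence tensor of shape [seq_len, batch_size] (e.g. [10, 3]).
- Steps:
  - embedded = self.dropout(self.embedding(src))
    - self.embedding(src): maps indices to embedding vectors, shape [seq_len, batch_size, emb_dim] (e.g. [10, 3, 128]).
    - self.dropout(...): applies dropout, randomly setting some values to 0.
  - outputs, (hidden, cell) = self.lstm(embedded)
    - outputs: the top layer's output at every time step, shape [seq_len, batch_size, hid_dim] (e.g. [10, 3, 256]).
    - hidden: the hidden state of every layer at the final time step, shape [n_layers, batch_size, hid_dim] (e.g. [2, 3, 256]).
    - cell: the cell state of every layer at the final time step, same shape as hidden.
- Output:
  - outputs: the contextual representation of the sequence, used by the attention mechanism.
  - hidden, cell: passed to the decoder as its initial state.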
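The shapes above can be checked with a few lines. This sketch rebuilds only the embedding and LSTM layers with the dimensions used in this article and feeds them random indices; dropout is left out so the result is deterministic. It also confirms how outputs and hidden relate: the last time step of outputs equals the top layer's entry in hidden.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size = 10, 3
input_dim, emb_dim, hid_dim, n_layers = 20, 128, 256, 2

embedding = nn.Embedding(input_dim, emb_dim)
lstm = nn.LSTM(emb_dim, hid_dim, n_layers)  # batch_first=False: input is [seq_len, batch, emb]

src = torch.randint(0, input_dim, (seq_len, batch_size))  # random token indices
embedded = embedding(src)
outputs, (hidden, cell) = lstm(embedded)

print(embedded.shape, outputs.shape, hidden.shape, cell.shape)

# outputs[-1] is the top layer at the last step; hidden[-1] is the top layer's final state
top_layer_matches = torch.allclose(outputs[-1], hidden[-1])
print(top_layer_matches)
```

This is also why the attention module later reads hidden[-1]: it is the top layer's state, which lives in the same space as the rows of encoder_outputs.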

9. Define the attention mechanism (Attention)

class Attention(nn.Module):
    def __init__(self, hid_dim):
        super().__init__()
        self.attn = nn.Linear(hid_dim * 2, hid_dim)
        self.v = nn.Linear(hid_dim, 1, bias=False)
    
    def forward(self, hidden, encoder_outputs):
        src_len = encoder_outputs.shape[0]
        batch_size = encoder_outputs.shape[1]
        
        hidden = hidden[-1].unsqueeze(0).repeat(src_len, 1, 1)
        energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
        attention = self.v(energy).squeeze(2)
        return torch.softmax(attention, dim=0)

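class Attention(nn.Module):
- Purpose: implements Bahdanau-style (additive) attention, which computes how much weight the decoder puts on each encoder output.
def __init__(self, hid_dim):
- Parameters:
  - hid_dim: hidden dimension (256).
- Components:
  - self.attn = nn.Linear(hid_dim * 2, hid_dim)
    - The input dimension is hid_dim * 2 because the decoder hidden state is concatenated with the encoder output.
  - self.v = nn.Linear(hid_dim, 1, bias=False)
    - Projects the energy vector to a scalar attention score; no bias, to save parameters.
def forward(self, hidden, encoder_outputs):
- Input:
  - hidden: the decoder's most recent hidden state, shape [n_layers, batch_size, hid_dim] (e.g. [2, 3, 256]).
  - encoder_outputs: the encoder outputs, shape [seq_len, batch_size, hid_dim] (e.g. [10, 3, 256]).
- Steps:
  - src_len = encoder_outputs.shape[0]: source sequence length (e.g. 10).
  - batch_size = encoder_outputs.shape[1]: batch size (e.g. 3). The variable is computed but not used afterwards.
  - hidden = hidden[-1].unsqueeze(0).repeat(src_len, 1, 1)
    - hidden[-1]: the top layer's hidden state, shape [batch_size, hid_dim] (e.g. [3, 256]).
    - .unsqueeze(0): adds a dimension, giving [1, batch_size, hid_dim].
    - .repeat(src_len, 1, 1): repeats it src_len times, giving [seq_len, batch_size, hid_dim] (e.g. [10, 3, 256]) so it lines up with encoder_outputs.
  - energy = torch.tanh(self.attn(torch.cat((hidden, encoder_outputs), dim=2)))
    - torch.cat((hidden, encoder_outputs), dim=2): concatenates along dimension 2, shape [seq_len, batch_size, hid_dim * 2] (e.g. [10, 3, 512]).
    - self.attn(...): linear projection, shape [seq_len, batch_size, hid_dim] (e.g. [10, 3, 256]).
    - torch.tanh(...): applies the tanh activation, squashing the energy values into (-1, 1).
  - attention = self.v(energy).squeeze(2)
    - self.v(energy): maps each energy vector to a scalar, shape [seq_len, batch_size, 1].
    - .squeeze(2): removes dimension 2, giving [seq_len, batch_size] (e.g. [10, 3]).
  - return torch.softmax(attention, dim=0)
    - Applies softmax along dimension 0 (the sequence dimension), so each column sums to 1 and can be read as attention weights.
- Output:
  - The attention weights, shape [seq_len, batch_size], indicating how strongly the decoder attends to each source token.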
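The steps above can be traced outside the class. The sketch below repeats the same computation with freshly initialized layers and random inputs (so the weights themselves are meaningless) and checks the one property that must always hold: for every sentence in the batch, the weights over the source positions sum to 1. The variable names query, scores, and weights are chosen here for readability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
src_len, batch_size, hid_dim, n_layers = 10, 3, 256, 2

attn = nn.Linear(hid_dim * 2, hid_dim)   # plays the role of self.attn
v = nn.Linear(hid_dim, 1, bias=False)    # plays the role of self.v

hidden = torch.randn(n_layers, batch_size, hid_dim)          # stand-in decoder state
encoder_outputs = torch.randn(src_len, batch_size, hid_dim)  # stand-in encoder outputs

# Copy the top-layer state once per source position: [src_len, batch, hid]
query = hidden[-1].unsqueeze(0).repeat(src_len, 1, 1)
energy = torch.tanh(attn(torch.cat((query, encoder_outputs), dim=2)))  # [src_len, batch, hid]
scores = v(energy).squeeze(2)                                          # [src_len, batch]
weights = torch.softmax(scores, dim=0)                                 # normalize over positions

print(weights.shape)
print(weights.sum(dim=0))  # one sum per batch element
```

Taking the softmax over dim=0 rather than dim=1 matters here because the tensors are laid out as [seq_len, batch_size]; normalizing over the wrong dimension would mix different sentences.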

10. Define the decoder (Decoder)

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, hid_dim, n_layers, dropout, attention):
        super().__init__()
        self.output_dim = output_dim
        self.attention = attention
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.lstm = nn.LSTM(emb_dim + hid_dim, hid_dim, n_layers, dropout=dropout)
        self.fc_out = nn.Linear(hid_dim * 2 + emb_dim, output_dim)
        self.dropout = nn.Dropout(dropout)
    
    def forward(self, input, hidden, cell, encoder_outputs):
        input = input.unsqueeze(0)
        embedded = self.dropout(self.embedding(input))
        a = self.attention(hidden, encoder_outputs)
        a = a.unsqueeze(2)
        weighted = torch.bmm(encoder_outputs.permute(1, 2, 0), a.permute(1, 0, 2)).permute(2, 0, 1)
        rnn_input = torch.cat((embedded, weighted), dim=2)
        output, (hidden, cell) = self.lstm(rnn_input, (hidden, cell))
        embedded = embedded.squeeze(0)
        output = output.squeeze(0)
        weighted = weighted.squeeze(0)
        prediction = self.fc_out(torch.cat((output, weighted, embedded), dim=1))
        return prediction, hidden, cell

class Decoder(nn.Module):
- Purpose: the decoder generates the target-language (English) sequence from the encoder's context.
def __init__(self, output_dim, emb_dim, hid_dim, n_layers, dropout, attention):
- Parameters:
  - output_dim: size of the target vocabulary (length of the English vocabulary).
  - attention: an instance of the attention mechanism.
  - The remaining parameters are the same as for the encoder.
- Components:
  - self.embedding = nn.Embedding(output_dim, emb_dim): embedding layer for target words.
  - self.lstm = nn.LSTM(emb_dim + hid_dim, hid_dim, n_layers, dropout=dropout)
    - The input dimension is emb_dim + hid_dim because the embedding is concatenated with the attention-weighted context.
  - self.fc_out = nn.Linear(hid_dim * 2 + emb_dim, output_dim)
    - Output layer; its input concatenates the LSTM output, the weighted context, and the embedding.
  - self.dropout = nn.Dropout(dropout): regularization.
def forward(self, input, hidden, cell, encoder_outputs):
- Input:
  - input: the current input token indices, shape [batch_size] (e.g. [3]).
  - hidden, cell: the state from the previous time step, shape [n_layers, batch_size, hid_dim].
  - encoder_outputs: the encoder outputs, shape [seq_len, batch_size, hid_dim].
- Steps:
  - input = input.unsqueeze(0): adds the time-step dimension, shape [1, batch_size].
  - embedded = self.dropout(self.embedding(input)): embedding followed by dropout, shape [1, batch_size, emb_dim].
  - a = self.attention(hidden, encoder_outputs): computes the attention weights, shape [seq_len, batch_size].
  - a = a.unsqueeze(2): shape becomes [seq_len, batch_size, 1].
  - weighted = torch.bmm(encoder_outputs.permute(1, 2, 0), a.permute(1, 0, 2)).permute(2, 0, 1)
    - encoder_outputs.permute(1, 2, 0): shape [batch_size, hid_dim, seq_len].
    - a.permute(1, 0, 2): shape [batch_size, seq_len, 1].
    - torch.bmm(...): batched matrix multiplication, shape [batch_size, hid_dim, 1]. This is the attention-weighted sum of the encoder outputs.
    - .permute(2, 0, 1): shape [1, batch_size, hid_dim].
  - rnn_input = torch.cat((embedded, weighted), dim=2): concatenation, shape [1, batch_size, emb_dim + hid_dim].
  - output, (hidden, cell) = self.lstm(rnn_input, (hidden, cell))
    - output: shape [1, batch_size, hid_dim].
    - hidden, cell: the updated state.
  - embedded = embedded.squeeze(0): shape [batch_size, emb_dim].
  - output = output.squeeze(0): shape [batch_size, hid_dim].
  - weighted = weighted.squeeze(0): shape [batch_size, hid_dim].
  - prediction = self.fc_out(torch.cat((output, weighted, embedded), dim=1))
    - The concatenated input has shape [batch_size, hid_dim * 2 + emb_dim]; the output has shape [batch_size, output_dim].
- Output:
  - prediction: unnormalized scores (logits) over the target vocabulary. No softmax is applied here; nn.CrossEntropyLoss applies it internally during training.
  - hidden, cell: the updated state.
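The permute/bmm chain that produces weighted is easier to trust once it is compared with the plain definition of a weighted sum. The sketch below, using random stand-in tensors, computes the context vector both ways and checks that they agree. The broadcasting form is an equivalent rewrite shown for illustration, not what the original code uses.

```python
import torch

torch.manual_seed(0)
src_len, batch_size, hid_dim = 10, 3, 256

encoder_outputs = torch.randn(src_len, batch_size, hid_dim)
# Stand-in attention weights, already expanded to [src_len, batch, 1]
a = torch.softmax(torch.randn(src_len, batch_size), dim=0).unsqueeze(2)

# Form used in the decoder: [B, H, S] x [B, S, 1] -> [B, H, 1] -> [1, B, H]
weighted_bmm = torch.bmm(
    encoder_outputs.permute(1, 2, 0), a.permute(1, 0, 2)
).permute(2, 0, 1)

# Equivalent broadcasting form: scale every source position, then sum over positions
weighted_sum = (a * encoder_outputs).sum(dim=0, keepdim=True)

print(weighted_bmm.shape, weighted_sum.shape)
print(torch.allclose(weighted_bmm, weighted_sum, atol=1e-5))
```

Both forms yield one context vector of size hid_dim per sentence; which one to use is mostly a matter of readability.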

11. Define the Seq2Seq model

class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder, device):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.device = device
    
    def forward(self, src, trg, teacher_forcing_ratio=0.5):
        batch_size = src.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device)
        encoder_outputs, hidden, cell = self.encoder(src)
        input = trg[0, :]
        for t in range(1, trg_len):
            output, hidden, cell = self.decoder(input, hidden, cell, encoder_outputs)
            outputs[t] = output
            teacher_force = torch.rand(1).item() < teacher_forcing_ratio
            input = trg[t] if teacher_force else output.argmax(1)
        return outputs

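class Seq2Seq(nn.Module):
- Purpose: combines the encoder and decoder to perform sequence-to-sequence translation.
def __init__(self, encoder, decoder, device):
- Parameters:
  - encoder, decoder: the instantiated encoder and decoder.
  - device: the compute device.
def forward(self, src, trg, teacher_forcing_ratio=0.5):
- Input:
  - src: source sequence, shape [seq_len, batch_size].
  - trg: target sequence, shape [trg_len, batch_size].
  - teacher_forcing_ratio: probability of using teacher forcing at each step (0.5).
- Steps:
  - batch_size, trg_len: read from the input shapes.
  - trg_vocab_size: size of the target vocabulary.
  - outputs = torch.zeros(trg_len, batch_size, trg_vocab_size).to(self.device): allocates the output tensor.
  - encoder_outputs, hidden, cell = self.encoder(src): encodes the source sequence.
  - input = trg[0, :]: the first decoder input is <sos>.
  - Decoding loop (t runs from 1 to trg_len - 1):
    - output, hidden, cell = self.decoder(...): produces the prediction for the next step.
    - outputs[t] = output: stores the prediction.
    - teacher_force = torch.rand(1).item() < teacher_forcing_ratio: randomly decides whether to use teacher forcing for this step.
    - input = trg[t] if teacher_force else output.argmax(1): the next input is either the ground-truth token or the model's own prediction.
- Output:
  - outputs: shape [trg_len, batch_size, trg_vocab_size], the logits for each time step. Row 0 stays all zeros because decoding starts at t = 1.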
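The teacher-forcing branch is a single random comparison per time step, shared by the whole batch. The sketch below isolates that decision in plain Python so its behavior at the extremes is easy to see. The helper choose_next_input and the string tokens "gold" and "pred" are hypothetical stand-ins for trg[t] and output.argmax(1).

```python
import random

def choose_next_input(target_token, predicted_token, teacher_forcing_ratio, rng):
    """Hypothetical helper mirroring the branch inside Seq2Seq.forward."""
    # rng.random() is uniform in [0, 1), like torch.rand(1).item()
    teacher_force = rng.random() < teacher_forcing_ratio
    return target_token if teacher_force else predicted_token

rng = random.Random(0)

# ratio 1.0: always feed the ground truth; ratio 0.0: always feed the model's own output
always_teacher = [choose_next_input("gold", "pred", 1.0, rng) for _ in range(5)]
never_teacher = [choose_next_input("gold", "pred", 0.0, rng) for _ in range(5)]

# ratio 0.5: roughly half of the steps use the ground truth
mixed = [choose_next_input("gold", "pred", 0.5, rng) for _ in range(1000)]
gold_share = mixed.count("gold") / len(mixed)

print(always_teacher, never_teacher, round(gold_share, 2))
```

A ratio of 0.0 corresponds to how the model runs at inference time, when no ground-truth target exists.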

12. Set the hyperparameters

INPUT_DIM = len(chinese_vocab)
OUTPUT_DIM = len(english_vocab)
EMB_DIM = 128
HID_DIM = 256
N_LAYERS = 2
DROPOUT = 0.3
EPOCHS = 50
BATCH_SIZE = len(sentences)

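Details:
- INPUT_DIM: size of the Chinese vocabulary (e.g. 20).
- OUTPUT_DIM: size of the English vocabulary (e.g. 21).
- EMB_DIM: embedding dimension (128); modest, but more than enough for a dataset this small.
- HID_DIM: hidden dimension (256); controls model capacity.
- N_LAYERS: number of LSTM layers (2); adds depth to the model.
- DROPOUT: 0.3, to reduce overfitting.
- EPOCHS: 50 training epochs; with only three sentences this may be too many and the model can easily overfit.
- BATCH_SIZE: 3, equal to the dataset size, so the whole dataset forms a single batch. The training loop in section 15 feeds all the data at once and never reads this constant.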

13. Instantiate the model

enc = Encoder(INPUT_DIM, EMB_DIM, HID_DIM, N_LAYERS, DROPOUT)
attn = Attention(HID_DIM)
dec = Decoder(OUTPUT_DIM, EMB_DIM, HID_DIM, N_LAYERS, DROPOUT, attn)
model = Seq2Seq(enc, dec, device).to(device)

Details:
- Creates the encoder, attention, and decoder instances.
- Assembles them into a Seq2Seq model and moves it to the selected device.

14. Define the loss function and optimizer

criterion = nn.CrossEntropyLoss(ignore_index=english_vocab["<pad>"])
optimizer = optim.Adam(model.parameters(), lr=0.001)

criterion: cross-entropy loss; positions whose target is <pad> are excluded from the loss.
optimizer: the Adam optimizer with a learning rate of 0.001.
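The effect of ignore_index can be verified directly: with the default mean reduction, the loss over a padded batch should equal the loss computed only on the non-padding positions. A minimal sketch with random logits (the targets are arbitrary indices chosen for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_IDX = 0        # same value as english_vocab["<pad>"]
vocab_size = 21

logits = torch.randn(6, vocab_size)                       # 6 flattened positions
targets = torch.tensor([12, 13, 6, 2, PAD_IDX, PAD_IDX])  # last two positions are padding

# Loss as configured in this article: padded targets are skipped
loss_ignoring_pad = nn.CrossEntropyLoss(ignore_index=PAD_IDX)(logits, targets)

# Reference: drop the padded rows by hand, then use the plain loss
keep = targets != PAD_IDX
loss_on_real_tokens = nn.CrossEntropyLoss()(logits[keep], targets[keep])

print(loss_ignoring_pad.item(), loss_on_real_tokens.item())
```

Without ignore_index, the model would also be rewarded for predicting <pad> after <eos>, which inflates the apparent accuracy on short sentences.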

15. Training loop

for epoch in range(EPOCHS):
    model.train()
    optimizer.zero_grad()
    
    src = src_data.transpose(0, 1)  # [seq_len, batch_size]
    trg = trg_data.transpose(0, 1)  # [seq_len, batch_size]
    
    output = model(src, trg)
    
    output = output[1:].reshape(-1, output.shape[-1])
    trg = trg[1:].reshape(-1)
    
    loss = criterion(output, trg)
    loss.backward()
    
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch {epoch+1}/{EPOCHS}, Loss: {loss.item():.4f}')

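Details:
- model.train(): switches to training mode (dropout enabled).
- src, trg: transposed to [seq_len, batch_size].
- output: the model's predictions, shape [trg_len, batch_size, trg_vocab_size].
- output[1:].reshape(-1, ...): skips position 0 (the <sos> slot, which holds no prediction) and flattens to [N, trg_vocab_size], where N = (trg_len - 1) * batch_size.
- trg[1:].reshape(-1): flattens the targets to [N] so they line up with the predictions.
- loss.backward(): computes the gradients.
- clip_grad_norm_: clips the gradient norm to 1.0 to prevent exploding gradients.
- optimizer.step(): updates the parameters.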
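Gradient clipping is the least visible step in the loop, so here is a small standalone sketch of what clip_grad_norm_ does. It uses a tiny linear layer and a deliberately inflated loss (both illustrative assumptions) to force a large gradient, then compares the norm before and after clipping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 2)

# Scale the output by 100 so the gradients are far larger than the clipping threshold
loss = (layer(torch.randn(8, 4)) * 100.0).pow(2).mean()
loss.backward()

# clip_grad_norm_ rescales the gradients in place and returns the total norm measured before clipping
norm_before = torch.nn.utils.clip_grad_norm_(layer.parameters(), max_norm=1.0)

# Recompute the total gradient norm across all parameters after clipping
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in layer.parameters()))

print(norm_before.item(), norm_after.item())
```

Clipping rescales all gradients by the same factor, so the update direction is preserved and only its length is capped.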

16. Translation function

def translate_sentence(model, sentence, chinese_vocab, english_vocab, device, max_len=50):
    model.eval()
    tokens = jieba.lcut(sentence)
    indices = [chinese_vocab["<sos>"]] + [chinese_vocab.get(token, chinese_vocab["<unk>"]) for token in tokens] + [chinese_vocab["<eos>"]]
    src = torch.tensor(indices, dtype=torch.long).unsqueeze(1).to(device)
    with torch.no_grad():
        encoder_outputs, hidden, cell = model.encoder(src)
    trg_indices = [english_vocab["<sos>"]]
    for _ in range(max_len):
        trg_tensor = torch.tensor([trg_indices[-1]], dtype=torch.long).to(device)
        with torch.no_grad():
            output, hidden, cell = model.decoder(trg_tensor, hidden, cell, encoder_outputs)
        pred_token = output.argmax(1).item()
        trg_indices.append(pred_token)
        if pred_token == english_vocab["<eos>"]:
            break
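    # Build the inverse mapping once instead of scanning the dict for every index
    idx_to_word = {idx: word for word, idx in english_vocab.items()}
    trg_tokens = [idx_to_word[idx] for idx in trg_indices[1:]]  # skip <sos>
    if trg_tokens and trg_tokens[-1] == "<eos>":
        trg_tokens = trg_tokens[:-1]  # drop <eos> only when it was actually generated
    return " ".join(trg_tokens)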

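Details:
- model.eval(): inference mode, which disables dropout.
- The sentence is tokenized and converted to an index sequence; unseen words map to <unk>.
- The input is encoded once, then decoded one token at a time with greedy decoding (the highest-scoring token is chosen at each step) until <eos> is produced or max_len is reached.
- The indices are converted back to words and joined into a sentence.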
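The index-to-word step can be isolated in a small helper. The sketch below is one possible way to write it, assuming it is acceptable to filter special tokens by name, so the output is the same whether or not decoding ended with <eos>. The helper indices_to_sentence and toy_vocab are hypothetical and not part of the original code.

```python
def indices_to_sentence(indices, vocab):
    """Hypothetical helper: map indices back to words and drop special tokens."""
    idx_to_word = {idx: word for word, idx in vocab.items()}  # inverse of the vocabulary
    specials = {"<pad>", "<sos>", "<eos>"}
    words = [idx_to_word.get(i, "<unk>") for i in indices]
    return " ".join(w for w in words if w not in specials)

# Toy vocabulary, for illustration only
toy_vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3,
             "The": 4, "weather": 5, "is": 6, "nice": 7, "today.": 8}

finished = indices_to_sentence([1, 4, 5, 6, 7, 8, 2], toy_vocab)   # decoding ended with <eos>
truncated = indices_to_sentence([1, 4, 5, 6], toy_vocab)           # decoding hit max_len first

print(finished)
print(truncated)
```

Building the inverse dictionary once costs a single pass over the vocabulary, after which every lookup is a constant-time dictionary access.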

17. Test the translation

while True:
    test_sentence = input("请输入中文句子（输入 '退出' 结束）：")
    if test_sentence == "退出":
        break
    translation = translate_sentence(model, test_sentence, chinese_vocab, english_vocab, device)
    print(f"英文翻译: {translation}")