nlp-beginner task3: Attention-Based Text Matching (ESIM)

Article link: https://blog.youkuaiyun.com/soyamilk233/article/details/108493494

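This post records the author's experience implementing ESIM, an attention-based text matching model. The author followed the FudanNLP/nlp-beginner repository and simplified the model, leaving out the mask mechanism. Although the model performed poorly, the author learned a lot from the exercise, including how the bidirectional setting of an LSTM works and how to implement a paper in practice. The main problem identified was the missing mask mechanism, which lowers performance, although the author believes it should not have a significant effect. After adjusting batch_size, the loss turned out to be unstable, and accuracy stayed at 76.0856% after 4 epochs.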


https://github.com/FudanNLP/nlp-beginner

References: the ESIM paper and a reference ESIM code implementation

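~~The results are very poor, so for now this is just a record.~~ Found the problem. A full run will probably take four or five days, so I am leaving this post up while I wait for the numbers.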

1. Code

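The model part was mostly typed out following the reference code. I only added some comments about tensor sizes and removed the mask mechanism. The data processing, training, and other parts are much the same as in task2.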
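Since the mask mechanism is dropped here, it may help to recall what a mask usually does in attention: positions that are only padding are excluded from the softmax so they receive zero weight. The following is a minimal pure-Python sketch of that idea, not the reference code's implementation. The helper name `masked_softmax` and the example scores are assumptions made for illustration.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores`, ignoring positions where `mask` is 0.

    Hypothetical helper for illustration only; it is not taken from the
    reference implementation or from this post's model code.
    """
    valid = [s for s, m in zip(scores, mask) if m]
    if not valid:
        # Nothing to attend to: return all zeros instead of dividing by zero.
        return [0.0] * len(scores)
    peak = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# A sentence with 3 real tokens, padded to length 5 (made-up scores).
scores = [2.0, 1.0, 0.5, 0.0, 0.0]
mask = [1, 1, 1, 0, 0]

weights = masked_softmax(scores, mask)           # padding gets zero weight
unmasked = masked_softmax(scores, [1] * 5)       # padding steals some weight
```

Without the mask, the two padded positions still receive part of the attention weight, which is the effect that removing the mask introduces whenever sentences in a batch are padded to a common length.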

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import random_split
import pandas as pd
import numpy as np
import random

label_to_index = {
    'contradiction': 0,
    'neutral': 1,
    'entailment': 2
}
index_to_label = {value: key for key, value in label_to_index.items()}  # inverse mapping: index -> label

read_data = pd.read_table('../snli_1.0_train.txt')
data = []
data_len = int(read_data.shape[0])  # 550152 rows in total
for i in range(data_len):
    if pd.isnull(read_data['sentence2_binary_parse'][i]):
        # read_data['sentence2_binary_parse'][i] = 'N/A'  # N/A values do occur; such rows are skipped
        continue
    data.append([read_data['sentence1_binary_parse'][i].lower().replace('(', ' ').replace(')', ' ').split(),
                 read_data['sentence2_binary_parse'][i].lower().replace('(', ' ').replace(')', ' ').split(),
                 label_to_index[read_data['label1'][i]]])


word_to_ix = {}  # assign an index to each word
ix_to_word = {}
word_set = set()
for sent, sent2, _ in data:
    for word in sent:
        if word not in word_to_ix:
            ix_to_word[len(word_to_ix)] = word
            word_to_ix[word] = len(word_to_ix)
            word_set.add(word)
    for word in sent2:
        if word not in word_to_ix:
            ix_to_word[len(word_to_ix)] = word
            word_to_ix[word] = len(word_to_ix)
            word_set.add(word)

unk = '<unk>'
ix_to_word[len(word_to_ix)] = unk
word_to_ix[unk] = len(word_to_ix)
wor