Python-NER-CRF

This post takes a practical look at named-entity recognition (NER) through three models: a majority-vote baseline, a conditional random field (CRF), and a BiLSTM-CRF. Worked examples show how each one is applied to a sequence-labelling task and how they compare in accuracy and efficiency.
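All three examples read a train.csv with one token per row. The snippets assume a layout along the lines of the well-known Kaggle NER corpus, with columns Sentence #, Word, POS and Tag, where the sentence id is left blank on continuation rows (hence the forward fill). A few hypothetical rows for illustration only:

Sentence #,Word,POS,Tag
Sentence: 1,John,NNP,B-per
,lives,VBZ,O
,in,IN,O
,London,NNP,B-geo
,.,.,O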

Majority-Vote Model

import pandas as pd
from sklearn.metrics import classification_report

# Data: one token per row; forward-fill the blank 'Sentence #' cells
df = pd.read_csv('train.csv').ffill()

X = df.Word.values
y = df.Tag.values

labels = df.Tag.unique().tolist()
labels.remove('O')


# Majority-vote model: tag each word with its most frequent training tag
class Majority_vote:
    def fit(self, X, y):
        counter = {}
        for w, t in zip(X, y):
            if w in counter:
                if t in counter[w]:
                    counter[w][t] += 1
                else:
                    counter[w][t] = 1
            else:
                counter[w] = {t: 1}
        self.vote = {}
        for w, t in counter.items():
            self.vote[w] = max(t, key=t.get)
        return self

    def predict(self, X):
        return [self.vote.get(x, 'O') for x in X]

y_pred = Majority_vote().fit(X, y).predict(X)
report = classification_report(y, y_pred, labels=labels)
print(report)
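One property of this baseline worth keeping in mind: the vote dictionary only covers words seen during fitting, so predict falls back to 'O' for everything else. A quick check with a made-up token (assumed not to occur in train.csv):

model = Majority_vote().fit(X, y)
print(model.predict(['zzz-never-seen']))  # prints ['O'] because unseen words fall back to the 'O' tag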

Conditional Random Field


The following implements a linear-chain CRF.
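For reference, a linear-chain CRF models the whole tag sequence conditioned on the input as

P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t)\Big)

where the f_k are feature functions (like the ones extracted below), the \lambda_k are learned weights, and Z(x) normalizes over all possible tag sequences.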

import pandas as pd
from sklearn_crfsuite import CRF
from sklearn_crfsuite.metrics import flat_classification_report

# Read and preprocess the data
data = pd.read_csv('train.csv').ffill()

labels = data.Tag.unique().tolist()
labels.remove('O')

# Group the tokens into sentences of (word, POS, tag) triples
f = lambda s: [(w, p, t) for w, p, t in zip(
    s.Word.values, s.POS.values, s.Tag.values)]
sentences = list(data.groupby('Sentence #').apply(f))

# Feature extraction
def word2features(sent, i):
    word = sent[i][0]
    postag = sent[i][1]

    features = {
        'bias': 1.0,
        'word.lower()': word.lower(),
        'word[-3:]': word[-3:],
        'word[-2:]': word[-2:],
        'word.isupper()': word.isupper(),
        'word.istitle()': word.istitle(),
        'word.isdigit()': word.isdigit(),
        'postag': postag,
        'postag[:2]': postag[:2],
    }
    if i > 0:  # features of the previous token
        word1 = sent[i-1][0]
        postag1 = sent[i-1][1]
        features.update({
            '-1:word.lower()': word1.lower(),
            '-1:word.istitle()': word1.istitle(),
            '-1:word.isupper()': word1.isupper(),
            '-1:postag': postag1,
            '-1:postag[:2]': postag1[:2],
        })
    else:
        features['BOS'] = True

    if i < len(sent)-1:
        word1 = sent[i+1][0]
        postag1 = sent[i+1][1]
        features.update({
            '+1:word.lower()': word1.lower(),
            '+1:word.istitle()': word1.istitle(),
            '+1:word.isupper()': word1.isupper(),
            '+1:postag': postag1,
            '+1:postag[:2]': postag1[:2],
        })
    else:
        features['EOS'] = True

    return features


def tpl2feature(sent):
    return [word2features(sent, i) for i in range(len(sent))]


def tpl2labels(sent):
    return [label for token, postag, label in sent]


X = [tpl2feature(s) for s in sentences]
y = [tpl2labels(s) for s in sentences]
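# For a quick sanity check (illustrative only; the exact values depend on train.csv):
# X[0][0] is the feature dict of the first token of the first sentence,
# e.g. {'bias': 1.0, 'word.lower()': ..., 'postag': ..., 'BOS': True, ...},
# and y[0] is the matching list of tags for that sentence.
print(X[0][0])
print(y[0])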

# Build, train, predict and report
crf = CRF(algorithm='lbfgs', c1=.1, c2=.1, max_iterations=99).fit(X, y)
y_pred = crf.predict(X)
report = flat_classification_report(y, y_pred, labels=labels)
print(report)
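After fitting, the sklearn_crfsuite model exposes its learned weights through transition_features_ and state_features_, which is a handy way to sanity-check what it picked up; a minimal sketch for inspecting the strongest ones:

from collections import Counter

# Highest-weighted tag-to-tag transitions and (feature, tag) pairs
print(Counter(crf.transition_features_).most_common(5))
print(Counter(crf.state_features_).most_common(5))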

BiLSTM-CRF

"""Train CRF and BiLSTM-CRF on CONLL2000 chunking data,
https://arxiv.org/pdf/1508.01991v1.pdf"""
from numpy import asarray
from collections import Counter

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM
from keras_contrib.layers import CRF
from keras_contrib.losses import crf_loss
from keras_contrib.metrics import crf_viterbi_accuracy
from keras_contrib.datasets import conll2000

EPOCHS = 10
EMBED_DIM = 200
BiRNN_UNITS = 200


def classification_report(y_true, y_pred, labels):
    """Similar to the one in sklearn.metrics,
    reports per-class recall, precision and F1 score"""
    y_true = asarray(y_true).ravel()
    y_pred = asarray(y_pred).ravel()
    corrects = Counter(yt for yt, yp in zip(y_true, y_pred) if yt == yp)
    y_true_counts = Counter(y_true)
    y_pred_counts = Counter(y_pred)
    report = ((lab,  # label
               corrects[i] / max(1, y_true_counts[i]),  # recall
               corrects[i] / max(1, y_pred_counts[i]),  # precision
               y_true_counts[i]  # support
               ) for i, lab in enumerate(labels))
    report = [(l, r, p, 2 * r * p / max(1e-9, r + p), s) for l, r, p, s in report]

    print('{:<15}{:>10}{:>10}{:>10}{:>10}\n'.format('', 'recall', 'precision', 'f1-score', 'support'))
    formatter = '{:<15}{:>10.2f}{:>10.2f}{:>10.2f}{:>10d}'.format
    for r in report:
        print(formatter(*r))
    print('')
    report2 = list(zip(*[(r * s, p * s, f1 * s) for l, r, p, f1, s in report]))
    N = len(y_true)
    print(formatter('avg / total', sum(report2[0]) / N, sum(report2[1]) / N, sum(report2[2]) / N, N) + '\n')


# conll2000 has two different targets; here we only use
# IOB-like chunking as an example
train, test, voc = conll2000.load_data()
(train_x, _, train_y) = train
(test_x, _, test_y) = test
(vocab, _, class_labels) = voc

# --------------
# 1. Regular CRF
# --------------

print('==== training CRF ====')

model = Sequential()
model.add(Embedding(len(vocab), EMBED_DIM, mask_zero=True))  # Random embedding
crf = CRF(len(class_labels), sparse_target=True)
model.add(crf)
model.summary()

# The default `crf_loss` for `learn_mode='join'` is negative log likelihood.
model.compile('adam', loss=crf_loss, metrics=[crf_viterbi_accuracy])
model.fit(train_x, train_y, epochs=EPOCHS, validation_data=[test_x, test_y])

test_y_pred = model.predict(test_x).argmax(-1)[test_x > 0]
test_y_true = test_y[test_x > 0]

print('\n---- Result of CRF ----\n')
classification_report(test_y_true, test_y_pred, class_labels)

# -------------
# 2. BiLSTM-CRF
# -------------

print('==== training BiLSTM-CRF ====')

model = Sequential()
model.add(Embedding(len(vocab), EMBED_DIM, mask_zero=True))  # Random embedding
model.add(Bidirectional(LSTM(BiRNN_UNITS // 2, return_sequences=True)))
crf = CRF(len(class_labels), sparse_target=True)
model.add(crf)
model.summary()

model.compile('adam', loss=crf_loss, metrics=[crf_viterbi_accuracy])
model.fit(train_x, train_y, epochs=EPOCHS, validation_data=[test_x, test_y])

test_y_pred = model.predict(test_x).argmax(-1)[test_x > 0]
test_y_true = test_y[test_x > 0]

print('\n---- Result of BiLSTM-CRF ----\n')
classification_report(test_y_true, test_y_pred, class_labels)
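The keras-contrib CRF predicts integer class indices; for readable output they can be mapped back through class_labels, for example:

# Show the first ten predicted tags as strings instead of indices
print([class_labels[int(i)] for i in test_y_pred[:10]])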

Appendix


English-Chinese glossary

Named Entity Recognition: 命名实体识别
Conditional Random Field: 条件随机场
entity: 实体; 存在; 本质
majority: n. 多数
vote: vi. 选举; vt. 提议
voting: adj. 投票的; n. 投票; 选举

GitHub
https://github.com/AryeYellow/PyProjects/blob/master/NLP/NER_CRF/Python命名实体识别.ipynb
