关系抽取RE任务中的metrics:Precision、Recall、F1-score

说在前面

        最近在研究关系抽取任务,接下来会对抽取结果进行数据分析,需要使用到这3个评估指标,在此记录他们的计算方法,有没有代码等问题。

        精确度 (Precision, P)、召回率 (Recall, R) 和 F1-score,这三个指标是用来衡量模型在关系抽取任务中的性能的。

        计算方法:使用工具包 sklearn.metrics 。

        关系抽取任务:LLM关系识别分类任务关系多分类问题),输入 entity pair、relations。输出 best relation。

理解P/R/F1的公式

precision:预测为关系 instance of 的样本中,实际为关系 instance of 的的概率,看该类别混淆矩阵的列数据。分母是预测数据中为关系 instance of 的样本数据。

recall:真实关系为 instance of 的样本中,实际为关系 instance of 的概率,看该类别混淆矩阵的行数据。分母是 instance of 类别的所有样本数据。

理解TP/FP/FN定义

数据示例:1000个样本数据格式:{text, head, relation, tail}。

任务:关系多分类问题,

  • 输入:entity pair、relations。
  • 输出:让模型从relations里面预测概率最大的一个relation。(LLM认为entity pair间不存在关系如何处理呢?)

TP/FP/FN定义:

  1. TP 的定义:预测关系为A,真实关系也为A。(真实关系为A,预测关系也为A)
  2. FP 的定义:预测关系为A,真实关系为其他关系。
  3. FN 的定义:真实关系为A,预测关系为其他关系。

  • TP 是指模型预测为某一类,且真实标签也是该类(即正确预测)。
  • FP 是指模型预测为某一类,但真实标签不是该类(即错误预测)。
  • FN 是指真实标签为某一类,但模型没有预测为该类(即漏预测)。

理解混淆矩阵:

理解macro/micro/weighted

  • Micro(微平均):

    • 定义: 在这种方法中,首先对所有类别的 True Positives (TP)False Positives (FP)False Negatives (FN) 进行汇总,然后使用这些总和来计算 precisionrecallF1 分数
    • 适用场景: 适用于类别分布比较均衡,或者你更关心整体的性能,而不是各个类别的单独性能。它对类别不平衡较为鲁棒,因为每个样本的贡献是均等的。
  • Macro(宏平均):

    • 定义: 在这种方法中,先计算每个类别的 precisionrecallF1 分数,然后对这些单独类别的分数取平均对每一类都赋予了相同的权重。
    • 适用场景: 适用于各类别的性能比较重要的情况,尤其是在类别不平衡时。它不会受到大类别的主导影响,能反映小类别的性能。但如果某些类别的样本很少,可能导致 F1 分数不稳定。
  • Weighted(加权平均):

    • 定义: 加权平均方法会根据每个类别在数据集中样本的比例,给每个类别的 precisionrecallF1 分数 赋予不同的权重。加权平均是根据类别出现的频率来调整每个类别的贡献。根据每一类的比例分别赋予不同的权重。
    • 适用场景: 适用于类别不平衡的任务,能够综合考虑每个类别的重要性,并对类别频率较高的类别给予更大的权重。

 sklearn.metrics计算分数代码demo

from sklearn.metrics import precision_score, recall_score, f1_score

# 示例数据
y_true = [0, 1, 2, 1, 0]  # 真实关系标签
y_pred = [0, 1, 4, 1, 0]  # 预测关系标签

# 计算精确率、召回率和F1分数
precision = precision_score(y_true, y_pred, average='macro')  # 或者使用 'micro', 'weighted'
recall = recall_score(y_true, y_pred, average='macro')        # 或者使用 'micro', 'weighted'
f1 = f1_score(y_true, y_pred, average='macro')                # 或者使用 'micro', 'weighted'

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

20241211代码最新探究

import json
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score

# 确保类别对齐
labels = ['instance of', 'has part', 'taxon rank', 'parent taxon', 'part of', 'subclass of']

y_true = []  # 真实关系标签
y_pred = []  # 预测关系标签

path = './output/results_final.json'
with open(path, 'r', encoding='utf-8') as file:
    data = json.load(file)

for item in data:
    gold_relation = item['triple']['relation']
    pred_relation = item['output']['relation']
    y_true.append(gold_relation)
    y_pred.append(pred_relation)
print('y_true:',len(y_true) , y_true)
print('y_pred:',len(y_pred) , y_pred)


measure_result = classification_report(y_true, y_pred, labels=labels, zero_division=0)
print('measure_result = \n', measure_result)

print("----------------------------- precision(精确率)-----------------------------")
precision_score_average_None = precision_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
precision_score_average_micro = precision_score(y_true, y_pred, labels=labels, average='micro', zero_division=0)
precision_score_average_macro = precision_score(y_true, y_pred, labels=labels, average='macro', zero_division=0)
precision_score_average_weighted = precision_score(y_true, y_pred, labels=labels, average='weighted', zero_division=0)
print('precision_score_average_None = ', precision_score_average_None)
print('precision_score_average_micro = ', precision_score_average_micro)
print('precision_score_average_macro = ', precision_score_average_macro)
print('precision_score_average_weighted = ', precision_score_average_weighted)

print("\n\n----------------------------- recall(召回率)-----------------------------")
recall_score_average_None = recall_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
recall_score_average_micro = recall_score(y_true, y_pred, labels=labels, average='micro', zero_division=0)
recall_score_average_macro = recall_score(y_true, y_pred, labels=labels, average='macro', zero_division=0)
recall_score_average_weighted = recall_score(y_true, y_pred, labels=labels, average='weighted', zero_division=0)
print('recall_score_average_None = ', recall_score_average_None)
print('recall_score_average_micro = ', recall_score_average_micro)
print('recall_score_average_macro = ', recall_score_average_macro)
print('recall_score_average_weighted = ', recall_score_average_weighted)

print("\n\n----------------------------- F1-value-----------------------------")
f1_score_average_None = f1_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
f1_score_average_micro = f1_score(y_true, y_pred, labels=labels, average='micro', zero_division=0)
f1_score_average_macro = f1_score(y_true, y_pred, labels=labels, average='macro', zero_division=0)
f1_score_average_weighted = f1_score(y_true, y_pred, labels=labels, average='weighted', zero_division=0)
print('f1_score_average_None = ', f1_score_average_None)
print('f1_score_average_micro = ', f1_score_average_micro)
print('f1_score_average_macro = ', f1_score_average_macro)
print('f1_score_average_weighted = ', f1_score_average_weighted)

20241216代码最新探究

import json
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, precision_recall_fscore_support
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score

# 确保类别对齐
labels = ['instance of', 'has part', 'taxon rank', 'parent taxon', 'part of', 'subclass of']

y_true = []  # 真实关系标签
y_pred = []  # 预测关系标签

path = './output/results_final.json'
with open(path, 'r', encoding='utf-8') as file:
    data = json.load(file)
for item in data:
    gold_relation = item['triple']['relation']
    pred_relation = item['output']['relation']
    y_true.append(gold_relation)
    y_pred.append(pred_relation)
print('y_true:',len(y_true) , y_true)
print('y_pred:',len(y_pred) , y_pred)

cm = confusion_matrix(y_true, y_pred, labels=labels)  # 显式指定标签顺序

# 创建混淆矩阵的可视化对象
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
# 生成混淆矩阵的图像, 可设置图形颜色
disp.plot(cmap=plt.cm.Blues)

# 旋转x轴标签,提高可读性
plt.xticks(rotation=45) # x轴旋转45°
plt.yticks(rotation=0)  # y轴水平
plt.tight_layout()  # 自动调整布局
plt.savefig('confusion_matrix.png')

# 通过 precision_recall_fscore_support 提取 TP/FP/FN
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred, labels=labels)

# 初始化字典存储 TP、FP、FN
results = {label: {'TP': 0, 'FP': 0, 'FN': 0} for label in labels}

# 通过 precision, recall 和 support 反推出 TP、FP 和 FN
for i, label in enumerate(labels):
    TP = int(support[i] * recall[i])  # recall = TP / (TP + FN)
    FN = support[i] - TP             # FN = 支持数 - TP
    FP = int(TP / precision[i]) - TP if precision[i] > 0 else 0  # precision = TP / (TP + FP)

    results[label]['TP'] = TP
    results[label]['FP'] = FP
    results[label]['FN'] = FN

# 输出结果
for label in labels:
    print(f"类别: {label}")
    print(f"  TP: {results[label]['TP']}")
    print(f"  FP: {results[label]['FP']}")
    print(f"  FN: {results[label]['FN']}")


measure_result = classification_report(y_true, y_pred, labels=labels, zero_division=0)
print('measure_result = \n', measure_result)

print("----------------------------- precision(精确率)-----------------------------")
precision_score_average_None = precision_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
precision_score_average_micro = precision_score(y_true, y_pred, labels=labels, average='micro', zero_division=0)
precision_score_average_macro = precision_score(y_true, y_pred, labels=labels, average='macro', zero_division=0)
precision_score_average_weighted = precision_score(y_true, y_pred, labels=labels, average='weighted', zero_division=0)
print('precision_score_average_None = ', precision_score_average_None)
print('precision_score_average_micro = ', precision_score_average_micro)
print('precision_score_average_macro = ', precision_score_average_macro)
print('precision_score_average_weighted = ', precision_score_average_weighted)

print("\n\n----------------------------- recall(召回率)-----------------------------")
recall_score_average_None = recall_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
recall_score_average_micro = recall_score(y_true, y_pred, labels=labels, average='micro', zero_division=0)
recall_score_average_macro = recall_score(y_true, y_pred, labels=labels, average='macro', zero_division=0)
recall_score_average_weighted = recall_score(y_true, y_pred, labels=labels, average='weighted', zero_division=0)
print('recall_score_average_None = ', recall_score_average_None)
print('recall_score_average_micro = ', recall_score_average_micro)
print('recall_score_average_macro = ', recall_score_average_macro)
print('recall_score_average_weighted = ', recall_score_average_weighted)

print("\n\n----------------------------- F1-value-----------------------------")
f1_score_average_None = f1_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
f1_score_average_micro = f1_score(y_true, y_pred, labels=labels, average='micro', zero_division=0)
f1_score_average_macro = f1_score(y_true, y_pred, labels=labels, average='macro', zero_division=0)
f1_score_average_weighted = f1_score(y_true, y_pred, labels=labels, average='weighted', zero_division=0)
print('f1_score_average_None = ', f1_score_average_None)
print('f1_score_average_micro = ', f1_score_average_micro)
print('f1_score_average_macro = ', f1_score_average_macro)
print('f1_score_average_weighted = ', f1_score_average_weighted)

20241216-v2代码最新探究

  • TP:预测关系为A,真实关系也为A
  • FP:预测为关系A,但真实关系为其他关系。
  • FN:真实关系为A,预测关系为其他关系(包含NULL)。

1计算Micro 微平均:

  • Precision = 关系识别正确数/总样本-NULL样本
  • Recall = 关系识别正确数/总样本。

2计算 Macro 宏平均:

计算每个类别的precision、recall、f1-score,汇总取平均值。

import json
import matplotlib
import numpy as np

matplotlib.use('Agg')
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, precision_recall_fscore_support
from sklearn.metrics import classification_report, precision_score, recall_score, f1_score

# 确保类别对齐
labels = ['instance of', 'has part', 'taxon rank', 'parent taxon', 'part of', 'subclass of', 'NULL']

y_true = []  # 真实关系标签
y_pred = []  # 预测关系标签
NULL_SAMPLES = 0    # 预测关系中不存在关系的样本数
RELATIONS = 6 # 关系类型数量

path = './output/results_final.json'
with open(path, 'r', encoding='utf-8') as file:
    data = json.load(file)
for item in data:
    gold_relation = item['triple']['relation']
    pred_relation = item['output']['relation']
    if pred_relation == 'NULL':
        NULL_SAMPLES += 1
    y_true.append(gold_relation)
    y_pred.append(pred_relation)

cm = confusion_matrix(y_true, y_pred, labels=labels)  # 显式指定标签顺序
# 创建混淆矩阵的可视化对象
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
# 生成混淆矩阵的图像, 可设置图形颜色
disp.plot(cmap=plt.cm.Blues)
# 旋转x轴标签,提高可读性
plt.xticks(rotation=45) # x轴旋转45°
plt.yticks(rotation=0)  # y轴水平
plt.tight_layout()  # 自动调整布局
plt.savefig('confusion_matrix.png')


# 获取每个类别的precision/recall/f1/support
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred, labels=labels, zero_division=0)
# 初始化字典存储 TP、FP、FN
results = {label: {'TP': 0, 'FP': 0, 'FN': 0} for label in labels}
# 通过 precision, recall 和 support 反推出 TP、FP 和 FN
for i, label in enumerate(labels):
    TP = int(support[i] * recall[i])  # recall = TP / (TP + FN)
    FN = support[i] - TP             # FN = 支持数 - TP
    FP = int(TP / precision[i]) - TP if precision[i] > 0 else 0  # precision = TP / (TP + FP)
    results[label]['TP'] = TP
    results[label]['FP'] = FP
    results[label]['FN'] = FN
    results[label]['support'] = support[i]

# 输出结果
ALL_TP = 0
ALL_SAMPLES = 0
for label in labels:
    ALL_TP += results[label]['TP']
    ALL_SAMPLES += results[label]['support']

micro_P = ALL_TP/(ALL_SAMPLES-NULL_SAMPLES) if ALL_SAMPLES-NULL_SAMPLES > 0 else 0
micro_R = ALL_TP/ALL_SAMPLES if ALL_SAMPLES > 0 else 0
micro_F1 = (2 * micro_P * micro_R) / (micro_P + micro_R) if (micro_P + micro_R) > 0 else 0

# 将结果保存到文本文件
with open('results.txt', 'w') as f:
    f.write("----------------------------- 方法1-手动计算 -----------------------------\n")
    f.write("----------------------------- micro(微平均)-----------------------------\n")
    f.write(f"micro_P: {micro_P * 100:.2f}%\n")
    f.write(f"micro_R: {micro_R * 100:.2f}%\n")
    f.write(f"micro_F1: {micro_F1 * 100:.2f}%\n\n")

    f.write(f"micro_P: {micro_P :.2f}\n")
    f.write(f"micro_R: {micro_R :.2f}\n")
    f.write(f"micro_F1: {micro_F1 :.2f}\n\n")

    precision_score_average_Single = precision_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
    recall_score_average_Single = recall_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
    f1_score_average_Single = f1_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
    macro_P = np.sum(precision_score_average_Single) / RELATIONS
    macro_R = np.sum(recall_score_average_Single) / RELATIONS
    macro_F1 = np.sum(f1_score_average_Single) / RELATIONS

    f.write("----------------------------- macro(宏平均)-----------------------------\n")
    f.write(f"macro_P: {macro_P * 100:.2f}%\n")
    f.write(f"macro_R: {macro_R * 100:.2f}%\n")
    f.write(f"macro_F1: {macro_F1 * 100:.2f}%\n\n")

    f.write(f"macro_P: {macro_P :.2f}\n")
    f.write(f"macro_R: {macro_R :.2f}\n")
    f.write(f"macro_F1: {macro_F1 :.2f}\n\n")

    f.write("----------------------------- 方法2-自动计算(classification_report)-----------------------------\n")
    # 方法2
    measure_result = classification_report(y_true, y_pred, labels=[label for label in labels if label != 'NULL'], zero_division=0)   # 计算macro需要labels去掉NULL
    # print('measure_result = \n', measure_result)
    f.write(f"measure_result: {measure_result}\n")

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值