[Machine Translation Metrics] Applying BLEU, ROUGE, and METEOR to Chinese and English

English

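For English, all three metrics can be called through Hugging Face's evaluate package (the evaluate-metric implementations).
For example, bleu: https://huggingface.co/spaces/evaluate-metric/bleu
The default tokenizer for English is Tokenizer13a; its source is at: https://github.com/huggingface/evaluate/blob/main/metrics/bleu/tokenizer_13a.py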

Chinese

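Chinese text has no spaces between words, so it must be segmented before scoring, which makes things more troublesome. Of the three metrics, bleu and rouge accept a custom tokenizer. jieba can be used for Chinese word segmentation.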
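To see why segmentation matters, here is a minimal sketch that computes clipped unigram precision (the 1-gram building block of BLEU) with two different tokenizers. The helper unigram_precision and the example sentences are made up for illustration; str.split stands in for a whitespace-oriented English tokenizer, and list (character-level splitting) stands in for a real segmenter such as jieba.

```python
from collections import Counter

def unigram_precision(prediction, reference, tokenize):
    """Clipped unigram precision of prediction against a single reference."""
    pred_tokens = tokenize(prediction)
    if not pred_tokens:
        return 0.0
    pred_counts = Counter(pred_tokens)
    ref_counts = Counter(tokenize(reference))
    # Each predicted token is credited at most as often as it occurs in the reference.
    overlap = sum(min(count, ref_counts[tok]) for tok, count in pred_counts.items())
    return overlap / len(pred_tokens)

prediction = "我喜欢吃苹果"
reference = "我喜欢吃香蕉"

# Whitespace splitting sees each sentence as one giant token, so nothing matches.
whitespace_score = unigram_precision(prediction, reference, str.split)
# Character-level splitting exposes the four shared characters out of six.
char_score = unigram_precision(prediction, reference, list)

print(whitespace_score, char_score)
```

Without segmentation the two sentences share no tokens at all, even though four of the six characters are identical.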

bleu, rouge: pass in a custom tokenizer

For example:

import evaluate
import jieba

fun = evaluate.load('bleu')
result = fun.compute(predictions=predictions, references=reference, tokenizer=jieba.lcut)

The tokenizer can also be a custom class:

class BaseTokenizer:
    """A base dummy tokenizer to derive from."""

    def signature(self):
        """
        Returns a signature for the tokenizer.
        :return: signature string
        """
        return "none"

    def __call__(self, line):
        """
        Tokenizes an input line with the tokenizer.
        :param line: a segment to tokenize
        :return: the tokenized line
        """
        return line

class TokenizerJieba(BaseTokenizer):
    def signature(self):
        """
        Returns a signature for the tokenizer.
        :return: signature string
        """
        return "jieba"

    def __call__(self, line):
        """
        Tokenizes an input line with the tokenizer.
        :param line: a segment to tokenize
        :return: the tokenized line
        """
        line = jieba.lcut(line)
        return line

Then call it like this:

fun = evaluate.load('bleu')
result = fun.compute(predictions=predictions, references=reference, tokenizer=TokenizerJieba())
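If jieba is not available, the same callable interface can wrap a simpler scheme. The sketch below is an assumption on my part rather than anything shipped with evaluate: a hypothetical TokenizerChar class that splits CJK text into single characters while keeping ASCII letter and digit runs together. Character-level scores are not comparable with jieba-based scores, so pick one scheme and use it throughout.

```python
import re

class TokenizerChar:
    """Hypothetical character-level tokenizer with the same interface as TokenizerJieba."""

    # One CJK character, or a run of ASCII letters/digits, or any other non-space character.
    _pattern = re.compile(r'[\u4e00-\u9fff]|[A-Za-z0-9]+|[^\s]')

    def signature(self):
        return "char"

    def __call__(self, line):
        # Whitespace matches none of the alternatives, so it is skipped.
        return self._pattern.findall(line)

tokens = TokenizerChar()("我爱NLP 2024!")
print(tokens)  # ['我', '爱', 'NLP', '2024', '!']
```

An instance can be passed as tokenizer=TokenizerChar() in the same way as TokenizerJieba().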

meteor

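The evaluate-metric implementation of meteor calls the meteor_score functions from nltk.translate directly and does not accept a custom tokenizer.
So here we call meteor_score from nltk.translate ourselves: segment the text manually with jieba, then pass the segmented token lists to meteor_score.
alpha=0.9, beta=3, gamma=0.5 are the default parameters of the evaluate-metric meteor.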

import jieba
# meteor_score needs the WordNet corpus: run nltk.download('wordnet') once beforehand
from nltk.translate import meteor_score
import numpy as np

def my_eval_meteor_single(reference, prediction, alpha=0.9, beta=3, gamma=0.5):
    reference = jieba.lcut(reference)
    prediction = jieba.lcut(prediction)
    result = meteor_score.single_meteor_score(reference, prediction, alpha=alpha, beta=beta, gamma=gamma)
    return result

def my_eval_meteor_lists(references, predictions, alpha=0.9, beta=3, gamma=0.5):
    results = [my_eval_meteor_single(reference, prediction, alpha=alpha, beta=beta, gamma=gamma)
               for reference, prediction in zip(references, predictions)]
    result = np.mean(results)
    return result

def check_meteor():
    content = "早上好,你吃早饭了吗?"
    text = jieba.lcut(content)
    print(text)

    result = meteor_score.single_meteor_score(text, text, alpha=0.9, beta=3, gamma=0.5)
    print(result)

if __name__ == '__main__':
    check_meteor()
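check_meteor scores a sentence against itself, yet the printed value is slightly below 1. The sketch below reproduces the arithmetic behind that, following the METEOR formula as I understand NLTK to apply it. The helper meteor_from_counts and its count-based inputs are illustrative assumptions, not part of NLTK, and the alignment step (exact, stem, and WordNet matching) is left out entirely. The 7-token length is just an example, not the actual jieba output for the sentence above.

```python
def meteor_from_counts(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3, gamma=0.5):
    """Hypothetical helper: METEOR score from already-computed alignment counts."""
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # Weighted harmonic mean; alpha=0.9 weights recall far more than precision.
    fmean = (precision * recall) / (alpha * precision + (1 - alpha) * recall)
    # Fragmentation penalty: fewer, longer contiguous chunks give a smaller penalty.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * fmean

# Identical 7-token sentences: all 7 tokens match in one contiguous chunk.
score = meteor_from_counts(matches=7, hyp_len=7, ref_len=7, chunks=1)
print(score)  # 1 - 0.5 / 7**3, roughly 0.9985
```

Because the penalty is gamma * (1 / n) ** beta for a perfect match of n tokens, it never reaches zero; it only shrinks as the sentence gets longer.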