[Paper Summary: Modesty is the Formula for Success] Good applications for crummy machine translation


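This post discusses how machine translation (MT) has been applied and evaluated since the ALPAC report. The authors stress that finding niche applications for MT matters more than pursuing generality. The paper argues that MT succeeds by identifying specific high-payoff applications, such as the workstation approach, which introduces automated tools gradually to help professional translators work more efficiently. MT can also serve end users who do not demand high quality. Traditional evaluation metrics often overlook a system's particular strengths, so the paper proposes that MT evaluation should focus on intended use. With reasonable expectations and a sound economic case, MT can find its value in certain domains.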


Good applications for crummy machine translation

— Kenneth W. Church & Eduard H. Hovy, 1993

There is a risk that eval can devolve into mindless metrics.

The success of the eval often depends very strongly on the selection of an appropriate application.

It is wise to identify the niche application first (one where the strengths of machines are valued). We will then be in a much better position to address evaluation questions and to steer MT efforts towards high-payoff niches of functionality.

The authors agree with ALPAC that this basic research could not be justified in terms of short-term return on investment. When compared with human capabilities, MT systems of the time were not deemed a success, and might never be.

Has anything changed since ALPAC?

→ Increasing commercial value

The application venue of MT has shifted from government to industry, and so have the funding providers. One must choose an application that exploits the strengths of the machine and does not compete with the strengths of humans. This point is well put in the following:

"The question now is not whether MT is feasible, but in what domains it is most likely to be effective... The object of an evaluation is, to determine whether a system permits an adequate response to given needs and constraints." (Lehrberger and Bourbeau, 1988)

The blame is to be laid on the desire for generality

In spite of all the literature on MT, the general evaluation measures often fail to pinpoint the strengths of systems. They seem to confound important and less important aspects. Unfortunately, this failure seems to be characteristic of many task-independent evaluation metrics. The authors propose that MT eval metrics should be sensitive to the intended use of the system. It therefore becomes crucial to the success of an MT effort to identify a high-payoff niche application, so that the MT system will stand up well to the eval even though it might produce crummy translations.

By and large, this smacks a little of being grants-oriented, but the world does need this kind of effort.