How Google translates without understanding



(http://www.theregister.co.uk/2007/05/15/google_translation/)

Most of the right words, in mostly the right order

Column After just a couple of years of practice, Google can claim to produce the best computer-generated language translations in the world - in languages their boffin creators don't even understand.

Last summer, Google took top honors at a bake-off between machine-translation engines sponsored by the American agency NIST, besting IBM in Arabic-to-English and Chinese-to-English. The crazy part is that no one on the Google team even understands those languages... the automatic-translation engines they constructed triumphed by sheer brute-force statistical extrapolation rather than "understanding".

I spoke with Franz Och, Google's enthusiastic machine-translation guru, about this unusual new approach.

Sixty years of failure

Ever since the Second World War there have been two competing approaches to automatic translation: expert rules vs. statistical deciphering.

Expert-rule buffs have tried to automate the grammar-school approach of diagramming sentences (using modifiers, phrases, and clauses): for example, "I visited (the house next to (the park) )." But like other optimistic software efforts, the exact rules foundered on the ambiguities of real human languages. (Think not? Try explaining this sentence: "Time flies like an arrow, but fruit flies like a banana.")

The competing statistical approach began with cryptography: treat the second language as an unknown code, and use statistical cues to find a mathematical formula to decode it, as the Allies did with Hitler's famous Enigma code. While those early "deciphering" efforts foundered on a lack of computing power, they have been resurrected in the "Statistical Machine Translation" approach used by Google, which eschews strict rules in favor of noticing the statistical correlations between "white house" and "casa blanca." Statistics deals with ambiguity better than rules do, it turns out.

Under Google's hood

The Google approach is a lesson in practical software development: try things and see what sticks. It has just a few major steps:

  1. Google starts with lots and lots of paired-example texts, like formal documents from the United Nations, in which identical content is expertly translated into many different languages. With these documents they can discover that "white house" tends to co-occur with "casa blanca," so that the next time they have to translate a text containing "white house" they will tend to use "casa blanca" in the output.
  2. They have even more untranslated text in each language, which lets them make models of "well-formed" sentence fragments (for example, preferring "white house" to "house white"). So the raw output from the first translation step can be further massaged into (statistically) nicer-sounding text (a toy sketch of these first two steps follows this list).
  3. Their key to improving the system - and to winning competitions - is an automated performance metric, which assigns a translation-quality number to each translation attempt. More on this fatally weak link below.
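
A toy sketch of the first two steps, in Python with invented data (not Google's actual system): a phrase table built from co-occurrence counts in paired texts proposes candidate translations, and a crude target-language bigram model, standing in for the "well-formed fragments" model, nudges the choice toward better-ordered output.

    from collections import Counter

    # Step 1 (toy): a "phrase table" of co-occurrence counts harvested from
    # paired example texts. All counts here are invented for illustration.
    phrase_table = {
        "white house": Counter({"casa blanca": 120, "blanca casa": 4}),
        "the": Counter({"la": 900, "el": 850}),
    }

    # Step 2 (toy): target-language bigram counts built from untranslated
    # text, used to prefer well-ordered output ("casa blanca" over "blanca casa").
    bigram_counts = Counter({("casa", "blanca"): 75, ("blanca", "casa"): 2})

    def translate_phrase(src):
        """Pick the candidate that co-occurred most often with `src`,
        weighted by how well-formed the candidate itself looks."""
        candidates = phrase_table.get(src)
        if not candidates:
            return src  # pass unknown phrases through untranslated
        def score(cand):
            words = cand.split()
            lm = 1.0
            for pair in zip(words, words[1:]):
                lm *= 1 + bigram_counts.get(pair, 0)
            return candidates[cand] * lm
        return max(candidates, key=score)

    print(translate_phrase("white house"))  # -> "casa blanca"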

This game needs loads of computational horsepower for learning and testing, and a software architecture which lets Google tweak code and parameters to improve upon its previous score. So given these ingredients, Google's machine-translation strategy should be familiar to any software engineer: load the statistics, translate the examples, evaluate the translations, twiddle the system parameters, and repeat.
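
Schematically - and only as a sketch of the engineering loop, not anything Google has published - that cycle might look like a simple hill-climb over system parameters, driven entirely by the automatic score. Here `translate` and `score` are placeholders standing in for the translation engine and the metric:

    import random

    def tune(translate, score, params, rounds=100, step=0.05):
        """Toy hill-climbing loop: twiddle the parameters, keep any change
        that raises the automatic metric on a development set, and repeat.
        `translate(params)` is assumed to return candidate translations and
        `score(...)` their metric value; both are placeholders here."""
        best = dict(params)
        best_score = score(translate(best))
        for _ in range(rounds):
            trial = {k: v + random.uniform(-step, step) for k, v in best.items()}
            trial_score = score(translate(trial))
            if trial_score > best_score:   # keep what sticks
                best, best_score = trial, trial_score
        return best, best_score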

What is clearly missing from this approach is any form of "understanding". The machine has no idea that "walk" is an action using "feet," except when its statistics tell it the text strings "walk" and "feet" sometimes show up together. Nor does it know the subtle differences between "to boycott" and "not to attend." Och emphasized that the system does not even represent nouns, verbs, modifiers, or any of the grammatical building blocks we think of as language. In fact, he says, "linguists think our structures are weird" - but he demurred on actually describing them. His machine contains only statistical correlations and relationships, no more or less than "what is in the data." Each word and phrase in the source votes for various phrases in the output, and the final result is a kind of tallying of those myriad votes.
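
As a cartoon of that voting idea (again with invented weights, not the system's real representation): each source phrase casts weighted votes for target phrases, and the output favors whatever tallies highest.

    from collections import Counter

    def tally_votes(source_phrases, vote_table):
        """Sum each source phrase's weighted votes for target phrases and
        rank the results; `vote_table` is a made-up stand-in for the
        learned correlations."""
        tally = Counter()
        for phrase in source_phrases:
            tally.update(vote_table.get(phrase, {}))
        return tally.most_common()

    votes = {
        "white house": {"casa blanca": 0.9, "casa": 0.2},
        "the": {"la": 0.6, "el": 0.5},
    }
    print(tally_votes(["the", "white house"], votes))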

 

Winning at chess, losing at language

This approach is much like computerized chess: make a statistical model of the domain and optimize the hell out of it, ultimately winning by sheer computational horsepower. Like chess (but unlike vision), language is a source of pride, something both complex and uniquely human. For chess, computational optimization worked brilliantly; the best chess-playing computers, like Deep Blue, are better than the best human players. But score-based optimization won't work for language in its current form, even though it does do two really important things right.

The first good thing about statistical machine translation is the statistics. Human brains are statistical-inference engines, and our senses routinely make up for noisy data by interpolating and extrapolating whatever pixels or phonemes we can rely on. Statistical analysis makes better sense of more data than strict rules do, and statistical rules produce more robust outputs. So any ultimate human-quality translation engine must use statistics at its core.

The other good thing is the optimization. As I've argued earlier, the key to understanding and duplicating brain-like behavior lies in optimization, the evolutionary ratchet which lets an accumulation of small, even accidental adjustments slowly converge on a good result. Optimization doesn't need an Einstein, just the right quality metric and an army of engineers.

So Och's team (and their competitors) have the overall structure right: they converted text translation into an engineering problem, and have a software architecture allowing iterative improvement. So they can improve their Black Box - but what's inside it? Och hinted at various trendy algorithms (Discriminative Learning and Expectation Maximization, I'll bet Bayesian Inference too), although our ever-vigilant chaperon from Google Communications wouldn't let him speak in detail. But so what? The optimization architecture lets you swap out this month's algorithm for a better one, so algorithms will change as performance improves.

Or maybe not. The Achilles' heel of optimization is that everything depends on the performance metric, which in this case clearly misses a lot. That's not a problem for winning contests - the NIST competition used the same "BLEU" (Bilingual Evaluation Understudy) metric as Google practiced on, so Google's dramatic win mostly proved that Google gamed the scoring system better than IBM did. But the worse the metric, the less likely the translations will make sense.

The gist of the problem is that because machines don't yet understand language - that's the original problem, right? - they can't be too good at automatically evaluating language translations either. So researchers have to bootstrap the metric, taking a scheme like BLEU (which merely compares the similarity of two same-language documents) and verifying that, on average, humans prefer reading outputs with high scores. (They compare candidate translations against gold-standard human translations.)

The BLEUs

But all BLEU really measures is word-by-word similarity: are the same words present in both documents, somewhere? The same word-pairs, triplets, quadruplets? In obviously extreme cases, BLEU works well; it gives a low score if the documents are completely different, and a perfect score if they're identical. But in between, it can produce some very screwy results.

The most obvious problem is that paraphrases and synonyms score zero; to get any credit with BLEU, you need to produce exactly the same words as the reference translation: "wander" doesn't get partial credit for "stroll," nor "sofa" for "couch."
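
To make the word-counting concrete, here is a minimal sketch of the clipped n-gram precision at the heart of BLEU (the real metric combines precisions up to 4-grams with a brevity penalty; this toy keeps only the counting idea). The second call shows the synonym problem: "wander" earns nothing for the reference's "stroll".

    from collections import Counter

    def ngram_precision(candidate, reference, n=1):
        """Clipped n-gram precision: what fraction of the candidate's n-grams
        also appear in the reference, counting each at most as often as the
        reference contains it?"""
        def ngrams(text):
            words = text.split()
            return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        cand, ref = ngrams(candidate), ngrams(reference)
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        return overlap / max(sum(cand.values()), 1)

    print(ngram_precision("we went for a stroll", "we went for a stroll"))  # 1.0
    print(ngram_precision("we went for a wander", "we went for a stroll"))  # 0.8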

The complementary problem is that BLEU can give a high similarity score to nonsensical language which contains the right phrases in the wrong order. Consider first this typical, sensible output from a NIST contest:

"Appeared calm when he was taken to the American plane, which will to Miami, Florida"

Now here is a possible garbled output which would get the very same score:

"was being led to the calm as he was would take carry him seemed quite when taken"

The core problem is that word-counting scores like BLEU - the linchpin of these machine-translation competitions - don't even recognize well-formed language, much less real translated meaning. (A stinging academic critique of BLEU can be found here.)
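
The order-blindness is easy to demonstrate with the same kind of counting. At the single-word level any shuffle of the candidate earns exactly the same credit; the higher-order n-grams in real BLEU only soften this, since garbled output that keeps the right phrases intact still collects most of them. A self-contained toy check (the sentences are simplified stand-ins for the contest examples above):

    from collections import Counter

    def unigram_precision(candidate, reference):
        """Fraction of candidate words that appear in the reference (clipped),
        ignoring word order entirely."""
        cand, ref = Counter(candidate.split()), Counter(reference.split())
        return sum(min(c, ref[w]) for w, c in cand.items()) / sum(cand.values())

    ref      = "he appeared calm when he was taken to the american plane"
    shuffled = "taken to calm the he was when appeared american plane he"
    print(unigram_precision(ref, ref))       # 1.0
    print(unigram_precision(shuffled, ref))  # 1.0 - same words, any order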

A classic example of how the word-by-word translation approach fails comes from German, a language which is too "tough" for Och's team to translate yet (although Och himself is a native speaker). German's problem is its relative-to-English-tangled Wordorder; take this example from Mark Twain's essay "The Awful German Language":

"But when he, upon the street, the (in-satin-and-silk-covered-now-very-unconstrained-after-the-newest-fashioned-dressed) government counselor's wife met, etc"

Until computers deal with the actual language structure (the hyphens and parentheses above), they will have no hope of translating even as well as Mark Twain did here.

So why are computers so much worse at language than at chess? Chess has properties that computers like: a well-defined state and well-defined rules for play. Computers do win at chess, as they do at calculation, because they are so exact and fussy about rules. Language, on the other hand, needs approximation and inference to extract "meaning" (whatever that is) from text, context, subject matter, tone, expectations, and so on - and the computer needs yet more approximation to produce a translated version of that meaning with all the right interlocking features. Unlike chess, the game of language is played on the human home turf of multivariate inference and approximation, so we will continue to beat the machines.

But for Google's purposes, perfect translation may not even be necessary. Google succeeded in web-search partly by avoiding the exact search language of AltaVista in favor of a tool which was fast, easy to use, and displayed most of the right results in mostly the right order. Perhaps it will also be enough for Google to machine-translate most of the right words in mostly the right order, leaving to users the much harder task of extracting meaning from them.

Reposted from: https://www.cnblogs.com/datasci/articles/1347655.html
