Some recent changes to choice of translation
April 25, 2025 by Eddy
Since 6.7 there has been a flurry of activity in how Qt chooses, and helps its users choose, suitable localisation and internationalisation. That flurry seems to have settled down now, in 6.9, so it's time to give a summary of what's changed and why. The story actually starts in 6.4 with a fix for QTBUG-102796, to make the ordering of entries in uiLanguages() consistent between the system locale and those based on data from the Unicode Consortium's Common Locale Data Repository (CLDR). But first, let's get …
A little context
When an application has a choice of resources to use, to tune the application to suit the user's needs, one category of choice is known as localisation and internationalisation – or L10n and I18n, since even anglophones don't agree on how to spell them, but several languages have 12-letter words for the former and 20-letter words for the latter, in each case agreeing at the start and end. The idea is to adapt (where relevant) to what languages the user understands, what scripts they can read and what conventions they use for writing various things (for example, amounts of money or the numeric forms of dates). The first two of these are obviously enough known as language and script; the rest is assumed to depend on those and where the user lives, which is identified in terms of a territory – in most cases a country but, the world being how it is, there are complications. The combination of language, script and territory is known as a locale. In Qt, L10n and I18n are principally taken care of by QLocale and QTranslator, although there are some other places that get involved.
QLocale knows how to query the operating system to find what the user's settings say about their L10n preferences. It's also what applications consult to get things like lists of available L10n choices, such as the application might offer in a dialog to let the user set a situation-specific L10n, taking the place of those user settings. Either way, a QLocale instance has the information to help select suitable L10n and I18n for other features of the application. In particular, one really big part of this, which in Qt is treated as I18n, is how text written by programmers gets translated into text to be read by users. That's taken care of by QTranslator, in cooperation with Qt Linguist and related tools.
QTranslator gets to select among the available translations that are installed for an application, to pick one suitable for the user. Its source of truth for that is QLocale::uiLanguages() – which should really be called uiLocales(), since it returns a list of locale identifiers, not of languages. The idea is that it picks, from the available translations, one matching as early an entry in uiLanguages() as it can find.
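The selection idea can be sketched in a few lines of standard C++. This is only a model of the idea, not QTranslator's code: `pickTranslation` is a hypothetical helper, `preferred` stands in for the list that QLocale::uiLanguages() returns, and `available` stands in for whatever translations the application ships.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Return the earliest entry of `preferred` (most wanted first) that names
// an available translation, or nothing if no entry matches.
std::optional<std::string> pickTranslation(
    const std::vector<std::string> &preferred,
    const std::set<std::string> &available)
{
    for (const std::string &id : preferred) {
        if (available.count(id) != 0)
            return id;
    }
    return std::nullopt; // No match: the caller keeps the untranslated text.
}
```

Because the scan stops at the first hit, the order of the list matters as much as its contents, which is why the rest of this story is mostly about ordering.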
All the recent changes have been driven by trying to make that process more robust and reliable in the face of the diverse user configurations that may be out there. In the long term, my hope is that we can implement a QLocaleSelector (see QTBUG-112765) that can handle this more gracefully but, for now, improvements in this area have taken the form of refinements to what uiLanguages() returns and how QTranslator uses it. There are other parts of Qt that use uiLanguages() in similar ways, and we may well review how they do so, now that the dust has settled, but they don't always have the same priorities as translation. For example, text-to-speech requires selection of a locale-appropriate voice, but might not care about the script aspect of a locale.
Locale identifiers
The entries in the list of strings returned by uiLanguages() are identifiers for locales, made up of so-called subtags, joined together by separators (I'll be using dashes, but underscores are also commonly used). Each subtag identifies a language, script or territory; they usually appear in this order. (In general, subtags can represent other things, but Qt only recognises these three.) Thus, for example, en-Latn-US identifies English as it is spoken in the USA and written in the Latin script (by which is meant the common script of most European languages, on whose unaccented forms the US-ASCII character repertoire is based). An identifier can't have two subtags of the same kind, but it can leave out script and/or territory; and the special language und (for undefined) is used as placeholder for language when it is not specified. So en is a generic locale using the English language, and und-AU is a generic locale for Australia.
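To make the shape of these identifiers concrete, here is a small sketch that splits one into its three parts. It is an illustration under simplifying assumptions, not how QLocale parses: the `LocaleId` type and `parseLocaleId` function are invented for this example, and subtags are classified by length alone (the first is the language, a four-character one is the script, a two- or three-character one is the territory).

```cpp
#include <string>

struct LocaleId {
    std::string language = "und"; // "und" stands in for an unspecified language
    std::string script;           // empty when omitted
    std::string territory;        // empty when omitted
};

// Split on '-' or '_' and classify each subtag by position and length.
LocaleId parseLocaleId(const std::string &text)
{
    LocaleId id;
    std::size_t start = 0;
    bool first = true;
    while (start <= text.size()) {
        std::size_t end = text.find_first_of("-_", start);
        if (end == std::string::npos)
            end = text.size();
        const std::string tag = text.substr(start, end - start);
        if (first) {
            if (!tag.empty())
                id.language = tag;
            first = false;
        } else if (tag.size() == 4 && id.script.empty()) {
            id.script = tag;
        } else if ((tag.size() == 2 || tag.size() == 3) && id.territory.empty()) {
            id.territory = tag;
        } // Any other kind of subtag is ignored in this sketch.
        start = end + 1;
    }
    return id;
}
```

With this, `parseLocaleId("en-Latn-US")` separates language, script and territory, while `parseLocaleId("und_AU")` leaves the script empty and keeps the placeholder language.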
Aside from the system locale, QLocale gets all its data about L10n from the Unicode Consortium's Common Locale Data Repository (CLDR). This comes with a set of likely subtag rules that are used to fill in the blanks when a locale is incompletely specified. If two locale identifiers, when filled in according to these rules, give the same full form, QLocale treats them as equivalent. Since they're equivalent under the likely subtag rules, I'll call this likely-equivalence.
Thus, for example, the rule da ⇒ da-Latn-DK says that if all you know about the user's preference is that they speak Danish, you're most likely best off using the Latin script and the ways of using Danish that are usual in Denmark. In most cases, the territory implied by a given language is the one the language is named after, with two notable exceptions: English and Portuguese don't map to England and Portugal. Instead, the rules en ⇒ en-Latn-US and pt ⇒ pt-Latn-BR map them to the USA and Brazil, due to there being more speakers of those languages in their former colonies than in the land of origin. Except that there isn't actually an en ⇒ en-Latn-US rule – because this equivalence is implied by the following.
The final fallback of the likely subtag rules, after 790 others (at CLDR v46.1), is und ⇒ en-Latn-US. This says that if you don't know what else to do, try the USAish form of English written in the Latin script. For the incomplete locale descriptions that don't have a matching likely subtag rule, there are rules for how to pick a likely subtag rule to apply. For example, if all you know is the user is Australian, expressed as und-AU, for which there is no rule, you set aside the only thing you knew, AU, apply the und rule above and then restore the thing you knew, replacing the US territory part of en-Latn-US with AU to get en-Latn-AU. Given und-Latn-AU, the same rule would imply en-Latn-AU. Likewise, plain en gets augmented by taking its remaining subtags from the und rule, which makes it equivalent to en-Latn-US despite the lack of an overt rule saying so, as I mentioned above.
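The fill-in procedure just described can be sketched as follows. This follows the description here rather than the full CLDR algorithm or QLocale's implementation: the `Locale` type, the function names and the rule table are all made up for illustration, and the table holds only the four rules quoted in this article.

```cpp
#include <map>
#include <string>

struct Locale {
    std::string language, script, territory; // empty script/territory = omitted
};

// Build a rule key such as "und", "pa" or "und-AU" from the known parts.
static std::string key(const std::string &l, const std::string &s, const std::string &t)
{
    std::string k = l.empty() ? "und" : l;
    if (!s.empty()) k += "-" + s;
    if (!t.empty()) k += "-" + t;
    return k;
}

// Tiny hand-picked subset of likely subtag rules (illustration only).
static const std::map<std::string, Locale> likelyRules = {
    {"da", {"da", "Latn", "DK"}},
    {"pt", {"pt", "Latn", "BR"}},
    {"pa", {"pa", "Guru", "IN"}},
    {"und", {"en", "Latn", "US"}},
};

// Find the most specific matching rule, then restore whatever was known.
Locale addLikelySubtags(const Locale &in)
{
    const std::string lang = in.language.empty() ? "und" : in.language;
    const std::string candidates[] = {
        key(lang, in.script, in.territory),
        key(lang, "", in.territory),
        key(lang, in.script, ""),
        key(lang, "", ""),
        key("und", in.script, ""),
        "und",
    };
    for (const std::string &c : candidates) {
        const auto it = likelyRules.find(c);
        if (it == likelyRules.end())
            continue;
        Locale out = it->second;
        if (lang != "und") out.language = lang;
        if (!in.script.empty()) out.script = in.script;
        if (!in.territory.empty()) out.territory = in.territory;
        return out;
    }
    return in; // Only reached if the table lacks the final "und" rule.
}

// Two locales are likely-equivalent when they fill out to the same full form.
bool likelyEquivalent(const Locale &a, const Locale &b)
{
    const Locale x = addLikelySubtags(a), y = addLikelySubtags(b);
    return x.language == y.language && x.script == y.script
        && x.territory == y.territory;
}
```

Feeding it und-AU falls through to the und rule and restores AU, giving en-Latn-AU; plain en picks up Latn and US the same way, so it is likely-equivalent to en-US but not to en-AU.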
The rules then allow one to take a partially-specified locale, such as en-AU, and infer the parts omitted. Conversely, given a fully-specified locale, the same rules say which parts of it one can omit and still imply the same thing. Take en-Latn-AU: dropping the script gives en-AU, which still implies en-Latn-AU. However, en ⇒ en-Latn-US, which differs from what en-AU implies, so we can't prune it down to en. While starting with only AU does (see above) imply en-Latn-AU, the same as en-AU implies, we have to express what we started with as und-AU, which isn't a pruning of en-AU, so en-AU is minimal.
How Qt uses that
Until 6.9, QLocale::uiLanguages() started with the identifier of the locale it was given – or, in the case of the system locale, potentially a sequence of identifiers indicating what the user has said they can understand – and expanded each entry by adding some forms likely-equivalent to it. Until 5.14 (and LTS 5.12.6) that expansion was only applied to CLDR-derived entries; if the system locale gave a list, that was used without change. Initially, I'd handled the addition of likely-equivalents for the system locale via a QLocale instance constructed from each string the system gave us, forgetting that this could coerce what the user had asked for to the closest match for which Qt has CLDR-derived locale data.
From then to 6.4, the sort order for the system locale didn't match that of CLDR-derived locales (QTBUG-102796). Since then (aside from some quirks where the initial entry might appear earlier), the entries resulting from expansion of a single entry appear with the more specific (with more subtags) before the less specific (with fewer). In 6.5 I also fixed my mistake of sending system locale entries via QLocale (see above) and added the system locale's own identifier to the list, if the system query hadn't included it (or an equivalent).
In 6.7 I sorted out some complications to how QMimeType used uiLanguages() (when selecting how to describe a file-type, typically identified by the file extension, in a way the user will understand) and added a separator parameter to uiLanguages() to make that a little simpler (although I had to fix a mistake in that later). In response to what I describe below, I've been able to further simplify the QMimeType code more recently.
The seed of change
One problem with only including likely-equivalent entries is that the list for en-AU includes en-Latn-AU but not en. If an application has an en translation but neither en-AU nor en-Latn-AU, it's still fairly sensible for it to select the en it has. In the case of English, this works fine (although, as we'll see, life isn't so easy for some other locales). So QTranslator was taking each entry it gets from uiLanguages() and, after searching for a resource matching it, checking for matches to truncations of it, dropping the last subtag each time, before moving on to the next entry in uiLanguages(). That way, it found en as a truncation of en-AU and all was well – until 6.4, when I made the ordering consistent, and the system locale started delivering en-Latn-AU before en-AU.
That led to QTranslator truncating en-Latn-AU via en-Latn to en before it got to en-AU, with the result that the user who'd configured en-AU got lumbered with the en translation before the code noticed en-AU was available (and more appropriate). Which is where my story begins, with
- QTBUG-121418: QTranslator loads zh instead of zh_TW translation
- QTBUG-124898: (The en-AU case above, somewhat disguised).
Technically, the problem was there previously: the change in 6.4 just made it more visible. If a user had configured en-AU, en-GB as their system configuration, this would previously have expanded to en-AU, en-Latn-AU, en-GB, en-Latn-GB and, in the presence of only en-GB and en, they'd have been landed with the latter (which isn't even equivalent to any of their given choices, as it's en-Latn-US) despite the former being an exact match for one of their choices. But now that we could see the bug, we set out to fix it.
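The old lookup, and the way it goes wrong, can be modelled in standard C++. This is a sketch of the behaviour as described here, not the actual QTranslator source; `pickWithTruncation` is a hypothetical name and `available` stands in for the translations an application ships.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// For each preferred entry, try it and then each truncation of it (dropping
// the last subtag each time) before moving on to the next entry.
std::optional<std::string> pickWithTruncation(
    const std::vector<std::string> &preferred,
    const std::set<std::string> &available)
{
    for (std::string id : preferred) { // copy, since we truncate it below
        while (true) {
            if (available.count(id) != 0)
                return id;
            const std::size_t cut = id.rfind('-');
            if (cut == std::string::npos)
                break; // Nothing left to drop: go to the next entry.
            id.erase(cut);
        }
    }
    return std::nullopt;
}
```

Given the list en-AU, en-Latn-AU, en-GB, en-Latn-GB and translations for only en-GB and en, this returns en: the truncation of the first entry wins before en-GB is ever examined.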
Recent events
An initial attempt to fix QTBUG-124898 had QTranslator – instead of truncating each entry in the uiLanguages() list as it iterated over it – actually build the expanded list, with all truncations inserted into it, and sort its entries by specificity. This failed to take into account what happens when uiLanguages() starts with more than one entry to expand with likely-equivalent companions. For example, let's see what happens for the case I considered in the last paragraph:
- uiLanguages() starts with en-AU, en-GB and expands it to en-Latn-AU, en-AU, en-Latn-GB, en-GB; then QTranslator
- Adds truncations to it: en-Latn-AU, en-Latn, en, en-AU, en, en-Latn-GB, en-Latn, en, en-GB, en and
- Sorts by specificity: en-Latn-AU, en-Latn-GB, en-Latn, en-AU, en-GB, en (I've eliminated duplicates, just for clarity).
Notice that this has put en-Latn-GB before en-AU, reversing the order of the entries they came from in the list we started with. That works out worse when there's a mix of languages.
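One way to reproduce that reordering is the sketch below. It assumes the attempt amounted to inserting truncations, dropping duplicates and doing a stable sort by number of subtags; that is a reading of the description here, not the code that was actually reverted, and `expandAndSort` is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Number of subtags in a dash-separated identifier.
static int subtagCount(const std::string &id)
{
    return 1 + static_cast<int>(std::count(id.begin(), id.end(), '-'));
}

// Add every truncation of every entry (skipping duplicates), then sort so
// that identifiers with more subtags come first, keeping ties in order.
std::vector<std::string> expandAndSort(const std::vector<std::string> &entries)
{
    std::vector<std::string> out;
    auto addOnce = [&out](const std::string &id) {
        if (std::find(out.begin(), out.end(), id) == out.end())
            out.push_back(id);
    };
    for (std::string id : entries) { // copy, since we truncate it below
        addOnce(id);
        for (std::size_t cut = id.rfind('-'); cut != std::string::npos;
             cut = id.rfind('-')) {
            id.erase(cut);
            addOnce(id);
        }
    }
    std::stable_sort(out.begin(), out.end(),
                     [](const std::string &a, const std::string &b) {
                         return subtagCount(a) > subtagCount(b);
                     });
    return out;
}
```

Sorting on specificity alone is what lets en-Latn-GB jump ahead of en-AU: the sort key knows nothing about which original entry each identifier came from.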
In particular, QTBUG-129434 had a mix that included English and Traditional Chinese, zh-Hant. Since plain zh is likely-equivalent to zh-Hans, Simplified Chinese, an actual zh-Hant translation has this more specific form for its translation's name, whereas the app's translators hadn't needed to distinguish the various forms of English so just used plain en for it. The en translation thus wasn't found in time, because en was now later in the list, even though the English entry in the system configuration was earlier. None of the versions of English before zh-Hant matched an available translation file, so zh-Hant was picked. Thankfully this was found before the mistake could be released and was duly fixed by a timely revert.
At this point I got to study the problem and concluded that the real problem is that QTranslator doesn't know about likely-equivalence, so isn't in a position to understand the ordering of uiLanguages(). While some results of truncation shall be likely-equivalent, others shall not (for example, en-AU isn't equivalent to its truncation en, since this is equivalent to en-Latn-US). Since truncation has to be done at some point, the answer is for it to be done by the part of the system that actually does understand likely-equivalence, namely QLocale. It also became clear that we needed to be more careful to include all likely-equivalents of a given entry alongside it. Previously, uiLanguages() just ensured the final list contained the result of filling in all likely subtags and the minimal likely-equivalent; this meant, for example, that a user configuring just plain en (which is a minimal form, so didn't get that addition) got en-Latn-US added to it but didn't get en-US or en-Latn added – it now does.
One other thing came to light in this: the prior attempt at a fix had been done without knowledge that uiLanguages() might contain entries from quite distinct languages. That possibility was why the attempt had failed; it wasn't explicit in the documentation of uiLanguages(), so Volker added a paragraph about that. Then we set about ensuring the truncations got added in the right place.
Matching script
One might reasonably wonder why uiLanguages() wasn't simply including the truncations already. After all, for many languages, the minimal form is all the translators ever bother with, and it works well enough for most users of that language. We've already seen one case where it's not as simple as that, with zh-Hant being widely used, while zh ⇒ zh-Hans means that it's not likely-equivalent to its truncation. I'm not sure how mutually intelligible the simplified and traditional forms of the script are, or what proportion of traditional readers are familiar enough to cope with simplified, but this illustrates the problem: namely, that a language may exist in several scripts. This is no problem for code selecting a voice to use for text-to-speech rendering, but it matters for written translations.
I don't have an exhaustive list of examples where one language is written in different scripts by different populations, much less an exhaustive knowledge of which of those cases present a concrete problem of mutual intelligibility, but one theme in the cases where it arises is that the populations using distinct scripts for the same language are, in several cases, on opposite sides of some political or cultural conflict. Consequently, giving a user a translation in the other side's script runs a risk of causing distress or offence – or even getting them into trouble, if an unenlightened boss catches them reading enemy texts – quite apart from the risk that they simply can't read it. Given that folk tend to feel particularly strongly about conflicts with those from whom they least differ, this is another good reason to take care to not inflict such problems when we can avoid it.
So while we've now decided to include non-equivalent truncations in uiLanguages(), we need to take care that all reasonable options that are equivalent to what the user has configured get tried before any non-equivalent truncations. After some experimentation and feedback from users who'd reported related issues, I settled on a compromise for cases where a truncation does use the same script as the entry it truncates, but isn't equivalent. For that case, I opted to include the truncation just after the last block of likely-equivalent entries of which one truncated to it.
If that rule is a bit hard to understand, consider a user who's configured en-GB, en-NL, nl-NL (imagine a Brit living in the Netherlands). Adding likely-equivalents expands that to en-Latn-GB, en-GB, en-Latn-NL, en-NL, nl-Latn-NL, nl-NL, nl-Latn, nl; this includes nl-Latn and nl because they are likely-equivalent to nl-NL but leaves out en and en-Latn because they aren't likely-equivalent to en-GB or en-NL. If we stuck all non-equivalent truncations at the end, this would put en after nl so the user would get their UI in Dutch instead of English, even though they can read plain en (which is in the script they're used to) just fine. So this rule says to put these English truncations after the last block of English entries in our list, leading to en-Latn-GB, en-GB, en-Latn-NL, en-NL, en-Latn, en, nl-Latn-NL, nl-NL, nl-Latn, nl, which ensures an en translation is selected, when available, in preference to a nl one.
In contrast, the Punjabi language is written in the Arabic script in Pakistan but in Gurmukhi in India. A Punjabi from an Arabic-writing background might not know the Gurmukhi script at all (and vice versa). If such a user lives in England they might well have a system configuration selecting pa-PK, en-GB. Adding likely-equivalents then expands this to pa-Arab-PK, pa-PK, pa-Arab, en-Latn-GB, en-GB. Since pa-Arab is likely-equivalent to pa-PK, it is included – but the likely subtag rule pa ⇒ pa-Guru-IN makes plain pa distinct, so it is left out. Furthermore, since the script implied by pa is Guru, not matching the Arab implied by pa-PK, it gets shunted to the end of the list when we're adding truncations. In contrast, as before, en-Latn and en (though not likely-equivalent to it) do match the script implied by en-GB, so are still added to the end of its block, resulting in pa-Arab-PK, pa-PK, pa-Arab, en-Latn-GB, en-GB, en-Latn, en, pa. If there's no pa-PK or equivalent translation, but there are pa and en translations, this gets the user en, which we're sure they can read (as its script matches what they asked for), in preference to pa, even though that's their preferred language, because the available translation for it is in a script they may be entirely unable to read.
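The placement rule can be sketched as below, as a model of the ordering described here rather than QLocale's implementation. Everything named in it is hypothetical: `Block` is one configured entry together with its likely-equivalents, `withTruncations` builds the final list, and `impliedScript` uses a tiny hand-written table as a stand-in for a real likely-subtag lookup.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using Block = std::vector<std::string>; // one entry plus its likely-equivalents

// Script named in the identifier, else looked up in a toy table (assumption).
static std::string impliedScript(const std::string &id)
{
    static const std::map<std::string, std::string> table = {
        {"en", "Latn"}, {"nl", "Latn"}, {"pa", "Guru"}, {"pa-PK", "Arab"},
    };
    for (std::size_t start = 0; start < id.size();) {
        std::size_t end = id.find('-', start);
        if (end == std::string::npos)
            end = id.size();
        if (start > 0 && end - start == 4)
            return id.substr(start, 4); // explicit four-character script subtag
        start = end + 1;
    }
    auto it = table.find(id);
    if (it == table.end())
        it = table.find(id.substr(0, id.find('-'))); // fall back to the language
    return it == table.end() ? std::string() : it->second;
}

std::vector<std::string> withTruncations(const std::vector<Block> &blocks)
{
    std::vector<std::string> known; // identifiers already in some block
    for (const Block &b : blocks)
        known.insert(known.end(), b.begin(), b.end());

    std::vector<std::string> order;       // new truncations, in discovery order
    std::map<std::string, int> lastBlock; // last same-script block, or -1 for the tail
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        for (std::string id : blocks[b]) { // copy, since we truncate it below
            const std::string script = impliedScript(id);
            for (std::size_t cut = id.rfind('-'); cut != std::string::npos;
                 cut = id.rfind('-')) {
                id.erase(cut);
                if (std::find(known.begin(), known.end(), id) != known.end())
                    continue;
                if (lastBlock.find(id) == lastBlock.end()) {
                    order.push_back(id);
                    lastBlock[id] = -1;
                }
                if (impliedScript(id) == script)
                    lastBlock[id] = static_cast<int>(b);
            }
        }
    }

    std::vector<std::string> out;
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        out.insert(out.end(), blocks[b].begin(), blocks[b].end());
        for (const std::string &t : order) // same-script truncations follow their last block
            if (lastBlock[t] == static_cast<int>(b))
                out.push_back(t);
    }
    for (const std::string &t : order) // other-script truncations go to the very end
        if (lastBlock[t] == -1)
            out.push_back(t);
    return out;
}
```

Given the two blocks for pa-PK and en-GB, en-Latn and en land straight after the English block while pa, whose implied script is Gurmukhi, is pushed to the end of the list.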
Getting it right
So now we knew what we wanted, I just had to adapt the code to actually do that. This turned out to be quite tricky, but judiciously writing test-cases helped navigate to a final working solution, while making the code as straightforward as all these complications permit. (In fact, writing this prompted me to check one case I'd forgotten and thereby find a bug that I've now fixed in the course of writing this.) I hadn't, in any case, spotted all the details discussed above until I got to see how well the first few changes worked out and discuss the behaviour with others.
The primary change was to add truncated entries to uiLanguages(). That let us play with the result, discover quirks and corner-cases and work out what to do differently. That went to 6.9 and simplified QTranslator, while 6.8 got a reworking of its QTranslator code to do roughly the same thing. I then did some sorting out of fine details in (what has since become) 6.9.
The addition of truncated entries left uiLanguages() somewhat complicated so I reworked it to be a bit more straightforward. Adding some more test-cases then let me (finally) close QTBUG-121418.
At this point I recognised the need to be more systematic about adding equivalent entries. That let me understand the ordering better and adapt the ordering to put each same-script truncation at the end of the last block of equivalents that gave rise to it (albeit with the mistake I mentioned above, whose fix I'm now seeing get integrated). After that I saw how to make the insertion of equivalents more systematic.
So where does that leave us?
Hopefully, as ever with Qt, everything should Just Work – as well as it did before, and maybe a bit better. You may, however, be able to simplify code using QLocale::uiLanguages(), while also making it work faster and better, if you were previously working round any of these complications:
- If you've got any code that truncates entries from uiLanguages() for similar reasons to why QTranslator used to, you no longer (from 6.9; and mostly 6.8.3, too) need to do that.
- If your code checks for a resource matching the name of the locale whose uiLanguages() you're also scanning for matches, you should now (since 6.5) be fine just checking for matches in uiLanguages(), as the locale's name should be in there, too (along with some likely-equivalents and truncated forms).
- Or, really, if you'd found it necessary to kludge something using uiLanguages() to avoid odd corner cases where it didn't reliably Do The Right Thing™, try dropping the kludges and seeing whether it now Does The Right Thing after all. I'd love to hear stories of that, if you have any to share, whether you end up having to keep the kludges or are finally glad you can get rid of them.
Thanks to all the good folks who contributed by telling us what was wrong before, that I hope we've now sorted out fully – and, as ever, if you find behaviour that looks wrong, or can think of ways Qt might behave better, feel free to let us know through any of the usual channels, to help us make Qt better with every release.