Db2 CREATE DATABASE: an explanation of COLLATE USING

神棍冰

Category: Database · Tags: db2

Article link: https://blog.youkuaiyun.com/qq_42616592/article/details/103596015

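When a Db2 database is created, COLLATE USING sets the collating sequence for character data. The collating sequence determines how character data is compared, for example whether comparisons are case-sensitive. The Unicode Collation Algorithm (UCA) uses weight tables to determine the sort order. Note that FOR BIT DATA and BLOB data are sorted with the binary sort sequence. A custom collating sequence can be specified when the database is created, and it directly affects how characters are sorted, how sorted data can be merged, and what queries return.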

(Summary generated by C知道, powered by DeepSeek-R1.)

Excerpted from the IBM documentation. The translation is my own rough attempt, so corrections are welcome.

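My understanding is that a collating sequence is the reference table the database uses when it decides which of two values is larger. In the simplest case, it is the table that decides whether character A or character B comes first in an ascending sort.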

The database manager compares character data using a collating sequence. This is an ordering for a set of characters that determines whether a particular character sorts higher than, lower than, or the same as another.

The Unicode Collation Algorithm (UCA) uses weight tables to determine the collating sequence.

Note: Character string data defined with the FOR BIT DATA attribute, and BLOB data, is sorted using the binary sort sequence.
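Here is a minimal Python sketch of how binary ordering can differ from a collated ordering. It assumes byte strings stand in for FOR BIT DATA values. Python compares `bytes` objects byte by byte, which behaves like a binary sort sequence. The `key=` function is a hypothetical case-insensitive collating sequence used for contrast, not a real Db2 table.

```python
# Raw byte values, as FOR BIT DATA or BLOB columns would store them.
values = [b"apple", b"Banana", b"cherry", b"Apple"]

# Binary sort sequence: compare byte values directly, so X'41' ('A') < X'61' ('a').
binary_order = sorted(values)

# Hypothetical collated order for contrast: upper- and lowercase weigh the same.
collated_order = sorted(values, key=bytes.lower)

print(binary_order)    # [b'Apple', b'Banana', b'apple', b'cherry']
print(collated_order)  # [b'apple', b'Apple', b'Banana', b'cherry']
```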

For example, a collating sequence can be used to indicate that lowercase and uppercase versions of a particular character are to be sorted equally.
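The following Python sketch builds such a sequence as a hypothetical 256-entry weight table, not a table shipped with Db2. Each lowercase ASCII letter gets the weight of its uppercase counterpart, and strings are then sorted by their weights.

```python
# Start from the identity sequence (weight == code point) for all 256 code points.
weights = list(range(256))

# Hypothetical change: a-z (X'61'-X'7A') take the weights of A-Z (X'41'-X'5A').
for cp in range(ord("a"), ord("z") + 1):
    weights[cp] = cp - 0x20

def sort_key(s: str) -> list:
    # Replace each single-byte character by its weight (ASCII input assumed).
    return [weights[b] for b in s.encode("ascii")]

words = ["bob", "Alice", "alice", "Bob"]
print(sorted(words, key=sort_key))  # ['Alice', 'alice', 'bob', 'Bob']
```

Strings with identical weights, such as `Alice` and `alice`, stay in their input order only because Python's `sorted` is stable. The collating sequence itself treats them as equal.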

The database manager allows databases to be created with custom collating sequences. The following sections help you determine and implement a particular collating sequence for a database.

Each single-byte character in a database is represented internally as a unique number between 0 and 255 (in hexadecimal notation, between X'00' and X'FF'). This number is referred to as the code point of the character; the assignment of numbers to characters in a set is collectively called a code page. A collating sequence is a mapping between the code point and the desired position of each character in a sorted sequence. The numeric value of the position is called the weight of the character in the collating sequence. In the simplest collating sequence, the weights are identical to the code points. This is called the identity sequence.
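This small Python sketch makes the terms concrete. It assumes ASCII as the code page. `code_point` plays the role of the code page, and `IDENTITY` is a 256-entry table in which every weight equals its code point. The names are illustrative only.

```python
# Code page (single-byte, assumed ASCII here): character -> code point.
def code_point(ch: str) -> int:
    return ch.encode("ascii")[0]

# Identity sequence: 256 weights, each equal to its own code point.
IDENTITY = bytes(range(256))

def weight(ch: str, table: bytes = IDENTITY) -> int:
    # A collating sequence maps a code point to its sort weight.
    return table[code_point(ch)]

print(hex(code_point("A")), hex(weight("A")))  # 0x41 0x41
```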

For example, suppose the characters B and b have the code points X'42' and X'62', respectively. If (according to the collating sequence table) they both have a sort weight of X'42' (B), they collate the same. If the sort weight for B is X'9E', and the sort weight for b is X'9D', b will be sorted before B. The collating sequence table specifies the weight of each character. The table is different from a code page, which specifies the code point of each character.
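Here is the same example as a Python sketch. The weight tables are hypothetical and are built by overriding entries of the identity sequence.

```python
from collections.abc import Sequence

def collate_key(s: str, table: Sequence[int]) -> bytes:
    # Replace every code point with its weight from the collating table.
    return bytes(table[b] for b in s.encode("ascii"))

identity = bytes(range(256))

same = bytearray(range(256))
same[0x62] = 0x42      # 'b' gets the same weight as 'B' (X'42')

custom = bytearray(range(256))
custom[0x42] = 0x9E    # 'B' -> X'9E'
custom[0x62] = 0x9D    # 'b' -> X'9D'

print(sorted(["b", "B"], key=lambda s: collate_key(s, identity)))  # ['B', 'b']
print(collate_key("B", same) == collate_key("b", same))            # True
print(sorted(["B", "b"], key=lambda s: collate_key(s, custom)))    # ['b', 'B']
```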

Consider the following example. The ASCII characters A through Z are represented by X'41' through X'5A'. To describe a collating sequence in which these characters are sorted consecutively (no intervening characters), you can write: X'41', X'42', … X'59', X'5A'.
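This quick Python check of that description assumes ASCII code points.

```python
import string

# Code points of 'A'..'Z' in ASCII: X'41' through X'5A'.
code_points = [ord(ch) for ch in string.ascii_uppercase]

# Consecutive weights: each is exactly one more than the previous, so no other
# character's weight can fall strictly between two adjacent letters.
assert all(b - a == 1 for a, b in zip(code_points, code_points[1:]))

az_weights = [f"X'{cp:02X}'" for cp in code_points]
print(", ".join([*az_weights[:2], "...", *az_weights[-2:]]))
# X'41', X'42', ..., X'59', X'5A'
```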

The hexadecimal value of a multibyte character is also used as the weight. For example, suppose the code points for the double-byte characters A and B are X'8260' and X'8261' respectively, then the collation weights for X'82', X'60', and X'61' are used to sort these two characters according to their code points.
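This Python sketch shows byte-wise weighting. It uses the example code points X'8260' and X'8261' and an identity table for every byte value.

```python
# Identity weight for every possible byte value.
TABLE = bytes(range(256))

def multibyte_key(raw: bytes) -> list:
    # Each byte of a multibyte character is weighted on its own, so with
    # identity weights the character sorts by its code point.
    return [TABLE[b] for b in raw]

char_a = b"\x82\x60"  # double-byte code point X'8260' from the example
char_b = b"\x82\x61"  # double-byte code point X'8261' from the example

print(multibyte_key(char_a), multibyte_key(char_b))                      # [130, 96] [130, 97]
print(sorted([char_b, char_a], key=multibyte_key) == [char_a, char_b])  # True
```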

The weights in a collating sequence need not be unique. For example, you could give uppercase letters and their lowercase equivalents the same weight.
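As a sketch, assume a hypothetical table in which a-z share the weights of A-Z. A comparison function built on those weights then reports strings that differ only in case as equal.

```python
# Hypothetical table: a-z share the weights of A-Z, so weights are not unique.
TABLE = bytes(c - 0x20 if ord("a") <= c <= ord("z") else c for c in range(256))

def compare(x: str, y: str) -> int:
    """Return -1, 0 or 1 by comparing the weights of x and y (ASCII assumed)."""
    kx = bytes(TABLE[b] for b in x.encode("ascii"))
    ky = bytes(TABLE[b] for b in y.encode("ascii"))
    return (kx > ky) - (kx < ky)

print(compare("ABC", "abc"))  # 0: different code points, equal weights
print(compare("abc", "abd"))  # -1
```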

Specifying a collating sequence can be simplified if the collating sequence provides weights for all 256 code points. The weight of each character can be determined using the code point of the character.
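Here is a minimal Python sketch of such a complete table. `make_table` is a hypothetical helper, not a Db2 API. Once all 256 entries exist, finding a weight is a single index lookup by code point.

```python
def make_table(overrides: dict[int, int]) -> bytes:
    """Hypothetical helper: identity weights for all 256 code points, plus overrides."""
    table = bytearray(range(256))
    for cp, w in overrides.items():
        if not (0 <= cp <= 0xFF and 0 <= w <= 0xFF):
            raise ValueError("code points and weights must fit in one byte")
        table[cp] = w
    return bytes(table)

table = make_table({ord("b"): ord("B")})  # hypothetical: 'b' weighs the same as 'B'

# Every code point has an entry, so a weight is a plain index lookup.
print(len(table))                          # 256
print(table[ord("b")] == table[ord("B")])  # True
print(table[ord("a")] == ord("a"))         # True: untouched entries keep identity weights
```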

In all cases, the DB2® database uses the collation table that was specified at database creation time. If you want the multibyte characters to be sorted the way that they appear in their code point table, you must specify IDENTITY as the collating sequence when you create the database.

Once a collating sequence is defined, all future character comparisons for that database will be performed with that collating sequence. Except for character data defined as FOR BIT DATA or BLOB data, the collating sequence will be used for all SQL comparisons and ORDER BY clauses, and also in setting up indexes and statistics.
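The following Python sketch imitates two of these uses with a hypothetical case-insensitive collation. It uses `str.upper` as a simplification, which is not how Db2 builds keys. The first use is the result order of an ORDER BY. The second is a sorted index searched with `bisect`. Both have to use the same collating sequence, or lookups land in the wrong place.

```python
import bisect

def collate_key(s: str) -> bytes:
    # Hypothetical case-insensitive collation, simplified to str.upper().
    return s.upper().encode("ascii")

rows = ["delta", "Alpha", "charlie", "Bravo"]

# ORDER BY: rows come back in collation order, not raw code-point order.
ordered = sorted(rows, key=collate_key)

# A sorted "index" stores keys in the same order; lookups must use the same key.
index_keys = [collate_key(r) for r in ordered]
pos = bisect.bisect_left(index_keys, collate_key("CHARLIE"))

print(ordered)       # ['Alpha', 'Bravo', 'charlie', 'delta']
print(ordered[pos])  # charlie
```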

Potential problems can occur in the following cases:

An application merges sorted data from a database with application data that was sorted using a different collating sequence.

An application merges sorted data from one database with sorted data from another, but the databases have different collating sequences.

An application makes assumptions about sorted data that are not true for the relevant collating sequence. For example, the assumption that numbers collate lower than alphabetic characters might or might not be true for a particular collating sequence.
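The first case can be reproduced in a few lines of Python. The two key functions are hypothetical stand-ins: one for a case-insensitive database collation, the other for code-point order on the application side. `heapq.merge` assumes every input is already sorted under the key it receives, so the combined result comes out in the wrong order.

```python
import heapq

def code_point_key(s: str) -> bytes:
    return s.encode("ascii")          # application side: plain code-point order

def nocase_key(s: str) -> bytes:
    return s.upper().encode("ascii")  # database side: hypothetical case-insensitive order

db_rows = sorted(["cherry", "apple", "Banana"], key=nocase_key)  # ['apple', 'Banana', 'cherry']
app_rows = sorted(["aardvark", "Zebra"], key=code_point_key)     # ['Zebra', 'aardvark']

# heapq.merge trusts that each input is already sorted under the key it is given.
merged = list(heapq.merge(db_rows, app_rows, key=nocase_key))
print(merged)                                    # ['apple', 'Banana', 'cherry', 'Zebra', 'aardvark']
print(merged == sorted(merged, key=nocase_key))  # False
```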

A final point to remember is that the results of any sort based on a direct comparison of character code points will only match query results that are ordered using an identity collating sequence.
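This Python sketch illustrates the point. Sorting by raw code points gives the same result as an identity weight table. It does not match the result of a hypothetical case-insensitive table.

```python
def key_for(table: bytes):
    # Build a sort key that maps every ASCII byte to its weight in `table`.
    return lambda s: bytes(table[b] for b in s.encode("ascii"))

IDENTITY = bytes(range(256))
# Hypothetical non-identity sequence: a-z weigh the same as A-Z.
NOCASE = bytes(c - 0x20 if 0x61 <= c <= 0x7A else c for c in range(256))

words = ["b", "B", "a", "A"]
by_code_point = sorted(words, key=lambda s: s.encode("ascii"))  # direct comparison
by_identity = sorted(words, key=key_for(IDENTITY))
by_nocase = sorted(words, key=key_for(NOCASE))

print(by_code_point == by_identity)  # True
print(by_code_point, by_nocase)      # ['A', 'B', 'a', 'b'] ['a', 'A', 'b', 'B']
```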

In a Unicode database, graphic data (the GRAPHIC, VARGRAPHIC, and DBCLOB string types, not images) is sorted using the database collation mechanism. In a non-Unicode database with SYSTEM collation, the graphic data is collated based on the weight of each byte.