设置PostgreSQL排序规则
PostgreSQL Collation Support
先说下什么是locale,可以理解为地区或语言环境,排序时指定locale,则针对语言特性进行排序,常用的有en和zh。比如设置为zh,则汉字使用拼音字母(或偏旁部首)顺序来进行排序。
默认情况下是不设置locale的,直接按字符UTF8编码顺序排序(兼容ASCII码)
pg collation查询:
select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation;
select datname,datcollate,datctype,pg_encoding_to_char(encoding) as encoding from pg_database;
pg排序:
SELECT name FROM
(
select unnest(array['a','A','b', 'B','哈哈','123','呵呵']) as name
) t
order by name COLLATE "en-US-x-icu"
//['123', 'a', 'A', 'b', 'B', '呵呵', '哈哈']
SELECT name FROM
(
select unnest(array['a','A','b', 'B','哈哈','123','呵呵']) as name
) t
order by name COLLATE "zh-x-icu"
// ['123', '呵呵', '哈哈','a', 'A', 'b', 'B']
SELECT name FROM
(
select unnest(array['a','A','b', 'B','哈哈','123','呵呵']) as name
) t
order by name COLLATE "C"
//['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈']
js排序:
var arr =["a", "A","b", "B","123","哈哈","呵呵"];
function sort(arr, locale) {
arr.sort(function (item1, item2) {
return item1.localeCompare(item2, locale);
})
}
sort(arr, 'en');
console.log(arr);// ['123', 'a', 'A', 'b', 'B', '呵呵', '哈哈']
sort(arr, 'zh-CN');
console.log(arr); // ['123', '哈哈', '呵呵', 'a', 'A', 'b', 'B']
arr.sort();
console.log(arr);// ['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈']
java排序:
list = Arrays.stream(arr).sorted((a, b) -> {
return Collator.getInstance(Locale.ENGLISH).compare(a, b);
}).collect(Collectors.toList());
System.out.println(list); //['123', 'a', 'A', 'b', 'B', '呵呵', '哈哈']
String[] arr = {"a", "A","b", "B","123","哈哈","呵呵"};
List<String> list = Arrays.stream(arr).sorted((a, b) -> {
return Collator.getInstance(Locale.CHINESE).compare(a, b);
}).collect(Collectors.toList());
System.out.println(list); //['123', 'a', 'A', 'b', 'B', '哈哈', '呵呵']
Collections.sort(list);
System.out.println(list); //['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈']
doris:
CREATE TABLE `test` (
`dt` varchar(1000) NULL COMMENT "",
`name` varchar(1000) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`dt`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 3",
"in_memory" = "false",
"storage_format" = "V2"
);
insert into test values ("1", "123"),("1", "a"),("1", "A"),("1", "b"),("1", "B"),("1", "哈哈"),("1", "呵呵");
SELECT name FROM test ORDER BY name;
//['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈']
结果对比:
locale | 语言 | 结果 | 备注 |
---|---|---|---|
en | pg | ['123', 'a', 'A', 'b', 'B', '呵呵', '哈哈'] | 无差异,但是Doris不支持 |
js | ['123', 'a', 'A', 'b', 'B', '呵呵', '哈哈'] | ||
java | ['123', 'a', 'A', 'b', 'B', '呵呵', '哈哈'] | ||
zh | pg | ['123', '呵呵', '哈哈','a', 'A', 'b', 'B'] | 不同语言差异较大 |
js | ['123', '哈哈', '呵呵', 'a', 'A', 'b', 'B'] | ||
java | ['123', 'a', 'A', 'b', 'B', '哈哈', '呵呵'] | ||
- | pg | ['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈'] | 无差异 |
js | ['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈'] | ||
java | ['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈'] | ||
doris | ['123', 'A', 'B', 'a', 'b', '呵呵', '哈哈'] |
可以看到,针对zh的排序,不同语言都是有差异的,想要使结果一致是很难做到的
en虽然在java、js、pg中的表现一致,但是Doris不支持
为了保持各个端的排序显示一致,最好的做法是排序不指定locale,直接按字符UTF8编码排序
pg可以组装sql时order by中加入COLLATE “C”,也可以再定义表结构时定义好COLLATE ,针对已经定义好的字段,可以:ALTER TABLE ${tableName} ALTER COLUMN {columnName} type varchar(64) COLLATE “C”