事情的起因是这样的,某个用户问我为什么在GP里面查询到的'A' > 'a'和PostgreSQL中查询到的结果不一样,一个是false, 一个true.
但是这个原因其实和Greenplum还是PostgreSQL是没关系的。
原因的根源还是collate。
例如,在使用比较操作符时,可以指定需要比较的字符串的collate来查看这里的区别:
postgres=# select 'a' > 'A' collate "en_US";
?column?
----------
f
(1 row)
postgres=# select 'a' > 'A' collate "C";
?column?
----------
t
(1 row)
postgres=# select 'a' > 'A' collate "zh_CN";
?column?
----------
f
(1 row)
操作符>的源代码 :
使用C作为collate时,实际上是memcmp直接对被比较的字符串的比较,而使用非C的collate则需要COPY字符串的内存,然后使用strcoll_l或strcoll进行比较。
所以建议数据库初始化时使用 --locale=C 。
Datum
text_gt(PG_FUNCTION_ARGS)
{
text *arg1 = PG_GETARG_TEXT_PP(0);
text *arg2 = PG_GETARG_TEXT_PP(1);
bool result;
result = (text_cmp(arg1, arg2, PG_GET_COLLATION()) > 0);
PG_FREE_IF_COPY(arg1, 0);
PG_FREE_IF_COPY(arg2, 1);
PG_RETURN_BOOL(result);
}
/* text_cmp()
* Internal comparison function for text strings.
* Returns -1, 0 or 1
*/
static int
text_cmp(text *arg1, text *arg2, Oid collid)
{
char *a1p,
*a2p;
int len1,
len2;
a1p = VARDATA_ANY(arg1);
a2p = VARDATA_ANY(arg2);
len1 = VARSIZE_ANY_EXHDR(arg1);
len2 = VARSIZE_ANY_EXHDR(arg2);
return varstr_cmp(a1p, len1, a2p, len2, collid);
}
/* varstr_cmp()
* Comparison function for text strings with given lengths.
* Includes locale support, but must copy strings to temporary memory
* to allow null-termination for inputs to strcoll().
* Returns an integer less than, equal to, or greater than zero, indicating
* whether arg1 is less than, equal to, or greater than arg2.
*/
int
varstr_cmp(char *arg1, int len1, char *arg2, int len2, Oid collid)
{
int result;
/*
* Unfortunately, there is no strncoll(), so in the non-C locale case we
* have to do some memory copying. This turns out to be significantly
* slower, so we optimize the case where LC_COLLATE is C. We also try to
* optimize relatively-short strings by avoiding palloc/pfree overhead.
*/
if (lc_collate_is_c(collid))
{
result = memcmp(arg1, arg2, Min(len1, len2));
if ((result == 0) && (len1 != len2))
result = (len1 < len2) ? -1 : 1;
}
else
{
#define STACKBUFLEN 1024
char a1buf[STACKBUFLEN];
char a2buf[STACKBUFLEN];
char *a1p,
*a2p;