pgsql 筛选中文字符正则,PostgreSQL：正则表达式转义函数

最新推荐文章于 2025-05-04 08:55:45 发布

我有一只小喵咭

最新推荐文章于 2025-05-04 08:55:45 发布

阅读量646

点赞数

文章标签： pgsql 筛选中文字符正则

在PostgreSQL中，为避免插入重复名称，一个过程会在现有名称后附加带计数的括号。当前的查询存在潜在问题，可能会因不正确的括号格式或包含正则字符的名称而引发错误。问题集中在如何正确使用正则表达式进行筛选，并转义可能存在的正则字符。解决方案提出了一种新的查询方法，通过`regexp_replace`和`substring`函数结合，来获取序列中的下一个数字并确保正则匹配的准确性。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

To forgo reading the entire problem, my basic question is:

Is there a function in PostgreSQL to escape regular expression characters in a string?

I've probed the documentation (but perhaps not thoroughly enough) but was unable to find such a function.

Here is the full problem:

In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.

i.e. Name, Name (1), Name (2), Name (3), etc.

As it stands, I use the following code to find the next number to add in the series (written in plpgsql):

var_name_id := 1;

SELECT CAST(substring(a.name from E'\$(\\d+)\$$') AS int)

INTO var_last_name_id

FROM my_table.names a

WHERE a.name LIKE var_name || ' (%)'

ORDER BY CAST(substring(a.name from E'\$(\\d+)\$$') AS int) DESC

LIMIT 1;

IF var_last_name_id IS NOT NULL THEN

var_name_id = var_last_name_id + 1;

END IF;

var_new_name := var_name || ' (' || var_name_id || ')';

("var_name" contains the name I'm trying to insert)

This works for now, but the problem lies in the WHERE statement:

WHERE a.name LIKE var_name || ' (%)'

This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.

The WHERE statement really needs to be something more like:

WHERE a.r1_name ~* var_name || E' \$\\d+\$'

Now we come to the real problem, which is that "var_name" could contain regular expression characters.

So, my question as stated above is: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:

WHERE a.r1_name ~* regex_escape(var_name) || E' \$\\d+\$'

Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.

解决方案

how about trying something like this, substituting var_name for my hard-coded 'John Bernard':

create table my_table(name text primary key);

insert into my_table(name) values ('John Bernard'),

('John Bernard (1)'),

('John Bernard (2)'),

('John Bernard (3)');

select max(regexp_replace(substring(name, 13), ' |$|$', '', 'g')::integer+1)

from my_table

where substring(name, 1, 12)='John Bernard'

and substring(name, 13)~'^ $[1-9][0-9]*$$';

max

-----

(1 row)

one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.