To forgo reading the entire problem, my basic question is:
Is there a function in PostgreSQL to escape regular expression characters in a string?
I've probed the documentation (but perhaps not thoroughly enough) but was unable to find such a function.
Here is the full problem:
In a PostgreSQL database, I have a column with unique names in it. I also have a process which periodically inserts names into this field, and, to prevent duplicates, if it needs to enter a name that already exists, it appends a space and parentheses with a count to the end.
i.e. Name, Name (1), Name (2), Name (3), etc.
As it stands, I use the following code to find the next number to add in the series (written in plpgsql):
var_name_id := 1;
SELECT CAST(substring(a.name from E'\\((\\d+)\\)$') AS int)
INTO var_last_name_id
FROM my_table.names a
WHERE a.name LIKE var_name || ' (%)'
ORDER BY CAST(substring(a.name from E'\\((\\d+)\\)$') AS int) DESC
LIMIT 1;
IF var_last_name_id IS NOT NULL THEN
var_name_id = var_last_name_id + 1;
END IF;
var_new_name := var_name || ' (' || var_name_id || ')';
("var_name" contains the name I'm trying to insert)
This works for now, but the problem lies in the WHERE statement:
WHERE a.name LIKE var_name || ' (%)'
This check doesn't verify that the % in question is a number, and it doesn't account for multiple parentheses, as in something like "Name ((1))", and if either case existed a cast exception would be thrown.
The WHERE statement really needs to be something more like:
WHERE a.r1_name ~* var_name || E' \\(\\d+\\)'
Now we come to the real problem, which is that "var_name" could contain regular expression characters.
So, my question as stated above is: Is there a function in PostgreSQL that escapes regular expression characters in a string, so I could do something like:
WHERE a.r1_name ~* regex_escape(var_name) || E' \\(\\d+\\)'
Any suggestions are much appreciated, including a possible reworking of my duplicate name solution.
解决方案
how about trying something like this, substituting var_name for my hard-coded 'John Bernard':
create table my_table(name text primary key);
insert into my_table(name) values ('John Bernard'),
('John Bernard (1)'),
('John Bernard (2)'),
('John Bernard (3)');
select max(regexp_replace(substring(name, 13), ' |\(|\)', '', 'g')::integer+1)
from my_table
where substring(name, 1, 12)='John Bernard'
and substring(name, 13)~'^ \([1-9][0-9]*\)$';
max
-----
4
(1 row)
one caveat: I am assuming single-user access to the database while this process is running (and so are you in your approach). If that is not the case then the max(n)+1 approach will not be a good one.
在PostgreSQL中,为避免插入重复名称,一个过程会在现有名称后附加带计数的括号。当前的查询存在潜在问题,可能会因不正确的括号格式或包含正则字符的名称而引发错误。问题集中在如何正确使用正则表达式进行筛选,并转义可能存在的正则字符。解决方案提出了一种新的查询方法,通过`regexp_replace`和`substring`函数结合,来获取序列中的下一个数字并确保正则匹配的准确性。
1424

被折叠的 条评论
为什么被折叠?



