Regular Expressions
Regular expressions enable you to search for patterns in string data by using standardized syntax conventions. You specify a regular expression by means of the following types of characters:
-
Metacharacters, which are operators that specify search algorithms
-
Literals, which are the characters for which you are searching
***正则表达式由2部分组成,一个是元字符+需要寻找的字符
A regular expression can specify complex patterns of character sequences. For example, the following regular expression searches for the literals f
or ht
, thet
literal, the p
literal optionally followed by the s
literal, and finally the colon (:
) literal:
(f|ht)tps?:
The parentheses are metacharacters that group a series of pattern elements to a single element; the pipe symbol (|
) matches one of the alternatives in the group. The question mark (?
) is a metacharacter indicating that the preceding pattern, in this case the s
character, is optional. Thus, the preceding regular expression matches the http:
, https:
, ftp:
, and ftps:
strings.
Metacharacters:
POSIX Metacharacters in Oracle Database Regular Expressions
Table 4-2 lists the list of metacharacters supported for use in regular expressions passed to SQL regular expression functions and conditions. These metacharacters conform to the POSIX standard; any differences in behavior from the standard are noted in the "Description" column.
Table 4-2 POSIX Metacharacters in Oracle Database Regular Expressions
Syntax | Operator Name | Description | Example |
---|---|---|---|
| Any Character — Dot | Matches any character in the database character set. If the Note: In the POSIX standard, this operator matches any English character except NULL and the newline character. | The expression |
| One or More — Plus Quantifier | Matches one or more occurrences of the preceding subexpression. | The expression |
| Zero or One — Question Mark Quantifier | Matches zero or one occurrence of the preceding subexpression. | The expression |
| Zero or More — Star Quantifier | Matches zero or more occurrences of the preceding subexpression. By default, a quantifier match is greedy because it matches as many times as possible while still allowing the rest of the match to succeed. | The expression |
| Interval—Exact Count | Matches exactly | The expression |
| Interval—At Least Count | Matches at least | The expression |
| Interval—Between Count | Matches at least | The expression |
| Matching Character List | Matches any single character in the list within the brackets. The following operators are allowed within the list, but other metacharacters included are treated as literals:
A dash ( Note: In the POSIX standard, a range includes all collation elements between the start and end of the range in the linguistic definition of the current locale. Thus, ranges are linguistic rather than byte values ranges; the semantics of the range expression are independent of character set. In Oracle Database, the linguistic range is determined by the | The expression |
| Non-Matching Character List | Matches any single character not in the list within the brackets. Characters not in the non-matching character list are returned as a match. Refer to the description of the Matching Character List operator for an account of metacharacters allowed in the character list. | The expression The expression |
| Or | Matches one of the alternatives. | The expression |
| Subexpression or Grouping | Treats the expression within parentheses as a unit. The subexpression can be a string of literals or a complex expression containing operators. | The expression |
| Backreference | Matches the nth preceding subexpression, that is, whatever is grouped within parentheses, where Oracle supports the backreference expression in the regular expression pattern and the replacement string of the | The expression A backreference enables you to search for a repeated string without knowing the actual string ahead of time. For example, the expression |
| Escape Character | Treats the subsequent metacharacter in the expression as a literal. Use a backslash (\) to search for a character that is normally treated as a metacharacter. Use consecutive backslashes ( | The expression |
| Beginning of Line Anchor | Matches the beginning of a string (default). In multiline mode, it matches the beginning of any line within the source string. | The expression |
| End of Line Anchor | Matches the end of a string (default). In multiline mode, it matches the beginning of any line within the source string. | The expression |
| POSIX Character Class | Matches any character belonging to the specified POSIX character Note: In English regular expressions, range expressions often indicate a character class. For example, | The expression |
| POSIX Collating Element Operator | Specifies a collating element to use in the regular expression. The | The expression |
| POSIX Character Equivalence Class | Matches all characters that are members of the same character equivalence class in the current locale as the specified The character equivalence class must occur within a character list, so the character equivalence class is always nested within the brackets for the character list in the regular expression. Usage of character equivalents depends on how canonical rules are defined for your database locale. Refer to the Oracle Database Globalization Support Guide for more information on linguistic sorting and string searching. | The expression |
Perl-Influenced Extensions in Oracle Regular Expressions
Table 4-4 describes Perl-influenced metacharacters supported in Oracle Database regular expression functions and conditions. These metacharacters are not in the POSIX standard, but are common at least partly due to the popularity of Perl. Note that Perl character class matching is based on the locale model of the operating system, whereas Oracle Database regular expressions are based on the language-specific data of the database. In general, a regular expression involving locale data cannot be expected to produce the same results between Perl and Oracle Database.
Table 4-4 Perl-Influenced Extensions in Oracle Regular Expressions
Reg. Exp. | Matches . . . | Example |
---|---|---|
| A digit character. It is equivalent to the POSIX class | The expression |
| A non-digit character. It is equivalent to the POSIX class | The expression |
| A word character, which is defined as an alphanumeric or underscore ( | The expression |
| A non-word character. It is equivalent to the POSIX class | The expression |
| A whitespace character. It is equivalent to the POSIX class | The expression |
| A non-whitespace character. It is equivalent to the POSIX class | The expression |
| Only at the beginning of a string. In multi-line mode, that is, when embedded newline characters in a string are considered the termination of a line, | The expression |
| Only at the end of string or before a newline ending a string. In multi-line mode, that is, when embedded newline characters in a string are considered the termination of a line, | In the expression |
| Only at the end of a string. | In the expression |
| The preceding pattern element 0 or more times (non-greedy). Note that this quantifier matches the empty string whenever possible. | The expression |
| The preceding pattern element 1 or more times (non-greedy). | The expression |
| The preceding pattern element 0 or 1 time (non-greedy). Note that this quantifier matches the empty string whenever possible. | The expression |
| The preceding pattern element exactly | The expression |
| The preceding pattern element at least | The expression |
| At least | The expression |
egular expressions are a useful way to enforce integrity constraints.
For example regexp_like()
CREATE TABLE contacts ( l_name VARCHAR2(30), p_number VARCHAR2(30) CONSTRAINT p_number_format CHECK ( REGEXP_LIKE ( p_number, '^\(\d{3}\) \d{3}-\d{4}$' ) ) -----约束检查p_number列 的value是否符合表达式要求: );
表达式解说:
Regular Expression Element | Matches . . . |
---|---|
| The beginning of the string. 字符开始符号 |
| A left parenthesis. The backward slash ( |
| Exactly three digits. 3个数字 |
| A right parenthesis. The backward slash ( |
(space character) | A space character. 空格字符 |
| Exactly three digits.3个数字 |
| A hyphen.- 为连接字符 |
| Exactly four digits. 4个数字 |
| The end of the string. 字符串结束符号 |
Example 4-2 shows a SQL script that attempts to insert seven phone numbers into the contacts
table. Only the first two INSERT
statements use a format that conforms to the p_number_format
constraint; the remaining statements generate check constraint errors.
Example 4-2 insert_contacts.sql
-- first two statements use valid phone number format INSERT INTO contacts (p_number) VALUES( '(650) 555-5555' ); INSERT INTO contacts (p_number) VALUES( '(215) 555-3427' ); 插入正常 -- remaining statements generate check contraint errors INSERT INTO contacts (p_number) VALUES( '650 555-5555' ); INSERT INTO contacts (p_number) VALUES( '650 555 5555' ); INSERT INTO contacts (p_number) VALUES( '650-555-5555' ); INSERT INTO contacts (p_number) VALUES( '(650)555-5555' ); INSERT INTO contacts (p_number) VALUES( ' (650) 555-5555' );----违反约束
查询结果
For example regexp_replace()
SELECT names "names", REGEXP_REPLACE(names, '^(\S+)\s(\S+)\s(\S+)$', '\3, \1 \2') AS "names after regexp" FROM famous_people;
Table 4-7 explains the elements of the regular expression.
Table 4-7 Explanation of the Regular Expression Elements in Example 4-4
Regular Expression Element | Description |
---|---|
| Matches the beginning of the string. |
| Matches the end of the string. |
| Matches one or more non-space characters. The parentheses are not escaped so they function as a grouping expression. |
| Matches a whitespace character. |
| Substitutes the first subexpression, that is, the first group of parentheses in the matching pattern. |
| Substitutes the second subexpression, that is, the second group of parentheses in the matching pattern. |
| Substitutes the third subexpression, that is, the third group of parentheses in the matching pattern. |
| Inserts a comma character. |
Example 4-5 shows the result set of the query in Example 4-4. The regular expression matched only the first two rows.