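Notepad++ and SciTE Regular Expressions
(2012-09-20 10:47:53)
Because Notepad++ uses Scintilla's regular expression engine, the same one as SciTE, its list of regular expression options can be found here (the difference is that POSIX mode is always on): http://www.scintilla.org/SciTERegEx.html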
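. | Matches any character |
| | Matches the expression on its left or the one on its right. For example, "ab|bc" matches "ab" or "bc" |
[] | Matches any single character in the list. For example, "[ab]" matches "a" or "b"; "[0-9]" matches any digit. |
[^] | Matches any single character not in the list. For example, "[^ab]" matches any character other than "a" and "b"; "[^0-9]" matches any non-digit character. |
* | The character to its left is matched any number of times (zero or more). For example, "be*" matches "b", "be" or "bee". |
+ | The character to its left is matched at least once (one or more times). For example, "be+" matches "be" or "bee" but not "b". |
? | The character to its left is matched zero times or once. For example, "be?" matches "b" or "be" but not "bee". |
^ | The expression to its right is matched at the start of a line. For example, "^A" matches only lines that begin with "A". |
$ | The expression to its left is matched at the end of a line. For example, "e$" matches only lines that end with "e". |
() | Affects the order in which the expression is evaluated, and marks groups within the expression. |
\\ | A literal backslash (\) |
\t | Tab |
\r | Carriage return |
\n | Line feed |
\d | Digit |
\D | Non-digit |
\s | Whitespace |
\S | Non-whitespace |
\w | Word character (letter, digit or underscore) |
\W | Non-word character |
\1-\9 | Back-references to what the preceding parenthesized groups matched, in order |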
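The basic entries in the table above can be tried out quickly in Python. This is only an approximation: Python's `re` module is a different engine from Scintilla's, and the check below assumes (as far as I know, correctly) that alternation, sets and the three quantifiers behave the same way in both. The test strings are made up for illustration.

```python
import re

# Each entry: (pattern, text that should match in full, text that should not)
cases = [
    (r"ab|bc", "bc", "ac"),   # alternation
    (r"[0-9]", "7", "x"),     # set
    (r"[^ab]", "c", "a"),     # negated set
    (r"be*", "bee", "e"),     # zero or more
    (r"be+", "be", "b"),      # one or more
    (r"be?", "b", "bee"),     # zero or one
]

# Map each pattern to (matched the good text, matched the bad text)
results = {
    pattern: (bool(re.fullmatch(pattern, good)), bool(re.fullmatch(pattern, bad)))
    for pattern, good, bad in cases
}

for pattern, outcome in results.items():
    print(pattern, outcome)
```

`re.fullmatch` is used so that each pattern has to cover the whole test string, which is the sense in which the table says "be?" does not match "bee".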
Regular Expressions in SciTE
Purpose
Regular expressions can be used for searching for patterns rather than literals. For example, it is possible to search for variables in SciTE property files, which look like $(name.subname) with the regular expression:
\$([a-z.]+) (or \$\([a-z.]+\) in POSIX mode, where bare parentheses group)
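As a rough illustration of the search above, here is the same idea in Python. Python's `re` groups with bare parentheses, so the literal parentheses are escaped, as they would need to be in POSIX mode. The sample line and the variable names in it are made up for this sketch.

```python
import re

# Literal "$(", then lowercase letters and dots, then literal ")"
VARIABLE = re.compile(r"\$\([a-z.]+\)")

# Hypothetical line in the style of a properties file
line = "command.go=python $(file.name) $(extra.args)"

found = VARIABLE.findall(line)
print(found)
```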
Replacement with regular expressions allows complex transformations with the use of tagged expressions. For example, pairs of numbers separated by a ',' could be reordered by replacing the regular expression:
\([0-9]+\),\([0-9]+\) (or ([0-9]+),([0-9]+) in POSIX mode)
with:
\2,\1
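The same swap can be sketched in Python, which happens to use the same \1 and \2 notation in the replacement string. Python's `re` is not the Scintilla engine, so treat this as an analogy; the sample text is invented.

```python
import re

text = "10,20 and 3,4"

# Two tagged groups of digits around a comma, written with bare parentheses
swapped = re.sub(r"([0-9]+),([0-9]+)", r"\2,\1", text)
print(swapped)
```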
Syntax
Regular expression syntax depends on a parameter: find.replace.regexp.posix
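If set to 0, syntax uses the old Unix style where \( and \) mark tagged sections while ( and ) are themselves.
If set to 1, syntax uses the more common style where ( and ) mark tagged sections while \( and \) are themselves.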
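Going by the examples in this post, the two styles differ only in whether grouping parentheses carry a backslash. Under that assumption, a pattern written in one style can be converted to the other by swapping the escaped and unescaped parentheses. The helper below, `unix_to_posix`, is a hypothetical name for a minimal sketch; it does not treat parentheses inside [] sets specially.

```python
def unix_to_posix(pattern: str) -> str:
    """Swap the roles of ( ) and \\( \\) in a pattern (illustrative helper)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \( and \) become bare grouping parentheses; other escapes pass through
            out.append(nxt if nxt in "()" else ch + nxt)
            i += 2
        elif ch in "()":
            # a literal parenthesis has to be escaped in the other style
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(unix_to_posix(r"\([0-9]+\),\([0-9]+\)"))
print(unix_to_posix(r"\$([a-z.]+)"))
```

Because the swap is symmetric, applying the function twice returns the original pattern.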
[1] char
matches itself, unless it is a special character (metachar): . \ [ ] * + ? ^ $ and, in POSIX mode, ( ).
[2] .
matches any character.
[3] \
matches the character following it, except:
- \a, \b, \f, \n, \r, \t, \v match the corresponding C escape char, respectively BEL, BS, FF, LF, CR, TAB and VT;
  Note that \r and \n are never matched because in Scintilla, regular expression searches are made line by line (stripped of end-of-line chars).
- if not in POSIX mode, when followed by a left or right round bracket (see [8]);
- when followed by a digit 1 to 9 (see [9]);
- when followed by a left or right angle bracket (see [10]);
- when followed by d, D, s, S, w or W (see [11]);
- when followed by x and two hex digits (see [12]).
Backslash is used as an escape character for all other meta-characters, and itself.
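The remark that \r and \n are never matched follows from searching one line at a time. The sketch below imitates that behaviour with Python's `re`; the function name `find_in_lines` and the sample text are made up, and this is a model of the behaviour, not Scintilla's code.

```python
import re


def find_in_lines(text: str, pattern: str):
    """Search each line separately, with end-of-line characters stripped."""
    compiled = re.compile(pattern)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = compiled.search(line)
        if match:
            hits.append((number, match.group()))
    return hits


text = "first\tline\nsecond line\n"

tab_hits = find_in_lines(text, r"\t")        # a tab sits inside line 1
newline_hits = find_in_lines(text, r"e\ns")  # spans two lines, so never found
whole_text_hit = re.search(r"e\ns", text)    # found only when searching the whole text

print(tab_hits, newline_hits, whole_text_hit is not None)
```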
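[4] [set]
matches one of the characters in the set. If the first character in the set is ^, it matches the characters NOT in the set, i.e. it complements the set. A shorthand S-E (start dash end) specifies the set of characters from S up to E, inclusive. The special characters ] and - have no special meaning if they appear as the first chars in the set.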
example | match |
[-]|] | matches these 3 chars (hyphen, closing bracket, vertical bar) |
[]-|] | matches from ] to | chars |
[a-z] | any lowercase alpha |
[^-]] | any char except - and ] |
[^A-Z] | any char except uppercase alpha |
[a-zA-Z] | any alpha |
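Python's `re` reads some of these sets differently (for example, it closes `[-]|]` at the first `]`), so it cannot be used to check the table directly. Instead, here is a small parser that follows the rules as I read them from the examples above: an optional leading ^ negates, a leading - and then a leading ] are literal, and S-E is an inclusive range. `parse_set` is a hypothetical helper written for this post, and it ignores backslash escapes inside sets.

```python
def parse_set(pattern: str, start: int = 0):
    """Parse a bracket set that begins at pattern[start] == '['.

    Returns (chars, negated, end), where end is the index just past
    the closing ']'.
    """
    assert pattern[start] == "["
    i = start + 1
    negated = False
    chars = set()
    if i < len(pattern) and pattern[i] == "^":
        negated = True
        i += 1
    # A leading '-' and then a leading ']' are taken literally.
    if i < len(pattern) and pattern[i] == "-":
        chars.add("-")
        i += 1
    prev = None
    if i < len(pattern) and pattern[i] == "]":
        chars.add("]")
        prev = "]"
        i += 1
    while i < len(pattern) and pattern[i] != "]":
        ch = pattern[i]
        is_range = (
            ch == "-"
            and prev is not None
            and i + 1 < len(pattern)
            and pattern[i + 1] != "]"
        )
        if is_range:
            # S-E shorthand: every character from S up to E, inclusive
            last = pattern[i + 1]
            chars.update(chr(c) for c in range(ord(prev), ord(last) + 1))
            prev = None
            i += 2
        else:
            chars.add(ch)
            prev = ch
            i += 1
    if i >= len(pattern):
        raise ValueError("unterminated set")
    return chars, negated, i + 1


for example in ["[-]|]", "[]-|]", "[^-]]", "[a-zA-Z]"]:
    chars, negated, end = parse_set(example)
    print(example, "negated" if negated else "plain", len(chars), "chars")
```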
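[5] *
any regular expression form [1] to [4] (except the [8], [9] and [10] forms of [3]), followed by the closure char (*), matches zero or more matches of that form.
[6] +
same as [5], except it matches one or more.
[5-6]
Both [5] and [6] are greedy (they match as much as possible) unless they are followed by the 'lazy' quantifier (?), in which case both [5] and [6] try to match as little as possible.
[7] ?
same as [5], except it matches zero or one.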
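Python's `re` module has the same *, + and ? quantifiers, and it also accepts a trailing ? to make * or + match as little as possible. Assuming Scintilla's lazy quantifier works the same way, the difference between a greedy and a lazy match looks like this (the sample text is invented):

```python
import re

text = "<a><b>"

# Greedy: .+ grabs as much as it can, so the match runs to the last '>'
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? stops at the first '>' that lets the pattern succeed
lazy = re.search(r"<.+?>", text).group()

print(greedy, lazy)
```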
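[8]
a regular expression in the form [1] to [13], enclosed as \(form\) (or (form) in POSIX mode), matches what form matches. The enclosure creates a set of tags, used for [9] and for pattern substitution. The tagged forms are numbered starting from 1.
[9]
a \ followed by a digit 1 to 9 matches whatever a previously tagged regular expression ([8]) matched.
[10] \< \>
a regular expression starting with a \< construct and/or ending with a \> construct restricts the pattern matching to the beginning of a word, and/or the end of a word.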
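Tagged groups, back-references and word anchors can be illustrated in Python, with one substitution that should be stated plainly: Python's `re` has no \< and \> constructs, so the sketch uses \b (a word boundary at either end of a word) as the closest equivalent. The sample sentences are made up.

```python
import re

# A tagged group followed by a back-reference to it: finds a doubled word
doubled = re.search(r"\b(\w+) \1\b", "it was was fine")

# Whole-word search: \b stands in for Scintilla's \< and \>
whole_words = re.findall(r"\bcat\b", "cat concat cats cat")

print(doubled.group(1), len(whole_words))
```

Only the first and last "cat" count as whole words; "concat" and "cats" are rejected by the boundary checks.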
[11]
a backslash followed by d, D, s, S, w or W, becomes a character class (both inside and outside sets []).
- d: decimal digits
- D: any char except decimal digits
- s: whitespace (space, \t \n \r \f \v)
- S: any char except whitespace (see above)
- w: alphanumeric & underscore (changed by user setting)
- W: any char except alphanumeric & underscore (see above)
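Python's `re` uses the same letters for these classes and, like the description above, allows them inside [] sets. The `re.ASCII` flag is passed below so that \w is limited to A-Z, a-z, 0-9 and underscore; whether that matches a given Scintilla configuration depends on the user setting mentioned above, so take it as an assumption. The sample strings are invented.

```python
import re

# A class inside a set: digits or a dot
numbers = re.findall(r"[\d.]+", "pi = 3.14, e = 2.718")

# \w outside a set: runs of alphanumerics and underscores
words = re.findall(r"\w+", "foo_bar baz-1", re.ASCII)

# \s: any run of whitespace, here a space, a tab and another space
parts = re.split(r"\s+", "a \t b")

print(numbers, words, parts)
```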
[12] \xHH
a backslash followed by x and two hex digits becomes the character whose ASCII code is equal to these digits. If not followed by two digits, it is the 'x' char itself.
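Python's own \x escape raises an error when the two hex digits are missing, so it does not show the fallback described here. The function below is a minimal sketch of the rule as stated; `decode_hex_escape` is a hypothetical name, not part of Scintilla or Python.

```python
import string


def decode_hex_escape(pattern: str, i: int):
    """Decode a \\x escape that starts at pattern[i].

    Returns (character, index of the next unread position).
    """
    assert pattern[i:i + 2] == "\\x"
    digits = pattern[i + 2:i + 4]
    if len(digits) == 2 and all(d in string.hexdigits for d in digits):
        # two hex digits: the character with that code
        return chr(int(digits, 16)), i + 4
    # otherwise the escape is just the letter x
    return "x", i + 2


print(decode_hex_escape(r"\x41bc", 0))
print(decode_hex_escape(r"\xyz", 0))
```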
[13]
a composite regular expression xy, where x and y are in the form [1] to [12], matches the longest match of x followed by a match for y.
[14] ^ $
a regular expression starting with a ^ character and/or ending with a $ character restricts the pattern matching to the beginning of the line, or the end of line. Elsewhere in the pattern, ^ and $ are treated as ordinary characters.
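The anchors can be tried in Python by testing one line at a time, which mirrors the line-by-line searching described earlier. Note that Python does not share the last rule: a ^ or $ in the middle of a Python pattern is still an anchor, not an ordinary character, so the sketch only uses them at the ends. The sample lines are made up.

```python
import re

lines = ["Apple", "banana", "grape", "Avocado"]

starts_with_a = [s for s in lines if re.search(r"^A", s)]
ends_with_e = [s for s in lines if re.search(r"e$", s)]

# Both anchors together constrain the whole line
both = [s for s in lines if re.search(r"^A.*e$", s)]

print(starts_with_a, ends_with_e, both)
```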