Regular Expressions
\d   [0-9]            Digit character
\D   [^0-9]           Any character except a digit
\s   [ \t\r\n\f]      Whitespace character
\S   [^ \t\r\n\f]     Any character except whitespace
\w   [A-Za-z0-9_]     Word character
\W   [^A-Za-z0-9_]    Any character except a word character
POSIX bracket expressions (used inside a character class, e.g. [[:alpha:]]):
[:alnum:]   Alphanumeric
[:alpha:]   Uppercase or lowercase letter
[:blank:]   Space and tab
[:cntrl:]   Control characters (at least 0x00–0x1f, 0x7f)
[:digit:]   Digit
[:graph:]   Printable character excluding space
[:lower:]   Lowercase letter
[:print:]   Any printable character (including space)
[:punct:]   Printable character excluding space and alphanumeric
[:space:]   Whitespace (same as \s)
[:upper:]   Uppercase letter
[:xdigit:]  Hex digit (0–9, a–f, A–F)
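A short Ruby sketch exercising a few of these classes on a made-up sample string (the string and variable names are illustrative only). In Ruby the POSIX names are recognized only inside a bracket expression, so [:alpha:] is written /[[:alpha:]]/ in a pattern. String#match? assumes Ruby 2.4 or later.

```ruby
text = "Order #42 shipped\ton 2024-05-01!"

digits      = text.scan(/\d+/)           # runs of digits
words       = text.scan(/\w+/)           # runs of word characters [A-Za-z0-9_]
spaces      = text.scan(/\s/)            # individual whitespace characters
letters     = text.scan(/[[:alpha:]]+/)  # runs of letters (POSIX class inside [ ])
punctuation = text.scan(/[[:punct:]]/)   # individual punctuation characters
hex_ok      = "1fA9".match?(/\A[[:xdigit:]]+\z/)  # whole string is hex digits?

p digits       # => ["42", "2024", "05", "01"]
p words        # => ["Order", "42", "shipped", "on", "2024", "05", "01"]
p spaces       # => [" ", " ", "\t", " "]
p letters      # => ["Order", "shipped", "on"]
p punctuation  # => ["#", "-", "-", "!"]
p hex_ok       # => true
```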
r*       matches zero or more occurrences of r.
r+       matches one or more occurrences of r.
r?       matches zero or one occurrence of r.
r{m,n}   matches at least “m” and at most “n” occurrences of r.
r{m,}    matches at least “m” occurrences of r.
r{m}     matches exactly “m” occurrences of r.
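To see each quantifier in isolation, the sketch below anchors a pattern with \A and \z and reports which candidate strings it matches in full. The helper name full_matches and the candidate list are hypothetical, chosen only for illustration. In each pattern the quantifier applies to the single b before it.

```ruby
# Hypothetical helper: keep only the candidates the whole pattern matches.
def full_matches(pattern, candidates)
  re = /\A(?:#{pattern})\z/   # anchor both ends so partial matches don't count
  candidates.select { |s| re.match?(s) }
end

candidates = %w[ac abc abbc abbbc abbbbc]

p full_matches('ab*c', candidates)      # => ["ac", "abc", "abbc", "abbbc", "abbbbc"]
p full_matches('ab+c', candidates)      # => ["abc", "abbc", "abbbc", "abbbbc"]
p full_matches('ab?c', candidates)      # => ["ac", "abc"]
p full_matches('ab{2,3}c', candidates)  # => ["abbc", "abbbc"]
p full_matches('ab{2,}c', candidates)   # => ["abbc", "abbbc", "abbbbc"]
p full_matches('ab{2}c', candidates)    # => ["abbc"]
```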
Backslash Sequences in the Substitution
The sequences \1, \2, and so on are available in a pattern, where \n stands for the
nth group matched so far. The same sequences are available in the second argument of
sub and gsub.
"fred:smith".sub(/(\w+):(\w+)/, '\2, \1')   # => "smith, fred"
"nercpyitno".gsub(/(.)(.)/, '\2\1')          # => "encryption"
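A small sketch contrasting the two places \1 can appear: inside the pattern it must match the same text group 1 already captured, and inside the replacement it inserts what group 1 captured for the current match. The sample strings are illustrative only.

```ruby
# \1 in the pattern: a word character followed by the same character again.
doubled = "bookkeeper".scan(/(\w)\1/).flatten

# \1 in the replacement: replace each doubled pair with a single copy.
collapsed = "bookkeeper".gsub(/(\w)\1/, '\1')

# Several groups reordered in one replacement string.
reordered = "2024-05-01".sub(/(\d+)-(\d+)-(\d+)/, '\3/\2/\1')

p doubled    # => ["o", "k", "e"]
p collapsed  # => "bokeper"
p reordered  # => "01/05/2024"
```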
Additional backslash sequences work in substitution strings: \& (last match), \+ (last
matched group), \` (string prior to match), \' (string after match), and \\ (a literal
backslash).
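The additional sequences can be tried one at a time with sub on a sample string (the string and variable names are illustrative). One Ruby-specific wrinkle: inside a single-quoted literal, \' is itself an escaped quote, so the sketch writes the \' sequence in a double-quoted string as "\\'".

```ruby
str = "hello world"

# \& : the entire text matched
whole = str.sub(/o/, '[\&]')
# \+ : the last group that matched (group 2 here)
last = str.sub(/(l+)(o)/, '<\+>')
# \` : the text before the match
before = str.sub(/ /, '(\`)')
# \' : the text after the match
after = str.sub(/ /, "(\\')")

p whole   # => "hell[o] world"
p last    # => "he<o> world"
p before  # => "hello(hello)world"
p after   # => "hello(world)world"
```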
It gets confusing if you want to include a literal backslash in a substitution. The obvious
thing is to write
str.gsub(/\\/, '\\\\')
Clearly, this code is trying to replace each backslash in str with two. The programmer
doubled up the backslashes in the replacement text, knowing that they’d be converted
to \\ in syntax analysis. However, when the substitution occurs, the regular expression
engine performs another pass through the string, converting \\ to \, so the net effect
is to replace each single backslash with another single backslash. You need to write
gsub(/\\/, '\\\\\\\\')!
str = 'a\b\c'                    # str contains: a\b\c
str.gsub(/\\/, '\\\\\\\\')       # returns a string containing: a\\b\\c
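The eight-backslash form is not the only option. The sketch below (variable names are illustrative) compares the naive replacement, the eight-backslash version, a replacement built from \& (which stands for the matched backslash), and the block form of gsub, whose return value is inserted as-is without a second escape pass.

```ruby
str = 'a\b\c'   # five characters: a \ b \ c

# Naive: '\\\\' is two backslashes in the string, which the substitution
# engine reads as one escaped backslash, so nothing changes.
naive = str.gsub(/\\/, '\\\\')

# Eight backslashes in source -> four in the string -> two in the output.
eight = str.gsub(/\\/, '\\\\\\\\')

# '\\&\\&' is the string \&\& : the matched backslash emitted twice.
amp = str.gsub(/\\/, '\\&\\&')

# Block form: the block's result is used literally.
block = str.gsub(/\\/) { '\\\\' }

puts naive   # prints a\b\c
puts eight   # prints a\\b\\c
puts amp     # prints a\\b\\c
puts block   # prints a\\b\\c
```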
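This article covers the metacharacters and pattern-matching syntax of regular expressions, including common predefined classes such as \d and \w, and matching with repetition quantifiers such as *, +, and ?. It also covers how to refer to capture groups in substitutions.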