Accessing Backreferences ruby-优快云博客

本文介绍了使用正则表达式进行字符串匹配的方法，包括特殊变量引用、非捕获组的应用及如何通过不同方法获取匹配子串的位置信息。

The special global variables $1, $2, and so on, can be used to reference matches:

str="a123b45c678"

if /(a\d+)(b\d+)(c\d+)/=~str

puts "Matches are: '#{$1}','#{$2}','#{$3}'"

end

Within a substitution such as sub or gsub, these variables cannot be used:

str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, "1st=#$1, 2nd=#$2, 3rd=#$3")
# "1st=, 2nd=, 3rd="

Why didn't this work? Because the arguments to sub are evaluated before sub is called. This code is equivalent:

str = "a123b45c678"
s2 = "1st=#$1, 2nd=#$2, 3rd=#$3"
reg = /(a\d+)(b\d+)(c\d+)/
str.sub(reg,s2)
# "1st=, 2nd=, 3rd="

This code, of course, makes it much clearer that the values $1 through $3 are unrelated to the match done inside the sub call.

In this kind of case, the special codes \1, \2, and so on, can be used:

str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, '1st=\1, 2nd=\2, 3rd=\3')
# "1st=a123, 2nd=b45, 3rd=c768"

Notice that we used single quotes (hard quotes) in the preceding example. If we used double quotes (soft quotes) in a straightforward way, the backslashed items would be interpreted as octal escape sequences:

str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, "1st=\1, 2nd=\2, 3rd=\3")
# "1st=\001, 2nd=\002, 3rd=\003"

The way around this is to double-escape:

str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/, "1st=\\1, 2nd=\\2, 3rd=\\3")
# "1st=a123, 2nd=b45, 3rd=c678"

It's also possible to use the block form of a substitution, in which case the global variables may be used:

str = "a123b45c678"
str.sub(/(a\d+)(b\d+)(c\d+)/)  { "1st=#$1, 2nd=#$2, 3rd=#$3" }
# "1st=a123, 2nd=b45, 3rd=c678"

When using a block in this way, it is not possible to use the special backslashed numbers inside a double-quoted string (or even a single-quoted one). This is reasonable if you think about it.

As an aside here, I will mention the possibility of noncapturing groups. Sometimes you may want to regard characters as a group for purposes of crafting a regular expression; but you may not need to capture the matched value for later use. In such a case, you can use a noncapturing group, denoted by the (?:...) syntax:

str = "a123b45c678"
str.sub(/(a\d+)(?:b\d+)(c\d+)/, "1st=\\1, 2nd=\\2, 3rd=\\3")
# "1st=a123, 2nd=c678, 3rd="

In the preceding example, the second grouping was thrown away, and what was the third submatch became the second.

I personally don't like either the \1 or the $1 notations. They are convenient sometimes, but it isn't ever necessary to use them. We can do it in a "prettier," more object-oriented way.

The class method Regexp.last_match returns an object of class MatchData (as does the instance method match). This object has instance methods that enable the programmer to access backreferences.

The MatchData object is manipulated with a bracket notation as though it were an array of matches. The special element 0 contains the text of the entire matched string. Thereafter, element n refers to the nth match:

pat = /(.+[aiu])(.+[aiu])(.+[aiu])(.+[aiu])/i
# Four identical groups in this pattern
refs = pat.match("Fujiyama")
# refs is now: ["Fujiyama","Fu","ji","ya","ma"]
x = refs[1]
y = refs[2..3]
refs.to_a.each {|x| print "#{x}\n"}

Note that the object refs is not a true array. Thus when we want to treat it as one by using the iterator each, we must use to_a (as shown) to convert it to an array.

We may use more than one technique to locate a matched substring within the original string. The methods begin and end return the beginning and ending offsets of a match. (It is important to realize that the ending offset is really the index of the next character after the match.)

str = "alpha beta gamma delta epsilon"
#      0....5....0....5....0....5....
#      (for  your counting convenience)
pat = /(b[^ ]+ )(g[^ ]+ )(d[^ ]+ )/
# Three words, each one a single match
refs = pat.match(str)
# "beta "
p1 = refs.begin(1)         # 6
p2 = refs.end(1)           # 11
# "gamma "
p3 = refs.begin(2)         # 11
p4 = refs.end(2)           # 17
# "delta "
p5 = refs.begin(3)         # 17
p6 = refs.end(3)           # 23
# "beta gamma delta"
p7 = refs.begin(0)         # 6
p8 = refs.end(0)           # 23

Similarly, the offset method returns an array of two numbers, which are the beginning and ending offsets of that match. To continue the previous example:

range0 = refs.offset(0)    # [6,23]
range1 = refs.offset(1)    # [6,11]
range2 = refs.offset(2)    # [11,17]
range3 = refs.offset(3)    # [17,23]

The portions of the string before and after the matched substring can be retrieved by the instance methods pre_match and post_match, respectively. To continue the previous example:

before = refs.pre_match    # "alpha "
after  = refs.post_match   # "epsilon"

本文转自 fsjoy1983 51CTO博客，原文链接：http://blog.51cto.com/fsjoy/68556，如需转载请自行联系原作者