以下各节介绍 .NET Framework 正则表达式类。
Regex
类表示不可变(只读)正则表达式类。它还包含各种静态方法,允许在不显式创建其他类的实例的情况下使用其他正则表达式类。
以下代码示例创建了 Regex 类的实例并在初始化对象时定义一个简单的正则表达式。请注意,使用了附加的反斜杠作为转义字符,它将 \s
匹配字符类中的反斜杠指定为原义字符。
[Visual Basic]
' Declare object variable of type Regex.
Dim r As Regex
' Create a Regex object and define its regular expression.
r = New Regex("\s2000")
[C#]
// Declare object variable of type Regex.
Regex r;
// Create a Regex object and define its regular expression.
r = new Regex("\\s2000");
Match
类表示正则表达式匹配操作的结果。以下示例使用 Regex 类的 Match 方法返回 Match 类型的对象,以便找到输入字符串中的第一个匹配项。此示例使用 Match 类的 属性来指示是否已找到匹配。
[Visual Basic]
' cCreate a new Regex object.
Dim r As New Regex("abc")
' Find a single match in the input string.
Dim m As Match = r.Match("123abc456")
If m.Success Then
' Print out the character position where a match was found.
' (Character position 3 in this case.)
Console.WriteLine("Found match at position " & m.Index.ToString())
End If
[C#]
// Create a new Regex object.
Regex r = new Regex("abc");
// Find a single match in the string.
Match m = r.Match("123abc456");
if (m.Success)
{
// Print out the character position where a match was found.
// (Character position 3 in this case.)
Console.WriteLine("Found match at position " + m.Index);
}
MatchCollection
类表示成功的非重叠匹配的序列。该集合为不可变(只读)的,并且没有公共构造函数。MatchCollection 的实例是由 Regex.Matches 属性返回的。
以下示例使用 Regex 类的 方法,通过在输入字符串中找到的所有匹配填充 MatchCollection。该示例将此集合复制到一个字符串数组和一个整数数组中,其中字符串数组用以保存每个匹配项,整数数组用以指示每个匹配项的位置。
[Visual Basic]
Dim mc As MatchCollection
Dim results(20) As String
Dim matchposition(20) As Integer
' Create a new Regex object and define the regular expression.
Dim r As New Regex("abc")
' Use the Matches method to find all matches in the input string.
mc = r.Matches("123abc4abcd")
' Loop through the match collection to retrieve all
' matches and positions.
Dim i As Integer
For i = 0 To mc.Count - 1
' Add the match string to the string array.
results(i) = mc(i).Value
' Record the character position where the match was found.
matchposition(i) = mc(i).Index
Next i
[C#]
MatchCollection mc;
String[] results = new String[20];
int[] matchposition = new int[20];
// Create a new Regex object and define the regular expression.
Regex r = new Regex("abc");
// Use the Matches method to find all matches in the input string.
mc = r.Matches("123abc4abcd");
// Loop through the match collection to retrieve all
// matches and positions.
for (int i = 0; i < mc.Count; i++)
{
// Add the match string to the string array.
results[i] = mc[i].Value;
// Record the character position where the match was found.
matchposition[i] = mc[i].Index;
}
GroupCollection
类表示捕获的组的集合并返回单个匹配中捕获的组的集合。该集合为不可变(只读)的,并且没有公共构造函数。GroupCollection 的实例在 Match.Groups 属性返回的集合中返回。
以下控制台应用程序示例查找并输出由正则表达式捕获的组的数目。有关如何提取组集合的每一成员中的各个捕获项的示例,请参见下面一节的 Capture Collection 示例。
[Visual Basic]
Imports System
Imports System.Text.RegularExpressions
Public Class RegexTest
Public Shared Sub RunTest()
' Define groups "abc", "ab", and "b".
Dim r As New Regex("(a(b))c")
Dim m As Match = r.Match("abdabc")
Console.WriteLine("Number of groups found = " _
& m.Groups.Count.ToString())
End Sub
Public Shared Sub Main()
RunTest()
End Sub
End Class
[C#]
using System;
using System.Text.RegularExpressions;
public class RegexTest
{
public static void RunTest()
{
// Define groups "abc", "ab", and "b".
Regex r = new Regex("(a(b))c");
Match m = r.Match("abdabc");
Console.WriteLine("Number of groups found = " + m.Groups.Count);
}
public static void Main()
{
RunTest();
}
}
该示例产生下面的输出。
[Visual Basic]
Number of groups found = 3
[C#]
Number of groups found = 3
CaptureCollection
类表示捕获的子字符串的序列,并且返回由单个捕获组执行的捕获的集合。由于,捕获组可以在单个匹配中捕获多个字符串。Captures 属性(CaptureCollection 类的对象)是作为 和 类的成员提供的,以便于对捕获的子字符串的集合的访问。
例如,如果使用正则表达式 ((a(b))c)+
(其中 + 限定符指定一个或多个匹配)从字符串“abcabcabc”中捕获匹配,则子字符串的每一匹配的 Group 的 CaptureCollection 将包含三个成员。
以下控制台应用程序示例使用正则表达式 (Abc)+
来查找字符串“XYZAbcAbcAbcXYZAbcAb”中的一个或多个匹配。该示例阐释了使用 Captures 属性来返回多组捕获的子字符串。
[Visual Basic]
Imports System
Imports System.Text.RegularExpressions
Public Class RegexTest
Public Shared Sub RunTest()
Dim counter As Integer
Dim m As Match
Dim cc As CaptureCollection
Dim gc As GroupCollection
' Look for groupings of "Abc".
Dim r As New Regex("(Abc)+")
' Define the string to search.
m = r.Match("XYZAbcAbcAbcXYZAbcAb")
gc = m.Groups
' Print the number of groups.
Console.WriteLine("Captured groups = " & gc.Count.ToString())
' Loop through each group.
Dim i, ii As Integer
For i = 0 To gc.Count - 1
cc = gc(i).Captures
counter = cc.Count
' Print number of captures in this group.
Console.WriteLine("Captures count = " & counter.ToString())
' Loop through each capture in group.
For ii = 0 To counter - 1
' Print capture and position.
Console.WriteLine(cc(ii).ToString() _
& " Starts at character " & cc(ii).Index.ToString())
Next ii
Next i
End Sub
Public Shared Sub Main()
RunTest()
End Sub
End Class
[C#]
using System;
using System.Text.RegularExpressions;
public class RegexTest
{
public static void RunTest()
{
int counter;
Match m;
CaptureCollection cc;
GroupCollection gc;
// Look for groupings of "Abc".
Regex r = new Regex("(Abc)+");
// Define the string to search.
m = r.Match("XYZAbcAbcAbcXYZAbcAb");
gc = m.Groups;
// Print the number of groups.
Console.WriteLine("Captured groups = " + gc.Count.ToString());
// Loop through each group.
for (int i=0; i < gc.Count; i++)
{
cc = gc[i].Captures;
counter = cc.Count;
// Print number of captures in this group.
Console.WriteLine("Captures count = " + counter.ToString());
// Loop through each capture in group.
for (int ii = 0; ii < counter; ii++)
{
// Print capture and position.
Console.WriteLine(cc[ii] + " Starts at character " +
cc[ii].Index);
}
}
}
public static void Main() {
RunTest();
}
}
此示例返回下面的输出结果。
[Visual Basic]
Captured groups = 2
Captures count = 1
AbcAbcAbc Starts at character 3
Captures count = 3
Abc Starts at character 3
Abc Starts at character 6
Abc Starts at character 9
[C#]
Captured groups = 2
Captures count = 1
AbcAbcAbc Starts at character 3
Captures count = 3
Abc Starts at character 3
Abc Starts at character 6
Abc Starts at character 9
Group
类表示来自单个捕获组的结果。因为 Group 可以在单个匹配中捕获零个、一个或更多的字符串(使用限定符),所以它包含 Capture 对象的集合。因为 Group 继承自 Capture,所以可以直接访问最后捕获的子字符串(Group 实例本身等价于 Captures 属性返回的集合的最后一项)。
Group 的实例是由 Match.Groups(groupnum) 属性返回的,或者在使用“(?<groupname>)”分组构造的情况下,是由 Match.Groups("groupname") 属性返回的。
以下代码示例使用嵌套的分组构造来将子字符串捕获到组中。
[Visual Basic]
Dim matchposition(20) As Integer
Dim results(20) As String
' Define substrings abc, ab, b.
Dim r As New Regex("(a(b))c")
Dim m As Match = r.Match("abdabc")
Dim i As Integer = 0
While Not (m.Groups(i).Value = "")
' Copy groups to string array.
results(i) = m.Groups(i).Value
' Record character position.
matchposition(i) = m.Groups(i).Index
i = i + 1
End While
[C#]
int[] matchposition = new int[20];
String[] results = new String[20];
// Define substrings abc, ab, b.
Regex r = new Regex("(a(b))c");
Match m = r.Match("abdabc");
for (int i = 0; m.Groups[i].Value != ""; i++)
{
// Copy groups to string array.
results[i]=m.Groups[i].Value;
// Record character position.
matchposition[i] = m.Groups[i].Index;
}
此示例返回下面的输出结果。
[Visual Basic]
results(0) = "abc" matchposition(0) = 3
results(1) = "ab" matchposition(1) = 3
results(2) = "b" matchposition(2) = 4
[C#]
results[0] = "abc" matchposition[0] = 3
results[1] = "ab" matchposition[1] = 3
results[2] = "b" matchposition[2] = 4
以下代码示例使用命名的分组构造,从包含“DATANAME:VALUE”格式的数据的字符串中捕获子字符串,正则表达式通过冒号“:”拆分数据。
[Visual Basic]
Dim r As New Regex("^(?<name>\w+):(?<value>\w+)")
Dim m As Match = r.Match("Section1:119900")
[C#]
Regex r = new Regex("^(?<name>\\w+):(?<value>\\w+)");
Match m = r.Match("Section1:119900");
此正则表达式返回下面的输出结果。
[Visual Basic]
m.Groups("name").Value = "Section1"
m.Groups("value").Value = "119900"
[C#]
m.Groups["name"].Value = "Section1"
m.Groups["value"].Value = "119900"
Capture
类包含来自单个子表达式捕获的结果。
以下示例在 Group 集合中循环,从 Group 的每一成员中提取 Capture 集合,并且将变量 posn 和 length 分别分配给找到每一字符串的初始字符串中的字符位置,以及每一字符串的长度。
[Visual Basic]
Dim r As Regex
Dim m As Match
Dim cc As CaptureCollection
Dim posn, length As Integer
r = New Regex("(abc)*")
m = r.Match("bcabcabc")
Dim i, j As Integer
i = 0
While m.Groups(i).Value <> ""
' Grab the Collection for Group(i).
cc = m.Groups(i).Captures
For j = 0 To cc.Count - 1
' Position of Capture object.
posn = cc(j).Index
' Length of Capture object.
length = cc(j).Length
Next j
i += 1
End While
[C#]
Regex r;
Match m;
CaptureCollection cc;
int posn, length;
r = new Regex("(abc)*");
m = r.Match("bcabcabc");
for (int i=0; m.Groups[i].Value != ""; i++)
{
// Capture the Collection for Group(i).
cc = m.Groups[i].Captures;
for (int j = 0; j < cc.Count; j++)
{
// Position of Capture object.
posn = cc[j].Index;
// Length of Capture object.
length = cc[j].Length;
}
}