Regular Expressions in Practice - CSDN Blog

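Regex Class (.NET Framework 4)

Represents an immutable regular expression.

Namespace: System.Text.RegularExpressions
Assembly: System (in System.dll)

Syntax
--------------------------------------------------------------------------------

    [SerializableAttribute]
    public class Regex : ISerializable

Remarks
--------------------------------------------------------------------------------

The Regex class represents the .NET Framework's regular expression engine. It can be used to quickly parse large amounts of text to find specific character patterns; to extract, edit, replace, or delete text substrings; or to add the extracted strings to a collection in order to generate a report.

Note: If your primary interest is to validate a string by determining whether it conforms to a particular pattern, you can use the System.Configuration.RegexStringValidator class.

To use regular expressions, you define the pattern that you want to identify in a text stream by using the syntax documented in Regular Expression Language Elements. Next, you can optionally instantiate a Regex object. Finally, you perform some operation, such as replacing text that matches the regular expression pattern, or identifying a pattern match.

Regex vs. String Methods

The System.String class includes several search and comparison methods that you can use to match text against a fixed string. For example, the String.Contains, String.EndsWith, and String.StartsWith methods determine whether a string instance contains a specified substring; the String.IndexOf, String.IndexOfAny, String.LastIndexOf, and String.LastIndexOfAny methods return the starting position of a specified substring in a string. Use the methods of the System.String class when you are searching for a specific string. Use the Regex class when you are searching for a specific pattern in a string. For more information and examples, see .NET Framework Regular Expressions.

Static vs. Instance Methods

After you define a regular expression pattern, you can provide it to the regular expression engine in either of two ways:

- By instantiating a Regex object that represents the regular expression. To do this, you pass the regular expression pattern to a Regex constructor. A Regex object is immutable; once you instantiate a Regex object with a regular expression, that object's regular expression cannot be changed.

- By supplying both the regular expression and the text to search to a static (Shared in Visual Basic) Regex method. This enables you to use a regular expression without explicitly creating a Regex object.

All Regex pattern identification methods include both static and instance overloads.

The regular expression engine must compile a particular pattern before the pattern can be used. Because Regex objects are immutable, this is a one-time procedure that occurs when a Regex class constructor or a static method is called. To eliminate the need to repeatedly compile a single regular expression, the regular expression engine caches the compiled regular expressions used in static method calls. As a result, regular expression pattern-matching methods offer comparable performance for static and instance methods.

Important: In the .NET Framework versions 1.0 and 1.1, all compiled regular expressions were cached, whether they were used in instance or static method calls. Starting with the .NET Framework 2.0, only regular expressions used in static method calls are cached.

However, the caching system implemented by the regular expression engine can adversely affect performance in the following two cases:

- When you make static method calls with a large number of regular expressions. By default, the regular expression engine caches the 15 most recently used static regular expressions. If your application uses more than 15 static regular expressions, some regular expressions must be recompiled. To prevent this recompilation, you can increase the Regex.CacheSize property to an appropriate value.

- When your application instantiates new Regex objects with regular expressions that have previously been compiled. For example, the following code defines a regular expression to locate duplicated words in each line of a text stream. Although the example uses a single regular expression, it instantiates a new Regex object to process each line of text. This results in the recompilation of the regular expression with each iteration of the loop.

    // filename is assumed to hold the path of the text file to scan.
    StreamReader sr = new StreamReader(filename);
    string input;
    string pattern = @"\b(\w+)\s\1\b";
    while (sr.Peek() >= 0)
    {
        input = sr.ReadLine();
        // A new Regex object is created (and the pattern recompiled) for every line.
        Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
        MatchCollection matches = rgx.Matches(input);
        if (matches.Count > 0)
        {
            Console.WriteLine("{0} ({1} matches):", input, matches.Count);
            foreach (Match match in matches)
                Console.WriteLine("   " + match.Value);
        }
    }
    sr.Close();

To prevent recompilation, the application should instantiate a single Regex object that is accessible to all code that requires it, as shown in the following rewritten example.

    // filename is assumed to hold the path of the text file to scan.
    StreamReader sr = new StreamReader(filename);
    string input;
    string pattern = @"\b(\w+)\s\1\b";
    // The Regex object is created once, outside the loop, and reused for every line.
    Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
    while (sr.Peek() >= 0)
    {
        input = sr.ReadLine();
        MatchCollection matches = rgx.Matches(input);
        if (matches.Count > 0)
        {
            Console.WriteLine("{0} ({1} matches):", input, matches.Count);
            foreach (Match match in matches)
                Console.WriteLine("   " + match.Value);
        }
    }
    sr.Close();

Performing Regular Expression Operations

Whether you decide to instantiate a Regex object and call its methods or to call static methods, the Regex class offers the following pattern-matching functionality:

- Validation of a match. You call the IsMatch method to determine whether a match is present.

- Retrieval of a single match. You call the Match method to retrieve a Match object that represents the first match in a string or in part of a string. Subsequent matches can be retrieved by calling the Match.NextMatch method.

- Retrieval of all matches. You call the Matches method to retrieve a System.Text.RegularExpressions.MatchCollection object that represents all the matches found in a string or in part of a string.

- Replacement of matched text. You call the Replace method to replace matched text. The replacement text can also be defined by a regular expression. In addition, some of the Replace methods include a MatchEvaluator parameter that enables you to define the replacement text programmatically.

- Creation of a string array that is formed from parts of an input string. You call the Split method to split an input string at positions that are defined by the regular expression.

In addition to its pattern-matching methods, the Regex class includes several special-purpose methods:

- The Escape method escapes any characters that may be interpreted as regular expression operators in a regular expression or input string.

- The Unescape method removes these escape characters.

- The CompileToAssembly method creates an assembly that contains predefined regular expressions. The .NET Framework includes examples of these special-purpose assemblies in the System.Web.RegularExpressions namespace.

Examples
--------------------------------------------------------------------------------

The following example uses a regular expression to check for repeated occurrences of words in a string. The regular expression \b(?<word>\w+)\s+(\k<word>)\b can be interpreted as shown in the following table.

    Pattern         Description
    \b              Start the match at a word boundary.
    (?<word>\w+)    Match one or more word characters up to a word boundary. Name this captured group word.
    \s+             Match one or more white-space characters.
    (\k<word>)      Match the captured group that is named word.
    \b              Match a word boundary.

    using System;
    using System.Text.RegularExpressions;

    public class Test
    {
        public static void Main()
        {
            // Define a regular expression for repeated words.
            Regex rx = new Regex(@"\b(?<word>\w+)\s+(\k<word>)\b",
                                 RegexOptions.Compiled | RegexOptions.IgnoreCase);

            // Define a test string (note the two spaces between "fox" and "fox").
            string text = "The the quick brown fox  fox jumped over the lazy dog dog.";

            // Find matches.
            MatchCollection matches = rx.Matches(text);

            // Report the number of matches found.
            Console.WriteLine("{0} matches found in:\n   {1}", matches.Count, text);

            // Report on each match.
            foreach (Match match in matches)
            {
                GroupCollection groups = match.Groups;
                Console.WriteLine("'{0}' repeated at positions {1} and {2}",
                                  groups["word"].Value,
                                  groups[0].Index,
                                  groups[1].Index);
            }
        }
    }
    // The example produces the following output to the console:
    //       3 matches found in:
    //          The the quick brown fox  fox jumped over the lazy dog dog.
    //       'The' repeated at positions 0 and 4
    //       'fox' repeated at positions 20 and 25
    //       'dog' repeated at positions 50 and 54

The next example illustrates the use of a regular expression to check whether a string either represents a currency value or has the correct format to represent a currency value. In this case, the regular expression is built dynamically from the NumberFormatInfo.CurrencyDecimalSeparator, NumberFormatInfo.CurrencyDecimalDigits, NumberFormatInfo.CurrencySymbol, NumberFormatInfo.NegativeSign, and NumberFormatInfo.PositiveSign properties for the user's current culture. If the system's current culture is en-US, the resulting regular expression is ^\s*[\+-]?\s?\$?\s?(\d*\.?\d{2}?){1}$. This regular expression can be interpreted as shown in the following table.

    Pattern                Description
    ^                      Start at the beginning of the string.
    \s*                    Match zero or more white-space characters.
    [\+-]?                 Match zero or one occurrence of either the positive sign or the negative sign.
    \s?                    Match zero or one white-space character.
    \$?                    Match zero or one occurrence of the dollar sign.
    \s?                    Match zero or one white-space character.
    \d*                    Match zero or more decimal digits.
    \.?                    Match zero or one decimal point symbol.
    \d{2}?                 Match two decimal digits. The trailing ? only makes the {2} quantifier lazy; it does not make the digits optional.
    (\d*\.?\d{2}?){1}      Match the pattern of integral and fractional digits separated by a decimal point symbol exactly once.
    $                      Match the end of the string.

In this case, the regular expression assumes that a valid currency string does not contain group separator symbols, and that it has either no fractional digits or the number of fractional digits defined by the current culture's CurrencyDecimalDigits property. A value with no fractional part, such as -42, still matches because \d* can match zero digits and the final two digits then satisfy \d{2}.

    using System;
    using System.Globalization;
    using System.Text.RegularExpressions;

    public class Example
    {
        public static void Main()
        {
            // Get the current NumberFormatInfo object to build the regular
            // expression pattern dynamically.
            NumberFormatInfo nfi = NumberFormatInfo.CurrentInfo;

            // Define the regular expression pattern.
            string pattern;
            pattern = @"^\s*[";
            // Get the positive and negative sign symbols.
            pattern += Regex.Escape(nfi.PositiveSign + nfi.NegativeSign) + @"]?\s?";
            // Get the currency symbol.
            pattern += Regex.Escape(nfi.CurrencySymbol) + @"?\s?";
            // Add integral digits to the pattern.
            pattern += @"(\d*";
            // Add the decimal separator.
            pattern += Regex.Escape(nfi.CurrencyDecimalSeparator) + "?";
            // Add the fractional digits.
            pattern += @"\d{";
            // Determine the number of fractional digits in currency values.
            pattern += nfi.CurrencyDecimalDigits.ToString() + "}?){1}$";

            Regex rgx = new Regex(pattern);

            // Define some test strings.
            string[] tests = { "-42", "19.99", "0.001", "100 USD",
                               ".34", "0.34", "1,052.21", "$10.62",
                               "+1.43", "-$0.23" };

            // Check each test string against the regular expression.
            foreach (string test in tests)
            {
                if (rgx.IsMatch(test))
                    Console.WriteLine("{0} is a currency value.", test);
                else
                    Console.WriteLine("{0} is not a currency value.", test);
            }
        }
    }
    // The example displays the following output:
    //       -42 is a currency value.
    //       19.99 is a currency value.
    //       0.001 is not a currency value.
    //       100 USD is not a currency value.
    //       .34 is a currency value.
    //       0.34 is a currency value.
    //       1,052.21 is not a currency value.
    //       $10.62 is a currency value.
    //       +1.43 is a currency value.
    //       -$0.23 is a currency value.

Because the regular expression in this example is built dynamically, we do not know at design time whether the current culture's currency symbol, decimal sign, or positive and negative signs might be misinterpreted by the regular expression engine as regular expression language operators. To prevent any misinterpretation, the example passes each dynamically generated string to the Escape method.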
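
The examples above exercise Matches, IsMatch, and Escape, but not the other pattern-matching operations the page lists: Match with Match.NextMatch, Replace with a MatchEvaluator, and Split. The following is a minimal sketch of those three, not part of the original reference page. The class name RegexOperationsSketch, its method names, and the sample strings are hypothetical and chosen only for illustration; the Regex calls themselves are standard members of System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only to illustrate the operations.
public static class RegexOperationsSketch
{
    // A single shared Regex instance, so the pattern is not recompiled per call.
    private static readonly Regex RepeatedWord =
        new Regex(@"\b(?<word>\w+)\s+\k<word>\b", RegexOptions.IgnoreCase);

    // Replace with a MatchEvaluator: each repeated pair is replaced
    // by the first copy of the word.
    public static string CollapseRepeats(string input)
    {
        return RepeatedWord.Replace(input, m => m.Groups["word"].Value);
    }

    // Match plus NextMatch: walk the matches one at a time instead of
    // retrieving a MatchCollection.
    public static int CountRepeats(string input)
    {
        int count = 0;
        for (Match m = RepeatedWord.Match(input); m.Success; m = m.NextMatch())
            count++;
        return count;
    }

    // Split: break the input at each comma, ignoring surrounding white space.
    public static string[] SplitFields(string input)
    {
        return Regex.Split(input, @"\s*,\s*");
    }

    public static void Main()
    {
        string text = "The the quick fox fox";
        Console.WriteLine(CollapseRepeats(text));   // The quick fox
        Console.WriteLine(CountRepeats(text));      // 2
        Console.WriteLine(string.Join("|", SplitFields("a , b,c")));   // a|b|c
    }
}
```

The static Regex.Split call relies on the engine's cache of static patterns described earlier, while the two instance-based methods reuse one Regex object; either approach avoids recompiling the pattern on every call.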