Using Boost.Regex

一直在路上25

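This article explains the features and usage of regular expressions, including special characters, character classes, repetition, subpatterns and alternation, and uses code examples to show how to match and search strings with them. It also covers regex error handling and the two most commonly used functions, `regex_match` and `regex_search`, with code for practical use cases.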

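Regular expression syntax
There are several regular expression flavors; everything below refers to the Perl-style syntax, which is also the default syntax of Boost.Regex.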
============================================================================================
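Characters with special meaning
. : any single character
[] : character set
{} : repetition count
() : subpattern (group)
\ : escape: gives the next character a special meaning (as in \d), or makes a special character literal (as in \.)
* : zero or more
+ : one or more
? : zero or one
| : alternation (or)
^ : start of line; negation when it is the first character inside [] (as in [^0-9])
$ : end of line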
===========================================================================================
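Character classes
\d : a decimal digit
\l : a lowercase letter
\s : a whitespace character (space, tab, etc.)
\u : an uppercase letter
\w : a letter (a-z or A-Z), a digit (0-9), or an underscore (_)
\D : any character except \d
\L : any character except \l
\S : any character except \s
\U : any character except \u
\W : any character except \w
(\l, \u, \L and \U are class escapes in Boost.Regex; in Perl itself they are case-modification escapes instead.)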
===========================================================================================
Repetition
{n} : exactly n times
{n,} : n or more times
{n,m} : at least n and at most m times
* : same as {0,}
+ : same as {1,}
? : same as {0,1}
===========================================================================================
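Subpatterns
To mark a subpattern within a pattern, enclose it in parentheses.
(\d*:)?(\d+) : the first part is optional; if present, it is zero or more digits followed by a colon. The second part is a sequence of one or more digits.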
===========================================================================================
Alternation
| expresses an either/or choice.
Subject:(FW:|Re:) : matches either Subject:FW: or Subject:Re:
===========================================================================================
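Regular expression errors
When a pattern is assigned to a regex, the regex checks it; if the pattern is invalid, or too complex to be used for matching, a bad_expression exception is thrown. In Boost, bad_expression is a typedef for boost::regex_error, so catching boost::regex_error handles it.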
===========================================================================================
Below is a commonly used approach that works reliably.

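Note: when compiling, you must link against Boost.Regex, and -lboost_regex should come after the source file (many linkers only use a library to resolve symbols from files listed before it): g++ -Wall test.cpp -o chen -lboost_regex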
*****************************************************************************************************
Regular expression functions
=====================================================================================================
(1) regex_match: determines whether an entire string matches the given regular expression exactly
-----------------------------------------------------------------------
// Check whether each string matches the pattern as a whole
#include <boost/regex.hpp>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    // Pattern as seen by the regex engine: \w+\s*(\(\w+,\d+\)\s*)*
    // A word, optional whitespace, then zero or more "(word,number)" groups.
    // Every backslash is doubled inside the C++ string literal.
    boost::regex pattern("\\w+\\s*(\\(\\w+,\\d+\\)\\s*)*");
    cout << pattern << endl;

    string str_1 = "chen (chen,0) (huan,1) (jiang,2)";
    string str_2 = "chen(chen,0)(huan,1)(jiang,2)";
    string str_3 = "chen";
    string str_4 = "(chen,0)(huan,1)(jiang,2)";
    string str_5 = "chen (chen,0) (huan,1)(jiang,2) chen";

    vector<string> strings;
    strings.push_back(str_1); strings.push_back(str_2);
    strings.push_back(str_3); strings.push_back(str_4);
    strings.push_back(str_5);

    for (size_t n = 0; n < strings.size(); ++n)
        if (boost::regex_match(strings[n], pattern))
            cout << strings[n] << " is matched" << endl;

    return 0;
}
Output:
\w+\s*(\(\w+,\d+\)\s*)*
chen (chen,0) (huan,1) (jiang,2) is matched
chen(chen,0)(huan,1)(jiang,2) is matched
chen is matched
--------------------------------------------------------------------------
// regex_match not only checks for a match, it can also extract the substrings matched by the parenthesized groups
#include <boost/regex.hpp>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    // Pattern as seen by the regex engine: \w+\s*((\(\w+,\d+\)\s*)*)
    boost::regex pattern("\\w+\\s*((\\(\\w+,\\d+\\)\\s*)*)");
    cout << pattern << endl;

    string str_1 = "chen (chen,0) (huan,1) (jiang,2)";

    boost::smatch mat;
    if (boost::regex_match(str_1, mat, pattern))
        // mat[0] is the whole match, mat[1] and mat[2] are the two groups
        for (boost::smatch::const_iterator iter = mat.begin(); iter != mat.end(); ++iter)
            cout << *iter << endl;

    return 0;
}
Output:
\w+\s*((\(\w+,\d+\)\s*)*)
chen (chen,0) (huan,1) (jiang,2)
(chen,0) (huan,1) (jiang,2)
(jiang,2)
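Note that this regex differs from the previous one: the repeated subpattern (\(\w+,\d+\)\s*)* is wrapped in an extra pair of parentheses, so the whole sequence of groups is captured as a single submatch (the second output line). The inner group is repeated by *, and a repeated group only keeps its last repetition, which is why the third line is (jiang,2).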
=====================================================================================================
(2) regex_search: whereas regex_match checks whether the whole string matches, regex_search finds a substring of a longer string that matches
---------------------------------------------------------------
#include <boost/regex.hpp>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    boost::regex pattern("\\d+");
    cout << pattern << endl;

    string str_1 = "chen1234huan12345jiang12 34567";

    boost::smatch mat;
    // regex_search stops at the first substring that matches
    if (boost::regex_search(str_1, mat, pattern))
        for (boost::smatch::const_iterator iter = mat.begin(); iter != mat.end(); ++iter)
            cout << *iter << endl;

    return 0;
}
Output:
\d+
1234
As you can see, regex_search returns as soon as it finds the first substring that matches the pattern.
-------------------------------------------------------------
The following approach extracts every match in the string:
#include <boost/regex.hpp>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    boost::regex pattern("\\d+");
    cout << pattern << endl;

    string str_1 = "chen1234huan12345jiang12 34567";
    string::const_iterator start = str_1.begin();
    string::const_iterator end = str_1.end();

    boost::smatch mat;
    // Search repeatedly; each search resumes where the previous match ended
    while (boost::regex_search(start, end, mat, pattern))
    {
        string msg(mat[0].first, mat[0].second);
        cout << msg << endl;
        start = mat[0].second;
    }

    return 0;
}
Output:
\d+
1234
12345
12
34567
****************************************************************************************************
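(3) About the boost::smatch type
smatch is match_results<std::string::const_iterator>: the s prefix means it is used with std::string (cmatch is the counterpart for const char*). An smatch is essentially a vector of submatches, and its first element is the complete match. For i < smatch.size(), smatch[i] can be treated as a string (call .str() to get a std::string). If a regular expression has N subpatterns, then after a successful match smatch.size() == N+1, the extra element being the complete match.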
Anything enclosed in parentheses in the pattern is a subpattern, as the following example shows:

Expression: (ftp|http|https):\/\/((\w+\.)*(\w*))\/([\w\d]+\/{0,1})+ 
String: http://www.foo.com/bar 
matches[0] = http://www.foo.com/bar 
matches[1] = http 
matches[2] = www.foo.com 
matches[3] = foo. 
matches[4] = com 
matches[5] = bar