Regular Expression Operations in the C++ Standard Library, Explained - 优快云 Blog

Article link: https://blog.youkuaiyun.com/hanzhaoqiao1436/article/details/124913900

1. Matching an entire string

std::regex_match

#include <iostream>
#include <regex>
#include <string>

int main()
{
	std::string str = "<first>value</second>";
	std::regex reg("<.*>.*</.*>");
	bool found = std::regex_match(str, reg); // true

	std::string str1 = "begin<tag>value</tag>end";
	found = std::regex_match(str1, reg);
	std::cout << std::boolalpha << found << std::endl; // prints false
}

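str matches the pattern in its entirety, so the result is true. str1 has extra text before and after the tags (begin and end), so it cannot match as a whole and the result is false.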
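For inputs without line breaks, the same whole-string test can be expressed as a search with an anchored pattern. The sketch below assumes the default ECMAScript grammar, where ^ and $ match at the start and end of the input; the names kTagPattern, kAnchoredTagPattern, fullMatch and anchoredSearch are hypothetical, chosen only for illustration.

```cpp
#include <regex>
#include <string>

// The pattern from the snippet above, used with std::regex_match.
const std::regex kTagPattern("<.*>.*</.*>");

// The same pattern anchored with ^ and $ (hypothetical name). Without the
// multiline flag, ^ and $ match only at the start and end of the input.
const std::regex kAnchoredTagPattern("^<.*>.*</.*>$");

// Whole-string check done by std::regex_match.
bool fullMatch(const std::string& s) {
    return std::regex_match(s, kTagPattern);
}

// Equivalent check done by std::regex_search with the anchored pattern.
bool anchoredSearch(const std::string& s) {
    return std::regex_search(s, kAnchoredTagPattern);
}
```

Both functions accept "<first>value</second>" and reject "begin<tag>value</tag>end".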

2. Checking whether a string contains a match

std::regex_search

std::string str1 = "begin<first>value</second>end";
std::regex reg("<.*>.*</.*>");
bool found = std::regex_search(str1, reg); //true
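std::regex_search can also report where the match is. A minimal sketch, assuming a hypothetical helper findFirst, reads the offset and length of the whole match from std::smatch::position(0) and std::smatch::length(0):

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: reports where the first match of `re` starts in `s`
// and how many characters it covers. Returns false when there is no match.
bool findFirst(const std::string& s, const std::regex& re,
               std::size_t& position, std::size_t& length) {
    std::smatch m;
    if (!std::regex_search(s, m, re)) {
        return false;
    }
    position = static_cast<std::size_t>(m.position(0)); // offset of the whole match
    length = static_cast<std::size_t>(m.length(0));     // number of matched characters
    return true;
}
```

For "begin<first>value</second>end", the match <first>value</second> starts at offset 5 (right after begin) and is 21 characters long.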

3. Extracting the matched parts of a string

std::regex_match

std::string str = "<tag>one</tag>";
bool found;
std::smatch m;
std::regex reg("<(.*)>(.*)</\\1>");
found = std::regex_match(str, m, reg);
if (found)
{
	std::cout << m.str(0) << std::endl;
	std::cout << m.str(1) << std::endl;
	std::cout << m.str(2) << std::endl;

	// Equivalent to the three lines above:
	//for (auto pos = m.begin(); pos != m.end(); ++pos)
	//	std::cout << *pos << std::endl;
}

/* Output:
<tag>one</tag>
tag
one
*/

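Parentheses ( ) define a capture group. \1 is a backreference to the first group: the text at that position must be identical to what the first group captured (inside a C++ string literal the backslash has to be escaped, hence "\\1"). The expression above has two groups.

m.str(0) is the complete string matched by the whole regular expression;

m.str(1) is the value captured by the first group;

m.str(2) is the value captured by the second group.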
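To see the backreference at work, the sketch below (parseElement and kElement are hypothetical names) accepts an element only when its closing tag repeats the opening tag name:

```cpp
#include <regex>
#include <string>

// Pattern from the text: group 1 is the tag name, group 2 the content,
// and \1 requires the closing tag to repeat group 1 exactly.
const std::regex kElement("<(.*)>(.*)</\\1>");

// Hypothetical helper: returns true only if the whole string is
// <name>...</name> with the same name in both tags; on success it stores
// the two captured groups in `name` and `content`.
bool parseElement(const std::string& s, std::string& name, std::string& content) {
    std::smatch m;
    if (!std::regex_match(s, m, kElement)) {
        return false;
    }
    name = m.str(1);    // first capture group
    content = m.str(2); // second capture group
    return true;
}
```

So "<b>bold</b>" is accepted with name b and content bold, while "<b>bold</i>" is rejected because \1 demands b in the closing tag.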

std::regex_search

std::string str1 = "begin<tag>value</tag>end";
bool found;
std::smatch m;
std::regex reg("<(.*)>(.*)</\\1>");
found = std::regex_search(str1, m, reg);
if (found)
{
	std::cout << m.prefix().str() << std::endl;
	std::cout << m.suffix().str() << std::endl;
	for (auto pos = m.begin(); pos != m.end(); ++pos)
		std::cout << *pos << std::endl;
}

/* Output:
begin
end
<tag>value</tag>
tag
value
*/

m.prefix().str() returns the part of the input that comes before the match;

m.suffix().str() returns the part of the input that comes after the match.
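Because the prefix, the match itself and the suffix cover the input without gaps, concatenating them gives back the original string. A small sketch with a hypothetical helper splitAroundMatch:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: splits `s` around the first match of `re` into the
// text before it (prefix), the match itself, and the text after it (suffix).
// Returns false if there is no match.
bool splitAroundMatch(const std::string& s, const std::regex& re,
                      std::string& before, std::string& match, std::string& after) {
    std::smatch m;
    if (!std::regex_search(s, m, re)) {
        return false;
    }
    before = m.prefix().str(); // everything before the match
    match = m.str(0);          // the whole match
    after = m.suffix().str();  // everything after the match
    return true;
}
```

With the input begin<tag>value</tag>end, the three parts are begin, <tag>value</tag> and end.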

4. Extracting successive matches

std::regex_search

std::string str1 = "begin <tag><one>first</one><two>second</two><three>third</three></tag> end";
bool found;
std::smatch m;
std::regex reg("<(.*)>(.*)</\\1>");
found = std::regex_search(str1, m, reg);
if (found)
{
	for (auto pos = m.begin(); pos != m.end(); ++pos)
		std::cout << *pos << std::endl;
}

/* Output:
<tag><one>first</one><two>second</two><three>third</three></tag>
tag
<one>first</one><two>second</two><three>third</three>
*/

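That doesn't look right: the match is the entire <tag>...</tag> element. The search returns the leftmost match, which starts at <tag>, and the greedy (.*) in the second group is free to run across all the nested elements up to the closing </tag>. Changing the second group to ([^<]*), which cannot contain another tag, leaves only the innermost elements as possible matches; each search is then restarted right after the previous match: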
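Strictly speaking, greediness alone does not explain the result. As a sketch (the helper firstTagName and the three pattern constants are hypothetical), compare the first element found with the original pattern, with a lazy variant using .*?, and with the [^<]* variant:

```cpp
#include <regex>
#include <string>

// Original greedy pattern, a lazy (non-greedy) variant, and the [^<]* variant.
const std::regex kGreedy("<(.*)>(.*)</\\1>");
const std::regex kLazy("<(.*?)>(.*?)</\\1>");
const std::regex kNoNested("<(.*)>([^<]*)</\\1>");

// Hypothetical helper: returns the tag name (group 1) of the first element
// that `re` finds in `s`, or an empty string if nothing matches.
std::string firstTagName(const std::string& s, const std::regex& re) {
    std::smatch m;
    return std::regex_search(s, m, re) ? m.str(1) : std::string();
}
```

Both the greedy and the lazy pattern still return tag: a valid match already starts at the leftmost <, namely <tag>, and the backreference forces it to end at </tag>. Only the [^<]* version, which rules out nested tags inside group 2, finds one first.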

std::string str1 = "begin <tag><one>first</one><two>second</two><three>third</three></tag> end";
std::smatch m;
std::regex reg("<(.*)>([^<]*)</\\1>"); // adjusted pattern: group 2 may not contain '<'
auto pos = str1.cbegin();
auto end = str1.cend();
while (std::regex_search(pos, end, m, reg))
{
	// start the next search right after the current match
	pos = m.suffix().first;

	for (auto iter = m.begin(); iter != m.end(); ++iter)
		std::cout << *iter << std::endl;
	std::cout << std::endl;
}

/* Output:
<one>first</one>
one
first

<two>second</two>
two
second

<three>third</three>
three
third
*/

Writing this loop by hand every time is quite cumbersome...

std::sregex_iterator

std::string str1 = "begin <tag><one>first</one><two>second</two><three>third</three></tag> end";
std::regex reg("<(.*)>([^<]*)</\\1>");
std::sregex_iterator pos(str1.begin(), str1.end(), reg);
std::sregex_iterator end; // default-constructed: the end-of-sequence iterator
for (; pos != end; ++pos)
{
	std::cout << pos->str() << std::endl;
	std::cout << pos->str(1) << std::endl;
	std::cout << pos->str(2) << std::endl;
	std::cout << std::endl;
}

/* Output:
<one>first</one>
one
first

<two>second</two>
two
second

<three>third</three>
three
third
*/
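Since each *pos is a full std::smatch, the iterator loop can also collect the groups into a container. A minimal sketch, with collectElements as a hypothetical helper name:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects (tag name, text) pairs for every innermost
// element, using the same pattern and std::sregex_iterator as above.
std::vector<std::pair<std::string, std::string>> collectElements(const std::string& s) {
    static const std::regex re("<(.*)>([^<]*)</\\1>");
    std::vector<std::pair<std::string, std::string>> out;
    // A default-constructed sregex_iterator marks the end of the sequence.
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        out.emplace_back(it->str(1), it->str(2)); // group 1 = name, group 2 = text
    }
    return out;
}
```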

std::sregex_token_iterator

std::string str1 = "begin <tag><one>first</one><two>second</two><three>third</three></tag> end";
std::regex reg("<(.*)>([^<]*)</\\1>");
std::sregex_token_iterator pos(str1.begin(), str1.end(), reg, {0, 1, 2});
std::sregex_token_iterator end; // end-of-sequence iterator
for (; pos != end; ++pos)
{
	std::cout << pos->str() << std::endl;
}

/* Output:
<one>first</one>
one
first
<two>second</two>
two
second
<three>third</three>
three
third
*/

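In {0,1,2}, 0 selects the whole match, 1 the first group, and 2 the second group.

If you only want one, two, three (the tag names), pass 1; passing 2 yields only first, second, third.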
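Passing a single group index instead of a list yields just that group from every match. In the sketch below (groupValues is a hypothetical helper), index 1 gives the tag names and index 2 the text between the tags:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns one capture group from every match by
// passing that group's index to std::sregex_token_iterator.
std::vector<std::string> groupValues(const std::string& s, int group) {
    static const std::regex re("<(.*)>([^<]*)</\\1>");
    std::vector<std::string> values;
    for (std::sregex_token_iterator it(s.begin(), s.end(), re, group), end; it != end; ++it) {
        values.push_back(it->str());
    }
    return values;
}
```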

std::string str2 = "one , two ; three :";
std::regex reg("[ \t\n]*[,;:][ \t\n]*");
std::sregex_token_iterator pos(str2.begin(), str2.end(), reg, -1);
std::sregex_token_iterator end; // end-of-sequence iterator
for (; pos != end; ++pos)
	std::cout << pos->str() << std::endl;

/* Output:
one
two
three
*/

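-1 means the parts that match the regex are discarded and the text between the matches is returned instead, which effectively splits the string on the separator pattern.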
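The -1 form makes std::sregex_token_iterator a convenient string splitter. A sketch with a hypothetical split helper that takes the separator pattern as a parameter:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on every match of `delim`. Submatch index -1
// makes the token iterator yield the text between matches, not the matches.
std::vector<std::string> split(const std::string& s, const std::regex& delim) {
    std::vector<std::string> parts;
    for (std::sregex_token_iterator it(s.begin(), s.end(), delim, -1), end; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

No empty piece is produced after the trailing colon in "one , two ; three :", because the token iterator emits the final suffix only when it is non-empty; that is why the snippet above prints exactly three lines.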

5. Replacing text

std::regex_replace

std::string str2 = "one , two ; three :";
std::regex reg("[ \t\n]*[,;:][ \t\n]*");
std::string newstr = std::regex_replace(str2, reg, "|");
std::cout << newstr << std::endl;

/* Output:
one|two|three|
*/

Every match (a separator together with its surrounding whitespace) is replaced with '|'.
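The format string and the flags give more control over the replacement. In the sketch below (kSeparator and the two function names are hypothetical), $& inserts the whole match, and std::regex_constants::format_first_only limits the replacement to the first match:

```cpp
#include <regex>
#include <string>

// Same separator pattern as in the snippet above.
const std::regex kSeparator("[ \t\n]*[,;:][ \t\n]*");

// "$&" in the format string stands for the whole match, so this wraps each
// separator (including its surrounding whitespace) in brackets.
std::string bracketSeparators(const std::string& s) {
    return std::regex_replace(s, kSeparator, "[$&]");
}

// format_first_only replaces only the first match and copies the rest unchanged.
std::string replaceFirstSeparator(const std::string& s) {
    return std::regex_replace(s, kSeparator, "|", std::regex_constants::format_first_only);
}
```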

std::string str1 = "begin <tag><one>first</one><two>second</two><three>third</three></tag> end";
std::regex reg("<(.*)>([^<]*)</\\1>");
std::string newstr = std::regex_replace(str1, reg, "<$1 value=\"$2\">");
std::cout << newstr << std::endl;

/* Output:
begin <tag><one value="first"><two value="second"><three value="third"></tag> end
*/

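Each match is rewritten into the desired form; in the format string, $1 and $2 stand for the first and second capture groups.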
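When only the rewritten matches are wanted, std::regex_constants::format_no_copy drops all non-matching text from the result. A minimal sketch, with kInnerElement and listValues as hypothetical names:

```cpp
#include <regex>
#include <string>

// Same element pattern as above; $1 and $2 refer to the two capture groups.
const std::regex kInnerElement("<(.*)>([^<]*)</\\1>");

// With format_no_copy, text outside the matches (such as "begin <tag>" and
// "</tag> end") is not copied, so only the formatted matches remain.
std::string listValues(const std::string& s) {
    return std::regex_replace(s, kInnerElement, "$1=$2\n",
                              std::regex_constants::format_no_copy);
}
```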

These are the most commonly used operations; being fluent with them is probably enough to solve a good many problems.