Regex - C++11, 6 of n


1) Raw string
R"delim(...)delim", where delim is a character sequence of at most 16 basic characters except the backslash, whitespace and parentheses.
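As a small illustration of why raw strings help with regex patterns, the sketch below writes the same pattern text twice, once with ordinary escapes and once as a raw string. The function names are made up for this example, and the delimiter "re" is an arbitrary choice.

```cpp
#include <string>

// Pattern text written with ordinary escapes: every backslash is doubled.
std::string escapedPattern() {
    return "\\\\.*index\\{([^}]*)\\}";
}

// The same pattern text as a raw string literal: no escaping needed.
std::string rawPattern() {
    return R"(\\.*index\{([^}]*)\})";
}

// A custom delimiter ("re" here) is required when the text itself
// contains the sequence )" which would otherwise end the literal early.
std::string delimitedPattern() {
    return R"re(f\(\)")re";
}
```

Both of the first two functions return the same characters, so either literal can be passed to a regex constructor.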
2) Note that regex matching is greedy by default: quantifiers such as * and + match as many characters as possible
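A minimal sketch of what greediness means in practice, assuming the default ECMAScript grammar, which also accepts the non-greedy form .*? (the helper name firstMatch is hypothetical):

```cpp
#include <regex>
#include <string>

// Returns the first substring of text matched by pattern,
// or an empty string if nothing matches.
std::string firstMatch(const std::string& text, const std::string& pattern) {
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        return m.str();
    }
    return "";
}
```

With the input `<b>bold</b>`, the greedy pattern `<.*>` yields the whole input, while the non-greedy `<.*?>` stops at the first `>` and yields only `<b>`.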
3) Main interfaces

    a. basic_regex<> template: holds the regular expression pattern (predefined types: regex, wregex)
    b. match_results<> template: carries back the match results (predefined types: smatch, cmatch, wsmatch, wcmatch; s for string, c for const char*)
    c. regex_match(): checks whether the whole character sequence matches a regular expression
    d. regex_search(): checks whether the character sequence partially matches a regular expression
    e. regex_replace(): replaces character sequences that match a regular expression

4) Basic matching examples
regex reg1("<.*>.*</.*>");
bool found = regex_match ("<tag>value</tag>", reg1);

regex reg2("<(.*)>.*</\\1>");  // grouping and backreference
found = regex_match ("<tag>value</tag>", reg2);

// use grep grammar
regex reg3("<\\(.*\\)>.*</\\1>",regex_constants::grep);
found = regex_match ("<tag>value</tag>", reg3);

found = regex_match ("XML tag: <tag>value</tag>", regex(R"(<(.*)>.*</\1>)")); // fails to match: regex_match requires the whole sequence to match
found = regex_search ("XML tag: <tag>value</tag>", regex(R"(<(.*)>.*</\1>)")); // matches

5) match_results carries back the detailed match result

A match_results object m holds the state of a match and provides the following interface. In general, it contains:
– A sub_match object m[0] for all the matched characters
– A prefix(), a sub_match object that represents all characters before the first matched character
– A suffix(), a sub_match object that represents all characters after the last matched character
• In addition, for any capture group, you have access to a corresponding sub_match object m[n].
• size() yields the number of sub_match objects (including m[0]).
• All sub_match objects are derived from pair<> and have the position of the first character as
member first and the position after the last character as member second. In addition, str()
yields the characters as a string, length() yields the number of characters, operator << writes
the characters to a stream, and an implicit type conversion to a string is defined.

In addition, the match_results object as a whole provides:
– member function str() to yield the matched string as a whole (calling str() or str(0))
or the nth matched substring (calling str(n)), which is empty if no matched substring exists
(thus, passing an n greater than size() is valid)
– member function length() to yield the length of the matched string as a whole (calling
length() or length(0)) or the length of the nth matched substring (calling length(n)),
which is 0 if no matched substring exists (thus, passing an n greater than size() is valid)
– member function position() to yield the position of the matched string as a whole (calling
position() or position(0)) or the position of the nth matched substring (calling
position(n))
– member functions begin(), cbegin(), end(), and cend() to iterate over the sub_match
objects m[0] to m[n]
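The interface described above can be sketched as follows. The struct MatchInfo and the function inspect are hypothetical names used only for this illustration; the pattern is the tag pattern used in the other examples.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Plain summary of one match, filled from a std::smatch.
struct MatchInfo {
    bool found;
    std::string whole, prefix, suffix, tag, value;
    std::size_t groups;      // m.size(), including m[0]
    std::ptrdiff_t tagPos;   // m.position(1)
};

MatchInfo inspect(const std::string& data) {
    static const std::regex reg(R"(<(.*)>(.*)</(\1)>)");
    std::smatch m;
    MatchInfo info{};
    info.found = std::regex_search(data, m, reg);
    if (info.found) {
        info.whole  = m.str();           // same as m.str(0) and m[0].str()
        info.prefix = m.prefix().str();  // characters before the match
        info.suffix = m.suffix().str();  // characters after the match
        info.tag    = m.str(1);          // first capture group
        info.value  = m[2];              // implicit conversion to std::string
        info.groups = m.size();
        info.tagPos = m.position(1);
    }
    return info;
}
```

For the input `XML tag: <tag-name>the value</tag-name>.` the prefix is `XML tag: `, the suffix is `.`, and size() is 4 because the pattern has three capture groups in addition to m[0].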

6) regex_iterator<>; predefined types are sregex_iterator, cregex_iterator, wsregex_iterator, and wcregex_iterator
Examples:
string data = "<person>\n"
                   " <first>Nico</first>\n"
                   " <last>Josuttis</last>\n"
                   "</person>\n";
regex reg("<(.*)>(.*)</(\\1)>");
// iterate over all matches (using a regex_iterator):
sregex_iterator pos(data.cbegin(),data.cend(),reg);
sregex_iterator end;
for ( ; pos != end ; ++pos ) {
    cout << "match: " << pos->str() << endl;
    cout << " tag: " << pos->str(1) << endl;
    cout << " value: " << pos->str(2) << endl;
}
// use a regex_iterator to process each matched substring as element in an algorithm:
sregex_iterator beg(data.cbegin(),data.cend(),reg);
for_each (beg,end,[](const smatch& m) {
                               cout << "match: " << m.str() << endl;
                               cout << " tag: " << m.str(1) << endl;
                               cout << " value: " << m.str(2) << endl;
                           });

7) regex_token_iterator<> tokenizer; predefined types are sregex_token_iterator, cregex_token_iterator, wsregex_token_iterator, and wcregex_token_iterator
Example:
string data = "<person>\n"
                   "<first>Nico</first>\n"
                   "<last>Josuttis</last>\n"
                   "</person>\n";
regex reg("<(.*)>(.*)</(\\1)>");
// iterate over all matches (using a regex_token_iterator):
sregex_token_iterator pos(data.cbegin(),data.cend(), // sequence
                                      reg, // regular expression whose matches yield the tokens
                                      {0,2}); // 0: full match, 2: second substring
sregex_token_iterator end;
for ( ; pos != end ; ++pos ) {
    cout << "match: " << pos->str() << endl;
}
cout << endl;
string names = "nico, jim, helmut, paul, tim, john paul, rita";
regex sep("[ \t\n]*[,;.][ \t\n]*"); // separated by , ; or . and spaces
sregex_token_iterator p(names.cbegin(),names.cend(), // sequence
                                   sep, // separator
                                   -1); // -1: values between separators
sregex_token_iterator e;
for ( ; p != e ; ++p ) {
    cout << "name: " << *p << endl;
}
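The -1 mode above can be wrapped into a small reusable split helper. This is only a sketch: the function name splitList is an assumption, and the separator pattern is copied from the example above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Splits text at , ; or . (with optional surrounding whitespace)
// and returns the pieces between the separators.
std::vector<std::string> splitList(const std::string& text) {
    static const std::regex sep("[ \t\n]*[,;.][ \t\n]*");
    std::sregex_token_iterator first(text.cbegin(), text.cend(), sep, -1);
    std::sregex_token_iterator last;  // default-constructed end iterator
    // each sub_match converts to std::string while filling the vector
    return std::vector<std::string>(first, last);
}
```

Because only the separator characters split the input, an entry such as "john paul" stays in one piece.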

8) regex_replace
string data = "<person>\n"
                   "<first>Nico</first>\n"
                   "<last>Josuttis</last>\n"
                   "</person>\n";
regex reg("<(.*)>(.*)</(\\1)>");
// print data with replacement for matched patterns
cout << regex_replace (data, // data
                                  reg, // regular expression
                                  "<$1 value=\"$2\"/>") // replacement
       << endl;
// same using sed syntax
cout << regex_replace (data, // data
                                  reg, // regular expression
                                  "<\\1 value=\"\\2\"/>", // replacement
                                  regex_constants::format_sed) // format flag
      << endl;
// use iterator interface, and
// - format_no_copy: don’t copy characters that don’t match
// - format_first_only: replace only the first match found
string res2;
regex_replace (back_inserter(res2), // destination
                      data.begin(), data.end(), // source range
                      reg, // regular expression
                      "<$1 value=\"$2\"/>", // replacement
                      regex_constants::format_no_copy | regex_constants::format_first_only);// format flags
cout << res2 << endl;

[Table: Regex replacement symbols]
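A few of the replacement symbols in action, as a hedged sketch. In the default (ECMAScript) format, $& stands for the whole match, $1 and $2 for capture groups, and $$ for a literal dollar sign. With format_sed, & stands for the whole match. The function names are invented for this example.

```cpp
#include <regex>
#include <string>

// Wraps every run of digits in brackets using $& (the whole match).
std::string bracketNumbers(const std::string& s) {
    return std::regex_replace(s, std::regex("[0-9]+"), "[$&]");
}

// Same result using sed syntax, where & is the whole match.
std::string bracketNumbersSed(const std::string& s) {
    return std::regex_replace(s, std::regex("[0-9]+"), "[&]",
                              std::regex_constants::format_sed);
}

// Turns "key=value" into "value=key" using $1 and $2.
std::string swapPairs(const std::string& s) {
    return std::regex_replace(s, std::regex("([a-z]+)=([a-z]+)"), "$2=$1");
}

// Puts a literal dollar sign ($$) in front of every run of digits.
std::string addDollar(const std::string& s) {
    return std::regex_replace(s, std::regex("[0-9]+"), "$$$&");
}
```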

9) Regex flags

Examples:
// case-insensitive find LaTeX index entries
string pat1 = R"(\\.*index\{([^}]*)\})"; // first capture group
string pat2 = R"(\\.*index\{(.*)\}\{(.*)\})"; // 2nd and 3rd capture group
// grep and egrep can search for multiple regular expressions separated by '\n' at the same time
regex pat (pat1+"\n"+pat2, regex_constants::egrep|regex_constants::icase);
// initialize string with characters from standard input:
string data((istreambuf_iterator<char>(cin)), istreambuf_iterator<char>());
// search and print matching index entries:
smatch m;
auto pos = data.cbegin();
auto end = data.cend();
for ( ; regex_search (pos,end,m,pat); pos = m.suffix().first) {
    cout << "match: " << m.str() << endl;
    cout << " val: " << m.str(1)+m.str(2) << endl;
    cout << " see: " << m.str(3) << endl;
}


Input samples:
\chapter{The Standard Template Library}
\index{STL}%
\MAININDEX{standard template library}%
\SEEINDEX{standard template library}{STL}%
This is the basic chapter about the STL.
\section{STL Components}
\hauptindex{STL, introduction}%
The \stl{} is based on the cooperation of

...
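To isolate the effect of the icase flag used above, here is a reduced sketch that applies a simplified version of the index pattern to a single line, with and without the flag. The function name containsIndexCommand and its bool parameter are assumptions made for this illustration.

```cpp
#include <regex>
#include <string>

// Returns true if line contains a LaTeX-style command ending in "index{".
// With ignoreCase set, the icase flag also accepts e.g. \MAININDEX{.
bool containsIndexCommand(const std::string& line, bool ignoreCase) {
    auto flags = std::regex_constants::ECMAScript;
    if (ignoreCase) {
        flags |= std::regex_constants::icase;
    }
    std::regex pat(R"(\\.*index\{)", flags);
    return std::regex_search(line, pat);
}
```

Applied to the input samples, `\index{STL}%` matches either way, while `\MAININDEX{standard template library}%` matches only when the flag is set.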

10) The regex ECMAScript grammar and special chars

[Table: ECMAScript grammar]

[Table: Character classes in ECMAScript]
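As a brief illustration of ECMAScript character classes, the sketch below uses \d (digit), \w (letter, digit, or underscore), the bracket class [[:alpha:]], and the {n} repetition count. The function names are hypothetical, and the date check only tests the shape of the text, not whether the date is valid.

```cpp
#include <regex>
#include <string>

// Checks the shape dddd-dd-dd using \d and a repetition count.
bool looksLikeIsoDate(const std::string& s) {
    static const std::regex pat(R"(\d{4}-\d{2}-\d{2})");
    return std::regex_match(s, pat);
}

// Checks for a letter or underscore followed by any number of \w characters.
bool isIdentifier(const std::string& s) {
    static const std::regex pat(R"([[:alpha:]_]\w*)");
    return std::regex_match(s, pat);
}
```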
11) Grammar differences

12) Regex exceptions
The following helper maps a regex_error code to an explanatory string for debugging:
template <typename T>
std::string regexCode (T code)
{
    switch (code) {
        case std::regex_constants::error_collate:
               return "error_collate: regex has invalid collating element name";
        case std::regex_constants::error_ctype:
               return "error_ctype: regex has invalid character class name";
        case std::regex_constants::error_escape:
               return "error_escape: regex has invalid escaped char. or trailing escape";
        case std::regex_constants::error_backref:
               return "error_backref: regex has invalid back reference";
        case std::regex_constants::error_brack:
               return "error_brack: regex has mismatched '[' and ']'";
        case std::regex_constants::error_paren:
               return "error_paren: regex has mismatched '(' and ')'";
        case std::regex_constants::error_brace:
               return "error_brace: regex has mismatched '{' and '}'";
        case std::regex_constants::error_badbrace:
               return "error_badbrace: regex has invalid range in {} expression";
        case std::regex_constants::error_range:
               return "error_range: regex has invalid character range, such as '[b-a]'";
        case std::regex_constants::error_space:
               return "error_space: insufficient memory to convert regex into finite state";
        case std::regex_constants::error_badrepeat:
               return "error_badrepeat: one of *?+{ not preceded by valid regex";
        case std::regex_constants::error_complexity:
               return "error_complexity: complexity of match against regex over pre-set level";
        case std::regex_constants::error_stack:
               return "error_stack: insufficient memory to determine regex match";
    }
    return "unknown/non-standard regex error code";
}

int main()
{
    try {
        // initialize regular expression with invalid syntax
        // (in the grep grammar, \{ opens an interval expression, so this pattern is malformed):
        regex pat ("\\\\.*index\\{([^}]*)\\}",
                   regex_constants::grep|regex_constants::icase);
        ...
    }
    catch (const regex_error& e) {
        cerr << "regex_error: \n"
              << " what(): " << e.what() << "\n"
              << " code(): " << regexCode(e.code()) << endl;
    }
}
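The same try/catch idea can be reduced to a small validity check, for instance before accepting a pattern typed by a user. This is a sketch only; the function name compiles and its default argument are assumptions made for this example.

```cpp
#include <regex>
#include <string>

// Returns true if pattern compiles under the given grammar, and false if
// constructing the std::regex throws std::regex_error.
bool compiles(const std::string& pattern,
              std::regex_constants::syntax_option_type grammar =
                  std::regex_constants::ECMAScript) {
    try {
        std::regex re(pattern, grammar);  // throws on malformed patterns
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

A pattern with an unclosed group or bracket expression, such as `<(.*>` or `[a-z`, is rejected, while `<(.*)>` is accepted.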
