LeetCode (10) Regular Expression Matching

License: CC 4.0 BY-SA

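This post looks at how regular expression matching can be implemented and where it gets tricky, handles patterns containing '*' with recursion, and includes a short code example.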

I found this problem quite hard. I thought about it for a long time without getting anywhere %>_<% so I read the solution write-up.

Problem statement

'.' Matches any single character.
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)
Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

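One of the given examples is a little hard to understand:
isMatch("ab", ".*") → true
This evaluates to true because .* repeats the '.' element any number of times, and each repetition may match a different character: one '.' matches a and the next matches b. So the result is true.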

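Basic idea:
The hard part is here. Compare the following two examples.
Example 1: isMatch("aaaaab","a*b").
For convenience, label the characters as isMatch("a1a2a3a4a5b1","a6*b2").
Clearly a6* should match a1a2a3a4a5, and then b2 matches b1.

Example 2: isMatch("aaaaab","a*ab").
For convenience, label the characters as isMatch("a1a2a3a4a5b1","a6*a7b2").
Clearly a6* should match a1a2a3a4, then a7 matches a5, and then b2 matches b1.
So a * in the pattern is awkward: you cannot tell in advance how much of s it should consume.
This structure makes the problem a natural fit for recursion. As the LeetCode solution write-up says: If you are stuck, recursion is your friend.
It also shows that * is special and needs special handling, so the basic idea is:
Split p into small units, each either a single character or a character followed by a '*'.
Check these units one by one against s. How much of s each unit matches has to be tried case by case; in other words, we brute-force it.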
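To make the "small units" idea concrete, here is a minimal sketch of the split. `splitUnits` is a hypothetical helper made up for illustration; the solution below never builds this list explicitly, it just peeks at `*(p+1)` as it goes. The sketch assumes a well-formed pattern in which '*' never starts a unit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original solution): split a pattern
// into units, each either a single character or a character followed by '*'.
std::vector<std::string> splitUnits(const std::string& p) {
    std::vector<std::string> units;
    for (std::size_t i = 0; i < p.size(); ) {
        // Assumption: the pattern is well formed, so '*' never starts a unit.
        assert(p[i] != '*');
        if (i + 1 < p.size() && p[i + 1] == '*') {
            units.push_back(p.substr(i, 2));  // character plus its '*'
            i += 2;
        } else {
            units.push_back(p.substr(i, 1));  // single character
            i += 1;
        }
    }
    return units;
}
```

For the pattern "a*ab" from Example 2 this gives the three units "a*", "a" and "b", which correspond to a6*, a7 and b2 in the labelling above.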

Borrowed code:

//
//  Solution.h
//  LeetCodeOJ_010_RegMatching
//
//  Created by feliciafay on 12/4/13.
//  Copyright (c) 2013 feliciafay. All rights reserved.
//

#ifndef LeetCodeOJ_010_RegMatching_Solution_h
#define LeetCodeOJ_010_RegMatching_Solution_h
#include <iostream>
#include <string>
#include <cassert>
// Sample input: "bbbba", ".*a*a"
class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        assert(s && p);
        if (*p == '\0') return *s == '\0';
        
        // next char is not '*': must match current character
        if (*(p+1) != '*') {
            assert(*p != '*');
            return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);
        }
        // next char is '*'
        while ((*p == *s) || (*p == '.' && *s != '\0')) {
            if (isMatch(s, p+2))
                return true;
            s++;
        }
        // the '*' unit cannot absorb any more characters; match the rest
        return isMatch(s, p+2);
    }
};
#endif
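A quick way to check the solution is to run it over the examples from the problem statement. The sketch below repeats the same recursion as a free function so that it compiles on its own; `Example` and `countFailures` are names made up for this illustration. The expected value for the "bbbba" input from the header comment was worked out by hand and does not come from the problem statement.

```cpp
#include <cassert>

// Same recursion as Solution::isMatch, repeated as a free function
// so this block compiles without the header.
bool isMatch(const char* s, const char* p) {
    assert(s && p);
    if (*p == '\0') return *s == '\0';
    if (*(p + 1) != '*') {
        return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s + 1, p + 1);
    }
    while ((*p == *s) || (*p == '.' && *s != '\0')) {
        if (isMatch(s, p + 2)) return true;
        s++;
    }
    return isMatch(s, p + 2);
}

// Hypothetical test record: input string, pattern, expected result.
struct Example {
    const char* s;
    const char* p;
    bool expected;
};

// Returns how many examples disagree with their expected result.
int countFailures() {
    const Example examples[] = {
        {"aa", "a", false},   {"aa", "aa", true},  {"aaa", "aa", false},
        {"aa", "a*", true},   {"aa", ".*", true},  {"ab", ".*", true},
        {"aab", "c*a*b", true},
        {"bbbba", ".*a*a", true},  // header-comment input; expectation derived by hand
    };
    int failures = 0;
    for (const Example& e : examples) {
        if (isMatch(e.s, e.p) != e.expected) ++failures;
    }
    return failures;
}
```

Calling `countFailures()` from a `main` should return 0 if the recursion behaves as described.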

Summary
(1) Take a moment to appreciate how concise and neat this line is:

return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);

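(2)
Note that the code here is

    if (isMatch(s, p+2)) return true;

and not

    return  (isMatch(s, p+2));

The logic is that, across the brute-force attempts, a single successful match is enough to return true, no matter how many earlier attempts failed.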
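The difference can be seen on a small input. The following is only a sketch: `matchStar` and its `giveUpEarly` flag are made up for this illustration and simply switch the '*' loop between the two forms discussed in (2).

```cpp
#include <cassert>

// Hypothetical variant for illustration only. With giveUpEarly == true the
// '*' loop uses the incorrect `return isMatch(s, p+2);` form; with false it
// uses the correct `if (isMatch(s, p+2)) return true;` form.
bool matchStar(const char* s, const char* p, bool giveUpEarly) {
    if (*p == '\0') return *s == '\0';
    if (*(p + 1) != '*') {
        return ((*p == *s) || (*p == '.' && *s != '\0'))
               && matchStar(s + 1, p + 1, giveUpEarly);
    }
    while ((*p == *s) || (*p == '.' && *s != '\0')) {
        bool rest = matchStar(s, p + 2, giveUpEarly);
        if (giveUpEarly) return rest;  // incorrect: the first failed attempt ends the search
        if (rest) return true;         // correct: stop only when an attempt succeeds
        s++;
    }
    return matchStar(s, p + 2, giveUpEarly);
}
```

For isMatch("aa", "a*"), the first attempt lets a* match zero characters and compares "aa" with the empty pattern, which fails. The early-return form reports false at that point, while the correct form goes on to try one and then two repetitions and ends up with true.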
(3)
I originally, and wrongly, thought the logic should be

// incorrect version
s++;
if (isMatch(s, p+2)) return true;

but the correct version is actually

if (isMatch(s, p+2)) return true;
s++;

Why?

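An example makes this clear. Take isMatch("a", "a*a"), where "a" and "a*a" obviously should match. For convenience, write it as
String: a1
Pattern: a2*a3
With the incorrect version, a1 is consumed by a2* first, then a3 is compared with '\0', which does not match, so the result is "no match".
But a2* actually means a2 repeated zero or more times, and the incorrect version never considers zero repetitions once the loop has been entered.
The correct version does consider zero repetitions of a2, so it checks whether a1 matches a3 and finally returns "match".
So the correct version is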

if (isMatch(s, p+2)) return true;
s++;
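Here is a minimal sketch that runs both orderings on this example. `matchOrder` and its `advanceFirst` flag are hypothetical names introduced only for this comparison; everything else follows the solution above.

```cpp
#include <cassert>

// Hypothetical variant for illustration only. With advanceFirst == true the
// '*' loop does `s++` before trying the rest of the pattern (the incorrect
// order); with false it tries the rest first and then advances (the correct order).
bool matchOrder(const char* s, const char* p, bool advanceFirst) {
    if (*p == '\0') return *s == '\0';
    if (*(p + 1) != '*') {
        return ((*p == *s) || (*p == '.' && *s != '\0'))
               && matchOrder(s + 1, p + 1, advanceFirst);
    }
    while ((*p == *s) || (*p == '.' && *s != '\0')) {
        if (advanceFirst) {
            s++;  // incorrect: skips the zero-repetition attempt
            if (matchOrder(s, p + 2, advanceFirst)) return true;
        } else {
            if (matchOrder(s, p + 2, advanceFirst)) return true;
            s++;  // correct: zero repetitions were tried before advancing
        }
    }
    return matchOrder(s, p + 2, advanceFirst);
}
```

With the string "a" and the pattern "a*a", `matchOrder("a", "a*a", true)` returns false and `matchOrder("a", "a*a", false)` returns true, which is exactly the a1 / a2*a3 case traced above.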

(4) LeetCode's explanation of the approach, in its original wording

We need some kind of backtracking mechanism such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.

If the next character of p is NOT ‘*’, then it must match the current character of s. Continue pattern matching with the next character of both s and p.
If the next character of p is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of p… Until we could not match any more characters.