Leetcode 10. Regular Expression Matching

License: CC 4.0 BY-SA

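This post is about regular expression matching, where the match must cover the entire input string. The hard part of the implementation is handling '*'; '.' is comparatively easy. It first presents a recursive implementation, which is easy to understand but has high time complexity, and then a DP approach whose core idea is similar to the recursion. DP improves performance but needs an extra two-dimensional array as space overhead.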

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
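
The two examples can be cross-checked against Java's built-in regex engine. For patterns restricted to lowercase letters, '.' and '*', `String.matches` follows the same rules and also requires the whole string to match. The class name `RegexOracle` and the helper `expected` are made up for illustration and are not part of the original post; this is only a reference oracle for testing a hand-written matcher, not a solution to the exercise.

```java
public class RegexOracle {
    // Hypothetical helper: delegates to the JDK regex engine.
    // Assumes p contains only a-z, '.' and '*', and that every '*' follows
    // a character or '.', as the problem statement implies.
    static boolean expected(String s, String p) {
        // String.matches succeeds only if the ENTIRE string matches the regex.
        return s.matches(p);
    }

    public static void main(String[] args) {
        System.out.println(expected("aa", "a"));   // Example 1: prints false
        System.out.println(expected("aa", "a*"));  // Example 2: prints true
    }
}
```

Comparing a hand-written matcher against this oracle on many short random inputs is a cheap way to catch mistakes in the '*' handling.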

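This problem asks us to implement regular expression matching by hand. The tricky part is '*'; '.' is easy to handle. The first approach I learned uses recursion, which is relatively easy to understand. The code is as follows: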

public class Solution {
    public boolean match(char[] str, char[] pattern) {
        if (str == null || pattern == null) {
            return false;
        }
        int strIndex = 0;
        int patternIndex = 0;
        return matchCore(str, strIndex, pattern, patternIndex);
    }

    public boolean matchCore(char[] str, int strIndex, char[] pattern, int patternIndex) {
        // Base case: both str and pattern are exhausted, so the match succeeds.
        if (strIndex == str.length && patternIndex == pattern.length) {
            return true;
        }
        // The pattern is exhausted first while str still has characters: the match fails.
        if (strIndex != str.length && patternIndex == pattern.length) {
            return false;
        }
        // The next pattern character is '*'. If the current string character matches the
        // current pattern character, there are three ways to continue; otherwise, skip
        // the "x*" pair by advancing the pattern by 2.
        if (patternIndex + 1 < pattern.length && pattern[patternIndex + 1] == '*') {
            if ((strIndex != str.length && pattern[patternIndex] == str[strIndex])
                    || (pattern[patternIndex] == '.' && strIndex != str.length)) {
                return matchCore(str, strIndex, pattern, patternIndex + 2) // x* matches zero characters
                        || matchCore(str, strIndex + 1, pattern, patternIndex + 2) // x* matches exactly one character
                        || matchCore(str, strIndex + 1, pattern, patternIndex); // x* matches one character and stays to try the next one in str
            } else {
                // Again, treat x* as matching zero characters.
                return matchCore(str, strIndex, pattern, patternIndex + 2);
            }
        }
        // The next pattern character is not '*'. If the current characters match,
        // advance both by 1; otherwise return false.
        if ((strIndex != str.length && pattern[patternIndex] == str[strIndex])
                || (pattern[patternIndex] == '.' && strIndex != str.length)) {
            return matchCore(str, strIndex + 1, pattern, patternIndex + 1);
        }
        return false;
    }
}

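The time complexity of the recursive approach is too high: it re-explores the same (string index, pattern index) pairs many times, which can be exponential in the worst case. I later learned a DP approach. Its core idea is similar to the recursive code, but because each of the (s.length()+1) × (p.length()+1) states is computed only once, it runs in O(m·n) time, where m and n are the lengths of s and p. The cost is an extra two-dimensional array as space overhead.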

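// As I understand it, the first dimension of dp indexes the string and the second
// indexes the pattern: dp[i][j] is true when the first i characters of s match the
// first j characters of p.
public boolean isMatch(String s, String p) {
    if (s == null || p == null) {
        return false;
    }
    boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
    dp[0][0] = true;
    // Start at 1: a '*' at index 0 has no preceding element, and dp[0][i-1] would be
    // out of bounds for i == 0.
    for (int i = 1; i < p.length(); i++) {
        if (p.charAt(i) == '*' && dp[0][i - 1]) {
            // Just remember this for now; it becomes clear further down
            // (x* matching zero characters lets the empty string match).
            dp[0][i + 1] = true;
        }
    }
    for (int i = 0; i < s.length(); i++) {
        for (int j = 0; j < p.length(); j++) {
            // dp[i+1][j+1] = dp[i][j] means: if the prefixes before the current
            // characters match, they still match after appending the current characters.
            if (p.charAt(j) == '.') {
                dp[i + 1][j + 1] = dp[i][j];
            }
            if (p.charAt(j) == s.charAt(i)) {
                dp[i + 1][j + 1] = dp[i][j];
            }
            // "Removing" below means reducing the pattern step by step to a plain
            // string with no regex operators left.
            /* An assignment such as dp[i+1][j+1] = dp[i+1][j-1] only records whether the
               pattern can still match the string. For example, with s = abc and
               p = ac*bc, c* has to match zero characters. Once the pattern index
               reaches '*', we need to know whether a match is still possible. By eye it
               clearly is, but the program needs a rule: drop the two characters "c*",
               which is exactly what dp[i+1][j+1] = dp[i+1][j-1] does. (Think of dp
               column j+1 as the '*' and column j as the 'c'.)
            */
            // j > 0 guards against a pattern that starts with '*'.
            if (p.charAt(j) == '*' && j > 0) {
                if (p.charAt(j - 1) != s.charAt(i) && p.charAt(j - 1) != '.') {
                    // x* matches zero characters here, so the two characters "x*"
                    // can be removed from the pattern.
                    dp[i + 1][j + 1] = dp[i + 1][j - 1];
                } else {
                    // dp[i+1][j]:   x* matches exactly one x, so the '*' itself can be removed.
                    // dp[i][j+1]:   x* matches several x. We do not know how many, only that
                    //               the current character is one of them, so we remove the
                    //               current character from the string.
                    // dp[i+1][j-1]: x* matches zero characters, so "x*" is removed.
                    dp[i + 1][j + 1] = (dp[i + 1][j] || dp[i][j + 1] || dp[i + 1][j - 1]);
                }
            }
        }
    }
    return dp[s.length()][p.length()];
}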
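
A middle ground between the two approaches above is to keep the recursive structure and cache each (string index, pattern index) result so it is computed only once. The sketch below is not from the original post; the class name `MemoMatcher` and the `memo` table are illustrative, and it assumes patterns are well formed (every '*' follows a letter or '.').

```java
public class MemoMatcher {
    // memo[i][j] == null means "not computed yet"; otherwise it stores whether
    // the suffix s[i..] matches the pattern suffix p[j..].
    private Boolean[][] memo;

    public boolean isMatch(String s, String p) {
        if (s == null || p == null) {
            return false;
        }
        memo = new Boolean[s.length() + 1][p.length() + 1];
        return match(s, 0, p, 0);
    }

    private boolean match(String s, int i, String p, int j) {
        // Pattern exhausted: succeed only if the string is exhausted too.
        if (j == p.length()) {
            return i == s.length();
        }
        if (memo[i][j] != null) {
            return memo[i][j];
        }
        // Does the current string character match the current pattern character?
        boolean first = i < s.length()
                && (p.charAt(j) == '.' || p.charAt(j) == s.charAt(i));
        boolean result;
        if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
            // Either x* matches zero characters (skip "x*"), or it consumes
            // one character of s and stays on the same "x*".
            result = match(s, i, p, j + 2) || (first && match(s, i + 1, p, j));
        } else {
            result = first && match(s, i + 1, p, j + 1);
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        MemoMatcher m = new MemoMatcher();
        System.out.println(m.isMatch("aa", "a"));   // prints false
        System.out.println(m.isMatch("aa", "a*"));  // prints true
    }
}
```

With the cache, at most (m+1) × (n+1) states are evaluated, so the running time has the same O(m·n) bound as the table-based DP while the code stays close to the recursive version.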