Wildcard Matching

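This article presents two ways to implement wildcard matching, a recursive approach and a dynamic programming approach, with a detailed walkthrough of the reasoning and C++ implementations.

Problem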

Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

Approach 1

Recursion. It times out on large inputs.


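class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        // Pattern exhausted: match only if the string is exhausted too.
        if (*p == '\0')
            return *s == '\0';
        if (*p != '*') {
            // Literal or '?': it must consume exactly one character of s.
            if (*s != '\0' && (*p == '?' || *s == *p))
                return isMatch(s + 1, p + 1);
            return false;
        }
        // '*': try the rest of the pattern against every suffix of s,
        // starting with the star matching zero characters.
        while (*s != '\0') {
            if (isMatch(s, p + 1))
                return true;
            s++;
        }
        // s is exhausted here; the star absorbed everything that was left.
        return isMatch(s, p + 1);
    }
};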
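
The recursion above can re-solve the same pair of suffixes of s and p many times, which is a likely reason for the timeout. As a minimal sketch, assuming the only goal is to avoid that repeated work, the same three cases can be cached in a table keyed by the two offsets. The names isMatchMemo and matchFrom are hypothetical and not part of the original solution, and the star case is written in the equivalent "skip the star, or consume one character and stay on the star" form.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: answers "does s[i..] match p[j..]?" with caching.
// memo values: -1 = not computed yet, 0 = no match, 1 = match.
bool matchFrom(const char *s, const char *p, int i, int j,
               int cols, std::vector<signed char> &memo) {
    signed char &cached = memo[i * cols + j];
    if (cached != -1)
        return cached == 1;

    bool ok;
    if (p[j] == '\0') {
        // Pattern exhausted: match only if the string is exhausted too.
        ok = (s[i] == '\0');
    } else if (p[j] == '*') {
        // Star matches nothing, or consumes s[i] and stays on the star.
        ok = matchFrom(s, p, i, j + 1, cols, memo) ||
             (s[i] != '\0' && matchFrom(s, p, i + 1, j, cols, memo));
    } else {
        // Literal or '?': must consume exactly one character of s.
        ok = s[i] != '\0' && (p[j] == '?' || p[j] == s[i]) &&
             matchFrom(s, p, i + 1, j + 1, cols, memo);
    }
    cached = ok ? 1 : 0;
    return ok;
}

// Hypothetical entry point with the same contract as isMatch.
bool isMatchMemo(const char *s, const char *p) {
    const int n = static_cast<int>(std::strlen(s));
    const int m = static_cast<int>(std::strlen(p));
    std::vector<signed char> memo((n + 1) * (m + 1), -1);
    return matchFrom(s, p, 0, 0, m + 1, memo);
}

// Usage check against examples from the problem statement.
void checkMemoExamples() {
    assert(!isMatchMemo("aa", "a"));
    assert(isMatchMemo("ab", "?*"));
    assert(!isMatchMemo("aab", "c*a*b"));
}
```

Each (i, j) pair is computed at most once, so the work is bounded by the table size. The recursion depth still grows with the combined length of s and p, so very long inputs may still call for an iterative table.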


Approach 2

Dynamic programming (DP). Thought process:

The recursive method is intuitive and gives great insight into the matching process. If we neglect the boundary cases for a moment, there are three basic cases when we try to match the characters in s and p:

  1. No match. Simply return false. This is the last case in the program.
  2. Single match. Either *s == *p, or *p == '?'. The result depends on the rest of both s and p (a recursive call), which starts at the 'diagonal' position reached by advancing both s and p by 1 (++s, ++p).
  3. Star case, i.e. when *p == '*'. This is where the complication comes in. A star can match 0, 1, 2, ... characters, up to the end of string s. So as soon as one of the suffixes of s matches the rest of the pattern after advancing past the '*' (a recursive call), it returns true. This case returns false only after exhausting all possible suffixes without a match.

Now that we have some sense of the dependencies of each step, learned from the recursive calls, we can set up our dynamic programming framework. For example, take s = "abcdef" and p = "a?c*f".

[Figure: matrix setup]

The strings are indexed from 1 for convenience. Now let's directly apply the rules learned from the recursive method:

The arrow means "depends on" or "recursive call". The cells without a match can be pre-filled with FALSE. The tail cell ('\0', '\0') is marked TRUE.

[Figure: recursion process]

We eventually want to know cell(0), but we have to know cell(1) first:

  • s[1] == p[1] gives case 2, so cell(1) depends on cell(2);
  • p[2] == '?' gives case 2, so cell(2) depends on cell(3);
  • s[3] == p[3] gives case 2, so cell(3) depends on cell(4);
  • p[4] == '*' gives case 3, so cell(4) depends on all the crimson shaded cells. As long as one of the shaded cells is TRUE, cell(4) is TRUE. Cell(4) also depends on the cells to its right because the star can match 0 characters.

...

  • p[5] == s[6] gives case 2, so cell(5) depends on the tail '\0','\0' case, which is TRUE. So cell(5) = TRUE.

Then we backtrack, just as the recursive calls return: cell(0) = cell(1) = cell(2) = cell(3) = cell(4) = cell(5) = TRUE. Finally the function returns TRUE.

And then we do the real dynamic programming. Note that the problem is symmetric: you can match the strings from left to right or from right to left, and the result is identical. In the recursive method, the actual result propagates from the bottom-right corner to the top-left corner. In dynamic programming we want to start with the first row, so we can flip the whole dependency graph. Again, the arrows mean dependencies, i.e. "gets its value from".

[Figure: DP process]

All the non-matching cells are pre-filled with FALSE. The only initial TRUE is at cell(0), which is also the case where you match two empty strings. So now you just need a matrix of size (size(s)+1) × (size(p)+1), where the extra row and column stand for the empty prefixes, and you fill the cells row by row according to the three rules:

  1. No matching: fill FALSE;
  2. Matching, or '?': copy the value from the previous diagonal cell;
  3. '*': look at all cells to the left, the cells to the left in the previous row, and the cell directly above; if at least one of them is TRUE, fill TRUE; otherwise fill FALSE.

Finally return the value of the last cell.
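
The row-by-row fill can be sketched as follows. This is a minimal illustration, not a transcription of the figure: it assumes the common form of the recurrence in which the '*' case reduces to two neighbouring cells (the cell for the same string prefix without the star, and the cell for one character less of s with the star). The name isMatchForward and the orientation dp[i][j] (first i characters of s against first j characters of p) are choices made for this sketch.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// dp[i][j] is true when the first i characters of s match the first j of p.
bool isMatchForward(const char *s, const char *p) {
    const int n = static_cast<int>(std::strlen(s));
    const int m = static_cast<int>(std::strlen(p));
    // All cells start as FALSE; index 0 stands for the empty prefix.
    std::vector<std::vector<bool> > dp(n + 1, std::vector<bool>(m + 1, false));
    dp[0][0] = true;  // two empty strings match

    // Empty s matches only a pattern prefix made entirely of '*'.
    for (int j = 1; j <= m && p[j - 1] == '*'; ++j)
        dp[0][j] = true;

    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= m; ++j) {
            if (p[j - 1] == '*') {
                // Star matches nothing, or absorbs s[i-1] and stays active.
                dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
            } else if (p[j - 1] == '?' || p[j - 1] == s[i - 1]) {
                // Single-character match: copy the diagonal cell.
                dp[i][j] = dp[i - 1][j - 1];
            }
            // Otherwise the cell keeps its pre-filled FALSE.
        }
    }
    return dp[n][m];  // the last cell
}

// Usage check with the walkthrough example and one failing case.
void checkForwardExamples() {
    assert(isMatchForward("abcdef", "a?c*f"));
    assert(!isMatchForward("aab", "c*a*b"));
}
```

Because each star cell reads only two neighbours, every cell costs constant time in this form.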

There are some more tricks in practice. First, successive '*'s are equivalent to a single '*', so we may collapse them into one. After doing this, the number of '*'s is at most about size(p)/2. So the worst-case run time is O(m*n + m^2), where m = size(p) and n = size(s).
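
Collapsing runs of stars is a small preprocessing pass. A minimal sketch, where collapseStars is a hypothetical helper name and the pattern is taken as a std::string for convenience:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns a copy of p in which every run of consecutive '*' is a single '*'.
std::string collapseStars(const std::string &p) {
    std::string out;
    out.reserve(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        const bool repeatedStar =
            p[i] == '*' && !out.empty() && out[out.size() - 1] == '*';
        if (!repeatedStar)
            out.push_back(p[i]);
    }
    return out;
}

// Usage check: runs of stars shrink, everything else is kept in order.
void checkCollapseStars() {
    assert(collapseStars("a**b***") == "a*b*");
    assert(collapseStars("***") == "*");
}
```

The collapsed pattern matches exactly the same strings, so it can be passed to any of the matching routines unchanged.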

Also consider that a match is only possible if, after removing all '*'s, size(p) <= size(s). Combined with the first trick, this means m is at most about 2n in the worst case, so m = O(n). Thus the worst-case run time is O(m*n).
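
This length condition can also serve as an early rejection test before any table is built. A minimal sketch, where tooManyLiterals is a hypothetical helper name; it counts every pattern character other than '*' (including '?'), since each of those must consume exactly one character of s:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True when p has more non-star characters than s has characters,
// in which case no match is possible.
bool tooManyLiterals(const char *s, const char *p) {
    std::size_t needed = 0;
    for (const char *q = p; *q != '\0'; ++q) {
        if (*q != '*')
            ++needed;
    }
    return needed > std::strlen(s);
}

// Usage check: three required characters cannot fit into a 2-character s.
void checkTooManyLiterals() {
    assert(tooManyLiterals("aa", "a*a*a"));
    assert(!tooManyLiterals("aa", "*a*"));
}
```

A false result here does not mean the strings match; it only means the full matching routine still has to run.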

Second, the matrix is updated row by row, and even the '*' case requires only the two latest rows. So a space-efficient solution is possible using two arrays of about size(s) entries each.
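
A minimal sketch of the two-array idea, assuming one row per pattern prefix and size(s)+1 entries per row (the extra entry stands for the empty prefix of s). The name isMatchTwoRows is hypothetical, and the '*' case uses the two-neighbour form of the recurrence rather than a scan over the row:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// prev[i] / cur[i]: do the first i characters of s match the pattern prefix
// handled in the previous / current iteration?
bool isMatchTwoRows(const char *s, const char *p) {
    const int n = static_cast<int>(std::strlen(s));
    const int m = static_cast<int>(std::strlen(p));
    std::vector<bool> prev(n + 1, false), cur(n + 1, false);
    prev[0] = true;  // empty pattern matches empty string

    for (int j = 1; j <= m; ++j) {
        const char c = p[j - 1];
        // Empty s matches this prefix only if it is all stars so far.
        cur[0] = (c == '*') && prev[0];
        for (int i = 1; i <= n; ++i) {
            if (c == '*') {
                // Star matches nothing (prev[i]) or absorbs s[i-1] (cur[i-1]).
                cur[i] = prev[i] || cur[i - 1];
            } else {
                // Literal or '?': needs a character match plus the diagonal.
                cur[i] = (c == '?' || c == s[i - 1]) && prev[i - 1];
            }
        }
        prev.swap(cur);  // the current row becomes the previous one
    }
    return prev[n];
}

// Usage check with the walkthrough example and one failing case.
void checkTwoRowsExamples() {
    assert(isMatchTwoRows("abcdef", "a?c*f"));
    assert(!isMatchTwoRows("aaa", "aa"));
}
```

Every entry of cur is overwritten in each pass, so the swapped-in old row never leaks stale values.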

Implementation:

#include <cstring>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        int lens = strlen(s);
        int lenp = strlen(p);
        // result[j][i] is true when s[j..] matches p[i..]. The extra row and
        // column (index lens / lenp) stand for the empty suffixes.
        vector<vector<bool> > result(lens + 1, vector<bool>(lenp + 1, false));
        result[lens][lenp] = true;  // s="" and p="" match
        string S(s);
        string P(p);
        // Empty s matches p[i..] only if that suffix consists solely of '*'.
        for (int i = lenp - 1; i >= 0; i--)
            result[lens][i] = (P[i] == '*' && result[lens][i + 1]);
        for (int i = lenp - 1; i >= 0; i--)
            for (int j = lens - 1; j >= 0; j--) {
                if (P[i] == '*') {
                    // '*' may absorb s[j..k-1] for any k in [j, lens].
                    for (int k = j; k <= lens; k++) {
                        if (result[k][i + 1]) {
                            result[j][i] = true;
                            break;
                        }
                    }
                } else if (S[j] == P[i] || P[i] == '?') {
                    // Single-character match: copy the diagonal cell.
                    result[j][i] = result[j + 1][i + 1];
                }
                // Otherwise the cell keeps its initial false.
            }
        return result[0][0];
    }
};

 

