Repeated Substring Pattern - Easy

*************

C++

Problem: 459. Repeated Substring Pattern - LeetCode (力扣)

*************

A first look at the problem.

Even numbers may be the key to solving this problem, because repetition always doubles, triples, or further multiplies the unit. But I soon found that the string AAA is made of A repeated 3 times: the number of repetitions is 3, and 3 is an odd number.

Maybe two pointers would work. The first pointer moves from left to right and the second moves in exactly the opposite direction. But the string abab does not follow that rule.

Count the occurrences of each letter. The counts would have to be even. But then consider the string aabcbc: it follows the rule, yet it is not a repeated substring pattern.
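
Before anything clever, a brute-force baseline helps to pin down what the problem actually asks. This is only a sketch for comparison, not the method this post ends up with, and the name `repeatedByBruteForce` is made up for illustration. It tries every unit length that divides the string length and checks that each character equals the one a full unit earlier.

```cpp
#include <cstddef>
#include <string>

// Brute-force check: try every unit length `len` that divides n (with len < n)
// and verify that each character equals the one `len` positions before it.
bool repeatedByBruteForce(const std::string& s) {
    const std::size_t n = s.size();
    for (std::size_t len = 1; len * 2 <= n; ++len) {
        if (n % len != 0) continue;  // the unit must tile the string exactly
        bool ok = true;
        for (std::size_t i = len; i < n; ++i) {
            if (s[i] != s[i - len]) {
                ok = false;
                break;
            }
        }
        if (ok) return true;
    }
    return false;
}
```

It accepts "aaa" even though 3 is odd, and rejects "aabcbc" even though every letter count is even, which matches the two counterexamples above.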

Let us talk about next. Steve Jobs founded NeXT after he was forced out of Apple, and around the same time he bought the company that became Pixar. I like watching movies. I like to see different ways of life. People have different friends and different personalities, and I like that variety in people. In a few hours you take part in someone's whole life. Watching them smile, laugh, cry, shout, suffer, and starve is a really interesting feeling.

But wait: substrings, and in particular prefixes and suffixes, are the key to this problem.

String "abcabc"

  • Prefixes: "a", "ab", "abc", "abca", "abcab"
  • Suffixes: "c", "bc", "abc", "cabc", "bcabc"
  • Longest common prefix and suffix: "abc" → length 3
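
The enumeration above can be written down literally. The sketch below is a naive helper (the name `longestCommonPrefixSuffix` is my own, not from any library) that compares the first and last `len` characters for every candidate length, longest first. It is slow, but it makes the definition concrete.

```cpp
#include <string>

// Length of the longest proper prefix of s that is also a suffix of s,
// computed straight from the definition (quadratic work in the worst case).
int longestCommonPrefixSuffix(const std::string& s) {
    const int n = static_cast<int>(s.size());
    for (int len = n - 1; len >= 1; --len) {
        // compare the first `len` characters with the last `len` characters
        if (s.compare(0, len, s, n - len, len) == 0) {
            return len;
        }
    }
    return 0;
}
```

For "abcabc" the candidates of length 5 and 4 fail, and length 3 ("abc") succeeds.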

Actually, this method comes from other people's genius. I spent a really long time understanding the algorithm. Until a few weeks ago, I wanted to be the first one to find the way myself. I have changed my mind recently, because Lei Jun has already told me the learning method.

The code that finds the longest common prefix and suffix is really puzzling.

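// Returns the index of the first occurrence of pattern in text, or -1.
// buildNext is assumed to be defined elsewhere; it is not shown here.
int kmpSearch(const string& text, const string& pattern) {
    if (pattern.empty()) return 0; // an empty pattern matches at index 0
    vector<int> next = buildNext(pattern);
    int i = 0, j = 0; // i: index into text, j: index into pattern
    while (i < text.length()) {
        if (text[i] == pattern[j]) {
            i++; j++;
            if (j == pattern.length())
                return i - j; // full match found
        } else if (j > 0) {
            j = next[j - 1]; // fall back inside the pattern, keep i
        } else {
            i++; // nothing matched so far, advance in text
        }
    }
    return -1;
}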
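
`kmpSearch` relies on a `buildNext` helper that this post never shows. Below is a minimal sketch of what it might look like, assuming it returns the usual KMP prefix table. The signature is inferred from the call site, not taken from any original source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed helper: next[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<int> buildNext(const std::string& pattern) {
    std::vector<int> next(pattern.size(), 0);
    int j = 0;  // length of the currently matched prefix
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        while (j > 0 && pattern[i] != pattern[j]) {
            j = next[j - 1];  // fall back to the next shorter candidate
        }
        if (pattern[i] == pattern[j]) {
            ++j;
        }
        next[i] = j;
    }
    return next;
}
```

With a helper like this in place, `kmpSearch("abcabcabd", "abcabd")` walks through one fallback (from j = 5 to j = 2) and reports the match at index 3.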

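next[i] is the length of the longest equal prefix and suffix of the substring s[0...i]. Note that the prefix must not include the last character, and the suffix must not include the first character.

Prefix: must start at s[0];

Suffix: must end at s[i];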

Suppose s = "ababab".

index (i, j):  0 1 2 3 4 5
string:        a b a b a b

Step  i  j  s[i]  s[j]  Action                 next array state
1     1  0  b     a     mismatch, j stays 0    next[1]=0 → [0,0]
2     2  0  a     a     match, j++ → j=1       next[2]=1 → [0,0,1]
3     3  1  b     b     match, j++ → j=2       next[3]=2 → [0,0,1,2]
4     4  2  a     a     match, j++ → j=3       next[4]=3 → [0,0,1,2,3]
5     5  3  b     b     match, j++ → j=4       next[5]=4 → [0,0,1,2,3,4]

I became confused: is this problem really an easy one? Why can the next array give the length of the longest common prefix and suffix?

ababab
prefixes (j ->):  a       ab     aba   abab  ababa  ababab
suffixes (<- i):  ababab  babab  abab  bab   ab     b

Mathematical tricks always fool me. Avoiding repeated work is the goal. The code does not generate the substrings and does not compare them one by one, so how can it tell the length?
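
One way to convince yourself is to compute the same table the slow way and compare. The sketch below is a reference helper for cross-checking only (the name `nextByDefinition` is hypothetical): it really does generate and compare every prefix and suffix pair. The fast loop can skip this work because an equal prefix and suffix of length k for s[0..i] always extends one of length k - 1 for s[0..i-1], so only a handful of candidates ever need a single-character check.

```cpp
#include <string>
#include <vector>

// Reference implementation: compute next[i] directly from the definition by
// comparing every proper prefix of s[0..i] with the suffix of the same length.
// For cross-checking only; much slower than the incremental loop.
std::vector<int> nextByDefinition(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> next(n, 0);
    for (int i = 0; i < n; ++i) {
        for (int len = i; len >= 1; --len) {  // proper prefixes only: len <= i
            // prefix s[0..len-1] versus suffix s[i-len+1..i]
            if (s.compare(0, len, s, i - len + 1, len) == 0) {
                next[i] = len;
                break;
            }
        }
    }
    return next;
}
```

For "ababab" this yields [0, 0, 1, 2, 3, 4], the same values the incremental loop produces.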

Debug the code step by step:

index (i, j):  0 1 2 3 4 5
string:        a b a b a b

  • i = 1, j = 0;
  • current substring s[0..1] = {a, b};
  • s[0] != s[1], which means the prefix {a} and the suffix {b} are different. j remains 0;
  • next[1] = 0, next = [0, 0];

  • i = 2, j = 0;
  • current substring s[0..2] = {a, b, a};
  • s[0] == s[2];
  • j++ -> j = 1;
  • prefixes: {a}, {a, b};
  • suffixes: {a}, {b, a};
  • {a, b} != {b, a}, so the longest equal prefix and suffix is {a};
  • next[2] = 1, next = [0, 0, 1];

  • i = 3, j = 1;
  • current substring s[0..3] = {a, b, a, b};
  • prefixes: {a}, {a, b}, {a, b, a};
  • suffixes: {b}, {a, b}, {b, a, b};
  • s[3] == s[1];
  • j++ -> j = 2, next[3] = 2 >> here is the thing to figure out.

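Remember the previous step:

s[0] == s[2], then

if s[1] == s[3], ab is a possible unit.

index:   0 1 2 3
string:  a b a b

index:   0 1 2 3
string:  a b a c

If s[1] != s[3], then ab is not the unit that builds the string. We have to go back to the beginning, and abac may be the smallest unit.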
It took me a whole day to get this spark. Basically, this method is a kind of fast matching.

If the prefix and the suffix start with the same letter, the string may have the form aXXXaXXX, so move on to see whether the next letters match as well. However, if s[i] != s[j], the form may look like aXXXbXXX, and this unit is definitely not the smallest unit that builds the string.

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    bool repeatedSubstringPattern(string s) {
        int n = s.size();
        if (n <= 1) return false; // too short to be a repetition
        
        vector<int> next(n, 0); // build the next array
        int j = 0; // prefix pointer
        
        for (int i = 1; i < n; ++i) 
        {
            // on a mismatch, fall back with the prefix pointer
            while (j > 0 && s[i] != s[j]) 
            {
                j = next[j - 1];
            }
            // on a match, advance the prefix pointer
            if (s[i] == s[j]) 
            {
                j++;
            }
            next[i] = j; // longest equal prefix and suffix of s[0..i]
        }
        
        int max_prefix = next.back(); // longest equal prefix and suffix of s
        int len = n - max_prefix; // length of the candidate repeating unit
        
        // the unit length must divide n, and the unit must be shorter than s
        return max_prefix != 0 && n % len == 0;
    }
};
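
Why `n - max_prefix`? When the longest equal prefix and suffix has length m, the string repeats itself with a shift of n - m, that is, s[i] == s[i + n - m] wherever both indices exist. If that shift also divides n, the first n - m characters tile the whole string. A small sketch that returns the unit itself may make this easier to see. The function name `smallestUnit` is made up here.

```cpp
#include <string>
#include <vector>

// Returns the shortest unit whose repetition forms s, or s itself when no
// shorter unit exists. Illustrative helper, not part of the LeetCode API.
std::string smallestUnit(const std::string& s) {
    const int n = static_cast<int>(s.size());
    if (n == 0) return s;
    std::vector<int> next(n, 0);
    int j = 0;
    for (int i = 1; i < n; ++i) {
        while (j > 0 && s[i] != s[j]) j = next[j - 1];
        if (s[i] == s[j]) ++j;
        next[i] = j;
    }
    const int len = n - next.back();  // candidate unit length
    return (n % len == 0) ? s.substr(0, len) : s;
}
```

For "ababab" the last next value is 4, so the unit length is 2 and the unit is "ab". For "ababa" the candidate length 2 does not divide 5, so the string is returned unchanged.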


There is another, easier way to get the result. Concatenate the string with itself, cut off the first and the last character, and see whether s is a substring of the result. Let's try.

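class Solution 
{
public:
    bool repeatedSubstringPattern(string s) 
    {
        string ss = s + s; // concatenate two copies of s into one long string
        ss = ss.substr(1, ss.size() - 2); // drop the first and last character
        return false; // placeholder: the search step is added in the complete version further down
    }
};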

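ss.substr(a, b) calls the string member function substr, where a is the starting index and b is the number of characters to take, not an end index. Starting at ss[1] and taking ss.size() - 2 characters ends at ss[ss.size() - 2], which drops both the first and the last character. Why is the count ss.size() - 2 and not ss.size() - 1? The reason is shown below. Pay special attention to the difference between length and position.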

ss        a b c a b c
count     1 2 3 4 5 6   (ss.size() = 6)
ss[i]     0 1 2 3 4 5
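
To see the length versus position point in running code, here is a small sketch. The helper name `doubledAndTrimmed` is my own. The guard for the empty string is an addition, since calling substr with a start index of 1 on an empty string would throw std::out_of_range.

```cpp
#include <string>

// Doubles s and drops the first and last character of the result.
// std::string::substr(pos, count): `count` is a length, not an end index.
std::string doubledAndTrimmed(const std::string& s) {
    if (s.empty()) return "";  // guard: substr(1, ...) would throw on ""
    const std::string ss = s + s;
    return ss.substr(1, ss.size() - 2);
}
```

With s = "abc", ss is "abcabc" with size 6, so substr(1, 4) takes indices 1 through 4 and gives "bcab".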

Introducing the find function.

source_string.find(target_string);

string s = "hello ElseWhere";
size_t found = s.find("hello"); // check whether "hello" occurs in string s
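
The return value deserves a closer look: find gives the index of the first match, or the special value string::npos when there is no match. A tiny wrapper (the name `contains` is chosen here for illustration) shows the usual way to turn that into a yes or no answer.

```cpp
#include <string>

// True when `needle` occurs somewhere inside `haystack`.
// std::string::find returns the index of the first match, or
// std::string::npos when there is none.
bool contains(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

Comparing against npos is safer than testing the result as a boolean, because a match at index 0 is a valid hit.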

Just call the functions.

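#include <string>
using namespace std;

class Solution 
{
public:
    // assumes s is non-empty, as the problem guarantees
    bool repeatedSubstringPattern(string s) 
    {
        string ss = s + s; // concatenate two copies of s into one long string
        ss = ss.substr(1, ss.size() - 2); // drop the first and last character

        // call find: s occurs in the trimmed string only if s is a repeated pattern
        return ss.find(s) != string::npos;
    }
};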

The mathematical trick here is periodicity. If the string s is made up of a repeated unit, shifting s circularly by one unit gives s again. The trimmed string s + s contains every circular shift of s except the unshifted one, so in that case it must contain the original string s. It is hard to figure out anyway.
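
The circular shift idea can be checked directly. The sketch below uses a hypothetical name, `equalsOwnRotation`, and is meant as an illustration rather than a replacement for the find version. It rotates s by every k between 1 and n - 1 and reports whether any rotation gives s back. Each rotation by k is simply the n characters of s + s starting at index k, which is why searching inside the trimmed s + s answers the same question.

```cpp
#include <cstddef>
#include <string>

// Checks whether rotating s left by some k (0 < k < n) gives s back.
// The rotation by k is the substring of s+s that starts at index k.
bool equalsOwnRotation(const std::string& s) {
    const std::size_t n = s.size();
    const std::string ss = s + s;
    for (std::size_t k = 1; k < n; ++k) {
        if (ss.compare(k, n, s) == 0) {
            return true;  // rotation by k equals s
        }
    }
    return false;
}
```

For "abab" the rotation by 2 is "abab" again, so the answer is true. For "aba" the rotations are "baa" and "aab", so the answer is false.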
