Repeated Substring Pattern - Easy

Article link: https://blog.youkuaiyun.com/ElseWhereR/article/details/146058496

*************

C++

*************

A first look at the problem.

Even numbers may be the key to solving the problem, because repetition always doubles, triples, or further multiplies a unit. But I soon found that the string AAA is made of A repeated 3 times, and 3 is an odd number.

Maybe two pointers would work: the first pointer moves from left to right and the second moves in exactly the opposite direction. But the string abab does not follow that rule.

Count the occurrences of each letter; the counts would have to be even. But then consider the string aabcbc, which follows the rule without fitting the problem.
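
To make the target property concrete, here is a brute-force check that tries every unit length dividing the string length. It is only a baseline sketch and not the method this post ends up with; the name isRepeatedBruteForce is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Brute force: try every unit length that divides n and compare each
// character with the one exactly one unit earlier.
bool isRepeatedBruteForce(const std::string& s) {
    const std::size_t n = s.size();
    for (std::size_t len = 1; len * 2 <= n; ++len) {
        if (n % len != 0) continue;      // the unit must divide the string evenly
        bool ok = true;
        for (std::size_t i = len; i < n; ++i) {
            if (s[i] != s[i - len]) {    // breaks the period of length len
                ok = false;
                break;
            }
        }
        if (ok) return true;
    }
    return false;
}
```

With this check, AAA and abab pass, while aabcbc fails even though every letter in it appears an even number of times.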

Let us talk about next. Steve Jobs founded NeXT after he was forced out of Apple, and around the same time Pixar was founded, with Steve as its owner. I like watching movies. I like seeing different ways of life. People have different friends and different characters, and I like all these different kinds of people. In a few hours you take part in someone's whole life. Watching them smile, laugh, cry, shout, suffer, and starve is a really interesting feeling.

But wait, the key to the problem is the substrings, specifically the prefixes and suffixes.

String "abcabc"

Prefixes: "a", "ab", "abc", "abca", "abcab"
Suffixes: "c", "bc", "abc", "cabc", "bcabc"
Longest common prefix and suffix: "abc" → length 3
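
The definition above can be checked directly by comparing the first and last len characters for every candidate length. This is a slow sketch of the definition only, not the fast method; longestCommonPrefixSuffix is a hypothetical name.

```cpp
#include <string>

// Length of the longest proper prefix of s that is also a suffix of s,
// found by direct comparison, longest candidate first.
int longestCommonPrefixSuffix(const std::string& s) {
    const int n = static_cast<int>(s.size());
    for (int len = n - 1; len > 0; --len) {
        // compare the first len characters with the last len characters
        if (s.compare(0, len, s, n - len, len) == 0) {
            return len;
        }
    }
    return 0; // no non-empty prefix equals a suffix
}
```

For "abcabc" this returns 3, matching the sets listed above.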

Actually, this method is borrowed from someone else's genius. I spent a really long time understanding the algorithm. Until a few weeks ago, I wanted to be the first one to find the way myself. I have changed my mind recently, because Lei Jun has already told me the learning method.

The code that finds the longest equal prefix and suffix is really puzzling.

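// Returns the index of the first occurrence of pattern in text, or -1 if there is none.
int kmpSearch(const string& text, const string& pattern) {
    int n = text.length(), m = pattern.length();
    if (m == 0) return 0; // an empty pattern matches at index 0
    vector<int> next = buildNext(pattern); // buildNext is not shown in this post
    int i = 0, j = 0; // i: pointer into the text, j: pointer into the pattern
    while (i < n) {
        if (text[i] == pattern[j]) {
            i++; j++;
            if (j == m)
                return i - j; // match found
        } else if (j > 0) {
            j = next[j - 1]; // mismatch: fall back in the pattern, keep i
        } else {
            i++; // mismatch at the start of the pattern: advance in the text
        }
    }
    return -1;
}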
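
kmpSearch calls buildNext, which the post never shows. Below is a minimal sketch of what it might look like, assuming it fills the array with the same loop as the final solution further down; the body is my guess, not code from the original.

```cpp
#include <string>
#include <vector>

// Assumed helper: next[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of pattern[0..i].
std::vector<int> buildNext(const std::string& pattern) {
    const int n = static_cast<int>(pattern.size());
    std::vector<int> next(n, 0);
    int j = 0; // length of the prefix matched so far
    for (int i = 1; i < n; ++i) {
        while (j > 0 && pattern[i] != pattern[j]) {
            j = next[j - 1]; // fall back to the next shorter candidate
        }
        if (pattern[i] == pattern[j]) {
            ++j; // the matched prefix grows by one character
        }
        next[i] = j;
    }
    return next;
}
```

For "ababab" it produces [0, 0, 1, 2, 3, 4].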

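next[i] is the length of the longest equal prefix and suffix of the substring s[0...i]. Note that the prefix does not include the last character, and the suffix does not include the first character.

Prefix: must start with s[0];

Suffix: must end with s[i];

Suppose s = "ababab".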

j, i	0	1	2	3	4	5
string	a	b	a	b	a	b

Step	i	j	s[i]	s[j]	Action	next array
1	1	0	b	a	mismatch, j stays 0	next[1]=0 → [0,0]
2	2	0	a	a	match, j++ → j=1	next[2]=1 → [0,0,1]
3	3	1	b	b	match, j++ → j=2	next[3]=2 → [0,0,1,2]
4	4	2	a	a	match, j++ → j=3	next[4]=3 → [0,0,1,2,3]
5	5	3	b	b	match, j++ → j=4	next[5]=4 → [0,0,1,2,3,4]

I became confused: is this problem really an easy one? Why can the next array find the length of the longest equal prefix and suffix?

	a	b	a	b	a	b
j ->	a	ab	aba	abab	ababa	ababab
	ababab	babab	abab	bab	ab	b	<-i

Mathematical tricks always fool me. Avoiding repeated work is the goal. The code does not generate the prefixes and suffixes and does not compare them one by one, so how can it tell the length?

Debug the code:

j, i	0	1	2	3	4	5
string	a	b	a	b	a	b

i = 1, j = 0,
temporary string = {a, b};
s[0] != s[1], which means prefix {a} and suffix {b} are different. j remains 0, and i++;
next[1] = 0, next = [0, 0];

i = 2, j = 0,
temporary string = {a, b, a};
s[0] == s[2];
j++ -> j = 1;
prefix strings: {a}, {a, b};
suffix strings: {a}, {b, a};
because s[0] != s[1], so s[0]s[1] != s[1]s[2];
next[2] = 1, next = [0, 0, 1];

i = 3, j = 1,
temporary string = {a, b, a, b};
prefix strings: {a}, {a, b}, {a, b, a};
suffix strings: {b}, {a, b}, {b, a, b};
s[3] == s[1];
j++ -> j = 2, next[3] = 2 >> here is the thing to figure out.

Remember the last step:

s[0] == s[2], then

if s[1] == s[3], then ab is the possible unit.

0	1	2	3
a	b	a	b


0	1	2	3
a	b	a	c
If s[1] != s[3], then ab is not the unit that builds the string. We have to go back to the beginning, and abac may be the smallest unit string.
a	b	a	c

It took me a whole day to figure out this spark. Basically, this method is about fast matching.

If the prefix and the suffix share the same letter, that means aXXXaXXX may be the same. Move on to see whether the next letter is the same too. However, if s[i] != s[j], the form may look like aXXXbXXX, and this unit is definitely not the smallest unit that builds the string.

class Solution {
public:
    bool repeatedSubstringPattern(string s) {
        int n = s.size();
        if (n <= 1) return false; // too short to be a repetition
        
        vector<int> next(n, 0); // build the next array
        int j = 0; // prefix pointer
        
        for (int i = 1; i < n; ++i) 
        {
            // on a mismatch, move the prefix pointer back
            while (j > 0 && s[i] != s[j]) 
            {
                j = next[j - 1];
            }
            // on a match, move the prefix pointer forward
            if (s[i] == s[j]) 
            {
                j++;
            }
            next[i] = j; // record the longest equal prefix/suffix length so far
        }
        
        int max_prefix = next.back(); // longest equal prefix/suffix length of the whole string
        int len = n - max_prefix; // length of the candidate repeating unit
        
        // required: the unit length divides the string length, and the unit is shorter than the whole string
        return max_prefix != 0 && n % len == 0;
    }
};
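
The final check is easier to trust with a few numbers. For "ababab" the longest equal prefix and suffix has length 4, so the candidate unit length is 6 - 4 = 2, and 6 % 2 == 0. For "aba" the length is 1, the candidate is 2, and 3 % 2 != 0. The sketch below returns that unit length instead of a bool; repeatingUnitLength is a hypothetical helper, and the loop is the same as in the solution.

```cpp
#include <string>
#include <vector>

// Returns the length of the shortest unit whose repetition (two or more
// times) forms s, or 0 if there is no such unit.
int repeatingUnitLength(const std::string& s) {
    const int n = static_cast<int>(s.size());
    if (n <= 1) return 0;
    std::vector<int> next(n, 0);
    int j = 0;
    for (int i = 1; i < n; ++i) {
        while (j > 0 && s[i] != s[j]) j = next[j - 1];
        if (s[i] == s[j]) ++j;
        next[i] = j;
    }
    const int border = next.back(); // longest equal prefix/suffix of the whole string
    const int len = n - border;     // candidate unit length
    return (border != 0 && n % len == 0) ? len : 0;
}
```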

There is another easy way to get the result. Double the string by joining two copies into one. Cut off the first and last letters, then see whether s is a substring of the doubled string. Let's try.

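class Solution 
{
public:
    bool repeatedSubstringPattern(string s) 
    {
        string ss = s + s; // join two copies of s into one long string
        ss = ss.substr(1, ss.size() - 2); // cut off the first and last characters
        return false; // placeholder: the search for s inside ss is still missing at this step
    }
};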

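ss.substr(a, b) calls the string member function substr, which returns b characters starting at ss[a]. The second parameter is a length, not an end index. Cutting off the first and last letters leaves ss.size() - 2 characters starting at index 1, so the last character kept is ss[ss.size() - 2], not ss[ss.size() - 1]. Pay special attention to the difference between length and position, shown here for ss = "abcabc".

ss	a	b	c	a	b	c
count	1	2	3	4	5	6
ss[i]	0	1	2	3	4	5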
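
A small check of how substr behaves: its second argument is a count of characters, not an end index. A minimal sketch; trimEnds is a made-up helper name.

```cpp
#include <string>

// Drops the first and last characters of a string.
// substr(pos, count) takes a start index and a LENGTH.
std::string trimEnds(const std::string& doubled) {
    if (doubled.size() < 2) return ""; // nothing left after trimming
    return doubled.substr(1, doubled.size() - 2);
}
```

For "abcabc" (size 6) this keeps 4 characters starting at index 1, which gives "bcab".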

Introduce the find function.

haystack.find(needle);

string s = "hello ElseWhere";
size_t found = s.find("hello"); // check whether "hello" occurs in string s; string::npos means not found
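
To see what find returns: it gives the index of the first match, or std::string::npos when there is no match. A minimal sketch with a hypothetical wrapper named contains.

```cpp
#include <string>

// True when needle occurs somewhere inside haystack.
bool contains(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

With the string from the snippet, "hello" is found at index 0 and "ElseWhere" at index 6, while a word that is absent gives npos.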

Then just call the functions.

class Solution 
{
public:
    bool repeatedSubstringPattern(string s) 
    {
        string ss = s + s; // join two copies of s into one long string
        ss = ss.substr(1, ss.size() - 2); // cut off the first and last characters

        // call find: s is a repeated pattern exactly when it still occurs in ss
        return ss.find(s) != string::npos;
    }
};

The mathematical trick here is periodicity. The doubled string s + s contains every circularly shifted version of s, and cutting off the first and last letters removes only the two trivial matches at positions 0 and n. If s is made up of a repeated unit, shifting s by one unit gives s again, so the trimmed string must still contain the original string s. It is hard to figure out anyway.
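
The circular-shift idea can be tested directly: build every shift by hand and compare it with the doubled-string trick. This is a sketch for checking my own understanding, and both function names are invented here.

```cpp
#include <cstddef>
#include <string>

// True when some circular shift of s by k (0 < k < n) equals s itself.
bool equalsOwnRotation(const std::string& s) {
    for (std::size_t k = 1; k < s.size(); ++k) {
        // move the first k characters to the end
        const std::string shifted = s.substr(k) + s.substr(0, k);
        if (shifted == s) return true;
    }
    return false;
}

// The doubled-string check: s + s without its first and last characters
// holds exactly the shifts by 1..n-1.
bool foundInTrimmedDouble(const std::string& s) {
    if (s.empty()) return false;
    const std::string ss = s + s;
    return ss.substr(1, ss.size() - 2).find(s) != std::string::npos;
}
```

On the strings I tried (abab, abcabc, abac, a) the two functions give the same answer.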