leetcode 30. Substring with Concatenation of All Words Python3

License: CC 4.0 BY-SA

Tags:

#leetcode #algorithms #strings #sliding-window #two-pointers

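This article presents an efficient substring-matching algorithm that finds the starting indices of every substring formed by concatenating all the words in a given list. It uses two pointers and an optimized counter strategy to traverse the input string efficiently.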

1. Problem Description

You are given a string s and an array of strings words of the same length. Return all starting indices of substring(s) in s that is a concatenation of each word in words exactly once, in any order, and without any intervening characters.

You can return the answer in any order.

Example 1:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: Substrings starting at index 0 and 9 are "barfoo" and "foobar" respectively.
The output order does not matter, returning [9,0] is fine too.

Example 2:

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []

Example 3:

Input: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Output: [6,9,12]

Constraints:

1 <= s.length <= 10^4
s consists of lower-case English letters.
1 <= words.length <= 5000
1 <= words[i].length <= 30
words[i] consists of lower-case English letters.

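2. Approach

The key point in this problem is that every word in words has the same length. I did not notice this at first, and my first submission timed out.

The main idea is as follows.

Method 1: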

First, we record how many times each word appears in words.
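As a minimal sketch of this counting step, the standard library's `collections.Counter` can build the table in one call. The name `required` is only illustrative; the source code in section 3 builds the same table with a plain dict.

```python
from collections import Counter

# Words from Example 2; "word" appears twice in the list.
words = ["word", "good", "best", "word"]

# required[w] is the number of times w must appear in a valid match.
required = Counter(words)

print(required["word"])  # count of "word"
print(required["good"])  # count of "good"
print(required["foo"])   # a Counter reports 0 for words it has never seen
```

Because a `Counter` returns 0 for unknown keys, use a membership test (`w in required`) to tell "not one of the words" apart from "needed zero times".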

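A valid match must have length n, the sum of the lengths of all the words. So if s has length l, we only need to iterate up to the index i that satisfies i + n = l. If we continued past that point, every remaining substring would be shorter than n and clearly could not match.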
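The bound on the start index can be sketched as a small helper. The function name `candidate_starts` is hypothetical and only serves this illustration; k and n follow the notation used in the text.

```python
def candidate_starts(s, words):
    """Return the range of start indices that can still fit a full match."""
    k = len(words[0])      # length of a single word
    n = k * len(words)     # total length of a valid match
    # The last useful start index is len(s) - n; the range is empty if s is too short.
    return range(len(s) - n + 1)


# Example 1: len(s) is 18 and n is 6, so the start indices 0..12 are checked.
print(list(candidate_starts("barfoothefoobarman", ["foo", "bar"])))
```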

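This problem uses two pointers. The first pointer iterates over s, sliding across the whole string. The second pointer traverses the substring of length n that starts at the first pointer's position.

Within this substring, we take k characters at a time (k is the length of a single word, a fixed value) as one word. If this word is not in words, or it is in words but has now appeared more times than it does in words, the substring is not a valid match: we end this iteration immediately and slide the first pointer forward.

If the word is in words, we add one to its count in a counter (we need a counter that records how many times each candidate word appears in the current substring of length n), then advance the second pointer by k to the next position.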
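The check for a single window can be sketched as follows. This is an illustrative helper, not the code from section 3: the function name `is_concatenation` and its parameters are assumptions made for this example.

```python
from collections import Counter


def is_concatenation(s, start, required, k, num_words):
    """Check whether s[start:start + k * num_words] uses each required word exactly as often as needed."""
    seen = Counter()  # counts of the words seen in this window only
    for j in range(start, start + k * num_words, k):
        word = s[j:j + k]
        if word not in required:
            return False  # this chunk is not one of the words
        seen[word] += 1
        if seen[word] > required[word]:
            return False  # the word appears more often than allowed
    return True


s = "barfoothefoobarman"
required = Counter(["foo", "bar"])
print(is_concatenation(s, 0, required, 3, 2))  # window "barfoo"
print(is_concatenation(s, 1, required, 3, 2))  # window "arfoot"
print(is_concatenation(s, 9, required, 3, 2))  # window "foobar"
```

Returning early on the first bad chunk is what keeps Method 1 fast enough in practice, although each window still starts with a fresh counter.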

This is already enough to get accepted, but it can be made faster.

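Method 2:

In the previous implementation, every substring of length n has to be traversed from scratch and given its own counter.

But we can instead maintain a single counter that is shared across windows.

When the previous substring is a valid match, we do not exit the loop. Instead, the window moves forward by k: the first word is dropped and its count in the counter is decreased by 1, and one new word enters the window, so its count is increased by 1. The match check is then the same as before. We stop when we reach the end of the string. The property used here is that, within one pass, every later window starts at a position that differs from the earlier ones by a multiple of the single-word length k, so consecutive windows share all but one word.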

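The question is: how should the outer pointer move in this case?

It turns out that iterating the outer pointer over the offsets within a single word length, 0 to k - 1, covers all possible solutions.

Imagine the inner loop starts at some index j and iterates to the end of s, stepping by the length of a single word each time. If the start index of every valid substring happens to differ from j by a multiple of the word length, this works. But if not, those solutions are missed.

If we try every starting offset within a single word length, however, the problem is solved.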

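For example:

s: 'abcabcd', words: ['abc']

Without the outer loop, using only the inner loop (the second pointer) starting at index 0, there is no problem here: every abc is matched.

But if:

s: 'dabcabc', words: ['abc']

then the inner loop alone cannot find a match. If the outer loop's starting index is set to 1, skipping d and starting from a, the situation becomes like the one above.

That is why the outer loop is needed: it iterates over every starting position within a single word length so that all solutions are covered.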
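To make the role of the offsets concrete, the sketch below splits the second example string into word-sized chunks for each offset. The helper name `chunks_by_offset` is hypothetical and exists only for this illustration.

```python
def chunks_by_offset(s, k):
    """For each offset in [0, k), list the consecutive full k-length chunks of s."""
    return {
        offset: [s[j:j + k] for j in range(offset, len(s) - k + 1, k)]
        for offset in range(k)
    }


# With k = 3, only offset 1 lines up with the 'abc' words in 'dabcabc'.
for offset, chunks in chunks_by_offset("dabcabc", 3).items():
    print(offset, chunks)
```

Every start index of s belongs to exactly one offset class, so k passes of the inner loop are enough to visit every possible window.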

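At this point we also no longer need to check, as before, that the previous solution was valid before sliding by k, because the search now covers the entire solution space. When moving on to the next word, we only need to subtract the count of the word that left the window.

This part deserves careful study; see the code for the details.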
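For readers who want a compact view of the idea before the full listing, here is a simplified sketch of the sliding window. It is not the code from section 3: the function name `find_substring_sliding` is an assumption, and it compares the whole window counter with the required counts at every step, which is easier to read but does more work per step than tracking a running matched count.

```python
from collections import Counter


def find_substring_sliding(s, words):
    k = len(words[0])          # length of a single word
    num_words = len(words)
    required = Counter(words)  # how often each word must appear
    result = []
    for offset in range(k):    # outer loop: every offset within one word length
        window = Counter()     # counts of the words currently in the window
        left = offset          # start index of the current window
        for right in range(offset, len(s) - k + 1, k):
            # A new word enters the window on the right.
            window[s[right:right + k]] += 1
            # If the window now holds num_words + 1 words, drop the leftmost one.
            if right - left == num_words * k:
                old = s[left:left + k]
                window[old] -= 1
                if window[old] == 0:
                    del window[old]  # keep the counter free of zero entries
                left += k
            if window == required:
                result.append(left)
    return result


print(find_substring_sliding("barfoothefoobarman", ["foo", "bar"]))
print(find_substring_sliding("dabcabc", ["abc"]))
```

The indices are collected offset by offset, so they are not necessarily in increasing order, which the problem statement allows.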

3. Source Code

Note: the sliding-window version of the code comes from: 滑动

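# method 1
from typing import List


class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        if not s or not words:
            return []
        # l: length of a single word; l_max: total length of a valid match
        l, l_max = len(words[0]), len(words) * len(words[0])
        # cnt: how many times each word must appear
        cnt = dict.fromkeys(words, 0)
        for word in words:
            cnt[word] += 1
        result = []
        for i in range(0, len(s) - l_max + 1):
            # cnt2 counts the words seen in the window that starts at i
            start, word_cnt, cnt2 = i, 0, {}
            flag = True
            while start + l <= i + l_max:
                word = s[start:start + l]
                if word in cnt:
                    word_cnt += 1
                    start += l
                    if word not in cnt2:
                        cnt2[word] = 1
                    else:
                        cnt2[word] += 1
                    # the word appears more often than in words: not a match
                    if cnt2[word] > cnt[word]:
                        flag = False
                        break
                else:
                    # this chunk is not one of the words: not a match
                    break
            if word_cnt == len(words) and flag:
                result.append(i)
        return result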

# faster:
from collections import defaultdict
from typing import List
import copy


class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        if words == []:
            return []
        result_indices = []
        size_word = len(words[0])
        size_res = len(words) * size_word

        # Required number of occurrences of each word
        word_count = defaultdict(lambda: 0)
        for word in words:
            word_count[word] += 1

        # Try every starting offset within one word length
        for i in range(size_word):
            # Number of distinct words whose required count is currently satisfied
            words_matched = 0
            # missing[w]: occurrences of w still needed in the current window
            missing = copy.copy(word_count)
            for end in range(i, len(s)+1, size_word):
                if end == i:
                    continue

                # Add new word to the hashmap
                new_word = s[end-size_word:end]
                if new_word in missing:
                    missing[new_word] -= 1
                    if missing[new_word] == 0:
                        words_matched += 1

                # Check if we found a matching substring
                if words_matched == len(missing):
                    result_indices.append(end-size_res)

                # Prepare for next iteration by discounting first word of window
                start = end-size_res
                if start >= 0:
                    old_word = s[start:start+size_word]
                    if old_word in missing:
                        if missing[old_word] == 0:
                            words_matched -= 1
                        missing[old_word] += 1
        return result_indices