LeetCode 1055. Shortest Way to Form String: String Greedy, NextMatch Array

Article link: https://blog.youkuaiyun.com/taoqick/article/details/106699543

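This article looks at how to find the minimum number of subsequences of a source string whose concatenation equals a target string. It walks through several implementations: a direct greedy scan, a character-to-positions mapping, and an optimized NextMatch array, and compares how fast each one is in practice.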


From any string, we can form a subsequence of that string by deleting some number of characters (possibly no deletions).

Given two strings source and target, return the minimum number of subsequences of source such that their concatenation equals target. If the task is impossible, return -1.

Example 1:

Input: source = "abc", target = "abcbc"
Output: 2
Explanation: The target "abcbc" can be formed by "abc" and "bc", which are subsequences of source "abc".

Example 2:

Input: source = "abc", target = "acdbc"
Output: -1
Explanation: The target string cannot be constructed from the subsequences of source string due to the character "d" in target string.

Example 3:

Input: source = "xyz", target = "xzyxz"
Output: 3
Explanation: The target string can be constructed as follows "xz" + "y" + "xz".

Constraints:

Both the source and target strings consist of only lowercase English letters from "a"-"z".
The lengths of source and target string are between 1 and 1000.

---------------------------------------------------------------------------------------------

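This problem is said to be similar to LeetCode 792. If you think of a greedy approach right away, you are halfway done; I first wrote a BFS instead, which nearly timed out. The greedy argument is fairly easy to prove, and the problem then reduces to counting how many full passes over source are consumed, plus one more if the last pass is only partial.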

Direct greedy 1:

class Solution:
    def shortestWay(self, source: str, target: str) -> int:
        slen,tlen = len(source),len(target)
        si,res = 0,0
        for ti in range(tlen):
            #print("si={0}".format(si))
            i = 0
            while (i < slen):
                idx = (i+si)%slen
                if (idx+1 == slen):
                    res+=1
                if (source[idx] == target[ti]):
                    si = (idx+1)%slen
                    break
                i+=1
            if (i == slen):
                return -1
        return res if si == 0 else res+1

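In a slow language like Python, walking a string character by character is expensive, and built-in functions such as find are much faster. Hence: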

Direct greedy 2:

class Solution(object):
    def shortestWay(self, source, target):
        """
        :type source: str
        :type target: str
        :rtype: int
        """
        source_cursor = 0
        round_count = 1
        for c in target:
            idx = source.find(c, source_cursor)
            if idx == -1:
                round_count += 1
                idx = source.find(c)
                if idx == -1:
                    return -1
            
            source_cursor = idx+1
        
        return round_count

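There are several ways to optimize further. The most direct is to build a mapping from each character to its positions. For example, with source="abcab" we get a->[0,3], b->[1,4], c->[2]. Each list is sorted, so it can be binary searched, giving a complexity of tlen*log(slen). In Python 3 it turned out to be fast even without the binary search:

Array mapping method (the fastest Python result):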

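import collections

class Solution(object):
    def shortestWay(self, source: str, target: str) -> int:
        # map each character to the sorted list of its positions in source
        hashmap = collections.defaultdict(list)
        for i, s in enumerate(source):
            hashmap[s].append(i)
        # result counts the subsequences used; idx is the last matched position
        result, idx = 1, -1
        for c in target:
            if c not in hashmap:
                return -1
            # linear scan for the first occurrence after idx
            indicator = False
            for i in hashmap[c]:
                if i > idx:
                    idx = i
                    indicator = True
                    break
            if not indicator:
                # nothing left in this pass: start a new subsequence
                idx = hashmap[c][0]
                result += 1
        return result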
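
The binary-search variant mentioned above (tlen*log(slen)) is not shown in the post. A minimal sketch, assuming the same character-to-positions mapping and using the standard `bisect` module; the function name `shortest_way_bisect` is made up for illustration:

```python
from bisect import bisect_right
from collections import defaultdict


def shortest_way_bisect(source: str, target: str) -> int:
    # positions[ch] is the sorted list of indices where ch occurs in source
    positions = defaultdict(list)
    for i, ch in enumerate(source):
        positions[ch].append(i)

    rounds, last = 1, -1  # last = index in source of the previous match
    for ch in target:
        if ch not in positions:
            return -1
        idxs = positions[ch]
        k = bisect_right(idxs, last)  # first occurrence strictly after last
        if k == len(idxs):
            # no occurrence left in this pass: start a new subsequence
            rounds += 1
            last = idxs[0]
        else:
            last = idxs[k]
    return rounds


print(shortest_way_bisect("abc", "abcbc"))
```

With slen capped at 1000 the lists are short, which is probably why the linear scan above is already fast in practice.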

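This can be optimized further. Take source="abcab" again, where a->[0,3], b->[1,4], c->[2] were stored as lists. Instead of lists, let f[ch][pos] be the next position of ch in source at or after pos (pos itself included); source="abcab" then turns into a matrix with one row per character and one column per position.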
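
As a small illustration of that definition, here is a sketch that builds such a matrix for source="abcab". It follows one literal reading of f[ch][pos] and uses -1 when ch does not occur at or after pos; both the helper name `build_next_table` and the -1 sentinel are assumptions. The author's code further down stores a slightly different value, namely the cursor after the match, with wrap-around.

```python
def build_next_table(source: str) -> dict:
    n = len(source)
    # one row per distinct character, one column per position in source
    table = {ch: [-1] * n for ch in set(source)}
    for ch, row in table.items():
        nxt = -1  # nearest occurrence of ch seen so far, scanning right to left
        for pos in range(n - 1, -1, -1):
            if source[pos] == ch:
                nxt = pos
            row[pos] = nxt
    return table


for ch, row in sorted(build_next_table("abcab").items()):
    print(ch, row)
# a [0, 3, 3, 3, -1]
# b [1, 1, 4, 4, 4]
# c [2, 2, 2, -1, -1]
```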

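Precomputing this matrix is a neat idea. Take care not to write it in O(slen*slen); it can be done in 26*O(slen). The code is below, and the commented-out part is the O(slen*slen) version I wrote first, which is very slow. Even though the final complexity is O(26*slen)+O(tlen), the Python version shows no speed advantage: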

from collections import defaultdict

class Solution:
    def shortestWay(self, source: str, target: str) -> int:
        slen, tlen = len(source), len(target)
        dic = defaultdict(lambda: [slen for i in range(slen)])
        # for i in range(slen):
        #     ch = source[i]
        #     dic[ch][i] = (i + 1) % slen
        #     for j in range(1, slen):
        #         if (ch == source[i - j]):
        #             break
        #         else:
        #             dic[ch][(i - j + slen) % slen] = (i + 1) % slen
        for i in range(slen):
            dic[source[i]][i] = (i+1)%slen
        for ch in set(list(source)):
            prev = slen
            for j in range(slen-1,-slen,-1):
                if (dic[ch][j] == slen):
                    dic[ch][j] = prev
                else:
                    prev = dic[ch][j]

        #print(dic)
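        si, cross_end = 0, 0  # si is the current base position in source
        for ti in range(tlen):
            if (target[ti] in dic):
                nxt_si = dic[target[ti]][si]
                # nxt_si <= si means the end of source was reached or crossed;
                # "<=" (not "<") also covers a match at the last index when si == 0
                cross_end = cross_end + 1 if nxt_si <= si else cross_end
                si = nxt_si
            else:
                return -1
        return cross_end if si == 0 else cross_end + 1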

s = Solution()
print(s.shortestWay("xyz","xzyxz"))

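Postscript: if the rows and columns of the NextMatch array are swapped, so that the positions of all 26 letters are copied at every index, the code becomes far less awkward than the version above. See https://blog.youkuaiyun.com/taoqick/article/details/106735554 for the idea.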

class Solution(object):
    def minWindow(self, S, T):
        slen,tlen,c,n = len(S),len(T),0,1
        nxt = [-1 for i in range(26)]
        nxt_arr = [None for i in range(slen)]
 
        for i in range(slen-1, -1, -1):
            ch_idx = ord(S[i])-ord('a')
            nxt[ch_idx] = i
            nxt_arr[i] = tuple(nxt)
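
The snippet above only builds the transposed table. A minimal sketch of how it might be applied to this problem, assuming the same `nxt_arr` layout (one 26-tuple per position, -1 meaning "no occurrence at or after this index"); the function name `shortest_way_nextmatch` is hypothetical and this code is not taken from the linked post:

```python
def shortest_way_nextmatch(source: str, target: str) -> int:
    slen = len(source)
    nxt = [-1] * 26
    nxt_arr = [None] * slen
    # scan right to left; nxt_arr[i][c] = first index >= i holding letter c
    for i in range(slen - 1, -1, -1):
        nxt[ord(source[i]) - ord('a')] = i
        nxt_arr[i] = tuple(nxt)

    rounds, si = 1, 0  # si = position in source where the next search starts
    for ch in target:
        c = ord(ch) - ord('a')
        if nxt_arr[0][c] == -1:
            return -1  # letter never occurs in source
        if si == slen or nxt_arr[si][c] == -1:
            # nothing usable left in this pass: start a new subsequence
            rounds += 1
            si = 0
        si = nxt_arr[si][c] + 1
    return rounds


print(shortest_way_nextmatch("xyz", "xzyxz"))
```

Each target character costs one table lookup, at the price of 26*slen entries of memory.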