Substring with Concatenation of All Words

Latest recommended article published 2024-01-06 14:27:05

License: CC 4.0 BY-SA

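This article presents a string-matching algorithm that finds every substring formed by concatenating all the words of a given list, with each word used exactly once and no gaps between words. It preprocesses the word list and the target string to cut down on repeated string comparisons.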

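#include <set>
#include <string>
#include <vector>
using namespace std;

typedef vector<int> vi; // shorthand used throughout the solution

class Solution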
{
public:
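	// Returns every start index in s where a concatenation of all words in L begins.
	vi findSubstring(string s,vector<string>& L)
	{
		if(s.empty()||L.empty()||L[0].empty())
			return vi();
		int len_word=L[0].length();
		// s is too short to hold every word once; this also keeps the offsets below in range.
		if(s.length()<L.size()*len_word)
			return vi();
		set<int> retS;
		// Try each alignment 0..len_word-1 so that every pass starts on a word boundary.
		for(int i=0;i<len_word;i++)
		{
			string t=s.substr(i);
			_findSubstring(t,L,retS,i);
		}
		vi ret(retS.begin(),retS.end());
		return ret;
	}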
	void _findSubstring(string s,vector<string>& L,set<int>& ret,int off)
	{
		int len_word=L[0].length();
		int sz=L.size();
		// Number of whole words in s; leftover trailing characters are ignored.
		int word_num=s.length()/len_word;
		if(word_num==0)
			return ;

		int i,j;

		// Deduplicate L: words holds each distinct word, wordsTime how often it must appear.
		vector<string> words;
		vector<int> wordsTime;
		words.push_back(L[0]);
		wordsTime.push_back(1);
		for(i=1;i<sz;i++)
		{
			for(j=0;j<i;j++)
			{
				if(L[i].compare(L[j])==0)
				{
					wordsTime[j]++;
					break;
				}
			}
			if(j==i)
			{
				words.push_back(L[i]);
				wordsTime.push_back(1);
			}
		}
		int wsz=words.size();

		// Map each word-sized chunk of s to its index in words, or -1 if it is not in L.
		vi indexs;
		indexs.reserve(word_num);
		for(i=0;i<word_num;i++)
		{
			string t=s.substr(len_word*i,len_word);
			int idx=-1;
			for(j=0;j<wsz;j++)
			{
				if(words[j].compare(t)==0)
				{
					idx=j;
					break;
				}
			}
			indexs.push_back(idx);
		}

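		// Sliding window over chunks left..i: emerge[k] counts words[k] inside the window,
		// exist is the total number of words currently in the window.
		vi emerge(sz,0);
		int exist=0;
		int left=0;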
		for(i=0;i<word_num;i++)
		{
			if (indexs[i]==-1) // chunk is not a word of L: empty the window and restart after it
			{
				for(j=left;j<i;j++)
				{
					emerge[indexs[j]]--;
					exist-=1;
				}
				left=i+1;
			}
			else if (emerge[indexs[i]]==wordsTime[indexs[i]]) // word already at its quota: drop chunks from the left through its first occurrence
			{
				emerge[indexs[i]]=-1;
				for(j=left;emerge[indexs[j]]!=-1;j++)
				{
					emerge[indexs[j]]--;
					exist--;
				}
				emerge[indexs[i]]=wordsTime[indexs[i]];
			
				left=j+1;
			}
			else
			{
				emerge[indexs[i]]++;
				exist++;
				if (exist==sz) // window holds every word of L: record its start, then slide by one word
				{
					ret.insert(len_word*left+off);
					emerge[indexs[left]]--;
					exist--;
					left++;
				}
			}
		}
		
	}
};

The problem statement:

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given:
S: "barfoothefoobarman"
L: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).
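
To make the statement concrete, here is a minimal brute-force sketch. It is not the solution shown above: it simply tests every start index by cutting the window into words, sorting them, and comparing the result against the sorted L. The name `bruteForceFind` is made up for this illustration, and the sketch assumes that all words in L have the same non-zero length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration only: checks every start index directly.
std::vector<int> bruteForceFind(const std::string& s, std::vector<std::string> L) {
    std::vector<int> result;
    if (L.empty() || L[0].empty()) return result;
    const std::size_t len = L[0].size();
    const std::size_t total = len * L.size();  // length of a full concatenation
    if (s.size() < total) return result;
    std::sort(L.begin(), L.end());  // canonical order of the required words
    for (std::size_t start = 0; start + total <= s.size(); ++start) {
        // Cut the window beginning at start into L.size() words of length len.
        std::vector<std::string> window;
        for (std::size_t k = 0; k < L.size(); ++k)
            window.push_back(s.substr(start + k * len, len));
        std::sort(window.begin(), window.end());
        // Equal sorted lists mean the window uses each word of L exactly once.
        if (window == L) result.push_back(static_cast<int>(start));
    }
    return result;
}
```

For S = "barfoothefoobarman" and L = ["foo", "bar"] this yields the indices 0 and 9, matching the example. It is far slower than a sliding window, but it is handy as a reference when testing a faster version.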

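The book 编程之美 (Beauty of Programming) contains a similar problem. This one differs in two ways:

1. Each word in L must be used exactly once.

2. The words must be contiguous.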

Note that L may contain duplicate words, e.g. L = ["a","b","a"], so the expected count of each word has to take repetition into account.
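
One way to record those expected counts is a map from word to multiplicity. The sketch below uses `std::map` instead of the parallel `words` / `wordsTime` vectors in the solution; the name `requiredCounts` is hypothetical and only serves this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: maps each distinct word to the number of times it must appear.
std::map<std::string, int> requiredCounts(const std::vector<std::string>& L) {
    std::map<std::string, int> need;
    for (const std::string& w : L)
        ++need[w];  // operator[] value-initializes a missing entry to 0 before the increment
    return need;
}
```

With L = ["a","b","a"] the map holds two entries: "a" with count 2 and "b" with count 1.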

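Another annoyance is that the input to check is a string rather than an array of words. My approach is to first check the substring that starts at s[0], then the one that starts at s[1], and so on. This is clearly rather inefficient, especially when the words in L are long.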
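
The alignment idea can be sketched as follows: for each offset, the string is cut into consecutive word-sized chunks, and only offsets 0 through len(L[0]) - 1 need to be tried because larger offsets repeat an earlier alignment. The helper name `chunksAt` is an assumption made for this illustration and does not appear in the solution.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: cuts s into consecutive len-sized chunks starting at offset off.
// Trailing characters that do not fill a whole chunk are dropped.
std::vector<std::string> chunksAt(const std::string& s, std::size_t off, std::size_t len) {
    std::vector<std::string> chunks;
    if (len == 0) return chunks;
    for (std::size_t pos = off; pos + len <= s.size(); pos += len)
        chunks.push_back(s.substr(pos, len));
    return chunks;
}
```

For "barfoothefoobarman" with word length 3, offset 0 gives bar, foo, the, foo, bar, man, while offsets 1 and 2 give five chunks each. Every possible start position falls into exactly one of the passes, so the number of chunks examined over all offsets is at most len(S).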

If you know a better approach, please let me know.

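Approach:

1. First build the array of expected counts, wordsTime, for the words in L, and at the same time remove duplicates to obtain words. With the pairwise comparison used in the code, this step costs O( len(L)^2 * len(L[0]) ).

2. Split the string S under test into K ( K = len(S) / len(L[0]) ) equal-length words, and record for each of them its index in words. With the linear lookup used in the code, this step costs O( K * len(words) * len(L[0]) ).

3. Enumerate every word i of S' (S' is the original S with a prefix removed) while maintaining left, the leftmost word of the range currently covered, and check how word i occurs: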