[LeetCode] (medium) 3. Longest Substring Without Repeating Characters

License: CC 4.0 BY-SA

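This post works through the classic LeetCode problem "Longest Substring Without Repeating Characters". It maintains the state of the active substring with a marker array and a left-boundary variable, compares two versions of that idea, and ends with a fast, correct way to compute the length of the longest substring without repeating characters.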

https://leetcode.com/problems/longest-substring-without-repeating-characters/

Given a string, find the length of the longest substring without repeating characters.

Example 1:

Input: "abcabcbb"
Output: 3 
Explanation: The answer is "abc", with the length of 3.

Example 2:

Input: "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:

Input: "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3. 
             Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
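
To pin down what "substring" means here, a brute-force reference can be handy for checking faster solutions on small inputs. The sketch below is not from the original post; the function name `bruteForceLongest` is hypothetical. It tries every start index and extends to the right until a character repeats, so it is at most O(n^2).

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>

// Hypothetical brute-force reference (not the author's code).
// For every start index, extend the substring to the right until a
// character repeats; the answer is the longest run found.
int bruteForceLongest(const std::string& s) {
    int best = 0;
    for (std::size_t start = 0; start < s.size(); ++start) {
        std::unordered_set<char> seen;  // characters in s[start, end)
        std::size_t end = start;
        // insert(...).second is false as soon as s[end] is already in the run
        while (end < s.size() && seen.insert(s[end]).second) {
            ++end;
        }
        best = std::max(best, static_cast<int>(end - start));
    }
    return best;
}
```

Because it only ever grows contiguous runs, it returns 3 for "pwwkew" ("wke"), never counting the subsequence "pwke".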

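My first instinct was divide and conquer, but the merge step ran into trouble with substrings that straddle both halves: when extending leftward and extending rightward conflict, it is hard to decide which side to keep. That difficulty is essentially the same one you face when scanning directly from left to right and hitting a conflict, so divide and conquer did not seem to buy much.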

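Staying with a left-to-right scan, the first idea on hitting a conflict is to compare the length at and to the left of the conflict position with the length that could be added from the current position rightward (the distance from the current character to the nearest conflict on its right), and use that to decide whether to take the new character. But, first, "the length that could be added on the right" is itself hard to determine, because it is also limited by conflicts among other characters. Second, if we decide not to take the new character, the current substring becomes detached from the next position to be compared, so the logic does not fit into a loop.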

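The second idea is therefore the obvious one: look only at the character at the current position, and maintain "the longest substring that ends at the current position and satisfies the constraint". This resolves both problems above. Moreover, every globally longest substring is visited: when the scan reaches its last character, the maintained substring has exactly its length. So it suffices to record the maximum length seen during the scan.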
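
The formulation above can be checked directly, before any optimization, by recomputing "the longest valid substring ending at i" from scratch for every i. This is a sketch under the assumption of byte-sized characters (256 possible values); the name `longestEndingAtEachPosition` is hypothetical. It walks left from each i until a character would repeat, which is exactly the per-position backtracking that the marker array and `lef` below are introduced to avoid.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Hypothetical sketch of the "longest substring ending at i" formulation.
// For each end index i, walk left and stop just before the first
// character that would repeat; the answer is the maximum over all i.
int longestEndingAtEachPosition(const std::string& s) {
    int best = 0;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        std::array<bool, 256> seen{};  // characters inside s[j+1..i]
        int j = i;
        while (j >= 0 && !seen[static_cast<unsigned char>(s[j])]) {
            seen[static_cast<unsigned char>(s[j])] = true;
            --j;
        }
        // s[j+1..i] is the longest repeat-free substring ending at i,
        // and its length is i - j.
        best = std::max(best, i - j);
    }
    return best;
}
```

Each inner walk visits at most 256 distinct characters plus the one that stops it, so this is O(n * 256); the incremental version below reuses the previous window instead of rebuilding it.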

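To avoid backtracking at every position to count the length, I introduced a marker array arr that records, for each character, whether and where it appears in the current active substring, plus an integer variable lef that records the left end of the active substring, so the length is simply i - lef + 1. On a conflict, the part of the substring up to and including the earlier occurrence is erased (the corresponding entries of arr are reset to -1) and lef is updated.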

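#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        // LeetCode speed trick: untie C++ streams from C stdio (runs once).
        static int fast_io = []() { std::ios::sync_with_stdio(false); std::cin.tie(nullptr); return 0; }();
        vector<int> arr(256, -1);    // arr[c]: position of c in the active substring, -1 if absent
        int result = 0;
        int lef = 0;                 // left end of the active substring
        for (int i = 0; i < (int)s.size(); ++i) {
            unsigned char c = s[i];  // unsigned, so bytes >= 128 still give a valid index
            if (arr[c] == -1) {      // the active substring does not contain s[i]
                arr[c] = i;          // append s[i] to the active substring
                result = max(result, i - lef + 1);
            } else {                 // the active substring already contains s[i]
                // Cut off the earlier copy of s[i] and everything to its left.
                // The bound must be <, not <=: arr[c] is still needed to set lef below.
                for (int j = lef; j < arr[c]; ++j) {
                    arr[(unsigned char)s[j]] = -1;
                }
                lef = arr[c] + 1;
                arr[c] = i;
                // After truncation the window is no longer than it was before s[i]
                // arrived, so it cannot exceed result; no update is needed.
            }
        }
        return result;
    }
};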

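This already runs in 4 ms, but after reading other people's solutions I found it could be optimized further. In the conflict branch, to truncate the active substring I reset the prefix entries of the marker array to -1. The point was to let the marker array tell whether the current character already occurs in the active substring. But lef alone can do that filtering: if the marker array simply records the rightmost position where each character has appeared in the prefix scanned so far, and lef records the starting position of the current active substring, then a character c is inside the active substring exactly when arr[c] >= lef. Entries to the left of lef are stale but harmless, so the erase loop can be dropped.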
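
A small trace makes the "stale but harmless" point concrete. The sketch below assumes byte-sized characters, and the helper name `leftBoundaryTrace` is hypothetical; it runs the lef-filtered scan and records `lef` after each character. On "abba", the final 'a' arrives while `last['a']` is still 0 from the first 'a', but `lef` is already 2, so the entry is ignored without ever having been erased.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical helper: run the lef-filtered scan and record the left
// boundary after each character. Entries of `last` are never reset;
// last[c] counts as "inside the active substring" only when last[c] >= lef.
std::vector<int> leftBoundaryTrace(const std::string& s) {
    std::array<int, 256> last;
    last.fill(-1);  // -1 means unseen; always < lef because lef >= 0
    std::vector<int> trace;
    int lef = 0;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (last[c] >= lef) {
            lef = last[c] + 1;  // drop the earlier copy of c and everything left of it
        }
        last[c] = i;            // a stale index (< lef) is simply overwritten
        trace.push_back(lef);
    }
    return trace;
}
```

For "abba" the trace is {0, 0, 2, 2}, and for "pwwkew" it is {0, 0, 2, 2, 2, 3}; the window lengths i - lef + 1 then peak at 2 and 3 respectively.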

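#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        // LeetCode speed trick: untie C++ streams from C stdio (runs once).
        static int fast_io = []() { std::ios::sync_with_stdio(false); std::cin.tie(nullptr); return 0; }();
        vector<int> arr(256, -1);    // arr[c]: rightmost position of c scanned so far, -1 if unseen
        int result = 0;
        int lef = 0;                 // left end of the active substring
        for (int i = 0; i < (int)s.size(); ++i) {
            unsigned char c = s[i];  // unsigned, so bytes >= 128 still give a valid index
            if (arr[c] < lef) {      // s[i] is not in the active substring (unseen, or seen left of lef)
                arr[c] = i;          // append s[i] to the active substring
                result = max(result, i - lef + 1);
            } else {                 // the active substring already contains s[i]
                lef = arr[c] + 1;    // start just after the earlier copy; stale entries need no erasing
                arr[c] = i;
                // After truncation the window is no longer than it was before s[i]
                // arrived, so it cannot exceed result; no update is needed.
            }
        }
        return result;
    }
};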

This is a case where a variable we create turns out to be capable of more than the role we originally gave it.
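
Taking the same observation one step further, the two branches of the second solution can be folded into a single update, since `last[c] + 1 <= lef` whenever `c` is outside the active substring. The sketch below is a common rewrite of this technique, not the author's code; the function name `lengthOfLongestSubstringMerged` is hypothetical, and it assumes byte-sized characters.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Hypothetical branch-free variant of the second solution.
int lengthOfLongestSubstringMerged(const std::string& s) {
    std::array<int, 256> last;
    last.fill(-1);  // rightmost index of each byte value, -1 if unseen
    int lef = 0;    // left end of the active substring
    int result = 0;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        // If c occurs inside [lef, i), the window must start just after it;
        // otherwise last[c] + 1 <= lef and lef stays unchanged.
        lef = std::max(lef, last[c] + 1);
        last[c] = i;
        result = std::max(result, i - lef + 1);
    }
    return result;
}
```

Updating `result` on every step costs one extra comparison in the conflict case but removes the branch; as noted in the original comments, that update can never increase the answer after a truncation.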