Algorithm Problem: Number of Matching Subsequences

Originally published 2025-12-19 16:06:41

License: CC 4.0 BY-SA

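Number of Matching Subsequences

Problem Description

Given a string s and an array of strings words, return the number of words in words that are subsequences of s.

Subsequence: a new string formed from s by deleting some characters (possibly none) without changing the relative order of the remaining characters.

Example:

Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: three words are subsequences of s: "a", "acd", "ace".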

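Approach

Brute force:

Match every word against s from the start, using two pointers
Time complexity: O(words.length × s.length), because each check scans s at most once
Duplicate words are recomputed every time they appear

Improvements:

Preprocessing + binary search: record the positions of each character in s, then binary-search those positions for each character of a word
Multi-pointer: keep one pointer per word and advance all of them in a single pass over s
Cache: store computed results in a hash table so that duplicate words are not checked again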
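
To make the effect of the cache concrete, here is a minimal sketch that counts how often the subsequence check actually runs. The class name CacheSketch and the checks counter are hypothetical and exist only for illustration; the check itself is the plain two-pointer scan described above.

```java
import java.util.*;

class CacheSketch {
    // Illustration only: number of times the subsequence check actually runs
    static int checks = 0;

    // Plain two-pointer subsequence test that counts its own invocations
    static boolean isSubsequence(String s, String word) {
        checks++;
        int j = 0;
        for (int i = 0; i < s.length() && j < word.length(); i++) {
            if (s.charAt(i) == word.charAt(j)) {
                j++;
            }
        }
        return j == word.length();
    }

    // Counts matching words, checking each distinct word only once
    static int countWithCache(String s, String[] words) {
        Map<String, Boolean> cache = new HashMap<>();
        int count = 0;
        for (String word : words) {
            if (cache.computeIfAbsent(word, w -> isSubsequence(s, w))) {
                count++;
            }
        }
        return count;
    }
}
```

For words = {"ace","ace","ace","bb","bb"} and s = "abcde", the check runs twice (once per distinct word) instead of five times, and the returned count is 3.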

Implementation

Method 1: Preprocessing + Binary Search

import java.util.*;

class Solution {
    /**
     * Counts subsequence matches using preprocessing and binary search.
     * Assumes s and all words contain only lowercase letters 'a'..'z'.
     * 
     * @param s     source string
     * @param words array of words
     * @return number of words that are subsequences of s
     */
    @SuppressWarnings("unchecked") // generic array creation is safe here: only List<Integer> is stored
    public int numMatchingSubseq(String s, String[] words) {
        // 1: Preprocessing - record every position at which each character occurs in s
        List<Integer>[] positions = new List[26];
        for (int i = 0; i < 26; i++) {
            positions[i] = new ArrayList<>();
        }
        
        for (int i = 0; i < s.length(); i++) {
            positions[s.charAt(i) - 'a'].add(i);
        }
        
        // 2: Cache results so duplicate words are not recomputed
        Map<String, Boolean> cache = new HashMap<>();
        int count = 0;
        
        // 3: Check whether each word is a subsequence
        for (String word : words) {
            if (cache.containsKey(word)) {
                if (cache.get(word)) {
                    count++;
                }
                continue;
            }
            
            boolean isSubseq = isSubsequence(word, positions);
            cache.put(word, isSubseq);
            if (isSubseq) {
                count++;
            }
        }
        
        return count;
    }
    
    /**
     * Uses binary search to decide whether a word is a subsequence.
     * 
     * @param word      word to check
     * @param positions preprocessed character positions
     * @return true if the word is a subsequence, false otherwise
     */
    private boolean isSubsequence(String word, List<Integer>[] positions) {
        int prevIndex = -1; // position in s of the previously matched character
        
        for (char c : word.toCharArray()) {
            List<Integer> charPositions = positions[c - 'a'];
            
            // If c does not occur in s at all, the word cannot match
            if (charPositions.isEmpty()) {
                return false;
            }
            
            // Binary search for the first position greater than prevIndex
            int left = 0, right = charPositions.size();
            while (left < right) {
                int mid = left + (right - left) / 2;
                if (charPositions.get(mid) <= prevIndex) {
                    left = mid + 1;
                } else {
                    right = mid;
                }
            }
            
            // No occurrence of c after prevIndex
            if (left == charPositions.size()) {
                return false;
            }
            
            // Advance prevIndex to the position just found
            prevIndex = charPositions.get(left);
        }
        
        return true;
    }
}

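Method 2: Multi-pointer

import java.util.*;

class Solution {
    /**
     * Counts subsequence matches with multiple pointers.
     * Keeps one pointer per distinct word and advances them in a single pass over s.
     */
    public int numMatchingSubseq(String s, String[] words) {
        // Count duplicates so that each distinct word is tracked only once
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
        }
        
        int matchedCount = 0;
        
        // Create a pointer for each distinct word
        Map<String, Integer> pointers = new HashMap<>();
        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            if (entry.getKey().isEmpty()) {
                // The empty word is a subsequence of any s, including the empty string
                matchedCount += entry.getValue();
            } else {
                pointers.put(entry.getKey(), 0);
            }
        }
        
        // Walk through every character of s
        for (char c : s.toCharArray()) {
            // Words completed at this character; removed after the scan
            // so the map is not structurally modified while iterating
            List<String> toRemove = new ArrayList<>();
            
            // Check the current pointer position of every remaining word
            for (Map.Entry<String, Integer> entry : pointers.entrySet()) {
                String word = entry.getKey();
                int ptr = entry.getValue();
                if (ptr < word.length() && word.charAt(ptr) == c) {
                    ptr++;
                    entry.setValue(ptr);
                    
                    // The word is fully matched
                    if (ptr == word.length()) {
                        matchedCount += wordCount.get(word);
                        toRemove.add(word);
                    }
                }
            }
            
            // Drop words that are fully matched
            for (String word : toRemove) {
                pointers.remove(word);
            }
        }
        
        return matchedCount;
    }
}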
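
The multi-pointer code above visits every unfinished word for each character of s. A common refinement, which is not the code above, groups the pointers by the character each word is waiting for, so that each step only touches the words that can actually advance. The sketch below is one way to write it; the class name BucketSketch is hypothetical, and it assumes lowercase letters only. Every pointer moves at most once per character of its word, so the running time is roughly proportional to s.length plus the total length of all words.

```java
import java.util.*;

class BucketSketch {
    // Hypothetical helper: counts words that are subsequences of s.
    // Assumes s and all words contain only 'a'..'z'.
    static int numMatchingSubseq(String s, String[] words) {
        // buckets.get(c) holds {wordIndex, pointer} pairs waiting for character c
        List<List<int[]>> buckets = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            buckets.add(new ArrayList<>());
        }

        int matched = 0;
        for (int w = 0; w < words.length; w++) {
            if (words[w].isEmpty()) {
                matched++; // the empty word is always a subsequence
            } else {
                buckets.get(words[w].charAt(0) - 'a').add(new int[] {w, 0});
            }
        }

        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i) - 'a';
            List<int[]> waiting = buckets.get(c);
            // Swap in a fresh bucket first: a word such as "bb" may re-enter bucket c
            buckets.set(c, new ArrayList<>());
            for (int[] entry : waiting) {
                String word = words[entry[0]];
                int next = entry[1] + 1;
                if (next == word.length()) {
                    matched++; // every character of the word has been matched
                } else {
                    entry[1] = next;
                    buckets.get(word.charAt(next) - 'a').add(entry);
                }
            }
        }
        return matched;
    }
}
```

Duplicates need no special handling here, because each occurrence of a word gets its own pointer.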

Method 3: Optimized Binary Search

import java.util.*;

class Solution {
    /**
     * Same idea as the binary-search method, with the hand-written
     * search replaced by Collections.binarySearch.
     * Assumes s and all words contain only lowercase letters 'a'..'z'.
     */
    @SuppressWarnings("unchecked") // generic array creation is safe here: only List<Integer> is stored
    public int numMatchingSubseq(String s, String[] words) {
        // Preprocess the positions of each character
        List<Integer>[] positions = new List[26];
        for (int i = 0; i < 26; i++) {
            positions[i] = new ArrayList<>();
        }
        
        for (int i = 0; i < s.length(); i++) {
            positions[s.charAt(i) - 'a'].add(i);
        }
        
        Map<String, Boolean> cache = new HashMap<>();
        int count = 0;
        
        for (String word : words) {
            if (cache.computeIfAbsent(word, w -> isSubsequenceOptimized(w, positions))) {
                count++;
            }
        }
        
        return count;
    }
    
    private boolean isSubsequenceOptimized(String word, List<Integer>[] positions) {
        int prevIndex = -1;
        
        for (char c : word.toCharArray()) {
            List<Integer> list = positions[c - 'a'];
            if (list.isEmpty()) return false;
            
            // Search for prevIndex + 1; each list is strictly increasing,
            // so an exact hit is unambiguous
            int pos = Collections.binarySearch(list, prevIndex + 1);
            if (pos < 0) {
                pos = -pos - 1; // not found: convert to the insertion point
            }
            
            if (pos >= list.size()) {
                return false;
            }
            
            prevIndex = list.get(pos);
        }
        
        return true;
    }
}
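
The `-pos - 1` conversion relies on the documented return value of Collections.binarySearch: the index of the key if it is present, otherwise `-(insertion point) - 1`. The small sketch below isolates that step; the class and method names (BinarySearchSketch, lowerBound) are hypothetical. It is only reliable for lists without duplicates, since binarySearch does not specify which of several equal elements it returns. The position lists used here are strictly increasing, so that condition holds.

```java
import java.util.*;

class BinarySearchSketch {
    // Index of the first element >= target in a sorted list of distinct values,
    // or sorted.size() if every element is smaller than target
    static int lowerBound(List<Integer> sorted, int target) {
        int pos = Collections.binarySearch(sorted, target);
        return pos >= 0 ? pos : -pos - 1;
    }
}
```

For the list [1, 4, 7], lowerBound returns 1 for target 4 (exact hit), 2 for target 5 (insertion point), 0 for target 0, and 3 (the list size, meaning "not found") for target 8.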

Method 4: Brute Force

import java.util.*;

class Solution {
    /**
     * Brute-force two-pointer check, with a cache for duplicate words.
     */
    public int numMatchingSubseq(String s, String[] words) {
        Map<String, Boolean> cache = new HashMap<>();
        int count = 0;
        
        for (String word : words) {
            if (cache.computeIfAbsent(word, w -> isSubsequenceBrute(s, w))) {
                count++;
            }
        }
        
        return count;
    }
    
    private boolean isSubsequenceBrute(String s, String word) {
        // i scans s, j points at the next unmatched character of word
        int i = 0, j = 0;
        while (i < s.length() && j < word.length()) {
            if (s.charAt(i) == word.charAt(j)) {
                j++;
            }
            i++;
        }
        return j == word.length();
    }
}

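Complexity Analysis

Time complexity:
- Preprocessing + binary search: O(s.length + L × log(s.length)), where L is the total length of the distinct words
- Multi-pointer: O(s.length × unique_words_count)
- Brute force (with cache): O(s.length × unique_words_count)
Space complexity:
- Preprocessing + binary search: O(s.length + unique_words_count)
- Multi-pointer: O(unique_words_count × avg_word_length)
- Brute force: O(unique_words_count × avg_word_length)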

Walkthrough

Example 1: s = "abcde", words = ["a","bb","acd","ace"]

Preprocessing:

positions['a'] = [0]
positions['b'] = [1]
positions['c'] = [2]
positions['d'] = [3]
positions['e'] = [4]

Checking each word:

"a":
- 'a': search positions[0] for an index > -1 → found 0
- Full match
"bb":
- First 'b': search positions[1] for an index > -1 → found 1
- Second 'b': search positions[1] for an index > 1 → not found
"acd":
- 'a': found index 0, prevIndex=0
- 'c': search positions[2] for an index > 0 → found 2, prevIndex=2
- 'd': search positions[3] for an index > 2 → found 3, prevIndex=3
- Full match
"ace":
- 'a': found index 0, prevIndex=0
- 'c': found index 2, prevIndex=2
- 'e': search positions[4] for an index > 2 → found 4, prevIndex=4
- Full match

Result: 3 words match
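
The trace above can be reproduced with a short sketch that returns the index in s matched by each character of a word. The names TraceSketch and matchIndices are hypothetical; the search is the same "first position greater than prevIndex" binary search, and lowercase input is assumed.

```java
import java.util.*;

class TraceSketch {
    // Returns the indices in s matched by each character of word,
    // or null if word is not a subsequence of s. Assumes 'a'..'z' only.
    static List<Integer> matchIndices(String s, String word) {
        // positions.get(c) lists, in increasing order, where character c occurs in s
        List<List<Integer>> positions = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            positions.add(new ArrayList<>());
        }
        for (int i = 0; i < s.length(); i++) {
            positions.get(s.charAt(i) - 'a').add(i);
        }

        List<Integer> matched = new ArrayList<>();
        int prevIndex = -1;
        for (char c : word.toCharArray()) {
            List<Integer> list = positions.get(c - 'a');
            // Binary search for the first position greater than prevIndex
            int lo = 0, hi = list.size();
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;
                if (list.get(mid) <= prevIndex) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            if (lo == list.size()) {
                return null; // no occurrence of c after prevIndex
            }
            prevIndex = list.get(lo);
            matched.add(prevIndex);
        }
        return matched;
    }
}
```

With s = "abcde", matchIndices returns [0, 2, 3] for "acd", [0, 2, 4] for "ace", and null for "bb".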

Test Cases

import java.util.Arrays;

class Main {
    public static void main(String[] args) {
        Solution solution = new Solution();
        
        // Test 1: standard example
        String[] words1 = {"a","bb","acd","ace"};
        System.out.println("Test 1: " + solution.numMatchingSubseq("abcde", words1)); // 3
        
        // Test 2: duplicate words
        String[] words2 = {"a","a","a"};
        System.out.println("Test 2: " + solution.numMatchingSubseq("abcde", words2)); // 3
        
        // Test 3: empty word
        String[] words3 = {""};
        System.out.println("Test 3: " + solution.numMatchingSubseq("abcde", words3)); // 1
        
        // Test 4: no match
        String[] words4 = {"bb","cb","db"};
        System.out.println("Test 4: " + solution.numMatchingSubseq("abcde", words4)); // 0
        
        // Test 5: word equal to s
        String[] words5 = {"abcde"};
        System.out.println("Test 5: " + solution.numMatchingSubseq("abcde", words5)); // 1
        
        // Test 6: long string
        String longS = "abcdefghijklmnopqrstuvwxyz";
        String[] words6 = {"ace","xyz","aeiou","bcdfg"};
        System.out.println("Test 6: " + solution.numMatchingSubseq(longS, words6)); // 4
        
        // Test 7: single-character s
        String[] words7 = {"a","b","c"};
        System.out.println("Test 7: " + solution.numMatchingSubseq("a", words7)); // 1
        
        // Test 8: many duplicate words
        String[] words8 = new String[5000];
        Arrays.fill(words8, "ace");
        System.out.println("Test 8: " + solution.numMatchingSubseq("abcde", words8)); // 5000
        
        // Test 9: boundary case
        String[] words9 = {"a", "z"};
        System.out.println("Test 9: " + solution.numMatchingSubseq("a", words9)); // 1
        
        // Test 10: empty s
        String[] words10 = {"a", ""};
        System.out.println("Test 10: " + solution.numMatchingSubseq("", words10)); // 1 (only the empty string matches)
    }
}