The KMP Algorithm and an Optimized Variant

Latest recommended article published on 2022-04-11 10:03:39

License: CC 4.0 BY-SA

Tags: #kmp #algorithm #string-pattern-matching

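This article introduces KMP, an efficient string-matching algorithm. By preprocessing the pattern string into a next array, it avoids backtracking the pointer into the main string, which the traditional matching algorithm requires, and so matches faster. The article explains how KMP works and gives concrete implementation code.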

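1. Algorithm and problem statement

The KMP algorithm discussed here is named after its three inventors (Knuth, Morris, and Pratt) and is a very efficient string-matching algorithm.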

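The task KMP solves: given two strings s and t of lengths n and m, decide whether t occurs in s and, if it does, return the position of the occurrence. The straightforward approach tries every position in s as a starting point and compares t against s from there, which takes O(nm) time in the worst case. KMP instead spends O(m) time preprocessing the pattern t, after which the matching pass takes O(n) time, for O(n + m) overall.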
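To make the task concrete, here is a minimal sketch that uses Java's built-in String.indexOf as a reference for the expected answer. The class name ProblemDemo and the method find are illustrative and not part of the original article; the sketch says nothing about how the search is performed internally.

```java
class ProblemDemo {
    // Reference answer: index of the first occurrence of t in s, or -1 if t is absent.
    static int find(String s, String t) {
        return s.indexOf(t);
    }

    public static void main(String[] args) {
        System.out.println(find("goodgoogle", "google")); // 4
        System.out.println(find("goodgoogle", "apple"));  // -1
    }
}
```

Note that a "not found" result needs a value such as -1 that cannot be confused with a valid position, since 0 is itself a legitimate match index.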

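2. Algorithm idea

Define two pointers, i into s and j into t, and scan s for t. When the characters s[i] and t[j] differ, i is not moved back (this is the key point to understand). Instead, the partial match obtained so far is used to slide the pattern as far to the right as possible before comparing again. In practice this means that only j falls back to a particular position, and that position is looked up in a jump table called next.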
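The sliding step can be traced on a small input. The sketch below is only an illustration under stated assumptions: the class FallbackTrace and the method trace are hypothetical names, and the next table for the pattern "abcabd" is written out by hand in the 0-based convention with next[0] = -1. It records every point where j falls back while i stays where it is.

```java
import java.util.ArrayList;
import java.util.List;

class FallbackTrace {
    // Runs the KMP scan with a caller-supplied next table and logs each fallback of j.
    static List<String> trace(String s, String t, int[] next) {
        List<String> events = new ArrayList<>();
        int i = 0, j = 0;
        while (i < s.length() && j < t.length()) {
            if (j == -1 || s.charAt(i) == t.charAt(j)) {
                i++; // i only ever moves forward
                j++;
            } else {
                events.add("i=" + i + ": j " + j + " -> " + next[j]);
                j = next[j]; // only j falls back
            }
        }
        events.add(j == t.length() ? "match at " + (i - t.length()) : "no match");
        return events;
    }

    public static void main(String[] args) {
        int[] next = {-1, 0, 0, 0, 1, 2}; // hand-computed for "abcabd" (assumption of this sketch)
        for (String event : trace("abcabcabd", "abcabd", next)) {
            System.out.println(event);
        }
        // prints:
        // i=5: j 5 -> 2
        // match at 3
    }
}
```

At i = 5 the text has 'c' where the pattern has 'd'. Because the matched part "abcab" ends with "ab", which is also its prefix, j drops to 2 and the comparison resumes at the same i.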

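The next array stores the position that j should fall back to. The value for a given position depends on how much the prefix and the suffix of the substring before the current character overlap. In the 1-based textbook convention the rule of thumb is: if one prefix character equals one suffix character, the value k is 2; if two characters match, k is 3; if n characters match, k is n + 1. The code in section 3 uses 0-based arrays, so there n matching characters give next[j] = n, and next[0] is set to -1 as a sentinel.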
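The rule above can be checked against a brute-force computation taken straight from the definition. This is a sketch for illustration, not the efficient construction: the class NextByDefinition and its method are hypothetical names, and it uses the 0-based convention in which next[0] = -1 and next[j] is the length of the longest proper prefix of t[0..j-1] that is also its suffix.

```java
import java.util.Arrays;

class NextByDefinition {
    // Brute-force next table: for each j, try the longest candidate overlap first.
    static int[] nextByDefinition(String t) {
        int[] next = new int[t.length()];
        if (t.isEmpty()) {
            return next;
        }
        next[0] = -1; // sentinel: nothing to fall back to
        for (int j = 1; j < t.length(); j++) {
            for (int k = j - 1; k > 0; k--) {
                // Does the prefix of length k equal the suffix of length k of t[0..j-1]?
                if (t.substring(0, k).equals(t.substring(j - k, j))) {
                    next[j] = k;
                    break;
                }
            }
            // If no k matched, next[j] keeps its default value 0.
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(nextByDefinition("ababc"))); // [-1, 0, 0, 1, 2]
    }
}
```

For "ababc", the substring before index 4 is "abab", whose prefix "ab" equals its suffix "ab", so next[4] = 2. Under the 1-based convention the same position would get k = 3.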

3. Implementation

First, the naive string-matching algorithm, whose time complexity is O(nm):

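// Problem: given two strings s and t of lengths n and m, decide whether t occurs in s
// and, if it does, return the position of the occurrence.
// Locating a substring in this way is usually called string pattern matching.
public class StringMatchTest {
	// Returns the index of the first occurrence of t in s, or -1 if t does not occur.
	public static int Index(char[] s, char[] t) {
		int i = 0, j = 0, length = t.length;
		while (i < s.length && j < t.length) {
			if (s[i] == t[j]) {  // characters match: advance both pointers
				i++;
				j++;
			} else {
				i = i - j + 1;  // move i back to one past the start of the failed attempt
				j = 0;          // move j back to the start of the pattern
			}
		}
		// Return -1 rather than 0 on failure, because 0 is a valid match position.
		return j == length ? i - length : -1;
	}

	public static void main(String[] args) {
		String s = "goodgoogle";
		String t = "google";
		char[] ss = s.toCharArray();
		char[] tt = t.toCharArray();
		System.out.println(Index(ss, tt));
	}
}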
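To see where the O(nm) bound of the naive algorithm comes from, the following sketch counts character comparisons on an input chosen to be unfavorable. The class NaiveComparisonCount and its method are hypothetical names; the loop is the same as in the naive matcher above, with a counter added.

```java
class NaiveComparisonCount {
    // Same loop as the naive matcher, but returns how many character comparisons were made.
    static int countComparisons(String s, String t) {
        int i = 0, j = 0, comparisons = 0;
        while (i < s.length() && j < t.length()) {
            comparisons++;
            if (s.charAt(i) == t.charAt(j)) {
                i++;
                j++;
            } else {
                i = i - j + 1; // restart one position after the failed attempt
                j = 0;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // n = 10, m = 4: each of the 7 start positions costs 4 comparisons.
        System.out.println(countComparisons("aaaaaaaaab", "aaab")); // 28
    }
}
```

Every start position matches "aaa" and then fails (or, at the last position, succeeds) on the fourth character, so the total is (n - m + 1) * m = 28 comparisons for this input.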

Next, the code for the KMP pattern-matching algorithm, whose time complexity is O(m + n):

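import java.util.Scanner;

public class KMPTest1 {
	// next[i] is the length of the longest proper prefix of t[0..i-1] that is also
	// its suffix; next[0] = -1 means "no fallback left, advance i".
	public static int[] getNext(char[] t) {
		int[] next = new int[t.length];
		if (t.length == 0) {  // guard: an empty pattern has no next[0]
			return next;
		}
		int i = 0, j = -1;
		next[0] = -1;
		while (i < t.length - 1) {
			if (j == -1 || t[i] == t[j]) {
				++i;
				++j;
				next[i] = j;
			} else {
				j = next[j];
			}
		}
		return next;
	}

	// Returns the index of the first occurrence of t in s, or -1 if t does not occur.
	public static int Index(char[] s, char[] t) {
		int i = 0, j = 0;
		int[] next = getNext(t);
		while (i < s.length && j < t.length) {
			if (j == -1 || s[i] == t[j]) {  // match, or no fallback left: advance
				++i;
				++j;
			} else {
				j = next[j];  // mismatch: only j falls back, i stays put
			}
		}
		// Return -1 rather than 0 on failure, because 0 is a valid match position.
		return j == t.length ? i - t.length : -1;
	}

	public static void main(String[] args) {
		Scanner sc = new Scanner(System.in);
		// Each test case is two lines: the text s, then the pattern t.
		while (sc.hasNextLine()) {
			String s1 = sc.nextLine();
			if (!sc.hasNextLine()) {
				break;  // incomplete pair at the end of the input
			}
			String t1 = sc.nextLine();
			char[] s2 = s1.toCharArray();
			char[] t2 = t1.toCharArray();
			System.out.println(Index(s2, t2));
		}
	}
}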
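One way to gain confidence in a KMP implementation is to cross-check it against String.indexOf on a handful of inputs. The sketch below is a small self-contained harness, not part of the original article: the class KmpCrossCheck and the helper agreesWithIndexOf are hypothetical names, and its matcher follows the listing above under the assumption that "not found" is reported as -1.

```java
class KmpCrossCheck {
    // Builds the next table in the 0-based convention with next[0] = -1.
    static int[] getNext(char[] t) {
        int[] next = new int[t.length];
        if (t.length == 0) {
            return next;
        }
        int i = 0, j = -1;
        next[0] = -1;
        while (i < t.length - 1) {
            if (j == -1 || t[i] == t[j]) {
                next[++i] = ++j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    // KMP search: index of the first occurrence of t in s, or -1 if absent.
    static int index(char[] s, char[] t) {
        int i = 0, j = 0;
        int[] next = getNext(t);
        while (i < s.length && j < t.length) {
            if (j == -1 || s[i] == t[j]) {
                ++i;
                ++j;
            } else {
                j = next[j];
            }
        }
        return j == t.length ? i - t.length : -1;
    }

    // True when the KMP result equals the library's answer for the same input.
    static boolean agreesWithIndexOf(String s, String t) {
        return index(s.toCharArray(), t.toCharArray()) == s.indexOf(t);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"goodgoogle", "google"}, {"abcabcabd", "abcabd"}, {"aaaaaaaaab", "aaab"},
            {"hello", "xyz"}, {"abc", "abc"}, {"ab", "abc"}
        };
        for (String[] c : cases) {
            int kmp = index(c[0].toCharArray(), c[1].toCharArray());
            System.out.println(c[0] + " / " + c[1] + " -> kmp=" + kmp
                    + ", agrees=" + agreesWithIndexOf(c[0], c[1]));
        }
    }
}
```

The cases include a match at index 0 and a pattern longer than the text, which are the inputs where a "return 0 on failure" convention would give a misleading answer.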

[Screenshot of a test run of the KMP program]

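This code can be optimized further. When t contains many repeated characters, the fallback becomes inefficient: if t[i] equals t[j] at the fallback position j, falling back to j only repeats a comparison that is bound to fail again. The next array is therefore refined so that in this case next[i] takes the value next[j] instead of j. The code is as follows: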

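import java.util.Scanner;

public class KMPTest1 {
	// Optimized next table: if t[i] equals the character at its fallback position j,
	// falling back to j would fail on the same character, so reuse next[j] instead.
	public static int[] getNext(char[] t) {
		int[] next = new int[t.length];
		if (t.length == 0) {  // guard: an empty pattern has no next[0]
			return next;
		}
		int i = 0, j = -1;
		next[0] = -1;
		while (i < t.length - 1) {
			if (j == -1 || t[i] == t[j]) {
				++i;
				++j;
				if (t[i] != t[j]) {
					next[i] = j;
				} else {
					next[i] = next[j];
				}
			} else {
				j = next[j];
			}
		}
		return next;
	}

	// Returns the index of the first occurrence of t in s, or -1 if t does not occur.
	public static int Index(char[] s, char[] t) {
		int i = 0, j = 0;
		int[] next = getNext(t);
		while (i < s.length && j < t.length) {
			if (j == -1 || s[i] == t[j]) {  // match, or no fallback left: advance
				++i;
				++j;
			} else {
				j = next[j];  // mismatch: only j falls back, i stays put
			}
		}
		// Return -1 rather than 0 on failure, because 0 is a valid match position.
		return j == t.length ? i - t.length : -1;
	}

	public static void main(String[] args) {
		Scanner sc = new Scanner(System.in);
		// Each test case is two lines: the text s, then the pattern t.
		while (sc.hasNextLine()) {
			String s1 = sc.nextLine();
			if (!sc.hasNextLine()) {
				break;  // incomplete pair at the end of the input
			}
			String t1 = sc.nextLine();
			char[] s2 = s1.toCharArray();
			char[] t2 = t1.toCharArray();
			System.out.println(Index(s2, t2));
		}
	}
}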
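To see what the refinement changes, the sketch below computes both tables for the pattern "aaaab", which consists mostly of repeated characters. The class NextComparison and the method names plainNext and optimizedNext are hypothetical, chosen here only to keep the two variants side by side.

```java
import java.util.Arrays;

class NextComparison {
    // Plain next table: next[i] is the overlap length for t[0..i-1], next[0] = -1.
    static int[] plainNext(char[] t) {
        int[] next = new int[t.length];
        if (t.length == 0) {
            return next;
        }
        int i = 0, j = -1;
        next[0] = -1;
        while (i < t.length - 1) {
            if (j == -1 || t[i] == t[j]) {
                next[++i] = ++j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    // Optimized table: skip fallback positions that hold the same character as t[i].
    static int[] optimizedNext(char[] t) {
        int[] next = new int[t.length];
        if (t.length == 0) {
            return next;
        }
        int i = 0, j = -1;
        next[0] = -1;
        while (i < t.length - 1) {
            if (j == -1 || t[i] == t[j]) {
                ++i;
                ++j;
                next[i] = (t[i] != t[j]) ? j : next[j];
            } else {
                j = next[j];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        char[] t = "aaaab".toCharArray();
        System.out.println(Arrays.toString(plainNext(t)));     // [-1, 0, 1, 2, 3]
        System.out.println(Arrays.toString(optimizedNext(t))); // [-1, -1, -1, -1, 3]
    }
}
```

With the plain table, a mismatch at index 3 of "aaaab" walks j through 2, 1, 0 and then -1, comparing the same 'a' each time. With the optimized table j goes to -1 in one step, because every earlier position holds the same character and would fail in the same way.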

[Screenshot of a test run of the optimized program]