Substring Search Algorithms (String Matching Algorithm Series)

Article link: https://blog.youkuaiyun.com/Negative12/article/details/116551503

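This article introduces four string matching algorithms: the brute-force algorithm, the KMP algorithm, the BM algorithm, and the RK fingerprint search algorithm. In string search, we are given a text and a pattern, and the goal is to determine whether the pattern occurs in the text. The KMP algorithm improves efficiency by avoiding backtracking and reaches O(m+n) time complexity. The BM and RK algorithms are discussed further.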


Catalogue

Introduction
Basic Definitions
I. Brute-Force Algorithm
- Time complexity
II. KMP Algorithm
- Basic concept
- Two components: Search and Next
III. BM Algorithm
IV. RK Fingerprint Algorithm

Introduction

Substring search is a fundamental operation on strings. We are given a text string of length n and a pattern string of length m, and the goal is to determine whether the pattern occurs in the text and, if so, where.
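To make the problem statement concrete, here is a minimal sketch of the expected input and output, built on the C standard library function strstr. The helper name find_first is an assumption made for illustration and does not come from the algorithms in this article; it only shows what any substring search should return.

```c
#include <stddef.h>
#include <string.h>

/* find_first is a hypothetical helper used only for illustration.
 * It wraps the standard strstr and returns the 0-based index of the
 * first occurrence of pattern in text, or -1 if there is none. */
long find_first(const char *text, const char *pattern) {
    const char *hit = strstr(text, pattern);
    return hit ? (long)(hit - text) : -1;
}
```

For instance, find_first("hello world", "world") gives 6, and find_first("hello", "xyz") gives -1. The algorithms in the rest of the article compute the same kind of answer and differ in how much work they do to get it.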

Many everyday tasks rely on this operation, for example searching for keywords in a text editor or a web browser. This article introduces four algorithms that solve the problem: the naive approach (brute force), the KMP algorithm, the BM algorithm, and the RK fingerprint search algorithm. Examples and C code illustrate the ideas. Enjoy!

Basic Definitions

To make the article easier to follow, here are the symbols used throughout:
n: Length of the text string
m: Length of the pattern string
T: Text string
P: Pattern string

I. Brute-Force Algorithm

A straightforward approach is to try every possible position in the text. At each position, we compare the pattern with the text character by character, starting from the first character of the pattern. If every character matches, we have found an occurrence. As soon as one character does not match, we break out of the comparison and move on to the next position.
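The check at a single position can be sketched as a small helper. This is only an illustration under stated assumptions: the name matches_at is hypothetical and is not part of the article's code, and the caller is assumed to pass an index i that is no larger than the length of T.

```c
#include <stddef.h>
#include <string.h>

/* matches_at is a hypothetical helper, not the article's code.
 * It returns 1 if pattern P occurs in text T starting at index i,
 * and 0 otherwise. Assumes i <= strlen(T). */
int matches_at(const char *T, const char *P, size_t i) {
    size_t m = strlen(P);
    for (size_t j = 0; j < m; j++) {
        /* If T ends early, its terminating '\0' differs from P[j],
         * so the loop stops before reading past the end of T. */
        if (T[i + j] != P[j]) {
            return 0; /* mismatch: give up on this position */
        }
    }
    return 1; /* all m characters matched */
}
```

With T = "abracadabra" and P = "cad", matches_at(T, P, 4) returns 1 while matches_at(T, P, 0) returns 0. The brute-force search amounts to running this check for i = 0, 1, 2, ... until it succeeds or the text runs out.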
Example
The code below finds the first occurrence of a pattern string P in a text string T. It keeps one index i into the text and another index j into the pattern. For each i, it resets j to 0 and increments it until it finds a mismatch or reaches the end of the pattern (j == m).

int Brute_Force(char* T, char* P) {
	int m = strlen(P); // Length of the pattern string
	int n = strlen(T); // Length of the text string
	for (