Introduction
Substring search is a fundamental operation on strings. We are given a text string of length n and a pattern string of length m, and the goal is to determine whether the pattern occurs somewhere in the text.
Many everyday tasks rely on it, for example searching for keywords in a text editor or a web browser. This article introduces four algorithms for the problem: the naive approach (brute force), the KMP (Knuth-Morris-Pratt) algorithm, the BM (Boyer-Moore) algorithm, and the RK (Rabin-Karp) fingerprint search algorithm. There are examples and C code to illustrate the ideas. Enjoy!
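To make the problem statement concrete before looking at any particular algorithm, here is a minimal sketch that leans on the C standard library function strstr. The helper name find_first is hypothetical and used only for this illustration; it is not one of the algorithms discussed in this article.

```c
#include <string.h>

/* Hypothetical helper (illustration only): returns the index of the
   first occurrence of P in T, or -1 if P does not occur in T. */
static int find_first(const char *T, const char *P) {
    const char *hit = strstr(T, P);    /* standard library substring search */
    return hit ? (int)(hit - T) : -1;  /* convert the pointer to an offset */
}
```

For instance, find_first("hello world", "world") gives 6, while find_first("hello world", "word") gives -1 because "word" never appears in the text.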
Basic Definitions
To make the article easier to follow, here are the symbols used throughout:
n : Length of the text string
m : Length of the pattern string
T : Text string
P : Pattern string
I. Brute-Force Algorithm
A straightforward algorithm is to check for the pattern at every possible position in the text. At each position, we compare the first character of the pattern with the text; if it matches, we go on comparing the following characters against the rest of the pattern. As soon as a character fails to match, we break out of the loop and move on to the next position.
The code below finds the first occurrence of a pattern string P in a text string T. It keeps one pointer i into the text and another pointer j into the pattern. For each i, it resets j to 0 and increments it until it finds a mismatch or reaches the end of the pattern (j == m).
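The walk-through above can be made visible with a small tracing sketch that prints what happens at each position. The function name trace_brute_force and the output format are assumptions made for this illustration; it follows the same compare-and-shift idea but is not meant to replace the implementation in this section.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tracing helper: tries every position i of T, reports where
   the comparison stops, and returns the index of the first match or -1. */
static int trace_brute_force(const char *T, const char *P) {
    int n = (int)strlen(T);
    int m = (int)strlen(P);
    for (int i = 0; i + m <= n; i++) {      /* only positions where P still fits */
        int j = 0;
        while (j < m && T[i + j] == P[j]) { /* extend the match while characters agree */
            j++;
        }
        if (j == m) {                       /* reached the end of the pattern */
            printf("i=%d: all %d characters match\n", i, m);
            return i;
        }
        printf("i=%d: mismatch at j=%d ('%c' vs '%c')\n", i, j, T[i + j], P[j]);
    }
    return -1;                              /* no position matched */
}
```

With T = "ABACABAD" and P = "ABAD", the comparison fails at j=3 for i=0, at j=0 for i=1, at j=1 for i=2, at j=0 for i=3, and finally succeeds at i=4.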
int Brute_Force(char* T, char* P) {
int m = strlen(P);//Length of the pattern string
int n = strlen(T);//Length of the text string
for (