Suppose that pattern P and text T are randomly chosen strings of length m and n, respectively, from the d-ary alphabet Σd = {0, 1, . . . , d - 1}, where d ≥ 2. Show that the expected number of character-to-character comparisons made by the implicit loop in line 4 of the naive algorithm is
over all executions of this loop. (Assume that the naive algorithm stops comparing characters for a given shift once a mismatch is found or the entire pattern is matched.) Thus, for randomly chosen strings, the naive algorithm is quite efficient.
证明:
令P为单个字符比较匹配的概率,1-P为失配的概率,P=d-1
比较次数 概率 比较次数 * 概率
1 1 - P 1 - P
2 P(1 - P) 2P - 2P2
3 P2 (1 - P) 3P - 3P3
…
m-1 Pm-2(1 - P) (m-1)Pm-2 - (m-1) Pm-1
m Pm-1(1 - P)+Pm-1P (m)Pm-1 - (m) Pm + (m)Pm
E =∑(比较次数 * 概率) = 1+P+P2+…..Pm-1 = (1-Pm)/1-P = (1- d –m )/( 1- d-1)
令f(x) = (1- x –m )/( 1- x-1),对f(x)求导可知,在m>1,x>=2时导数为负,则f(x)在x>=2严格减函数,所以f(x)<=f(2)<=2.证毕。
由此可知,对于随机的字符串,朴素的字符串比较还是有效的。