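[Repost] Dynamic Programming Algorithm (DPA) for Edit-Distance

This article introduces a way to measure the similarity of two strings: the edit distance algorithm. By defining three basic operations (substituting, inserting, and deleting a character), the algorithm computes the minimum number of operations needed to transform one string into another. The article explains the recurrence relations and the dynamic programming implementation in detail.

Reposted from: http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/Edit/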

 

The words 'computer' and 'commuter' are very similar, and a change of just one letter, p->m, will change the first word into the second. The word 'sport' can be changed into 'sort' by the deletion of the 'p', or equivalently, 'sort' can be changed into 'sport' by the insertion of 'p'.

The edit distance of two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is one of:

  1. change a letter,
  2. insert a letter or
  3. delete a letter

 

The following recurrence relations define the edit distance, d(s1,s2), of two strings s1 and s2:

d('', '') = 0               -- '' = empty string
d(s, '')  = d('', s) = |s|  -- i.e. length of s
d(s1+ch1, s2+ch2)
  = min( d(s1, s2) + if ch1=ch2 then 0 else 1 fi,
         d(s1+ch1, s2) + 1,
         d(s1, s2+ch2) + 1 )

The first two rules above are obviously true, so it is only necessary to consider the last one. Here, neither string is the empty string, so each has a last character, ch1 and ch2 respectively. Somehow, ch1 and ch2 have to be explained in an edit of s1+ch1 into s2+ch2. If ch1 equals ch2, they can be matched for no penalty, i.e. 0, and the overall edit distance is d(s1,s2). If ch1 differs from ch2, then ch1 could be changed into ch2, at a cost of 1, giving an overall cost d(s1,s2)+1. Another possibility is to delete ch1 and edit s1 into s2+ch2, d(s1,s2+ch2)+1. The last possibility is to edit s1+ch1 into s2 and then insert ch2, d(s1+ch1,s2)+1. There are no other alternatives. We take the least expensive, i.e. min, of these alternatives.

 

The recurrence relations imply an obvious ternary-recursive routine. This is not a good idea because it is exponentially slow, and impractical for strings of more than a very few characters.
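
As a minimal sketch of that ternary-recursive routine, the recurrence can be transcribed almost literally into Python. The function name `edit_distance_naive` is chosen here for illustration and does not come from the original article. It is only usable on very short strings, which is exactly the problem described above.

```python
def edit_distance_naive(s1: str, s2: str) -> int:
    """Direct transcription of the recurrence; exponential time."""
    # d(s, '') = d('', s) = |s|
    if not s1:
        return len(s2)
    if not s2:
        return len(s1)

    # Split each string into its prefix and its last character.
    head1, ch1 = s1[:-1], s1[-1]
    head2, ch2 = s2[:-1], s2[-1]

    return min(
        # match (cost 0) or change ch1 into ch2 (cost 1)
        edit_distance_naive(head1, head2) + (0 if ch1 == ch2 else 1),
        # edit s1+ch1 into s2, then insert ch2
        edit_distance_naive(s1, head2) + 1,
        # delete ch1, then edit s1 into s2+ch2
        edit_distance_naive(head1, s2) + 1,
    )


print(edit_distance_naive("sport", "sort"))  # 1
```

Each call spawns up to three further calls, and the same pairs of prefixes are recomputed many times over.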

Examination of the relations reveals that d(s1,s2) depends only on d(s1',s2') where s1' is shorter than s1, or s2' is shorter than s2, or both. This allows the dynamic programming technique to be used.
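
One way to see why this observation helps: there are only (|s1|+1)*(|s2|+1) distinct pairs of prefixes, so caching the result for each pair removes the repeated work. The sketch below is an assumption-laden illustration, not the article's own method: it indexes prefixes by length and uses the standard-library `functools.lru_cache`. The name `edit_distance_memo` is hypothetical.

```python
from functools import lru_cache


def edit_distance_memo(s1: str, s2: str) -> int:
    """Top-down version of the recurrence with cached subproblems."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) is the edit distance between s1[:i] and s2[:j].
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if s1[i - 1] == s2[j - 1] else 1
        return min(
            d(i - 1, j - 1) + cost,  # match or change
            d(i, j - 1) + 1,         # insert s2[j-1]
            d(i - 1, j) + 1,         # delete s1[i-1]
        )

    return d(len(s1), len(s2))


print(edit_distance_memo("computer", "commuter"))  # 1
```

The recursion depth grows with |s1|+|s2|, so for long strings Python's recursion limit becomes a concern; an iterative table avoids that.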

A two-dimensional matrix, m[0..|s1|,0..|s2|] is used to hold the edit distance values:

m[i,j] = d(s1[1..i], s2[1..j])

m[0,0] = 0
m[i,0] = i,  i=1..|s1|
m[0,j] = j,  j=1..|s2|

m[i,j] = min(m[i-1,j-1]
             + if s1[i]=s2[j] then 0 else 1 fi,
             m[i-1, j] + 1,
             m[i, j-1] + 1 ),  i=1..|s1|, j=1..|s2|
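
The matrix definition above can be sketched in Python as follows. The function name `edit_distance_table` is an assumption made for this example. Because Python strings are 0-indexed, `s1[i - 1]` plays the role of s1[i] in the formulas.

```python
def edit_distance_table(s1: str, s2: str) -> int:
    """Bottom-up edit distance using the full (|s1|+1) x (|s2|+1) matrix."""
    rows, cols = len(s1) + 1, len(s2) + 1
    m = [[0] * cols for _ in range(rows)]

    # Boundary conditions: m[i,0] = i and m[0,j] = j.
    for i in range(rows):
        m[i][0] = i
    for j in range(cols):
        m[0][j] = j

    # General case, filled row by row.
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            m[i][j] = min(
                m[i - 1][j - 1] + cost,  # match or change
                m[i - 1][j] + 1,         # delete s1[i]
                m[i][j - 1] + 1,         # insert s2[j]
            )

    return m[rows - 1][cols - 1]


print(edit_distance_table("computer", "commuter"))  # 1
print(edit_distance_table("sport", "sort"))         # 1
```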

m[,] can be computed row by row. Row m[i,] depends only on row m[i-1,] and on the entries of row m[i,] to its left. The time complexity of this algorithm is O(|s1|*|s2|). If s1 and s2 have a 'similar' length, about n say, this complexity is O(n^2), much better than exponential!
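
Since each row needs only the row before it, a sketch that keeps just two rows brings the extra space down to O(|s2|) while leaving the running time at O(|s1|*|s2|). This variant returns the distance only and cannot recover the edit sequence itself. The name `edit_distance_two_rows` is hypothetical.

```python
def edit_distance_two_rows(s1: str, s2: str) -> int:
    """Row-by-row edit distance that stores only two rows of the matrix."""
    # Row 0 of the matrix: m[0,j] = j.
    prev = list(range(len(s2) + 1))

    for i, ch1 in enumerate(s1, start=1):
        # First entry of row i: m[i,0] = i.
        curr = [i]
        for j, ch2 in enumerate(s2, start=1):
            cost = 0 if ch1 == ch2 else 1
            curr.append(min(
                prev[j - 1] + cost,  # m[i-1,j-1]: match or change
                prev[j] + 1,         # m[i-1,j]: delete ch1
                curr[j - 1] + 1,     # m[i,j-1]: insert ch2
            ))
        prev = curr

    return prev[-1]


print(edit_distance_two_rows("computer", "commuter"))  # 1
```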

 

 

Appendix: the algorithm

 

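### Dynamic programming explained

Dynamic programming is a method for solving optimization problems over multi-stage decision processes. It breaks the original problem into simpler subproblems and stores their results so that none is computed twice, which improves efficiency[^1].

#### Implementation

To illustrate how dynamic programming is implemented, here is a classic example, the Maximum Subarray Problem: find the contiguous subarray of a given integer array whose elements have the largest sum[^2].

```python
def max_sub_array(nums):
    if not nums:
        return 0
    current_sum = best_sum = nums[0]
    for i in range(1, len(nums)):
        # If extending the previous run gives less than the current
        # element on its own, start a new run at this element.
        current_sum = max(nums[i], current_sum + nums[i])
        best_sum = max(best_sum, current_sum)
    return best_sum
```

This code shows the dynamic programming idea at work: each iteration decides whether to append the current value to the existing run or to start a new run with it, while keeping track of the largest sum seen so far, which is returned at the end.

#### Applications

Beyond the example above, dynamic programming is widely used in other areas:

- **Path planning**: robot motion planning can be done with A* search, Rapidly-exploring Random Trees, dynamic programming, and similar techniques. The last of these is especially suited to cases with a fixed state-transition pattern.
- **Knapsack problems**: when choosing how to allocate limited resources (for example, a traveller deciding which items to carry), dynamic programming can find the optimal selection.
- **Edit distance**: the similarity of two strings can also be measured this way, by efficiently computing the minimum number of operations.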