Lucene Learning Interlude: The Levenshtein Algorithm

Reference: http://www.cppblog.com/whncpp/archive/2008/09/21/62378.html

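Lucene provides a fuzzy search mechanism that lets users run fuzzy queries on single terms. The matching algorithm it uses is the Levenshtein algorithm (the edit distance algorithm). When comparing two strings, the algorithm considers three kinds of operations: adding a character, deleting a character, and replacing a character. The distance is the minimum number of such operations needed to turn one string into the other. The main applications of the algorithm include spell checking, sentence recognition, DNA analysis, and plagiarism detection.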
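To make the three operations concrete, here is a minimal sketch that turns "kitten" into "sitting" with two replacements and one addition, which is why the edit distance between these two words is 3. The function name kittenToSitting and the example words are illustrative assumptions, not part of the original post or of Lucene.

```cpp
#include <string>

// Illustrative only: applies three single-character edits to turn
// "kitten" into "sitting". The function name and the words are hypothetical.
std::string kittenToSitting()
{
    std::string s = "kitten";
    s[0] = 's';        // replace 'k' with 's': "sitten"
    s[4] = 'i';        // replace 'e' with 'i': "sittin"
    s.push_back('g');  // add 'g' at the end:   "sitting"
    return s;
}
```

A deletion would be the mirror image of the addition, for example s.erase(pos, 1) to drop the character at position pos.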

Algorithm description (characters of s and t are numbered from 1 here):

1. Set n to be the length of s.
   Set m to be the length of t.
   If n = 0, return m and exit.
   If m = 0, return n and exit.
   Construct a matrix d containing rows 0..n and columns 0..m.
2. Initialize the first column to 0..n (d[i,0] = i).
   Initialize the first row to 0..m (d[0,j] = j).
3. Examine each character of s (i from 1 to n).
4. Examine each character of t (j from 1 to m).
5. If s[i] equals t[j], the cost is 0.
   If s[i] doesn't equal t[j], the cost is 1.
6. Set cell d[i,j] of the matrix equal to the minimum of:
   a. The cell immediately above plus 1: d[i-1,j] + 1.
   b. The cell immediately to the left plus 1: d[i,j-1] + 1.
   c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.
7. After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].
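The steps above can be traced by keeping the whole matrix instead of only the final cell. The sketch below assumes the same layout as the C++ code in this post (one row per character of s, one column per character of t); the helper name levenshteinMatrix is hypothetical and the block uses C++11.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Builds the full matrix d, where d[i][j] is the edit distance between
// the first i characters of s and the first j characters of t.
// levenshteinMatrix is a hypothetical helper name used for illustration.
std::vector<std::vector<int>> levenshteinMatrix(const std::string& s,
                                                const std::string& t)
{
    const std::size_t n = s.size();
    const std::size_t m = t.size();
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));

    // Step 2: first column is 0..n, first row is 0..m.
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);

    // Steps 3 to 6: fill the remaining cells row by row.
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1,          // above
                                d[i][j - 1] + 1,          // left
                                d[i - 1][j - 1] + cost}); // diagonal
        }
    }
    return d;  // Step 7: the distance is d[n][m].
}
```

For s = "ab" and t = "abc" the rows come out as 0 1 2 3, then 1 0 1 2, then 2 1 0 1, so the distance in the bottom-right cell is 1 (one added character).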




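// Algorithm
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Returns the Levenshtein (edit) distance between source and target.
int ldistance(const string& source, const string& target)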
{
    //step 1

    int n=source.length();
    int m=target.length();
    if (m==0) return n;
    if (n==0) return m;
    //Construct an (n+1) x (m+1) matrix; every cell starts at 0
    typedef vector< vector<int> >  Tmatrix;
    Tmatrix matrix(n+1, vector<int>(m+1));

    //step 2 Initialize the first column and the first row (matrix[0][0] is already 0)

    for(int i=1;i<=n;i++) matrix[i][0]=i;
    for(int j=1;j<=m;j++) matrix[0][j]=j;

     //step 3
     for(int i=1;i<=n;i++)
     {
        const char si=source[i-1];
        //step 4
        for(int j=1;j<=m;j++)
        {

            const char dj=target[j-1];
            //step 5
            const int cost=(si==dj)?0:1;
            //step 6
            const int above=matrix[i-1][j]+1;
            const int left=matrix[i][j-1]+1;
            const int diag=matrix[i-1][j-1]+cost;
            matrix[i][j]=min(above,min(left,diag));

        }
     }
     //step 7
     return matrix[n][m];
}
int main(){
    string s;
    string t;
    cout<<"source=";
    cin>>s;
    cout<<"target=";
    cin>>t;
    int dist=ldistance(s,t);
    cout<<"dist="<<dist<<endl;
    return 0;
}
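Step 6 only ever reads the current row and the row directly above it, so the full matrix is not strictly needed. As a possible variation (not something the original post or Lucene is claimed to do), the sketch below keeps just two rows, which reduces memory from (n+1) x (m+1) cells to 2 x (m+1). The name ldistanceTwoRows is hypothetical and the block uses C++11.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Two-row variant of the same dynamic program. ldistanceTwoRows is a
// hypothetical name; the result should match the full-matrix version.
int ldistanceTwoRows(const std::string& source, const std::string& target)
{
    const std::size_t n = source.size();
    const std::size_t m = target.size();

    std::vector<int> prev(m + 1);  // row i-1 of the matrix
    std::vector<int> curr(m + 1);  // row i of the matrix
    for (std::size_t j = 0; j <= m; ++j) prev[j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        curr[0] = static_cast<int>(i);  // first column holds 0..n
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (source[i - 1] == target[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // above
                                curr[j - 1] + 1,      // left
                                prev[j - 1] + cost}); // diagonal
        }
        std::swap(prev, curr);  // the finished row becomes "above"
    }
    return prev[m];  // after the last swap, prev holds row n
}
```

The empty-string cases need no special handling here: with n = 0 the loop never runs and prev[m] is m, and with m = 0 each pass only sets curr[0], so the result is n.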