Computing text similarity in Oracle

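This article introduces the basic concepts of string similarity measurement and its applications in several fields. It explains how a string distance (such as Levenshtein distance) measures the difference between two text strings, and outlines the development from simple edit distance to more complex grammatical and statistical methods.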

http://docs.oracle.com/cd/E18283_01/appdev.112/e16760/u_match.htm

http://www.tuicool.com/articles/NBZFfe

https://en.wikipedia.org/wiki/String_similarity

https://github.com/rockymadden/stringmetric/

String metric

From Wikipedia, the free encyclopedia

In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching. A necessary requirement for a string metric (in contrast to, e.g., string matching) is fulfillment of the triangle inequality. For example, the strings "Sam" and "Samuel" can be considered to be close. A string metric returns a number that indicates distance in an algorithm-specific way.

The most widely known string metric is a rudimentary one called the Levenshtein distance (also known as edit distance). It operates between two input strings, returning the minimum number of single-character insertions, deletions, and substitutions needed to transform one input string into the other. Simple string metrics such as Levenshtein distance have since been extended with phonetic, token-based, grammatical, and character-based methods of statistical comparison.
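As an illustration of the edit distance described above, here is a minimal Python sketch of the standard dynamic-programming formulation of Levenshtein distance, counting insertions as well as deletions and substitutions. The function name `levenshtein` is chosen for this example only; it is not the Oracle `UTL_MATCH` implementation or the API of any library linked above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the rows short

    # previous[j] holds the distance between the first i-1 characters
    # of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("Sam", "Samuel"))      # 3
    print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of the shorter string rather than with the product of both lengths.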

String metrics are used heavily in information integration and are currently used in areas including fraud detection, fingerprint analysis, plagiarism detection, ontology merging, DNA analysis, RNA analysis, image analysis, evidence-based machine learning, database data deduplication, data mining, incremental search, data integration, and semantic knowledge integration.

