Text Similarity: Shingling and MinHash Algorithms

Contents:
1. Test case:
2. Program flow:
3. Source code example:
4. Results:

1. Test case:
Use shingling and MinHash to analyze the Jaccard similarity of the following two texts:
(1)IELTS (International English Language Testing System) conducted by the British Council, University of Cambridge Local Examinations Syndicate and International Development Program of Australian Universities and College: providing grade 6.5 or higher (i.e. 7, 8, 9) overall has been obtained with a breakdown of 6.0 in reading and writing and 5.5 in listening and speaking.
(2)IELTS / UKVI –IELTS 6.5 overall with 6.0 in reading and writing, 5.5 in listening and speaking for Law, Psychology, Architecture, English, Accounting and Finance
(3)Hash functions used:
h1(r)=(3r +1) mod 7
h2(r)=(5r +1) mod 7
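
The test case above can be sketched in Python as follows. This is a minimal illustration, not the original listing: the 2-word shingle size, the lowercase word tokenization, the sorted-union row numbering, and the helper names `shingles`, `jaccard`, `minhash_signature`, and `compare` are all assumptions made for the example. Only the two hash functions come from the text.

```python
import re

# The two hash functions given in the test case, applied to a row index r.
HASH_FUNCS = [
    lambda r: (3 * r + 1) % 7,
    lambda r: (5 * r + 1) % 7,
]

def shingles(text, k=2):
    """Return the set of k-word shingles (k=2 is an assumption, not from the source)."""
    words = re.findall(r"\w+", text.lower())
    # max(..., 1) keeps one (short) shingle when the text has fewer than k words.
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    """Exact Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)

def minhash_signature(shingle_set, row_of):
    """For each hash function, keep the minimum hash over the set's row indices."""
    return [min(h(row_of[s]) for s in shingle_set) for h in HASH_FUNCS]

def compare(text_a, text_b, k=2):
    """Return (exact Jaccard, MinHash estimate) for two texts."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    # Assumption: rows of the characteristic matrix are the sorted union of shingles.
    row_of = {s: r for r, s in enumerate(sorted(a | b))}
    sig_a = minhash_signature(a, row_of)
    sig_b = minhash_signature(b, row_of)
    # The estimate is the fraction of signature positions on which the texts agree.
    estimate = sum(x == y for x, y in zip(sig_a, sig_b)) / len(HASH_FUNCS)
    return jaccard(a, b), estimate

if __name__ == "__main__":
    print(compare("a b c d", "b c d e"))  # (0.5, 0.5)
```

Passing texts (1) and (2) to `compare` gives the exact value and the estimate side by side. With only two hash functions the estimate can only be 0, 0.5, or 1, and because both functions reduce modulo 7, any shingle universe with more than seven rows produces many collisions, so the estimate is coarse.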

2. Program flow:

Figure 2.1 Program flowchart

3. Source code example: