Fuzzy matching in R (1)

weixin_49320263

Published 2025-03-25 20:05:32

This post uses the textreuse package to measure how similar two strings are.

(1) Comparing two strings

# Compute similarity with textreuse
library(textreuse)
help(package = "textreuse")  # open the package documentation

# Jaccard similarity between the word sets of two strings
txtc <- function(text1, text2) {
  tokens1 <- tokenize_words(text1)  # split each string into word tokens
  tokens2 <- tokenize_words(text2)
  jaccard_similarity(tokens1, tokens2)
}
txtc("Pairwise comparisons among documents in a corpus", "Candidate pairs from pairwise comparisons")
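
To make the score easier to interpret, here is a minimal base-R sketch of the same idea: Jaccard similarity is the number of shared words divided by the number of distinct words across both strings. The names `simple_tokens` and `jaccard_base` are hypothetical, and the tokenizer is only a rough stand-in for `tokenize_words()`, not its actual implementation, so results may differ on text with punctuation or non-ASCII characters.

```r
# Hypothetical helper: lowercase the text and split on non-alphanumeric runs.
# A rough stand-in for tokenize_words(), not the textreuse implementation.
simple_tokens <- function(text) {
  tokens <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  tokens[nzchar(tokens)]  # drop empty strings left by leading delimiters
}

# Jaccard similarity: size of the intersection over size of the union,
# computed on the sets of unique tokens.
jaccard_base <- function(text1, text2) {
  a <- unique(simple_tokens(text1))
  b <- unique(simple_tokens(text2))
  length(intersect(a, b)) / length(union(a, b))
}

# The two example sentences share 2 words ("pairwise", "comparisons")
# out of 10 distinct words in total, so the score is 2 / 10 = 0.2.
jaccard_base("Pairwise comparisons among documents in a corpus",
             "Candidate pairs from pairwise comparisons")
```

A score of 1 means the two word sets are identical and 0 means they share no words. If both strings are empty the ratio is 0/0, which R reports as NaN.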

(2) Comparing all strings in a vector

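# Return every pair of strings in textv whose similarity reaches the threshold.
# Relies on txtc() and library(textreuse) from section (1).
xltxtc <- function(textv, threshold = 0.8) {
  # All unordered pairs of elements; repeated values produce repeated pairs
  unordered_pairs <- combn(textv, 2, simplify = FALSE)
  # Similarity score for each pair
  scores <- vapply(unordered_pairs, function(x) txtc(x[1], x[2]), numeric(1))
  # which() skips NaN scores, which arise when both strings have no tokens
  unordered_pairs[which(scores >= threshold)]
}
title <- c("a", "a", "a", "b")
xltxtc(title)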
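
Because the result above lists only the matched strings, duplicate values such as the three "a" entries cannot be told apart. One possible variation, sketched below, pairs up positions instead of values and returns a data frame with the score. `pair_scores` and `toy_sim` are hypothetical names; `toy_sim` is a base-R stand-in used only so the sketch runs without textreuse, and `txtc` could be passed as `sim_fun` instead.

```r
# Hypothetical helper: score every pair by position and return a data frame.
# sim_fun is any function(text1, text2) that returns a single number.
pair_scores <- function(textv, sim_fun, threshold = 0.8) {
  idx <- combn(seq_along(textv), 2)  # 2 x n matrix of index pairs
  score <- mapply(function(i, j) sim_fun(textv[i], textv[j]),
                  idx[1, ], idx[2, ])
  out <- data.frame(i = idx[1, ], j = idx[2, ],
                    text_i = textv[idx[1, ]], text_j = textv[idx[2, ]],
                    score = score, stringsAsFactors = FALSE)
  # which() skips NaN scores
  out[which(out$score >= threshold), ]
}

# Stand-in similarity (an assumption, not textreuse): Jaccard similarity
# over lowercased, whitespace-separated words.
toy_sim <- function(text1, text2) {
  a <- unique(strsplit(tolower(text1), "\\s+")[[1]])
  b <- unique(strsplit(tolower(text2), "\\s+")[[1]])
  length(intersect(a, b)) / length(union(a, b))
}

title <- c("a", "a", "a", "b")
pair_scores(title, toy_sim)
```

With this input the three pairs among positions 1, 2 and 3 are kept, and every pair involving "b" is dropped. Note that comparing all pairs grows quadratically with the length of the vector, so this approach suits short vectors.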