Solr synonym search (synonyms)

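Solr's synonym search is a useful feature that solves a common product requirement. For example, when a user searches for "刮胡刀" (razor), the better result shows items matching both "刮胡刀" and "剃须刀" (shaver). The implementation, based on solr.SynonymFilterFactory, is described below.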

Creates SynonymFilter

Matches strings of tokens and replaces them with other strings of tokens.

  1. The synonyms parameter names an external file defining the synonyms.
  2. If ignoreCase is true, matching will lowercase before checking equality.
  3. If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
  4. The optional tokenizerFactory parameter names a tokenizer factory class to analyze synonyms (see https://issues.apache.org/jira/browse/SOLR-319 ), which can help with the synonym+stemming problem described in http://search-lucene.com/m/hg9ri2mDvGk1 .
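The effect of the expand and ignoreCase options on a single equivalence group can be modelled in a few lines of Python. This is only a simplified sketch, not Solr's implementation: the function name build_synonym_map is hypothetical, it handles single-token synonyms only, and it ignores token positions.

```python
def build_synonym_map(group, expand=True, ignore_case=True):
    """Map each term of one equivalence group to its replacement terms.

    Hypothetical helper for illustration; it only models the documented
    behaviour of the expand and ignoreCase parameters.
    """
    terms = [t.lower() if ignore_case else t for t in group]
    if expand:
        # expand=true: every term is replaced by all equivalent terms
        return {t: list(terms) for t in terms}
    # expand=false: every term is reduced to the first term in the list
    return {t: [terms[0]] for t in terms}


group = ["刮胡刀", "剃须刀"]
expanded = build_synonym_map(group, expand=True)
reduced = build_synonym_map(group, expand=False)

print(expanded["剃须刀"])  # ['刮胡刀', '剃须刀']
print(reduced["剃须刀"])   # ['刮胡刀']
```

With expand=true a search for either word matches documents containing either one, which is the behaviour the example in this article wants.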

schema.xml configuration

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ChineseTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.ChineseTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

synonyms.txt configuration

# blank lines and lines starting with pound are comments.

#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS.  These types of mappings
#ignore the expand parameter in the schema.
#Examples:
#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterFilter coming
#after us won't split it into two words.

# Chinese synonym group for the example above (entries are separated by ASCII commas)
飞利浦刮胡刀,飞利浦剃须刀

# Synonym mappings can be used for spelling correction too
pixima => pixma

a\,a => b\,b
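
To make the file format concrete, the following Python sketch reads a synonyms.txt-style text into a lookup table. It is a rough model under stated assumptions rather than Solr's parser: the names split_unescaped and parse_synonyms are hypothetical, each comma-separated entry is kept as one string (no tokenization), and ignoreCase is not modelled.

```python
import re


def split_unescaped(text, sep=","):
    """Split on sep unless it is escaped with a backslash, then unescape."""
    parts = re.split(r"(?<!\\)" + re.escape(sep), text)
    return [p.strip().replace("\\" + sep, sep) for p in parts if p.strip()]


def parse_synonyms(text, expand=True):
    """Build {source entry: [replacement entries]} from synonyms.txt content."""
    rules = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        if "=>" in line:
            # explicit mapping: the expand setting is ignored
            lhs, rhs = line.split("=>", 1)
            targets = split_unescaped(rhs)
            for source in split_unescaped(lhs):
                rules[source] = targets
        else:
            # equivalence group: behaviour depends on expand
            terms = split_unescaped(line)
            for term in terms:
                rules[term] = terms if expand else terms[:1]
    return rules


SAMPLE = """\
# comment line
aaafoo => aaabar
GB,gib,gigabyte,gigabytes
飞利浦刮胡刀,飞利浦剃须刀
pixima => pixma
a\\,a => b\\,b
"""

rules = parse_synonyms(SAMPLE)
print(rules["飞利浦刮胡刀"])  # ['飞利浦刮胡刀', '飞利浦剃须刀']
print(rules["a,a"])           # ['b,b']
```

The last rule shows why the backslash matters: without it, "a,a" would be read as two separate entries.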