1. Configure clustering in solrconfig.xml
<searchComponent name="clustering"
                 enable="true"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
  </lst>
  <lst name="engine">
    <str name="name">stc</str>
    <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
  </lst>
  <lst name="engine">
    <str name="name">kmeans</str>
    <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
  </lst>
</searchComponent>
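An engine block can also carry Carrot2 attributes as per-engine defaults. The fragment below is a minimal sketch of the lingo engine with one extra attribute; the key LingoClusteringAlgorithm.desiredClusterCountBase and the value 20 are assumptions added for illustration and are not part of the original configuration. It would take the place of the lingo engine entry inside the searchComponent above.

```xml
<lst name="engine">
  <str name="name">lingo</str>
  <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  <str name="carrot.resourcesDir">clustering/carrot2</str>
  <!-- Assumption: illustrative tuning value, not from the original post -->
  <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
</lst>
```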
<requestHandler name="/clustering"
                startup="lazy"
                enable="true"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">lingo</str>
    <bool name="clustering.results">true</bool>
    <!-- Field name with the logical "title" of each document (optional) -->
    <str name="carrot.title">content</str>
    <!-- Field name with the logical "URL" of each document (optional) -->
    <str name="carrot.url">id</str>
    <!-- Field name with the logical "content" of each document (optional) -->
    <str name="carrot.snippet">content</str>
    <!-- Apply the highlighter to the title/content and use the result for clustering. -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- The maximum number of labels per cluster -->
    <!--<int name="carrot.numDescriptions">5</int>-->
    <!-- Produce sub-clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <!-- Configure the remaining request handler parameters. -->
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
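With this handler in place, a request to /clustering (for example with wt=xml) should return the usual search results followed by a clusters section. The fragment below is only a sketch of the expected shape of that section; the label and document IDs are placeholders and not real output, and the exact set of elements may differ between Solr versions.

```xml
<arr name="clusters">
  <lst>
    <!-- One or more labels describing the cluster (placeholder text) -->
    <arr name="labels">
      <str>PLACEHOLDER LABEL</str>
    </arr>
    <!-- Values of the unique key field for the documents in this cluster (placeholders) -->
    <arr name="docs">
      <str>PLACEHOLDER-ID-1</str>
      <str>PLACEHOLDER-ID-2</str>
    </arr>
  </lst>
</arr>
```

Because rows is set to 10 in the defaults, only the 10 returned documents are clustered; raising rows on the request gives the algorithm more input to work with.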
2. Edit clustering/carrot2/lingo-attributes.xml to set the default clustering language to Simplified Chinese
<attribute key="MultilingualClustering.defaultLanguage">
<value type="org.carrot2.core.LanguageCode" value="CHINESE_SIMPLIFIED"/>
</attribute>
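For orientation, the attribute above sits inside a Carrot2 attribute-set document. The skeleton below is a sketch of how the whole file might look; the wrapper element names follow the Carrot2 3.x attribute-set format as far as I recall it, and the id and label "lingo" are hypothetical names chosen for illustration, so compare it with the file shipped in your installation before copying.

```xml
<attribute-sets default="lingo">
  <!-- "lingo" is a hypothetical set id; keep whatever id your file already uses -->
  <attribute-set id="lingo">
    <value-set>
      <label>lingo</label>
      <attribute key="MultilingualClustering.defaultLanguage">
        <value type="org.carrot2.core.LanguageCode" value="CHINESE_SIMPLIFIED"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
```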
3. Add the Chinese tokenizer jar (lucene-analyzers-smartcn-4.7.0.jar) to the classpath in solrconfig.xml
<lib dir="../contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
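The directive above loads every jar in that directory. If only the smartcn analyzer is wanted, the regex can be narrowed, as in the sketch below. The last two lines are assumptions: they suppose the clustering contrib jars live under ../contrib/clustering/lib and ../dist relative to the same base directory, which may not match your layout, and they are only needed if the clustering component is not already on the classpath.

```xml
<!-- Load only the smartcn analyzer jar, whatever its version -->
<lib dir="../contrib/analysis-extras/lucene-libs" regex="lucene-analyzers-smartcn-.*\.jar" />

<!-- Assumption: clustering contrib jars, paths depend on your installation layout -->
<lib dir="../contrib/clustering/lib" regex=".*\.jar" />
<lib dir="../dist" regex="solr-clustering-\d.*\.jar" />
```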
References
http://wiki.apache.org/solr/ClusteringComponent
http://www.cnblogs.com/tomcattd/archive/2013/08/20/3270143.html
http://carrot2.github.io/solr-integration-strategies/carrot2-3.6.3/index.html