Apache SOLR and Carrot2 integration strategies 1

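This article describes how to clone the Carrot2 repository with Git, build and deploy the web application, and configure Carrot2 to work with Solr. It covers the configuration files that point Carrot2 at Solr, a custom Chinese tokenizer, and the Solr configuration changes that enable clustering of search results.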


deploy carrot2-webapp

1. download source code

#git clone https://github.com/carrot2/carrot2.git

2. compile

#cd carrot2

#ant webapp

3. deploy

#cp tmp/webapp/carrot2-webapp.war /path/to/tomcat/webapps

 

4. configure carrot2 (once Tomcat has unpacked the WAR)

#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites

#mv suite-webapp.xml suite-webapp.xml.old

#cp source-solr.xml suite-webapp.xml

Then edit the new suite-webapp.xml so that it reads:

<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <label>Solr</label>
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <mnemonic>s</mnemonic>
      <description>Solr document source queries an instance of Apache Solr search engine.</description>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>

  <include suite="algorithm-lingo.xml"></include>

</component-suite>

 

Next, edit source-solr-attributes.xml (referenced by attribute-sets-resource above) to set the Solr URL and the field names:

<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
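
Before restarting Tomcat, it can help to confirm that the attribute file is well-formed and carries the values you expect. The following is a minimal sketch, not part of Carrot2: the helper name read_overrides is hypothetical, and the sample text simply repeats the file shown above.

```python
import xml.etree.ElementTree as ET

# Copy of the attribute file shown above; in practice, read it from
# WEB-INF/suites/source-solr-attributes.xml instead.
SAMPLE = """<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>"""


def read_overrides(xml_text):
    """Return {attribute key: value} for every <attribute> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    overrides = {}
    for attribute in root.iter("attribute"):
        value = attribute.find("value")
        if value is not None:
            overrides[attribute.get("key")] = value.get("value")
    return overrides


overrides = read_overrides(SAMPLE)
for key, value in sorted(overrides.items()):
    print(key, "=", value)
```

A parse error here points to a typo in the file, which is easier to diagnose than an empty result page in the web application.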

  

5. edit algorithm-lingo-attributes.xml and algorithm-lingo.xml

 

 ----------------------------------------------------

integrate with solr

1. configure solrconfig.xml

a. import related jars

  <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
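
To see which files the two regex attributes above would select, the patterns can be tried against a few file names. This is a rough sketch under two assumptions: that Solr applies each pattern as a full match on the file name (not the path), and that the jar names below, including the 4.7.0 version, are purely illustrative.

```python
import re

# The two patterns from the <lib> directives above.
LIB_REGEXES = [r".*\.jar", r"solr-clustering-\d.*\.jar"]

# Illustrative file names only; your dist/ directory will differ.
CANDIDATES = [
    "solr-clustering-4.7.0.jar",
    "solr-clustering-tests.jar",
    "solr-core-4.7.0.jar",
    "README.txt",
]


def matches(regex, file_name):
    """True if the whole file name matches the pattern (assumed Solr behaviour)."""
    return re.fullmatch(regex, file_name) is not None


for regex in LIB_REGEXES:
    selected = [name for name in CANDIDATES if matches(regex, name)]
    print(regex, "->", selected)
```

The \d after the hyphen is what restricts the second directive to versioned clustering jars rather than everything whose name starts with solr-clustering-.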

 

b. add the clustering search component and the /clustering request handler

 

  <searchComponent name="clustering"
                   enable="true"
                   class="solr.clustering.ClusteringComponent">
    <lst name="engine">
      <str name="name">lingo</str>
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
      <str name="carrot.resourcesDir">clustering/carrot2</str>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
      <str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>
    </lst>
  </searchComponent>

  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="true"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">lingo</str>
      <bool name="clustering.results">true</bool>
      <!-- Field name with the logical "title" of each document (optional) -->
      <str name="carrot.title">content</str>
      <!-- Field name with the logical "URL" of each document (optional) -->
      <str name="carrot.url">id</str>
      <!-- Field name with the logical "content" of each document (optional) -->
      <str name="carrot.snippet">content</str>
      <!-- Apply the highlighter to the title/content and use the result for clustering -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- Maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">5</int>
      <!-- Produce sub-clusters -->
      <bool name="carrot.outputSubClusters">true</bool>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>
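
Once the handler is registered, a quick way to check it is to request it directly and look at the clusters it returns. The sketch below is a hedged illustration, not a Solr client: the helper names are hypothetical, the handler URL is the serviceUrlBase used earlier, and the sample response only mimics the shape I would expect from the clustering component with wt=json (a top-level "clusters" list whose entries carry "labels", "score" and "docs"). Verify the shape against your own Solr version.

```python
from urllib.parse import urlencode


def build_clustering_url(handler_url, query, rows=10):
    """Build a request URL for the /clustering handler (hypothetical helper)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return handler_url + "?" + urlencode(params)


def summarize_clusters(response):
    """Return (label, document count) pairs from a parsed JSON response."""
    summary = []
    for cluster in response.get("clusters", []):
        label = ", ".join(cluster.get("labels", []))
        summary.append((label, len(cluster.get("docs", []))))
    return summary


# Made-up response for illustration; the labels and ids are not real output.
SAMPLE_RESPONSE = {
    "response": {"numFound": 3, "docs": [{"id": "1"}, {"id": "2"}, {"id": "3"}]},
    "clusters": [
        {"labels": ["Search Engine"], "score": 1.5, "docs": ["1", "2"]},
        {"labels": ["Other Topics"], "score": 0.0, "docs": ["3"]},
    ],
}

url = build_clustering_url("http://192.168.10.204:8983/inokarticle/clustering", "solr")
print(url)
for label, count in summarize_clusters(SAMPLE_RESPONSE):
    print(label, count)
```

Against a live server, the same summary can be produced by fetching the URL with urllib.request.urlopen and passing the result of json.load to summarize_clusters.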

 

2. custom Chinese tokenizer for clustering

a. modify the relevant Carrot2 source code and recompile

b. copy the resulting jars and the lexicon to the Solr webapp lib directory

For details, see Apache SOLR and Carrot2 integration strategies 2.
