Apache SOLR and Carrot2 integration strategies 1

License: CC 4.0 BY-SA


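This article describes how to clone the Carrot2 repository with Git, build and deploy the web application, and configure Carrot2 to work with Solr by editing its configuration files. It also outlines customizing a Chinese tokenizer and adjusting the Solr configuration to improve the search experience.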

deploy carrot2-webapp

1. download source code

#git clone https://github.com/carrot2/carrot2.git

2. compile

#cd carrot2

#ant webapp

3. deploy

#cp tmp/webapp/carrot2-webapp.war /path/to/tomcat/webapps
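
Steps 1 to 3 can be wrapped in a small script. This is only a minimal sketch: the function names build_webapp and deploy_war are made up for illustration, and it assumes git and ant are on the PATH and that the webapp target writes the WAR to tmp/webapp, as in step 3.

```shell
# Hypothetical helpers wrapping steps 1-3; names and layout are assumptions.

# Clone the Carrot2 sources into $1 (unless already present) and build the WAR.
build_webapp() {
  src="$1"
  [ -d "$src" ] || git clone https://github.com/carrot2/carrot2.git "$src" || return 1
  (cd "$src" && ant webapp)
}

# Copy the built WAR ($1) into Tomcat's webapps directory ($2),
# failing early with a message if either path is missing.
deploy_war() {
  war="$1"; webapps="$2"
  [ -f "$war" ] || { echo "WAR not found: $war" >&2; return 1; }
  [ -d "$webapps" ] || { echo "webapps dir not found: $webapps" >&2; return 1; }
  cp "$war" "$webapps/"
}

# Usage:
#   build_webapp carrot2 &&
#     deploy_war carrot2/tmp/webapp/carrot2-webapp.war /path/to/tomcat/webapps
```

Checking the paths first makes a failed build show up as a clear message instead of a bare cp error.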

4. configure carrot2 (the suites directory exists once Tomcat has unpacked the WAR)

#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites

#mv suite-webapp.xml suite-webapp.xml.old

#cp source-solr.xml suite-webapp.xml

then edit suite-webapp.xml so that it looks like this:

<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <label>Solr</label>
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <mnemonic>s</mnemonic>
      <description>Solr document source queries an instance of Apache Solr search engine.</description>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>

  <include suite="algorithm-lingo.xml"></include>

</component-suite>
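
The backup-and-replace commands from step 4 can be made safe to re-run. The sketch below is an illustration, not part of Carrot2: the function name use_solr_suite is hypothetical, and it assumes the suites directory contains suite-webapp.xml and source-solr.xml as described above.

```shell
# Hypothetical helper for step 4; the function name is an assumption.
# Back up suite-webapp.xml once, then replace it with a copy of source-solr.xml.
use_solr_suite() {
  suites="$1"
  [ -f "$suites/source-solr.xml" ] || { echo "source-solr.xml not found in $suites" >&2; return 1; }
  # If a backup already exists, assume the switch was done before and keep
  # the current suite-webapp.xml so manual edits are not overwritten.
  if [ -e "$suites/suite-webapp.xml.old" ]; then
    echo "backup already exists, leaving suite-webapp.xml untouched" >&2
    return 0
  fi
  mv "$suites/suite-webapp.xml" "$suites/suite-webapp.xml.old" &&
    cp "$suites/source-solr.xml" "$suites/suite-webapp.xml"
}

# Usage:
#   use_solr_suite /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites
```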

also edit source-solr-attributes.xml so that serviceUrlBase points at your Solr clustering handler:

<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
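
Before restarting Tomcat it can help to confirm which URL Carrot2 will call. The sketch below reads serviceUrlBase back out of the attributes file. It is a line-oriented text match, not an XML parser, so it assumes the value element sits on its own line as in the listing above; the function name service_url_base is hypothetical.

```shell
# Hypothetical helper; the function name is an assumption.
# Print the value configured for SolrDocumentSource.serviceUrlBase in file $1.
service_url_base() {
  # Within the serviceUrlBase <attribute> element, print the value="..." content.
  sed -n '/SolrDocumentSource\.serviceUrlBase/,/<\/attribute>/s/.*value="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Usage (the curl call needs network access to Solr):
#   url=$(service_url_base source-solr-attributes.xml)
#   curl -sS -o /dev/null -w '%{http_code}\n' -G "$url" --data-urlencode 'q=test'
```

The curl line prints only the HTTP status code, which is enough to tell whether the handler answers at that address.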

5. edit algorithm-lingo-attributes.xml and algorithm-lingo.xml

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

integrate with solr

1. configure solrconfig.xml

a. import related jars

  <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
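
Solr resolves relative dir values against the core's instance directory, so it is worth checking that jars really exist at those locations. The sketch below counts matching file names. The function name count_matching_jars is hypothetical, and because grep -E has no \d, the pattern uses [0-9] instead.

```shell
# Hypothetical helper; the function name is an assumption.
# Print how many file names in directory $1 match the extended regex $2.
count_matching_jars() {
  dir="$1"; pattern="$2"
  [ -d "$dir" ] || { echo 0; return 1; }
  ls "$dir" | grep -E -c "$pattern"
}

# Usage, run from the core's instance directory:
#   count_matching_jars ../contrib/clustering/lib '\.jar$'
#   count_matching_jars ../dist '^solr-clustering-[0-9].*\.jar$'
```

A count of 0 for either directory suggests the relative path or the regex needs adjusting.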

b. add the clustering search component and the /clustering request handler

<searchComponent name="clustering"
                 enable="true"
                 class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
    <str name="carrot.resourcesDir">clustering/carrot2</str>
    <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
    <str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>
  </lst>
</searchComponent>
<requestHandler name="/clustering"
                startup="lazy"
                enable="true"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">lingo</str>
    <bool name="clustering.results">true</bool>
    <!-- Field name with the logical "title" of each document (optional) -->
    <str name="carrot.title">content</str>
    <!-- Field name with the logical "URL" of each document (optional) -->
    <str name="carrot.url">id</str>
    <!-- Field name with the logical "content" of each document (optional) -->
    <str name="carrot.snippet">content</str>
    <!-- Apply the highlighter to the title/content and use the result for clustering -->
    <bool name="carrot.produceSummary">true</bool>
    <!-- The maximum number of labels per cluster -->
    <int name="carrot.numDescriptions">5</int>
    <!-- Produce sub-clusters -->
    <bool name="carrot.outputSubClusters">true</bool>
    <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>

    <!-- Configure the remaining request handler parameters. -->
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
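
Once Solr has been restarted, the handler can be queried directly. The sketch below assumes the endpoint from the serviceUrlBase setting earlier in this article (override it with the SOLR_CLUSTERING_URL variable) and that curl is installed. The function names are hypothetical.

```shell
# Assumed endpoint: the serviceUrlBase value used earlier in this article.
SOLR_CLUSTERING_URL="${SOLR_CLUSTERING_URL:-http://192.168.10.204:8983/inokarticle/clustering}"

# Hypothetical helper: print the request URL for a URL-safe query term ($1)
# and an optional row count ($2, default 10). No escaping is done here.
clustering_url() {
  printf '%s?q=%s&rows=%s&wt=json\n' "$SOLR_CLUSTERING_URL" "$1" "${2:-10}"
}

# Hypothetical helper: send the query; curl URL-encodes each parameter.
query_clusters() {
  curl -sS -G "$SOLR_CLUSTERING_URL" \
    --data-urlencode "q=$1" \
    --data-urlencode "rows=${2:-10}" \
    --data-urlencode "wt=json"
}

# Usage (needs a running Solr):
#   query_clusters solr 20
```

If the component is wired up as above, the response is expected to carry a clusters section next to the regular results. Its absence usually points at the last-components entry or the engine name.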

2. customize the Chinese tokenizer for clustering

a. modify the relevant Carrot2 source code and recompile

b. copy the resulting jars and the lexicon to the Solr webapp lib directory

For details, see Apache SOLR and Carrot2 integration strategies 2
