Deploy carrot2-webapp
1. Download the source code
#git clone https://github.com/carrot2/carrot2.git
2. Compile
#cd carrot2
#ant webapp
3. Deploy
#cp tmp/webapp/carrot2-webapp.war /path/to/tomcat/webapps
4. Configure carrot2
#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites
#mv suite-webapp.xml suite-webapp.xml.old
#cp source-solr.xml suite-webapp.xml
Then edit suite-webapp.xml so that it looks like this:
<component-suite>
<sources>
<source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
attribute-sets-resource="source-solr-attributes.xml">
<label>Solr</label>
<title>Solr Search Engine</title>
<icon-path>icons/solr.png</icon-path>
<mnemonic>s</mnemonic>
<description>Solr document source queries an instance of Apache Solr search engine.</description>
<example-queries>
<example-query>test</example-query>
<example-query>solr</example-query>
</example-queries>
</source>
</sources>
<include suite="algorithm-lingo.xml"></include>
</component-suite>
5. Edit source-solr-attributes.xml
<attribute-sets default="overridden-attributes">
<attribute-set id="overridden-attributes">
<value-set>
<label>overridden-attributes</label>
<attribute key="SolrDocumentSource.serviceUrlBase">
<value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
</attribute>
<attribute key="SolrDocumentSource.solrSummaryFieldName">
<value type="java.lang.String" value="content"/>
</attribute>
<attribute key="SolrDocumentSource.solrTitleFieldName">
<value type="java.lang.String" value="content"/>
</attribute>
</value-set>
</attribute-set>
</attribute-sets>
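A quick way to confirm that the attribute file is well-formed and overrides what you expect is to parse it and list the keys. The following is a minimal sketch, not part of Carrot2: the helper name read_overrides is hypothetical, and the sample text simply mirrors the file above. In practice you would read the real source-solr-attributes.xml from the suites directory instead.

```python
import xml.etree.ElementTree as ET

# Sample text mirroring the attribute file shown above.
SAMPLE = """<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>"""


def read_overrides(xml_text):
    """Return {attribute key: value} for the default attribute set (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    default_id = root.get("default")
    overrides = {}
    for attr_set in root.findall("attribute-set"):
        if attr_set.get("id") != default_id:
            continue
        for attr in attr_set.iter("attribute"):
            value = attr.find("value")
            if value is not None:
                overrides[attr.get("key")] = value.get("value")
    return overrides


if __name__ == "__main__":
    for key, value in read_overrides(SAMPLE).items():
        print(key, "=", value)
```

If the file has an unclosed tag, ET.fromstring raises a ParseError with the line and column, which is usually faster to read than the webapp startup log.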
6. Edit algorithm-lingo-attributes.xml and algorithm-lingo.xml as needed
----------------------------------------------------
Integrate with Solr
1. Configure solrconfig.xml
a. Import the related jars
<lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
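To see which files the second regex would pick up, the pattern can be tried against a directory listing. This is only a sketch: the file names and version numbers below are made up, the helper matching_jars is hypothetical, and it assumes Solr applies the regex to the whole file name (a full match). For a pattern this simple, Python and Java regex syntax behave the same.

```python
import re


def matching_jars(file_names, pattern):
    """Return the names that the pattern matches in full (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [name for name in file_names if compiled.fullmatch(name)]


# Hypothetical listing of the dist directory; versions are illustrative only.
dist_files = [
    "solr-clustering-4.7.0.jar",
    "solr-core-4.7.0.jar",
    "solr-clustering-4.7.0.jar.sha1",
    "solr-clustering-README.txt",
]

if __name__ == "__main__":
    print(matching_jars(dist_files, r"solr-clustering-\d.*\.jar"))
```

Under that assumption only the clustering jar itself is selected. The \d after the hyphen is what keeps files such as a README from matching.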
b. Add the clustering search component and the clustering request handler
<searchComponent name="clustering"
enable="true"
class="solr.clustering.ClusteringComponent" >
<lst name="engine">
<str name="name">lingo</str>
<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
<str name="carrot.resourcesDir">clustering/carrot2</str>
<str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
<str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>
</lst>
</searchComponent>
<requestHandler name="/clustering"
startup="lazy"
enable="true"
class="solr.SearchHandler">
<lst name="defaults">
<bool name="clustering">true</bool>
<str name="clustering.engine">lingo</str>
<bool name="clustering.results">true</bool>
<!-- Field name with the logical "title" of each document (optional) -->
<str name="carrot.title">content</str>
<!-- Field name with the logical "URL" of each document (optional) -->
<str name="carrot.url">id</str>
<!-- Field name with the logical "content" of each document (optional) -->
<str name="carrot.snippet">content</str>
<!-- Apply highlighter to the title/ content and use this for clustering. -->
<bool name="carrot.produceSummary">true</bool>
<!-- the maximum number of labels per cluster -->
<int name="carrot.numDescriptions">5</int>
<!-- produce sub clusters -->
<bool name="carrot.outputSubClusters">true</bool>
<str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
<!-- Configure the remaining request handler parameters. -->
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
</lst>
<arr name="last-components">
<str>clustering</str>
</arr>
</requestHandler>
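Once Solr has been restarted, the handler can be checked by requesting /clustering and reading the clusters section of the response. The sketch below only builds the request URL and summarizes a response. The base URL is a placeholder, both helper names are hypothetical, and the sample response is hand-written to resemble the JSON shape of a clustering response (a "clusters" list whose entries carry "labels" and "docs"); it is not real output.

```python
import json
from urllib.parse import urlencode


def clustering_url(base_url, query="*:*", rows=100):
    """Build a request URL for the /clustering handler (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/clustering?{params}"


def cluster_summary(response_text):
    """Return (label text, document count) pairs from a JSON response."""
    data = json.loads(response_text)
    return [
        (", ".join(cluster.get("labels", [])), len(cluster.get("docs", [])))
        for cluster in data.get("clusters", [])
    ]


# Hand-written sample with the assumed response shape; not real Solr output.
SAMPLE_RESPONSE = json.dumps({
    "response": {"numFound": 3, "docs": []},
    "clusters": [
        {"labels": ["Search Engine"], "score": 1.0, "docs": ["1", "2"]},
        {"labels": ["Other Topics"], "score": 0.0, "docs": ["3"]},
    ],
})

if __name__ == "__main__":
    # Placeholder base URL; replace it with your own Solr core address.
    print(clustering_url("http://localhost:8983/solr/collection1"))
    print(cluster_summary(SAMPLE_RESPONSE))
```

Requesting more rows than the handler default of 10 usually gives the algorithm more text to work with, which is why the sketch defaults to 100.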
2. Customize the Chinese tokenizer for clustering
a. Modify the related Carrot2 source code and recompile
b. Copy the related jars and the lexicon to the Solr webapp lib directory
For details, see Apache SOLR and Carrot2 integration strategies 2
References
http://wiki.apache.org/solr/ClusteringComponent
http://www.cnblogs.com/cy163/archive/2010/05/07/1730172.html
http://carrot2.github.io/solr-integration-strategies/carrot2-3.8.0/index.html
http://download.carrot2.org/head/manual/index.html#section.advanced-topics.building-from-source-code
http://www.cnblogs.com/shm10/p/3700604.html

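This article explains how to clone the Carrot2 repository with Git, build and deploy the web application, and integrate Carrot2 with Solr, including editing the configuration files to connect to Solr, customizing a Chinese tokenizer, and adjusting the Solr configuration to improve the search experience.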