A simple configuration of Solr 4.5 + mmseg4j-1.9.1

License: CC 4.0 BY-SA

Tags: #Solr #mmseg4j

First, extract the mmseg4j-1.9.1 archive and copy the three jar files from its /dist directory into $tomcat_home/lib.
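As an alternative to Tomcat's shared lib directory (this is not what the setup above uses), Solr 4.x can also load jars per core through a <lib> directive in solrconfig.xml. A minimal sketch, assuming the three jars were instead copied into a hypothetical dedicated directory D:\tomcat-solr\solr\mmseg4j-lib:

```xml
<!-- Goes in solr\collection1\conf\solrconfig.xml, as a direct child of <config>.
     The directory below is hypothetical: point dir at wherever the mmseg4j jars actually live.
     A relative dir would be resolved against the core's instance directory. -->
<lib dir="D:/tomcat-solr/solr/mmseg4j-lib" regex=".*\.jar" />
```

Pick one approach or the other; loading the same jars from both Tomcat's lib and a <lib> directive can lead to class-loading conflicts.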

Next, add the following to the schema.xml file in \solr\collection1\conf.

Inside the <fields> element, add:

  <field name="simple" type="textSimple" indexed="true" stored="true" multiValued="true"/>
  <field name="complex" type="textComplex" indexed="true" stored="true" multiValued="true"/>
  <!-- text_mmseg receives copies of both simple and complex via copyField, so it must be multiValued -->
  <field name="text_mmseg" type="textMaxWord" indexed="true" stored="true" multiValued="true"/>

Then, inside the <types> element (alongside the existing <fieldType> definitions), add:

<!-- dicPath is the directory containing the mmseg4j dictionary files; adjust it to your installation -->
<fieldType name="textComplex" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- mode="complex": MMSeg complex maximum matching -->
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex"
               dicPath="D:\tomcat-solr\solr\dic"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textMaxWord" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- mode="max-word": builds on complex and splits long words into finer tokens -->
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word"
               dicPath="D:\tomcat-solr\solr\dic"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="textSimple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- mode="simple": MMSeg simple maximum matching -->
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple"
               dicPath="D:\tomcat-solr\solr\dic"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Finally, add the following next to the existing <copyField> entries:

  <copyField source="simple" dest="text_mmseg"/>
  <copyField source="complex" dest="text_mmseg"/>
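To see these copyField rules at work, one can index a test document that sets only simple and complex; Solr copies both raw values into text_mmseg, which is then analyzed with textMaxWord. A minimal sketch in Solr's XML update format, assuming the default collection1 schema in which id is the required uniqueKey; the id value and sample text are made up for illustration:

```xml
<add>
  <doc>
    <!-- hypothetical id; the default collection1 schema requires the uniqueKey field "id" -->
    <field name="id">mmseg-test-1</field>
    <!-- tokenized by textSimple (mode="simple") -->
    <field name="simple">研究生命起源</field>
    <!-- tokenized by textComplex (mode="complex") -->
    <field name="complex">研究生命起源</field>
    <!-- text_mmseg is deliberately not set: copyField fills it from simple and complex -->
  </doc>
</add>
```

This could be posted to the core's update handler, e.g. http://localhost:8080/solr/collection1/update?commit=true (assuming Tomcat's default port 8080 and a /solr context path). The Analysis page in the Solr admin UI can then show which tokens each of the three field types actually produces for the same text.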

Restart Tomcat/Solr. If it starts up without errors in the log, the configuration is working.