Solr 3.5 schema.xml explained

License: CC 4.0 BY-SA

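This article introduces Solr's schema.xml configuration file. It covers field type definitions, field attribute configuration, the unique key and the default search field, and it explains how to configure a word-segmentation component for Chinese applications.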

schema.xml is well commented, so this post only gives a brief overview. For the overall structure, see example/solr/conf/schema.xml in the Solr distribution.
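For orientation, here is a stripped-down schema.xml that shows how the pieces discussed below fit together. This is a minimal sketch. It assumes the `version="1.4"` schema format used by the Solr 3.x example. The field names `id` and `content` are illustrative, and the analyzer is simplified compared with the full example schema.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.4">
  <types>
    <!-- Field types: each maps a type name to a Solr implementation class -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <!-- An analyzer without a type attribute is used at both index and query time -->
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <!-- Fields: each references a type declared in <types> -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="content" type="text_general" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>content</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
</schema>
```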

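1.

<types>: defines the field types, each mapped to a Solr internal implementation class. Pay particular attention to the index-time and query-time analysis settings of solr.TextField types. The complete structure looks like this: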

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

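Here text_general is the type name. The analysis chain is split into an index-time analyzer (type="index") and a query-time analyzer (type="query"). Chinese applications need a Chinese word-segmentation component in this chain instead. IKAnalyzer, for example, includes configuration instructions in its documentation, so follow its help documentation for the specific steps.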
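As an illustration, you can declare a Chinese text type by pointing each `<analyzer>` element directly at a Lucene Analyzer class. This is a minimal sketch under several assumptions:

- The IKAnalyzer jar is on Solr's classpath, for example in the core's lib directory.
- Its Analyzer class is `org.wltea.analyzer.lucene.IKAnalyzer`. Class names differ between IKAnalyzer versions, so check its documentation.
- The type name `text_ik` and the field name `content_cn` are hypothetical.

```xml
<!-- Goes inside <types>: a hypothetical Chinese text type that uses
     the same IKAnalyzer class at index time and at query time -->
<fieldType name="text_ik" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="query" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>

<!-- Goes inside <fields>: a hypothetical field that uses the new type -->
<field name="content_cn" type="text_ik" indexed="true" stored="true"/>
```

This analyzer-class form replaces the tokenizer/filter chain entirely. If you need extra filters such as stop words or synonyms, use the tokenizer/filter form that your IKAnalyzer version documents.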

2.

<fields> defines the attributes of every field the application uses, for example:

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

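This field takes its analysis settings from the text_general type defined in step 1. Its attributes work as follows:

- indexed="true" makes the field searchable.
- stored="false" means its original value is not returned in search results.
- multiValued="true" allows a single document to hold several values for the field.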
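A slightly fuller `<fields>` block might look like the sketch below. The field names `title`, `content` and `author` are hypothetical. The sketch assumes that the `string` (solr.StrField) and `text_general` types are declared in `<types>`, as they are in the Solr 3.5 example schema.

```xml
<fields>
  <!-- Tokenized, searchable, and returned in results -->
  <field name="title"   type="text_general" indexed="true" stored="true"/>
  <!-- Hypothetical body field, a typical choice for the default search field -->
  <field name="content" type="text_general" indexed="true" stored="true"/>
  <!-- Non-tokenized string: matches only the exact whole value -->
  <field name="author"  type="string"       indexed="true" stored="true"/>
  <!-- Search-only field: indexed but not stored, may hold several values -->
  <field name="text"    type="text_general" indexed="true" stored="false" multiValued="true"/>
</fields>
```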

3.

<uniqueKey>id</uniqueKey>

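Sets the unique key. It is usually id, but it can be any field you define. Solr does not strictly require a unique key, but declaring one is strongly recommended. When a document is added with a key value that already exists, it replaces the old document.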
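The key must name a field declared in `<fields>`. The sketch below assumes the `string` type (solr.StrField) from the Solr 3.5 example schema; a non-tokenized type like this is the usual choice for a key.

```xml
<!-- Inside <fields>: the key field, a single untokenized value required on every document -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>

<!-- After </fields>: declare that field as the unique key -->
<uniqueKey>id</uniqueKey>
```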

4.

<defaultSearchField>content</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>

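These two lines set:

- the field that is searched when a query term does not name a field explicitly;
- the default operator used to combine query terms, here OR.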
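The effect of these two settings on a query can be annotated directly in the config. The sample query strings in the comments are illustrative, and the parse results are approximate (Lucene represents OR as optional clauses).

```xml
<defaultSearchField>content</defaultSearchField>
<!-- With OR, q=solr schema is parsed roughly as  content:solr OR content:schema,
     so a document containing either term matches.
     A term that names its own field keeps it: q=title:solr schema
     becomes roughly  title:solr OR content:schema  -->
<solrQueryParser defaultOperator="OR"/>

<!-- With AND instead, q=solr schema requires both terms:
     content:solr AND content:schema -->
<!-- <solrQueryParser defaultOperator="AND"/> -->
```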

5.

<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>

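Defines a dynamic field. With the rule above, any submitted field whose name ends in _i (for example price_i) is automatically treated as type="integer". That type must itself be declared in <types>. Dynamic fields can also be defined with a prefix pattern.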
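A prefix rule works the same way as a suffix rule. The `*` wildcard may appear only at the start or the end of a dynamic field name. In this sketch, the `attr_*` rule is modeled on one in the Solr example schema. The field names in the comments are hypothetical, and the `integer` type is assumed to be declared in `<types>`.

```xml
<!-- Suffix pattern: price_i, count_i, ... are indexed with the "integer" type -->
<dynamicField name="*_i"    type="integer"      indexed="true" stored="true"/>
<!-- Prefix pattern: attr_color, attr_size, ... are indexed as text_general -->
<dynamicField name="attr_*" type="text_general" indexed="true" stored="true" multiValued="true"/>
```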

Summary:

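Solr is an enterprise-grade full-text search server, and the latest Nutch release (1.4) uses it as its only indexing backend. Solr looks well able to handle demanding workloads.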