Similarity
For business reasons, I wanted to use a different similarity for different fields.
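Extending DefaultSimilarity
I subclassed DefaultSimilarity and overrode three of its methods: tf, idf, and toString. The resulting class can then be swapped in directly in the schema.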
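As a rough illustration of what such an override changes, the sketch below models the scoring hooks in plain Java with no Lucene dependency. The baseline formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))) are those of Lucene's DefaultSimilarity. The class names and the flattened variant are hypothetical and only show the shape of an override, not the code actually deployed.

```java
// Plain-Java model of the tf/idf hooks. These classes are illustrative
// stand-ins (assumptions), not Lucene types.
class BaselineTfIdf {
    // Same formula as DefaultSimilarity.tf: square root of the term frequency.
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Same formula as DefaultSimilarity.idf: 1 + ln(numDocs / (docFreq + 1)).
    float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    @Override
    public String toString() {
        return "BaselineTfIdf";
    }
}

// Hypothetical override: a matching term counts once no matter how often it
// repeats, and term rarity is ignored.
class FlatTfIdf extends BaselineTfIdf {
    @Override
    float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    @Override
    float idf(long docFreq, long numDocs) {
        return 1.0f;
    }

    @Override
    public String toString() {
        return "FlatTfIdf";
    }
}
```

In a real deployment the subclass would extend DefaultSimilarity itself and be referenced by its fully qualified class name in a similarity element of the schema.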
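Using a different similarity per field
The schema lets you set a similarity, but this turned out not to work per field: Solr 4.2 does not support similarity at that granularity. It does, however, support a different Similarity per fieldType, as follows: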
<types>
  <fieldType name="text_dfr" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.DFRSimilarityFactory">
      <str name="basicModel">I(F)</str>
      <str name="afterEffect">B</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  <fieldType name="text_ib" class="solr.TextField">
    <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
    <similarity class="solr.IBSimilarityFactory">
      <str name="distribution">SPL</str>
      <str name="lambda">DF</str>
      <str name="normalization">H2</str>
    </similarity>
  </fieldType>
  ...
</types>
<similarity class="solr.BM25SimilarityFactory"/>
For the remaining fields I wanted a different similarity factory, for example the stock DefaultSimilarityFactory, but after restarting Tomcat I got this error:
org.apache.solr.common.SolrException: FieldType 'content' is configured with a similarity, but the global similarity does not support it: class org.apache.solr.search.similarities.DefaultSimilarityFactory
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:466)
Looking into the cause, I found the following references:
Stack Overflow link
mail-archives.apache.org link
In short, if you want to configure a different similarity per fieldType, the global similarity must be set to:
<similarity class="solr.SchemaSimilarityFactory"/>
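If no global factory is specified, DefaultSimilarityFactory is used. With SchemaSimilarityFactory, any field type that does not declare its own similarity falls back to DefaultSimilarity. However, if per-fieldType similarities are configured and the global factory is some other SimilarityFactory that does not implement the SchemaAware interface, schema loading fails with the error above.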
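The rule behind that error can be modelled roughly as follows. This is a plain-Java sketch under stated assumptions, not Solr's code: GlobalFactory, SchemaAwareMarker, PlainFactory, PerFieldCapableFactory, and SchemaCheck are hypothetical names, and the real check runs inside IndexSchema.readSchema, as the stack trace shows.

```java
import java.util.Map;

// Hypothetical stand-ins for Solr's SimilarityFactory and SchemaAware.
interface GlobalFactory {}
interface SchemaAwareMarker {}

// A global factory that cannot delegate to per-fieldType similarities.
class PlainFactory implements GlobalFactory {}

// A global factory that can, in the way SchemaSimilarityFactory does.
class PerFieldCapableFactory implements GlobalFactory, SchemaAwareMarker {}

class SchemaCheck {
    // fieldTypeSimilarities maps a fieldType name to the name of its
    // configured similarity, or to null when the type declares none.
    static void validate(GlobalFactory global,
                         Map<String, String> fieldTypeSimilarities) {
        if (global instanceof SchemaAwareMarker) {
            return; // per-fieldType similarities will be honoured
        }
        for (Map.Entry<String, String> e : fieldTypeSimilarities.entrySet()) {
            if (e.getValue() != null) {
                throw new IllegalStateException("FieldType '" + e.getKey()
                    + "' is configured with a similarity, but the global"
                    + " similarity does not support it: " + global.getClass());
            }
        }
    }
}
```

Under this model, a schema whose field types declare no similarity passes with any global factory, while a single fieldType-level similarity is enough to require a schema-aware global factory.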
The code of SchemaSimilarityFactory is as follows:
public class SchemaSimilarityFactory extends SimilarityFactory implements SchemaAware {
  private Similarity similarity;
  // Fallback for fields whose fieldType declares no similarity of its own.
  private Similarity defaultSimilarity = new DefaultSimilarity();

  // Called once the schema is loaded; builds the per-field wrapper.
  @Override
  public void inform(final IndexSchema schema) {
    similarity = new PerFieldSimilarityWrapper() {
      @Override
      public Similarity get(String name) {
        // Resolve the field name to its fieldType, without throwing if unknown.
        FieldType fieldType = schema.getFieldTypeNoEx(name);
        if (fieldType == null) {
          return defaultSimilarity;
        } else {
          Similarity similarity = fieldType.getSimilarity();
          return similarity == null ? defaultSimilarity : similarity;
        }
      }
    };
  }

  @Override
  public Similarity getSimilarity() {
    assert similarity != null : "inform must be called first";
    return similarity;
  }
}
So to use a different Similarity as the fallback, define a new factory that extends SimilarityFactory and implements the SchemaAware interface, and set your own Similarity as its defaultSimilarity.
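A plain-Java model of that idea, assuming nothing beyond the fallback logic described for SchemaSimilarityFactory: the per-field lookup is kept and only the default is swapped. Scorer and ConfigurableDefaultResolver are hypothetical stand-ins for Similarity and the custom factory. A real implementation would extend SimilarityFactory, implement SchemaAware, and build a PerFieldSimilarityWrapper in inform().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Similarity: only a name, which is enough to
// show which one gets picked.
final class Scorer {
    final String name;

    Scorer(String name) {
        this.name = name;
    }
}

// Models the per-field lookup with a caller-chosen default instead of a
// hard-coded one. For brevity the map is keyed by field name directly,
// whereas Solr resolves field name -> fieldType -> similarity.
final class ConfigurableDefaultResolver {
    private final Map<String, Scorer> perField = new HashMap<>();
    private final Scorer defaultScorer;

    ConfigurableDefaultResolver(Scorer defaultScorer) {
        this.defaultScorer = defaultScorer;
    }

    void register(String field, Scorer scorer) {
        perField.put(field, scorer);
    }

    // Returns the field's own scorer, or the configured default if none.
    Scorer get(String field) {
        Scorer s = perField.get(field);
        return s == null ? defaultScorer : s;
    }
}
```

Constructing the resolver with a custom scorer and registering only some fields means every unregistered field falls through to that custom default, which is the behaviour the custom factory is meant to provide.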