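Retracing the Solr Long March in 2016: Multi-Field Queries in Solr

This article looks at different ways to implement multi-field queries in Solr and how their scoring mechanisms differ, comparing direct concatenation in the query string with the DisMax query parser.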


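When building search, we often need to search across several fields. Suppose the index format is as follows:
document:{title:"Multi-field queries in Solr",content:"Multi-field queries in Solr******************************",describe:"This is a technical article about Solr"}.
Searching for an article means checking whether the keyword appears in the title (title), the body (content), and the summary (describe). The only difference between the fields is their weight: title usually has the highest weight and describe the lowest.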
### In Lucene, this can be done as follows:

    // Per-field boosts: title weighs the most, describe the least.
    Map<String, Float> boosts = new HashMap<String, Float>();
    boosts.put("title", 1.0f);
    boosts.put("content", 0.1f);
    boosts.put("describe", 0.01f);

    String[] fields = new String[]{"title", "content", "describe"};
    QueryParser parser = new MultiFieldQueryParser(fields, new WhitespaceAnalyzer(), boosts);
    // Search all three fields for the keyword "直播" (live streaming).
    Query query = parser.parse("直播");

### In Solr, this can be done as follows:
Solr offers two ways to get a similar search effect, but the resulting scores are very different.

  • 1: Concatenate the boosted fields directly in q
    q=title:直播^1+OR+content:直播^0.1+OR+describe:直播^0.01

  • 2: Pass the fields and boosts through qf
    q=直播&defType=dismax/edismax/MydefType&qf=title^1+content^0.1+describe^0.01


    MydefType stands for a custom defType.
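
The two request shapes above can be assembled from one shared map of field boosts. The following is a minimal sketch in plain Java, not Solr client code: the class and method names (`MultiFieldQueryParams`, `concatenatedQ`, `qf`) are made up for illustration, the keyword is assumed to contain no characters that need query-syntax escaping, and the strings are shown before URL encoding (in a URL the spaces become `+`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the parameter values for both approaches.
public class MultiFieldQueryParams {

    // Approach 1: one boosted clause per field, OR-ed together inside q.
    public static String concatenatedQ(String keyword, Map<String, String> boosts) {
        StringJoiner joined = new StringJoiner(" OR ");
        for (Map.Entry<String, String> e : boosts.entrySet()) {
            joined.add(e.getKey() + ":" + keyword + "^" + e.getValue());
        }
        return joined.toString();
    }

    // Approach 2: the qf value, i.e. the same fields and boosts without the keyword.
    public static String qf(Map<String, String> boosts) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, String> e : boosts.entrySet()) {
            joined.add(e.getKey() + "^" + e.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> boosts = new LinkedHashMap<>();
        boosts.put("title", "1");
        boosts.put("content", "0.1");
        boosts.put("describe", "0.01");

        System.out.println("q=" + concatenatedQ("直播", boosts));
        System.out.println("q=直播&defType=edismax&qf=" + qf(boosts));
    }
}
```

Keeping the boosts as strings avoids any float formatting surprises when they are written into the request.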

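In many cases the two methods produce the same ranking, but they compute scores by different rules, so the actual scores differ.

The first method adds up the scores of the individual fields: the final score is the sum of the query's scores in the matching fields. This is also what the Lucene implementation above does.

The second method scores each field separately and then takes the score of the highest-scoring field; the other fields contribute only through the tie breaker. The same max-based pattern is visible in the Solr (Lucene) source, for example where org.apache.lucene.search.DisjunctionMaxQuery computes its normalization value (the sum of squared weights):

```java
public float getValueForNormalization() throws IOException {
  float max = 0.0f, sum = 0.0f;
  for (Weight currentWeight : weights) {
    float sub = currentWeight.getValueForNormalization();
    sum += sub;
    max = Math.max(max, sub);
  }
  float boost = getBoost();
  return (((sum - max) * tieBreakerMultiplier * tieBreakerMultiplier) + max) * boost * boost;
}
```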

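The debugQuery output also shows the difference between the two scoring methods. The first explanation below is a "max of" (dismax) and the second is a "sum of" (concatenated query):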

"1067563474": "\n4.9853425 = max of:\n 2.2417927 = weight(yy_keyword:直播^0.2 in 3527) [ApplistSolrSimilarity], result of:\n 2.2417927 = score(doc=3527,freq=1.0), product of:\n 0.23708561 = queryWeight, product of:\n 0.2 = boost\n 9.455626 = idf(docFreq=17, maxDocs=84626)\n 0.12536749 = queryNorm\n 9.455626 = fieldWeight in 3527, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 9.455626 = idf(docFreq=17, maxDocs=84626)\n 1.0 = fieldNorm(doc=3527)\n 0.6107991 = weight(describe:直播^0.1 in 3527) [ApplistSolrSimilarity], result of:\n 0.6107991 = score(doc=3527,freq=1.0), product of:\n 0.08750677 = queryWeight, product of:\n 0.1 = boost\n 6.980021 = idf(docFreq=213, maxDocs=84626)\n 0.12536749 = queryNorm\n 6.980021 = fieldWeight in 3527, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.980021 = idf(docFreq=213, maxDocs=84626)\n 1.0 = fieldNorm(doc=3527)\n 4.9853425 = weight(title:直播 in 3527) [ApplistSolrSimilarity], result of:\n 4.9853425 = score(doc=3527,freq=1.0), product of:\n 0.99999994 = queryWeight, product of:\n 7.976549 = idf(docFreq=78, maxDocs=84626)\n 0.12536749 = queryNorm\n 4.985343 = fieldWeight in 3527, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 7.976549 = idf(docFreq=78, maxDocs=84626)\n 0.625 = fieldNorm(doc=3527)\n",


"1067563474": "\n5.574838 = sum of:\n 4.9663644 = weight(title:直播 in 3527) [ApplistSolrSimilarity], result of:\n 4.9663644 = score(doc=3527,freq=1.0), product of:\n 0.9961931 = queryWeight, product of:\n 7.976549 = idf(docFreq=78, maxDocs=84626)\n 0.12489024 = queryNorm\n 4.985343 = fieldWeight in 3527, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 7.976549 = idf(docFreq=78, maxDocs=84626)\n 0.625 = fieldNorm(doc=3527)\n 0.6084739 = weight(describe:直播^0.1 in 3527) [ApplistSolrSimilarity], result of:\n 0.6084739 = score(doc=3527,freq=1.0), product of:\n 0.08717365 = queryWeight, product of:\n 0.1 = boost\n 6.980021 = idf(docFreq=213, maxDocs=84626)\n 0.12489024 = queryNorm\n 6.980021 = fieldWeight in 3527, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.980021 = idf(docFreq=213, maxDocs=84626)\n 1.0 = fieldNorm(doc=3527)\n",

As the output shows, one takes the max (the highest value) and the other takes the sum (the values added together).
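
The max-versus-sum difference can be reproduced with a few lines of arithmetic on the per-field scores from the debug output above. This is a simplified sketch rather than Lucene's scorer: the class and method names are hypothetical, it ignores factors such as coord, and it assumes the DisMax rule of "maximum plus tie times the sum of the remaining scores", where `tie` corresponds to the dismax tie-breaker parameter.

```java
// Hypothetical illustration of how per-field scores are combined.
public class FieldScoreCombination {

    // Concatenated query: the scores of the matching field clauses are added.
    public static double sumScore(double... fieldScores) {
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
        }
        return sum;
    }

    // DisMax: the best field score, plus tie times the sum of the other scores.
    public static double disMaxScore(double tie, double... fieldScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        // Per-field scores copied from the "sum of" explanation (title, describe).
        System.out.println(sumScore(4.9663644, 0.6084739));

        // Per-field scores copied from the "max of" explanation
        // (yy_keyword, describe, title), with tie = 0 so only the best field counts.
        System.out.println(disMaxScore(0.0, 2.2417927, 0.6107991, 4.9853425));
    }
}
```

With `tie` set to 0 the DisMax result is exactly the best field score, and with `tie` set to 1 it degenerates into the plain sum, which is why the two approaches can rank documents the same way while reporting different scores.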

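Finally: here is a minimal backend framework built on netty that aims to replace tomcat+spring+mybatis and supports ioc, router, aop, ddd, and restful: https://github.com/rongjoker/quarantineJ. Stars and forks are welcome; I believe it will be a great help for Java development, concurrent programming, and network programming.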
