单字符串多字段的查询(二)
多字段使用的三种场景
最佳匹配
POST blogs/_search
{
"query": {
"multi_match": {
"type": "best_fields",
"query": "Query pets",
"fields": ["title","body"],
"tie_breaker": 0.2,
"minimum_should_match": "20%"
}
}
}
多个字段结合搜索
DELETE /titles
PUT /titles
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "english",
"fields": {"std": {"type": "text","analyzer": "standard"}}
}
}
}
}
POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }
GET /titles/_search
{
"query": {
"multi_match": {
"query": "barking dogs",
"type": "most_fields",
"fields": [ "title", "title.std" ]
}
}
}
GET /titles/_search
{
"query": {
"multi_match": {
"query": "barking dogs",
"type": "most_fields",
"fields": [ "title^10", "title.std" ]
}
}
}
结合使用标准分析器,提高准确率。
most_field模式所存在的问题
https://www.elastic.co/guide/cn/elasticsearch/guide/current/field-centric.html
- 两个 字段都与
poland
匹配的文档要比一个字段同时匹配poland
与street
文档的评分高。 - 使用
and
操作符要求所有词都必须存在于 相同字段 ,这显然是不对的!可能就不存在能与这个查询匹配的文档。 - 影响算法精度,当搜索多个字段时,TF/IDF 会带来某些令人意外的结果,IDF的计算值可能会有偏颇。(当然在7.0的版本中使用的是BM25算法)
解决方式一:使用copy_to字段
https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html
使用copy_to的缺点是会占用额外的存贮空间
解决方式二:cross_field跨字段搜索
https://www.elastic.co/guide/cn/elasticsearch/guide/current/_cross_fields_queries.html
POST address/_doc/1
{
"street":"5 Poland Street",
"city":"London",
"country":"United Kingdom",
"postcode":"WIV 30G"
}
POST address/_search
{
"query": {
"multi_match": {
"type": "cross_fields",
"operator": "and",
"query": "Poland Street WIV",
"fields": ["street","city","country","postcode"]
}
}
}
如果这里是most_fields
或者best_fields
使用operator:and会搜索不到结果,因为这样就表示每个字段都必须包括所有词,而这显然是不对的