Elasticsearch (4): An Introduction to the query_string Query

Hi everyone, I'm Ouyang Fangchao. You can follow my WeChat official account "欧阳方超", where new posts are published first.

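1 Overview

The query_string query in Elasticsearch is a powerful tool that lets users search documents with a rich query syntax. It supports multiple fields, boolean logic, wildcards, and more, which makes it a good fit for scenarios that need flexible search. This article walks through its usage with examples.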

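2 Basic Concepts

The query_string query parses the user's query string with a strict syntax, so a compact string can express complex query logic. It splits the query string on operators (such as AND, OR, and NOT), analyzes each part independently, and returns the matching documents. Because the syntax is strict, a malformed query string produces an error instead of being silently ignored.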

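3 Data Preparation

Create an index that stores blog posts (named blog_index, as the responses later in this article show), then insert some data for the queries that follow.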

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text"
      },
      "tags": {
        "type": "keyword"
      },
      "author": {
        "type": "keyword"
      },
      "publish_date": {
        "type": "date",
        "format": "yyyy-MM-dd"
      },
      "views": {
        "type": "long"
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}

Insert the sample data (bulk request body):

{"index":{"_id":"1"}}
{"title":"Getting Started with Elasticsearch","content":"Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.","tags":["elasticsearch","guide","search"],"author":"John Doe","publish_date":"2023-01-15","views":1000,"status":"published"}
{"index":{"_id":"2"}}
{"title":"Advanced Elasticsearch Query Guide","content":"Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations.","tags":["elasticsearch","advanced","query"],"author":"Jane Smith","publish_date":"2023-02-20","views":800,"status":"published"}
{"index":{"_id":"3"}}
{"title":"Elasticsearch vs Solr Comparison","content":"A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.","tags":["elasticsearch","solr","comparison"],"author":"John Doe","publish_date":"2023-03-10","views":1200,"status":"published"}
{"index":{"_id":"4"}}
{"title":"Mastering Kibana Dashboards","content":"Create powerful visualizations and dashboards using Kibana with Elasticsearch data.","tags":["kibana","elasticsearch","visualization"],"author":"Alice Johnson","publish_date":"2023-04-05","views":600,"status":"draft"}
{"index":{"_id":"5"}}
{"title":"Elasticsearch Security Best Practices","content":"Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.","tags":["elasticsearch","security","best practices"],"author":"Bob Wilson","publish_date":"2023-05-01","views":1500,"status":"published"}
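
The request body above uses the newline-delimited layout of the bulk API: one action line followed by one source line per document. As a rough sketch, such a body could be assembled in Python as shown below. The helper name build_bulk_body is hypothetical, the two documents are abbreviated for brevity, and actually sending the body to a cluster (for example to POST /blog_index/_bulk) is left out.

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON bulk body: an action line, then a source line, per document.

    `docs` maps a document id to its source dict. Hypothetical helper for illustration.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Two abbreviated documents from the sample data above.
docs = {
    "1": {"title": "Getting Started with Elasticsearch", "views": 1000, "status": "published"},
    "4": {"title": "Mastering Kibana Dashboards", "views": 600, "status": "draft"},
}

body = build_bulk_body(docs)
print(body, end="")
```

Each document contributes exactly two lines, so a body for n documents has 2n lines plus the trailing newline.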

4 query_string Query Examples

4.1 Basic Queries

Simple query
The following query returns the documents whose content field contains the term powerful.

{
  "query": {
    "query_string": {
      "default_field":"content",
      "query":"powerful"
    }
  }
}

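Multi-field query
The multi-field query below works as follows:

  • It searches the title and content fields for documents that contain both elasticsearch and security. Each term only has to match in at least one of the two fields; it is not required that every field contain both terms.
  • The AND operator requires both conditions to be satisfied.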
{
  "query": {
    "query_string": {
      "fields":["title","content"],
      "query":"elasticsearch AND security"
    }
  }
}

Only the document with id=5 is returned: it is the only one whose title contains security, and elasticsearch appears in both its title and its content.
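
To make the "each term may match in any of the listed fields" rule concrete, here is a small in-memory sketch over the sample data. It is only an approximation under stated assumptions: the tokens function is a crude stand-in for the analyzer (lowercasing and splitting on non-word characters), matches_all_terms is a hypothetical helper, and no scoring is modeled.

```python
import re

# Title and content of the five sample documents, keyed by _id.
DOCS = {
    "1": {"title": "Getting Started with Elasticsearch",
          "content": "Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine."},
    "2": {"title": "Advanced Elasticsearch Query Guide",
          "content": "Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations."},
    "3": {"title": "Elasticsearch vs Solr Comparison",
          "content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene."},
    "4": {"title": "Mastering Kibana Dashboards",
          "content": "Create powerful visualizations and dashboards using Kibana with Elasticsearch data."},
    "5": {"title": "Elasticsearch Security Best Practices",
          "content": "Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption."},
}

def tokens(text):
    """Lowercase and split on non-word characters (a crude stand-in for the analyzer)."""
    return set(re.findall(r"\w+", text.lower()))

def matches_all_terms(doc, terms, fields):
    """True if every term occurs in at least one of the given fields."""
    return all(any(term in tokens(doc[field]) for field in fields) for term in terms)

hits = [doc_id for doc_id, doc in DOCS.items()
        if matches_all_terms(doc, ["elasticsearch", "security"], ["title", "content"])]
print(hits)  # ['5']
```

Note that the content of document 5 contains "securing", not "security", so the security term is satisfied by the title alone.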

4.2 Complex Queries

Combined-condition query

{
  "query": {
    "query_string": {
      "fields":["title","content"],
      "query":"(elasticsearch OR solr) AND (guide OR comparison)"
    }
  }
}

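The DSL above works as follows:

  • It searches the title and content fields.
  • A document must:
    contain at least one of "elasticsearch" or "solr", AND
    contain at least one of "guide" or "comparison"

Two documents are returned:

  • the document with id=2 (contains elasticsearch and guide)
  • the document with id=3 (contains elasticsearch, solr, and comparison)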
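
The grouping with parentheses can be read as an "all of (any of ...)" composition. The sketch below checks that reading against the sample data. It is an in-memory approximation only: title and content are concatenated per document, tokenization is a simple lowercase word split, and has_any is a hypothetical helper.

```python
import re

# Title and content of each sample document, concatenated, keyed by _id.
TEXT = {
    "1": "Getting Started with Elasticsearch. Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.",
    "2": "Advanced Elasticsearch Query Guide. Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations.",
    "3": "Elasticsearch vs Solr Comparison. A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
    "4": "Mastering Kibana Dashboards. Create powerful visualizations and dashboards using Kibana with Elasticsearch data.",
    "5": "Elasticsearch Security Best Practices. Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.",
}

def has_any(words, candidates):
    """True if at least one candidate term is among the document's words."""
    return any(c in words for c in candidates)

hits = []
for doc_id, text in TEXT.items():
    words = set(re.findall(r"\w+", text.lower()))
    # (elasticsearch OR solr) AND (guide OR comparison)
    if has_any(words, ["elasticsearch", "solr"]) and has_any(words, ["guide", "comparison"]):
        hits.append(doc_id)

print(hits)  # ['2', '3']
```

Document 1 carries the tag "guide", but tags is not among the searched fields, which is why it is not a hit.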

Range query

{
  "query": {
    "query_string": {
      "query":"elasticsearch AND publish_date:[2023-01-01 TO 2023-03-31] AND views:>1000"
    }
  }
}

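The DSL above searches for documents that satisfy all of the following conditions:

  • contain "elasticsearch"
  • have a publish date between 2023-01-01 and 2023-03-31 (square brackets make both bounds inclusive)
  • have more than 1000 views

Only the document with id=3 is returned. Document 1 falls within the date range, but its views value is exactly 1000, which does not satisfy views:>1000.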
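
The two range conditions can be checked by hand against the sample data. The following sketch does so in plain Python; it deliberately skips the elasticsearch term clause (every sample document contains that term somewhere), and in_range is a hypothetical helper.

```python
from datetime import date

# publish_date and views of the five sample documents, keyed by _id.
DOCS = {
    "1": {"publish_date": date(2023, 1, 15), "views": 1000},
    "2": {"publish_date": date(2023, 2, 20), "views": 800},
    "3": {"publish_date": date(2023, 3, 10), "views": 1200},
    "4": {"publish_date": date(2023, 4, 5), "views": 600},
    "5": {"publish_date": date(2023, 5, 1), "views": 1500},
}

def in_range(doc):
    # [2023-01-01 TO 2023-03-31]: square brackets make both bounds inclusive.
    date_ok = date(2023, 1, 1) <= doc["publish_date"] <= date(2023, 3, 31)
    # views:>1000 is a strict comparison, so exactly 1000 does not qualify.
    return date_ok and doc["views"] > 1000

hits = [doc_id for doc_id, doc in DOCS.items() if in_range(doc)]
print(hits)  # ['3']
```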

4.3 Advanced Filtering

{
  "query": {
    "query_string": {
      "query": "status:published AND author:\"John Doe\" AND (title:elasticsearch OR content:elasticsearch)"
    }
  }
}

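This searches for documents that satisfy all of the following conditions:

  • status is "published"
  • author is "John Doe"
  • the title or the content contains "elasticsearch"

Documents 1 and 3 meet the conditions and are returned. Here is the query result: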
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.525382,
    "hits": [
      {
        "_index": "blog_index",
        "_id": "3",
        "_score": 1.525382,
        "_source": {
          "title": "Elasticsearch vs Solr Comparison",
          "content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
          "tags": [
            "elasticsearch",
            "solr",
            "comparison"
          ],
          "author": "John Doe",
          "publish_date": "2023-03-10",
          "views": 1200,
          "status": "published"
        }
      },
      {
        "_index": "blog_index",
        "_id": "1",
        "_score": 1.5210661,
        "_source": {
          "title": "Getting Started with Elasticsearch",
          "content": "Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.",
          "tags": [
            "elasticsearch",
            "guide",
            "search"
          ],
          "author": "John Doe",
          "publish_date": "2023-01-15",
          "views": 1000,
          "status": "published"
        }
      }
    ]
  }
}
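
One detail of the request above is easy to get wrong: the value John Doe contains a space, so it must be wrapped in double quotes inside the query string, and those quotes must in turn be escaped in the JSON body. A minimal sketch of building the same body in Python, letting json.dumps handle the JSON escaping, might look like this (field_phrase is a hypothetical helper, not part of any Elasticsearch client):

```python
import json

def field_phrase(field, value):
    """Return field:"value", escaping backslashes and quotes inside the value."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

query = " AND ".join([
    "status:published",
    field_phrase("author", "John Doe"),
    "(title:elasticsearch OR content:elasticsearch)",
])

body = {"query": {"query_string": {"query": query}}}
print(json.dumps(body, indent=2))
```

The printed JSON contains the author clause as author:\"John Doe\", matching the hand-written request.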

4.4 Aggregation Query

{
    "query": {
        "query_string": {
            "query": "elasticsearch AND status:published"
        }
    },
    "size" : 0,
    "aggs": {
        "authors": {
            "terms": {
                "field": "author"
            }
        },
        "avg_views": {
            "avg": {
                "field": "views"
            }
        }
    }
}

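This request both searches and aggregates data, so it is a bit more involved. Here is a detailed breakdown.

Query part

  • query: the main body of the request; it specifies the search to execute.
  • query_string: uses the query string syntax, which allows complex queries to be built from simple text expressions.
    • query: the value is elasticsearch AND status:published, which searches for documents that contain the term elasticsearch and whose status field is published. AND ensures that both conditions hold.

Aggregation part

aggs defines the aggregations, which compute statistics over the query results.

  • Author aggregation
    • authors: a user-defined aggregation name, used to count documents per author.
      • terms: specifies a grouping aggregation. terms is a kind of bucket aggregation that works like GROUP BY in SQL: documents with the same field value fall into the same group.
        • "field": "author" groups by the value of the author field; the result lists each author with the corresponding document count.
  • Average views aggregation
    • avg_views: another user-defined aggregation name, used to compute the average number of views.
      • avg: specifies an average aggregation.
        • "field": "views" computes the average of the views field across all matching documents.

Note that the DSL above sets "size": 0, so only the aggregation results are returned and the regular query hits are omitted (that is, "hits": []). Here is the query result: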
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "avg_views": {
      "value": 1125
    },
    "authors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "John Doe",
          "doc_count": 2
        },
        {
          "key": "Bob Wilson",
          "doc_count": 1
        },
        {
          "key": "Jane Smith",
          "doc_count": 1
        }
      ]
    }
  }
}
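
The numbers in this response can be verified by hand. The sketch below recomputes both aggregations in plain Python over the four documents that match the query (ids 1, 2, 3, and 5; document 4 is a draft). It only mirrors the arithmetic and is not how Elasticsearch computes aggregations internally.

```python
from collections import Counter

# author and views of the four matching documents (ids 1, 2, 3 and 5).
matched = [
    {"author": "John Doe", "views": 1000},
    {"author": "Jane Smith", "views": 800},
    {"author": "John Doe", "views": 1200},
    {"author": "Bob Wilson", "views": 1500},
]

# Counterpart of the terms aggregation: one bucket per distinct author.
author_counts = Counter(doc["author"] for doc in matched)

# Counterpart of the avg aggregation: (1000 + 800 + 1200 + 1500) / 4.
avg_views = sum(doc["views"] for doc in matched) / len(matched)

print(dict(author_counts))
print(avg_views)  # 1125.0
```

Both results agree with the response: John Doe has two documents, the other two authors one each, and the average is 1125.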

4.5 Highlighting Query

{
  "query": {
    "query_string": {
      "query": "elasticsearch security"
    }
  },
  "highlight": {
        "fields": {
            "title": {},
            "content": {}
        }
    }
}

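The DSL above has a query part and a highlight part, explained below.

  • Query part
    • query: the main body of the request; it defines the search to execute.
    • query_string: uses the query string syntax, which allows complex queries to be built from simple text expressions.
      • query: the value is elasticsearch security, two terms with no explicit operator. By default Elasticsearch treats them as separate terms and joins them with the OR operator, so a document matches as long as it contains either term.
      • fields: this example sets neither fields nor default_field, so the query runs against the index's default fields (the index.query.default_field setting, which defaults to *, meaning all eligible fields).
  • Highlight part
    • highlight: configures how matches are marked up. Matching terms in each document are wrapped in tags (<em> by default) so that they stand out in the search results.
      • fields: the fields to highlight. The example lists title and content, so whenever the content of these fields matches the query, the matched terms are highlighted and users can spot the relevant information quickly.

Here is the query result: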
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.6386936,
    "hits": [
      {
        "_index": "blog_index",
        "_id": "5",
        "_score": 1.6386936,
        "_source": {
          "title": "Elasticsearch Security Best Practices",
          "content": "Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.",
          "tags": [
            "elasticsearch",
            "security",
            "best practices"
          ],
          "author": "Bob Wilson",
          "publish_date": "2023-05-01",
          "views": 1500,
          "status": "published"
        },
        "highlight": {
          "title": [
            "<em>Elasticsearch</em> <em>Security</em> Best Practices"
          ],
          "content": [
            "Learn about securing your <em>Elasticsearch</em> cluster, including authentication, authorization, and encryption"
          ]
        }
      },
      {
        "_index": "blog_index",
        "_id": "1",
        "_score": 0.28161854,
        "_source": {
          "title": "Getting Started with Elasticsearch",
          "content": "Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.",
          "tags": [
            "elasticsearch",
            "guide",
            "search"
          ],
          "author": "John Doe",
          "publish_date": "2023-01-15",
          "views": 1000,
          "status": "published"
        },
        "highlight": {
          "title": [
            "Getting Started with <em>Elasticsearch</em>"
          ],
          "content": [
            "<em>Elasticsearch</em> is a powerful search and analytics engine."
          ]
        }
      },
      {
        "_index": "blog_index",
        "_id": "2",
        "_score": 0.28161854,
        "_source": {
          "title": "Advanced Elasticsearch Query Guide",
          "content": "Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations.",
          "tags": [
            "elasticsearch",
            "advanced",
            "query"
          ],
          "author": "Jane Smith",
          "publish_date": "2023-02-20",
          "views": 800,
          "status": "published"
        },
        "highlight": {
          "title": [
            "Advanced <em>Elasticsearch</em> Query Guide"
          ],
          "content": [
            "Learn about complex queries in <em>Elasticsearch</em> including query_string, bool queries and aggregations."
          ]
        }
      },
      {
        "_index": "blog_index",
        "_id": "3",
        "_score": 0.28161854,
        "_source": {
          "title": "Elasticsearch vs Solr Comparison",
          "content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
          "tags": [
            "elasticsearch",
            "solr",
            "comparison"
          ],
          "author": "John Doe",
          "publish_date": "2023-03-10",
          "views": 1200,
          "status": "published"
        },
        "highlight": {
          "title": [
            "<em>Elasticsearch</em> vs Solr Comparison"
          ],
          "content": [
            "A detailed comparison between <em>Elasticsearch</em> and Solr."
          ]
        }
      },
      {
        "_index": "blog_index",
        "_id": "4",
        "_score": 0.09708915,
        "_source": {
          "title": "Mastering Kibana Dashboards",
          "content": "Create powerful visualizations and dashboards using Kibana with Elasticsearch data.",
          "tags": [
            "kibana",
            "elasticsearch",
            "visualization"
          ],
          "author": "Alice Johnson",
          "publish_date": "2023-04-05",
          "views": 600,
          "status": "draft"
        },
        "highlight": {
          "content": [
            "Create powerful visualizations and dashboards using Kibana with <em>Elasticsearch</em> data."
          ]
        }
      }
    ]
  }
}
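
The highlight fragments in the response are the original field text with each matched term wrapped in <em> tags. As a rough illustration of that markup step only (the real highlighter also handles analysis and fragment selection), a hypothetical highlight function could be sketched like this:

```python
import re

def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap whole-word, case-insensitive occurrences of the terms in pre/post tags."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)

terms = ["elasticsearch", "security"]
print(highlight("Elasticsearch Security Best Practices", terms))
# <em>Elasticsearch</em> <em>Security</em> Best Practices
```

The original casing of the matched text is preserved, as in the response. In Elasticsearch itself the wrapping tags can be changed with the pre_tags and post_tags options of highlight.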

4.6 Pagination Query

Below is an example of a paginated query that uses the query string syntax:

{
  "query": {
    "query_string": {
      "query": "elasticsearch security"
    }
  },
  "from":0,
  "size":2,
  "sort":[{"views":"desc"}]
}

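The request consists of three parts: the query, pagination control, and sorting.

  • Query part: the query string syntax.
  • Pagination control:
    • "from": 0: start returning results at offset 0 of the result set (that is, from the first record). This is what implements pagination.
    • "size": 2: the number of documents to return; in this example, at most 2 matching documents. Combined with from, it enables flexible pagination.
  • Sorting part
    • sort: defines how the search results are ordered.
    • { "views": "desc" }: sort by the views field in descending order.

Because the results are sorted by a field rather than by relevance, _score is null in the response. Here is the response: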
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "blog_index",
        "_id": "5",
        "_score": null,
        "_source": {
          "title": "Elasticsearch Security Best Practices",
          "content": "Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.",
          "tags": [
            "elasticsearch",
            "security",
            "best practices"
          ],
          "author": "Bob Wilson",
          "publish_date": "2023-05-01",
          "views": 1500,
          "status": "published"
        },
        "sort": [
          1500
        ]
      },
      {
        "_index": "blog_index",
        "_id": "3",
        "_score": null,
        "_source": {
          "title": "Elasticsearch vs Solr Comparison",
          "content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
          "tags": [
            "elasticsearch",
            "solr",
            "comparison"
          ],
          "author": "John Doe",
          "publish_date": "2023-03-10",
          "views": 1200,
          "status": "published"
        },
        "sort": [
          1200
        ]
      }
    ]
  }
}
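
The interplay of sort, from, and size amounts to "order everything, then take a slice". The sketch below reproduces the response above on the five matching documents and shows what the next page would contain; the page function is a hypothetical helper that only models the semantics.

```python
# _id and views of the five documents that match the query.
matched = [("1", 1000), ("2", 800), ("3", 1200), ("4", 600), ("5", 1500)]

def page(docs, from_, size):
    """Sort by views descending, then return the slice [from_, from_ + size)."""
    ordered = sorted(docs, key=lambda d: d[1], reverse=True)
    return ordered[from_:from_ + size]

print(page(matched, 0, 2))  # [('5', 1500), ('3', 1200)]
print(page(matched, 2, 2))  # [('1', 1000), ('2', 800)]
```

To fetch the following page, keep size fixed and increase from by size.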

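5 Summary

This article introduced the query string (query_string) syntax and demonstrated it with several more advanced queries. If the term "query string" (查询字符串) sounds a little odd, don't let it bother you: it is simply a literal translation of query_string.
I'm Ouyang Fangchao. Do something well and the interest follows naturally. If you enjoy my articles, feel free to like, share, comment, and follow. See you next time.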
