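Introduction to the query_string Query
Hello everyone, I'm Ouyang Fangchao. You can follow my WeChat official account "欧阳方超", where new posts will be published first.
1 Overview
The query_string query in Elasticsearch is a powerful tool that lets users search documents with a rich query syntax. It supports multiple fields, Boolean logic, wildcards, and more, which makes it a good fit for scenarios that need flexible search. This article walks through its usage with examples.
2 Basic Concepts
The query_string query parses the user's query string with a strict syntax, so an invalid query string returns an error. It lets users express complex query logic in a compact string: the query string is split on operators (such as AND, OR, and NOT, written in uppercase), each part is analyzed separately, and the matching documents are returned.
3 Data Preparation
Create an index to store blog posts and insert some data for the queries that follow. The request body below defines the index settings and mappings; the search results later in this article show the index name as blog_index.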
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"content": {
"type": "text"
},
"tags": {
"type": "keyword"
},
"author": {
"type": "keyword"
},
"publish_date": {
"type": "date",
"format": "yyyy-MM-dd"
},
"views": {
"type": "long"
},
"status": {
"type": "keyword"
}
}
}
}
Sample data to insert, in bulk format (an action line followed by a document line):
{"index":{"_id":"1"}}
{"title":"Getting Started with Elasticsearch","content":"Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.","tags":["elasticsearch","guide","search"],"author":"John Doe","publish_date":"2023-01-15","views":1000,"status":"published"}
{"index":{"_id":"2"}}
{"title":"Advanced Elasticsearch Query Guide","content":"Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations.","tags":["elasticsearch","advanced","query"],"author":"Jane Smith","publish_date":"2023-02-20","views":800,"status":"published"}
{"index":{"_id":"3"}}
{"title":"Elasticsearch vs Solr Comparison","content":"A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.","tags":["elasticsearch","solr","comparison"],"author":"John Doe","publish_date":"2023-03-10","views":1200,"status":"published"}
{"index":{"_id":"4"}}
{"title":"Mastering Kibana Dashboards","content":"Create powerful visualizations and dashboards using Kibana with Elasticsearch data.","tags":["kibana","elasticsearch","visualization"],"author":"Alice Johnson","publish_date":"2023-04-05","views":600,"status":"draft"}
{"index":{"_id":"5"}}
{"title":"Elasticsearch Security Best Practices","content":"Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.","tags":["elasticsearch","security","best practices"],"author":"Bob Wilson","publish_date":"2023-05-01","views":1500,"status":"published"}
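The action/document line pairs above are newline-delimited JSON. As a minimal sketch, assuming the data is sent to the _bulk endpoint, the body could be assembled as follows. The helper name to_bulk_body and the abbreviated documents are illustrative only and not part of the original setup.

```python
import json

# Two of the sample posts, abbreviated; field names follow the mapping above.
docs = {
    "1": {"title": "Getting Started with Elasticsearch", "author": "John Doe", "views": 1000},
    "2": {"title": "Advanced Elasticsearch Query Guide", "author": "Jane Smith", "views": 800},
}

def to_bulk_body(documents):
    """Serialize {id: source} pairs into newline-delimited JSON (hypothetical helper)."""
    lines = []
    for doc_id, source in documents.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

bulk_body = to_bulk_body(docs)
print(bulk_body)
```

Each document contributes exactly two lines, so a five-document payload has ten lines plus the trailing newline.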
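4 query_string Query Examples
4.1 Basic Queries
Simple query
The query below returns the documents whose content field contains the term powerful (documents 1, 3, and 4 in the sample data).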
{
"query": {
"query_string": {
"default_field":"content",
"query":"powerful"
}
}
}
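The request bodies in this article can be sent from any HTTP client. The sketch below only builds the request with Python's standard library and does not send it. The URL http://localhost:9200 is an assumption (a local cluster without security), and build_search_request is a hypothetical helper name.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication
INDEX = "blog_index"              # index name as it appears in the results in this article

def build_search_request(body):
    """Build (but do not send) a POST /<index>/_search request."""
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

simple_query = {"query": {"query_string": {"default_field": "content", "query": "powerful"}}}
request = build_search_request(simple_query)
print(request.get_method(), request.full_url)

# To run it against a live cluster:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["hits"]["total"])
```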
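Multi-field query
The multi-field query below works as follows:
- It searches the title and content fields for documents that contain both elasticsearch and security. Note that the two terms only need to be matched somewhere across the two fields; neither field has to contain both terms by itself.
- The AND operator requires both conditions to be met.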
{
"query": {
"query_string": {
"fields":["title","content"],
"query":"elasticsearch AND security"
}
}
}
Only the document with id=5 is returned, because its title contains security and its content contains elasticsearch.
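The matching rule can be imitated in a few lines of Python. This is only a sketch of the logic, not how Elasticsearch evaluates the query: the tokens function is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and matches_all is a hypothetical helper.

```python
import re

# (title, content) of the five sample posts, keyed by _id
posts = {
    "1": ("Getting Started with Elasticsearch",
          "Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine."),
    "2": ("Advanced Elasticsearch Query Guide",
          "Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations."),
    "3": ("Elasticsearch vs Solr Comparison",
          "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene."),
    "4": ("Mastering Kibana Dashboards",
          "Create powerful visualizations and dashboards using Kibana with Elasticsearch data."),
    "5": ("Elasticsearch Security Best Practices",
          "Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption."),
}

def tokens(text):
    # Rough approximation of analysis: lowercase, split on non-alphanumerics.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def matches_all(post, terms):
    # Each term may be found in either field; every term must be found somewhere.
    field_tokens = [tokens(field) for field in post]
    return all(any(term in ft for ft in field_tokens) for term in terms)

hits = [pid for pid, post in posts.items() if matches_all(post, ["elasticsearch", "security"])]
print(hits)
```

Document 5 is the only one with the token security; its content has securing, which is a different token under this approximation.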
4.2 Complex Queries Explained
Combined-condition query
{
"query": {
"query_string": {
"fields":["title","content"],
"query":"(elasticsearch OR solr) AND (guide OR comparison)"
}
}
}
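The DSL above works as follows:
- It searches the title and content fields.
- A document must satisfy both conditions:
  - contain at least one of "elasticsearch" or "solr", AND
  - contain at least one of "guide" or "comparison"
Two documents are returned:
- the document with id=2 (contains elasticsearch and guide)
- the document with id=3 (contains elasticsearch/solr and comparison)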
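For comparison, the same grouping can be written with structured queries. The sketch below builds a bool query with two multi_match clauses as a Python dict; under the default OR operator it should select the same documents, though relevance scores may differ from the query_string version. The helper any_of is a hypothetical name.

```python
import json

FIELDS = ["title", "content"]

def any_of(terms):
    # multi_match with the default OR operator: at least one term in at least one field.
    return {"multi_match": {"query": " ".join(terms), "fields": FIELDS}}

bool_body = {
    "query": {
        "bool": {
            # "must" plays the role of AND between the two parenthesized groups.
            "must": [
                any_of(["elasticsearch", "solr"]),
                any_of(["guide", "comparison"]),
            ]
        }
    }
}
print(json.dumps(bool_body, indent=2))
```

The query_string form is shorter to type; the structured form is easier to generate from code and avoids parsing user input as syntax.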
Range query
{
"query": {
"query_string": {
"query":"elasticsearch AND publish_date:[2023-01-01 TO 2023-03-31] AND views:>1000"
}
}
}
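The DSL above works as follows.
It searches for documents that satisfy all of these conditions:
- contain "elasticsearch"
- have a publish date between 2023-01-01 and 2023-03-31
- have more than 1000 views
Only the document with id=3 is returned.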
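The range conditions can be checked by hand against the sample data. The sketch below applies them in plain Python; it relies on the fact that all five sample posts mention elasticsearch somewhere, so only the date and views conditions filter anything. in_range is a hypothetical helper.

```python
from datetime import date

posts = [
    {"id": "1", "publish_date": "2023-01-15", "views": 1000},
    {"id": "2", "publish_date": "2023-02-20", "views": 800},
    {"id": "3", "publish_date": "2023-03-10", "views": 1200},
    {"id": "4", "publish_date": "2023-04-05", "views": 600},
    {"id": "5", "publish_date": "2023-05-01", "views": 1500},
]

start, end = date(2023, 1, 1), date(2023, 3, 31)

def in_range(post):
    published = date.fromisoformat(post["publish_date"])
    # [a TO b] is inclusive on both ends; views:>1000 is strictly greater than.
    return start <= published <= end and post["views"] > 1000

matching_ids = [p["id"] for p in posts if in_range(p)]
print(matching_ids)
```

Document 1 falls inside the date range but has exactly 1000 views, so the strict comparison excludes it.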
4.3 Advanced Filtering Explained
{
"query": {
"query_string": {
"query": "status:published AND author:\"John Doe\" AND (title:elasticsearch OR content:elasticsearch)"
}
}
}
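The query matches documents that satisfy all of the following:
- status is "published"
- author is "John Doe"
- the title or content contains "elasticsearch"
Documents 1 and 3 meet these conditions and are returned.
The query result is shown below: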
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.525382,
"hits": [
{
"_index": "blog_index",
"_id": "3",
"_score": 1.525382,
"_source": {
"title": "Elasticsearch vs Solr Comparison",
"content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
"tags": [
"elasticsearch",
"solr",
"comparison"
],
"author": "John Doe",
"publish_date": "2023-03-10",
"views": 1200,
"status": "published"
}
},
{
"_index": "blog_index",
"_id": "1",
"_score": 1.5210661,
"_source": {
"title": "Getting Started with Elasticsearch",
"content": "Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.",
"tags": [
"elasticsearch",
"guide",
"search"
],
"author": "John Doe",
"publish_date": "2023-01-15",
"views": 1000,
"status": "published"
}
}
]
}
}
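The author value in this query contains a space, so it is wrapped in double quotes, and those quotes must in turn be escaped as \" inside the JSON body. A minimal sketch of building such a query string programmatically follows; field_phrase is a hypothetical helper, and letting json.dumps do the JSON-level escaping is one way to avoid writing the backslashes by hand.

```python
import json

def field_phrase(field, value):
    # Quote the value so it is treated as a single phrase/term; backslashes and
    # embedded double quotes are escaped for the query string syntax.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

query = " AND ".join([
    "status:published",
    field_phrase("author", "John Doe"),
    "(title:elasticsearch OR content:elasticsearch)",
])
body = {"query": {"query_string": {"query": query}}}
request_json = json.dumps(body)  # adds the JSON-level \" escapes
print(request_json)
```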
4.4 Query with Aggregations Explained
{
"query": {
"query_string": {
"query": "elasticsearch AND status:published"
}
},
"size" : 0,
"aggs": {
"authors": {
"terms": {
"field": "author"
}
},
"avg_views": {
"avg": {
"field": "views"
}
}
}
}
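This request both searches and aggregates, so it is a bit more involved. The details are explained below.
Query part
- query: the body of the search, specifying the search operation to run.
  - query_string: uses the query string syntax, which allows complex queries to be built from a simple text expression.
    - query: the value here is elasticsearch AND status:published, which searches for documents that contain the term elasticsearch and whose status field is published; AND ensures both conditions are met.
Aggregation part
aggs defines the aggregations, which compute statistics over the documents matched by the query.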
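- Author aggregation
  - authors: a user-defined aggregation name, used here to count the documents per author.
  - terms: specifies a terms aggregation, a kind of bucket aggregation that works much like GROUP BY in SQL: documents with the same field value fall into the same bucket.
  - "field": "author": groups by the value of the author field; the result lists each author with the corresponding document count.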
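- Average views aggregation
  - avg_views: another user-defined aggregation name, used here to compute the average number of views.
  - avg: specifies an average (avg) metric aggregation.
  - "field": "views": computes the average of the views field across all matching documents.
Note that the DSL above sets "size": 0, so only the aggregation results are returned and no regular search hits (that is, "hits": []). The query result is shown below: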
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"aggregations": {
"avg_views": {
"value": 1125
},
"authors": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John Doe",
"doc_count": 2
},
{
"key": "Bob Wilson",
"doc_count": 1
},
{
"key": "Jane Smith",
"doc_count": 1
}
]
}
}
}
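The aggregation figures above can be verified by hand. The sketch below reproduces them in plain Python from the sample data, using the status filter as a stand-in for the query (all five posts mention elasticsearch, so only status:published narrows the set).

```python
from collections import Counter
from statistics import mean

posts = [
    {"author": "John Doe", "views": 1000, "status": "published"},
    {"author": "Jane Smith", "views": 800, "status": "published"},
    {"author": "John Doe", "views": 1200, "status": "published"},
    {"author": "Alice Johnson", "views": 600, "status": "draft"},
    {"author": "Bob Wilson", "views": 1500, "status": "published"},
]

# Documents matched by the query part.
published = [p for p in posts if p["status"] == "published"]

# terms aggregation on author: one bucket per distinct value, with a count.
authors = Counter(p["author"] for p in published)

# avg aggregation on views.
avg_views = mean([p["views"] for p in published])

print(authors.most_common(), avg_views)
```

The four published posts total 4500 views, which gives the average of 1125 shown in the response.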
4.5 Highlighting Explained
{
"query": {
"query_string": {
"query": "elasticsearch security"
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
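The DSL above has a query part and a highlight part, explained below.
- Query part
  - query: the body of the search, defining the search operation to run.
  - query_string: uses the query string syntax, which allows complex queries to be built from a simple text expression.
  - query: the value here is elasticsearch security. With no explicit operator, Elasticsearch treats the two as separate terms and joins them with OR by default (controlled by the default_operator parameter), so a document matches as long as it contains either term.
  - fields: this parameter is not set in this example, so the query runs against the index's default fields (the index.query.default_field setting, which defaults to *). To restrict the search to title and content, add "fields": ["title", "content"] as in the multi-field query in section 4.1.
- Highlight part
  - highlight: configures highlighting. The matched terms in each document are wrapped in marker tags (<em> by default) so that they stand out in the search results.
  - fields: specifies the fields to highlight. This example lists title and content, so when the content of these fields matches the query, the matching terms are highlighted and users can spot the relevant information quickly.
The query result is shown below: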
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": 1.6386936,
"hits": [
{
"_index": "blog_index",
"_id": "5",
"_score": 1.6386936,
"_source": {
"title": "Elasticsearch Security Best Practices",
"content": "Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.",
"tags": [
"elasticsearch",
"security",
"best practices"
],
"author": "Bob Wilson",
"publish_date": "2023-05-01",
"views": 1500,
"status": "published"
},
"highlight": {
"title": [
"<em>Elasticsearch</em> <em>Security</em> Best Practices"
],
"content": [
"Learn about securing your <em>Elasticsearch</em> cluster, including authentication, authorization, and encryption"
]
}
},
{
"_index": "blog_index",
"_id": "1",
"_score": 0.28161854,
"_source": {
"title": "Getting Started with Elasticsearch",
"content": "Elasticsearch is a powerful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine.",
"tags": [
"elasticsearch",
"guide",
"search"
],
"author": "John Doe",
"publish_date": "2023-01-15",
"views": 1000,
"status": "published"
},
"highlight": {
"title": [
"Getting Started with <em>Elasticsearch</em>"
],
"content": [
"<em>Elasticsearch</em> is a powerful search and analytics engine."
]
}
},
{
"_index": "blog_index",
"_id": "2",
"_score": 0.28161854,
"_source": {
"title": "Advanced Elasticsearch Query Guide",
"content": "Learn about complex queries in Elasticsearch including query_string, bool queries and aggregations.",
"tags": [
"elasticsearch",
"advanced",
"query"
],
"author": "Jane Smith",
"publish_date": "2023-02-20",
"views": 800,
"status": "published"
},
"highlight": {
"title": [
"Advanced <em>Elasticsearch</em> Query Guide"
],
"content": [
"Learn about complex queries in <em>Elasticsearch</em> including query_string, bool queries and aggregations."
]
}
},
{
"_index": "blog_index",
"_id": "3",
"_score": 0.28161854,
"_source": {
"title": "Elasticsearch vs Solr Comparison",
"content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
"tags": [
"elasticsearch",
"solr",
"comparison"
],
"author": "John Doe",
"publish_date": "2023-03-10",
"views": 1200,
"status": "published"
},
"highlight": {
"title": [
"<em>Elasticsearch</em> vs Solr Comparison"
],
"content": [
"A detailed comparison between <em>Elasticsearch</em> and Solr."
]
}
},
{
"_index": "blog_index",
"_id": "4",
"_score": 0.09708915,
"_source": {
"title": "Mastering Kibana Dashboards",
"content": "Create powerful visualizations and dashboards using Kibana with Elasticsearch data.",
"tags": [
"kibana",
"elasticsearch",
"visualization"
],
"author": "Alice Johnson",
"publish_date": "2023-04-05",
"views": 600,
"status": "draft"
},
"highlight": {
"content": [
"Create powerful visualizations and dashboards using Kibana with <em>Elasticsearch</em> data."
]
}
}
]
}
}
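To make the highlight fragments above concrete, here is a rough imitation of the wrapping step in Python. It is only a sketch: Elasticsearch highlights analyzed tokens and selects fragments, while this helper (highlight, with hypothetical pre and post parameters) simply wraps whole-word, case-insensitive matches in the full text.

```python
import re

def highlight(text, terms, pre="<em>", post="</em>"):
    # Wrap every whole-word, case-insensitive occurrence of a query term.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)

terms = ["elasticsearch", "security"]
title_5 = highlight("Elasticsearch Security Best Practices", terms)
content_4 = highlight(
    "Create powerful visualizations and dashboards using Kibana with Elasticsearch data.", terms
)
print(title_5)
print(content_4)
```

The original casing of each match is preserved, which is why the response shows <em>Elasticsearch</em> rather than the lowercase query term.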
4.6 Pagination Explained
Below is an example of a paginated query that uses the query string syntax:
{
"query": {
"query_string": {
"query": "elasticsearch security"
}
},
"from":0,
"size":2,
"sort":[{"views":"desc"}]
}
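The request has three parts: the query, pagination control, and sorting.
- Query part: the query string syntax.
- Pagination control part:
  - "from": 0: starts returning results from offset 0 of the result set (that is, from the first record). This is what implements paging.
  - "size": 2: the number of documents to return; in this example, at most 2 matching documents. Combined with from, it allows flexible paging.
- Sorting part
  - sort: defines how the search results are ordered.
  - { "views": "desc" }: sorts by the views field in descending order.
Because an explicit sort is given, _score is not computed and appears as null in the response, which is shown below: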
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "blog_index",
"_id": "5",
"_score": null,
"_source": {
"title": "Elasticsearch Security Best Practices",
"content": "Learn about securing your Elasticsearch cluster, including authentication, authorization, and encryption.",
"tags": [
"elasticsearch",
"security",
"best practices"
],
"author": "Bob Wilson",
"publish_date": "2023-05-01",
"views": 1500,
"status": "published"
},
"sort": [
1500
]
},
{
"_index": "blog_index",
"_id": "3",
"_score": null,
"_source": {
"title": "Elasticsearch vs Solr Comparison",
"content": "A detailed comparison between Elasticsearch and Solr. Both are powerful search engines built on Apache Lucene.",
"tags": [
"elasticsearch",
"solr",
"comparison"
],
"author": "John Doe",
"publish_date": "2023-03-10",
"views": 1200,
"status": "published"
},
"sort": [
1200
]
}
]
}
}
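Applications usually think in page numbers rather than offsets. A minimal sketch of the conversion, assuming 1-based page numbers, is shown below; page_to_from_size and paged_body are hypothetical helper names.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into a from/size pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

def paged_body(query, page, page_size):
    # Same query and sort for every page; only from/size change.
    body = {"query": {"query_string": {"query": query}}, "sort": [{"views": "desc"}]}
    body.update(page_to_from_size(page, page_size))
    return body

first = paged_body("elasticsearch security", page=1, page_size=2)
second = paged_body("elasticsearch security", page=2, page_size=2)
print(first["from"], second["from"])
```

With a page size of 2, the first page corresponds to the response above (views 1500 and 1200), and the second page would start at offset 2.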
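5 Summary
This article introduced the query string (query_string) syntax and showed how to use it together with some more advanced search features. If the name "query string" sounds a little odd, don't let it bother you: it is simply a literal translation of query_string.
I'm Ouyang Fangchao. Do a thing well and interest in it follows naturally. If you enjoy my articles, likes, shares, comments, and follows are welcome. See you next time.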