ElasticSearch DSL python

码码更快乐

Categories: ElasticSearch python  Tags: ElasticSearch DSL terms should query

Article link: https://blog.youkuaiyun.com/u012089823/article/details/82424679

Common DSL compound queries

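Notes:

1. Multi-condition queries that use should, must_not, and must have to be combined inside a bool query. The three clause types can be nested in one another, but every inner level must again be wrapped in bool; they cannot be nested directly.

2. When they are not nested, should, must_not, and must are sibling clauses of one bool query and apply together: every must clause has to match, no must_not clause may match, and the should clauses act as OR (at least one has to match when there is no must clause; next to a must clause they are optional by default and only raise the score).

3. When should, must_not, or must is nested at an inner level, the inner clause must not contain a "query" wrapper, otherwise the request fails with an error.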
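
The nesting rules above can be sketched with a small builder. This is only an illustration: `bool_query` is a hypothetical helper, not part of any Elasticsearch client, and the `domain` category value is made up for the example.

```python
def bool_query(must=None, should=None, must_not=None):
    """Wrap clause lists in a single bool query, leaving out empty lists."""
    body = {}
    if must:
        body["must"] = list(must)
    if should:
        body["should"] = list(should)
    if must_not:
        body["must_not"] = list(must_not)
    return {"bool": body}


# An inner AND group is itself a bool query placed inside the outer should list.
inner = bool_query(must=[
    {"term": {"categories": "ip"}},
    {"exists": {"field": "observables.value"}},
])
outer = bool_query(should=[inner, {"term": {"categories": "domain"}}])

# "query" appears exactly once, at the top level of the request body.
request_body = {"query": outer}
```

Because every level goes through the same helper, an inner group can never end up directly inside `should` without its `bool` wrapper.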

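""" Everyday query template """
query = {
    "size": 10,                      # number of hits to return; defaults to 10
    "from": 0,                       # offset of the first hit; the author caps it at 20000 (index.max_result_window defaults to 10000); use scroll for deep paging
    "query": {
        "filtered": {                # filtered (ES 1.x/2.x only, removed in 5.0): the filter narrows documents cheaply, the query then matches and scores them
            "query": {
                "bool": {            # bool combines several conditions
                    "should": [],    # at least one of these has to match: OR
                    "must_not": [],  # none of these may match: NOT (exclusions, "not equal")
                    "must": []       # all of these have to match: AND
                }
            }
        },
        "filter": {
            "bool": {
                "should": [],
                "must_not": [],
                "must": []
            }
        }
    },
    "sort": [                        # sort by field: 'asc' is ascending, 'desc' is descending
        {"title.sort": {"order": "desc"}},  # descending, typically to show the most recently updated data first
        {"_score": {"order": "desc"}}       # then by the Lucene relevance score, highest first
    ]
}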
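
The `from`/`size` pair is usually derived from a page number. A minimal sketch, assuming a 1-based page number and a window limit of 10000 (the default `index.max_result_window`; pass a different `max_window` if your cluster is configured otherwise). `page_params` is a hypothetical helper name.

```python
def page_params(page, per_page=10, max_window=10000):
    """Translate a 1-based page number into from/size, refusing deep pages."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    offset = (page - 1) * per_page
    if offset + per_page > max_window:
        # Beyond the window, from/size paging is rejected; scroll is the alternative.
        raise ValueError("page is too deep; use the scroll API instead")
    return {"from": offset, "size": per_page}


first = page_params(1)                 # first page with the default size
third = page_params(3, per_page=20)    # third page of 20 hits each
```

The returned dict can be merged into the request body with `query.update(page_params(...))`.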

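""" Multi-level nesting: must nested inside should """
# Use case: querying several indices at once when the indices share no common field.
# Each branch first checks that its field exists and only then applies the match conditions.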
query = {
	'query': {
		'bool': {
			'should': [{
				'bool': {
					'must': [{
						'terms': {
							'categories': [u'ip']
						}
					},
					{
						'range': {
							'modified': {
								'gte': '2018-12-30T05:31:03.000Z',
								'lte': '2019-03-30T05:31:03.000Z'
							}
						}
					},
					{
						'exists': {
							'field': 'observables.value'
						}
					},
					{
						'wildcard': {
							'observables.value': '*201.149*'
						}
					}]
				}
			},
			{
				'bool': {
					'must': [{
						'terms': {
							'categories': [u'ip']
						}
					},
					{
						'range': {
							'modify': {
								'gte': '2018-12-30T05:31:03.000Z',
								'lte': '2019-03-30T05:31:03.000Z'
							}
						}
					},
					{
						'exists': {  # exists checks whether the field is present
							'field': 'observables.h.md5'
						}
					},
					{
						'wildcard': { # "contains" matching: put * on both sides of the value
							'observables.h.md5': '*201.149*'
						}
					}]
				}
			}]
		}
	}
}
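
The two branches above differ only in their field names, so the same structure can be generated. A sketch under that assumption; `field_branch` and `any_branch_query` are hypothetical helper names, and the arguments simply repeat the values used above.

```python
def field_branch(value_field, date_field, needle, start, end, categories):
    """One AND group: category, date range, field exists, value contains needle."""
    return {"bool": {"must": [
        {"terms": {"categories": list(categories)}},
        {"range": {date_field: {"gte": start, "lte": end}}},
        {"exists": {"field": value_field}},
        {"wildcard": {value_field: "*%s*" % needle}},
    ]}}


def any_branch_query(field_pairs, needle, start, end, categories):
    """OR together one AND group per (value_field, date_field) pair."""
    branches = [field_branch(value_field, date_field, needle, start, end, categories)
                for value_field, date_field in field_pairs]
    return {"query": {"bool": {"should": branches}}}


query = any_branch_query(
    [("observables.value", "modified"), ("observables.h.md5", "modify")],
    needle="201.149",
    start="2018-12-30T05:31:03.000Z",
    end="2019-03-30T05:31:03.000Z",
    categories=["ip"],
)
```

Adding a third index then means adding one more pair to the list rather than copying a whole `bool` block.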

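Keyword notes:

filter: suited to narrowing by range, for example a time range. Advantage: it does not compute the relevance score _score and its results can be cached, so a filter is faster than a query.
query: suited to full-text search. It computes a relevance score _score for every matching document and uses it to rank the results. Advantage: results are ranked by how well they match; in the author's experience the same condition usually returns more documents than a filter does.
filtered: combines a query with a filter to speed up document filtering. It exists only in ES 1.x/2.x (deprecated in 2.0, removed in 5.0); later versions use a bool query with a filter clause instead.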
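
On Elasticsearch 5.0 and later, the usual replacement for `filtered` is a `bool` query whose `filter` clause carries the non-scoring part. A minimal sketch of that rewrite; `filtered_to_bool` is a hypothetical helper and the `match` clause is only sample input.

```python
def filtered_to_bool(filtered_query):
    """Rewrite {"filtered": {"query": q, "filter": f}} as a bool query."""
    inner = filtered_query["filtered"]
    body = {}
    if "query" in inner:
        body["must"] = [inner["query"]]      # scored part
    if "filter" in inner:
        body["filter"] = [inner["filter"]]   # yes/no part, no scoring
    return {"bool": body}


old_style = {
    "filtered": {
        "query": {"match": {"title": "eggs"}},
        "filter": {"range": {"idate": {"gte": "2015-09-01T00:00:00"}}},
    }
}
new_style = filtered_to_bool(old_style)
```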

DSL: range queries (range)

term = {
    "range": {
        "idate": {
            "gte": "2015-09-01T00:00:00",
            "lte": "2015-09-10T00:00:00"
        }
    }
}

# Append the condition to the main query
query["query"]["filter"]["bool"]["must"].append(term)

Where:
idate: the field being queried
gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
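
When the bounds come from `datetime` objects, the range clause can be built rather than typed by hand. A small sketch; `date_range` is a hypothetical helper, and it assumes the index accepts ISO 8601 strings as produced by `datetime.isoformat()`.

```python
from datetime import datetime


def date_range(field, start=None, end=None, inclusive=True):
    """Build a range clause; either bound may be omitted, but not both."""
    bounds = {}
    if start is not None:
        bounds["gte" if inclusive else "gt"] = start.isoformat()
    if end is not None:
        bounds["lte" if inclusive else "lt"] = end.isoformat()
    if not bounds:
        raise ValueError("at least one bound is required")
    return {"range": {field: bounds}}


term = date_range("idate", datetime(2015, 9, 1), datetime(2015, 9, 10))
open_ended = date_range("idate", start=datetime(2015, 9, 1), inclusive=False)
```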

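DSL: fuzzy matching (query_string)

# Fuzzy match: find documents whose title field contains the keyword
term = {
    "query_string": {
        "default_field": "title",  # field to search
        "query": '"%s"' % title    # keyword to search for (title is supplied by the caller); the double quotes keep it whole
    }
}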

query["query"]["filtered"]["query"]["bool"]["must"].append(term)
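Note: if the query text is wrapped in double quotes, the keyword is not split and the search matches the whole keyword as one phrase.
Without the double quotes, the keyword is split into tokens and a document matching any one of those tokens is returned.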
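
Because the quoting is done with string formatting, a keyword that itself contains a double quote would end the phrase early. A hedged sketch that escapes backslashes and double quotes before wrapping; `phrase_query_string` is a hypothetical helper, and it assumes the standard backslash escaping of the query_string syntax.

```python
def phrase_query_string(field, keyword):
    """Build a query_string clause that searches keyword as one quoted phrase."""
    # Escape backslashes first, then quotes, so the added backslashes are not doubled.
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "query_string": {
            "default_field": field,
            "query": '"%s"' % escaped,
        }
    }


plain = phrase_query_string("title", "elastic search")
tricky = phrase_query_string("title", 'say "hi"')
```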

DSL: exact matching (term and terms)

# Match a single value
term = {
    "term": {
        "categories": categories
    }
}

# Match any one of several values
term = {
    "terms": {
        "categories": [categories, vol]
    }
}

query["query"]["filtered"]["query"]["bool"]["must"].append(term)
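Difference between term and terms: term matches exactly one value, while terms takes a list of values and matches if any one of them is present.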
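
The choice between the two can be made from the shape of the input. A minimal sketch; `exact_match` is a hypothetical helper and `domain` is a made-up second value.

```python
def exact_match(field, values):
    """Return a term clause for one value and a terms clause for several."""
    if isinstance(values, (list, tuple)):
        values = list(values)
        if len(values) == 1:
            return {"term": {field: values[0]}}
        return {"terms": {field: values}}
    return {"term": {field: values}}


single = exact_match("categories", "ip")
several = exact_match("categories", ["ip", "domain"])
```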

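DSL: analyzed (tokenized) matching (match, match_phrase, multi_match)

# All of the queries below are scored by Lucene's relevance model (TF/IDF; BM25 is the default since ES 5.0)
# match: the query text 我的宝马多少马力 ("how much horsepower does my BMW have") is analyzed into the tokens "宝马 多少 马力" ("BMW", "how much", "horsepower"); every document that contains one or more of these tokens is returned.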
{
    "query": {
        "match": {
            "title" : {
                "query" : "宝马多少马力"
            }
        }
    }
}

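# match_phrase: matches only documents that contain all tokens of the query text, in the same order and next to each other (here "宝马 多少 马力"). It behaves like the quoted query_string search for 宝马多少马力, although the author expects it to be slower than query_string.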
{
    "query": {
        "match_phrase": {
            "title" : {
                "query" : "宝马多少马力"
            }
        }
    }
}

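# match_phrase with slop: slop loosens the phrase match; its value is how many position moves are allowed between the tokens (1 tolerates tokens that are one position out of place)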
{
    "query": {
        "match_phrase": {
            "content" : {
                "query" : "宝马多少马力",
                "slop" : 1
            }
        }
    }
}

# multi_match: search several fields at once; a document qualifies if any one of the fields matches
{
    "query": {
        "multi_match": {
            "query" : "宝马多少马力",
            "fields" : ["title", "content"]
        }
    }
}

# multi_match with best_fields: use it when a document whose single best field matches the query most fully should score highest
{
    "query": {
        "multi_match": {
            "query": "我的宝马发动机多少",
            "type": "best_fields",
            "fields": [
                "tag",
                "content"
            ],
            "tie_breaker": 0.3    # final score = score of the best field + 0.3 * the scores of the other matching fields
        }
    }
}

# most_fields: use it when documents that match in more fields should score higher, i.e. the score rises when both tag and content contain the tokens
{
    "query": {
        "multi_match": {
            "query": "我的宝马发动机多少",
            "type": "most_fields",
            "fields": [
                "tag",
                "content"
            ]
        }
    }
}

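Note: when a match query targets a number, date, boolean, or not_analyzed string field, the query text is not analyzed; it searches for exactly the value you supply.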
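
The effect of `tie_breaker` in a `best_fields` query is easiest to see with numbers. The sketch below only reproduces the documented combination rule (best field's score plus `tie_breaker` times the scores of the other matching fields); the per-field scores are invented for illustration and `best_fields_score` is a hypothetical function, not an Elasticsearch API.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores the way best_fields does with a tie_breaker."""
    matching = [score for score in field_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    # Every other matching field contributes only a fraction of its score.
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores for the fields "tag" and "content".
only_best = best_fields_score([2.0, 1.0])                      # tie_breaker 0: best field alone
with_tie = best_fields_score([2.0, 1.0], tie_breaker=0.3)      # other field adds 0.3 * 1.0
all_fields = best_fields_score([2.0, 1.0], tie_breaker=1.0)    # every field counts in full
```

With `tie_breaker` at 0 only the best field counts, and at 1.0 the result is the plain sum of all matching fields; values in between let secondary fields break ties without dominating.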

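DSL: field presence filters (exists, missing)

# exists matches documents that contain a given field; missing matches documents that do not. The test is on the field (the key), not on the key's value. missing was removed in ES 5.0; use must_not with exists there.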
{
   "query":{ 
        "exists":{ 
            "field": "title" 
        } 
    } 
}
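
On versions without `missing`, the same effect comes from negating `exists` inside a `bool` query. A minimal sketch; `has_field` and `lacks_field` are hypothetical helper names.

```python
def has_field(field):
    """Clause matching documents in which the field is present."""
    return {"exists": {"field": field}}


def lacks_field(field):
    """Clause matching documents in which the field is absent."""
    return {"bool": {"must_not": [has_field(field)]}}


with_title = {"query": has_field("title")}
without_title = {"query": lacks_field("title")}
```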

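DSL: nested object queries (nested)

Because nested objects are indexed as separate hidden documents, they cannot be queried directly. Instead, the nested query has to be used to reach them.
The title clause operates on the root document.
The nested clause steps down into the nested comments field; inside it, neither fields of the root document nor fields of any other nested document can be queried.
The comments.name and comments.age clauses operate on the same nested document.
Tip: a nested field can contain other nested fields. Likewise, a nested query can contain other nested queries, and the nesting hierarchy is applied as you would expect.
Detailed nested reference: https://www.elastic.co/guide/cn/elasticsearch/guide/current/nested-query.html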

GET /my_index/blogpost/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "eggs" 
          }
        },
        {
          "nested": {
            "path": "comments", 
            "query": {
              "bool": {
                "must": [ 
                  {
                    "match": {
                      "comments.name": "john"
                    }
                  },
                  {
                    "match": {
                      "comments.age": 28
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
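
The same request can be assembled in Python. A sketch only: `nested_match` is a hypothetical helper that turns keyword arguments into `match` clauses on `path.field`, all evaluated against the same nested document.

```python
def nested_match(path, **conditions):
    """Build a nested query whose match clauses all apply to one nested document."""
    must = [{"match": {"%s.%s" % (path, name): value}}
            for name, value in conditions.items()]
    return {"nested": {"path": path, "query": {"bool": {"must": must}}}}


body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "eggs"}},                    # root document field
                nested_match("comments", name="john", age=28),   # nested comment fields
            ]
        }
    }
}
```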

Sample queries

# Query all documents
{
    "query": { 
        "match_all": {} 
    }
}

# Query a subset of the documents
query = {
    "size": 10,
    "from": 0,
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [],
                    "must_not": [],
                    "must": [
                        {"term": {
                            "channel_name": "微信自媒体微信"
                            }
                        }
                    ]
                }
            }
        },
        "filter": {
            "range": {
                "idate": {
                    "gte": "2015-09-01T00:00:00",
                    "lte": "2015-09-10T00:00:00"
                }
            }
        }
    }
}
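
Templates like the one above tend to carry empty `should` and `must_not` lists. One optional tidy-up, shown here as a sketch, is to drop the empty lists before sending the request; `drop_empty_clauses` is a hypothetical helper, and it deliberately leaves empty dicts alone so that `"match_all": {}` survives.

```python
def drop_empty_clauses(node):
    """Return a copy of a query body without keys whose value is an empty list."""
    if isinstance(node, dict):
        return {key: drop_empty_clauses(value)
                for key, value in node.items() if value != []}
    if isinstance(node, list):
        return [drop_empty_clauses(item) for item in node]
    return node


template = {
    "query": {
        "bool": {
            "should": [],
            "must_not": [],
            "must": [{"term": {"channel_name": "微信自媒体微信"}}],
        }
    }
}
cleaned = drop_empty_clauses(template)
```

The function returns a new structure and leaves `template` untouched, so the template can be reused for the next request.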