Elasticsearch DSL Python Library: The Search DSL in Detail

童霆腾Sorrowful

Published 2025-06-10 09:17:42

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_01144/article/details/148552169

elasticsearch-dsl-py: High level Python client for Elasticsearch. Project repository: https://gitcode.com/gh_mirrors/el/elasticsearch-dsl-py

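Overview

The Elasticsearch DSL Python library is a high-level Elasticsearch client that offers a more Pythonic way to build and run Elasticsearch queries. This article walks through the library's Search DSL, section by section, with short examples for each feature.

The Search Object

The Search object represents the entire search request. It covers:

Queries
Filters
Aggregations
k-nearest neighbor searches
Sort
Pagination
Highlighting
Suggestions
Collapsing
Additional parameters
Associated client

Design

The Search API is chainable, and with the exception of aggregations the Search object is immutable. Every change returns a shallow copy that contains the change, so you can safely pass a Search object to outside code without worrying that it will be modified unexpectedly.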
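To make the copy-on-modify behaviour concrete, here is a minimal pure-Python sketch of a chainable, immutable builder. `TinySearch` is a hypothetical stand-in written for this illustration, not the library's `Search` class; it only shows why code holding the original object never sees later changes.

```python
import copy


class TinySearch:
    """Hypothetical stand-in for a chainable, copy-on-modify search builder."""

    def __init__(self):
        self._queries = []

    def _clone(self):
        # Shallow-copy the object, then give the copy its own list so that
        # appending to it cannot affect the original.
        clone = copy.copy(self)
        clone._queries = list(self._queries)
        return clone

    def query(self, name, **params):
        # Never mutate self: record the new clause on a copy and return it.
        clone = self._clone()
        clone._queries.append({name: params})
        return clone

    def to_dict(self):
        return {"queries": list(self._queries)}


base = TinySearch()
narrowed = base.query("match", title="python")

print(base.to_dict())      # the original is unchanged
print(narrowed.to_dict())  # only the copy carries the new clause
```

The practical consequence for the real library is the same: the return value of each call must be assigned, as in `s = s.query(...)`, because calling the method without keeping the result discards the change.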

Basic Usage

First create a Search object and associate it with an Elasticsearch client:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch("http://localhost:9200")  # assumed local node; recent client versions require an explicit host
s = Search(using=client)

Or specify the client later:

s = s.using(client)

Chained calls:

s = Search().using(client).query("match", title="python")

Execute the query:

response = s.execute()

Iterate over the results (iterating a Search object executes it if it has not been executed yet):

for hit in s:
    print(hit.title)

Deleting Documents That Match a Query

Call delete() on the Search object to delete every document the query matches:

s = Search(index='i').query("match", title="python")
response = s.delete()

Queries

Query Types

The library provides a class for every Elasticsearch query type:

from elasticsearch_dsl.query import MultiMatch, Match

# Equivalent to {"multi_match": {"query": "python django", "fields": ["title", "body"]}}
MultiMatch(query='python django', fields=['title', 'body'])

# Equivalent to {"match": {"title": {"query": "web framework", "type": "phrase"}}}
# Note: Elasticsearch 6.0 and later reject "type" in a match query; use match_phrase there.
Match(title={"query": "web framework", "type": "phrase"})

The Q Shortcut

Use the Q shortcut to construct a query from a name and parameters, or from a raw dict:

from elasticsearch_dsl import Q

Q("multi_match", query='python django', fields=['title', 'body'])
Q({"multi_match": {"query": "python django", "fields": ["title", "body"]}})

Add the query to a Search object:

q = Q("multi_match", query='python django', fields=['title', 'body'])
s = s.query(q)

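Dotted Field Names

When working with nested fields or multi-fields, you can write a double underscore __ in place of the dot in keyword argument names: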

s = Search()
s = s.filter('term', category__keyword='Python')
s = s.query('match', address__city='prague')

Alternatively, unpack a dict whose keys contain the dots:

s = Search()
s = s.filter('term', **{'category.keyword': 'Python'})
s = s.query('match', **{'address.city': 'prague'})
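The double-underscore convention exists because a dot is not valid inside a Python keyword argument name. The following is a minimal sketch of the translation, under the assumption that every `__` in a keyword name becomes a dot; `expand_field_names` is a hypothetical helper for illustration and not part of the library.

```python
def expand_field_names(**kwargs):
    """Replace every double underscore in a keyword name with a dot."""
    return {name.replace("__", "."): value for name, value in kwargs.items()}


# Both spellings from the examples above end up as the same field name.
term_filter = {"term": expand_field_names(category__keyword="Python")}
match_query = {"match": expand_field_names(address__city="prague")}

print(term_filter)
print(match_query)
```

One consequence of this convention is that a field whose real name contains a double underscore would be rewritten too, so the dict-unpacking form is the safer choice for such fields.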

Combining Queries

Query objects can be combined with logical operators:

# OR: documents matching either query
Q("match", title='python') | Q("match", title='django')

# AND: documents matching both queries
Q("match", title='python') & Q("match", title='django')

# NOT: documents that do not match the query
~Q("match", title="python")

Calling .query() more than once combines the queries internally with the & operator.
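It can help to see what the three operators correspond to in query JSON. The sketch below builds the equivalent `bool` queries by hand with plain dicts: `|` maps to `should`, `&` to `must`, and `~` to `must_not`. The helper names are hypothetical, and the library may additionally merge or simplify nested `bool` queries, which this sketch does not attempt.

```python
import json


def match(field, text):
    """Build a plain match query dict."""
    return {"match": {field: text}}


def any_of(*queries):
    """Rough equivalent of q1 | q2: at least one clause should match."""
    return {"bool": {"should": list(queries)}}


def all_of(*queries):
    """Rough equivalent of q1 & q2: every clause must match."""
    return {"bool": {"must": list(queries)}}


def negate(query):
    """Rough equivalent of ~q: the clause must not match."""
    return {"bool": {"must_not": [query]}}


python_q = match("title", "python")
django_q = match("title", "django")

either = any_of(python_q, django_q)
both = all_of(python_q, django_q)
not_python = negate(python_q)

print(json.dumps(either, indent=2))
```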

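Filters

Filter Context

Use the filter() method to add a query in filter context, where it restricts the results without contributing to the score: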

s = Search()
s = s.filter('terms', tags=['search', 'python'])

This is equivalent to:

s = Search()
s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])

Excluding Matches

Use the exclude() method to drop documents that match a query:

s = Search()
s = s.exclude('terms', tags=['search', 'python'])
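As a rough picture of the request bodies these two methods lead to, the sketch below writes them out as plain dicts: `filter()` places the query under `bool.filter`, and `exclude()` places its negation there. The function names are hypothetical and the exact serialization may differ between library versions.

```python
import json


def filter_context(query):
    """Body for a query placed in filter context (no effect on scoring)."""
    return {"query": {"bool": {"filter": [query]}}}


def exclude_context(query):
    """Body for an excluded query: the negated query in filter context."""
    return filter_context({"bool": {"must_not": [query]}})


tags_query = {"terms": {"tags": ["search", "python"]}}

filtered_body = filter_context(tags_query)
excluded_body = exclude_context(tags_query)

print(json.dumps(filtered_body))
print(json.dumps(excluded_body))
```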

Aggregations

Defining an Aggregation

Use the A shortcut to define an aggregation:

from elasticsearch_dsl import A

A('terms', field='tags')  # {"terms": {"field": "tags"}}

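Nesting Aggregations

Use the .bucket(), .metric() and .pipeline() methods to nest aggregations. Note that .metric() returns the parent aggregation while .bucket() returns the newly created one, which determines where the next chained call attaches: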

a = A('terms', field='category')
a.metric('clicks_per_category', 'sum', field='clicks')\
    .bucket('tags_per_category', 'terms', field='tags')
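The chaining in this example is easy to misread, so here is a small pure-Python model of it. `TinyAgg` is a hypothetical class written for this illustration; it assumes the behaviour that `metric()` returns the parent (metrics cannot contain sub-aggregations) and `bucket()` returns the new child (buckets can).

```python
import json


class TinyAgg:
    """Hypothetical model of an aggregation with named sub-aggregations."""

    def __init__(self, agg_type, **params):
        self.agg_type = agg_type
        self.params = params
        self.children = {}

    def metric(self, name, agg_type, **params):
        # A metric is a leaf, so hand back the parent for further chaining.
        self.children[name] = TinyAgg(agg_type, **params)
        return self

    def bucket(self, name, agg_type, **params):
        # A bucket can be nested into, so hand back the new child.
        child = TinyAgg(agg_type, **params)
        self.children[name] = child
        return child

    def to_dict(self):
        body = {self.agg_type: dict(self.params)}
        if self.children:
            body["aggs"] = {n: c.to_dict() for n, c in self.children.items()}
        return body


a = TinyAgg("terms", field="category")
a.metric("clicks_per_category", "sum", field="clicks") \
    .bucket("tags_per_category", "terms", field="tags")

print(json.dumps(a.to_dict(), indent=2))
```

Under that assumption, both `clicks_per_category` and `tags_per_category` end up as direct children of the `category` terms aggregation, because the `.bucket()` call is made on the parent that `.metric()` returned.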

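Adding Aggregations to a Search Object

Use the .aggs attribute to add aggregations. Unlike the other methods, this modifies the Search object in place rather than returning a copy: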

s = Search()
a = A('terms', field='category')
s.aggs.bucket('category_terms', a)

K-Nearest Neighbor Searches

Use the .knn() method to run a kNN search:

s = Search()
vector = get_embedding("search text")  # get_embedding is a placeholder for your own embedding function

s = s.knn(
    field="embedding",
    k=5,
    num_candidates=10,
    query_vector=vector
)

Sorting

Use the .sort() method to specify the sort order:

s = Search().sort(
    'category',  # ascending
    '-title',    # descending
    {"lines": {"order": "asc", "mode": "avg"}}  # complex sort definition
)

Reset the sort:

s = s.sort()  # clears all sorting
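The string shorthand used in the sort example can be read as a small normalization step: a leading `-` means descending order on the named field, and everything else is passed through unchanged. The sketch below shows that reading in plain Python; `normalize_sort` is a hypothetical helper, not the library's implementation.

```python
def normalize_sort(*keys):
    """Expand '-field' shorthands into explicit descending sort clauses."""
    normalized = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # '-title' becomes {"title": {"order": "desc"}}
            normalized.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and full dict definitions pass through as-is.
            normalized.append(key)
    return normalized


sort_clause = normalize_sort(
    "category",
    "-title",
    {"lines": {"order": "asc", "mode": "avg"}},
)
print(sort_clause)
```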

Pagination

Use Python slice syntax to specify from and size:

s = s[10:20]  # {"from": 10, "size": 10}
s = s[:20]    # {"size": 20}
s = s[10:]    # {"from": 10}
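The mapping from a slice to `from` and `size` is simple arithmetic: `from` is the slice start and `size` is the distance between stop and start. Here is a minimal sketch that mirrors the comments above; `slice_to_paging` and `page` are hypothetical helpers, and the real library may also emit explicit defaults for the parts this sketch omits.

```python
def slice_to_paging(key):
    """Translate a non-negative slice into Elasticsearch paging parameters."""
    if key.step is not None:
        raise ValueError("a step is not supported for paging")
    start = key.start or 0
    if start < 0 or (key.stop is not None and key.stop < 0):
        raise ValueError("negative indexes are not supported for paging")

    paging = {}
    if start:
        paging["from"] = start
    if key.stop is not None:
        # size is the length of the window and is never negative
        paging["size"] = max(0, key.stop - start)
    return paging


def page(number, per_page):
    """Return the slice for a 1-based page number (hypothetical helper)."""
    first = (number - 1) * per_page
    return slice(first, first + per_page)


print(slice_to_paging(slice(10, 20)))
print(slice_to_paging(slice(None, 20)))
print(slice_to_paging(slice(10, None)))
print(slice_to_paging(page(3, 25)))
```

Because `s[10:20]` and `s[slice(10, 20)]` are the same operation in Python, a helper like `page` can be used directly as `s = s[page(3, 25)]`.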

To retrieve every matching document (without sorting), use scan():

for hit in s.scan():
    print(hit.title)

Highlighting

Set options that apply to all highlighted fields:

s = s.highlight_options(order='score')

Enable highlighting for a field:

s = s.highlight('title')
# or with per-field parameters
s = s.highlight('title', fragment_size=50)

Access the highlighted fragments through hit.meta.highlight:

response = s.execute()
for hit in response:
    for fragment in hit.meta.highlight.title:
        print(fragment)

Suggestions

Use the .suggest() method to add a suggestion request:

# Spell-check the misspelled input 'pyhton'
s = s.suggest('my_suggestion', 'pyhton', term={'field': 'title'})

Collapsing

Use the .collapse() method to collapse search results on a field:

s = Search().query("match", message="GET /search")
s = s.collapse("user_id")  # collapse results by user_id

Expand each collapsed group with inner hits:

inner_hits = {"name": "recent_search", "size": 5, "sort": [{"@timestamp": "desc"}]}
s = s.collapse("user_id", inner_hits=inner_hits, max_concurrent_group_searches=4)

Response

Executing a search returns a Response object that gives convenient attribute access to the result:

response = s.execute()

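print(response.success())  # whether the request succeeded
print(response.took)       # time taken, in milliseconds
print(response.hits.total.value)  # total number of hits
print(response.suggest.my_suggestion)  # result of the suggestion named 'my_suggestion'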

Accessing Hits

response = s.execute()
print('Total %d hits found.' % response.hits.total.value)
for h in response:
    print(h.title, h.body)

Accessing Hit Metadata

h = response.hits[0]
# Document types were removed in Elasticsearch 8, so only index and id are printed.
print('/%s/%s returned with score %f' % (
    h.meta.index, h.meta.id, h.meta.score))

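Accessing Aggregation Results

The names per_tag and max_lines below are whatever names the aggregations were given when they were defined on the search: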

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)

MultiSearch

Use the MultiSearch class to run several searches in a single request:

from elasticsearch_dsl import MultiSearch, Search

ms = MultiSearch(index='blogs')
ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))

responses = ms.execute()

for response in responses:
    print("Results for query %r." % response._search.query)
    for hit in response:
        print(hit.title)
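Behind a multi search sits the Elasticsearch `_msearch` endpoint, which takes newline-delimited JSON: one header line and one body line per search, with a trailing newline at the end. The sketch below assembles such a payload by hand for the two searches above; `build_msearch_body` is a hypothetical helper, and the search bodies are written out as plain dicts rather than produced by the library.

```python
import json


def build_msearch_body(index, searches):
    """Serialize searches as NDJSON header/body pairs for _msearch."""
    lines = []
    for body in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # search body line
    # The payload must end with a newline.
    return "\n".join(lines) + "\n"


searches = [
    {"query": {"bool": {"filter": [{"term": {"tags": "python"}}]}}},
    {"query": {"bool": {"filter": [{"term": {"tags": "elasticsearch"}}]}}},
]

payload = build_msearch_body("blogs", searches)
print(payload, end="")
```

The responses come back in the same order as the searches were added, which is why the loop above can pair each response with its originating search.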

EmptySearch

EmptySearch is a special version of Search that always returns an empty result, whatever query it is given:

from elasticsearch_dsl import EmptySearch

es = EmptySearch()
response = es.execute()  # always returns an empty result

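Summary

The Search DSL in the Elasticsearch DSL Python library lets you build queries, filters, aggregations, and the other parts of a search request as Python objects and then execute them. Its central design goal is a Pythonic interface that still maps closely onto the native Elasticsearch API.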

Authoring statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.