ES Query Notes (with Python Statements) - 优快云 Blog

Article link: https://blog.youkuaiyun.com/qq_33905939/article/details/108279388

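This article covers several kinds of Elasticsearch queries, including the terms query and the difference between match and match_phrase. It also covers how to process data in bulk. A terms query is not analyzed, so its values must exactly match the tokens the field's analyzer produced at index time. match combines the query's terms with OR, while match_phrase requires all of them, in order. Bulk processing uses helpers.bulk, which supports four operations: index, create, update and delete. The article also touches on relevance scores and on why custom analyzers matter.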


Retrieving data: helpers.scan:

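# Retrieve every document in xx_index that matches the query; the key part is writing the query
from elasticsearch import Elasticsearch, helpers

client = Elasticsearch("http://127.0.0.1:9200")
# helpers.scan returns a generator that pages through the results lazily
for hit in helpers.scan(client,
    query={"query": {"match": {"title": "python"}}},
    index=xx_index,
    doc_type=XX_type   # only for ES versions that still have mapping types (removed in 8.x)
):
    print(hit["_id"], hit["_source"])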
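helpers.scan yields raw hit dicts, each with keys such as _index, _id and _source, rather than a list of documents. Often only the document bodies are needed. Below is a minimal sketch under that assumption. `iter_sources` is a hypothetical helper, and the hand-written `fake_hits` stand in for a real cluster:

```python
from typing import Any, Dict, Iterable, Iterator


def iter_sources(hits: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield the _source of each hit lazily, so large result sets are never held in memory."""
    for hit in hits:
        # A hit looks like {"_index": ..., "_id": ..., "_source": {...}, ...}
        yield hit["_source"]


# Real usage (needs a running cluster):
#   docs = iter_sources(helpers.scan(client, query=..., index=xx_index))
fake_hits = [
    {"_index": "xx_index", "_id": "1", "_source": {"title": "python tips"}},
    {"_index": "xx_index", "_id": "2", "_source": {"title": "learning python"}},
]
titles = [doc["title"] for doc in iter_sources(fake_hits)]
print(titles)
```

Keeping the helper a generator preserves the point of scan, which is to stream through large result sets without loading them all at once.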

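Note that terms is an exact-match query: its values are not analyzed. If the field being queried is analyzed, the index stores only the analyzed tokens. Depending on the analyzer, that means tokenized, lowercased text with stop words removed. Each terms value therefore has to match one of those tokens exactly.

For example, you must lowercase the query values yourself, otherwise Python will not match the indexed token python. If the analyzer removes stop words, a terms query for the finds nothing, because the was never indexed. Likewise, the whole string build the wall is not a single token, so terms will not match it either.

The condition of a terms query is a list, not a string; the condition of match and match_phrase queries is a str. For example: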
"query": {
    "bool": {
        "must": [{
            "terms": {
                "text": ["abc", "def"]
            }
        }, {
            "match": {
                "text": "build the wall"
            }
        }]
    }
}
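The following toy model in plain Python shows why terms values must already be in analyzed form. It assumes a hypothetical analyzer that splits on whitespace, lowercases, and drops a small hypothetical stop-word list. The real standard analyzer tokenizes differently and removes no stop words unless configured to. The index keeps only the analyzed tokens, while terms compares its values against them unchanged:

```python
from typing import List

STOP_WORDS = {"the", "a", "an"}  # hypothetical stop-word list for this sketch


def analyze(text: str) -> List[str]:
    """Toy analyzer: whitespace tokenizer + lowercase + stop-word removal."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def terms_match(indexed_tokens: List[str], values: List[str]) -> bool:
    """terms: values are NOT analyzed; a doc matches if any value equals an indexed token."""
    return any(value in indexed_tokens for value in values)


doc_tokens = analyze("Build The Wall")              # what the index actually stores
print(doc_tokens)                                   # ['build', 'wall']
print(terms_match(doc_tokens, ["Build"]))           # False: uppercase never reaches the index
print(terms_match(doc_tokens, ["build"]))           # True
print(terms_match(doc_tokens, ["the"]))             # False: the stop word was dropped at index time
print(terms_match(doc_tokens, ["build the wall"]))  # False: the phrase is not a single token
```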
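Both match and match_phrase analyze (tokenize) the query text before searching.

match combines the resulting terms with OR by default; its operator parameter can switch this to and. If you query build the wall, any document whose field contains build or wall is returned.

match_phrase requires all the terms to appear, in the same order and at adjacent positions (unless slop is set). If you query build the wall, the field must contain both build and wall, in that order.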
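Below is a toy model of the two behaviors, assuming a simplified analyzer (lowercase plus whitespace split, no stop words). It ignores scoring, slop, and the position gaps that real stop-word removal leaves behind. The function names are hypothetical. `operator` mirrors the real parameter of the match query, whose default is or:

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase + whitespace split, no stop-word removal."""
    return text.lower().split()


def match(field_text: str, query: str, operator: str = "or") -> bool:
    """match: analyze the query, then require any (or) / all (and) of its terms."""
    doc_terms = set(analyze(field_text))
    query_terms = analyze(query)
    if operator == "and":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)


def match_phrase(field_text: str, query: str) -> bool:
    """match_phrase with slop 0: all query terms, in order, at consecutive positions."""
    doc_terms = analyze(field_text)
    query_terms = analyze(query)
    n = len(query_terms)
    return any(doc_terms[i:i + n] == query_terms for i in range(len(doc_terms) - n + 1))


print(match("the wall we build", "build the wall"))                     # True: any term is enough
print(match("we will build a wall", "build the wall", operator="and"))  # False: "the" is missing
print(match_phrase("the wall we build", "build the wall"))              # False: wrong order
print(match_phrase("we will build the wall", "build the wall"))         # True
```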

Bulk processing: helpers.bulk:

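Each helpers.bulk entry has three parts: action, metadata and doc:

action: must be one of the following 4 options (in Python it is chosen with _op_type; the default is index)

index (most common): creates the document if it does not exist; if it already exists, replaces (overwrites) it

create: creates the document if it does not exist, but returns an error if it already exists

When using it, you must set _id in the metadata, so that ES can check whether the document already exists

update: updates a document; returns an error if the document does not exist

It also needs an _id, and the doc that follows has a different format from the other actions: the partial document is wrapped in {"doc": {...}}

delete: deletes a document; returns an error if no document with that id exists

It also requires _id in the metadata. It must not be followed by a doc, since that would be pointless: the document is deleted by its _id

metadata: the document's metadata, such as _id, _index, _type...

doc: an ordinary document, e.g. {"tag_value": xxx, "tag_reason": [xxx], "tag_exist": True}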
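The rules above can be condensed into a small action builder. This is a sketch: `make_action` is a hypothetical helper, not part of elasticsearch-py. The keys it produces (`_op_type`, `_index`, `_id`, `_source`, and `doc` for updates) are the ones helpers.bulk reads from each action dict. Mapping types are left out, assuming ES 7 or later:

```python
from typing import Any, Dict, Optional

VALID_OPS = {"index", "create", "update", "delete"}


def make_action(op_type: str, index: str, doc_id: Optional[str] = None,
                doc: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one helpers.bulk action dict: metadata keys plus the doc in the shape op_type expects."""
    if op_type not in VALID_OPS:
        raise ValueError(f"unknown op_type: {op_type}")
    # Following the rules above: create, update and delete all need an _id; index may omit it
    if op_type != "index" and doc_id is None:
        raise ValueError(f"{op_type} needs an _id")
    action: Dict[str, Any] = {"_op_type": op_type, "_index": index}
    if doc_id is not None:
        action["_id"] = doc_id
    if op_type in ("index", "create"):
        action["_source"] = doc or {}   # full document
    elif op_type == "update":
        action["doc"] = doc or {}       # partial document, wrapped in "doc"
    # delete: no document body at all, only metadata
    return action


actions = [
    make_action("index", "my_index", "1", {"aa": 1}),
    make_action("update", "my_index", "1", {"aa": 2}),
    make_action("delete", "my_index", "1"),
]
# helpers.bulk(client, actions)  # needs a running cluster and client
```

Failing early on a missing _id keeps a malformed create/update/delete from being sent at all.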

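#Insert the document {"aa": 1} into XX_index with ID doc_id (if no _id is given, ES generates one)
from elasticsearch import Elasticsearch, helpers

client = Elasticsearch("http://127.0.0.1:9200")
actions = []
action = {
            '_index': XX_index,
            '_type': xx_type,       # mapping types are deprecated in ES 7 and removed in 8
            '_source': {"aa": 1},
            '_id': doc_id
        }
actions.append(action)
helpers.bulk(client, actions, request_timeout=100)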
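#The corresponding ES request (the _bulk body is newline-delimited JSON: an action/metadata line, then the document line):
POST 127.0.0.1:9200/_bulk
{"index": {"_index": XX_index, "_type": xx_type, "_id": doc_id}}
{"aa": 1}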
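To make the request format concrete, below is a simplified sketch of turning helpers.bulk-style action dicts into a _bulk NDJSON body. It is roughly what the helper does before sending, but it is not the library's code. The real helper also handles chunking, retries and more metadata keys. `to_bulk_body` is a hypothetical name:

```python
import json
from typing import Any, Dict, Iterable

META_FIELDS = ("_index", "_id")  # _type would go here too on ES < 7


def to_bulk_body(actions: Iterable[Dict[str, Any]]) -> str:
    """Serialize helpers.bulk-style action dicts into a _bulk NDJSON request body."""
    lines = []
    for action in actions:
        action = dict(action)  # shallow copy so the caller's dict is left untouched
        op_type = action.pop("_op_type", "index")
        meta = {key: action.pop(key) for key in META_FIELDS if key in action}
        lines.append(json.dumps({op_type: meta}))  # action/metadata line
        if op_type == "delete":
            continue  # delete has no document line
        if op_type in ("index", "create"):
            source = action.pop("_source", action)  # full document
        else:
            source = action  # update: the remaining keys, e.g. {"doc": {...}}
        lines.append(json.dumps(source))
    # The _bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


print(to_bulk_body([{"_index": "my_index", "_id": "1", "_source": {"aa": 1}}]))
```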

#There is one result per entry in actions. If the entries do not specify _index and _type, for example:
tag = {
    '_op_type':'update',
    'doc': {"aa": 1},
    '_id': doc_id
}
actions.append(tag)
helpers.bulk(client, actions, index=XX_index, doc_type=xx_type)   # they must then be given in the bulk call

#The corresponding ES request:
POST 127.0.0.1:9200/XX_index/xx_type/_bulk
{"update": {"_id": doc_id}}
{"doc": {"aa": 1}}
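Since there is one result per action, a failure can be traced back to the action that caused it. helpers.bulk raises BulkIndexError by default when items fail. With raise_on_error=False it returns a (success count, errors) tuple instead. When calling the _bulk API directly, you inspect the response yourself. Below is a sketch of that check. `failed_items` is hypothetical, and `sample_response` is hand-written in the documented response shape, not real output:

```python
from typing import Any, Dict, List, Tuple


def failed_items(response: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (op_type, _id, status) for each item of a _bulk response that did not succeed."""
    failures = []
    # "items" has one entry per action, in the same order as the request
    for item in response.get("items", []):
        # Each item is a one-key dict such as {"index": {...}} or {"update": {...}}
        op_type, result = next(iter(item.items()))
        status = result.get("status", 500)
        if "error" in result or not 200 <= status < 300:
            failures.append((op_type, result.get("_id"), status))
    return failures


sample_response = {
    "took": 3,
    "errors": True,
    "items": [
        {"index": {"_index": "my_index", "_id": "1", "status": 201, "result": "created"}},
        {"update": {"_index": "my_index", "_id": "2", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
    ],
}
print(failed_items(sample_response))
```

The status of every item is checked rather than trusting only the top-level errors flag, so a non-2xx result such as a delete of a missing id is still reported.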

Supplement: relevance scores

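match and match_phrase results are returned sorted by _score, from highest to lowest. The score reflects how relevant each document is to the query terms. Older Elasticsearch versions computed it with TF-IDF. Since 5.0 the default similarity is BM25, which also builds on term frequency and inverse document frequency.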
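Elasticsearch (5.0 and later) scores with BM25 by default, using k1 = 1.2 and b = 0.75. The sketch below follows the shape that recent versions report in their explain output: idf × freq / (freq + k1 × (1 − b + b × dl / avgdl)). It is a textbook approximation. Lucene also encodes field lengths lossily and applies boosts, so real scores will differ somewhat. The function names are hypothetical:

```python
import math


def bm25_idf(doc_count: int, docs_with_term: int) -> float:
    """Inverse document frequency: rarer terms get larger weights."""
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(freq: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, docs_with_term: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Score contribution of one query term in one document."""
    # Length normalization: longer-than-average fields are penalized (controlled by b)
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: each extra occurrence helps less (controlled by k1)
    tf = freq / (freq + norm)
    return bm25_idf(doc_count, docs_with_term) * tf


# A multi-term match query sums the per-term scores of the terms a document contains
rare = bm25_term_score(freq=1, doc_len=10, avg_doc_len=10, doc_count=1000, docs_with_term=5)
common = bm25_term_score(freq=1, doc_len=10, avg_doc_len=10, doc_count=1000, docs_with_term=500)
print(rare, common)
```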

Custom analyzers and filter rules

Define custom analyzers and filters when creating the index; otherwise the standard analyzer is used by default:

PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": { ... custom character filters ... },
            "filter":      { ...   custom token filters   ... },
            "analyzer":    { ...    custom analyzers      ... }
        }
    }
}

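Each of these entries is a JSON object, for example (hashtag_filter is another custom token filter whose definition is not shown in this post):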
"filter": {
    "html_filter": {
      "type": "pattern_replace",
      "pattern": "^https://t.co/.*",
      "replacement": ""
    }
},
"analyzer": {
    "my_analyser": {
      "type": "custom",
      "char_filter": [
        "html_strip"
      ],
      "tokenizer": "uax_url_email",
      "filter": [
        "hashtag_filter",
        "html_filter"
      ]
    }
},
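A custom analyzer runs in a fixed order. Char filters rewrite the raw string, then exactly one tokenizer splits it, then the token filters run in the order listed. Below is a toy pipeline mirroring my_analyser, with crude stand-ins. The regex tag stripper is not ES's html_strip, whitespace splitting is not the real uax_url_email tokenizer, and hashtag_filter is omitted because its definition is not given. In this sketch the URL token becomes an empty string rather than disappearing. Check what ES actually emits with the _analyze API:

```python
import re
from typing import Callable, List

CharFilter = Callable[[str], str]
TokenFilter = Callable[[List[str]], List[str]]


def html_strip(text: str) -> str:
    """Crude stand-in for the html_strip char filter: replace anything tag-like with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in for uax_url_email: whitespace split, which also keeps URLs as single tokens."""
    return text.split()


def pattern_replace(pattern: str, replacement: str) -> TokenFilter:
    """Like the pattern_replace token filter: apply a regex substitution to every token."""
    regex = re.compile(pattern)

    def apply(tokens: List[str]) -> List[str]:
        return [regex.sub(replacement, token) for token in tokens]

    return apply


def run_analyzer(text: str, char_filters: List[CharFilter],
                 tokenizer: Callable[[str], List[str]],
                 token_filters: List[TokenFilter]) -> List[str]:
    for char_filter in char_filters:    # 1. char filters rewrite the raw string
        text = char_filter(text)
    tokens = tokenizer(text)            # 2. exactly one tokenizer splits it into tokens
    for token_filter in token_filters:  # 3. token filters run in the order they are listed
        tokens = token_filter(tokens)
    return tokens


html_filter = pattern_replace(r"^https://t.co/.*", "")
print(run_analyzer("<p>read https://t.co/abc now</p>",
                   [html_strip], whitespace_tokenize, [html_filter]))
```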

You can test the analyzer's output (this works at any time):

POST XX_index/_analyze
{
  "analyzer": "my_analyser",   # the analyzer to use: standard, english, a custom one, etc.
  "text": "hello world"        # the text to analyze
}
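The _analyze response lists each token with its offsets, type and position. Usually only the token strings matter. Below is a small sketch of building the request body and reading the response in Python. The helper names are hypothetical, and `sample` is hand-written in the response's shape, not real output. With the 7.x Python client the call would look like `client.indices.analyze(index="XX_index", body=analyze_request("my_analyser", "hello world"))`, but check the signature for your client version:

```python
from typing import Any, Dict, List


def analyze_request(analyzer: str, text: str) -> Dict[str, str]:
    """Body for POST <index>/_analyze, with the same fields as the request above."""
    return {"analyzer": analyzer, "text": text}


def token_list(response: Dict[str, Any]) -> List[str]:
    """Keep only the token strings from an _analyze response."""
    return [token["token"] for token in response["tokens"]]


# Hand-written sample in the shape _analyze returns (not real output)
sample = {
    "tokens": [
        {"token": "hello", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0},
        {"token": "world", "start_offset": 6, "end_offset": 11, "type": "<ALPHANUM>", "position": 1},
    ]
}
print(analyze_request("standard", "hello world"))
print(token_list(sample))
```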

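Then set the relevant field to use the custom analyzer. This also has to happen when the index is created, not after data has been inserted, because the analyzer of an existing field cannot be changed later. For example, for the text field: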

PUT XX_index/_mapping/xx_type
{
  "properties": {
    "text": {
      "type": "text",
      "analyzer": "my_analyser"
    }
  }
}

Finally, insert the data: values in the text field will be analyzed with the custom my_analyser.
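Because the analyzer has to exist before the field uses it, one option is to send the analysis settings and the mapping in a single create-index request. Below is a sketch of that body as a Python dict, assuming a typeless mapping (ES 7 or later). `create_index_body` and `undefined_filters` are hypothetical helpers. With the 7.x client the body would be passed as `client.indices.create(index="XX_index", body=create_index_body())`. The second helper catches a filter that is referenced but never defined, such as hashtag_filter, before the request is sent. ES should also reject such a request:

```python
from typing import Any, Dict, List, Set


def create_index_body(analyzer_name: str = "my_analyser") -> Dict[str, Any]:
    """Settings and mappings in one create-index request, so the analyzer exists before any data."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "html_filter": {
                        "type": "pattern_replace",
                        "pattern": "^https://t.co/.*",
                        "replacement": "",
                    }
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "char_filter": ["html_strip"],
                        "tokenizer": "uax_url_email",
                        "filter": ["html_filter"],
                    }
                },
            }
        },
        # Typeless mapping (ES 7+); on ES 6 this would be nested under a type name
        "mappings": {"properties": {"text": {"type": "text", "analyzer": analyzer_name}}},
    }


def undefined_filters(body: Dict[str, Any], built_ins: Set[str]) -> List[str]:
    """Token filters referenced by custom analyzers but neither defined in settings nor built in."""
    analysis = body["settings"]["analysis"]
    defined = set(analysis.get("filter", {}))
    missing = []
    for analyzer in analysis.get("analyzer", {}).values():
        for name in analyzer.get("filter", []):
            if name not in defined and name not in built_ins:
                missing.append(name)
    return missing


body = create_index_body()
print(undefined_filters(body, built_ins={"lowercase", "stop"}))
```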
