5.Elasticsearch 如何手动控制全文检索结果的精准度

本文探讨了在Elasticsearch中使用Java、Hadoop、Spark和相关技术进行全文检索的方法,包括多值检索、精确匹配控制及关键词组合搜索。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

  • 全文检索的时候,进行多个值的检索,有两种做法,match query;should
  • 控制搜索结果精准度:and operator,minimum_should_match
 
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"title" : "this is java and elasticsearch blog"} }
{ "update": { "_id": "2"} }
{ "doc" : {"title" : "this is java blog"} }
{ "update": { "_id": "3"} }
{ "doc" : {"title" : "this is elasticsearch blog"} }
{ "update": { "_id": "4"} }
{ "doc" : {"title" : "this is java, elasticsearch, hadoop blog"} }
{ "update": { "_id": "5"} }
{ "doc" : {"title" : "this is spark blog"} }

 

1.搜索标题中包含java或elasticsearch的blog

这个,就跟之前的那个term query,不一样了。不是搜索exact value,是进行full text全文检索。
match query,是负责进行全文检索的。当然,如果要检索的field,是not_analyzed类型的,那么match query也相当于term query。

GET /forum/article/_search
{
    "query": {
        "match": {
            "title": "java elasticsearch"
        }
    }
}

结果:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.8092568,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "4",
        "_score" : 0.8092568,
        "_source" : {
          "articleID" : "QQPX-R-3956-#aD8",
          "userID" : 2,
          "hidden" : true,
          "postDate" : "2017-01-02",
          "title" : "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "articleID" : "XHDK-A-1293-#fJ3",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-01",
          "title" : "this is java and elasticsearch blog"
        }
      },
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "articleID" : "JODL-X-1937-#pV7",
          "userID" : 2,
          "hidden" : false,
          "postDate" : "2017-01-01",
          "title" : "this is elasticsearch blog"
        }
      },
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "2",
        "_score" : 0.19856805,
        "_source" : {
          "articleID" : "KDKE-B-9947-#kL5",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-02",
          "title" : "this is java blog"
        }
      }
    ]
  }
}

 

2、搜索标题中包含java和elasticsearch的blog

搜索结果精准控制的第一步:灵活使用and关键字,如果你是希望所有的搜索关键字都要匹配的,那么就用and,可以实现单纯match query无法实现的效果

GET /forum/article/_search
{
    "query": {
        "match": {
            "title": {
		"query": "java elasticsearch",
		"operator": "and"
   	    }
        }
    }
}

结果:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.8092568,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "4",
        "_score" : 0.8092568,
        "_source" : {
          "articleID" : "QQPX-R-3956-#aD8",
          "userID" : 2,
          "hidden" : true,
          "postDate" : "2017-01-02",
          "title" : "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "articleID" : "XHDK-A-1293-#fJ3",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-01",
          "title" : "this is java and elasticsearch blog"
        }
      }
    ]
  }
}

3、搜索包含java,elasticsearch,spark,hadoop,4个关键字中,至少3个的blog

控制搜索结果的精准度的第二步:指定一些关键字中,必须至少匹配其中的多少个关键字,才能作为结果返回

GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}

结果

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.449981,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "4",
        "_score" : 1.449981,
        "_source" : {
          "articleID" : "QQPX-R-3956-#aD8",
          "userID" : 2,
          "hidden" : true,
          "postDate" : "2017-01-02",
          "title" : "this is java, elasticsearch, hadoop blog"
        }
      }
    ]
  }
}

4、用bool组合多个搜索条件,来搜索title

GET /forum/article/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "java" }},
      "must_not": { "match": { "title": "spark"  }},
      "should": [
                  { "match": { "title": "hadoop" }},
                  { "match": { "title": "elasticsearch"   }}
      ]
    }
  }
}

bool组合多个搜索条件,如何计算relevance score

must和should搜索对应的分数,加起来,除以must和should的总数

      排名第一:java,同时包含should中所有的关键字,hadoop,elasticsearch
      排名第二:java,同时包含should中的elasticsearch
      排名第三:java,不包含should中的任何关键字

      should是可以影响相关度分数的

must是确保说,谁必须有这个关键字,同时会根据这个must的条件去计算出document对这个搜索条件的relevance score
在满足must的基础之上,should中的条件,不匹配也可以,但是如果匹配的更多,那么document的relevance score就会更高

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.449981,
    "hits" : [
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "4",
        "_score" : 1.449981,
        "_source" : {
          "articleID" : "QQPX-R-3956-#aD8",
          "userID" : 2,
          "hidden" : true,
          "postDate" : "2017-01-02",
          "title" : "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "articleID" : "XHDK-A-1293-#fJ3",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-01",
          "title" : "this is java and elasticsearch blog"
        }
      },
      {
        "_index" : "forum",
        "_type" : "article",
        "_id" : "2",
        "_score" : 0.19856805,
        "_source" : {
          "articleID" : "KDKE-B-9947-#kL5",
          "userID" : 1,
          "hidden" : false,
          "postDate" : "2017-01-02",
          "title" : "this is java blog"
        }
      }
    ]
  }
}

 

 

5.搜索java,hadoop,spark,elasticsearch,至少包含其中3个关键字

默认情况下,should是可以不匹配任何一个的,比如上面的搜索中,this is java blog,就不匹配任何一个should条件
但是有个例外的情况,如果没有must的话,那么should中必须至少匹配一个才可以
比如下面的搜索,should中有4个条件,默认情况下,只要满足其中一个条件,就可以匹配作为结果返回

但是可以精准控制,should的4个条件中,至少匹配几个才能作为结果返回

GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java" }},
        { "match": { "title": "elasticsearch"   }},
        { "match": { "title": "hadoop"   }},
    { "match": { "title": "spark"   }}
      ],
      "minimum_should_match": 3 
    }
  }
}

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值