ES: Pinyin Search

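This article shows how to configure the pinyin plugin in Elasticsearch to support pinyin search, and how to improve the search experience with a custom analyzer.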


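The key to pinyin search is converting between Chinese characters and pinyin, so all we need is an Elasticsearch plugin that performs this conversion. Conveniently, such a plugin (elasticsearch-analysis-pinyin) is available on GitHub.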

Installing the pinyin plugin

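On the plugin's GitHub page, open Releases.
Download the release whose version matches your Elasticsearch version.
Unzip it into its own folder under the plugins directory of your Elasticsearch installation.
Then restart Elasticsearch.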
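The plugin version has to match the Elasticsearch version exactly; otherwise Elasticsearch refuses to load it. After restarting, GET _cat/plugins?v lists one row per node and plugin, and GET / reports the Elasticsearch version under version.number. Below is a small Python sketch that checks the plugin listing. It assumes the pinyin plugin shows up under the component name analysis-pinyin and that the header row starts with "name". The sample output is made up for illustration.

```python
# Sketch: verify the pinyin plugin version against the Elasticsearch version,
# using the plain-text output of `GET _cat/plugins?v`.
# Assumptions: the plugin's component name is "analysis-pinyin", and the
# header row (present because of ?v) starts with "name".

PLUGIN = "analysis-pinyin"


def pinyin_plugin_versions(cat_plugins_text):
    """Return {node_name: version} for every node that lists the plugin."""
    versions = {}
    for line in cat_plugins_text.strip().splitlines():
        parts = line.split()
        # Skip the header row and anything that is not "node component version".
        if len(parts) < 3 or parts[0] == "name":
            continue
        node, component, version = parts[0], parts[1], parts[2]
        if component == PLUGIN:
            versions[node] = version
    return versions


def check_pinyin_plugin(cat_plugins_text, es_version):
    """Return a list of problems; an empty list means the versions line up."""
    found = pinyin_plugin_versions(cat_plugins_text)
    if not found:
        return [PLUGIN + " is not installed on any node"]
    return [
        f"{node}: {PLUGIN} {version} != Elasticsearch {es_version}"
        for node, version in sorted(found.items())
        if version != es_version
    ]


# Made-up sample output for a single-node cluster.
SAMPLE = """name   component       version
node-1 analysis-ik     7.12.1
node-1 analysis-pinyin 7.12.1
"""

print(check_pinyin_plugin(SAMPLE, "7.12.1"))  # []
print(check_pinyin_plugin(SAMPLE, "7.10.0"))
```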

Testing
In Kibana, run the following request to test the plugin:

POST _analyze
{
  "text": ["张学友", "刘德华"],
  "analyzer": "pinyin"
}

Result:

{
  "tokens" : [
    {
      "token" : "zhang",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "zxy",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "xue",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "you",
      "start_offset" : 0,
      "end_offset" : 0,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "liu",
      "start_offset" : 1,
      "end_offset" : 1,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ldh",
      "start_offset" : 1,
      "end_offset" : 1,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "de",
      "start_offset" : 1,
      "end_offset" : 1,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "hua",
      "start_offset" : 1,
      "end_offset" : 1,
      "type" : "word",
      "position" : 5
    }
  ]
}
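The default pinyin analyzer emits each character's full pinyin, plus one token made of the first letters of all the characters (zxy, ldh). The toy Python function below reproduces just that behaviour for the two names in the test, which makes the token list above easier to read. The character-to-pinyin table is a hypothetical stand-in for the plugin's own dictionary, and the token order simply mirrors the output above.

```python
# Toy model of the default "pinyin" analyzer output shown above.
# Hypothetical lookup table covering only the characters in this test; the
# real plugin uses its own dictionary (and has to deal with polyphones).
PINYIN = {"张": "zhang", "学": "xue", "友": "you",
          "刘": "liu", "德": "de", "华": "hua"}


def default_pinyin_tokens(text):
    """Full pinyin per character, plus the joined first letters of all of them."""
    full = [PINYIN[ch] for ch in text if ch in PINYIN]
    if not full:
        return []
    first_letters = "".join(syllable[0] for syllable in full)
    # Same order as the _analyze output: first syllable, abbreviation, the rest.
    return [full[0], first_letters] + full[1:]


print(default_pinyin_tokens("张学友"))  # ['zhang', 'zxy', 'xue', 'you']
print(default_pinyin_tokens("刘德华"))  # ['liu', 'ldh', 'de', 'hua']
```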

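Custom analyzers
The component that processes text during tokenization is called an analyzer. We have loosely been calling it a "tokenizer", but strictly speaking it is an analyzer, and it generally consists of two parts:

- tokenizer: splits the text into terms
- filter (token filter): further processes the terms produced by the tokenizer, for example by converting them to pinyin or adding synonyms

An analyzer may also include character filters, which preprocess the raw text before it reaches the tokenizer.

We can combine the analysis plugins we have installed, using one as the tokenizer and another as a filter, to build a custom analyzer.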
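Conceptually, an analyzer runs the tokenizer first and then applies each filter in turn. The Python sketch below shows that shape with two tiny stand-ins. whitespace_tokenizer and lowercase_filter are hypothetical helpers that imitate Elasticsearch's built-in whitespace tokenizer and lowercase filter. In the example that follows, ik_smart plays the tokenizer role and the pinyin filter plays the filter role.

```python
# Sketch of how an analyzer chains a tokenizer and token filters.
# whitespace_tokenizer / lowercase_filter are hypothetical stand-ins for
# Elasticsearch's built-in "whitespace" tokenizer and "lowercase" filter.

def whitespace_tokenizer(text):
    """Split the raw text into terms."""
    return text.split()


def lowercase_filter(tokens):
    """A filter receives the term list and returns a new one."""
    return [token.lower() for token in tokens]


def make_analyzer(tokenizer, filters):
    """Build an analyzer: run the tokenizer, then every filter in order."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


my_analyzer = make_analyzer(whitespace_tokenizer, [lowercase_filter])
print(my_analyzer("Hello Pinyin World"))  # ['hello', 'pinyin', 'world']
```

A filter can emit more tokens than it receives. This is how the pinyin filter works: it adds pinyin tokens alongside the original Chinese term.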

Example:

PUT /goods
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_pinyin": {
          "tokenizer": "ik_smart",
          "filter": [
            "py"
          ]
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "completion",
        "analyzer": "my_pinyin",
        "search_analyzer": "ik_smart"
      },
      "title":{
        "type": "text",
        "analyzer": "my_pinyin",
        "search_analyzer": "ik_smart"
      },
      "price":{
        "type": "long"
      }
    }
  }
}

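Note: pay attention to the settings of the pinyin filter py:

- keep_full_pinyin: false, so the filter does not emit a separate pinyin token for each character (刘德华 would otherwise also give liu, de, hua)
- keep_joined_full_pinyin: true, so the full pinyin of the whole term is emitted as one token (刘德华 > liudehua)
- keep_original: true, so the original Chinese term is kept as well
- limit_first_letter_length: 16, the maximum length of the first-letter token (刘德华 > ldh); first-letter tokens are enabled by default (keep_first_letter)
- none_chinese_pinyin_tokenize: false, so non-Chinese text that happens to look like pinyin is not split into pinyin syllables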
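The mapping indexes name and title with my_pinyin but searches them with ik_smart. A likely reason is homophones. If the query were also converted to pinyin, 狮子 (lion) and 虱子 (louse) would both become shizi and match each other. The toy Python sketch below illustrates this using a hypothetical three-character pinyin table and a set-intersection "search". It only models the idea, not Elasticsearch's matching or scoring.

```python
# Toy illustration of why the mapping searches with ik_smart instead of my_pinyin.
# Hypothetical pinyin table: 狮 and 虱 are homophones (both "shi").
PINYIN = {"狮": "shi", "虱": "shi", "子": "zi"}


def pinyin_analyzer(word):
    """Simplified my_pinyin: original term + joined full pinyin."""
    return {word, "".join(PINYIN.get(ch, ch) for ch in word)}


def plain_analyzer(word):
    """Stand-in for ik_smart on a single word: no pinyin conversion."""
    return {word}


# Index time: every document is analyzed with the pinyin analyzer.
INDEX = {doc: pinyin_analyzer(doc) for doc in ["狮子", "虱子"]}


def search(query, search_analyzer):
    """Return the documents sharing at least one term with the analyzed query."""
    query_terms = search_analyzer(query)
    return sorted(doc for doc, terms in INDEX.items() if terms & query_terms)


print(search("狮子", pinyin_analyzer))  # ['狮子', '虱子']  the homophone leaks in
print(search("狮子", plain_analyzer))   # ['狮子']
print(search("shizi", plain_analyzer))  # ['狮子', '虱子']  pinyin input still works
```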
Testing the custom analyzer
Run the following request in Kibana to see how the custom analyzer tokenizes text:

POST /goods/_analyze
{
  "text": "你好,华为",
  "analyzer": "my_pinyin"
}

Result:

{
  "tokens" : [
    {
      "token" : "你好",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "nihao",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "nh",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "华为",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "huawei",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "hw",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    }
  ]
}
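Each term from ik_smart (你好, 华为) comes out three times: as the original term (keep_original), as its joined full pinyin (keep_joined_full_pinyin), and as its first letters (first-letter tokens are on by default). Setting keep_full_pinyin to false suppresses per-character tokens such as ni and hao. The Python sketch below models that flag logic on an already-tokenized list. The pinyin table is hypothetical and ik_smart's output is hard-coded. Non-Chinese terms are simply passed through unchanged, which simplifies what the real plugin does.

```python
# Toy model of the "py" token filter with the settings from the goods index.
# Hypothetical pinyin table covering only this example's characters.
PINYIN = {"你": "ni", "好": "hao", "华": "hua", "为": "wei"}


def py_filter(tokens, keep_full_pinyin=False, keep_joined_full_pinyin=True,
              keep_original=True, keep_first_letter=True,
              limit_first_letter_length=16):
    out = []
    for token in tokens:
        syllables = [PINYIN[ch] for ch in token if ch in PINYIN]
        if keep_original:
            out.append(token)                     # e.g. 你好
        if not syllables:
            continue                              # non-Chinese term: nothing to convert
        if keep_full_pinyin:
            out.extend(syllables)                 # e.g. ni, hao
        if keep_joined_full_pinyin:
            out.append("".join(syllables))        # e.g. nihao
        if keep_first_letter:
            letters = "".join(s[0] for s in syllables)
            out.append(letters[:limit_first_letter_length])  # e.g. nh
    return out


ik_smart_tokens = ["你好", "华为"]  # hard-coded: what ik_smart produced above
print(py_filter(ik_smart_tokens))
# ['你好', 'nihao', 'nh', '华为', 'huawei', 'hw']
```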

Testing pinyin autocompletion
With the pinyin analyzer in place, autocompletion works even when the user types pinyin.

First, insert some data:

PUT /goods/_bulk
{ "index" : {"_id":1 } }
{ "id": 1, "name": ["小米","手机"],"title":"小米10手机"}
{ "index" : {"_id":2 } }
{"id": 2,"name": ["小米", "空调"] ,"title":"小米空调"}
{ "index" : {"_id":3 } }
{"id": 3,"name": ["sony", "mp3"],"title":"sony播放器"}
{ "index" : {"_id":4 } }
{"id": 4,"name": ["松下", "电视"],"title":"松下电视"}
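The _bulk body is newline-delimited JSON. Each document is preceded by an action line ({"index": ...}), and every line, including the last one, must end with a newline. If you generate the data in code rather than typing it into Kibana, a small Python helper like the one below keeps that format correct. build_bulk_body is a hypothetical name, and the resulting string would be sent with the Content-Type application/x-ndjson.

```python
import json

# The same four documents as the _bulk request above.
DOCS = [
    {"id": 1, "name": ["小米", "手机"], "title": "小米10手机"},
    {"id": 2, "name": ["小米", "空调"], "title": "小米空调"},
    {"id": 3, "name": ["sony", "mp3"], "title": "sony播放器"},
    {"id": 4, "name": ["松下", "电视"], "title": "松下电视"},
]


def build_bulk_body(docs, id_field="id"):
    """Return an NDJSON _bulk body: one action line + one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_id": doc[id_field]}}))
        # ensure_ascii=False keeps the Chinese text readable in the body.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


print(build_bulk_body(DOCS), end="")
```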

Next, run an autocompletion query, using prefix to pass in the text the user has typed so far:

POST /goods/_search
{
  "suggest": {
    "name_suggest": {
      "prefix": "s",
      "completion": {
        "field": "name"
      }
    }
  }
}

Note that the keyword we typed is just the letter s.

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "name_suggest" : [
      {
        "text" : "s",
        "offset" : 0,
        "length" : 1,
        "options" : [
          {
            "text" : "sony",
            "_index" : "goods",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 1.0,
            "_source" : {
              "id" : 3,
              "name" : "sony",
              "title" : "sony播放器"
            }
          },
          {
            "text" : "手机",
            "_index" : "goods",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 1.0,
            "_source" : {
              "id" : 1,
              "name" : "手机",
              "title" : "小米手机"
            }
          },
          {
            "text" : "松下",
            "_index" : "goods",
            "_type" : "_doc",
            "_id" : "4",
            "_score" : 1.0,
            "_source" : {
              "id" : 4,
              "name" : "松下",
              "title" : "松下电视"
            }
          }
        ]
      }
    ]
  }
}

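The suggestions are sony, 松下 (songxia) and 手机 (shouji), and each of them starts with s, either literally or in pinyin. Pretty cool, right?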
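Why do 手机 and 松下 match s? At index time my_pinyin turns each suggestion input into several terms (手机 into 手机, shouji, sj; 松下 into 松下, songxia, sx), and the completion suggester matches the prefix against any of them. The linear-scan Python sketch below reproduces this result. The pinyin table is hypothetical, and the real suggester uses an in-memory FST rather than a loop. Like the real suggester, the sketch returns at most one option per document.

```python
# Toy model of the completion suggester on the goods index.
# Hypothetical pinyin table for the characters in the sample data.
PINYIN = {"小": "xiao", "米": "mi", "手": "shou", "机": "ji",
          "空": "kong", "调": "tiao", "松": "song", "下": "xia",
          "电": "dian", "视": "shi"}

# The "name" inputs of the four documents inserted above.
DOCS = {1: ["小米", "手机"], 2: ["小米", "空调"],
        3: ["sony", "mp3"], 4: ["松下", "电视"]}


def index_terms(text):
    """Simplified my_pinyin: original, joined full pinyin, first letters."""
    syllables = [PINYIN[ch] for ch in text if ch in PINYIN]
    terms = {text}
    if syllables:
        terms.add("".join(syllables))
        terms.add("".join(s[0] for s in syllables))
    return terms


def suggest(prefix):
    """Return {(doc_id, input)}: the first input of each doc matching the prefix."""
    options = set()
    for doc_id, inputs in DOCS.items():
        for text in inputs:
            if any(term.startswith(prefix) for term in index_terms(text)):
                options.add((doc_id, text))
                break  # at most one option per document
    return options


print(sorted(suggest("s")))   # [(1, '手机'), (3, 'sony'), (4, '松下')]
print(sorted(suggest("xm")))  # [(1, '小米'), (2, '小米')]
```

The second call shows that the same text can come back once per document. The completion suggester's skip_duplicates option is intended for that case.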
