Using the IK analyzer with Elasticsearch

Latest recommended article published 2025-03-26 17:39:29

97年的典藏版

Category: Server side / Search engines / Solr

Article link: https://blog.youkuaiyun.com/lwy572039941/article/details/107912801

1. Download the IK release that matches your Elasticsearch version
Download page: https://github.com/medcl/elasticsearch-analysis-ik/releases

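2. Unzip the archive and copy the extracted files into the plugins/ik directory under the Elasticsearch installation directory. The result should look like this:
[Screenshot not preserved: contents of plugins/ik after copying]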
3. Restart Elasticsearch
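After the restart, one way to confirm that the plugin was picked up is the cat plugins API. This is only a quick check, not part of the original steps; the listing should include a component named analysis-ik for each node, assuming the plugin kept its default name.

```
GET _cat/plugins?v
```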
4. Test the result
Tokenization before using the IK analyzer (the request names no analyzer, so the index default is used):

POST book/_analyze
{
  "text": "我是中国人"
}
// Result:
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "中",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "国",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}
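The single-character tokens above are what the built-in standard analyzer produces for Chinese text. As a quick check, assuming the book index has no custom default analyzer configured, the same request with the analyzer named explicitly should return the same tokens:

```
POST book/_analyze
{
  "analyzer": "standard",
  "text": "我是中国人"
}
```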

With the IK analyzer, the result is as follows:
POST book_v6/_analyze
{
  "analyzer": "ik_max_word",
  "text": "我是中国人"
}
// Result:
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "国人",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}
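
To apply IK to a field, rather than only to ad hoc _analyze calls, the analyzer can be set in the index mapping. Below is a minimal sketch, assuming Elasticsearch 7 or later (typeless mappings). The index name book_ik and the field name content are made up for illustration. ik_smart is the coarser-grained analyzer shipped with the same plugin, used here at search time.

```
# Hypothetical index and field names; adjust to your own schema.
PUT book_ik
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}

# Analyze with the analyzer configured on the content field
POST book_ik/_analyze
{
  "field": "content",
  "text": "我是中国人"
}
```

The second request asks _analyze to use whatever analyzer the content field is mapped with, so it should return the same tokens as the ik_max_word request above.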