Installing the IK Chinese analysis plugin and the pinyin plugin
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.2/elasticsearch-analysis-ik-6.3.2.zip
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.3.2/elasticsearch-analysis-pinyin-6.3.2.zip
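My Elasticsearch version is 6.3.2. Install the plugin release that matches your own Elasticsearch version exactly, then restart Elasticsearch so that the plugins are loaded.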
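For a different Elasticsearch version, the install URLs can be derived from the version number. The following is a minimal sketch, assuming the release URLs follow the same pattern as the two commands above, which is not guaranteed for every release. The helper name `plugin_url` and the variable `ES_VERSION` are hypothetical, introduced only for illustration. The script only prints the install commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: build a release URL for a medcl plugin.
# Assumption: the URL pattern matches the 6.3.2 commands shown above.
plugin_url() {
  plugin="$1"    # e.g. analysis-ik or analysis-pinyin
  version="$2"   # must match your Elasticsearch version exactly
  printf '%s\n' "https://github.com/medcl/elasticsearch-${plugin}/releases/download/v${version}/elasticsearch-${plugin}-${version}.zip"
}

# Set this to your own Elasticsearch version.
ES_VERSION="6.3.2"

# Print (not execute) the install command for each plugin.
for plugin in analysis-ik analysis-pinyin; do
  printf '%s\n' "./bin/elasticsearch-plugin install $(plugin_url "$plugin" "$ES_VERSION")"
done
```

Check that the printed URLs exist on the plugin's release page before running the commands.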
To verify that the plugins were installed successfully, run the following two requests:
POST /_analyze
{
  "analyzer": "pinyin",
  "text": "北京东"
}
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "北京东"
}
The results are as follows.
Pinyin analyzer result:
{
  "tokens": [
    {
      "token": "bei",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "jing",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 1
    },
    {
      "token": "dong",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    },
    {
      "token": "bjd",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    }
  ]
}
--------
IK analyzer (ik_max_word) result:
{
  "tokens": [
    {
      "token": "北京",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "京东",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
Create a new index and define its structure:
PUT /station_test/
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"pinyin_analyzer": {
"tokenizer": "my_pinyin"
}
},
"tokenizer": {
"my_pinyin": {
"type": "pinyin",
"keep_first_letter":true,
"keep_separate_first_letter": true,
"keep_full_pinyin": true,