Custom dictionary:
I. Add the dictionary
mkdir -p elasticsearch-2.4.4/plugins/analysis-ik/config/custom
vi elasticsearch-2.4.4/plugins/analysis-ik/config/custom/ext_word.dic
博世
bosch
Notes:
1. One word per line.
2. The file must be encoded as UTF-8 without a BOM.
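The encoding note above can be checked from the shell. The following is a minimal sketch, assuming a POSIX shell with `head`, `tail`, and `od`; the helper names `has_bom` and `strip_bom` and the scratch file are illustrative and are not part of IK or Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of IK): detect and remove a UTF-8 BOM.

# has_bom FILE: succeeds if FILE starts with the bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: rewrites FILE without its leading BOM, if it has one.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1" > "$1.nobom" && mv "$1.nobom" "$1"
    fi
}

# Demonstration on a scratch file; point DICT at the real dictionary instead.
DICT=$(mktemp)
printf '\357\273\277博世\nbosch\n' > "$DICT"   # deliberately written with a BOM
strip_bom "$DICT"
cat "$DICT"
```

Running `has_bom` on the dictionary before restarting is a cheap way to catch a file saved by an editor that adds a BOM silently.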
II. Edit the IK configuration (IKAnalyzer.cfg.xml)
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Users can configure their own extension dictionaries here -->
    <entry key="ext_dict">custom/ext_word.dic;custom/single_word_low_freq.dic</entry>
    <!-- Users can configure their own extension stopword dictionaries here -->
    <entry key="ext_stopwords">custom/ext_stopword.dic</entry>
    <!-- Users can configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Users can configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
III. Restart Elasticsearch
Synonym configuration:
I. Add the synonym file:
mkdir -p elasticsearch-2.4.4/config/analysis
vi elasticsearch-2.4.4/config/analysis/synonym.txt
博世,bosch
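Notes:
1. One group of synonyms per line, separated by commas.
2. The file must be encoded as UTF-8 without a BOM.
3. Elasticsearch must be restarted after synonym.txt is modified.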
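The format note above can be checked before the restart. This is a minimal sketch, assuming `awk` is available and that the file only uses the comma-separated form shown here; lines containing `=>`, blank lines, and `#` comment lines are skipped rather than validated. The helper name `check_synonyms` and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report lines of a synonym file that do not look like
# "term1,term2[,term3...]". Returns non-zero if any such line is found.
check_synonyms() {
    awk -F',' '
        /^[[:space:]]*$/ || /^#/ || /=>/ { next }   # skipped, not validated
        {
            bad = (NF < 2)                           # need at least two terms
            for (i = 1; i <= NF; i++) {
                t = $i
                gsub(/^[[:space:]]+|[[:space:]]+$/, "", t)
                if (t == "") bad = 1                # empty term, e.g. "a,,b"
            }
            if (bad) { printf "line %d: %s\n", NR, $0; status = 1 }
        }
        END { exit status }
    ' "$1"
}

# Demonstration on a scratch file with two deliberately broken lines.
SYN=$(mktemp)
printf '博世,bosch\nonly_one_term\na,,b\n' > "$SYN"
check_synonyms "$SYN" || echo "fix the lines listed above"
```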
II. Update the index settings
Create a new business index 2_syn and add the synonym filter synonym_filter.
The settings are as follows:
{
  "index": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "special_character_spliter": {
          "type": "word_delimiter",
          "preserve_original": "true"
        },
        "synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt"
        }
      },
      "analyzer": {
        "charSplit": {
          "filter": ["lowercase", "synonym_filter"],
          "type": "custom",
          "tokenizer": "ngram_tokenizer"
        },
        "optik_smart": {
          "filter": ["lowercase", "light_english_stemmer", "special_character_spliter", "synonym_filter"],
          "type": "custom",
          "tokenizer": "ik_smart"
        },
        "optik": {
          "filter": ["lowercase", "light_english_stemmer", "special_character_spliter", "synonym_filter"],
          "type": "custom",
          "tokenizer": "ik"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "token_chars": ["letter", "digit", "punctuation"],
          "min_gram": "1",
          "type": "nGram",
          "max_gram": "30"
        }
      }
    }
  }
}
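One way to apply these settings is to pass them in the body of the create-index request. The sketch below assumes that Elasticsearch listens on localhost:9200, that `curl` is installed, and that the settings JSON has been saved to a file named settings.json; the URL, the file name, and the helper names `wrap_settings` and `create_index` are all illustrative.

```shell
#!/bin/sh
# Assumptions: Elasticsearch listens on localhost:9200 and the settings JSON
# has been saved as settings.json (both are illustrative, adjust as needed).
ES_URL=${ES_URL:-http://localhost:9200}

# wrap_settings FILE: print a create-index body of the form {"settings": ...}.
wrap_settings() {
    printf '{"settings": %s}\n' "$(cat "$1")"
}

# create_index NAME FILE: create index NAME using the settings stored in FILE.
create_index() {
    wrap_settings "$2" | curl -sS -XPUT "$ES_URL/$1" -d @-
}

# Usage (needs a running node):
#   create_index 2_syn settings.json
```

Analysis settings like these are supplied when the index is created, which is why the notes create a new index rather than editing the existing one.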
III. Test the synonyms
GET /2_syn/_analyze?analyzer=optik&pretty=true&text=博世
Result:
{
  "tokens": [
    {
      "token": "博世",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "bosch",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
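The check above can be scripted by looking for a token of type SYNONYM in the response. This is a rough sketch that uses `tr` and `grep` rather than a JSON parser, so it assumes tokens contain no spaces or regular-expression metacharacters; `has_synonym` is a hypothetical helper name, and the sample response is the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: read an _analyze response on stdin and succeed if it
# contains a token with the given text whose type is SYNONYM.
has_synonym() {
    tr -d ' \n' | grep -q "{\"token\":\"$1\"[^}]*\"type\":\"SYNONYM\""
}

# Response copied from the result above.
RESPONSE='{ "tokens" : [ { "token" : "博世" , "start_offset" : 0 , "end_offset" : 2 , "type" : "CN_WORD" , "position" : 0 }, { "token" : "bosch" , "start_offset" : 0 , "end_offset" : 2 , "type" : "SYNONYM" , "position" : 0 } ] }'

printf '%s' "$RESPONSE" | has_synonym bosch && echo "bosch is emitted as a synonym"

# Against a live node (assumed to be at localhost:9200):
#   curl -sS -G 'http://localhost:9200/2_syn/_analyze' \
#        -d 'analyzer=optik' --data-urlencode 'text=博世' | has_synonym bosch
```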
IV. Migrate the data
Use the reindex API to migrate the data:
POST _reindex
{
  "source": { "index": "2" },
  "dest": { "index": "2_syn" }
}
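The same request can be sent with `curl`, followed by a document-count comparison as a basic sanity check. This is a sketch only: it assumes Elasticsearch listens on localhost:9200, and the helper names `reindex`, `parse_count`, and `doc_count` are illustrative.

```shell
#!/bin/sh
# Assumption: Elasticsearch listens on localhost:9200 (override with ES_URL).
ES_URL=${ES_URL:-http://localhost:9200}

# reindex SRC DEST: copy all documents from index SRC into index DEST.
reindex() {
    curl -sS -XPOST "$ES_URL/_reindex" \
         -d "{\"source\":{\"index\":\"$1\"},\"dest\":{\"index\":\"$2\"}}"
}

# parse_count: read a _count response on stdin and print its "count" value.
parse_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# doc_count INDEX: number of documents currently searchable in INDEX.
doc_count() {
    curl -sS "$ES_URL/$1/_count" | parse_count
}

# Usage (needs a running node):
#   reindex 2 2_syn
#   [ "$(doc_count 2)" = "$(doc_count 2_syn)" ] && echo "counts match"
```

Newly indexed documents only become visible to `_count` after the destination index has refreshed, so the comparison may need a short wait.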
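Issues:
1. Changing the synonym dictionary synonym.txt requires restarting Elasticsearch.
2. Synonyms are not found for tokens that IK fails to segment correctly, so the synonym file has to be used together with the custom dictionary.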
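For the second issue, one possible aid is to list the synonym terms that are absent from the custom dictionary. The sketch below is only a review helper under stated assumptions: `missing_terms` is a hypothetical name, the second synonym line is made-up sample data, and a listed term only needs to be added if IK actually splits it incorrectly.

```shell
#!/bin/sh
# Hypothetical helper: list terms that appear in the synonym file but not in
# the custom dictionary, one per line.
missing_terms() {   # missing_terms SYNONYM_FILE DICT_FILE
    tr ',' '\n' < "$1" |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -v '^$' | sort -u |
        grep -vxF -f "$2"
}

# Demonstration on scratch files; the second synonym line is made-up data.
SYN=$(mktemp)
DICT=$(mktemp)
printf '博世,bosch\n西门子,siemens\n' > "$SYN"
printf '博世\nbosch\n' > "$DICT"
missing_terms "$SYN" "$DICT"
```

Here the two terms from the second synonym line are reported, because neither is in the scratch dictionary.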