-
Project structure
-
IKAnalyzer.cfg.xml
The configuration file for IKAnalyzer, a word segmenter (tokenizer) that supports Chinese. It must be placed in the root of src so that it ends up at the root of the classpath.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extra configuration</comment>
    <!-- configure your own dic here -->
    <entry key="ext_dict">cn/com/tragicEnding/prov/util/ext.dic;</entry>
    <!-- configure your own stop dic here -->
    <entry key="ext_stopwords">cn/com/tragicEnding/prov/util/stopword.dic</entry>
</properties>
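Because the file uses the standard Java XML properties format (note the properties.dtd DOCTYPE), it can be sanity-checked with plain java.util.Properties before IKAnalyzer is involved. The following is a minimal sketch, not IKAnalyzer's own loading code: the class name IkConfigCheck and the helper dictPaths are hypothetical, and the sketch assumes that several dictionary paths in one entry are separated by ";" (which would explain the trailing ";" in the ext_dict entry).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, for illustration only; not part of IKAnalyzer.
public class IkConfigCheck {

    // The configuration shown above, inlined so the sketch is self-contained.
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "<comment>IK Analyzer extra configuration</comment>\n"
        + "<entry key=\"ext_dict\">cn/com/tragicEnding/prov/util/ext.dic;</entry>\n"
        + "<entry key=\"ext_stopwords\">cn/com/tragicEnding/prov/util/stopword.dic</entry>\n"
        + "</properties>\n";

    // Parses the XML properties document and returns the non-empty,
    // ";"-separated paths stored under the given key.
    static List<String> dictPaths(String xml, String key) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> paths = new ArrayList<>();
        // Assumption: ";" separates multiple dictionary paths.
        for (String part : props.getProperty(key, "").split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(dictPaths(SAMPLE, "ext_dict"));
        System.out.println(dictPaths(SAMPLE, "ext_stopwords"));
    }
}
```

Running main prints one single-element list per key, which confirms that the file is well-formed and that each entry resolves to the intended dictionary path. A malformed file makes loadFromXML throw an InvalidPropertiesFormatException, which is easier to diagnose than a dictionary that silently fails to load.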
-
ext.dic / stopword.dic
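The extension dictionary and the stop-word dictionary referenced by the configuration above. See the dic format specification for details.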
-
Keywords.java
A function that splits a sentence into keywords.
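// Imports needed at the top of Keywords.java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public static List<String> splitToKeywords(String word) {
    List<String> keywords = new ArrayList<String>();
    try {
        // true selects IK's "smart" (coarse-grained) segmentation mode
        Analyzer anal = new IKAnalyzer(true);
        StringReader reader = new StringReader(word);
        TokenStream ts = anal.tokenStream("", reader);
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        // Lucene 4.0 and later require reset() before the first incrementToken()
        ts.reset();
        while (ts.incrementToken()) {
            keywords.add(term.toString());
        }
        // Finish and release the stream before closing the reader and analyzer
        ts.end();
        ts.close();
        reader.close();
        anal.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return keywords;
}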
IKAnalyzer Chinese word segmentation