The following error is raised:
LookupError:
===========================================================================
NLTK was unable to find stanford-postagger.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-postagger.jar, see:
<http://nlp.stanford.edu/software/tokenizer.shtml>
===========================================================================
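Diagnosis:
Opening the definition of StanfordTokenizer shows that path_to_jar defaults to _JAR = 'stanford-postagger.jar'. When no path is passed, NLTK searches for a jar with that name and, as the message above says, fails if it cannot find one via the CLASSPATH environment variable.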
Solution:
Download the archive from https://nlp.stanford.edu/software/stanford-postagger-full-2017-06-09.zip
Alternatively, fetch it from the command line: wget https://nlp.stanford.edu/software/stanford-postagger-full-2017-06-09.zip
After extracting the archive, pass the full path of stanford-postagger.jar to StanfordTokenizer, i.e.
MyTokenizer = StanfordTokenizer(path_to_jar=path)
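
The last step can be sketched roughly as follows. This is only a minimal illustration, not the only way to do it: the helper names find_postagger_jar and make_tokenizer are hypothetical, the import path nltk.tokenize.stanford is an assumption about the installed NLTK version (the original does not show its import line), and Java must be installed for the tokenizer to actually run.

```python
from pathlib import Path


def find_postagger_jar(extract_dir):
    """Return the path of stanford-postagger.jar under extract_dir, or None.

    Hypothetical helper: searches the extracted archive recursively so the
    exact sub-directory name does not need to be hard-coded.
    """
    matches = sorted(Path(extract_dir).rglob("stanford-postagger.jar"))
    return str(matches[0]) if matches else None


def make_tokenizer(extract_dir):
    """Build a StanfordTokenizer that points at the jar found in extract_dir."""
    jar = find_postagger_jar(extract_dir)
    if jar is None:
        raise FileNotFoundError(
            "stanford-postagger.jar not found under %s" % extract_dir
        )
    # Imported here so the jar lookup above works even without NLTK installed.
    # Assumption: this NLTK version still ships nltk.tokenize.stanford.
    from nltk.tokenize.stanford import StanfordTokenizer

    # Passing path_to_jar explicitly bypasses the CLASSPATH lookup that
    # triggered the LookupError.
    return StanfordTokenizer(path_to_jar=jar)
```

Calling make_tokenizer with the directory where the zip was extracted should then return a tokenizer without hitting the LookupError, provided the jar is really there.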