OpenNLP Command Line


1 Installation

  • Create a system variable OPENNLP_HOME whose value is the OpenNLP installation directory, for example:
E:\Software\NLP\apache-opennlp1.9.1
  • Append to the CLASSPATH variable:
%OPENNLP_HOME%\lib;
  • Append to the Path variable:
%OPENNLP_HOME%\bin;
  • Usage
    On Linux, use the opennlp script in the bin directory; on Windows, use opennlp.bat.
    For example, to tokenize the sentences in a file sentences.txt located in the current directory:

  Linux

./opennlp SimpleTokenizer < sentences.txt

  Windows

opennlp.bat SimpleTokenizer < sentences.txt
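If you want to drive these commands from a script, the only platform difference noted above is the launcher name. The Python sketch below picks opennlp or opennlp.bat and reproduces the `<` / `>` redirection by handing file objects to subprocess. It assumes the launcher is reachable through PATH; the helper names build_command and run_tool are hypothetical and not part of OpenNLP.

```python
import os
import subprocess


def build_command(tool, args=(), windows=None):
    """Build the argv list for an OpenNLP CLI tool.

    Uses opennlp.bat on Windows and the opennlp script elsewhere; both are
    assumed to be reachable through PATH.
    """
    if windows is None:
        windows = os.name == "nt"
    launcher = "opennlp.bat" if windows else "opennlp"
    return [launcher, tool, *args]


def run_tool(tool, input_path, output_path, args=()):
    """Run a tool with stdin/stdout bound to files, like `< in > out`."""
    cmd = build_command(tool, args)
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, stdin=src, stdout=dst, check=True)
```

For example, `run_tool("SimpleTokenizer", "sentences.txt", "tokens.txt")` mirrors the command above (tokens.txt is a hypothetical output file).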

1.2 Tool List

LanguageDetector					#language detection
LanguageDetectorTrainer 			#train a language detection model
LanguageDetectorConverter			#convert Leipzig corpus data to the native OpenNLP format
LanguageDetectorCrossValidator		#K-fold cross-validator
LanguageDetectorEvaluator			#measure the performance of a language detection model

DictionaryBuilder					#create a dictionary

SentenceDetector					#sentence detection
SentenceDetectorTrainer
SentenceDetectorEvaluator
SentenceDetectorCrossValidator
SentenceDetectorConverter

SimpleTokenizer						#tokenize by character class
TokenizerME							#learnable tokenizer (requires a trained model)
TokenizerTrainer					#train a tokenizer model
TokenizerMEEvaluator
TokenizerCrossValidator
TokenizerConverter					#convert foreign data formats to the native OpenNLP format
DictionaryDetokenizer

TokenNameFinder						#named entity recognition
TokenNameFinderTrainer
TokenNameFinderEvaluator
TokenNameFinderCrossValidator
TokenNameFinderConverter
CensusDictionaryCreator				#convert 1990 US Census names into a dictionary

Doccat								#document categorization
DoccatTrainer
DoccatCrossValidator
DoccatConverter

POSTagger 							#part-of-speech tagging
POSTaggerTrainer
POSTaggerEvaluator
POSTaggerCrossValidator
POSTaggerConverter

LemmatizerME						#lemmatization
LemmatizerTrainerME
LemmatizerEvaluator

ChunkerME 							#chunking
ChunkerTrainerME
ChunkerEvaluator
ChunkerCrossValidator
ChunkerConverter

Parser								#syntactic parsing
ParserTrainer
ParserEvaluator
ParserConverter
BuildModelUpdater					#train and update the build model in a parser model
CheckModelUpdater					#train and update the check model in a parser model
TaggerModelReplacer					#replace the tagger model in a parser model

EntityLinker						#link an entity to an external data set

NGramLanguageModel
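SimpleTokenizer needs no model: it cuts the text wherever the character class changes (letters, digits, whitespace, everything else). The Python sketch below approximates that behaviour for illustration; it is our own reimplementation, not OpenNLP's code, and the detail that a run of identical punctuation such as "..." stays one token while "?!" is split reflects our understanding of the tool. Note that CJK characters count as letters, so an unspaced Chinese sentence comes out as a single token, which is why Chinese needs TokenizerME with a trained model.

```python
def char_class(c):
    """Classify a character as space, alpha, digit or other."""
    if c.isspace():
        return "space"
    if c.isalpha():
        return "alpha"
    if c.isdigit():
        return "digit"
    return "other"


def simple_tokenize(text):
    """Split text into runs of the same character class.

    Whitespace only separates tokens. A run of "other" characters is also
    split whenever the character changes, so "?!" -> ["?", "!"] while
    "..." stays one token.
    """
    tokens = []
    start = None
    prev_class = "space"
    prev_char = ""
    for i, c in enumerate(text):
        cls = char_class(c)
        if prev_class == "space":
            if cls != "space":
                start = i  # a new token begins after whitespace
        elif cls != prev_class or (cls == "other" and c != prev_char):
            tokens.append(text[start:i])  # close the current run
            start = i if cls != "space" else None
        prev_class, prev_char = cls, c
    if prev_class != "space" and start is not None:
        tokens.append(text[start:])  # flush the final run
    return tokens
```

For example, `simple_tokenize("Hello, world 2019.")` gives `["Hello", ",", "world", "2019", "."]`.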

1.3 Detailed Usage

1.3.1 Sentence Detector
  • SentenceDetector
Usage: opennlp SentenceDetector model < sentences

Arguments description:
	model
		the sentence detector model file.
	sentences
		the text to be split into sentences, read from standard input.

Example:

opennlp.bat SentenceDetector ch_sentence_detector.bin < sentences.txt > output.txt
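SentenceDetector writes the detected sentences to standard output, one sentence per line, which is what ends up in output.txt. When checking a new Chinese model it can help to have a rule-based baseline to compare against. The sketch below is a naive splitter that cuts after full-width 。！？ plus an optional closing quote; the punctuation set is an assumption, and this is not how OpenNLP works (its model decides, for each candidate end-of-sentence character, whether it really ends a sentence). The function name naive_split is hypothetical.

```python
import re

# Non-EOS text, then one or more full-width EOS marks, then an optional
# closing quote; or a trailing fragment with no EOS mark at all.
_SENTENCE = re.compile(r"[^。！？]*[。！？]+[”’」』]?|[^。！？]+$")


def naive_split(text):
    """Rule-based baseline: cut after 。！？ (and any closing quote)."""
    text = text.replace("\n", "")  # join hard-wrapped lines (fine for Chinese)
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

Writing `"\n".join(naive_split(raw_text))` to a file gives the same one-sentence-per-line layout as output.txt, so the two can be diffed line by line.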
  • SentenceDetectorTrainer
Usage: opennlp SentenceDetectorTrainer[.irishsentencebank|.ad|.pos|.conllx|.namefinder|.parse|.moses|.conllu|.letsmt] 
		[-factory factoryName]
		[-eosChars string]
		[-abbDict path] 
		[-params paramsFile] 
		-lang language 
		-model modelFile 
		-data sampleData 
		[-encoding charsetName] 

Arguments description:
	-factory factoryName
		A sub-class of SentenceDetectorFactory where to get implementation and resources.
	-eosChars string
		EOS characters.
	-abbDict path
		abbreviation dictionary in XML format.
	-params paramsFile
		training parameters file.
	-lang language
		language which is being processed.
	-model modelFile
		output model file.
	-data sampleData
		data to be used, usually a file name.
	-encoding charsetName
		encoding for reading and writing text, if absent the system default is used.
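The -params option takes a training parameters file. As far as we know it is a plain key=value (Java properties style) file; the keys Algorithm, Iterations and Cutoff and the defaults below (MAXENT, 100, 5) are our assumption of what OpenNLP 1.9.1 uses, so check them against the manual for your version. The helper names and the file name sd.params are hypothetical.

```python
def params_text(algorithm="MAXENT", iterations=100, cutoff=5):
    """Render training parameters as key=value lines for -params."""
    # Key names assumed to follow OpenNLP's TrainingParameters; verify them.
    return f"Algorithm={algorithm}\nIterations={iterations}\nCutoff={cutoff}\n"


def write_params(path, **kwargs):
    """Write the parameters file, e.g. write_params("sd.params", iterations=300)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(params_text(**kwargs))
```

After `write_params("sd.params", iterations=300)`, the file would be passed to the trainer as `-params sd.params`.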

Example:

opennlp.bat SentenceDetectorTrainer -model ch_sentence_detector.bin -lang jpn -data ch_sentence_detector.train -encoding UTF-8

Note: when training on Chinese text with the default end-of-sentence characters, -lang must be set to jpn.
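The -data file uses the OpenNLP sentence detector training format: one sentence per line, with an empty line marking a document boundary. If your corpus is already split into sentences, a short script can produce ch_sentence_detector.train in that layout. As we understand it, jpn works for Chinese because its default end-of-sentence characters are the full-width 。！？; -eosChars is the alternative if you need a different set. The helper names below are hypothetical.

```python
def to_training_format(documents):
    """Serialize documents (lists of sentences) into the sentence detector
    training format: one sentence per line, an empty line between documents.
    Sentences must not contain newlines themselves."""
    blocks = []
    for sentences in documents:
        cleaned = [s.strip() for s in sentences if s.strip()]
        if cleaned:
            blocks.append("\n".join(cleaned))
    return "\n\n".join(blocks) + "\n" if blocks else ""


def write_training_file(documents, path="ch_sentence_detector.train"):
    # UTF-8 to match the -encoding UTF-8 option in the training command
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_training_format(documents))
```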

  • SentenceDetectorEvaluator
Usage: opennlp SentenceDetectorEvaluator[.nkjp|.irishsentencebank|.ad|.pos|.conllx|.namefinder|.parse|.moses|.conllu|.letsmt] 
		-model model 
		[-misclassified true|false]
		-data sampleData 
		[-encoding charsetName]

Arguments description:
        -model model
                the model file to be evaluated.
        -misclassified true|false
                if true will print false negatives and false positives.
        -data sampleData
                data to be used, usually a file name.
        -encoding charsetName
                encoding for reading and writing text, if absent the system default is used.

Example:

opennlp.bat SentenceDetectorEvaluator -model ch_sentence_detector.bin -misclassified true -data sentences.txt -encoding UTF-8
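SentenceDetectorEvaluator reports precision, recall and F-measure, and with -misclassified true it also prints the false negatives and false positives. The sketch below shows how those numbers relate when computed over exact sentence spans. It is a simplified reimplementation for illustration, not OpenNLP's evaluator: it assumes reference and predicted sentences are concatenated without separators (as in unspaced Chinese text), and the function names are hypothetical.

```python
def sentence_spans(sentences):
    """Map sentences to (start, end) character offsets, assuming they are
    concatenated without separators."""
    spans, pos = [], 0
    for s in sentences:
        spans.append((pos, pos + len(s)))
        pos += len(s)
    return spans


def evaluate(reference, predicted):
    """Exact-match precision, recall and F-measure over sentence spans."""
    ref = set(sentence_spans(reference))
    pred = set(sentence_spans(predicted))
    tp = len(ref & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Like -misclassified true: spans only in the reference were missed
    # (false negatives); spans only in the prediction are false positives.
    false_negatives = sorted(ref - pred)
    false_positives = sorted(pred - ref)
    return precision, recall, f, false_negatives, false_positives
```

If the model merges the first two of three sentences into one, only the third span matches: precision is 1/2, recall 1/3 and F-measure 0.4.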