Stanford Parser (2)

Latest recommended article published 2017-10-09 09:44:11

leeharry

Article link: https://blog.youkuaiyun.com/leeharry/article/details/2145615

Quote:

Author: armstrong

How can I process a text and output the result?
Thank you.

For English Text:

Under DOS, go to the directory where the parser is located, then type the line below:

lexparser.bat input.txt >output.txt

Then press Enter to get your result.

For processing Chinese texts

First, you need to segment the input text into words separated by spaces (search for ICTCLAS in this forum if you don't have a segmenter). That is, convert 今天真热。 to 今天 真 热 。

Then save the segmented text in a GB encoding (not UTF-8, which is used for the GUI/Windows version).
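
If your segmenter writes UTF-8, the re-encoding step can be scripted instead of done by hand in an editor. The following is only a minimal sketch in Python: the function name is made up for illustration, and choosing gb18030 (a superset of GB2312 and GBK) is an assumption, since the post only says "GB".

```python
from pathlib import Path

def convert_utf8_to_gb(src, dst, gb_codec="gb18030"):
    """Re-encode a UTF-8 text file into a GB-family encoding.

    gb_codec is an assumption: the post only says "GB format".
    """
    # "utf-8-sig" also drops a byte-order mark if the editor added one.
    text = Path(src).read_text(encoding="utf-8-sig")
    # Write raw bytes so that no extra newline translation happens.
    Path(dst).write_bytes(text.encode(gb_codec))
```

For example, convert_utf8_to_gb("inputCh_utf8.txt", "inputCh.txt") would produce the file that is passed to the parser below; both file names are placeholders.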

Next, create a bat file by copying the following lines (between the rows of equal signs) into Notepad, and save it as lexparserCh.bat in the same folder as your parser program:

=============================
@echo off
:: Runs the Chinese PCFG parser on one or more files, printing trees only
:: usage: lexparserCh fileToParse
java -server -mx800m -cp "stanford-parser.jar;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependenciesCollapsed" chineseFactored.ser.gz %1
=============================

Finally, go to the directory where the parser is located, and type the line below:

lexparserCh.bat inputCh.txt >outputCh.txt

Then press Enter to get your result.
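
If you would rather drive the parser from a script than from a bat file, the same command can be assembled in Python. This is a sketch only: the two function names are hypothetical, while the jar name, grammar file and options are copied from the bat file above. It assumes java is on your PATH and that the script runs in the parser folder.

```python
import os
import subprocess

def build_parser_command(input_file, jar="stanford-parser.jar",
                         grammar="chineseFactored.ser.gz"):
    """Return the java argument list that lexparserCh.bat runs for one file."""
    return [
        "java", "-server", "-mx800m",
        # Trailing separator mirrors "stanford-parser.jar;" in the bat file.
        "-cp", jar + os.pathsep,
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-outputFormat", "penn,typedDependenciesCollapsed",
        grammar, input_file,
    ]

def parse_to_file(input_file, output_file):
    """Run the parser and save its standard output, like '>outputCh.txt'."""
    with open(output_file, "wb") as out:
        subprocess.run(build_parser_command(input_file), stdout=out, check=True)
```

Calling parse_to_file("inputCh.txt", "outputCh.txt") would then be the scripted counterpart of the command line above.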

Quote:

Author: armstrong

But this thread is mainly about the Stanford parser, not about the tagger.

It's quite similar, actually. Anyway, first create a bat file by copying the following lines (between the rows of equal signs) into Notepad:

=============================
@echo off
:: To tag a file using the pre-trained bidirectional model
:: usage: postagger.bat (reads input.txt and writes output.txt in the current folder)
java -mx300m -classpath postagger-2006-05-21.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model wsj3t0-18-bidirectional/train-wsj-0-18 -file input.txt >output.txt
=============================

Next, save it as a plain text file named postagger.bat in the same folder as your Stanford POS Tagger program.

Then, save an English text as input.txt in the same folder as the Tagger and postagger.bat.

Finally, go to the folder that contains the Tagger, postagger.bat and input.txt, and double-click postagger.bat to get your result file, output.txt.

To tag another file, simply rename output.txt, and change the content of the input.txt file.
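
If you have many files, renaming by hand gets tedious. Here is a rough Python sketch that runs the same tagger command once per file and writes each result next to its input. The function names and the ".tagged" naming scheme are my own invention; the jar, class, model path and options are copied from the bat file above. It assumes java is on your PATH and that the script runs in the Tagger folder.

```python
import subprocess
from pathlib import Path

def output_name(input_path):
    """Map e.g. essay.txt to essay.tagged.txt in the same folder."""
    p = Path(input_path)
    return p.with_name(p.stem + ".tagged" + p.suffix)

def tag_files(input_paths, jar="postagger-2006-05-21.jar",
              model="wsj3t0-18-bidirectional/train-wsj-0-18"):
    """Tag each file in turn, one output file per input file."""
    for path in input_paths:
        cmd = ["java", "-mx300m", "-classpath", jar,
               "edu.stanford.nlp.tagger.maxent.MaxentTagger",
               "-model", model, "-file", str(path)]
        with open(output_name(path), "wb") as out:
            subprocess.run(cmd, stdout=out, check=True)
```

For instance, tag_files(["a.txt", "b.txt"]) would leave a.tagged.txt and b.tagged.txt beside the originals.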

Good luck!

The POS Tagger was trained on English texts, though it is said that you can train it to tag Chinese texts. However, that may be difficult for many of us. It would be easier to use ICTCLAS_Win.exe to tag your Chinese texts. You can download it under "NLP Tools" in my online storage at:

http://corpuslaohong.ys168.com/
Password: corpus4u
Please leave a message there once you have downloaded it.

If you do want to tag Chinese texts with Stanford tools, the Stanford Parser can also produce POS information for Chinese texts. Read my instructions on how to parse a Chinese text with the Stanford Parser in earlier posts.
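
In the parser's "penn" output, the POS tag is the label directly above each word, so the tags can be pulled out of a tree with a small script. Here is a minimal Python sketch; the function name is hypothetical, and the tree in the usage note is an illustrative example written by hand, not real parser output.

```python
import re

# A word with its tag appears in Penn bracket notation as "(TAG word)".
# Higher nodes such as "(NP (NT ...))" do not match, because their
# label is followed by another opening bracket rather than a word.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def pos_pairs(penn_tree):
    """Return (word, tag) pairs in sentence order from one Penn tree."""
    return [(word, tag) for tag, word in LEAF.findall(penn_tree)]
```

Given the hand-written tree (ROOT (IP (NP (NT 今天)) (VP (ADVP (AD 真)) (VP (VA 热))) (PU 。))), pos_pairs returns the pairs 今天/NT, 真/AD, 热/VA and 。/PU.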
