Stanford Parser (2)

Quote:
Originally posted by armstrong
how can I process a text and output the result?
thank you.
For English text:

At the DOS command prompt, go to the directory where the parser is located, then type the line below:

lexparser.bat input.txt >output.txt

Then press Enter to get your result in output.txt.

For Chinese text:

First, you need to segment the input text (search for ICTCLAS in this forum if you don't have a segmenter). That is, convert 今天真热。 to 今天 真 热 。

Then save the segmented text in GB encoding (not UTF-8, which is what the GUI/Windows version uses).
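If your segmenter or editor only saves UTF-8, the re-encoding step can be scripted. The following is a minimal Python sketch, not part of the parser distribution. The function name convert_utf8_to_gb is made up for illustration, and the choice of GB18030 is an assumption: it is a superset of GB2312 and GBK, so ordinary simplified Chinese text comes out as the same bytes in all three.

```python
from pathlib import Path


def convert_utf8_to_gb(src, dst, encoding="gb18030"):
    """Re-encode a UTF-8 text file into a GB-family encoding.

    The default "gb18030" is an assumption; pass "gbk" or "gb2312"
    if your setup expects one of those instead.
    """
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present.
    # Working on raw bytes keeps the original line endings untouched.
    text = Path(src).read_bytes().decode("utf-8-sig")
    Path(dst).write_bytes(text.encode(encoding))
```

For example, convert_utf8_to_gb("segmented_utf8.txt", "inputCh.txt") would write a GB-encoded copy named inputCh.txt. Both file names here are placeholders.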

Next, create a bat file by copying and pasting the following lines (between the lines of equal signs) into Notepad, and save it as lexparserCh.bat in the same folder as your parser program:

=============================
@echo off
:: Runs the Chinese parser (chineseFactored model) on a file,
:: printing Penn trees and collapsed typed dependencies
:: usage: lexparserCh.bat fileToParse
java -server -mx800m -cp "stanford-parser.jar;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependenciesCollapsed" chineseFactored.ser.gz %1
=============================

Finally, go to the directory where the parser is located, and type the line below:

lexparserCh.bat inputCh.txt >outputCh.txt

Then press Enter to get your result in outputCh.txt.

Quote:
Originally posted by armstrong
But this thread is mainly about the Stanford Parser, not about the tagger.
It's quite similar, actually. Anyway, first create a bat file by copying and pasting the following lines (between the lines of equal signs) into Notepad:

=============================
@echo off
:: To tag a file using the pre-trained bidirectional model
:: usage: postagger.bat (reads input.txt and writes output.txt in the current folder)
java -mx300m -classpath postagger-2006-05-21.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model wsj3t0-18-bidirectional/train-wsj-0-18 -file input.txt >output.txt
=============================

Next, save it as a plain text file named postagger.bat in the same folder as your Stanford POS Tagger program.

Then, save an English text as input.txt in the same folder as the Tagger and postagger.bat.

Finally, go to the folder where the Tagger, postagger.bat and input.txt are located, and double-click postagger.bat to get your result file, output.txt.

To tag another file, simply rename output.txt, and change the content of the input.txt file.
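If you have several files to tag and don't want to keep renaming them, the same command can be driven from a small script. This is only a rough Python sketch under some assumptions: Java is on your PATH, the script is run from the Tagger folder, and the jar and model names are the ones used in the bat file above. The function names tagger_command and tag_file are made up for illustration.

```python
import subprocess


def tagger_command(input_file,
                   jar="postagger-2006-05-21.jar",
                   model="wsj3t0-18-bidirectional/train-wsj-0-18"):
    """Build the same java command line as postagger.bat for one input file."""
    return ["java", "-mx300m", "-classpath", jar,
            "edu.stanford.nlp.tagger.maxent.MaxentTagger",
            "-model", model, "-file", input_file]


def tag_file(input_file, output_file):
    """Run the tagger and save whatever it prints to output_file."""
    with open(output_file, "wb") as out:
        # check=True raises an error if java exits with a failure code
        subprocess.run(tagger_command(input_file), stdout=out, check=True)
```

Calling tag_file("essay1.txt", "essay1_tagged.txt") would then play the role of one double-click on postagger.bat, with the file names passed in instead of fixed. The two file names are placeholders.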

Good luck!

 

The POS Tagger was trained on English texts, though it's said you can train it to tag Chinese texts. However, that may be difficult for many of us. It would be easier to use ICTCLAS_Win.exe to tag your Chinese texts. You can download it under "NLP Tools" in my online storage at:

http://corpuslaohong.ys168.com/
Password: corpus4u
Leave a message there after you get it.

If you do want to tag Chinese texts with Stanford tools, the Stanford Parser can also produce POS information for Chinese texts. Read my instructions on how to parse a Chinese text with the Stanford Parser in the earlier posts.
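The POS tags are the labels sitting directly above each word in the bracketed (penn) trees, so they can be pulled out with a few lines of script. Here is a minimal Python sketch, assuming the usual "(TAG word)" bracket layout. The function name pos_pairs is made up, and the sample tree in the usage note is hand-written for illustration, not real parser output.

```python
import re

# A word-level node looks like "(TAG word)" with no brackets nested inside it.
# The lookbehind skips brackets glued to preceding text, such as the
# typed-dependency lines that follow each tree in the output file.
PRETERMINAL = re.compile(r"(?<![^\s(])\(([^\s()]+)\s+([^\s()]+)\)")


def pos_pairs(penn_tree):
    """Return (word, tag) pairs from a Penn-style bracketed tree string."""
    return [(word, tag) for tag, word in PRETERMINAL.findall(penn_tree)]
```

For a tree such as (ROOT (IP (NP (NT 今天)) (VP (ADVP (AD 真)) (VP (VA 热))) (PU 。))), pos_pairs returns the pairs 今天/NT, 真/AD, 热/VA and 。/PU. Remember that outputCh.txt is in GB encoding, so decode it accordingly when reading it in.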