StanfordParser: Input and Output for Syntactic Parsing



License: CC 4.0 BY-SA


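This article describes how to use StanfordParser for syntactic parsing, including the concrete steps for reading data from the input stream and writing the output to a file. An example shows how English sentences are parsed and how their dependency relations are produced.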


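When StanfordParser (SD) is used for syntactic parsing, it reads its input from a file and writes its output to the output stream by default, as follows: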

At the cmd command prompt, enter:

java -mx150m -cp "*;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz input.txt

Here the input is input.txt and the output goes to stdout by default.

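Sometimes, though, we want to read from the input stream instead and write the output to a file for easier processing. For this, consult the SD FAQ at http://nlp.stanford.edu/software/parser-faq.shtml, which gives the following explanation: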

[Figure: screenshot of the relevant FAQ entry]

As the FAQ shows, SD can read data from the input stream and write its output to a file.

At the cmd command prompt, enter:

java -mx150m -cp "*;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz  -

Once the parser model has loaded, the parser waits for you to type the sentences to analyze, as shown below:

G:\Bioinformatics\TextMining\stanford-parser>java -mx150m -cp "*;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependencies" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz -
Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1.1 sec].
Parsing file: -
The first matrix was generated from pseudo-random draws from a Gaussian distribution.
The second matrix was generated to precisely match the conditions that NMF models.
Parsing [sent. 1 len. 13]: The first matrix was generated from pseudo-random draws from a Gaussian distribution .
(ROOT
  (S
    (NP (DT The) (JJ first) (NN matrix))
    (VP (VBD was)
      (VP (VBN generated)
        (PP (IN from)
          (NP (JJ pseudo-random) (NNS draws)))
        (PP (IN from)
          (NP (DT a) (NNP Gaussian) (NN distribution)))))
    (. .)))

det(matrix-3, The-1)
amod(matrix-3, first-2)
nsubjpass(generated-5, matrix-3)
auxpass(generated-5, was-4)
root(ROOT-0, generated-5)
amod(draws-8, pseudo-random-7)
prep_from(generated-5, draws-8)
det(distribution-12, a-10)
nn(distribution-12, Gaussian-11)
prep_from(generated-5, distribution-12)
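
The typed dependency lines above follow a regular pattern, relation(governor-index, dependent-index), so they are easy to load into a program once the output has been saved. The following is a minimal Python sketch of that idea, not part of the parser itself; the names `Dependency` and `parse_dependencies` are made up for illustration, and the regular expression only assumes the line shape seen in this output.

```python
import re
from typing import List, NamedTuple


class Dependency(NamedTuple):
    """One typed dependency line (hypothetical helper type)."""
    relation: str
    governor: str
    governor_index: int
    dependent: str
    dependent_index: int


# Matches lines such as "amod(draws-8, pseudo-random-7)".
# Words may contain hyphens, so the index is taken from the last "-<digits>".
DEP_LINE = re.compile(r"^([^\s(]+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dependencies(text: str) -> List[Dependency]:
    """Collect every dependency line in text, ignoring all other lines."""
    deps = []
    for line in text.splitlines():
        match = DEP_LINE.match(line.strip())
        if match:
            rel, gov, gov_i, dep, dep_i = match.groups()
            deps.append(Dependency(rel, gov, int(gov_i), dep, int(dep_i)))
    return deps
```

Calling `parse_dependencies` on the saved output skips the bracketed tree lines, because none of them match the pattern, and keeps only the dependency triples.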

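Attentive readers will notice that the last sentence has not been processed yet. A closer look at the FAQ shows why: its final sentence says that the last sentence is only processed once the input stream is closed, or when the option -sentences newline is used (for -sentences newline, see the previous blog post).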
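
The same behaviour can be driven from a script. The following is a rough Python sketch, assuming it is run with the parser's jar files in `parser_dir`; the function names `build_command` and `parse_to_file` and their parameters are hypothetical, while the Java flags are the ones used in the commands in this post. Because `subprocess.run` closes the child's stdin after sending `input`, the final sentence should be processed without any manual step.

```python
import os
import subprocess
from pathlib import Path

PARSER_CLASS = "edu.stanford.nlp.parser.lexparser.LexicalizedParser"
MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"


def build_command(one_sentence_per_line=False):
    """Return the argument list for a parser run that reads from stdin ("-")."""
    cmd = [
        "java", "-mx150m",
        # "*" plus the platform path separator mirrors the "*;" used on Windows.
        "-cp", "*" + os.pathsep,
        PARSER_CLASS,
        "-outputFormat", "penn,typedDependencies",
    ]
    if one_sentence_per_line:
        # Treat each input line as one sentence.
        cmd += ["-sentences", "newline"]
    cmd += [MODEL, "-"]
    return cmd


def parse_to_file(text, out_path, parser_dir=".", one_sentence_per_line=False):
    """Pipe text to the parser's stdin and save its stdout to out_path."""
    with Path(out_path).open("w") as out:
        subprocess.run(
            build_command(one_sentence_per_line),
            input=text,       # sent to stdin, which is then closed
            stdout=out,       # parser output is written to the file
            text=True,
            cwd=parser_dir,   # directory that holds the parser jars
            check=True,
        )
```

A call such as `parse_to_file("The first matrix was generated.\n", "out.txt", parser_dir="stanford-parser")` would then leave the trees and dependencies in `out.txt`; the directory name here is only an example.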

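Closing the input stream in the session above (in Windows cmd, press Ctrl+Z and then Enter) gives the following result: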

-
Parsing [sent. 2 len. 14]: The second matrix was generated to precisely match the conditions that NMF models .
(ROOT
  (S
    (NP (DT The) (JJ second) (NN matrix))
    (VP (VBD was)
      (VP (VBN generated)
        (S
          (VP (TO to)
            (VP
              (ADVP (RB precisely))
              (VB match)
              (NP (DT the) (NNS conditions))
              (NP (DT that) (NNP NMF) (NNS models)))))))
    (. .)))

det(matrix-3, The-1)
amod(matrix-3, second-2)
nsubjpass(generated-5, matrix-3)
xsubj(match-8, matrix-3)
auxpass(generated-5, was-4)
root(ROOT-0, generated-5)
aux(match-8, to-6)
advmod(match-8, precisely-7)
xcomp(generated-5, match-8)
det(conditions-10, the-9)
iobj(match-8, conditions-10)
det(models-13, that-11)
nn(models-13, NMF-12)
dobj(match-8, models-13)
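
The bracketed trees in the output can be post-processed in a similar way. The following is a minimal Python sketch, assuming only the bracket layout shown above, that pulls the part-of-speech tag and word of every leaf out of a tree; the name `tagged_tokens` is made up for illustration.

```python
import re
from typing import List, Tuple

# A leaf looks like "(NN matrix)": a tag and a word with no nested bracket.
# Phrase nodes such as "(VP (VBD was)" are skipped because "(" follows the label.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def tagged_tokens(tree_text: str) -> List[Tuple[str, str]]:
    """Return (tag, word) pairs for the leaves of a bracketed tree, in order."""
    return [(m.group(1), m.group(2)) for m in LEAF.finditer(tree_text)]
```

Applied to the tree of the second sentence, this should yield 14 pairs, which agrees with the `len. 14` reported by the parser.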