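When using the Stanford NER tool, the model has to be loaded every time the program runs, which is very time-consuming. It would be better to load the model once and use it many times.
This is possible, and the official distribution provides an API for it.
English usage
// Start the server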
java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.conll.4class.distsim.crf.ser.gz -port 2314 -outputFormat inlineXML
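// The English 4-class model currently gives the best recognition results; -port sets the port. The command can also be run in the background with nohup
// Start the client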
java -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -port 2314 -client
// Using the API
import edu.stanford.nlp.ie.NERServer.NERClient; // nested class of NERServer
NERClient.communicateWithNERServer(String host, int port, String charset, BufferedReader input, BufferedWriter output, boolean closeOnBlank)
// host: IP address or host name of the server; port: server port
// charset: character encoding; closeOnBlank: stop after a blank input line
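To make the role of these parameters concrete, here is a minimal sketch of what a client does on the wire, written with plain java.net sockets. It assumes that each request is a single line sent over its own connection and that the server closes the connection after replying, which is how the official client appears to behave. RawNerClient and tagLine are hypothetical names, not part of Stanford NER.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Stanford NER.
final class RawNerClient {
    /**
     * Sends one line of text to the server and returns everything it writes back.
     * Assumption: the server handles one line per connection and then closes the socket.
     */
    static String tagLine(String host, int port, String charset, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), charset), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), charset))) {
            out.println(line); // autoflush is on, so the request is sent immediately
            StringBuilder reply = new StringBuilder();
            // Read until the server closes the connection
            for (String part; (part = in.readLine()) != null; ) {
                if (reply.length() > 0) {
                    reply.append('\n');
                }
                reply.append(part);
            }
            return reply.toString();
        }
    }
}
```

With the server from above running, a call such as RawNerClient.tagLine("localhost", 2314, "UTF-8", "some sentence") should return the tagged line. For real use, the official NERClient is the safer choice.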
// Processing a string (text is the String to be tagged)
StringReader sr = new StringReader(text);
BufferedReader br = new BufferedReader(sr);
StringWriter sw = new StringWriter();
BufferedWriter bw = new BufferedWriter(sw);
NERClient.communicateWithNERServer("localhost", 2314, "UTF-8", br, bw, true);
bw.close();
br.close();
String result = sw.toString();
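Since the server was started with -outputFormat inlineXML, the result string contains the entities wrapped in tags named after their class. The following is a rough sketch of pulling them out with a regular expression. It assumes tag names consist of upper-case letters (for example PERSON or LOCATION) and that tags are not nested; it is not a full XML parser. InlineXmlEntities and extract are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Stanford NER.
final class InlineXmlEntities {
    // Matches <TAG>text</TAG> where the closing tag repeats the opening one
    private static final Pattern ENTITY = Pattern.compile("<([A-Z_]+)>(.*?)</\\1>");

    /** Returns entries of the form "TAG: text" in order of appearance. */
    static List<String> extract(String inlineXml) {
        List<String> entities = new ArrayList<>();
        Matcher m = ENTITY.matcher(inlineXml);
        while (m.find()) {
            entities.add(m.group(1) + ": " + m.group(2));
        }
        return entities;
    }
}
```

Calling InlineXmlEntities.extract(result) on the string obtained above gives one entry per recognized entity, and an empty list if nothing was tagged.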
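Usage with other languages
Starting the server the same way for other languages produces errors. My initial guess is that the model configuration is the cause.
So the server launcher has to be written by hand. Taking Chinese as an example:
Chinese usage
// Copied from the official implementation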
package com.li.cnServer;
import java.io.IOException;
import java.util.Properties;
import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.NERServer;
import edu.stanford.nlp.ie.crf.CRFClassifier;
/**
 * Starts an NERServer with the Chinese model.
 * The logic follows the main method of the official NERServer.
 */
public class App
{
    public static void main( String[] args ) throws ClassCastException, ClassNotFoundException, IOException
    {
        Properties props = new Properties();
        // Model to load (resolved from the classpath, i.e. the models-chinese jar) and port to listen on
        props.setProperty("loadClassifier", "edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz");
        props.setProperty("port", "2310");
        String loadFile = props.getProperty("loadClassifier");
        String loadJarFile = props.getProperty("loadJarClassifier");
        String client = props.getProperty("client");
        String portStr = props.getProperty("port", "4465");
        props.remove("port"); // so later code doesn't complain
        if (portStr == null || portStr.equals("")) {
            // System.err.println(USAGE);
            return;
        }
        String charset = "utf-8";
        String encoding = props.getProperty("encoding");
        if (encoding != null && ! "".equals(encoding)) {
            charset = encoding;
        }
        int port;
        try {
            port = Integer.parseInt(portStr);
        } catch (NumberFormatException e) {
            System.err.println("Non-numerical port");
            // System.err.println(USAGE);
            return;
        }
        // default output format for if no output format is specified
        if (props.getProperty("outputFormat") == null) {
            props.setProperty("outputFormat", "inlineXML");
        }
        if (client != null && ! client.equals("")) {
            // run a test client for illustration/testing (disabled in this launcher)
            String host = props.getProperty("host");
            // NERClient.communicateWithNERServer(host, port, charset);
        } else {
            // load the classifier first, then serve it on the given port
            AbstractSequenceClassifier asc;
            if (loadFile != null && ! loadFile.equals("")) {
                asc = CRFClassifier.getClassifier(loadFile, props);
            } else if (loadJarFile != null && ! loadJarFile.equals("")) {
                asc = CRFClassifier.getJarClassifier(loadJarFile, props);
            } else {
                asc = CRFClassifier.getDefaultClassifier(props);
            }
            new NERServer(port, asc, charset).run();
        }
    }
}
Build the jar with mvn
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.li</groupId>
  <artifactId>cnServer</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>cnServer</name>
  <url>http://maven.apache.org</url>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.6</version>
        <configuration>
          <archive>
            <manifest>
              <!-- Main class; change it to suit your own project -->
              <mainClass>com.li.cnServer.App</mainClass>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.10</version>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <outputDirectory>${project.build.directory}/lib</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.6.0</version>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.6.0</version>
      <classifier>models-chinese</classifier>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
// Start the server (run from the target directory, where the jar sits next to its lib/ folder)
java -jar cnServer-0.0.1-SNAPSHOT.jar
After this, the client is used in exactly the same way as for English.
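Because loading the model takes a while, a client started right after the server may find the port still closed. The sketch below polls the port until a connection succeeds. It assumes that the server opens its port only after the classifier has been loaded, which matches the order of the calls in App above, and that the server tolerates a probe connection that closes without sending anything. PortWaiter and waitForPort are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Stanford NER.
final class PortWaiter {
    /**
     * Polls host:port until a TCP connection succeeds or the timeout expires.
     * Returns true if the port accepted a connection in time, false otherwise.
     */
    static boolean waitForPort(String host, int port, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket probe = new Socket()) {
                // Give each attempt at most 500 ms to connect
                probe.connect(new InetSocketAddress(host, port), 500);
                return true;
            } catch (IOException notReadyYet) {
                Thread.sleep(200); // server not listening yet; retry shortly
            }
        }
        return false;
    }
}
```

For the Chinese server above, PortWaiter.waitForPort("localhost", 2310, 60000) would wait up to one minute before the client gives up.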
