Splitting Text into Semantic Units (Sentences) in Java

2401_89793006

Published 2025-04-28 15:39:37

License: CC 4.0 BY-SA

Categories: java, artificial intelligence. Tags: java, python, programming languages

Article link: https://blog.youkuaiyun.com/2401_89793006/article/details/147588918

Goal

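Split a passage of text into units of one complete sentence each. For example, a span that ends with a period, a question mark, or an exclamation mark is treated as one sentence.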
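To make the goal concrete, here is a minimal sketch that uses only the JDK, not the CoreNLP approach described in the rest of this post. The class name `SimpleSentenceSplitter` and the method `split` are hypothetical, and the set of sentence-ending marks (。？！ plus ASCII ? and !) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleSentenceSplitter {
    // Assumption: a sentence is a run of non-terminator characters followed by
    // zero or more terminators. The ASCII '.' is deliberately left out of this
    // sketch because it also appears in numbers and abbreviations.
    private static final Pattern SENTENCE =
            Pattern.compile("[^。？！?!]+[。？！?!]*");

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher matcher = SENTENCE.matcher(text);
        while (matcher.find()) {
            // Drop surrounding whitespace and skip whitespace-only matches
            String sentence = matcher.group().strip();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "你好吗？我今天去了公园。天气真好啊！";
        // Prints one sentence per line
        split(text).forEach(System.out::println);
    }
}
```

A rule like this only looks at punctuation, so it will mis-split quoted speech or unusual punctuation; it is meant to illustrate the target behavior rather than replace a real tokenizer.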

Approach

Splitting with StanfordCoreNLP

Add the dependency

        <dependency>
            <groupId>edu.stanford.nlp</groupId>
            <artifactId>stanford-corenlp</artifactId>
            <version>4.5.4</version>
        </dependency>

Test


import edu.stanford.nlp.pipeline.*;
import java.util.Properties;

public class CoreNLPSentenceSplitter {
    public static void main(String[] args) {
        // Configure the pipeline: tokenization and sentence splitting only
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit");

        // Build the pipeline
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // Create the document from the input text
        String text = """
                你好吗？我今天去了公园。你知道公园在哪里吗？天气真好啊！你喜欢什么运动？
                """;
        CoreDocument document = new CoreDocument(text);

        // Run the annotators over the text
        pipeline.annotate(document);

        // Print each detected sentence on its own line
        for (CoreSentence sentence : document.sentences()) {
            System.out.println(sentence.text());
        }
    }
}

Output

你好吗？
我今天去了公园。
你知道公园在哪里吗？
天气真好啊！
你喜欢什么运动？

Notes

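Here StanfordCoreNLP is used only to split the text into sentences. Syntactic parsing would additionally require the dependency for the corresponding language models. Since that feature is not used here, the models dependency is left out for now.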