Preface
This article looks at how to read documents with langchain4j and Apache POI.
Steps
pom.xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-document-parser-apache-poi</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
example
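import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentParser;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.parser.apache.poi.ApachePoiDocumentParser;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class POITest {
    // Assumes SLF4J is on the classpath; the original snippet used an undeclared `log`
    private static final Logger log = LoggerFactory.getLogger(POITest.class);

    public static void main(String[] args) {
        String path = System.getProperty("user.home") + "/downloads/tmp.xlsx";
        DocumentParser parser = new ApachePoiDocumentParser();
        Document document = FileSystemDocumentLoader.loadDocument(path, parser);
        log.info("textSegment:{}", document.toTextSegment());
        log.info("meta data:{}", document.metadata().toMap());
        log.info("text:{}", document.text());
    }
}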
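Given the file path, ApachePoiDocumentParser parses the file and the loader returns a Document object. The Document can be converted to a TextSegment, which can then be embedded and used together with a vector database: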
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel();
TextSegment segment1 = document.toTextSegment();
Embedding embedding1 = embeddingModel.embed(segment1).content();
embe
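To make the idea of pairing embedded text segments with a vector database more concrete, here is a minimal, library-free sketch of what such a store does conceptually: it keeps each vector next to its source text and returns the text whose vector is closest to a query by cosine similarity. The class `ToyVectorStore` and its methods are hypothetical names invented for this illustration. They are not part of langchain4j, and the hand-written vectors stand in for real output from an embedding model.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory vector store: a hypothetical illustration, not a langchain4j class.
class ToyVectorStore {
    // One stored entry: an embedding vector plus the text it was computed from.
    private static final class Entry {
        final float[] vector;
        final String text;

        Entry(float[] vector, String text) {
            this.vector = vector;
            this.text = text;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Store a copy of the vector together with its source text.
    void add(float[] vector, String text) {
        entries.add(new Entry(vector.clone(), text));
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the stored text most similar to the query, or null if the store is empty.
    String findNearest(float[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Entry e : entries) {
            double score = cosine(query, e.vector);
            if (score > bestScore) {
                bestScore = score;
                best = e.text;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyVectorStore store = new ToyVectorStore();
        // Hand-written 3-dimensional vectors stand in for real model output.
        store.add(new float[]{1f, 0f, 0f}, "sheet about sales");
        store.add(new float[]{0f, 1f, 0f}, "sheet about inventory");
        // prints "sheet about sales"
        System.out.println(store.findNearest(new float[]{0.9f, 0.1f, 0f}));
    }
}
```

In a real setup the vectors would come from the embedding model and the store would be an actual vector database. The linear scan here is only meant to show the lookup principle.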
