tika提取pdf信息异常,tika提取pdf信息org.apache.tika.sax.WriteOutContentHandler$WriteLimitReachedException: Your document contained more than 100000 characters, and so your requested limit has been reached. To receive the full text of the document, increase your limit. (Text up to the limit is however available).
at org.apache.tika.sax.WriteOutContentHandler.characters(WriteOutContentHandler.java:141)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.xpath.MatchingContentHandler.characters(MatchingContentHandler.java:85)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHand

在使用Apache Tika提取PDF信息时遇到字符限制错误:'Your document contained more than 100000 characters...'。通过调整BodyContentHandler构造函数的参数,增大字符限制,成功获取PDF的元数据信息,包括dc:subject、Creation-Date、dc:creator等。
最低0.47元/天 解锁文章
1621

被折叠的 条评论
为什么被折叠?



