08 - Optional Dependencies and Dependency Exclusions

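This article explains how to use optional dependencies in Maven to save space and keep unrelated libraries out of a build, and how to use dependency exclusions to control the classpath when a project is built. It also discusses why exclusions are declared per dependency rather than globally, and how to configure the banned dependencies rule to make sure a specific dependency never appears on the classpath.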

Introduction

This section discusses optional dependencies and dependency exclusions. This will help users to understand what they are and when and how to use them. It also explains why exclusions are made on a per dependency basis instead of at the POM level.

Optional Dependencies

Optional dependencies are used when it’s not possible (for whatever reason) to split a project into sub-modules. The idea is that some of the dependencies are only used for certain features in the project and will not be needed if that feature isn’t used. Ideally, such a feature would be split into a sub-module that depends on the core functionality project. This new subproject would have only non-optional dependencies, since you’d need them all if you decided to use the subproject’s functionality.

However, since the project cannot be split up (again, for whatever reason), these dependencies are declared optional. If a user wants to use functionality related to an optional dependency, they have to redeclare that optional dependency in their own project. This is not the clearest way to handle this situation, but both optional dependencies and dependency exclusions are stop-gap solutions.

Why use optional dependencies?

Optional dependencies save space and memory. They prevent problematic jars that violate a license agreement or cause classpath issues from being bundled into a WAR, EAR, fat jar, or the like.

How do I use the optional tag?

A dependency is declared optional by setting the <optional> element to true in its dependency declaration:

<project>
  ...
  <dependencies>
    <!-- declare the dependency to be set as optional -->
    <dependency>
      <groupId>sample.ProjectA</groupId>
      <artifactId>Project-A</artifactId>
      <version>1.0</version>
      <scope>compile</scope>
      <optional>true</optional> <!-- value will be true or false only -->
    </dependency>
  </dependencies>
</project>

How do optional dependencies work?

Project-A -> Project-B

The diagram above says that Project-A depends on Project-B. When A declares B as an optional dependency in its POM, this relationship remains unchanged. It’s just like a normal build where Project-B will be added to Project-A’s classpath.

Project-X -> Project-A

When another project (Project-X) declares Project-A as a dependency in its POM, the optional nature of the dependency takes effect. Project-B is not included in the classpath of Project-X. You need to declare it directly in the POM of Project-X for B to be included in X’s classpath.
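
A minimal sketch of what this looks like in Project-X’s POM, reusing the coordinates from the examples in this article. The sample.ProjectX groupId and the version numbers are placeholders chosen for illustration:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>sample.ProjectX</groupId> <!-- hypothetical coordinates -->
  <artifactId>Project-X</artifactId>
  <version>1.0-SNAPSHOT</version>
  ...
  <dependencies>
    <!-- brings in Project-A, but not its optional dependency Project-B -->
    <dependency>
      <groupId>sample.ProjectA</groupId>
      <artifactId>Project-A</artifactId>
      <version>1.0</version>
    </dependency>
    <!-- redeclared here because Project-X uses the feature that needs Project-B -->
    <dependency>
      <groupId>sample.ProjectB</groupId>
      <artifactId>Project-B</artifactId>
      <version>1.0</version>
    </dependency>
  </dependencies>
</project>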

Example

Suppose there is a project named X2 that has similar functionality to Hibernate. It supports many databases such as MySQL, PostgreSQL, and several versions of Oracle. Each supported database requires an additional dependency on a driver jar. All of these dependencies are needed at compile time to build X2. However, your project only uses one specific database and doesn’t need drivers for the others. X2 can declare these dependencies as optional, so that when your project declares X2 as a direct dependency in its POM, the drivers supported by X2 are not automatically included in your project’s classpath. Your project will have to include an explicit dependency on the specific driver for the one database it does use.
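
As a rough sketch of this scenario, X2’s POM might mark each driver as optional. The driver coordinates below are real artifacts, but the choice of drivers and the version numbers are assumptions made for illustration:

<dependencies>
  <dependency>
    <groupId>com.mysql</groupId>
    <artifactId>mysql-connector-j</artifactId>
    <version>8.0.33</version> <!-- illustrative version -->
    <optional>true</optional>
  </dependency>
  <dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.6.0</version> <!-- illustrative version -->
    <optional>true</optional>
  </dependency>
</dependencies>

A project that uses X2 with PostgreSQL would then declare X2 together with that one driver. The sample.X2 coordinates are hypothetical:

<dependencies>
  <dependency>
    <groupId>sample.X2</groupId> <!-- hypothetical coordinates -->
    <artifactId>X2</artifactId>
    <version>1.0</version>
  </dependency>
  <!-- the only driver this project needs; the MySQL driver stays off the classpath -->
  <dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.6.0</version>
  </dependency>
</dependencies>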

Dependency Exclusions

Since Maven resolves dependencies transitively, it is possible for unwanted dependencies to be included in your project’s classpath. For example, a certain older jar may have security issues or be incompatible with the Java version you’re using. To address this, Maven allows you to exclude specific dependencies. Exclusions are set on a specific dependency in your POM, and are targeted at a specific groupId and artifactId. When you build your project, that artifact will not be added to your project’s classpath by way of the dependency in which the exclusion was declared.

How to use dependency exclusions

Add an <exclusions> element in the <dependency> element by which the problematic jar is included.

<project>
  ...
  <dependencies>
    <dependency>
      <groupId>sample.ProjectA</groupId>
      <artifactId>Project-A</artifactId>
      <version>1.0</version>
      <scope>compile</scope>
      <exclusions>
        <exclusion>  <!-- declare the exclusion here -->
          <groupId>sample.ProjectB</groupId>
          <artifactId>Project-B</artifactId>
        </exclusion>
      </exclusions> 
    </dependency>
  </dependencies>
</project>

How dependency exclusion works and when to use it (as a last resort!)

Project-A
   -> Project-B
        -> Project-D <!-- This dependency should be excluded -->
              -> Project-E
              -> Project-F
   -> Project-C

The diagram shows that Project-A depends on both Project-B and Project-C. Project-B depends on Project-D. Project-D depends on both Project-E and Project-F. By default, Project-A’s classpath will include:

B, C, D, E, F

Suppose you don’t want Project-D and its dependencies to be added to Project-A’s classpath because some of Project-D’s dependencies are missing from the repository, and you don’t need the functionality in Project-B that depends on Project-D anyway. Project-B’s developers could have marked the dependency on Project-D <optional>true</optional>:

<dependency>
  <groupId>sample.ProjectD</groupId>
  <artifactId>Project-D</artifactId>
  <version>1.0-SNAPSHOT</version>
  <optional>true</optional>
</dependency>

Unfortunately, they didn’t. As a last resort, you can exclude it in your own POM for Project-A like this:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>sample.ProjectA</groupId>
  <artifactId>Project-A</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  ...
  <dependencies>
    <dependency>
      <groupId>sample.ProjectB</groupId>
      <artifactId>Project-B</artifactId>
      <version>1.0-SNAPSHOT</version>
      <exclusions>
        <exclusion>
          <groupId>sample.ProjectD</groupId> <!-- Exclude Project-D from Project-B -->
          <artifactId>Project-D</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>

If you deploy Project-A to a repository, and Project-X declares a normal dependency on Project-A, will Project-D still be excluded from the classpath?

Project-X -> Project-A

The answer is Yes. Project-A has declared that it doesn’t need Project-D to run, so it won’t be brought in as a transitive dependency of Project-A.

Now, consider that Project-X depends on Project-Y, as in the diagram below:

Project-X -> Project-Y
               -> Project-B
                    -> Project-D
                       ...

Project-Y also has a dependency on Project-B, and it does need the features supported by Project-D. Therefore, it will NOT place an exclusion on Project-D in its dependency list. It may also supply an additional repository, from which it can resolve Project-E. In this case, it’s important that Project-D is not excluded globally, since it is a legitimate dependency of Project-Y.
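
For comparison, a sketch of what Project-Y’s POM could look like. The sample.ProjectY coordinates, the repository id, and the repository URL are all hypothetical:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>sample.ProjectY</groupId> <!-- hypothetical coordinates -->
  <artifactId>Project-Y</artifactId>
  <version>1.0-SNAPSHOT</version>
  ...
  <repositories>
    <!-- hypothetical extra repository from which Project-E can be resolved -->
    <repository>
      <id>extra-repo</id>
      <url>https://repo.example.com/maven2</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>sample.ProjectB</groupId>
      <artifactId>Project-B</artifactId>
      <version>1.0-SNAPSHOT</version>
      <!-- no exclusions: Project-D and its dependencies stay on the classpath -->
    </dependency>
  </dependencies>
</project>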

As another scenario, suppose the dependency you don’t want is Project-E instead of Project-D. How do you exclude it? See the diagram below:

Project-A
   -> Project-B
        -> Project-D 
              -> Project-E <!-- Exclude this dependency -->
              -> Project-F
   -> Project-C

Exclusions work on the entire dependency graph below the point where they are declared. To exclude Project-E instead of Project-D, simply change the exclusion to point at Project-E; you do not move the exclusion down to Project-D, because you cannot change Project-D’s POM. If you could, you would use optional dependencies instead of exclusions, or split Project-D up into multiple subprojects, each with nothing but normal dependencies.

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>sample.ProjectA</groupId>
  <artifactId>Project-A</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>
  ...
  <dependencies>
    <dependency>
      <groupId>sample.ProjectB</groupId>
      <artifactId>Project-B</artifactId>
      <version>1.0-SNAPSHOT</version>
      <exclusions>
        <exclusion>
          <groupId>sample.ProjectE</groupId> <!-- Exclude Project-E from Project-B -->
          <artifactId>Project-E</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>

Why exclusions are made on a per-dependency basis, rather than at the POM level

This is mainly to be sure the dependency graph is predictable, and to keep inheritance effects from excluding a dependency that should not be excluded. If you get to the method of last resort and have to put in an exclusion, you should be absolutely certain which of your dependencies is bringing in that unwanted transitive dependency.

If you truly want to ensure that a particular dependency appears nowhere in your classpath, regardless of path, the banned dependencies rule can be configured to fail the build if a problematic dependency is found. When the build fails, you’ll need to add specific exclusions on each path the enforcer finds.
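
For instance, if the enforcer reports that the unwanted artifact arrives through two different dependencies, each of them gets its own exclusion. This sketch assumes, purely for illustration, that both Project-B and Project-C pull in Project-D; the sample.ProjectC coordinates and the version numbers are hypothetical:

<dependencies>
  <dependency>
    <groupId>sample.ProjectB</groupId>
    <artifactId>Project-B</artifactId>
    <version>1.0-SNAPSHOT</version>
    <exclusions>
      <exclusion> <!-- first path: Project-B -> Project-D -->
        <groupId>sample.ProjectD</groupId>
        <artifactId>Project-D</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>sample.ProjectC</groupId> <!-- hypothetical coordinates -->
    <artifactId>Project-C</artifactId>
    <version>1.0-SNAPSHOT</version>
    <exclusions>
      <exclusion> <!-- second path: Project-C -> Project-D -->
        <groupId>sample.ProjectD</groupId>
        <artifactId>Project-D</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>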

Banned Dependencies

This rule checks the dependencies and fails if any of the matching excludes are found.

The following parameters are supported by this rule:

  • searchTransitive - whether transitive dependencies should be checked. Default is true.

  • excludes - a list of artifacts to ban. The format is groupId[:artifactId][:version][:type][:scope][:classifier] where artifactId, version, type, scope and classifier are optional. Wildcards may be used to replace an entire section or just part of one. Examples:

    • org.apache.maven
    • org.apache.maven:badArtifact
    • org.apache.maven:artifact:badVersion
    • org.apache.maven:*:1.2 (exclude version 1.2 and above, equivalent to [1.2,) )
    • org.apache.maven:*:[1.2] (explicit exclude of version 1.2)
    • org.apache.maven:*:*:jar:test
    • *:*:*:jar:compile:tests
    • org.apache.*:maven-*:*
  • includes - a list of artifacts to include. These are exceptions to the excludes. It is meant to allow wide exclusion rules with wildcards and fine-tuning using includes. If nothing has been excluded, then the includes have no effect. In other words, includes only subtract from artifacts that matched an exclude rule.

    For example, to ban all xerces except xerces-api you would exclude “xerces” (groupId) and include “xerces:xerces-api”

  • message - an optional message to the user if the rule fails.
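
The xerces example above could be written as the following rule fragment, placed inside the <rules> element of the maven-enforcer-plugin configuration. The message text is only an illustration:

<bannedDependencies>
  <excludes>
    <!-- ban every artifact in the xerces groupId -->
    <exclude>xerces</exclude>
  </excludes>
  <includes>
    <!-- exception: xerces-api remains allowed -->
    <include>xerces:xerces-api</include>
  </includes>
  <message>Only xerces:xerces-api may be used from the xerces group.</message>
</bannedDependencies>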

Sample Plugin Configuration:

<project>
  [...]
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>3.3.0</version>
        <executions>
          <execution>
            <id>enforce-banned-dependencies</id>
            <goals>
              <goal>enforce</goal>
            </goals>
            <configuration>
              <rules>
                <bannedDependencies>
                  <excludes>
                    <exclude>org.apache.maven</exclude>
                    <exclude>org.apache.maven:badArtifact</exclude>
                    <exclude>*:badArtifact</exclude>
                  </excludes>
                  <includes>
                    <!--only 1.0 of badArtifact is allowed-->
                    <include>org.apache.maven:badArtifact:1.0</include>
                  </includes>
                </bannedDependencies>
              </rules>
              <fail>true</fail>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  [...]
</project>

Example plugin configuration which ignores transitive dependencies:

<project>
  [...]
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-enforcer-plugin</artifactId>
        <version>3.3.0</version>
        <executions>
          <execution>
            <id>enforce-banned-dependencies</id>
            <goals>
              <goal>enforce</goal>
            </goals>
            <configuration>
              <rules>
                <bannedDependencies>
                  <excludes>
                    <exclude>commons-lang:commons-lang</exclude>
                  </excludes>
                  <searchTransitive>false</searchTransitive>
                </bannedDependencies>
              </rules>
              <fail>true</fail>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  [...]
</project>