Learning notes: local deployment of Spring AI + Ollama + DeepSeek R1 + RAG with a vector database

Author: 退休的梦想

Last modified: 2025-03-07 13:36:24

First published: 2025-03-07 08:41:10

Article link: https://blog.youkuaiyun.com/qq_34960590/article/details/145647899

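1. Introduction

1. This article shows how to deploy Spring AI + Ollama + DeepSeek R1 locally and build a chat demo.

1. About Spring AI

Spring AI is an application framework for AI engineering. Its goal is to apply the design principles of the Spring ecosystem, such as portability and modular design, to the AI domain, and to promote POJOs as the building blocks of AI applications.

What it offers:

1. With Spring AI you only call the large-model interface; you do not need to know how it is implemented internally.

2. RAG (retrieval-augmented generation): documents and knowledge are stored in a vector database. Converting the text into vectors for storage is called embedding. The model can then combine what is retrieved from the vector database with its own output when answering.

Redis, Elasticsearch and others can serve as the vector database.

3. Fine-tuning: for vertical domains (specific industries), the model is trained separately on the knowledge and material of one field.

4. Function calling: define a Java function interface and use annotations to describe its input parameters, its output, and what it does.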

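2. About Ollama

Official site: https://ollama.com/
1> Ollama is a tool for deploying and running open-source large models (78 models were supported at the time of writing).
2> It lets users get large models running locally with very little setup: a few commands are enough to run an open-source model such as Llama 2.
3> In short, Ollama is a deployment and runtime tool in which many different models can be run, which makes it easy for developers to set up a local large-model environment.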

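2. Installing Ollama

1. Download: https://ollama.com/download

Note: how well Ollama runs depends on the size of the model in use.
1> For example, running a 7B (7 billion parameter) model needs at least 8GB of free RAM, a 13B model needs 16GB, and a 33B model needs 32GB.
2> Make sure there is enough disk space. Model files can be large, so reserve at least 50GB for Ollama and its models.
3> A faster CPU gives better throughput, and multi-core processors handle parallel work better, so choose a CPU with enough cores.
4> GPU: Ollama can run on the CPU alone, but if the machine has an NVIDIA GPU it can be used for acceleration to improve speed and performance.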
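The RAM figures above can be sanity-checked with simple arithmetic: the weights alone take roughly (number of parameters) × (bits per weight) / 8 bytes. The sketch below is only a rough estimate. The class and method names are made up for illustration, the 4-bit case assumes a quantized model, and runtime overhead such as the context cache is ignored.

```java
/** Rough estimate of the memory taken by model weights alone (hypothetical helper). */
final class ModelMemoryEstimate {

    /**
     * @param billionParams parameter count in billions, e.g. 7 for a 7B model
     * @param bitsPerWeight storage size of one weight, e.g. 4 for 4-bit quantization, 16 for FP16
     * @return approximate weight size in gigabytes (1 GB = 10^9 bytes)
     */
    static double weightGigabytes(double billionParams, int bitsPerWeight) {
        // (billionParams * 1e9 weights) * (bitsPerWeight / 8 bytes) / (1e9 bytes per GB)
        return billionParams * bitsPerWeight / 8.0;
    }

    public static void main(String[] args) {
        for (double size : new double[] {7, 13, 33}) {
            System.out.printf("%.0fB: ~%.1f GB at 4-bit, ~%.1f GB at 16-bit%n",
                    size, weightGigabytes(size, 4), weightGigabytes(size, 16));
        }
    }
}
```

By this estimate a 7B model at 4 bits per weight needs about 3.5 GB for its weights, which leaves headroom inside the 8GB guideline for the runtime and the conversation context.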

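1. Download and install

1. After the download completes, run OllamaSetup.exe to install Ollama. Note: run it as administrator.

2. Verify the installation: ollama --version

2. Installing the DeepSeek R1 model

1. Click Models on the Ollama site.

2. Many models are listed and you can pick any of them. This article uses deepseek-r1.

3. Choose a parameter size that matches your machine.

4. Hardware requirements: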

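1. DeepSeek-R1-1.5B
CPU: at least 4 cores (a multi-core Intel/AMD processor is recommended)

Memory: 8GB+

Disk: 3GB+ of storage (the model file is about 1.5-2GB)

GPU: not required (CPU-only inference); for GPU acceleration, 4GB+ of VRAM is optional (e.g., GTX 1650)

Use cases:

Deployment on low-resource devices (e.g., Raspberry Pi, older laptops)

Real-time text generation (chatbots, simple Q&A)

Embedded systems or IoT devices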

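2. DeepSeek-R1-7B
CPU: 8 cores or more (a modern multi-core CPU is recommended)

Memory: 16GB+

Disk: 8GB+ (the model file is about 4-5GB)

GPU: 8GB+ of VRAM recommended (e.g., RTX 3070/4060)

Use cases:

Local development and testing (small and medium-sized businesses)

NLP tasks of moderate complexity (text summarization, translation)

Lightweight multi-turn dialogue systems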

3. DeepSeek-R1-8B
Hardware: close to 7B, about 10-20% higher

Use cases:

Lightweight tasks that need higher accuracy (e.g., code generation, logical reasoning)

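4. DeepSeek-R1-14B
CPU: 12 cores or more

Memory: 32GB+

Disk: 15GB+

GPU: 16GB+ of VRAM (e.g., RTX 4090 or A5000)

Use cases:

Complex enterprise tasks (contract analysis, report generation)

Long-text understanding and generation (writing assistance for books and papers)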

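5. DeepSeek-R1-32B
CPU: 16 cores or more (e.g., AMD Ryzen 9 or Intel i9)

Memory: 64GB+

Disk: 30GB+

GPU: 24GB+ of VRAM (e.g., A100 40GB or dual RTX 3090)

Use cases:

High-accuracy tasks in specialist domains (medical or legal consulting)

Preprocessing for multimodal tasks (in combination with other frameworks)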

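6. DeepSeek-R1-70B
CPU: 32 cores or more (server-class CPU)

Memory: 128GB+

Disk: 70GB+

GPU: multiple cards in parallel (e.g., 2x A100 80GB or 4x RTX 4090)

Use cases:

Research institutions and large enterprises (financial forecasting, large-scale data analysis)

Highly complex generation tasks (creative writing, algorithm design)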

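7. DeepSeek-R1-671B
CPU: 64 cores or more (server cluster)

Memory: 512GB+

Disk: 300GB+

GPU: multi-node distributed setup (e.g., 8x A100/H100)

Use cases:

National-scale or very large-scale AI research (e.g., climate modeling, genome analysis)

Exploration of artificial general intelligence (AGI)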

5. Install the model

Open cmd and paste the install command copied from the model page:

C:\Users\Admin>ollama run deepseek-r1:8b
pulling manifest
pulling aabd4debf0c8... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 GB
pulling 369ca498f347... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████▏  387 B
pulling 6e4c38e1172f... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling f4d24e9138dd... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████▏  148 B
pulling a85fe2a2e58e... 100% ▕███████████████████████████████████████████████████████████████████████████████████████████▏  487 B
verifying sha256 digest
writing manifest
success
>>> Send a message (/? for help)

1. Once the prompt appears, the installation is complete and you can type questions.

List the installed models with: ollama list
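Besides the interactive prompt, Ollama also serves an HTTP API on port 11434, which is what Spring AI talks to later on. As a minimal sketch, assuming the server runs on 127.0.0.1 and the deepseek-r1:8b model has been pulled, the following plain-JDK code posts a prompt to the /api/generate endpoint. The class and helper names are illustrative, and the hand-rolled JSON escaping covers only common cases.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative sketch: send one prompt to a local Ollama server without any framework. */
final class OllamaGenerateSketch {

    /** Escapes backslashes, quotes and common control characters for a JSON string value. */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t");
    }

    /** Builds the request body; "stream": false asks for one complete JSON answer. */
    static String buildBody(String model, String prompt) {
        return "{\"model\":\"" + jsonEscape(model)
                + "\",\"prompt\":\"" + jsonEscape(prompt)
                + "\",\"stream\":false}";
    }

    static HttpRequest buildRequest(String baseUrl, String model, String prompt) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(model, prompt)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires a running Ollama server; prints the raw JSON reply.
        HttpRequest request = buildRequest("http://127.0.0.1:11434", "deepseek-r1:8b", "Hello");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

A real client would use a JSON library rather than string concatenation. The point here is only that the model is reachable over plain HTTP once `ollama run` has succeeded.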

3. Creating the project

1. Create a Spring Boot project

This step is not demonstrated here.

2. Configuration

Maven configuration

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>spring-ai-ollama-demo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.3.2</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <springboot.version>3.3.2</springboot.version>
        <sping-ai.version>1.0.0-M5</sping-ai.version>
    </properties>


    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>


    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.ai</groupId>
                <artifactId>spring-ai-bom</artifactId>
                <version>${sping-ai.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
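    </dependencyManagement>

    <!-- Milestone versions such as 1.0.0-M5 are published to the Spring milestone repository, not Maven Central -->
    <repositories>
        <repository>
            <id>spring-milestones</id>
            <name>Spring Milestones</name>
            <url>https://repo.spring.io/milestone</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>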

</project>

YAML configuration

server:
  port: 8080
  servlet:
    context-path: /spring-ai-ollama-demo
spring:
  application:
    name: spring-ai-ollama-demo
  ai:
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        model: deepseek-r1:8b

3. Injecting the ChatClient

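import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@RequiredArgsConstructor
public class AiConfig {

    final OllamaChatModel ollamaChatModel;

    @Bean
    ChatClient chatClient(ChatMemory chatMemory) {
        return ChatClient.builder(ollamaChatModel)
                // The system role sets the AI's behavior, persona and background, i.e. the context it works in
                .defaultSystem("You are a doctor")
                // Keep the conversation history in local memory so the model sees the earlier context
                .defaultAdvisors(new MessageChatMemoryAdvisor(chatMemory))
                .build();
    }

    /**
     * Stores the conversation history in local memory.
     */
    @Bean
    public ChatMemory chatMemory() {
        return new InMemoryChatMemory();
    }
}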

4. Controller

@RestController
@RequestMapping("/chatClient")
@RequiredArgsConstructor
public class ChatClientController {

    final ChatClient chatClient;

    @RequestMapping("/chat")
    public String chat(String msg) {
        return chatClient.prompt()
                .user(msg)
                .call()
                .content();
    }
}

5. Accessing the demo
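With the port, context path and request mappings configured above, the chat endpoint is reached at /spring-ai-ollama-demo/chatClient/chat with the question in the msg query parameter. The sketch below builds that URL with the message properly encoded and sends a GET request. It assumes the application runs on localhost, and the class name is illustrative.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch: call the demo's chat endpoint from plain Java. */
final class ChatDemoUrl {

    /** Builds the endpoint URL; the message is form-encoded so spaces and non-ASCII text survive. */
    static URI chatUri(String msg) {
        String encoded = URLEncoder.encode(msg, StandardCharsets.UTF_8);
        return URI.create("http://localhost:8080/spring-ai-ollama-demo/chatClient/chat?msg=" + encoded);
    }

    public static void main(String[] args) throws Exception {
        // Requires the Spring Boot application and Ollama to be running.
        HttpRequest request = HttpRequest.newBuilder(chatUri("I have a headache, what should I do?"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Opening the same URL in a browser works as well, since the controller method accepts any HTTP method.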

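4. RAG with a vector database

1. What is RAG

1. Retrieval-augmented generation
RAG stands for retrieval-augmented generation. It supplies the LLM with information retrieved from some data source and grounds the generated answer in it. RAG is essentially search plus LLM prompting: the model answers the query while the information found by the search algorithm serves as its context. Both the query and the retrieved context are injected into the prompt sent to the LLM.

2. Basic RAG technique
A RAG system usually starts from a corpus of text documents. In simplified form: split the text into chunks, embed the chunks into vectors with a transformer encoder model, build an index over all those vectors, and finally create an LLM prompt that tells the model to answer the user's query given the context found in the search step. At runtime, the same encoder model vectorizes the user's query, the query vector is searched against the index to find the top-k results, the corresponding text chunks are fetched from the database, and they are supplied to the LLM prompt as context.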
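Two of the steps just described, splitting text into chunks and assembling the final prompt, can be sketched in plain Java. This is a toy illustration, not how Spring AI does it internally: the character-based chunking, the overlap parameter, the prompt wording and all names are assumptions, and the retrieval step is taken as given.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of two RAG steps: chunking a document and building the LLM prompt. */
final class RagPromptSketch {

    /** Splits text into chunks of at most chunkSize characters; neighbours share `overlap` characters. */
    static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("require 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(text.length(), start + chunkSize);
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    /** Injects the retrieved chunks and the user's question into one prompt. */
    static String buildPrompt(String question, List<String> retrieved) {
        StringBuilder sb = new StringBuilder("Answer the question using only the context below.\n\nContext:\n");
        for (int i = 0; i < retrieved.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(retrieved.get(i)).append('\n');
        }
        return sb.append("\nQuestion: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<String> chunks = chunk("abcdefghij", 4, 1);
        System.out.println(chunks);
        System.out.println(buildPrompt("What comes after d?", chunks.subList(0, 2)));
    }
}
```

Overlapping chunks are a common way to avoid cutting a sentence's meaning in half at a chunk boundary. Production splitters usually work on tokens or sentences rather than raw characters.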

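3. Using an embedding model and a vector database

The core idea of embedding is to map complex, sparse input data (such as a word or an image) into a continuous vector space in which similar inputs land on nearby points. By training a neural network or another machine-learning algorithm, the model learns how to represent the input data in that vector space.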
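"Nearby points" is usually measured with cosine similarity, the same measure as the COSINE_DISTANCE setting used in the configuration later on. The sketch below computes it for plain arrays and ranks stored vectors against a query. The two-dimensional vectors are made-up stand-ins for real embeddings, and the class is illustrative only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Illustrative sketch: cosine similarity and a brute-force top-k search over in-memory vectors. */
final class CosineSearchSketch {

    /** Returns a value in [-1, 1]; 1 means same direction, 0 means orthogonal. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0; // convention chosen here for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(double[] query, List<double[]> stored, int k) {
        return IntStream.range(0, stored.size())
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> -cosine(query, stored.get(i))))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<double[]> stored = Arrays.asList(
                new double[] {1, 0}, new double[] {0, 1}, new double[] {1, 1});
        System.out.println(topK(new double[] {1, 0.1}, stored, 2));
    }
}
```

A vector database does the same ranking, but with an index such as HNSW so that it does not have to compare the query against every stored vector.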

Embedding models: https://ollama.com/search?c=embedding

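2. Configuration steps

1. Postgres is used as the vector database here. Redis, MySQL, an in-memory store and others can be used instead.

2. Use the local embedding service provided by Ollama. This article uses all-minilm; for enterprise use you may want to switch to another model. all-minilm: https://ollama.com/library/all-minilm

Pull the model from a command window: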

ollama pull all-minilm

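3. Create the vector database:

This article uses pgvector.

GitHub - pgvector/pgvector: Open-source vector similarity search for Postgres

pgvector is an extension for PostgreSQL that adds a vector column type together with vector indexing and similarity search, so that large volumes of high-dimensional vectors can be stored and queried inside an ordinary Postgres database.

Windows installation: see the 优快云 blog post "Installing PostgreSQL on Windows and adding the vector extension".

Docker installation: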

docker pull pgvector/pgvector:pg17

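docker run -d --name pgvector -p 5433:5432 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=springai pgvector/pgvector:pg17

Once the container is running you can connect to it.

Run the following SQL to initialize the vector store: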

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";


CREATE TABLE IF NOT EXISTS vector_store(
id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
content text,
metadata json,
embedding vector(384)
);

CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);
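To see what a similarity lookup against this table looks like, here is a minimal JDBC sketch. Spring AI's pgvector store issues queries of roughly this kind for you, so this is for understanding only. It relies on pgvector's `<=>` cosine-distance operator and its `[x,y,z]` text format for vectors; the class and method names are illustrative, and the PostgreSQL JDBC driver plus an open connection are assumed.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: nearest-neighbour lookup on the vector_store table via JDBC. */
final class PgvectorQuerySketch {

    /** Orders rows by cosine distance to the query vector and keeps the closest ones. */
    static final String NEAREST_SQL =
            "SELECT content FROM vector_store ORDER BY embedding <=> ?::vector LIMIT ?";

    /** Formats an embedding in pgvector's text form, e.g. [0.1,0.2,0.3]. */
    static String toVectorLiteral(float[] embedding) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }

    /** Returns the content of the k rows closest to queryEmbedding (must have 384 dimensions here). */
    static List<String> nearest(Connection conn, float[] queryEmbedding, int k) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(NEAREST_SQL)) {
            ps.setString(1, toVectorLiteral(queryEmbedding));
            ps.setInt(2, k);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> contents = new ArrayList<>();
                while (rs.next()) {
                    contents.add(rs.getString("content"));
                }
                return contents;
            }
        }
    }
}
```

The query vector must come from the same embedding model that produced the stored vectors, otherwise the distances are meaningless.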

4. Maven dependency

       <!-- Spring AI pgvector vector store dependency -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
        </dependency>

5. YAML configuration

spring:
  application:
    name: spring-ai-ollama-demo
  datasource:
    url: jdbc:postgresql://localhost:5433/springai
    username: postgres
    password: postgres
  ai:
    vectorstore:
      pgvector:
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        # Dimensions: all-minilm produces 384-dimensional vectors; pgvector indexes support at most 2000 dimensions
        dimensions: 384
        batching-strategy: TOKEN_COUNT
        max-document-batch-size: 1000
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        model: deepseek-r1:8b
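      embedding:
        enabled: true
        # Assumed addition: name the embedding model explicitly so it matches dimensions: 384
        options:
          model: all-minilm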

6. Writing to the vector store

1. Place nocode.txt, which holds the data to store in the vector database, under the resources folder:

    @RequestMapping("/writer")
    public String writer() throws IOException {
        // Read nocode.txt from the classpath; every non-blank line becomes one Document.
        // (Appending lines without "\n" and then splitting on "\n" would yield a single Document.)
        List<Document> documents;
        InputStream inputStream = getClass().getClassLoader().getResourceAsStream("nocode.txt");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
            documents = reader.lines()
                    .filter(line -> !line.isBlank())
                    .map(Document::new)
                    .toList();
        }
        // store is assumed to be the injected VectorStore
        store.write(documents);
        return "success";
    }

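Using the vector data from ChatClient: in Spring AI this is typically done by registering a QuestionAnswerAdvisor backed by the vector store among the ChatClient's advisors.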

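3. ChatPDF

1. PDF files can be written into the vector store.

2. This has many uses, such as building a personal knowledge base or an enterprise knowledge base.

Add the dependency:

        <!-- Spring AI PDF document reader dependency -->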
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-pdf-document-reader</artifactId>
        </dependency>

    @RequestMapping("/writerPdf")
    public String writerPdf() throws IOException {
        // Read the PDF file and store it in the vector store
        PagePdfDocumentReader pagePdfDocumentReader = new PagePdfDocumentReader("classpath:/新乡市2023年度DIP病种主目录库及病种分值.pdf",
                PdfDocumentReaderConfig.builder()
                        .withPageTopMargin(0)
                        .withPageExtractedTextFormatter(ExtractedTextFormatter.builder()
                                .withNumberOfTopTextLinesToDelete(0)
                                .build())
                        .withPagesPerDocument(1)
                        .build());
        store.write(pagePdfDocumentReader.read());
        return "success";
    }

3. Markdown files can be imported as well

        <!-- Spring AI Markdown document reader dependency -->
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-markdown-document-reader</artifactId>
        </dependency>

    public List<Document> md(){
        MarkdownDocumentReader markdownDocumentReader = new MarkdownDocumentReader("classpath:/lists.md");
        return markdownDocumentReader.read();
    }

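5. Function calling

1. Function calling means a large language model (LLM) invokes external functions or services while generating text.

2. It lets a large language model (such as GPT) call external functions or services during generation. The key point is that the model does not execute the function itself. It produces JSON containing the function name and the arguments needed to run it; an external system then executes the function and returns the result to the model so that it can finish the conversation or generation task.

3. Spring AI's function calling addresses the limits of large language models when handling tasks, in particular that the model cannot fetch real-time information or carry out complex computation by itself. Through function calling the model can use external tools or services to extend its abilities and handle a wider range of tasks, such as real-time data queries and complex calculations.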
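The division of labour described above, where the model names a function and an external system runs it, can be sketched without any framework. This is a simplified illustration, not Spring AI's implementation: the ToolCall type, the single string argument, and the registry are assumptions, and a real system would parse the model's JSON output into such a call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative sketch: the application side of function calling. */
final class ToolDispatchSketch {

    /** A call as the model would request it: a function name plus one argument (simplified). */
    static final class ToolCall {
        final String name;
        final String argument;

        ToolCall(String name, String argument) {
            this.name = name;
            this.argument = argument;
        }
    }

    private final Map<String, Function<String, String>> registry = new HashMap<>();

    /** Makes a function available under the name the model will use. */
    void register(String name, Function<String, String> function) {
        registry.put(name, function);
    }

    /** Executes the requested function on the model's behalf; the result goes back to the model. */
    String dispatch(ToolCall call) {
        Function<String, String> function = registry.get(call.name);
        if (function == null) {
            return "error: unknown function " + call.name;
        }
        return function.apply(call.argument);
    }

    public static void main(String[] args) {
        ToolDispatchSketch tools = new ToolDispatchSketch();
        tools.register("askForLeave", user -> "{\"days\":10}");
        // Pretend the model answered with a request to call askForLeave for user "Alice".
        System.out.println(tools.dispatch(new ToolCall("askForLeave", "Alice")));
    }
}
```

Returning an error string for an unknown name, rather than throwing, lets the model see the failure and recover in its next turn.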

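Use cases
Real-time data queries: the model can call external APIs to fetch live data such as stock prices or weather forecasts and work that data into the generated text.
Complex computation: the model can call external functions to carry out demanding calculations such as mathematical operations or statistical analysis.
Business logic: in business scenarios the model can call custom functions to handle specific logic such as order processing or user verification.
Simplified integration: Spring AI provides a simplified API that makes it easier for developers to integrate and use AI features in Java applications.
Cross-platform compatibility: it supports multiple AI model and database providers, which gives good compatibility and portability.
Abstraction: by providing an abstraction layer, Spring AI lets developers build complex AI features without knowing the details of the underlying model.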

package org.example.func;

import java.util.function.Function;

/**
 * @Author xwf
 * @Date 2025-02-15 18:25
 * @Desc:
 **/
public class OAService implements Function<OAService.Request,OAService.Response> {


    @Override
    public Response apply(Request request) {
        System.out.printf("User %s is requesting leave%n", request.user);
        return new Response(10);
    }

    public record Request(String user){

    }

    public record Response(int days){

    }

}

package org.example.func;

import org.springframework.ai.model.function.FunctionCallback;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @Author xwf
 * @Date 2025-02-15 18:28
 * @Desc:
 **/
@Configuration
public class FunctionRegistry {

    /**
     * Registers the function
     * @return
     */
    @Bean
    public FunctionCallback askForLeaveCallBack(){
        return FunctionCallback.builder()
                .function("askForLeave", new OAService())
                .description("Returns the number of leave days when someone asks for leave")
                .inputType(OAService.Request.class)
                .build();
    }
}

    final ChatClient chatClient;

    @RequestMapping("/funcCall")
    public String funcCall(String msg) {
        return chatClient.prompt(msg)
                .functions("askForLeave")
                .call().content();
    }

Running this fails with an error: DeepSeek R1 does not support function callbacks.