Deep Integration of Java and Vector Databases: A New Paradigm for Building Intelligent Applications

Originally published 2025-12-05 10:20:02

License: CC 4.0 BY-SA

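Introduction

In the era of AI and big data, traditional relational databases struggle with unstructured data such as text, images, and audio, because exact-match and range queries cannot express "find items that mean something similar". Vector databases address this gap: they store data as high-dimensional vectors produced by embedding models and support efficient similarity search over them, which makes semantic retrieval practical for intelligent applications. This article uses Java and a concrete case study to explore the application scenarios, implementation, and best practices of vector databases.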

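1. Vector Database Fundamentals

1.1 What is a vector database?

A vector database is a database system built specifically to store, index, and query high-dimensional vectors. Unstructured data (text, images, audio, and so on) is first converted by a deep learning model into fixed-length vector representations, called embeddings. Retrieval then works by comparing vectors with a similarity measure such as cosine similarity, inner product, or Euclidean distance.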
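To make "similarity between vectors" concrete, here is a minimal sketch of cosine similarity in plain Java. The class and method names are illustrative only and do not come from any vector database SDK; a real database computes this internally and far more efficiently.

```java
final class CosineSimilarity {
    private CosineSimilarity() {}

    /** Returns the cosine of the angle between a and b, a value in [-1, 1]. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.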

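1.2 Core advantages

Semantic understanding: embeddings capture meaning, so retrieval follows the principle "similar means relevant" rather than exact keyword matching
Efficient retrieval: index structures designed for high-dimensional vectors (approximate nearest neighbor search) typically answer similarity queries in milliseconds
Scalability: supports storing and querying vectors at large scale
Multimodal support: different data types are handled uniformly once they are embedded as vectors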

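2. Integrating Java with Vector Databases

2.1 Comparison of mainstream vector databases

Vector database	Characteristics	Java SDK support
Pinecone	Managed service, easy to use	Yes
Milvus	Open source, high performance, self-hostable	Yes
Weaviate	Open source, built-in vectorization modules	Yes
Chroma	Lightweight, suited to development and testing	Community clients
Qdrant	Open source, supports multiple index types	Yes

2.2 Integration architecture

A typical architecture for integrating a Java application with a vector database looks like this: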

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ App layer (Java)│────▶│ Embedding layer │────▶│ Vector database │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        ▲                       │                       │
        │                       │                       │
        └───────────────────────┼───────────────────────┘
                                ▼
                        ┌─────────────────┐
                        │ Result handling │
                        └─────────────────┘
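One way to read the diagram is as a set of small interfaces, one per layer. The sketch below is an assumption about how such layers could be separated, not an API of any real SDK: `Embedder`, `VectorStore`, `InMemoryVectorStore`, and `Pipeline` are hypothetical names, and the in-memory store ranks by brute-force dot product purely so the example runs without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LayeredSketch {
    private LayeredSketch() {}

    /** Embedding layer: turns text into a fixed-length vector. */
    interface Embedder {
        float[] embed(String text);
    }

    /** Vector database layer: stores vectors and returns the ids of the closest ones. */
    interface VectorStore {
        void upsert(long id, float[] vector);
        List<Long> search(float[] query, int topK);
    }

    /** In-memory stand-in for a vector database (brute force, no index). */
    static final class InMemoryVectorStore implements VectorStore {
        private final Map<Long, float[]> vectors = new LinkedHashMap<>();

        @Override
        public void upsert(long id, float[] vector) {
            vectors.put(id, vector.clone());
        }

        @Override
        public List<Long> search(float[] query, int topK) {
            List<Map.Entry<Long, float[]>> entries = new ArrayList<>(vectors.entrySet());
            // Highest dot product first; the sort is stable, so ties keep insertion order
            entries.sort((x, y) -> Double.compare(dot(query, y.getValue()), dot(query, x.getValue())));
            List<Long> ids = new ArrayList<>();
            for (int i = 0; i < Math.min(topK, entries.size()); i++) {
                ids.add(entries.get(i).getKey());
            }
            return ids;
        }

        private static double dot(float[] a, float[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += (double) a[i] * b[i];
            }
            return sum;
        }
    }

    /** Application layer: wires the embedding layer to the store. */
    static final class Pipeline {
        private final Embedder embedder;
        private final VectorStore store;

        Pipeline(Embedder embedder, VectorStore store) {
            this.embedder = embedder;
            this.store = store;
        }

        void index(long id, String text) {
            store.upsert(id, embedder.embed(text));
        }

        List<Long> query(String text, int topK) {
            return store.search(embedder.embed(text), topK);
        }
    }
}
```

Keeping the layers behind interfaces like these means the embedding model or the database can be swapped (for example, Milvus for Qdrant) without touching the application code that calls `index` and `query`.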

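3. Case Study: A Similar-Product Recommendation System Based on Milvus

3.1 Project background

An e-commerce platform wants to recommend similar products based on product descriptions: when a user views a product, the system should suggest other products whose descriptions are semantically close to it.

3.2 Technology stack

Backend framework: Spring Boot 3.x
Vector database: Milvus 2.3
Embedding model: Sentence-BERT
Database: MySQL (stores basic product information)

3.3 System design

3.3.1 Data flow

Product information is written to MySQL
A scheduled job converts product descriptions into vectors and stores them in Milvus
On a user request, the target product's vector is matched against the vectors in Milvus by similarity
The list of similar products is returned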

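3.3.2 Core data model

import jakarta.persistence.Entity;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Table;
import java.math.BigDecimal;

// Product entity (stored in MySQL)
@Entity
@Table(name = "products")
public class Product {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private String name;
    private String description;
    private BigDecimal price;
    // getters and setters
}

// Vector entity (stored in Milvus)
public class ProductVector {
    private Long productId;
    private float[] vector;
    // getters and setters
}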

3.4 Implementation

3.4.1 Environment setup

<!-- Maven dependencies (Spring Boot versions are assumed to come from the Spring Boot parent POM) -->
<dependencies>
    <!-- Spring Boot -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-jpa</artifactId>
    </dependency>
    
    <!-- Milvus Java SDK -->
    <dependency>
        <groupId>io.milvus</groupId>
        <artifactId>milvus-sdk-java</artifactId>
        <version>2.3.0</version>
    </dependency>
    
    <!-- JSON handling for calls to the Sentence-BERT HTTP service -->
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.10.1</version>
    </dependency>
    <!-- Optional: the code in this article uses the JDK's built-in java.net.http.HttpClient,
         so this Apache HttpClient dependency is only needed if you prefer that client -->
    <dependency>
        <groupId>org.apache.httpcomponents.client5</groupId>
        <artifactId>httpclient5</artifactId>
        <version>5.3</version>
    </dependency>
</dependencies>

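3.4.2 Milvus configuration and connection

import io.milvus.client.MilvusClient;
import io.milvus.client.MilvusServiceClient;
import io.milvus.param.ConnectParam;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MilvusConfig {
    
    @Value("${milvus.host}")
    private String host;
    
    @Value("${milvus.port}")
    private Integer port;
    
    @Bean
    public MilvusClient milvusClient() {
        ConnectParam connectParam = ConnectParam.newBuilder()
                .withHost(host)
                .withPort(port)
                .build();
        // MilvusServiceClient is the MilvusClient implementation in the 2.x Java SDK
        return new MilvusServiceClient(connectParam);
    }
}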

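3.4.3 Vector generation service

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.springframework.stereotype.Service;

@Service
public class VectorGenerationService {
    
    // Endpoint of a self-hosted Sentence-BERT encoding service
    private static final String SENTENCE_BERT_API = "http://localhost:5000/encode";
    
    private final HttpClient client = HttpClient.newHttpClient();
    private final Gson gson = new Gson();
    
    public float[] generateVector(String text) throws Exception {
        // Build the body with Gson so quotes and line breaks in the text are escaped correctly
        JsonObject payload = new JsonObject();
        payload.addProperty("text", text);
        
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(SENTENCE_BERT_API))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(gson.toJson(payload)))
                .build();
        
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Encoding service returned HTTP " + response.statusCode());
        }
        
        // The service is expected to reply with {"vector": [ ... ]}
        JsonArray values = gson.fromJson(response.body(), JsonObject.class).getAsJsonArray("vector");
        
        // Convert to a float array
        float[] vector = new float[values.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = values.get(i).getAsFloat();
        }
        
        return vector;
    }
}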
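Product descriptions often contain double quotes, backslashes, or line breaks, and these must be escaped before the text is placed inside the JSON body sent to the encoding service. The sketch below shows, using only the JDK, what that escaping involves. `JsonBody`, `escape`, and `textPayload` are hypothetical names used for illustration; in practice a JSON library such as Gson performs this step for you.

```java
final class JsonBody {
    private JsonBody() {}

    /** Escapes a string so it can be placed between double quotes in a JSON document. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a numeric escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the {"text": ...} request body used by the encoding service. */
    static String textPayload(String text) {
        return "{\"text\":\"" + escape(text) + "\"}";
    }
}
```

Without the escaping step, a description such as `15" laptop` would terminate the JSON string early and the service would reject the request or, worse, parse something other than what was intended.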

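3.4.4 Milvus operations service

import io.milvus.client.MilvusClient;
import io.milvus.grpc.DataType;
import io.milvus.grpc.SearchResults;
import io.milvus.param.IndexType;
import io.milvus.param.MetricType;
import io.milvus.param.R;
import io.milvus.param.collection.CreateCollectionParam;
import io.milvus.param.collection.FieldType;
import io.milvus.param.collection.LoadCollectionParam;
import io.milvus.param.dml.InsertParam;
import io.milvus.param.dml.SearchParam;
import io.milvus.param.index.CreateIndexParam;
import io.milvus.response.SearchResultsWrapper;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class MilvusService {
    
    @Autowired
    private MilvusClient milvusClient;
    
    private static final String COLLECTION_NAME = "product_vectors";
    // Must match the output size of the chosen Sentence-BERT model
    // (768 for BERT-base variants; smaller models such as MiniLM produce 384)
    private static final int DIMENSION = 768;
    
    // Create the collection, build the index, and load it for searching
    public void createCollection() {
        milvusClient.createCollection(CreateCollectionParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .addFieldType(FieldType.newBuilder()
                        .withName("product_id")
                        .withDataType(DataType.Int64)
                        .withPrimaryKey(true)
                        .withAutoID(false)
                        .build())
                .addFieldType(FieldType.newBuilder()
                        .withName("vector")
                        .withDataType(DataType.FloatVector)
                        .withDimension(DIMENSION)
                        .build())
                .build());
        
        // Create the index: IVF_FLAT partitions the vectors into nlist clusters
        milvusClient.createIndex(CreateIndexParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withFieldName("vector")
                .withIndexType(IndexType.IVF_FLAT)
                .withMetricType(MetricType.COSINE)
                .withExtraParam("{\"nlist\":1024}")
                .build());
        
        // A collection must be loaded into memory before it can be searched
        milvusClient.loadCollection(LoadCollectionParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .build());
    }
    
    // Insert one vector
    public void insertVector(Long productId, float[] vector) {
        List<InsertParam.Field> fields = new ArrayList<>();
        fields.add(new InsertParam.Field("product_id", Collections.singletonList(productId)));
        fields.add(new InsertParam.Field("vector", Collections.singletonList(toFloatList(vector))));
        
        milvusClient.insert(InsertParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withFields(fields)
                .build());
    }
    
    // Search for similar vectors; returns product ids, most similar first
    public List<Long> searchSimilarVectors(float[] queryVector, int topK) {
        SearchParam searchParam = SearchParam.newBuilder()
                .withCollectionName(COLLECTION_NAME)
                .withVectorFieldName("vector")
                .withVectors(Collections.singletonList(toFloatList(queryVector)))
                .withTopK(topK)
                .withMetricType(MetricType.COSINE)
                .withParams("{\"nprobe\":10}") // number of clusters scanned per query
                .build();
        
        R<SearchResults> response = milvusClient.search(searchParam);
        if (response.getStatus() != R.Status.Success.getCode()) {
            throw new IllegalStateException("Milvus search failed: " + response.getMessage());
        }
        SearchResultsWrapper wrapper = new SearchResultsWrapper(response.getData().getResults());
        
        List<Long> similarProductIds = new ArrayList<>();
        for (SearchResultsWrapper.IDScore score : wrapper.getIDScore(0)) {
            similarProductIds.add(score.getLongID());
        }
        
        return similarProductIds;
    }
    
    // Arrays.stream has no float[] overload, so box the values in a plain loop
    private static List<Float> toFloatList(float[] values) {
        List<Float> list = new ArrayList<>(values.length);
        for (float v : values) {
            list.add(v);
        }
        return list;
    }
}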
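The `nlist` and `nprobe` parameters used above belong to the IVF (inverted file) family of indexes: vectors are grouped into `nlist` clusters, and a query scans only the `nprobe` clusters whose centroids are closest to it. The following is a toy sketch of that idea and not Milvus's implementation. It assumes the centroids are already given (a real index learns them, typically with k-means), uses squared Euclidean distance for simplicity, and `IvfSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IvfSketch {
    private final float[][] centroids;                    // one centroid per cluster (nlist of them)
    private final List<List<Long>> buckets = new ArrayList<>();
    private final Map<Long, float[]> vectors = new HashMap<>();

    IvfSketch(float[][] centroids) {
        this.centroids = centroids;
        for (int i = 0; i < centroids.length; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** Stores the vector in the bucket of its nearest centroid. */
    void add(long id, float[] v) {
        vectors.put(id, v);
        buckets.get(nearestCentroids(v, 1)[0]).add(id);
    }

    /** Scans only the nprobe closest buckets and returns up to topK ids, nearest first. */
    List<Long> search(float[] query, int topK, int nprobe) {
        List<Long> candidates = new ArrayList<>();
        for (int c : nearestCentroids(query, nprobe)) {
            candidates.addAll(buckets.get(c));
        }
        candidates.sort(Comparator.comparingDouble(id -> dist(query, vectors.get(id))));
        return new ArrayList<>(candidates.subList(0, Math.min(topK, candidates.size())));
    }

    /** Indices of the n centroids closest to v, closest first. */
    private int[] nearestCentroids(float[] v, int n) {
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> dist(v, centroids[i])));
        int[] out = new int[Math.min(n, order.length)];
        for (int i = 0; i < out.length; i++) {
            out[i] = order[i];
        }
        return out;
    }

    /** Squared Euclidean distance. */
    private static double dist(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

The trade-off is visible in `search`: a small `nprobe` scans fewer vectors and is faster, but a true neighbor sitting in an unscanned bucket is missed. Raising `nprobe` toward `nlist` improves recall at the cost of latency.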

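3.4.5 Product recommendation service

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@Service
public class ProductRecommendationService {
    
    @Autowired
    private ProductRepository productRepository;
    
    @Autowired
    private VectorGenerationService vectorGenerationService;
    
    @Autowired
    private MilvusService milvusService;
    
    // Build product vectors (requires @EnableScheduling on a configuration class).
    // Note: a plain Milvus insert does not deduplicate primary keys, so on repeated
    // runs existing rows should be deleted first or only new products processed.
    @Scheduled(cron = "0 0 0 * * ?") // runs every day at midnight
    public void initProductVectors() throws Exception {
        List<Product> products = productRepository.findAll();
        for (Product product : products) {
            if (product.getDescription() == null || product.getDescription().isBlank()) {
                continue; // nothing to embed
            }
            float[] vector = vectorGenerationService.generateVector(product.getDescription());
            milvusService.insertVector(product.getId(), vector);
        }
    }
    
    // Get similar product recommendations, most similar first
    public List<Product> getSimilarProducts(Long productId, int topK) throws Exception {
        // Load the target product
        Product targetProduct = productRepository.findById(productId)
                .orElseThrow(() -> new IllegalArgumentException("Product not found"));
        
        // Generate the target product's vector
        float[] targetVector = vectorGenerationService.generateVector(targetProduct.getDescription());
        
        // Search for similar vectors; ask for one extra because the product matches itself
        List<Long> similarProductIds = milvusService.searchSimilarVectors(targetVector, topK + 1);
        
        // Exclude the product itself
        similarProductIds.remove(productId);
        
        // findAllById does not guarantee result order, so restore the similarity ranking
        Map<Long, Product> byId = new HashMap<>();
        for (Product p : productRepository.findAllById(similarProductIds)) {
            byId.put(p.getId(), p);
        }
        List<Product> ranked = new ArrayList<>();
        for (Long id : similarProductIds) {
            Product p = byId.get(id);
            if (p != null && ranked.size() < topK) {
                ranked.add(p);
            }
        }
        return ranked;
    }
}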
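Because Spring Data's `findAllById` does not guarantee the order of its results, the similarity ranking returned by the vector search has to be reapplied after the relational lookup. The helper below is a self-contained sketch of that step; `RankedLookup` and `orderByRank` are hypothetical names, and the generic signature is just one possible design.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RankedLookup {
    private RankedLookup() {}

    /**
     * Returns the fetched items in the order given by rankedIds, skipping excludeId
     * and any id that was not fetched, and keeping at most limit items.
     */
    static <T> List<T> orderByRank(List<Long> rankedIds, Long excludeId, int limit,
                                   Collection<T> fetched, Function<T, Long> idOf) {
        Map<Long, T> byId = new HashMap<>();
        for (T item : fetched) {
            byId.put(idOf.apply(item), item);
        }
        List<T> ordered = new ArrayList<>();
        for (Long id : rankedIds) {
            if (ordered.size() >= limit) {
                break;
            }
            if (id.equals(excludeId)) {
                continue;
            }
            T item = byId.get(id);
            if (item != null) {
                ordered.add(item);
            }
        }
        return ordered;
    }
}
```

Skipping ids that were not fetched also covers the case where a vector still exists in the vector database for a product that has since been deleted from MySQL.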

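3.4.6 REST API endpoint

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/recommendations")
public class RecommendationController {
    
    @Autowired
    private ProductRecommendationService recommendationService;
    
    @GetMapping("/similar/{productId}")
    public ResponseEntity<List<Product>> getSimilarProducts(
            @PathVariable Long productId,
            @RequestParam(defaultValue = "10") int topK) {
        try {
            List<Product> similarProducts = recommendationService.getSimilarProducts(productId, topK);
            return ResponseEntity.ok(similarProducts);
        } catch (IllegalArgumentException e) {
            // Thrown by the service when the product id does not exist
            return ResponseEntity.notFound().build();
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build();
        }
    }
}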