1. Technology Selection and Architecture Design
1.1 Core Technology Stack
- Vector database: Milvus 2.x (high-performance vector retrieval engine)
- Search engine: Elasticsearch 8.x (structured / full-text retrieval)
- Multimodal model: CLIP (text-image cross-modal encoding)
- Language: Java 17
- Framework: Spring Boot 3.x
- Clients:
- Milvus Java SDK 2.3.0+
- Elasticsearch Java Client 8.11.0+
1.2 System Architecture
┌───────────────────────────────────────────────────────────────────┐
│ Application Layer (Spring Boot)                                   │
├───────────────┬───────────────────────┬───────────────────────────┤
│ Data Import   │ Hybrid Search         │ Result Fusion             │
├───────────────┼───────────────────────┼───────────────────────────┤
│ CLIP encoding │ Query encoding        │ Weighted score fusion     │
│ Vector gen.   │ ES structured filter  │ Re-ranking                │
│ ES metadata   │ Milvus vector search  │ Result return             │
└───────────────┴───────────────────────┴───────────────────────────┘
          │                     │                     │
┌─────────▼─────────┐ ┌─────────▼─────────┐ ┌─────────▼─────────┐
│ Milvus            │ │ Elasticsearch     │ │ Business DB       │
│ Vector storage    │ │ Metadata & text   │ │ Raw data storage  │
│ & search          │ │ search            │ │                   │
└───────────────────┘ └───────────────────┘ └───────────────────┘
2. Environment Setup and Configuration
2.1 Dependencies (pom.xml)
<dependencies>
    <!-- Spring Boot -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Milvus -->
    <dependency>
        <groupId>io.milvus</groupId>
        <artifactId>milvus-sdk-java</artifactId>
        <version>2.3.0</version>
    </dependency>
    <!-- Elasticsearch -->
    <dependency>
        <groupId>co.elastic.clients</groupId>
        <artifactId>elasticsearch-java</artifactId>
        <version>8.11.0</version>
    </dependency>
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
    </dependency>
    <!-- CLIP model (via DJL) -->
    <dependency>
        <groupId>ai.djl.huggingface</groupId>
        <artifactId>tokenizers</artifactId>
        <version>0.22.1</version>
    </dependency>
    <dependency>
        <groupId>ai.djl.pytorch</groupId>
        <artifactId>pytorch-engine</artifactId>
        <version>0.22.1</version>
    </dependency>
    <dependency>
        <groupId>ai.djl.pytorch</groupId>
        <artifactId>pytorch-native-cpu-precxx11</artifactId>
        <version>2.0.1</version>
        <scope>runtime</scope>
    </dependency>
    <!-- Utilities -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
        <version>3.12.0</version>
    </dependency>
</dependencies>
2.2 Configuration (application.yml)
spring:
  application:
    name: multimodal-search

# Milvus settings
milvus:
  host: localhost
  port: 19530
  database: default
  collection-name: multimodal_vectors

# Elasticsearch settings
elasticsearch:
  hosts: localhost:9200
  username: elastic
  password: changeme
  index-name: multimodal_metadata

# CLIP model settings
clip:
  model-name: openai/clip-vit-base-patch32
  vector-dim: 512
3. Core Module Implementation
3.1 Data Model Definitions
// Multimodal data entity
@Data
public class MultimodalData {
    private String id;
    private String text;
    private String imageUrl;
    private byte[] imageBytes;
    private float[] vector;
    private Map<String, Object> metadata;
    private LocalDateTime createTime;
}
// Search request
@Data
public class SearchRequest {
    private String queryText;
    private byte[] queryImage;
    private Map<String, Object> filters;
    private int topK = 10;
    private float vectorWeight = 0.7f;
    private float textWeight = 0.3f;
}
// Search result
@Data
public class SearchResult {
    private String id;
    private String text;
    private String imageUrl;
    private float vectorScore;
    private float textScore;
    private float finalScore;
    private Map<String, Object> metadata;
}
3.2 CLIP Vector Encoding Service
@Service
public class ClipEncodingService {
    private final Logger logger = LoggerFactory.getLogger(ClipEncodingService.class);
    private final Predictor<Image, float[]> imagePredictor;
    private final Predictor<String, float[]> textPredictor;

    public ClipEncodingService(@Value("${clip.model-name}") String modelName) throws Exception {
        // Make sure the DJL PyTorch engine is initialized
        Engine.getInstance();
        // Build the model criteria; ImageFeatureExtractor/TextFeatureExtractor are the
        // translators that map raw inputs to CLIP embeddings
        Criteria<Image, float[]> imageCriteria = Criteria.builder()
                .setTypes(Image.class, float[].class)
                .optModelUrls(modelName)
                .optTranslator(ImageFeatureExtractor.builder().build())
                .optEngine("PyTorch")
                .build();
        Criteria<String, float[]> textCriteria = Criteria.builder()
                .setTypes(String.class, float[].class)
                .optModelUrls(modelName)
                .optTranslator(TextFeatureExtractor.builder().build())
                .optEngine("PyTorch")
                .build();
        this.imagePredictor = imageCriteria.loadModel().newPredictor();
        this.textPredictor = textCriteria.loadModel().newPredictor();
    }

    // Encode an image
    public float[] encodeImage(byte[] imageBytes) throws Exception {
        Image image = ImageFactory.getInstance().fromInputStream(new ByteArrayInputStream(imageBytes));
        return imagePredictor.predict(image);
    }

    // Encode a text string
    public float[] encodeText(String text) throws Exception {
        return textPredictor.predict(text);
    }
}
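CLIP embeddings are conventionally L2-normalized before storage, so that the COSINE metric used in the Milvus configuration below agrees with inner-product scoring. The service above returns raw model output; a small helper (a hypothetical utility, not part of DJL or the service) can normalize vectors before they are written:

```java
// Hypothetical utility for L2-normalizing embedding vectors before indexing.
final class VectorMath {
    private VectorMath() {}

    /** L2-normalize a vector in place and return it; a zero vector is returned unchanged. */
    static float[] l2Normalize(float[] v) {
        double sumSq = 0.0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        double norm = Math.sqrt(sumSq);
        if (norm == 0.0) {
            return v; // avoid division by zero
        }
        for (int i = 0; i < v.length; i++) {
            v[i] = (float) (v[i] / norm);
        }
        return v;
    }
}
```

A call such as `VectorMath.l2Normalize(clipEncodingService.encodeImage(bytes))` would then feed normalized vectors into the import pipeline.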
3.3 Milvus Vector Database Service
@Service
public class MilvusService {
    private final MilvusClient milvusClient;
    private final String collectionName;

    public MilvusService(
            @Value("${milvus.host}") String host,
            @Value("${milvus.port}") Integer port,
            @Value("${milvus.collection-name}") String collectionName) {
        // Initialize the Milvus client (SDK 2.x connects via MilvusServiceClient + ConnectParam)
        this.milvusClient = new MilvusServiceClient(
                ConnectParam.newBuilder()
                        .withHost(host)
                        .withPort(port)
                        .build());
        this.collectionName = collectionName;
    }

    // Create the vector collection (called once at initialization)
    public void createCollection(int dimension) {
        FieldType idField = FieldType.newBuilder()
                .withName("id")
                .withDataType(DataType.VarChar)
                .withMaxLength(64)
                .withPrimaryKey(true)
                .withAutoID(false)
                .build();
        FieldType vectorField = FieldType.newBuilder()
                .withName("vector")
                .withDataType(DataType.FloatVector)
                .withDimension(dimension)
                .build();
        milvusClient.createCollection(CreateCollectionParam.newBuilder()
                .withCollectionName(collectionName)
                .addFieldType(idField)
                .addFieldType(vectorField)
                .withShardsNum(2)
                .build());
        // Create the vector index
        milvusClient.createIndex(CreateIndexParam.newBuilder()
                .withCollectionName(collectionName)
                .withFieldName("vector")
                .withIndexType(IndexType.IVF_FLAT)
                .withMetricType(MetricType.COSINE)
                .withExtraParam("{\"nlist\": 1024}")
                .build());
        // Load the collection into memory; searches fail against an unloaded collection
        milvusClient.loadCollection(LoadCollectionParam.newBuilder()
                .withCollectionName(collectionName)
                .build());
    }

    // Insert vector data
    public void insertVectors(List<MultimodalData> dataList) {
        List<String> ids = dataList.stream().map(MultimodalData::getId).collect(Collectors.toList());
        // Arrays.stream has no float[] overload, so box the vectors with a plain loop
        List<List<Float>> vectors = new ArrayList<>(dataList.size());
        for (MultimodalData data : dataList) {
            List<Float> boxed = new ArrayList<>(data.getVector().length);
            for (float f : data.getVector()) {
                boxed.add(f);
            }
            vectors.add(boxed);
        }
        InsertParam insertParam = InsertParam.newBuilder()
                .withCollectionName(collectionName)
                .withFields(Arrays.asList(
                        new InsertParam.Field("id", ids),
                        new InsertParam.Field("vector", vectors)))
                .build();
        milvusClient.insert(insertParam);
        milvusClient.flush(FlushParam.newBuilder()
                .withCollectionNames(Collections.singletonList(collectionName))
                .build());
    }

    // Vector search
    public List<SearchResult> searchVectors(float[] queryVector, int topK) {
        List<Float> vector = new ArrayList<>(queryVector.length);
        for (float f : queryVector) {
            vector.add(f);
        }
        SearchParam searchParam = SearchParam.newBuilder()
                .withCollectionName(collectionName)
                .withMetricType(MetricType.COSINE)
                .withTopK(topK)
                .withVectors(Collections.singletonList(vector))
                .withVectorFieldName("vector")
                .withOutFields(Collections.singletonList("id"))
                .build();
        // The SDK wraps responses in R<>; read the payload via getData()
        R<SearchResults> response = milvusClient.search(searchParam);
        // Parse hits into SearchResult objects...
        return parseSearchResults(response.getData());
    }
}
3.4 Elasticsearch Metadata Service
@Service
public class ElasticsearchService {
    private final ElasticsearchClient esClient;
    private final String indexName;

    public ElasticsearchService(
            @Value("${elasticsearch.hosts}") String hosts,
            @Value("${elasticsearch.username}") String username,
            @Value("${elasticsearch.password}") String password,
            @Value("${elasticsearch.index-name}") String indexName) throws Exception {
        // Initialize the ES client; build the credentials provider before wiring it in
        BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(username, password));
        RestClient restClient = RestClient.builder(HttpHost.create(hosts))
                .setHttpClientConfigCallback(httpClientBuilder ->
                        httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider))
                .build();
        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        this.esClient = new ElasticsearchClient(transport);
        this.indexName = indexName;
    }

    // Create the index (the ik_max_word analyzer requires the IK plugin to be installed)
    public void createIndex() throws Exception {
        if (!esClient.indices().exists(e -> e.index(indexName)).value()) {
            esClient.indices().create(c -> c
                    .index(indexName)
                    .mappings(m -> m
                            .properties("id", p -> p.keyword(k -> k))
                            .properties("text", p -> p.text(t -> t.analyzer("ik_max_word")))
                            .properties("imageUrl", p -> p.keyword(k -> k))
                            .properties("metadata", p -> p.object(o -> o.enabled(true)))
                            .properties("createTime", p -> p.date(d -> d.format("yyyy-MM-dd HH:mm:ss")))
                    )
            );
        }
    }

    // Insert metadata
    public void insertMetadata(MultimodalData data) throws Exception {
        esClient.index(i -> i
                .index(indexName)
                .id(data.getId())
                .document(data)
        );
    }

    // Text search with structured filters
    public List<SearchResult> searchText(String queryText, Map<String, Object> filters, int topK) throws Exception {
        // Build the bool query
        BoolQuery.Builder boolQuery = new BoolQuery.Builder();
        // Full-text match
        if (StringUtils.isNotBlank(queryText)) {
            boolQuery.must(q -> q
                    .match(m -> m
                            .field("text")
                            .query(queryText)
                    )
            );
        }
        // Filter conditions
        if (filters != null && !filters.isEmpty()) {
            filters.forEach((key, value) -> boolQuery.filter(f -> f
                    .term(t -> t
                            .field("metadata." + key)
                            .value(value.toString())
                    )
            ));
        }
        // Execute the query
        SearchResponse<MultimodalData> response = esClient.search(s -> s
                        .index(indexName)
                        .query(q -> q.bool(boolQuery.build()))
                        .size(topK),
                MultimodalData.class
        );
        // Parse hits into SearchResult objects...
        return parseSearchResults(response);
    }
}
3.5 Hybrid Search Service
@Service
public class HybridSearchService {
    private final ClipEncodingService clipEncodingService;
    private final MilvusService milvusService;
    private final ElasticsearchService elasticsearchService;

    @Autowired
    public HybridSearchService(ClipEncodingService clipEncodingService,
                               MilvusService milvusService,
                               ElasticsearchService elasticsearchService) {
        this.clipEncodingService = clipEncodingService;
        this.milvusService = milvusService;
        this.elasticsearchService = elasticsearchService;
    }

    // Core hybrid search method
    public List<SearchResult> hybridSearch(SearchRequest request) throws Exception {
        // 1. Build the query vector (the image takes precedence over the text)
        float[] queryVector;
        if (request.getQueryImage() != null) {
            queryVector = clipEncodingService.encodeImage(request.getQueryImage());
        } else if (StringUtils.isNotBlank(request.getQueryText())) {
            queryVector = clipEncodingService.encodeText(request.getQueryText());
        } else {
            throw new IllegalArgumentException("Either queryText or queryImage is required");
        }
        // 2. Vector search (Milvus); over-fetch to leave room for fusion
        List<SearchResult> vectorResults = milvusService.searchVectors(queryVector, request.getTopK() * 2);
        // 3. Text search and filtering (ES)
        List<SearchResult> textResults = elasticsearchService.searchText(
                request.getQueryText(), request.getFilters(), request.getTopK() * 2);
        // 4. Fuse and re-rank
        return fuseResults(vectorResults, textResults, request);
    }

    // Result fusion
    private List<SearchResult> fuseResults(List<SearchResult> vectorResults,
                                           List<SearchResult> textResults,
                                           SearchRequest request) {
        // Map from ID to the merged result
        Map<String, SearchResult> resultMap = new HashMap<>();
        // Index the vector results by ID
        vectorResults.forEach(result -> resultMap.put(result.getId(), result));
        // Merge in the text results
        textResults.forEach(textResult -> {
            SearchResult result = resultMap.computeIfAbsent(textResult.getId(), id -> {
                SearchResult r = new SearchResult();
                r.setId(id); // keep the ID on results that only matched in ES
                return r;
            });
            result.setTextScore(textResult.getTextScore());
            result.setText(textResult.getText());
            result.setImageUrl(textResult.getImageUrl());
            result.setMetadata(textResult.getMetadata());
        });
        // Compute the final score
        resultMap.values().forEach(result -> {
            float vectorScore = Math.max(result.getVectorScore(), 0f);
            float textScore = Math.max(result.getTextScore(), 0f);
            // Weighted fusion
            result.setFinalScore(
                    vectorScore * request.getVectorWeight() +
                    textScore * request.getTextWeight()
            );
        });
        // Sort and return the top-K results
        return resultMap.values().stream()
                .sorted(Comparator.comparing(SearchResult::getFinalScore).reversed())
                .limit(request.getTopK())
                .collect(Collectors.toList());
    }
}
3.6 Data Import Service
@Service
public class DataImportService {
    private final ClipEncodingService clipEncodingService;
    private final MilvusService milvusService;
    private final ElasticsearchService elasticsearchService;

    @Autowired
    public DataImportService(ClipEncodingService clipEncodingService,
                             MilvusService milvusService,
                             ElasticsearchService elasticsearchService) {
        this.clipEncodingService = clipEncodingService;
        this.milvusService = milvusService;
        this.elasticsearchService = elasticsearchService;
    }

    // Import a single record
    public void importData(MultimodalData data) throws Exception {
        prepare(data);
        // Write metadata to ES
        elasticsearchService.insertMetadata(data);
        // Write the vector to Milvus
        milvusService.insertVectors(Collections.singletonList(data));
    }

    // Batch import (Lists.partition comes from Guava, which is not declared in the pom above)
    public void batchImport(List<MultimodalData> dataList) throws Exception {
        for (List<MultimodalData> batch : Lists.partition(dataList, 100)) {
            // Encode vectors and write metadata record by record
            for (MultimodalData data : batch) {
                prepare(data);
                elasticsearchService.insertMetadata(data);
            }
            // Write the whole batch of vectors to Milvus in one call
            milvusService.insertVectors(batch);
        }
    }

    // Fill in the ID, vector, and creation time when they are missing
    private void prepare(MultimodalData data) throws Exception {
        // Generate a unique ID
        if (StringUtils.isBlank(data.getId())) {
            data.setId(UUID.randomUUID().toString());
        }
        // Generate the vector (the image takes precedence over the text)
        if (data.getVector() == null || data.getVector().length == 0) {
            if (data.getImageBytes() != null) {
                data.setVector(clipEncodingService.encodeImage(data.getImageBytes()));
            } else if (StringUtils.isNotBlank(data.getText())) {
                data.setVector(clipEncodingService.encodeText(data.getText()));
            }
        }
        // Set the creation time
        if (data.getCreateTime() == null) {
            data.setCreateTime(LocalDateTime.now());
        }
    }
}
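The `Lists.partition` call above comes from Guava, which the pom.xml in section 2.1 does not declare. Either add Guava as a dependency or use a dependency-free splitter such as this hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical replacement for Guava's Lists.partition, for projects that do not pull in Guava.
final class BatchUtils {
    private BatchUtils() {}

    /** Split a list into consecutive sublists of at most batchSize elements each. */
    static <T> List<List<T>> partition(List<T> list, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += batchSize) {
            // Copy the view so later mutations of the source list cannot corrupt a batch
            batches.add(new ArrayList<>(list.subList(i, Math.min(i + batchSize, list.size()))));
        }
        return batches;
    }
}
```

With this in place, `BatchUtils.partition(dataList, 100)` is a drop-in substitute in `batchImport`.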
4. REST API Endpoints
@RestController
@RequestMapping("/api/search")
public class SearchController {
    private final HybridSearchService hybridSearchService;
    private final DataImportService dataImportService;

    @Autowired
    public SearchController(HybridSearchService hybridSearchService,
                            DataImportService dataImportService) {
        this.hybridSearchService = hybridSearchService;
        this.dataImportService = dataImportService;
    }

    // Data import endpoint
    @PostMapping("/import")
    public ResponseEntity<String> importData(@RequestBody MultimodalData data) {
        try {
            dataImportService.importData(data);
            return ResponseEntity.ok("Data imported successfully");
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("Import failed: " + e.getMessage());
        }
    }

    // Hybrid search endpoint
    @PostMapping("/hybrid")
    public ResponseEntity<List<SearchResult>> hybridSearch(@RequestBody SearchRequest request) {
        try {
            List<SearchResult> results = hybridSearchService.hybridSearch(request);
            return ResponseEntity.ok(results);
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body(Collections.emptyList());
        }
    }

    // Text search endpoint (text-only path of the hybrid search)
    @PostMapping("/text")
    public ResponseEntity<List<SearchResult>> textSearch(@RequestBody SearchRequest request) {
        request.setQueryImage(null); // force the text encoding path
        return hybridSearch(request);
    }

    // Image search endpoint
    @PostMapping("/image")
    public ResponseEntity<List<SearchResult>> imageSearch(@RequestParam("image") MultipartFile image,
                                                          @RequestParam Map<String, String> filters) throws Exception {
        SearchRequest request = new SearchRequest();
        request.setQueryImage(image.getBytes());
        request.setFilters(new HashMap<>(filters));
        return hybridSearch(request);
    }
}
5. System Optimization and Best Practices
5.1 Performance Optimization
- Batching: use batch operations for data import and retrieval to cut network overhead
- Index tuning:
- Milvus: size nlist to the data volume (a common rule of thumb is the square root of the row count)
- ES: design mappings carefully; use keyword fields for attributes that are filtered frequently
- Caching: cache the results of hot queries
- Async processing: run vector encoding and writes asynchronously
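The square-root rule of thumb above can be applied directly when sizing the IVF index. A minimal sketch (the helper name and the clamping range are my own; the heuristic is a convention, not a Milvus requirement):

```java
// Rule-of-thumb IVF nlist sizing: roughly sqrt(row count), clamped to Milvus's valid range.
final class IndexTuning {
    private IndexTuning() {}

    static int suggestNlist(long rowCount) {
        int nlist = (int) Math.round(Math.sqrt((double) rowCount));
        // Milvus documents nlist as valid in [1, 65536]
        return Math.max(1, Math.min(nlist, 65_536));
    }
}
```

For the 1,024 used in section 3.3, this heuristic would correspond to a collection of roughly one million vectors.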
5.2 Vector Generation Optimization
- Model serving: consider deploying the CLIP model as a standalone service, accelerated with TensorRT or ONNX Runtime
- Batch encoding: encode multiple images or texts per call to improve GPU utilization
- Pre-encoding: compute vectors before ingestion to avoid encoding latency at query time
5.3 Result Fusion Strategy
- Dynamic weights: adjust the vector and text weights based on the query type
- Normalization: normalize scores from different sources before fusing them
- Multi-stage fusion: coarse ranking first (vector search), then fine ranking (ES text + filters)
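Normalization matters here because Milvus cosine scores fall in [-1, 1] while ES BM25 scores are unbounded, so the weighted sum in section 3.5 mixes incomparable scales. A min-max pass over each result list before fusing, as suggested above, could look like this (a hypothetical helper, not part of the services shown earlier):

```java
// Hypothetical min-max normalization for fusing scores from heterogeneous scales.
final class ScoreFusion {
    private ScoreFusion() {}

    /** Scale scores to [0, 1]; if all scores are equal, map them all to 1.0. */
    static float[] minMaxNormalize(float[] scores) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        float range = max - min;
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = range == 0f ? 1f : (scores[i] - min) / range;
        }
        return out;
    }

    /** Weighted sum of two already-normalized scores. */
    static float fuse(float vectorScore, float textScore, float vectorWeight, float textWeight) {
        return vectorScore * vectorWeight + textScore * textWeight;
    }
}
```

In `fuseResults`, the vector and text score lists would each be normalized separately before `fuse` is applied per document.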
6. Testing and Validation
6.1 Functional Testing
- Data import: verify that text and image data are encoded correctly and written to both stores
- Text search: query with keywords and verify the relevance of the results
- Image search: upload an image and verify that semantically similar text/image results come back
- Hybrid search: combine text queries with filter conditions and verify the fused results
6.2 Performance Testing
- Latency: measure search response time at different topK values
- Throughput: measure the system's capacity under high concurrency
- Recall: verify the recall and precision of the search results
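Recall and precision in 6.2 can be computed from the sets of retrieved and relevant document IDs per test query. A minimal sketch (helper names are my own):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal precision/recall computation over retrieved vs. relevant ID sets.
final class RetrievalMetrics {
    private RetrievalMetrics() {}

    /** Fraction of retrieved documents that are relevant. */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        Set<String> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);
        return (double) hits.size() / retrieved.size();
    }

    /** Fraction of relevant documents that were retrieved. */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        Set<String> hits = new HashSet<>(retrieved);
        hits.retainAll(relevant);
        return (double) hits.size() / relevant.size();
    }
}
```

Averaging these over a labeled query set gives the system-level numbers the performance tests call for.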
7. Deployment and Monitoring
7.1 Deployment Architecture
- Containerized deployment: run Milvus, Elasticsearch, and the application with Docker Compose
- Cluster scaling:
- Milvus: deploy a distributed cluster with data sharding and replicas configured
- Elasticsearch: deploy a cluster of at least 3 nodes for high availability
7.2 Monitoring Metrics
- Milvus: vector search QPS, latency, memory usage
- Elasticsearch: query latency, index size, JVM memory
- Application: API response time, error rate, CLIP encoding latency
8. Summary and Outlook
This article implemented a multimodal hybrid search system built on a vector database and Elasticsearch, providing unified retrieval over text and image data. The modular design keeps the system extensible and maintainable.
Directions for further improvement:
- Support more modalities (such as audio and video)
- Adopt stronger multimodal models (such as GPT-4V or Gemini)
- Implement smarter result-fusion algorithms
- Support incremental learning with dynamic updates to the vector index
With this walkthrough you can build a high-performance, highly available multimodal hybrid search system that meets the needs of a wide range of retrieval scenarios.