Python script for syncing a Milvus collection

Given a COLLECTION_NAME, this Python script syncs the collection schema and data from SOURCE_MILVUS to SINK_MILVUS. It reads the data with batched queries and is only suitable for small data volumes.

from pymilvus import connections, Collection, CollectionSchema, FieldSchema, utility

# ========== Configuration ==========
SOURCE_MILVUS_HOST = '1.1.1.1'
SOURCE_MILVUS_PORT = '19530'

SINK_MILVUS_HOST = '2.2.2.2'
SINK_MILVUS_PORT = '19530'

COLLECTION_NAME = 'MILVUS_COLLECTION_1'
BATCH_SIZE = 1000

# ========== Connections ==========
connections.connect("source", host=SOURCE_MILVUS_HOST, port=SOURCE_MILVUS_PORT)
connections.connect("sink", host=SINK_MILVUS_HOST, port=SINK_MILVUS_PORT)

# ========== Read the source schema ==========
source_collection = Collection(name=COLLECTION_NAME, using='source')
source_schema = source_collection.schema
print(f"Source schema: {source_schema}")

# ========== Create the same collection on the target Milvus ==========
if utility.has_collection(COLLECTION_NAME, using='sink'):
    print(f"Sink collection {COLLECTION_NAME} already exists, dropping it first.")
    utility.drop_collection(COLLECTION_NAME, using='sink')

sink_fields = []
for field in source_schema.fields:
    # field.params holds the type parameters, such as dim for vector fields
    # and max_length for VARCHAR fields. Copying all of them avoids passing
    # dim=None to scalar fields and dropping max_length.
    sink_fields.append(FieldSchema(
        name=field.name,
        dtype=field.dtype,
        is_primary=field.is_primary,
        auto_id=field.auto_id,
        description=field.description,
        **field.params
    ))

sink_schema = CollectionSchema(fields=sink_fields, description=source_schema.description)
sink_collection = Collection(name=COLLECTION_NAME, schema=sink_schema, using='sink')
print("Sink collection created.")

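# ========== Batched data migration ==========
offset = 0
total_inserted = 0
# An auto-generated primary key normally cannot be supplied on insert, so it
# is skipped here; the sink then assigns new IDs to those rows.
output_fields = [
    f.name for f in source_schema.fields
    if not (f.is_primary and f.auto_id)
]

# Load the source collection once, before the loop, so it can be queried.
source_collection.load()

while True:
    # Offset pagination: Milvus caps offset + limit (16384 in many releases),
    # which is why this approach only suits small collections.
    # An empty expr combined with a limit requires a recent Milvus release.
    results = source_collection.query(
        expr="",
        offset=offset,
        limit=BATCH_SIZE,
        output_fields=output_fields
    )

    if not results:
        print("No more data to copy.")
        break

    # Convert to insert format: one list of values per field (column layout)
    column_data = {key: [] for key in output_fields}
    for row in results:
        for key in output_fields:
            column_data[key].append(row[key])

    sink_collection.insert([column_data[key] for key in output_fields])
    total_inserted += len(results)
    print(f"Inserted batch: {len(results)}, total: {total_inserted}")

    # Advance by the number of rows actually returned.
    offset += len(results)

# ========== Final flush (optional) ==========
# Indexes are not copied; create them on the sink before loading it.
sink_collection.flush()
print("All data migrated and flushed.")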
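
The offset-based loop above only reaches as many rows as the server allows for offset + limit, so it may stop short on a larger collection. One possible alternative is `Collection.query_iterator`, which recent pymilvus releases provide. The following is a minimal sketch under assumptions, not part of the original script: `copy_with_iterator` is a hypothetical helper name, `source` and `sink` are assumed to be `Collection` objects like `source_collection` and `sink_collection`, and `source` is assumed to be loaded already.

```python
def copy_with_iterator(source, sink, field_names, batch_size=1000):
    """Copy rows from source to sink with a query iterator.

    Hypothetical helper: source and sink are assumed to be pymilvus
    Collection objects, and source is assumed to be loaded.
    Returns the number of rows copied.
    """
    iterator = source.query_iterator(
        batch_size=batch_size,
        output_fields=field_names,
    )
    total = 0
    try:
        while True:
            rows = iterator.next()  # an empty result means the data is exhausted
            if not rows:
                break
            # Convert row dicts into one list of values per field, in field order.
            columns = [[row[name] for row in rows] for name in field_names]
            sink.insert(columns)
            total += len(rows)
    finally:
        # Always release the iterator, even if an insert fails.
        iterator.close()
    return total
```

With the names from the script, a call might look like `copy_with_iterator(source_collection, sink_collection, output_fields, BATCH_SIZE)`, followed by `sink_collection.flush()`. It is worth checking that your installed pymilvus version supports `query_iterator` before relying on it.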
