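Recently, while using Baidu Cloud BOS object storage, I ran into a problem copying files out of a bucket.
I found that copying a file larger than 5 GB with the ordinary synchronous copyObject() method fails, and the only error it reports is Invalid Argument. ???
This had me completely baffled. I skimmed the official SDK documentation and found nothing explaining it, but I did notice a multipart copy method, so I wondered whether that was what I was supposed to use.
No sooner said than done: I went back to the docs. I have to grumble here: for multipart upload the SDK gives you an all-in-one method, put_super_obejct_from_file(), so why is there nothing like that for copying? Boo hoo. Here is my implementation: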
```python
from baidubce.auth.bce_credentials import BceCredentials
from baidubce.bce_client_configuration import BceClientConfiguration
from baidubce.services.bos.bos_client import BosClient

# BOS settings
access_key_id = "AK"
secret_access_key = "SK"
endpoint = "xxxx"
bucket_name = "bucket name"
source_key = "full object key, e.g. xxx/xxx/a.txt"
target_key = "target key"

# Initialize the BOS client
credentials = BceCredentials(access_key_id, secret_access_key)
config = BceClientConfiguration(credentials=credentials, endpoint=endpoint)
bos_client = BosClient(config)

upload_id = None  # set once the multipart upload has been initiated
try:
    # Get the size of the source object
    source_size = int(
        bos_client.get_object_meta_data(bucket_name, source_key).metadata.content_length
    )

    # Initiate the multipart upload on the target key
    response = bos_client.initiate_multipart_upload(bucket_name, target_key)
    upload_id = response.upload_id
    print(f"Initiated multipart upload with ID: {upload_id}")

    # Part size (5 MB) and number of parts (ceiling division)
    part_size = 5 * 1024 * 1024
    part_count = (source_size + part_size - 1) // part_size
    part_list = []

    for i in range(part_count):
        part_number = i + 1  # part numbers start at 1
        offset = i * part_size
        curr_part_size = min(part_size, source_size - offset)  # last part may be smaller
        print(f"Copying part {part_number}/{part_count}")
        # Server-side copy of bytes [offset, offset + curr_part_size) into this part
        response = bos_client.upload_part_copy(
            source_bucket_name=bucket_name,
            source_key=source_key,
            target_bucket_name=bucket_name,
            target_key=target_key,
            upload_id=upload_id,
            part_number=part_number,
            part_size=curr_part_size,
            offset=offset,
        )
        part_list.append({"partNumber": part_number, "eTag": response.etag})

    # Complete the multipart upload
    result = bos_client.complete_multipart_upload(
        bucket_name, target_key, upload_id, part_list
    )
    # Failed requests raise an exception, so this check is informational only
    # (see the note below about the "unexpected status code" message)
    if result.status == 200:
        print("Multipart copy completed successfully.")
        print(f"ETag: {result.etag}")
        print(f"Location: {result.location}")
        print(f"Key: {result.key}")
        print(f"Bucket: {result.bucket}")
    else:
        print(f"Multipart copy finished, but the status code is unexpected: {result.status}")
except Exception as e:
    print(f"Error during multipart copy: {e}")
    # If the upload was already initiated, try to abort it
    if upload_id is not None:
        try:
            bos_client.abort_multipart_upload(bucket_name, target_key, upload_id)
            print("Multipart upload aborted")
        except Exception as abort_error:
            print(f"Error while aborting the multipart upload: {abort_error}")
```
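One limitation of the fixed 5 MB part size: BOS caps the number of parts in a multipart upload (10,000 according to my reading of the BOS docs; treat that figure as an assumption), so 5 MB parts top out at about 48.8 GB. Below is a minimal sketch of a part planner that grows the part size only when needed. `plan_parts`, `MIN_PART_SIZE` and `MAX_PARTS` are hypothetical names I made up for illustration, not SDK APIs.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # assumed BOS minimum for every part except the last
MAX_PARTS = 10_000               # assumed BOS limit on the number of parts


def plan_parts(source_size, part_size=MIN_PART_SIZE, max_parts=MAX_PARTS):
    """Return (part_number, offset, size) tuples that cover source_size bytes."""
    if source_size <= 0:
        raise ValueError("source_size must be positive")
    # Grow the part size if the default would need more than max_parts parts
    # (integer ceiling division avoids float rounding on very large sizes).
    part_size = max(part_size, (source_size + max_parts - 1) // max_parts)
    part_count = (source_size + part_size - 1) // part_size
    return [
        (i + 1, i * part_size, min(part_size, source_size - i * part_size))
        for i in range(part_count)
    ]
```

The copy loop could then iterate over `plan_parts(source_size)` and pass each offset and size to `upload_part_copy()`; for anything under the ~48.8 GB ceiling this produces exactly the same 5 MB parts as the script above.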
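One more note: when the copy finishes, the script may print "Multipart copy finished, but the status code is unexpected". You can ignore it; the copy has already completed.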
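If you need this regularly, the same steps could be folded into one function that picks the method by size: a plain `copy_object()` call at or below 5 GB (the threshold observed above, assumed rather than confirmed from the docs), multipart copy above it. Instead of inspecting `result.status`, this sketch treats "no exception raised" as success, which sidesteps the misleading status message. `copy_any_size`, `SINGLE_COPY_LIMIT` and `MAX_PARTS` are hypothetical names; `client` is assumed to be a `BosClient` or anything exposing the same methods.

```python
SINGLE_COPY_LIMIT = 5 * 1024 ** 3  # assumed: larger objects fail with a single copy request
MIN_PART_SIZE = 5 * 1024 ** 2      # 5 MB parts, as in the script above
MAX_PARTS = 10_000                 # assumed BOS limit on the number of parts


def copy_any_size(client, bucket, source_key, target_key, source_size):
    """Copy source_key to target_key inside one bucket; return the method used."""
    if source_size <= SINGLE_COPY_LIMIT:
        client.copy_object(bucket, source_key, bucket, target_key)
        return "single"

    # Keep the part count under MAX_PARTS by growing the part size if necessary.
    part_size = max(MIN_PART_SIZE, (source_size + MAX_PARTS - 1) // MAX_PARTS)
    upload_id = client.initiate_multipart_upload(bucket, target_key).upload_id
    try:
        part_list = []
        for offset in range(0, source_size, part_size):
            part_number = offset // part_size + 1
            response = client.upload_part_copy(
                source_bucket_name=bucket,
                source_key=source_key,
                target_bucket_name=bucket,
                target_key=target_key,
                upload_id=upload_id,
                part_number=part_number,
                part_size=min(part_size, source_size - offset),
                offset=offset,
            )
            part_list.append({"partNumber": part_number, "eTag": response.etag})
        client.complete_multipart_upload(bucket, target_key, upload_id, part_list)
    except Exception as copy_error:
        # Best-effort cleanup of the unfinished upload, then surface the real error.
        try:
            client.abort_multipart_upload(bucket, target_key, upload_id)
        except Exception:
            pass
        raise copy_error
    return "multipart"
```

Taking the client as a parameter also makes the function easy to exercise against a fake object before pointing it at a real bucket.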