Storing a List of Strings from Python in an HDF5 Dataset

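This article describes how variable-length (VL) data is stored in HDF5, in particular how strings are stored C-style in null-terminated buffers. Because NumPy has no native support for this format, the h5py library maps variable-length strings to object arrays. The article also covers automatic conversion between Python strings and VL data.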


In HDF5, data in VL format is stored as arbitrary-length vectors of a base type. In particular, strings are stored C-style in null-terminated buffers. NumPy has no native mechanism to support this. Unfortunately, this is the de facto standard for representing strings in the HDF5 C API, and in many HDF5 applications.

Thankfully, NumPy has a generic pointer type in the form of the “object” (“O”) dtype. In h5py, variable-length strings are mapped to object arrays. A small amount of metadata attached to an “O” dtype tells h5py that its contents should be converted to VL strings when stored in the file.
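
As a minimal sketch of this mapping, the snippet below asks h5py for a VL string dtype, inspects it, and round-trips a list of names. It assumes h5py 2.10 or later, where `h5py.string_dtype()` and `h5py.check_string_dtype()` are available (older releases used `h5py.special_dtype(vlen=str)`). The file name `vl_demo.h5`, the dataset name `names`, and the sample strings are made up for illustration.

```python
import os
import tempfile

import h5py

# Ask h5py for a variable-length UTF-8 string dtype.
vl_str = h5py.string_dtype(encoding="utf-8")

# To NumPy this is the plain object dtype; the attached metadata is what h5py reads.
dtype_kind = vl_str.kind                        # kind code of the object dtype
string_info = h5py.check_string_dtype(vl_str)   # named tuple: encoding, length

names = ["a.jpg", "a_much_longer_name.png", "图片.jpeg"]

# Hypothetical file and dataset names, written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "vl_demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("names", data=names, dtype=vl_str)

with h5py.File(demo_path, "r") as f:
    raw = f["names"][...]

# h5py 3.x returns bytes objects here, h5py 2.x returned str; handle both.
restored = [s.decode("utf-8") if isinstance(s, bytes) else s for s in raw]
print(restored)
```

Passing the dtype explicitly keeps the strings variable-length in the file, so names of different lengths are stored without padding or truncation.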

Existing VL strings can be read and written to with no additional effort; Python strings and fixed-length NumPy strings can be auto-converted to VL data and stored.
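
The auto-conversion described above can be sketched as follows. The example creates a VL string dataset, reopens the file, and writes both Python strings and a fixed-length NumPy string array into it. The file name `vl_convert_demo.h5`, the dataset name `labels`, and the helper `to_text` are hypothetical, and `h5py.string_dtype()` again assumes h5py 2.10 or later.

```python
import os
import tempfile

import h5py
import numpy as np


def to_text(values):
    """Decode bytes to str; h5py 3.x reads VL strings back as bytes objects."""
    return [v.decode("utf-8") if isinstance(v, bytes) else v for v in values]


demo_path = os.path.join(tempfile.mkdtemp(), "vl_convert_demo.h5")

# Create an empty VL string dataset with four slots.
with h5py.File(demo_path, "w") as f:
    f.create_dataset("labels", shape=(4,), dtype=h5py.string_dtype())

# Reopen the existing dataset and write into it without any manual conversion.
with h5py.File(demo_path, "r+") as f:
    ds = f["labels"]
    ds[0] = "plain Python string"                          # Python str
    ds[1:3] = np.array([b"cat", b"giraffe"], dtype="S7")   # fixed-length NumPy strings
    ds[3] = "résumé"                                       # non-ASCII text, stored as UTF-8

with h5py.File(demo_path, "r") as f:
    labels = to_text(f["labels"][...])

print(labels)
```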

Example

    import os

    import h5py
    import numpy as np

    from ResNET import ResNet  # the author's local feature-extraction module

    path = "dataset2/"


    def getAllPics(path):
        """Collect the paths of all images under path/<dataset>/<subfolder>/."""
        image_paths = []
        # Each entry directly under `path` is one dataset folder.
        for folder in os.listdir(path):
            # Each dataset folder contains one level of subfolders.
            for folder_1 in os.listdir(os.path.join(path, folder)):
                subdir = os.path.join(path, folder, folder_1)
                for file_name in os.listdir(subdir):
                    # Keep only the image extensions the original code accepted.
                    if file_name.endswith(('jpg', 'JPG', 'jpeg', 'png')):
                        image_paths.append(os.path.join(subdir, file_name))
        return image_paths


    def get_features(path):
        img_list = getAllPics(path)
        print("Total number of images: " + str(len(img_list)))
        print("--------------------------------------------------")
        print("         Starting feature extraction......         ")
        print("--------------------------------------------------")

        features = []
        names = []
        model = ResNet()
        for i, img_path in enumerate(img_list):
            norm_feat = model.get_feat(img_path)
            features.append(norm_feat)
            names.append(img_path)
            print("Extracting image features: image %d of %d ....... %s"
                  % (i + 1, len(img_list), img_path))

        output = "index1.h5"
        print("--------------------------------------------------")
        print("   Writing the extracted features to the file......")
        print("--------------------------------------------------")

        with h5py.File(output, 'w') as h5f:
            h5f.create_dataset('dataset_1', data=np.array(features))
            # Store the image paths as variable-length UTF-8 strings. The original
            # call, np.string_(names), relies on an alias removed in NumPy 2.0.
            h5f.create_dataset('dataset_2', data=names, dtype=h5py.string_dtype())


    if __name__ == "__main__":
        get_features(path)