Storing strings in Python lists: storing a list of strings from Python into an HDF5 dataset

Latest recommended article published on 2021-07-23 16:25:51

weixin_39548490

Tags: storing strings in Python lists

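This article describes how variable-length (VL) data is stored in HDF5, in particular how strings are stored as C-style null-terminated buffers. Because NumPy does not natively support this format, the h5py library maps variable-length strings to object arrays. The article also covers automatic conversion between Python strings and VL data.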

In HDF5, data in VL format is stored as arbitrary-length vectors of a base type. In particular, strings are stored C-style in null-terminated buffers. NumPy has no native mechanism to support such data. Unfortunately, VL strings are the de facto standard for representing strings in the HDF5 C API and in many HDF5 applications.

Thankfully, NumPy has a generic pointer type in the form of the “object” (“O”) dtype. In h5py, variable-length strings are mapped to object arrays. A small amount of metadata attached to an “O” dtype tells h5py that its contents should be converted to VL strings when stored in the file.
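The metadata mentioned above can be inspected directly. The following is a minimal sketch, assuming h5py 2.10 or later, where `h5py.string_dtype()` and `h5py.check_string_dtype()` are available (older releases used `h5py.special_dtype(vlen=str)` instead). The variable names are illustrative only.

```python
import h5py
import numpy as np

# Build a variable-length UTF-8 string dtype (assumes h5py >= 2.10).
vl_str = h5py.string_dtype(encoding="utf-8")

# Underneath, it is still NumPy's generic object dtype.
kind = vl_str.kind

# h5py can recover the string information it attached to the dtype.
info = h5py.check_string_dtype(vl_str)

# A plain object dtype carries no such metadata, so h5py reports None.
plain_info = h5py.check_string_dtype(np.dtype("O"))

print(kind, info, plain_info)
```

Here `info.length` is `None`, which is how h5py marks the string type as variable-length rather than fixed-length.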

Existing VL strings can be read and written to with no additional effort; Python strings and fixed-length NumPy strings can be auto-converted to VL data and stored.

Example
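The original example is not included in this copy of the article. As a stand-in, here is a minimal sketch of storing a list of Python strings in a VL string dataset and reading it back. It assumes h5py 2.10 or later for `h5py.string_dtype()`; the file name, dataset name, and sample strings are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Sample data (hypothetical): strings of different lengths.
names = ["alpha", "beta", "a much longer string"]

# Variable-length UTF-8 string dtype (assumes h5py >= 2.10).
vl_str = h5py.string_dtype(encoding="utf-8")

# Write to a throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "strings_demo.h5")

with h5py.File(path, "w") as f:
    # Pass the list as an object array and tell h5py to store it as VL strings.
    f.create_dataset("names", data=np.array(names, dtype=object), dtype=vl_str)

with h5py.File(path, "r") as f:
    raw = f["names"][:]
    # Depending on the h5py version, elements come back as bytes or str,
    # so decode only when needed.
    restored = [s.decode("utf-8") if isinstance(s, bytes) else s for s in raw]

print(restored)
```

Passing an explicit object array keeps the sketch close to the mapping described above: h5py sees an "O" array plus a VL string dtype and converts each element when writing.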