killer features
- hierarchical groups
- attributes
- descriptive metadata
- slicing
- actual data stays on disk; slicing reads only the selected part into memory
- control over how storage is allocated
- supports compression
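The features above can be sketched with h5py. This is a minimal illustration, not a reference implementation: the file name, group path, dataset shape, chunk shape and attribute name are all made up for the example, and the file is kept in memory (core driver) so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store); the name is hypothetical.
with h5py.File("demo_features.h5", "w", driver="core", backing_store=False) as f:
    # Hierarchical groups: intermediate groups in the path are created automatically.
    dset = f.create_dataset(
        "experiment/run1/temperature",
        shape=(1000, 100),
        dtype="f4",
        chunks=(100, 100),     # control over how storage is laid out
        compression="gzip",    # transparent compression
    )

    # Descriptive metadata attached as an attribute.
    dset.attrs["units"] = "kelvin"

    # Slicing: only the selected region is written or read.
    dset[0:10, :] = 300.0
    part = dset[0:10, 0:5]     # a NumPy array holding just this slice

    part_shape = part.shape
    units = dset.attrs["units"]
    chunks = dset.chunks
    compression = dset.compression
```

Only `part` lives in memory as a NumPy array; the rest of the dataset is touched only when it is sliced.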
HDF5
- large numerical arrays of homogeneous type
- organized hierarchically
- tagging with arbitrary metadata
- high performance
- partial I/O
HDF5 data model
dataset: array-like objects that store numerical data on disk
- attributes: name, type, shape
- support random access
group: hierarchical containers that store datasets and other groups
- members are indexed using B-trees
attribute: user-defined metadata, can be attached to a dataset or a group
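A small sketch of the three object types in h5py, assuming an in-memory file. The group names, dataset name and attribute names are hypothetical and chosen only for illustration.

```python
import h5py
import numpy as np

# In-memory file; all names below are illustrative.
with h5py.File("demo_model.h5", "w", driver="core", backing_store=False) as f:
    # Groups are containers that can hold datasets and other groups.
    grp = f.create_group("sensors")
    sub = grp.create_group("station_a")

    # A dataset has a name, a type and a shape.
    data = np.arange(12, dtype="i8").reshape(3, 4)
    dset = sub.create_dataset("readings", data=data)
    dset_name = dset.name
    dset_dtype = dset.dtype
    dset_shape = dset.shape

    # Random access to a single element.
    element = int(dset[2, 1])

    # Attributes can be attached to both groups and datasets.
    grp.attrs["location"] = "roof"
    dset.attrs["sample_rate_hz"] = 10

    group_members = list(grp.keys())
    location = grp.attrs["location"]
    rate = int(dset.attrs["sample_rate_hz"])
```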
HDF5 library
- written in C
- with C++, Java and Python bindings
read operation
- h5py figures out the shape of the resulting array object from the requested slice, e.g. (10, 50).
- An empty NumPy array of that shape is allocated.
- HDF5 selects the appropriate part of the dataset.
- HDF5 copies data from the dataset into the empty NumPy array.
- The newly filled in NumPy array is returned.
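The read steps can be mirrored in code. In this sketch the (100, 100) dataset and the slice `[0:10, 20:70]` are assumptions, picked only because they yield a (10, 50) result. The second half uses `Dataset.read_direct` to make the "allocate an empty array, then copy into it" steps explicit.

```python
import h5py
import numpy as np

# In-memory file; dataset name and contents are illustrative.
with h5py.File("demo_read.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "big", data=np.arange(100 * 100, dtype="f8").reshape(100, 100)
    )

    # Ordinary slicing: h5py works out the shape, allocates the array,
    # lets HDF5 copy the selection into it, and returns it.
    out = dset[0:10, 20:70]

    # The same steps written out by hand.
    manual = np.empty((10, 50), dtype=dset.dtype)            # empty array
    dset.read_direct(manual, source_sel=np.s_[0:10, 20:70])  # HDF5 fills it
```

Both `out` and `manual` end up holding the same (10, 50) block of values.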
write operation
- h5py figures out the size of the selection, and determines whether it is compatible with the size of the array being assigned.
- HDF5 makes an appropriately sized selection on the dataset.
- HDF5 reads from the input array and writes to the file.
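A sketch of the compatibility check on writes, again with an assumed (100, 100) dataset and an assumed selection. An array of exactly the selection's shape is accepted, a shorter array that can be broadcast to it is accepted, and an array that cannot be broadcast is rejected. The exact exception type is not assumed here, so the example catches both `TypeError` and `ValueError`.

```python
import h5py
import numpy as np

# In-memory file; dataset name and shapes are illustrative.
with h5py.File("demo_write.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("grid", shape=(100, 100), dtype="f8")

    # Input shape matches the (10, 50) selection exactly.
    dset[0:10, 20:70] = np.ones((10, 50))
    exact = dset[0:10, 20:70]

    # A (50,) array is broadcast across the 10 selected rows.
    dset[0:10, 20:70] = np.arange(50)
    broadcasted = dset[0:10, 20:70]

    # A (7, 3) array is not compatible with a (10, 50) selection.
    try:
        dset[0:10, 20:70] = np.ones((7, 3))
        rejected = False
    except (TypeError, ValueError):
        rejected = True

    # Elements outside the selection were never written.
    untouched = float(dset[50, 50])
```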
performance tips
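- reduce the number of separate read/write operations on the dataset: every slice goes through the full read or write sequence, so prefer a few large slices over many small ones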
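To illustrate the tip, the sketch below reads the same data two ways: one row at a time, and in a single slice. The dataset name and size are assumptions. Both ways return identical data, but the loop issues 200 separate read operations while the single slice issues one. No timings are claimed here, since they depend on the machine and storage.

```python
import h5py
import numpy as np

# In-memory file; dataset name and size are illustrative.
with h5py.File("demo_perf.h5", "w", driver="core", backing_store=False) as f:
    rng = np.random.default_rng(0)
    dset = f.create_dataset("samples", data=rng.random((200, 50)))

    # Many small reads: one HDF5 read per row.
    rows = np.empty((200, 50), dtype=dset.dtype)
    n_small_reads = 0
    for i in range(200):
        rows[i] = dset[i]
        n_small_reads += 1

    # One large read: a single HDF5 read for the whole block.
    whole = dset[:, :]
```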
reshape
- can’t change the number of axes (rank) of a dataset; only the length along existing axes can change
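Assuming "reshape" here refers to resizing a dataset with h5py's `Dataset.resize`, the restriction looks roughly like this. The dataset name, shapes and `maxshape` are hypothetical. The failure case catches a generic `Exception` because the exact error type is not assumed.

```python
import h5py
import numpy as np

# In-memory file; names and shapes are illustrative.
with h5py.File("demo_resize.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(None, 3): the first axis may grow without limit, the second is fixed.
    dset = f.create_dataset("log", shape=(2, 3), maxshape=(None, 3), dtype="i4")
    dset[...] = 7

    # Changing the length of an existing axis is allowed.
    dset.resize((5, 3))
    shape_after = dset.shape
    kept = dset[0:2, :]        # existing data is preserved

    # Changing the number of axes (2 -> 3) is not.
    try:
        dset.resize((5, 3, 1))
        rank_changed = True
    except Exception:
        rank_changed = False

    final_shape = dset.shape
```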