A bug was fixed recently that resulted in a significant improvement in chunking performance. This fix will be available in HDF5 1.6.9 and HDF5 1.8.3, due out in May. For those who would like to obtain the fix now, snapshots are already available. See the end of this message for information on obtaining the snapshot.
Detailed Description:
There was a bug present in HDF5 between versions 1.6.3 and 1.8.2 that severely limited the usefulness of the dataset chunk cache. In these versions, only a single chunk could be held in cache at one time. Thus, any operations that span multiple chunks would read each chunk into memory and then immediately evict it. If a similar set of chunks were then read or written again, each chunk would again have to be read into cache and then evicted. This clearly had a negative impact on performance, which in some cases could even be worse than if the chunk cache were simply disabled.
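The effect described above can be illustrated with a small model. This is only a sketch, not HDF5 code: it assumes a simple least-recently-used cache, whereas the real chunk cache uses its own replacement policy, and the names count_disk_reads and max_chunks are hypothetical. It counts how many times a chunk has to be loaded from disk when the same four chunks are read twice and the cache can hold only one chunk.

```python
from collections import OrderedDict


def count_disk_reads(accesses, max_chunks):
    """Count chunk loads from disk for a sequence of chunk accesses.

    Simplified model (an assumption, not the HDF5 implementation):
    an LRU cache that holds at most max_chunks chunks.
    """
    cache = OrderedDict()
    disk_reads = 0
    for chunk_id in accesses:
        if chunk_id in cache:
            # Cache hit: mark the chunk as most recently used.
            cache.move_to_end(chunk_id)
            continue
        # Cache miss: the chunk must be read from disk.
        disk_reads += 1
        cache[chunk_id] = True
        if len(cache) > max_chunks:
            # Evict the least recently used chunk.
            cache.popitem(last=False)
    return disk_reads


# Two passes over the same four chunks with a one-chunk cache.
two_passes = [0, 1, 2, 3] * 2
print(count_disk_reads(two_passes, max_chunks=1))  # 8
```

In this model every access is a miss, so the second pass costs as many disk reads as the first.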
This bug has now been fixed, and the maximum number of chunks held in cache is determined by the rdcc_nbytes parameter in H5Pset_cache (default is 1 MB). Thus, in the scenario described above, the first time the chunks are read they will all be loaded into cache (provided the cache is large enough), and the second time they are read they will be served directly from the cache without having to access the disk. This will improve performance by a factor of at least two for any application that reads or writes multiple chunks more than once.
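To judge whether the cache is "large enough" for a given access pattern, it can help to estimate how many chunks fit in rdcc_nbytes. The following is a rough sketch under stated assumptions: the helper chunks_that_fit is hypothetical, 1 MB is taken to mean 1024 * 1024 bytes, and a chunk's size is taken to be the number of elements times the element size in bytes.

```python
from math import prod

# Assumption: the 1 MB default for rdcc_nbytes means 1024 * 1024 bytes.
DEFAULT_RDCC_NBYTES = 1024 * 1024


def chunks_that_fit(rdcc_nbytes, chunk_shape, itemsize):
    """Estimate how many whole chunks fit in a cache of rdcc_nbytes bytes.

    chunk_shape is the chunk dimensions in elements; itemsize is the
    size of one element in bytes.
    """
    chunk_bytes = prod(chunk_shape) * itemsize
    return rdcc_nbytes // chunk_bytes


# 100 x 100 chunks of 8-byte values: 80,000 bytes per chunk.
print(chunks_that_fit(DEFAULT_RDCC_NBYTES, (100, 100), 8))  # 13

# 1000 x 1000 chunks of 8-byte values: 8,000,000 bytes per chunk.
print(chunks_that_fit(DEFAULT_RDCC_NBYTES, (1000, 1000), 8))  # 0
```

If the estimate is smaller than the number of chunks an operation revisits, a larger rdcc_nbytes value passed to H5Pset_cache is probably worth trying.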
Obtaining the Snapshot:
Snapshots can be accessed from the bottom of the HDF5 Home page, which is located at:
https://support.hdfgroup.org/HDF5/index.html
The HDF5 1.8 snapshots are located here:
https://gamma.hdfgroup.org/ftp/pub/outgoing/hdf5/snapshots/v18/
Snapshot hdf5-1.8.2-post7 contains the fix for this problem.
The HDF5 1.6 snapshots are located here:
https://gamma.hdfgroup.org/ftp/pub/outgoing/hdf5/snapshots/v16/
Snapshot hdf5-1.6.8-post8 contains the fix.
Last modified: 14 October 2016