HDF5 1.8.23 C-API Reference
Use dataset access properties to modify the default behavior of the HDF5 library when accessing datasets. The properties include adjusting the size of the chunk cache, providing prefixes for external content and virtual dataset file paths, and controlling flush behavior. These properties are not persisted with datasets, and can be adjusted at runtime before a dataset is created or opened.
Functions
herr_t | H5Pget_chunk_cache (hid_t dapl_id, size_t *rdcc_nslots, size_t *rdcc_nbytes, double *rdcc_w0) | Retrieves the raw data chunk cache parameters. |
ssize_t | H5Pget_efile_prefix (hid_t dapl_id, char *prefix, size_t size) | Retrieves the prefix for external raw data storage files as set in the dataset access property list. |
herr_t | H5Pset_chunk_cache (hid_t dapl_id, size_t rdcc_nslots, size_t rdcc_nbytes, double rdcc_w0) | Sets the raw data chunk cache parameters. |
herr_t | H5Pset_efile_prefix (hid_t dapl_id, const char *prefix) | Sets the external dataset storage file prefix in the dataset access property list. |
herr_t H5Pget_chunk_cache (hid_t dapl_id, size_t *rdcc_nslots, size_t *rdcc_nbytes, double *rdcc_w0)
Retrieves the raw data chunk cache parameters.
[in] | dapl_id | Dataset access property list identifier |
[out] | rdcc_nslots | Number of chunk slots in the raw data chunk cache hash table |
[out] | rdcc_nbytes | Total size of the raw data chunk cache, in bytes |
[out] | rdcc_w0 | Preemption policy |
H5Pget_chunk_cache() retrieves the number of chunk slots in the raw data chunk cache hash table, the maximum possible number of bytes in the raw data chunk cache, and the preemption policy value.
These values are retrieved from a dataset access property list. If the values have not been set on the property list, then values returned will be the corresponding values from a default file access property list.
Any (or all) pointer arguments may be null pointers, in which case the corresponding data is not returned.
ssize_t H5Pget_efile_prefix (hid_t dapl_id, char *prefix, size_t size)
Retrieves the prefix for external raw data storage files as set in the dataset access property list.
[in] | dapl_id | Dataset access property list identifier |
[in,out] | prefix | Dataset external storage prefix in UTF-8 or ASCII (Path and filename must be ASCII on Windows systems.) |
[in] | size | Size of prefix buffer in bytes |
Returns the size of prefix, and stores the prefix string in prefix, if successful. Otherwise returns a negative value and the contents of prefix will be undefined.
H5Pget_efile_prefix() retrieves the file system path prefix for locating external files associated with a dataset that uses external storage. This will be the value set with H5Pset_efile_prefix() or the HDF5 library’s default.
The value of size is the size in bytes of the prefix, including the null terminator. If the size is unknown, a preliminary H5Pget_efile_prefix() call with the pointer prefix set to NULL will return the size of the prefix without the null terminator.
The prefix buffer must be allocated by the caller. In a call that retrieves the actual prefix, that buffer must be of the size specified in size.
herr_t H5Pset_chunk_cache (hid_t dapl_id, size_t rdcc_nslots, size_t rdcc_nbytes, double rdcc_w0)
Sets the raw data chunk cache parameters.
[in] | dapl_id | Dataset access property list identifier |
[in] | rdcc_nslots | The number of chunk slots in the raw data chunk cache for this dataset. Increasing this value reduces the number of cache collisions, but slightly increases the memory used. Due to the hashing strategy, this value should ideally be a prime number. As a rule of thumb, this value should be at least 10 times the number of chunks that can fit in rdcc_nbytes bytes. For maximum performance, this value should be set to approximately 100 times that number of chunks. The default value is 521. If the value passed is H5D_CHUNK_CACHE_NSLOTS_DEFAULT, then the property will not be set on dapl_id and the parameter will come from the file access property list used to open the file. |
[in] | rdcc_nbytes | The total size of the raw data chunk cache for this dataset. In most cases increasing this number will improve performance, as long as you have enough free memory. The default size is 1 MB. If the value passed is H5D_CHUNK_CACHE_NBYTES_DEFAULT, then the property will not be set on dapl_id and the parameter will come from the file access property list. |
[in] | rdcc_w0 | The chunk preemption policy for this dataset. This must be between 0 and 1 inclusive and indicates the weighting according to which chunks which have been fully read or written are penalized when determining which chunks to flush from cache. A value of 0 means fully read or written chunks are treated no differently than other chunks (the preemption is strictly LRU) while a value of 1 means fully read or written chunks are always preempted before other chunks. If your application only reads or writes data once, this can be safely set to 1. Otherwise, this should be set lower, depending on how often you re-read or re-write the same data. The default value is 0.75. If the value passed is H5D_CHUNK_CACHE_W0_DEFAULT, then the property will not be set on dapl_id and the parameter will come from the file access property list. |
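The rule of thumb for rdcc_nslots can be turned into a small calculation. The sketch below is not part of HDF5; `suggest_nslots` and its `factor` argument are hypothetical names. It multiplies the number of chunks that fit in the cache by a factor (10 as a minimum, 100 for maximum performance) and rounds up to the next prime.

```c
#include <assert.h>
#include <stddef.h>

/* Trial-division primality test; adequate for slot counts of this size. */
static int is_prime(size_t n)
{
    if (n < 2)
        return 0;
    for (size_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Hypothetical helper: smallest prime that is at least
 * factor * (number of whole chunks that fit in rdcc_nbytes). */
static size_t suggest_nslots(size_t rdcc_nbytes, size_t chunk_bytes, size_t factor)
{
    assert(chunk_bytes > 0 && factor > 0);

    size_t chunks_in_cache = rdcc_nbytes / chunk_bytes;
    size_t n = chunks_in_cache * factor;
    if (n < 2)
        n = 2;
    while (!is_prime(n))
        n++;
    return n;
}
```

For a 16 MB cache holding 1 MB chunks, `suggest_nslots(16 * 1024 * 1024, 1024 * 1024, 10)` yields 163, and a factor of 100 yields 1601.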
H5Pset_chunk_cache() sets the number of elements, the total number of bytes, and the preemption policy value in the raw data chunk cache on a dataset access property list. After calling this function, the values set in the property list will override the values in the file's file access property list.
The raw data chunk cache inserts chunks into the cache by first computing a hash value using the address of a chunk, then using that hash value as the chunk's index into the table of cached chunks. The size of this hash table, i.e., the number of possible hash values, is determined by the rdcc_nslots parameter. If a different chunk in the cache has the same hash value, this causes a collision, which reduces efficiency. If inserting the chunk into cache would cause the cache to be too big, then the cache is pruned according to the rdcc_w0 parameter.
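To illustrate why the slot count matters, here is a toy model of the collision behavior. It assumes a simple modulo hash on the chunk address, which is a simplification for illustration and not necessarily the hash HDF5 uses; `toy_slot` and `count_collisions` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy hash (assumption): slot index = chunk address modulo the slot count. */
static size_t toy_slot(uint64_t chunk_addr, size_t nslots)
{
    assert(nslots > 0);
    return (size_t)(chunk_addr % nslots);
}

/* Count the addresses that map to a slot already claimed by an earlier address. */
static size_t count_collisions(const uint64_t *addrs, size_t n, size_t nslots)
{
    size_t collisions = 0;

    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (toy_slot(addrs[i], nslots) == toy_slot(addrs[j], nslots)) {
                collisions++;
                break; /* count each colliding address once */
            }
        }
    }
    return collisions;
}
```

In this model, eight chunks stored at addresses 0, 4096, 8192, ... all fall into slot 0 when nslots is 8, but occupy eight distinct slots when nslots is the prime 11. This is consistent with the advice to prefer a prime slot count.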
Motivation: H5Pset_chunk_cache() is used to adjust the chunk cache parameters on a per-dataset basis, as opposed to a global setting for the file using H5Pset_cache(). The optimum chunk cache parameters may vary widely with different data layout and access patterns, so for optimal performance they must be set individually for each dataset. It may also be beneficial to reduce the size of the chunk cache for datasets whose performance is not important in order to save memory space.
Example Usage: The following code sets the chunk cache to use a hash table with 12421 elements and a maximum size of 16 MB, while using the preemption policy specified for the entire file:
H5Pset_chunk_cache(dapl_id, 12421, 16*1024*1024, H5D_CHUNK_CACHE_W0_DEFAULT);
Usage Notes: The chunk cache size is a property for accessing a dataset and is not stored with a dataset or a file. To guarantee the same chunk cache settings each time the dataset is opened, call H5Dopen() with a dataset access property list where the chunk cache size is set by calling H5Pset_chunk_cache() for that property list. The property list can be used for multiple accesses in the same application.
For files where the same chunk cache size will be appropriate for all or most datasets, H5Pset_cache() can be called with a file access property list to set the chunk cache size for accessing all datasets in the file.
Both methods can be used in combination, in which case the chunk cache size set by H5Pset_cache() will apply except for specific datasets where H5Dopen() is called with a dataset access property list whose chunk cache size was set by H5Pset_chunk_cache().
In the absence of any cache settings, H5Dopen() will by default create a 1 MB chunk cache for the opened dataset. If this size happens to be appropriate, no call will be needed to either function to set the chunk cache size.
It is also possible that a change in access pattern for later access to a dataset will change the appropriate chunk cache size.
herr_t H5Pset_efile_prefix (hid_t dapl_id, const char *prefix)
Sets the external dataset storage file prefix in the dataset access property list.
[in] | dapl_id | Dataset access property list identifier |
[in] | prefix | Dataset external storage prefix in UTF-8 or ASCII (Path and filename must be ASCII on Windows systems.) |
H5Pset_efile_prefix() sets the prefix used to locate raw data files for a dataset that uses external storage. This prefix can provide either an absolute path or a relative path to the external files.
H5Pset_efile_prefix() is used in conjunction with H5Pset_external() to control the behavior of the HDF5 library when searching for the raw data files associated with a dataset that uses external storage:
The HDF5_EXTFILE_PREFIX environment variable can be used to override the above behavior (the environment variable supersedes the API call). Setting the variable to a path string and calling H5Dcreate() or H5Dopen() is the equivalent of calling H5Pset_efile_prefix() and calling the same create or open function. The environment variable is checked at the time of the create or open action and copied so it can be safely changed after the H5Dcreate() or H5Dopen() call.
Calling H5Pset_efile_prefix() with prefix set to NULL or the empty string resets the search path to the default. The result would be the same as if H5Pset_efile_prefix() had never been called.