HDF5
1.14.4.3
API Reference
Use dataset creation properties to control aspects of dataset creation such as fill time, storage layout, compression methods, etc. Unlike dataset access and transfer properties, creation properties are stored with the dataset, and cannot be changed once a dataset has been created.
Function | Purpose |
---|---|
H5Pset_layout | Sets the type of storage used to store the raw data for a dataset. |
H5Pget_layout | Returns the layout of the raw data for a dataset. |
H5Pset_chunk | Sets the size of the chunks used to store a chunked layout dataset. |
H5Pget_chunk | Retrieves the size of chunks for the raw data of a chunked layout dataset. |
H5Pset_chunk_opts/H5Pget_chunk_opts | Sets/gets the edge chunk option setting from a dataset creation property list. |
H5Pset_deflate | Sets compression method and compression level. |
H5Pset_fill_value | Sets the fill value for a dataset. |
H5Pget_fill_value | Retrieves a dataset fill value. |
H5Pfill_value_defined | Determines whether the fill value is defined. |
H5Pset_fill_time | Sets the time when fill values are written to a dataset. |
H5Pget_fill_time | Retrieves the time when fill values are written to a dataset. |
H5Pset_alloc_time | Sets the timing for storage space allocation. |
H5Pget_alloc_time | Retrieves the timing for storage space allocation. |
H5Pset_filter | Adds a filter to the filter pipeline. |
H5Pall_filters_avail | Verifies that all required filters are available. |
H5Pget_nfilters | Returns the number of filters in the pipeline. |
H5Pget_filter | Returns information about a filter in a pipeline. The C function is a macro. |
H5Pget_filter_by_id | Returns information about the specified filter. The C function is a macro. |
H5Pmodify_filter | Modifies a filter in the filter pipeline. |
H5Premove_filter | Deletes one or more filters in the filter pipeline. |
H5Pset_fletcher32 | Sets up use of the Fletcher32 checksum filter. |
H5Pset_nbit | Sets up use of the n-bit filter. |
H5Pset_scaleoffset | Sets up use of the scale-offset filter. |
H5Pset_shuffle | Sets up use of the shuffle filter. |
H5Pset_szip | Sets up use of the Szip compression filter. |
H5Pset_external | Adds an external file to the list of external files. |
H5Pget_external_count | Returns the number of external files for a dataset. |
H5Pget_external | Returns information about an external file. |
H5Pset_char_encoding | Sets the character encoding used to encode a string. Use to set ASCII or UTF-8 character encoding for object names. |
H5Pget_char_encoding | Retrieves the character encoding used to create a string. |
H5Pset_virtual | Sets the mapping between virtual and source datasets. |
H5Pget_virtual_count | Gets the number of mappings for the virtual dataset. |
H5Pget_virtual_dsetname | Gets the name of a source dataset used in the mapping. |
H5Pget_virtual_filename | Gets the filename of a source dataset used in the mapping. |
H5Pget_virtual_srcspace | Gets a dataspace identifier for the selection within the source dataset used in the mapping. |
H5Pget_virtual_vspace | Gets a dataspace identifier for the selection within the virtual dataset used in the mapping. |
H5Pset_dset_no_attrs_hint/H5Pget_dset_no_attrs_hint | Sets/gets the flag to create minimized dataset object headers. |
Functions | |
htri_t | H5Pall_filters_avail (hid_t plist_id) |
Verifies that all required filters are available. | |
herr_t | H5Pset_deflate (hid_t plist_id, unsigned level) |
Sets deflate (GNU gzip) compression method and compression level. | |
herr_t | H5Pfill_value_defined (hid_t plist, H5D_fill_value_t *status) |
Determines whether fill value is defined. | |
herr_t | H5Pget_alloc_time (hid_t plist_id, H5D_alloc_time_t *alloc_time) |
Retrieves the timing for storage space allocation. | |
int | H5Pget_chunk (hid_t plist_id, int max_ndims, hsize_t dim[]) |
Retrieves the size of chunks for the raw data of a chunked layout dataset. | |
herr_t | H5Pget_chunk_opts (hid_t plist_id, unsigned *opts) |
Retrieves the edge chunk option setting from a dataset creation property list. | |
herr_t | H5Pget_dset_no_attrs_hint (hid_t dcpl_id, hbool_t *minimize) |
Retrieves the setting for whether or not to create minimized dataset object headers. | |
herr_t | H5Pget_external (hid_t plist_id, unsigned idx, size_t name_size, char *name, off_t *offset, hsize_t *size) |
Returns information about an external file. | |
int | H5Pget_external_count (hid_t plist_id) |
Returns the number of external files for a dataset. | |
herr_t | H5Pget_fill_time (hid_t plist_id, H5D_fill_time_t *fill_time) |
Retrieves the time when fill values are written to a dataset. | |
herr_t | H5Pget_fill_value (hid_t plist_id, hid_t type_id, void *value) |
Retrieves a dataset fill value. | |
H5D_layout_t | H5Pget_layout (hid_t plist_id) |
Returns the layout of the raw data for a dataset. | |
herr_t | H5Pget_virtual_count (hid_t dcpl_id, size_t *count) |
Gets the number of mappings for the virtual dataset. | |
ssize_t | H5Pget_virtual_dsetname (hid_t dcpl_id, size_t index, char *name, size_t size) |
Gets the name of a source dataset used in the mapping. | |
ssize_t | H5Pget_virtual_filename (hid_t dcpl_id, size_t index, char *name, size_t size) |
Gets the filename of a source dataset used in the mapping. | |
hid_t | H5Pget_virtual_srcspace (hid_t dcpl_id, size_t index) |
Gets a dataspace identifier for the selection within the source dataset used in the mapping. | |
hid_t | H5Pget_virtual_vspace (hid_t dcpl_id, size_t index) |
Gets a dataspace identifier for the selection within the virtual dataset used in the mapping. | |
herr_t | H5Pset_alloc_time (hid_t plist_id, H5D_alloc_time_t alloc_time) |
Sets the timing for storage space allocation. | |
herr_t | H5Pset_chunk (hid_t plist_id, int ndims, const hsize_t dim[]) |
Sets the size of the chunks used to store a chunked layout dataset. | |
herr_t | H5Pset_chunk_opts (hid_t plist_id, unsigned opts) |
Sets the edge chunk option in a dataset creation property list. | |
herr_t | H5Pset_dset_no_attrs_hint (hid_t dcpl_id, hbool_t minimize) |
Sets the flag to create minimized dataset object headers. | |
herr_t | H5Pset_external (hid_t plist_id, const char *name, off_t offset, hsize_t size) |
Adds an external file to the list of external files. | |
herr_t | H5Pset_fill_time (hid_t plist_id, H5D_fill_time_t fill_time) |
Sets the time when fill values are written to a dataset. | |
herr_t | H5Pset_fill_value (hid_t plist_id, hid_t type_id, const void *value) |
Sets the fill value for a dataset. | |
herr_t | H5Pset_shuffle (hid_t plist_id) |
Sets up use of the shuffle filter. | |
herr_t | H5Pset_layout (hid_t plist_id, H5D_layout_t layout) |
Sets the type of storage used to store the raw data for a dataset. | |
herr_t | H5Pset_nbit (hid_t plist_id) |
Sets up the use of the N-Bit filter. | |
herr_t | H5Pset_scaleoffset (hid_t plist_id, H5Z_SO_scale_type_t scale_type, int scale_factor) |
Sets up the use of the scale-offset filter. | |
herr_t | H5Pset_szip (hid_t plist_id, unsigned options_mask, unsigned pixels_per_block) |
Sets up use of the SZIP compression filter. | |
herr_t | H5Pset_virtual (hid_t dcpl_id, hid_t vspace_id, const char *src_file_name, const char *src_dset_name, hid_t src_space_id) |
Sets the mapping between virtual and source datasets. | |
H5Z_filter_t | H5Pget_filter1 (hid_t plist_id, unsigned filter, unsigned int *flags, size_t *cd_nelmts, unsigned cd_values[], size_t namelen, char name[]) |
Returns information about a filter in a pipeline (DEPRECATED) | |
herr_t | H5Pget_filter_by_id1 (hid_t plist_id, H5Z_filter_t id, unsigned int *flags, size_t *cd_nelmts, unsigned cd_values[], size_t namelen, char name[]) |
Returns information about the specified filter. | |
Verifies that all required filters are available.
[in] | plist_id | Property list identifier |
H5Pall_filters_avail() verifies that all of the filters set in the dataset or group creation property list plist_id are currently available.
herr_t H5Pfill_value_defined (hid_t plist, H5D_fill_value_t *status)
Determines whether fill value is defined.
[in] | plist | Dataset creation property list identifier |
[out] | status | Status of fill value in property list |
H5Pfill_value_defined() determines whether a fill value is defined in the dataset creation property list plist. Valid values returned in status are as follows:
H5D_FILL_VALUE_UNDEFINED | Fill value is undefined. |
H5D_FILL_VALUE_DEFAULT | Fill value is the library default. |
H5D_FILL_VALUE_USER_DEFINED | Fill value is defined by the application. |
herr_t H5Pget_alloc_time (hid_t plist_id, H5D_alloc_time_t *alloc_time)
Retrieves the timing for storage space allocation.
[in] | plist_id | Dataset creation property list identifier |
[out] | alloc_time | The timing setting for allocating dataset storage space |
H5Pget_alloc_time() retrieves the timing for allocating storage space for a dataset's raw data. This property is set in the dataset creation property list plist_id. The timing setting is returned in alloc_time as one of the following values:
H5D_ALLOC_TIME_DEFAULT | Uses the default allocation time, based on the dataset storage method. See the alloc_time description in H5Pset_alloc_time() for default allocation times for various storage methods. |
H5D_ALLOC_TIME_EARLY | All space is allocated when the dataset is created. |
H5D_ALLOC_TIME_INCR | Space is allocated incrementally as data is written to the dataset. |
H5D_ALLOC_TIME_LATE | All space is allocated when data is first written to the dataset. |
Retrieves the size of chunks for the raw data of a chunked layout dataset.
[in] | plist_id | Dataset creation property list identifier |
[in] | max_ndims | Size of the dim array |
[out] | dim | Array to store the chunk dimensions |
H5Pget_chunk() retrieves the size of chunks for the raw data of a chunked layout dataset. This function is only valid for dataset creation property lists. At most max_ndims elements of dim will be initialized.
Retrieves the edge chunk option setting from a dataset creation property list.
[in] | plist_id | Dataset creation property list identifier |
[out] | opts | Edge chunk option flag. Valid values are described in H5Pset_chunk_opts(). The option status can be retrieved using the bitwise AND operator ( & ). For example, the expression (opts&H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS) will evaluate to H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS if that option has been enabled. Otherwise, it will evaluate to 0 (zero). |
H5Pget_chunk_opts() retrieves the edge chunk option setting stored in the dataset creation property list plist_id.
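The bitwise test described for opts can be sketched in plain C. This is only an illustration: DEMO_DONT_FILTER_PARTIAL_CHUNKS is a hypothetical stand-in for H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS, its numeric value is an assumption, and both helper functions are invented for the example. Real code would include hdf5.h and test the value filled in by H5Pget_chunk_opts().

```c
#include <assert.h>

/* Hypothetical stand-in for H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS.
 * The numeric value is an assumption made for this sketch. */
#define DEMO_DONT_FILTER_PARTIAL_CHUNKS 0x0002u

/* Returns 1 when the option bit is set in opts, 0 otherwise. */
int partial_chunk_filtering_disabled(unsigned opts)
{
    unsigned masked = opts & DEMO_DONT_FILTER_PARTIAL_CHUNKS;

    /* The masked expression is either the flag itself or zero. */
    assert(masked == 0 || masked == DEMO_DONT_FILTER_PARTIAL_CHUNKS);
    return masked != 0;
}

/* Returns opts with the option bit switched on or off; other bits are kept. */
unsigned set_partial_chunk_option(unsigned opts, int enable)
{
    return enable ? (opts | DEMO_DONT_FILTER_PARTIAL_CHUNKS)
                  : (opts & ~DEMO_DONT_FILTER_PARTIAL_CHUNKS);
}
```

Testing with a mask rather than with equality keeps the check valid if further option bits are ever stored in the same value.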
Retrieves the setting for whether or not to create minimized dataset object headers.
[in] | dcpl_id | Dataset creation property list identifier |
[out] | minimize | Flag indicating whether the library will or will not create minimized dataset object headers |
H5Pget_dset_no_attrs_hint() retrieves the no dataset attributes hint setting for the dataset creation property list dcpl_id. This setting is used to inform the library to create minimized dataset object headers when true. The setting value is returned in the boolean pointer minimize.
herr_t H5Pget_external (hid_t plist_id, unsigned idx, size_t name_size, char *name, off_t *offset, hsize_t *size)
Returns information about an external file.
[in] | plist_id | Dataset creation property list identifier |
[in] | idx | External file index |
[in] | name_size | Maximum length of name array |
[out] | name | Name of the external file |
[out] | offset | Pointer to a location to return an offset value |
[out] | size | Pointer to a location to return the size of the external file data |
H5Pget_external() returns information about an external file. The external file is specified by its index, idx, which is a number from zero to N-1, where N is the value returned by H5Pget_external_count(). At most name_size characters are copied into the name array. If the external file name, including its null terminator, is longer than name_size, the returned name is not null terminated (similar to strncpy()).
If name_size is zero or name is the null pointer, the external file name is not returned. If offset or size is a null pointer, the corresponding information is not returned.
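Because the name may come back without a null terminator, callers usually terminate the buffer themselves. The following is a minimal sketch of that defensive pattern in plain C. It does not call the HDF5 library: copy_name_like_strncpy() is a hypothetical stand-in that reproduces only the strncpy()-like copy behavior described above, and fetch_name_terminated() is likewise an invented helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the copy step: copies at most name_size
 * characters and does not terminate the result when full_name is longer. */
void copy_name_like_strncpy(char *name, size_t name_size, const char *full_name)
{
    if (name != NULL && name_size > 0)
        strncpy(name, full_name, name_size);
}

/* Caller-side pattern: offer one byte less than the buffer holds, then
 * terminate explicitly. Returns the length of the (possibly cut) name. */
size_t fetch_name_terminated(char *buf, size_t buf_size, const char *full_name)
{
    assert(buf != NULL && buf_size > 0);
    copy_name_like_strncpy(buf, buf_size - 1, full_name);
    buf[buf_size - 1] = '\0';
    return strlen(buf);
}
```

With a five-byte buffer and the name "raw_data.bin", the caller ends up with the terminated string "raw_" instead of an unterminated array.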
The idx parameter type changed to unsigned.
int H5Pget_external_count (hid_t plist_id)
Returns the number of external files for a dataset.
[in] | plist_id | Dataset creation property list identifier |
H5Pget_external_count() returns the number of external files for the specified dataset.
herr_t H5Pget_fill_time (hid_t plist_id, H5D_fill_time_t *fill_time)
Retrieves the time when fill values are written to a dataset.
[in] | plist_id | Dataset creation property list identifier |
[out] | fill_time | Setting for the timing of writing fill values to the dataset |
H5Pget_fill_time() examines the dataset creation property list plist_id to determine when fill values are to be written to a dataset. Valid values returned in fill_time are as follows:
H5D_FILL_TIME_IFSET | Fill values are written to the dataset when storage space is allocated only if there is a user-defined fill value, i.e., one set with H5Pset_fill_value(). (Default) |
H5D_FILL_TIME_ALLOC | Fill values are written to the dataset when storage space is allocated. |
H5D_FILL_TIME_NEVER | Fill values are never written to the dataset. |
Retrieves a dataset fill value.
[in] | plist_id | Dataset creation property list identifier |
[in] | type_id | Datatype identifier for the value passed via value |
[out] | value | Pointer to buffer to contain the returned fill value |
H5Pget_fill_value() returns the dataset fill value defined in the dataset creation property list plist_id. The fill value is returned through the value pointer and will be converted to the datatype specified by type_id. This datatype may differ from the fill value datatype in the property list, but the HDF5 library must be able to convert between the two datatypes.
If the fill value is undefined, i.e., set to NULL in the property list, H5Pget_fill_value() will return an error. H5Pfill_value_defined() should be used to check for this condition before H5Pget_fill_value() is called.
Memory must be allocated by the calling application.
H5Z_filter_t H5Pget_filter1 (hid_t plist_id, unsigned filter, unsigned int *flags, size_t *cd_nelmts, unsigned cd_values[], size_t namelen, char name[])
Returns information about a filter in a pipeline (DEPRECATED)
[in] | plist_id | Property list identifier |
[in] | filter | Sequence number within the filter pipeline of the filter for which information is sought |
[out] | flags | Bit vector specifying certain general properties of the filter |
[in,out] | cd_nelmts | Number of elements in cd_values |
[out] | cd_values | Auxiliary data for the filter |
[in] | namelen | Anticipated number of characters in name |
[out] | name | Name of the filter |
H5Pget_filter1() returns information about a filter, specified by its filter number, in a filter pipeline, specified by the property list with which it is associated.
plist_id must be a dataset or group creation property list.
filter is a value between zero and N-1, as described in H5Pget_nfilters(). The function will return a negative value if the filter number is out of range.
The structure of the flags argument is discussed in H5Pset_filter().
On input, cd_nelmts indicates the number of entries in the cd_values array, as allocated by the caller; on return, cd_nelmts contains the number of values defined by the filter.
If name is a pointer to an array of at least namelen bytes, the filter name will be copied into that array. The name will be null terminated if namelen is large enough. The filter name returned will be the name appearing in the file, the name registered for the filter, or an empty string.
The filter parameter type changed to unsigned.
herr_t H5Pget_filter_by_id1 (hid_t plist_id, H5Z_filter_t id, unsigned int *flags, size_t *cd_nelmts, unsigned cd_values[], size_t namelen, char name[])
Returns information about the specified filter.
[in] | plist_id | Property list identifier |
[in] | id | Filter identifier |
[out] | flags | Bit vector specifying certain general properties of the filter |
[in,out] | cd_nelmts | Number of elements in cd_values |
[out] | cd_values | Auxiliary data for the filter |
[in] | namelen | Anticipated number of characters in name |
[out] | name | Name of the filter |
H5Pget_filter_by_id1() returns information about a filter, specified in id, a filter identifier.
plist_id must be a dataset or group creation property list and id must be in the associated filter pipeline.
The id and flags parameters are used in the same manner as described in the discussion of H5Pset_filter().
Aside from the fact that they are used for output, the parameters cd_nelmts and cd_values[] are used in the same manner as described in the discussion of H5Pset_filter(). On input, the cd_nelmts parameter indicates the number of entries in the cd_values[] array allocated by the calling program; on exit it contains the number of values defined by the filter.
On input, the namelen parameter indicates the number of characters allocated for the filter name by the calling program in the array name[]. On exit name[] contains the name of the filter with one character of the name in each element of the array.
If the filter specified in id is not set for the property list, an error will be returned and this function will fail.
H5D_layout_t H5Pget_layout (hid_t plist_id)
Returns the layout of the raw data for a dataset.
[in] | plist_id | Dataset creation property list identifier |
H5Pget_layout() returns the layout of the raw data for a dataset. This function is only valid for dataset creation property lists.
Note that a compact storage layout may affect writing data to the dataset with parallel applications. See the H5Dwrite() documentation for details.
Gets the number of mappings for the virtual dataset.
[in] | dcpl_id | Dataset creation property list identifier |
[out] | count | The number of mappings |
H5Pget_virtual_count() gets the number of mappings for a virtual dataset that has the creation property list specified by dcpl_id.
Gets the name of a source dataset used in the mapping.
[in] | dcpl_id | Dataset creation property list identifier |
[in] | index | Mapping index. The value of index is 0 (zero) or greater and less than count (0 ≤ index < count ), where count is the number of mappings returned by H5Pget_virtual_count(). |
[out] | name | A buffer containing the name of the source dataset |
[in] | size | The size, in bytes, of the name buffer. Must be the size of the dataset name in bytes plus 1 for a NULL terminator |
H5Pget_virtual_dsetname() takes the dataset creation property list for the virtual dataset, dcpl_id, the mapping index, index, the size of the dataset name for a source dataset, size, and retrieves the name of the source dataset used in the mapping.
Up to size characters of the dataset name are returned in name; additional characters, if any, are not returned to the user application.
If the length of the dataset name, which determines the required value of size, is unknown, a preliminary call to H5Pget_virtual_dsetname() with the last two parameters set to NULL and zero respectively can be made. The return value of this call will be the size in bytes of the dataset name. That value, plus 1 for a NULL terminator, must then be assigned to size for a second H5Pget_virtual_dsetname() call, which will retrieve the actual dataset name.
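The two-call pattern (query the length, allocate, then fetch) can be sketched without the library as follows. demo_get_name() is a hypothetical stand-in that mimics only the behavior described here: it returns the name length and copies the name when given a buffer. It returns long where the real function returns ssize_t, and the stored name is made up. The same pattern is described for H5Pget_virtual_filename().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the query function. It returns the length of
 * a fixed, made-up name and copies as much as fits when a buffer is given. */
long demo_get_name(char *name, size_t size)
{
    static const char stored[] = "/sensors/temperature";
    size_t len = sizeof stored - 1;

    if (name != NULL && size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(name, stored, n);
        name[n] = '\0';
    }
    return (long)len;
}

/* Returns a newly allocated copy of the name, or NULL on failure.
 * The caller owns the result and must free() it. */
char *fetch_name(void)
{
    long len = demo_get_name(NULL, 0);      /* first call: length only */
    if (len < 0)
        return NULL;

    size_t size = (size_t)len + 1;          /* plus 1 for the terminator */
    char *name = malloc(size);
    if (name == NULL)
        return NULL;

    long written = demo_get_name(name, size);   /* second call: the name */
    assert(written == len);
    if (written < 0) {
        free(name);
        return NULL;
    }
    return name;
}
```

Sizing the buffer from the first call avoids guessing a maximum name length and silently truncating longer names.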
Gets the filename of a source dataset used in the mapping.
[in] | dcpl_id | Dataset creation property list identifier |
[in] | index | Mapping index. The value of index is 0 (zero) or greater and less than count (0 ≤ index < count ), where count is the number of mappings returned by H5Pget_virtual_count(). |
[out] | name | A buffer containing the name of the file containing the source dataset |
[in] | size | The size, in bytes, of the name buffer. Must be the size of the filename in bytes plus 1 for a NULL terminator |
H5Pget_virtual_filename() takes the dataset creation property list for the virtual dataset, dcpl_id, the mapping index, index, the size of the filename for a source dataset, size, and retrieves the name of the file for a source dataset used in the mapping.
Up to size characters of the filename are returned in name; additional characters, if any, are not returned to the user application.
If the length of the filename, which determines the required value of size, is unknown, a preliminary call to H5Pget_virtual_filename() with the last two parameters set to NULL and zero respectively can be made. The return value of this call will be the size in bytes of the filename. That value, plus 1 for a NULL terminator, must then be assigned to size for a second H5Pget_virtual_filename() call, which will retrieve the actual filename.
Gets a dataspace identifier for the selection within the source dataset used in the mapping.
[in] | dcpl_id | Dataset creation property list identifier |
[in] | index | Mapping index. The value of index is 0 (zero) or greater and less than count (0 ≤ index < count ), where count is the number of mappings returned by H5Pget_virtual_count(). |
H5Pget_virtual_srcspace() takes the dataset creation property list for the virtual dataset, dcpl_id, and the mapping index, index, and returns a dataspace identifier for the selection within the source dataset used in the mapping.
Gets a dataspace identifier for the selection within the virtual dataset used in the mapping.
[in] | dcpl_id | Dataset creation property list identifier |
[in] | index | Mapping index. The value of index is 0 (zero) or greater and less than count (0 ≤ index < count ), where count is the number of mappings returned by H5Pget_virtual_count() |
H5Pget_virtual_vspace() takes the dataset creation property list for the virtual dataset, dcpl_id, and the mapping index, index, and returns a dataspace identifier for the selection within the virtual dataset used in the mapping.
herr_t H5Pset_alloc_time (hid_t plist_id, H5D_alloc_time_t alloc_time)
Sets the timing for storage space allocation.
[in] | plist_id | Dataset creation property list identifier |
[in] | alloc_time | When to allocate dataset storage space |
H5Pset_alloc_time() sets up the timing for the allocation of storage space for a dataset's raw data. This property is set in the dataset creation property list plist_id. Timing is specified in alloc_time with one of the following values:
H5D_ALLOC_TIME_DEFAULT | Allocate dataset storage space at the default time (Defaults differ by storage method.) |
H5D_ALLOC_TIME_EARLY | Allocate all space when the dataset is created (Default for compact datasets.) |
H5D_ALLOC_TIME_INCR | Allocate space incrementally, as data is written to the dataset |
H5D_ALLOC_TIME_LATE | Allocate all space when data is first written to the dataset (Default for contiguous datasets.) |
Sets the size of the chunks used to store a chunked layout dataset.
[in] | plist_id | Dataset creation property list identifier |
[in] | ndims | The number of dimensions of each chunk |
[in] | dim | An array defining the size, in dataset elements, of each chunk |
H5Pset_chunk() sets the size of the chunks used to store a chunked layout dataset. This function is only valid for dataset creation property lists.
The ndims parameter currently must be the same size as the rank of the dataset.
The values of the dim array define the size of the chunks to store the dataset's raw data. The unit of measure for dim values is dataset elements.
As a side-effect of this function, the layout of the dataset is changed to H5D_CHUNKED, if it is not already so set.
Sets the edge chunk option in a dataset creation property list.
[in] | plist_id | Dataset creation property list identifier |
[in] | opts | Edge chunk option flag. Valid values include H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS, the option referenced in H5Pget_chunk_opts(). |
H5Pset_chunk_opts() sets the edge chunk option in the dataset creation property list plist_id.
The available option is detailed in the parameters section. Only chunks that are not completely filled by the dataset's dataspace are affected by this option. Such chunks are referred to as partial edge chunks.
Motivation: H5Pset_chunk_opts() is used to specify storage options for chunks on the edge of a dataset's dataspace. This capability allows the user to tune performance in cases where the dataset size may not be a multiple of the chunk size and the handling of partial edge chunks can impact performance.
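Whether a dataset has partial edge chunks follows from simple arithmetic on one dimension at a time. The sketch below is plain C rather than library code: the function names are hypothetical, and unsigned long long stands in for hsize_t.

```c
#include <assert.h>

/* Number of chunks needed to cover `extent` elements along one dimension. */
unsigned long long chunks_along(unsigned long long extent, unsigned long long chunk)
{
    assert(chunk > 0);
    return (extent + chunk - 1) / chunk;
}

/* Elements of the last chunk that lie inside the dataspace. This equals
 * `chunk` when the extent is an exact multiple of the chunk size. */
unsigned long long last_chunk_fill(unsigned long long extent, unsigned long long chunk)
{
    assert(chunk > 0 && extent > 0);
    unsigned long long rem = extent % chunk;
    return rem == 0 ? chunk : rem;
}

/* 1 if the dimension ends in a partial edge chunk, 0 otherwise. */
int has_partial_edge_chunk(unsigned long long extent, unsigned long long chunk)
{
    assert(chunk > 0);
    return extent % chunk != 0;
}
```

For a dimension of 10 elements stored in chunks of 4, three chunks are needed and the last one holds only 2 dataset elements, so it is a partial edge chunk. An extent of 12 with the same chunk size has none.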
Sets deflate (GNU gzip) compression method and compression level.
[in] | plist_id | Object creation property list identifier |
[in] | level | Compression level |
H5Pset_deflate() sets the deflate compression method and the compression level, level, for a dataset or group creation property list, plist_id.
The filter identifier set in the property list is H5Z_FILTER_DEFLATE.
The compression level, level, is a value from zero to nine, inclusive. A compression level of 0 (zero) indicates no compression; compression improves but speed slows progressively from levels 1 through 9:
Compression Level | Gzip Action |
---|---|
0 | No compression |
1 | Best compression speed; least compression |
2 through 8 | Compression improves; speed degrades |
9 | Best compression ratio; slowest speed |
Note that setting the compression level to 0 (zero) does not turn off use of the gzip filter; it simply sets the filter to perform no compression as it processes the data.
HDF5 relies on GNU gzip for this compression.
Sets the flag to create minimized dataset object headers.
[in] | dcpl_id | Dataset creation property list identifier |
[in] | minimize | Flag for indicating whether or not a dataset's object header will be minimized |
H5Pset_dset_no_attrs_hint() sets the no dataset attributes hint setting for the dataset creation property list dcpl_id. Datasets created with the dataset creation property list dcpl_id will have their object headers minimized if the boolean flag minimize is set to true. By setting minimize to true, the library expects that no attributes will be added to the dataset. Attributes can be added, but they are appended with a continuation message, which can reduce performance.
This setting interacts with H5Fset_dset_no_attrs_hint(): if either is set to true, then the created dataset's object header will be minimized.
Adds an external file to the list of external files.
[in] | plist_id | Dataset creation property list identifier |
[in] | name | Name of an external file |
[in] | offset | Offset, in bytes, from the beginning of the file to the location in the file where the data starts |
[in] | size | Number of bytes reserved in the file for the data |
The first call to H5Pset_external() sets the external storage property in the property list, thus designating that the dataset will be stored in one or more non-HDF5 file(s) external to the HDF5 file. This call also adds the file name as the first file in the list of external files. Subsequent calls to the function add the named file as the next file in the list.
If a dataset is split across multiple files, then the files should be defined in order. The total size of the dataset is the sum of the size arguments for all the external files. If the total size is larger than the size of a dataset then the dataset can be extended (provided the data space also allows the extending).
The size argument specifies the number of bytes reserved for data in the external file. If size is set to H5F_UNLIMITED, the external file can be of unlimited size and no more files can be added to the external files list. If size is set to 0 (zero), no external file will actually be created.
All of the external files for a given dataset must be specified with H5Pset_external() before H5Dcreate() is called to create the dataset. If one of these files does not exist on the system when H5Dwrite() is called to write data to it, the library will create the file.
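The size bookkeeping for a list of external files can be illustrated with a small plain-C sketch. Nothing here calls the library: demo_ext_file, DEMO_UNLIMITED (a stand-in for H5F_UNLIMITED whose value is an assumption), the file names, and both functions are hypothetical and exist only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for H5F_UNLIMITED; the value is an assumption for this sketch. */
#define DEMO_UNLIMITED (~0ULL)

/* One entry of a hypothetical external file list. */
typedef struct {
    const char        *name;    /* external file name */
    unsigned long long offset;  /* byte offset where the data starts */
    unsigned long long size;    /* bytes reserved, or DEMO_UNLIMITED */
} demo_ext_file;

/* Sum of the reserved sizes, or DEMO_UNLIMITED if the list ends with an
 * unlimited entry. */
unsigned long long total_reserved(const demo_ext_file *files, size_t n)
{
    unsigned long long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (files[i].size == DEMO_UNLIMITED) {
            /* No file may follow an unlimited entry. */
            assert(i == n - 1);
            return DEMO_UNLIMITED;
        }
        total += files[i].size;
    }
    return total;
}

/* 1 if nelmts elements of elmt_size bytes fit in the reserved space. */
int can_hold(const demo_ext_file *files, size_t n,
             unsigned long long nelmts, unsigned long long elmt_size)
{
    unsigned long long cap = total_reserved(files, n);
    return cap == DEMO_UNLIMITED || nelmts * elmt_size <= cap;
}
```

Two files reserving 400 bytes each can hold 100 eight-byte elements but not 101. When the last file is unlimited, the capacity check always passes.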
herr_t H5Pset_fill_time (hid_t plist_id, H5D_fill_time_t fill_time)
Sets the time when fill values are written to a dataset.
[in] | plist_id | Dataset creation property list identifier |
[in] | fill_time | When to write fill values to a dataset |
H5Pset_fill_time() sets up the timing for writing fill values to a dataset. This property is set in the dataset creation property list plist_id. Timing is specified in fill_time with one of the following values:
H5D_FILL_TIME_IFSET | Write fill values to the dataset when storage space is allocated only if there is a user-defined fill value, i.e., one set with H5Pset_fill_value(). (Default) |
H5D_FILL_TIME_ALLOC | Write fill values to the dataset when storage space is allocated. |
H5D_FILL_TIME_NEVER | Never write fill values to the dataset. |
Sets the fill value for a dataset.
[in] | plist_id | Dataset creation property list identifier |
[in] | type_id | Datatype of value |
[in] | value | Pointer to buffer containing value to use as fill value |
H5Pset_fill_value() sets the fill value for a dataset in the dataset creation property list. value is interpreted as being of datatype type_id. This datatype may differ from that of the dataset, but the HDF5 library must be able to convert value to the dataset datatype when the dataset is created.
The default fill value is 0 (zero), which is interpreted according to the actual dataset datatype.
Setting value to NULL indicates that the fill value is to be undefined.
herr_t H5Pset_layout (hid_t plist_id, H5D_layout_t layout)
Sets the type of storage used to store the raw data for a dataset.
[in] | plist_id | Dataset creation property list identifier |
[in] | layout | Type of storage layout for raw data |
H5Pset_layout() sets the type of storage used to store the raw data for a dataset. This function is only valid for dataset creation property lists.
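Valid values for layout are:
H5D_COMPACT | Compact storage layout |
H5D_CONTIGUOUS | Contiguous storage layout |
H5D_CHUNKED | Chunked storage layout |
H5D_VIRTUAL | Virtual dataset storage layout |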
Note that a compact storage layout may affect writing data to the dataset with parallel applications. See the note in H5Dwrite() documentation for details.
Sets up the use of the N-Bit filter.
[in] | plist_id | Dataset creation property list identifier |
H5Pset_nbit() sets the N-Bit filter, H5Z_FILTER_NBIT, in the dataset creation property list plist_id.
The HDF5 user can create an N-Bit datatype with the following code:
hid_t nbit_datatype = H5Tcopy(H5T_STD_I32LE);
H5Tset_precision(nbit_datatype, 16);
H5Tset_offset(nbit_datatype, 4);
In memory, one value of the N-Bit datatype in the above example will be stored on a little-endian machine as follows:
byte 3 | byte 2 | byte 1 | byte 0 |
???????? | ????SPPP | PPPPPPPP | PPPP???? |
Note: S - sign bit, P - significant bit, ? - padding bit. For a signed integer, the sign bit is included in the precision.
When data of the above datatype is stored on disk using the N-bit filter, all padding bits are chopped off and only significant bits are stored. The values on disk will be something like:
1st value | 2nd value | ... |
SPPPPPPPPPPPPPPP | SPPPPPPPPPPPPPPP | ... |
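The effect of dropping the padding bits can be reproduced with ordinary shifts and masks. The sketch below is plain C and does not use the filter. The function names are hypothetical, and placing the first value in the high bits of the packed word is an assumption made for the illustration, not a statement about the on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `precision` bits set (1 <= precision <= 32). */
uint32_t nbit_mask(unsigned precision)
{
    assert(precision >= 1 && precision <= 32);
    return precision == 32 ? 0xFFFFFFFFu : ((1u << precision) - 1u);
}

/* Keep only the `precision` significant bits that start at bit `offset`. */
uint32_t nbit_extract(uint32_t word, unsigned precision, unsigned offset)
{
    assert(offset + precision <= 32);
    return (word >> offset) & nbit_mask(precision);
}

/* Put the significant bits back at `offset`; padding bits become zero. */
uint32_t nbit_restore(uint32_t bits, unsigned precision, unsigned offset)
{
    assert(offset + precision <= 32);
    return (bits & nbit_mask(precision)) << offset;
}

/* Pack the significant bits of two values back to back in one 32-bit word
 * (precision of at most 16), first value in the high bits. */
uint32_t nbit_pack2(uint32_t first, uint32_t second,
                    unsigned precision, unsigned offset)
{
    assert(precision <= 16);
    return (nbit_extract(first, precision, offset) << precision)
           | nbit_extract(second, precision, offset);
}
```

With precision 16 and offset 4, as in the datatype above, the word 0xFFFABCDF reduces to the 16 significant bits 0xABCD. Restoring them gives 0x000ABCD0, so whatever was in the padding bits is lost.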
The N-Bit filter is used effectively for compressing data of an N-Bit datatype as well as a compound and an array datatype with N-Bit fields. However, the datatype classes of the N-Bit datatype or the N-Bit field of the compound datatype or the array datatype are limited to integer or floating-point.
The N-Bit filter supports complex situations where a compound datatype contains member(s) of a compound datatype or an array datatype that has a compound datatype as the base type. However, it does not support the situation where an array datatype has a variable-length or variable-length string as its base datatype. The filter does support the situation where a variable-length or variable-length string is a member of a compound datatype.
The N-Bit filter allows all other HDF5 datatypes (such as time, string, bitfield, opaque, reference, enum, and variable length) to pass through as a no-op.
Like other I/O filters supported by the HDF5 library, an application using the N-Bit filter must store data with chunked storage.
By nature, the N-Bit filter should not be used together with other I/O filters.
herr_t H5Pset_scaleoffset(hid_t plist_id, H5Z_SO_scale_type_t scale_type, int scale_factor)
Sets up the use of the scale-offset filter.
[in] | plist_id | Dataset creation property list identifier |
[in] | scale_type | Flag indicating compression method |
[in] | scale_factor | Parameter related to scale. Must be non-negative |
H5Pset_scaleoffset() sets the scale-offset filter, H5Z_FILTER_SCALEOFFSET, for a dataset.
Generally speaking, scale-offset compression performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits (MinBits) before storing it. The current scale-offset filter supports integer and floating-point datatypes.
For an integer datatype, the parameter scale_type should be set to H5Z_SO_INT (2). The parameter scale_factor denotes MinBits. If the user sets it to H5Z_SO_INT_MINBITS_DEFAULT (0), the filter will calculate MinBits. If scale_factor is set to a positive integer, the filter does not do any calculation and just uses the number as MinBits. However, if the user gives a MinBits that is less than what would be generated by the filter, the compression will be lossy. Also, the MinBits supplied by the user cannot exceed the number of bits to store one value of the dataset datatype.
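To judge whether a user-supplied MinBits would be lossy, it can help to estimate how many bits the value range needs once the minimum is subtracted as an offset. The sketch below is one plausible way to do that for 32-bit integers. It is an assumption for illustration, not the filter's actual calculation (which may, for example, also account for a fill value), and minbits_for_span is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an HDF5 API): number of bits needed to store
 * every value of the array as an unsigned offset from the minimum. */
unsigned minbits_for_span(const int32_t *values, size_t n)
{
    assert(values != NULL && n > 0);

    int32_t lo = values[0], hi = values[0];
    for (size_t i = 1; i < n; i++) {
        if (values[i] < lo) lo = values[i];
        if (values[i] > hi) hi = values[i];
    }

    /* Widen to 64 bits so that hi - lo cannot overflow. */
    uint64_t span = (uint64_t)((int64_t)hi - (int64_t)lo);

    unsigned bits = 0;
    while (span != 0) {   /* count the bits of the largest offset */
        bits++;
        span >>= 1;
    }
    return bits;
}
```

Under this estimate, a scale_factor smaller than the returned value would drop information for at least one element.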
For a floating-point datatype, the filter adopts the GRiB data packing mechanism, which offers two alternate methods: E-scaling and D-scaling. Both methods are lossy compression. If the parameter scale_type is set to H5Z_SO_FLOAT_DSCALE (0), the filter will use the D-scaling method; if it is set to H5Z_SO_FLOAT_ESCALE (1), the filter will use the E-scaling method. Since only the D-scaling method is implemented, scale_type should be set to H5Z_SO_FLOAT_DSCALE or 0.
When the D-scaling method is used, the original data is "D" scaled, that is, multiplied by 10 to the power of scale_factor, and the "significant" part of the value is moved to the left of the decimal point. Care should be taken in setting the decimal scale_factor so that the integer part will have enough precision to contain the appropriate information of the data value. For example, if scale_factor is set to 2, the number 104.561 will be 10456.1 after "D" scaling. The last digit 1 is not "significant" and is dropped in the process of rounding. The user should make sure that after "D" scaling and rounding, the data values are within the range that can be represented by the integer (same size as the floating-point type).
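The scaling and rounding step in the example above can be reproduced with a few lines of stand-alone C. This is only a sketch of the arithmetic, assuming round-half-away-from-zero; it does not show the offset and MinBits packing that the filter performs afterwards, and d_scale is a hypothetical name.

```c
#include <assert.h>

/* 10 raised to an integer power, without needing the math library. */
double pow10_int(int exponent)
{
    double p = 1.0;
    int n = (exponent < 0) ? -exponent : exponent;
    while (n-- > 0)
        p *= 10.0;
    return (exponent < 0) ? 1.0 / p : p;
}

/* Hypothetical helper (not an HDF5 API): "D" scale a value by
 * 10^scale_factor and round it to the nearest integer. */
long long d_scale(double value, int scale_factor)
{
    /* Sketch only: a very large exponent would overflow the result. */
    assert(scale_factor > -300 && scale_factor < 300);
    double scaled = value * pow10_int(scale_factor);
    return (long long)(scaled < 0.0 ? scaled - 0.5 : scaled + 0.5);
}
```

Calling d_scale(104.561, 2) yields 10456, matching the example: the trailing digit 1 is lost, which is why the method is lossy.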
Valid values for scale_type are as follows:
H5Z_SO_FLOAT_DSCALE (0) | Floating-point type, using variable MinBits method |
H5Z_SO_FLOAT_ESCALE (1) | Floating-point type, using fixed MinBits method |
H5Z_SO_INT (2) | Integer type |
The meaning of scale_factor varies according to the value assigned to scale_type:
scale_type value | scale_factor description |
---|---|
H5Z_SO_FLOAT_DSCALE | Denotes the decimal scale factor for D-scaling and can be positive, negative or zero. This is the current implementation of the library. |
H5Z_SO_FLOAT_ESCALE | Denotes MinBits for E-scaling and must be a positive integer. This is not currently implemented by the library. |
H5Z_SO_INT | Denotes MinBits and it should be a positive integer or H5Z_SO_INT_MINBITS_DEFAULT (0). If it is less than 0, the library will reset it to 0 since it is not implemented. |
Like other I/O filters supported by the HDF5 library, an application using the scale-offset filter must store data with chunked storage.
Sets up use of the shuffle filter.
[in] | plist_id | Dataset creation property list identifier |
H5Pset_shuffle() sets the shuffle filter, H5Z_FILTER_SHUFFLE, in the dataset creation property list plist_id. The shuffle filter de-interlaces a block of data by reordering the bytes. All the bytes from one consistent byte position of each data element are placed together in one block; all bytes from a second consistent byte position of each data element are placed together in a second block; etc. For example, given three data elements of a 4-byte datatype stored as 012301230123, shuffling will re-order data as 000111222333. This can be a valuable step in an effective compression algorithm because the bytes in each byte position are often closely related to each other and putting them together can increase the compression ratio.
As implied above, the primary value of the shuffle filter lies in its coordinated use with a compression filter; it does not provide data compression when used alone. When the shuffle filter is applied to a dataset immediately prior to the use of a compression filter, the compression ratio achieved is often superior to that achieved by the use of a compression filter without the shuffle filter.
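The byte reordering described above can be sketched in stand-alone C as follows. This is an illustration of the transformation only, not the library's implementation, and shuffle_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an HDF5 API): gather byte 0 of every element,
 * then byte 1 of every element, and so on. `in` and `out` must each hold
 * nelems * elem_size bytes and must not overlap. */
void shuffle_bytes(const unsigned char *in, unsigned char *out,
                   size_t nelems, size_t elem_size)
{
    assert(in != NULL && out != NULL && in != out);

    for (size_t j = 0; j < elem_size; j++)        /* byte position */
        for (size_t i = 0; i < nelems; i++)       /* element index */
            out[j * nelems + i] = in[i * elem_size + j];
}
```

Applied to the 12 bytes 012301230123 with nelems = 3 and elem_size = 4, the output is 000111222333, as in the example in the text.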
Sets up use of the SZIP compression filter.
[in] | plist_id | Dataset creation property list identifier |
[in] | options_mask | A bit-mask conveying the desired SZIP options; Valid values are H5_SZIP_EC_OPTION_MASK and H5_SZIP_NN_OPTION_MASK. |
[in] | pixels_per_block | The number of pixels or data elements in each data block (max H5_SZIP_MAX_PIXELS_PER_BLOCK) |
H5Pset_szip() sets an SZIP compression filter, H5Z_FILTER_SZIP, for a dataset. SZIP is a compression method designed for use with scientific data.
Before proceeding, all users should review the “Limitations” section below.
Users familiar with SZIP outside the HDF5 context may benefit from reviewing the Note “For Users Familiar with SZIP in Other Contexts” below.
In the text below, the term pixel refers to an HDF5 data element. This terminology derives from SZIP compression's use with image data, where pixel referred to an image pixel.
The SZIP bits_per_pixel value (see Note, below) is automatically set, based on the HDF5 datatype. SZIP can be used with atomic datatypes that may have size of 8, 16, 32, or 64 bits. Specifically, a dataset with a datatype that is 8-, 16-, 32-, or 64-bit signed or unsigned integer; char; or 32- or 64-bit float can be compressed with SZIP. See Note, below, for further discussion of the SZIP bits_per_pixel setting.
SZIP options are passed in an options mask, options_mask, as follows.
Option | Description (Mutually exclusive; select one) |
---|---|
H5_SZIP_EC_OPTION_MASK | Selects entropy coding method |
H5_SZIP_NN_OPTION_MASK | Selects nearest neighbor preprocessing followed by entropy coding |
The following guidelines can be used in determining which option to select:
Other factors may affect results, but the above criteria provide a good starting point for optimizing data compression.
SZIP compresses data block by block, with a user-tunable block size. This block size is passed in the parameter pixels_per_block and must be even and not greater than 32, with typical values being 8, 10, 16, or 32. This parameter affects compression ratio; the more pixel values vary, the smaller this number should be to achieve better performance.
In HDF5, compression can be applied only to chunked datasets. If pixels_per_block is bigger than the total number of elements in a dataset chunk, H5Pset_szip() will succeed but the subsequent call to H5Dcreate() will fail; the conflict can be detected only when the property list is used.
To achieve optimal performance for SZIP compression, it is recommended that a chunk's fastest-changing dimension be equal to N times pixels_per_block where N is the maximum number of blocks per scan line allowed by the SZIP library. In the current version of SZIP, N is set to 128.
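Because the conflict between pixels_per_block and the chunk size is only detected at dataset creation, an application may want to check the constraints stated above before building the property list. The following is a minimal stand-alone sketch of such a check. The function and macro names are hypothetical, and the lower bound of 2 is an assumption (the text only states "even and not greater than 32").

```c
#include <assert.h>

#define SKETCH_MAX_PIXELS_PER_BLOCK 32u   /* upper limit stated in the text */
#define SKETCH_BLOCKS_PER_SCANLINE  128u  /* N for the current SZIP version */

/* Hypothetical helper (not an HDF5 API): returns 1 if pixels_per_block is
 * even, within range, and no larger than the number of chunk elements. */
int szip_ppb_is_valid(unsigned pixels_per_block, unsigned long long chunk_elems)
{
    if (pixels_per_block < 2u ||                        /* assumed minimum */
        pixels_per_block > SKETCH_MAX_PIXELS_PER_BLOCK ||
        pixels_per_block % 2u != 0u)
        return 0;
    if ((unsigned long long)pixels_per_block > chunk_elems)
        return 0;   /* dataset creation would fail later in this case */
    return 1;
}

/* Hypothetical helper: recommended size of the chunk's fastest-changing
 * dimension, N times pixels_per_block. */
unsigned szip_suggested_fastest_dim(unsigned pixels_per_block)
{
    assert(pixels_per_block >= 2u &&
           pixels_per_block <= SKETCH_MAX_PIXELS_PER_BLOCK &&
           pixels_per_block % 2u == 0u);
    return SKETCH_BLOCKS_PER_SCANLINE * pixels_per_block;
}
```

For pixels_per_block = 16, the suggested fastest-changing chunk dimension is 2048 elements.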
SZIP compression is an optional HDF5 filter.
Limitations:
pixels_in_object, the number of pixels in the object to be compressed
bits_per_pixel, the number of bits per pixel
pixels_per_scanline, the number of pixels per scan line
... pixels_in_object to the number of elements in a chunk and bits_per_pixel to the size of the element or pixel datatype.
... pixels_per_scanline:
... pixels_per_scanline to 128 times pixels_per_block.
... pixels_per_block, set pixels_per_scanline to the minimum of size and 128 times pixels_per_block.
... pixels_per_block but greater than the number of elements in the chunk, set pixels_per_scanline to the minimum of the number of elements in the chunk and 128 times pixels_per_block.
... bits_per_pixel is set to the precision of the HDF5 datatype.
... bits_per_pixel will be set to the number of bits in the full size of the data element.
... bits_per_pixel will be set to 32.
... bits_per_pixel will be set to 64.
herr_t H5Pset_virtual(hid_t dcpl_id, hid_t vspace_id, const char *src_file_name, const char *src_dset_name, hid_t src_space_id)
Sets the mapping between virtual and source datasets.
[in] | dcpl_id | Dataset creation property list identifier |
[in] | vspace_id | The dataspace identifier with the selection within the virtual dataset applied, possibly an unlimited selection |
[in] | src_file_name | The name of the HDF5 file where the source dataset is located or a "." (period) for a source dataset in the same file. The file might not exist yet. The name can be specified using a C-style printf statement as described below. |
[in] | src_dset_name | The path to the HDF5 dataset in the file specified by src_file_name . The dataset might not exist yet. The dataset name can be specified using a C-style printf statement as described below. |
[in] | src_space_id | The source dataset's dataspace identifier with a selection applied, possibly an unlimited selection |
H5Pset_virtual() maps elements of the virtual dataset (VDS) described by the virtual dataspace identifier vspace_id to the elements of the source dataset described by the source dataset dataspace identifier src_space_id. The source dataset is identified by the name of the file where it is located, src_file_name, and the name of the dataset, src_dset_name.
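The printf-style patterns accepted in src_file_name and src_dset_name use two substitutions, "%%" and "%<d>b" (see printf Formatting Statements). The following stand-alone sketch shows how such a pattern could be expanded once a numeric value is known for each dimension. It is an illustration only: vds_expand_name is a hypothetical name, the per-dimension values are supplied by the caller, and how the library derives them when the mapping is evaluated is outside the scope of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not an HDF5 API): expand "%%" to "%" and "%<d>b"
 * to block_value[d]. Returns 0 on success, or -1 for a malformed pattern,
 * an out-of-range axis, or an output buffer that is too small. */
int vds_expand_name(const char *pattern, const unsigned long long *block_value,
                    size_t ndims, char *out, size_t out_size)
{
    assert(pattern != NULL && out != NULL && out_size > 0);

    size_t pos = 0;
    const char *p = pattern;
    while (*p != '\0') {
        if (*p != '%' || p[1] == '%') {            /* literal, or "%%" */
            if (pos + 1 >= out_size)
                return -1;
            out[pos++] = *p;
            p += (*p == '%') ? 2 : 1;
            continue;
        }
        p++;                                       /* skip the '%' */
        if (!isdigit((unsigned char)*p))
            return -1;
        size_t axis = 0;
        while (isdigit((unsigned char)*p))         /* parse <d> */
            axis = axis * 10 + (size_t)(*p++ - '0');
        if (*p != 'b' || block_value == NULL || axis >= ndims)
            return -1;
        p++;                                       /* skip the 'b' */
        int n = snprintf(out + pos, out_size - pos, "%llu", block_value[axis]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    out[pos] = '\0';
    return 0;
}
```

For example, with block_value = {3, 12}, the pattern "f-%0b.h5" expands to "f-3.h5".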
printf Formatting Statements:
printf formatting allows a pattern to be specified in the name of a source file or dataset. Strings for the file and dataset names are treated as literals except for the following substitutions:
"%%" | Replaced with a single "%" (percent) character. |
"%<d>b" | Where "<d>" is the virtual dataset dimension axis (0-based) and "b" indicates that the block count of the selection in that dimension should be used. The full expression (for example, "%0b" ) is replaced with a single numeric value when the mapping is evaluated at VDS access time. Example code for many source and virtual dataset mappings is available in the "Examples of Source to Virtual Dataset Mapping" chapter in the RFC: HDF5 Virtual Dataset. |
The library searches for the source file src_file_name as described below:
If src_file_name is a "." (period) then it refers to the file containing the virtual dataset.
If src_file_name is a relative pathname, the following steps are performed:
The library will get the prefix(es) set in the environment variable HDF5_VDS_PREFIX and will try to prepend each prefix to src_file_name to form a new src_file_name. If the new src_file_name does not exist or if HDF5_VDS_PREFIX is not set, the library will get the prefix set via H5Pset_virtual_prefix() and prepend it to src_file_name to form a new src_file_name. If the new src_file_name does not exist or no prefix has been set by H5Pset_virtual_prefix() then the path of the file containing the virtual dataset is obtained. This path can be the absolute path or the current working directory plus the relative path of that file when it is created/opened. The library will prepend this path to src_file_name to form a new src_file_name.
If the new src_file_name does not exist, then the library will look for src_file_name and will return failure/success accordingly.
If src_file_name is an absolute pathname, the library will first try to find src_file_name. If src_file_name does not exist, src_file_name is stripped of directory paths to form a new src_file_name. The search for the new src_file_name then follows the same steps as described above for a relative pathname. See examples below illustrating how src_file_name is stripped to form a new src_file_name.
src_file_name is considered to be an absolute pathname when the following condition is true:
The first character of src_file_name is a slash (/). For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be A.h5.
src_file_name is an absolute drive with absolute pathname. For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be A.h5.
src_file_name is an absolute pathname without specifying drive name. For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be A.h5.
src_file_name is an absolute drive with relative pathname. For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be tmp/A.h5.
src_file_name is in UNC (Uniform Naming Convention) format with server name, share name, and pathname. For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be A.h5.
src_file_name is in Long UNC (Uniform Naming Convention) format with server name, share name, and pathname. For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be A.h5.
src_file_name is in Long UNC (Uniform Naming Convention) format with an absolute drive and an absolute pathname. For example, consider a src_file_name of /tmp/A.h5. If that source file does not exist, the new src_file_name after stripping will be A.h5
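The stripping step for a slash-separated absolute pathname can be sketched in stand-alone C as follows. This covers only the case where the name begins with a slash and uses "/" as the separator; the drive and UNC forms are not handled, and both function names are hypothetical rather than part of the HDF5 API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 API): a name is treated as absolute
 * in this sketch when its first character is a slash. */
int is_absolute_slash_path(const char *name)
{
    assert(name != NULL);
    return name[0] == '/';
}

/* Hypothetical helper: drop all leading directory components and return
 * a pointer to the final component inside the original string. */
const char *strip_directories(const char *name)
{
    assert(name != NULL);
    const char *last_slash = strrchr(name, '/');
    return (last_slash != NULL) ? last_slash + 1 : name;
}
```

For the src_file_name /tmp/A.h5, strip_directories() returns A.h5, which would then be searched for using the relative-pathname steps.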