HDF5
1.14.4.3
API Reference
The h5dump and h5ls tools can both be used to view the contents of an HDF5 file. The tools are discussed below:
The h5dump tool dumps or displays the contents of an HDF5 file as text. If you specify no options, h5dump displays the entire contents of a file. There are many h5dump options for examining specific details of a file. To see all of the available h5dump options, run h5dump -h or h5dump --help.
The following h5dump options can be helpful in viewing the content and structure of a file:
Option | Description | Comment |
---|---|---|
-n, --contents | Displays a list of the objects in a file | See Example 1 |
-n 1, --contents=1 | Displays a list of the objects and attributes in a file | See Example 6 |
-H, --header | Displays header information only (no data) | See Example 2 |
-A 0, --onlyattr=0 | Suppresses the display of attributes | See Example 2 |
-N P, --any_path=P | Displays any object or attribute that matches path P | See Example 6 |
The command h5dump -n OMI-Aura.he5 displays a list of the objects in the file OMI-Aura.he5 (an HDF-EOS5 file).
In the output, the object types (group, dataset) are listed on the left, followed by the object names. You can see that this file contains two groups directly under the root group, HDFEOS and HDFEOS INFORMATION:
The file structure of the OMI-Aura.he5 file can be seen with the command h5dump -H -A 0 OMI-Aura.he5. The -H option displays header information only, and the -A 0 option suppresses the display of attributes.
Output of this command is shown below:
By default, the h5ls tool displays only the objects in the root group. It does not display items in groups beneath the root group unless told to. Useful h5ls options for viewing file content and structure are:
Option | Description | Comment |
---|---|---|
-r | Lists all groups and objects recursively | See Example 3 |
-v | Generates verbose output (lists dataset properties, attributes and attribute values, but no dataset values) | |
The command h5ls -r OMI-Aura.he5 shows the contents of the HDF-EOS5 file OMI-Aura.he5. The output is similar to that of h5dump -n, except that h5ls also shows dataspace information for each dataset.
The output is shown below:
Both h5dump and h5ls can be used to view specific datasets.
Useful h5dump options for examining specific datasets include:
Option | Description | Comment |
---|---|---|
-d D, --dataset=D | Displays dataset D | See Example 4 |
-H, --header | Displays header information only | See Example 4 |
-p, --properties | Displays dataset filters, storage layout, and fill value properties | See Example 5 |
-A 0, --onlyattr=0 | Suppresses the display of attributes | See Example 2 |
-N P, --any_path=P | Displays any object or attribute that matches path P | See Example 6 |
A specific dataset can be viewed with h5dump using the -d D option and specifying the entire path and name of the dataset for D. The path is important in identifying the correct dataset, as there can be multiple datasets with the same name. The path can be determined by looking at the objects in the file with h5dump -n.
The following example uses the groups.h5 file that is created by h5_crtgrpar.c, one of the Examples from Learning the Basics. To display dset1 in the groups.h5 file below, specify dataset /MyGroup/dset1. The -H option is used to suppress printing of the data values:
Contents of groups.h5
Display dataset "dset1"
The -p option is used to examine the dataset filters, storage layout, and fill value properties of a dataset.
This option can be useful for checking how well compression works, or for analyzing performance and dataset size issues related to chunking. (The smaller the chunk size, the more chunks HDF5 has to keep track of, which increases the size of the file and can affect performance.)
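The effect of chunk size on the number of chunks can be sketched in a few lines of Python. This is only an illustration: the dataset shape and the two chunk shapes are made-up values, the function name is hypothetical, and the code does not call the HDF5 library or account for compression or metadata overhead.

```python
import math

def chunk_count(dims, chunk_dims):
    """Number of chunks needed to cover a dataset of shape `dims`.

    Partial chunks at the edges still count as whole chunks,
    hence the ceiling division in each dimension.
    """
    if len(dims) != len(chunk_dims):
        raise ValueError("dims and chunk_dims must have the same rank")
    return math.prod(math.ceil(d / c) for d, c in zip(dims, chunk_dims))

# Hypothetical dataset shape and chunk shapes, chosen only for illustration.
dims = (720, 1440)
print(chunk_count(dims, (180, 360)))  # 4 * 4 = 16 chunks
print(chunk_count(dims, (10, 10)))    # 72 * 144 = 10368 chunks
```

With the smaller chunk shape the library would have several hundred times as many chunks to index for the same data, which is the overhead the note above refers to.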
In the file shown below the dataset /DS1 is both chunked and compressed:
You can obtain the h5ex_d_gzip.c program that created this file, as well as the file created, from the Examples by API page.
A specific dataset can be selected with h5ls by adding the dataset path and name after the file name. As an example, this command displays dataset dset2 in the groups.h5 file used in Example 4:
Only the dataspace information is displayed:
The following options can be used to see detailed information about a dataset.
Option | Description |
---|---|
-v, --verbose | Generates verbose output (lists dataset properties, attributes and attribute values, but no dataset values) |
-d, --data | Displays dataset values |
The output of using -v is shown below:
The output of using -d is shown below:
Both h5dump and h5ls can be used to view specific groups in a file.
The h5dump options that are useful for examining groups are:
Option | Description |
---|---|
-g G, --group=G | Displays group G and its members |
-H, --header | Displays header information only |
-A 0, --onlyattr=0 | Suppresses the display of attributes |
To view the contents of the HDFEOS group in the OMI file mentioned previously, you can specify the path and name of the group as follows:
The -A 0 option suppresses attributes and -H suppresses printing of data values:
You can view the contents of a group with h5ls by specifying the group after the file name. To use h5ls to view the contents of the /HDFEOS group in the OMI-Aura.he5 file, type:
The output of this command is:
If you specify the -v option, you can also see the attributes and properties of the datasets.
Attributes are displayed by default when using h5dump. Some files contain many attributes, which can make it difficult to examine the objects in the file. Shown below are options that can help when using h5dump to work with files that have attributes.
The -a A option displays attribute A. However, the path to the attribute must be included when specifying this option. For example, to see the ScaleFactor attribute in the OMI-Aura.he5 file, type:
This command displays:
How can you determine the path to the attribute? This can be done by looking at the file contents with the -n 1 option:
Below is a portion of the output for this command:
There can be multiple objects or attributes with the same name in a file. How can you make sure you are finding the correct object or attribute? You can first determine how many attributes there are with a specified name, and then examine the paths to them.
The -N option can be used to display all objects or attributes with a given name. For example, there are four attributes with the name ScaleFactor in the OMI-Aura.he5 file, as can be seen below with the -N option:
It outputs:
If you include the -v (verbose) option for h5ls, you will see all of the attributes for the specified file, dataset or group. h5ls cannot display individual attributes.
If you have a very large dataset, you may wish to subset or see just a portion of the dataset. This can be done with the following h5dump options.
Option | Description |
---|---|
-d D, --dataset=D | Dataset D |
-s START, --start=START | Offset or start of subsetting selection |
-S STRIDE, --stride=STRIDE | Stride (sampling along a dimension). The default (unspecified, or 1) selects every element along a dimension, a value of 2 selects every other element, a value of 3 selects every third element, ... |
-c COUNT, --count=COUNT | Number of blocks to include in the selection |
-k BLOCK, --block=BLOCK | Size of the block in a hyperslab. The default (unspecified, or 1) is for the block size to be the size of a single element. |
The START (s), STRIDE (S), COUNT (c), and BLOCK (k) options define the shape and size of the selection. They are arrays with the same number of dimensions as the rank of the dataset's dataspace, and they all work together to define the selection. A change to one of these arrays can affect the others.
When specifying these h5dump options, a comma is used as the delimiter for each dimension in the option value. For example, with a 2-dimensional dataset, the option value is specified as "H,W", where H is the height and W is the width. If the offset is 0 for both dimensions, then START is specified as -s "0,0".
There is also a shorthand way to specify these options with brackets at the end of the dataset name:
Multiple dimensions are separated by commas. For example, a subset for a 2-dimensional dataset would be specified as follows:
For a detailed understanding of how selections work, see the H5Sselect_hyperslab API in the HDF5 Reference Manual.
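How START, STRIDE, COUNT, and BLOCK combine can be sketched in plain Python. This is a minimal illustration of the selection arithmetic as described in the table above, not the library's implementation; the function names are hypothetical and the values are made up.

```python
from itertools import product

def selected_indices(start, stride, count, block):
    """Indices picked along ONE dimension by a start/stride/count/block selection.

    `count` blocks are taken; each block holds `block` consecutive elements,
    and consecutive blocks begin `stride` elements apart.
    """
    return [start + i * stride + j for i in range(count) for j in range(block)]

def selected_coordinates(start, stride, count, block):
    """Coordinates picked in an N-dimensional dataspace (one entry per dimension)."""
    per_dim = [selected_indices(s, st, c, b)
               for s, st, c, b in zip(start, stride, count, block)]
    return list(product(*per_dim))

# One dimension: start at 1, take 3 blocks of 2 elements, blocks begin every 4 elements.
print(selected_indices(start=1, stride=4, count=3, block=2))
# [1, 2, 5, 6, 9, 10]

# Two dimensions with the defaults stride=1 and block=1: a plain 2 x 3 window at (0, 0).
print(selected_coordinates((0, 0), (1, 1), (2, 3), (1, 1)))
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Because each dimension is handled independently, the selection in N dimensions is the Cartesian product of the per-dimension index lists.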
The dataset SolarZenithAngle in the OMI-Aura.he5 file can be used to illustrate these options. This dataset is a 2-dimensional dataset of size 720 (height) x 1440 (width). Simply viewing the dataset with the -d option displays too much data:
Subsetting narrows down the output that is displayed. In the following example, the first 15x10 elements (-c "15,10") are specified, beginning with position (0,0) (-s "0,0"):
If using the shorthand method, specify:
In these commands, the -d option must be specified before the subsetting options (if not using the shorthand method).
The -A 0 option suppresses the printing of attributes.
The -w 0 option sets the number of columns of output to the maximum allowed value (65535). This ensures that there are enough columns specified for displaying the data.
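The two ways of writing the subset can be sketched as a pair of Python helpers that only build argument lists; they do not run h5dump. The helper names and the dataset path are hypothetical, and the bracket layout is assumed to be start;stride;count;block with semicolons between fields and empty fields meaning "use the default".

```python
import shlex

def _dims(values):
    """Format one value per dimension as a comma-separated string; None becomes empty."""
    return "" if values is None else ",".join(str(v) for v in values)

def long_form(filename, dataset, start, count, stride=None, block=None):
    """Build the argument list with separate -s/-S/-c/-k options (-d comes first)."""
    args = ["h5dump", "-A", "0", "-d", dataset, "-s", _dims(start)]
    if stride is not None:
        args += ["-S", _dims(stride)]
    args += ["-c", _dims(count)]
    if block is not None:
        args += ["-k", _dims(block)]
    return args + ["-w", "0", filename]

def short_form(filename, dataset, start, count, stride=None, block=None):
    """Build the argument list with the bracket shorthand appended to the dataset name."""
    # Assumed field order inside the brackets: start;stride;count;block
    selection = ";".join(_dims(v) for v in (start, stride, count, block))
    return ["h5dump", "-A", "0", "-d", f"{dataset}[{selection}]", "-w", "0", filename]

# Hypothetical dataset path; substitute the full path reported by h5dump -n.
dataset = "/path/to/SolarZenithAngle"
print(shlex.join(long_form("OMI-Aura.he5", dataset, (0, 0), (15, 10))))
print(shlex.join(short_form("OMI-Aura.he5", dataset, (0, 0), (15, 10))))
```

The lists can be passed to subprocess.run as they are; shlex.join is used here only to show a shell-safe command line, which matters because real dataset paths may contain spaces and the shorthand contains semicolons.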
Either command displays:
What if we wish to read three rows of three elements at a time (-c "3,3"), where each element is a 2 x 3 block (-k "2,3") and we wish to begin reading from the second row (-s "1,0")?
You can do that with the following command:
In this case, the stride must be specified as 2 by 3 (or larger) to accommodate the reading of 2 by 3 blocks. If it is smaller, the blocks would overlap and the command fails with an error.
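The stride requirement and the size of the resulting subset can be checked with a short Python sketch. The function name is hypothetical, and the check simply mirrors the rule stated above (stride no smaller than block in every dimension); it is not h5dump's own validation code.

```python
def check_selection(start, stride, count, block, dims):
    """Validate a start/stride/count/block selection and return the shape of the result."""
    for s, st, c, b, d in zip(start, stride, count, block, dims):
        if st < b:
            raise ValueError(f"stride {st} is smaller than block {b}: blocks would overlap")
        last = s + (c - 1) * st + b - 1  # index of the last selected element
        if last >= d:
            raise ValueError(f"selection reaches index {last}, beyond extent {d}")
    # Each dimension contributes count blocks of block elements.
    return tuple(c * b for c, b in zip(count, block))

# The selection described above, on a 720 x 1440 dataset.
dims = (720, 1440)
print(check_selection((1, 0), (2, 3), (3, 3), (2, 3), dims))  # (6, 9)
```

Three blocks of two rows and three blocks of three columns give a 6 x 9 subset. Calling the function with a stride of (1, 1) and the same block size raises ValueError.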
The output of the above command is shown below:
The Array, Object Reference, Region Reference, and String datatypes are discussed below, using the output of h5dump with HDF5 files from the Examples by API page.
Users are sometimes confused by the difference between an Array datatype (H5T_ARRAY) and a dataset whose dataspace is an array.
Typically, these users want a dataset that has a simple datatype (like integer or float) and a dataspace that is an array, like the following dataset /DS1. It has a datatype of H5T_STD_I32LE (32-bit Little-Endian Integer) and is a 4 by 7 array:
Contrast that with the following dataset that has both an Array datatype and is an array:
In this file, dataset /DS1 has a datatype of H5T_ARRAY { [3][5] H5T_STD_I64LE } and it also has a dataspace of SIMPLE { ( 4 ) / ( 4 ) }. In other words, it is an array of four elements, in which each element is a 3 by 5 array of H5T_STD_I64LE.
This dataset is much more complex. Also note that subsetting cannot be done on Array datatypes.
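The difference between the two layouts can be made concrete by counting elements in Python. This is a bookkeeping sketch only: the function name is hypothetical, and the shapes are the ones described above (a 4 by 7 dataspace of plain integers versus a 4-element dataspace whose elements are 3 by 5 arrays).

```python
import math

def element_count(dataspace_dims, array_type_dims=()):
    """Return (dataspace elements, base-type values per element, total base-type values)."""
    n_elements = math.prod(dataspace_dims)
    per_element = math.prod(array_type_dims)  # math.prod(()) == 1 for a scalar datatype
    return n_elements, per_element, n_elements * per_element

# Simple integer datatype, 4 x 7 dataspace: 28 elements of one integer each.
print(element_count((4, 7)))        # (28, 1, 28)
# Array datatype [3][5], dataspace of 4 elements: 4 elements of 15 integers each.
print(element_count((4,), (3, 5)))  # (4, 15, 60)
```

A selection is made on the dataspace, so in the second case there are only four positions to choose from, and each one brings its whole 3 by 5 array with it.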
See this section for more information on the Array datatype.
An Object Reference is a reference to an entire object (dataset, group, or named datatype). A dataset with an Object Reference datatype consists of one or more Object References. An Object Reference dataset can be used as an index to an HDF5 file.
The /DS1 dataset in the following file (h5ex_t_objref.h5) is an Object Reference dataset. It contains two references, one to group /G1 and the other to dataset /DS2:
A Region Reference is a reference to a selection within a dataset. A selection can be either individual elements or a hyperslab. In h5dump you will see the name of the dataset along with the elements or slab that is selected. A dataset with a Region Reference datatype consists of one or more Region References.
An example of a Region Reference dataset (h5ex_t_regref.h5) can be found on the Examples by API page, under Datatypes. If you examine this dataset with h5dump you will see that /DS1 is a Region Reference dataset as indicated by its datatype, highlighted in bold below:
It contains two Region References:
/DS2 : (0,1), (2,11), (1,0), (2,4)
See the H5Sselect_elements API in the HDF5 User Guide for information on selecting individual elements.
/DS2 : (0,0)-(0,2), (0,11)-(0,13), (2,0)-(2,2), (2,11)-(2,13)
See the H5Sselect_hyperslab API in the HDF5 User Guide for how to do hyperslab selection.
If you look at the code that creates the dataset (h5ex_t_regref.c) you will see that the first reference is created with these calls:
where the buffer containing the coordinates to select is:
The second reference is created by calling,
where start, stride, count, and block have these values:
These start, stride, count, and block values will select the elements shown in bold in the dataset:
If you use h5dump to select a subset of dataset /DS2 with these start, stride, count, and block values, you will see that the same elements are selected:
For more information on selections, see the tutorial topic on Reading From or Writing To a Subset of a Dataset. Also see the Dataset Subset tutorial topic on using h5dump to view a subset.
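The link between start, stride, count, and block and the corner pairs that h5dump prints for a hyperslab Region Reference can be sketched in Python. The values used here are an assumption: they were chosen because they reproduce the four regions listed for /DS2 above, and the function name is hypothetical.

```python
def hyperslab_blocks(start, stride, count, block):
    """Return (first_corner, last_corner) for every block of a 2-D hyperslab selection."""
    blocks = []
    for i in range(count[0]):
        for j in range(count[1]):
            top = start[0] + i * stride[0]
            left = start[1] + j * stride[1]
            blocks.append(((top, left), (top + block[0] - 1, left + block[1] - 1)))
    return blocks

# Assumed values, chosen because they reproduce the regions listed for /DS2.
start, stride, count, block = (0, 0), (2, 11), (2, 2), (1, 3)
for first, last in hyperslab_blocks(start, stride, count, block):
    print(f"{first}-{last}")
```

Each printed pair is the first and last coordinate of one 1 by 3 block, which is the form in which the second Region Reference is displayed.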
There are two types of string data: fixed-length strings and variable-length strings.
Below is the h5dump output for two files that have the same strings written to them. In one file the strings are fixed in length, and in the other the strings are variable in length.
Dataset of Fixed Length Strings
Dataset of Variable Length Strings
You might wonder which to use. Some comments to consider are included below.