August, 2001
Since HDF5 is not backward compatible with earlier versions of HDF, many users must transition from HDF4 to HDF5. This transition may require rewriting software and possibly rewriting datasets. The details depend on the goals and situation of the users.
NCSA has published a default mapping of HDF4 objects to HDF5 [1]. This mapping provides guidance and recommendations for how HDF4 files and objects should be represented in HDF5. Of course, users may wish to do something other than the default, in order to take best advantage of HDF5.
The h4toh5 utility is provided as part of the HDF5.1.4.2 release [2]. This tool converts one HDF4 file to an equivalent HDF5 file, using the HDF4 to HDF5 mapping [1]. It is important to realize that this is a default conversion, which may not preserve some of the 'semantics' of the HDF4 data, particularly if the file is complicated. If the default conversion is not adequate for some purpose, it may serve as an example or prototype from which to construct a customized conversion.
This experiment tested the h4toh5 utility using NASA HDF4 datasets as input. The goal was to test the utility with a set of real files, to ensure that it works, and to assess its performance.
The input data was all real NASA science data, provided to the public in HDF4, so the test is realistic.
Some of the NASA datasets were created with the HDF-EOS library (using HDF4). It is important to realize that the h4toh5 converter does not 'understand' HDF-EOS. In this case, the native HDF4 components of Grids, Swaths, etc., were converted into HDF5 objects. The result is definitely not a legal HDF-EOS5 file. The HDF-EOS5 library stores the HDF-EOS objects in ways that are optimized for HDF5, which are not the default translation of how they were stored in HDF4.
Therefore, the conversions performed on these datasets should be seen as a demonstration that conversion is feasible, although custom conversions may well be needed to create the desired HDF5 file.
A sample of NASA datasets was acquired from DAACs and DIAL. All were obtained from public sources of sample or real data. These datasets included data from approximately 33* data products, from 8 instruments (AVHRR, ACRIM, CERES, MISR, MODIS, MOPITT, SSMI). The total number of granules (HDF4 files) was 1776. The datasets are listed in the Appendix.
This sample of data was chosen arbitrarily from what could be obtained from DIAL [7] and from public FTP services at DAACs. Therefore, it is not statistically representative of NASA data or of any specific body of data. However, it is all real or sample data from NASA.
The experiment was run on a dual 550 MHz Pentium III running Linux 2.2.18smp, using a local disk. The h4toh5 utility is from the HDF5.1.4.2 release (August 2001) and uses HDF4.2r3. File sizes are those reported by the Linux file system, and times were collected with the system 'time' function.
Each file was converted at least 5 times, with the average time reported.
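The repeat-and-average procedure can be sketched in Python as follows. This is a minimal illustration, not the script used in the study: the helper names `average_wall_time` and `h4toh5_command` are hypothetical, the command form `h4toh5 input output` is an assumption about how the converter is invoked, and the sketch measures wall-clock time only, whereas the system 'time' function also reports user and system time.

```python
import statistics
import subprocess
import time


def average_wall_time(command, runs=5):
    """Run `command` `runs` times and return the mean wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed conversion is never silently averaged in.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


def h4toh5_command(hdf4_path, hdf5_path):
    """Build the converter command line (assumed form: h4toh5 input output)."""
    return ["h4toh5", hdf4_path, hdf5_path]
```

With the converter on the search path, `average_wall_time(h4toh5_command("in.hdf", "out.h5"))` would return the mean of five runs for one file.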
The output files were visually inspected with the H5View utility [3] and compared to the original HDF4 file (using the JHV tool [4]).
Two files could not be converted. These files exposed bugs in the converter, which will be fixed in the next release. These are included in the data.
Two files were damaged in transfer and could not be used. These are excluded from the data.
With one exception, the HDF5 files were all within a few percent of the size of the HDF4 file, with HDF5 usually slightly smaller. The exception was one group of browse files that contain compressed images. These are stored uncompressed in the HDF5 file, resulting in a file about 3 times larger. In the future, the h4toh5 conversion will apply compression when compression is used in the HDF4 file, and then the compressed HDF5 file should be approximately the same size as the HDF4 file in all cases. Of course, individual cases will vary.
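A size comparison of this kind can be sketched in Python as follows. The function names and the 5% tolerance are hypothetical choices for illustration. The report says only that most files were "within a few percent" and does not state a threshold.

```python
import os


def size_ratio(hdf4_path, hdf5_path):
    """Return the HDF5 file size divided by the HDF4 file size."""
    hdf4_size = os.path.getsize(hdf4_path)
    hdf5_size = os.path.getsize(hdf5_path)
    if hdf4_size == 0:
        raise ValueError("HDF4 file is empty: " + hdf4_path)
    return hdf5_size / hdf4_size


def is_size_outlier(ratio, tolerance=0.05):
    """Flag a pair whose sizes differ by more than `tolerance`.

    The 5% default is a hypothetical threshold, not a figure from the report.
    """
    return abs(ratio - 1.0) > tolerance
```

A ratio near 3.0, as seen for the browse files with compressed images, would be flagged, while a ratio such as 0.98 would not.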
Of the 1776 files tested, 98% were converted in 1 second or less. The longest conversion time observed was 332 seconds (5 minutes 32 seconds), for a 117 MB HDF4 file. Only 4 files averaged longer than 1 minute per conversion (117 MB, 781 MB, 75 MB, and 68 MB). Table 1 shows these statistics. Figure 1 shows a histogram of the top 20.
Table 1. Summary of conversion times.
Statistic | Value |
Highest single time of 1776 datasets | 332 s (MOP02-19970814-L2V0.1.1.hdf) |
Average time > 1 min | 4 of 1776 datasets (0.2%) |
Average time > 1 sec | 19 datasets (1.1%) |
Average time < 1 sec | 98.6% of datasets |
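The percentages in these statistics are plain shares of the 1776 files, which the short Python check below reproduces. The helper `percent` is a hypothetical name. The last two lines rest on an assumption the report does not state: that the 19 files and the 4 files are separate groups and that the 2 failed conversions are not counted as under 1 second. This is only one reading that reproduces the 98.6% figure.

```python
def percent(count, total):
    """Return `count` as a percentage of `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


TOTAL_FILES = 1776

over_one_minute = percent(4, TOTAL_FILES)    # 0.2
over_one_second = percent(19, TOTAL_FILES)   # 1.1

# Assumption (not from the report): the two groups above are disjoint and
# the 2 failed conversions are also excluded from the "under 1 s" group.
assumed_under_one_second = TOTAL_FILES - 19 - 4 - 2       # 1751 files
under_one_second = percent(assumed_under_one_second, TOTAL_FILES)  # 98.6
```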
In general, larger files take more time to convert, but the relationship is not simply linear in the size of the file. The sample did not have enough variability to analyze the relationship further. Generally, any file under 4 MB was converted in much less than 1 s. Figure 2 shows a scatter plot for the 1776 files.
Figure 2. Summary of conversion times by the size of the original HDF4 file.
These tests show that the h4toh5 utility works reliably on a variety of real data, including data written by older versions of HDF4. The small number of failures has been traced to a couple of bugs in the converter, which will be fixed in the next release.
The output files appear to contain all the data, and are readily recognized as faithful (if simpleminded) translations of the HDF4 original. The files have similar size to the HDF4 originals, as should be the case.
In the past, there has been considerable concern about the performance of the conversion utility and similar custom programs. This test shows that performance is not a problem, even using a fairly inexpensive PC. Of course, conversion would be much slower if a network disk were used, or on a slower system.
The speed of the conversion is, of course, related to the size of the files. However, the slowest conversion was not the largest file. It is possible that the speed of conversion is limited by the size of the largest object, or the number of objects, or some combination of factors. This cannot be determined from this limited sample of data. Also, the next release of the h4toh5 utility should have even better performance, especially for large objects.
This study shows that converting HDF4 files into HDF5 is feasible, even for files in the range of 800 MB. With conversion times of a few seconds to a few minutes, it is clear that whole archives could be converted in a few hours. Alternatively, most data could be converted "on the fly" when requested from a server.
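Converting a whole archive amounts to running the converter once per file. The Python sketch below illustrates one way to do this under stated assumptions: the names `plan_conversions` and `convert_all` are hypothetical, the `.hdf` suffix filter is an illustrative choice (many NASA granules carry no such suffix), and the command form `h4toh5 input output` is assumed.

```python
import os
import subprocess


def plan_conversions(input_dir, output_dir, suffix=".hdf"):
    """Return sorted (hdf4_path, hdf5_path) pairs for files under input_dir.

    Files are selected by `suffix` (case-insensitive); the directory layout
    below input_dir is mirrored below output_dir with a .h5 extension.
    """
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(input_dir):
        for name in filenames:
            if name.lower().endswith(suffix):
                source = os.path.join(dirpath, name)
                relative = os.path.relpath(source, input_dir)
                target = os.path.join(
                    output_dir, os.path.splitext(relative)[0] + ".h5")
                pairs.append((source, target))
    return sorted(pairs)


def convert_all(pairs, converter="h4toh5"):
    """Run the converter on each pair and return the sources that failed."""
    failed = []
    for source, target in pairs:
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        # Assumed invocation: converter, then input file, then output file.
        result = subprocess.run([converter, source, target])
        if result.returncode != 0:
            failed.append(source)
    return failed
```

Collecting failures instead of stopping at the first one lets a long batch run finish and leaves a list of files to inspect afterwards.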
As discussed above, the h4toh5 utility would not be appropriate to convert HDF-EOS files. The heconvert tool [5] will provide a similar conversion for HDF-EOS files. The heconvert tool could not be tested for this study, because it is not available for Linux.
For some data products, default conversions may not be sufficient. A custom converter would be needed, perhaps using functions from the libh4toh5 library*** to convert individual objects. This study shows that, once created, a custom converter should work reliably and efficiently.
Overall, this study suggests that converting files from HDF4 to HDF5 is technically viable.
*In some cases I'm not certain what officially counts as a 'data product'. There are 33 different 'kinds' of HDF4 file, with many instances (e.g., multiple days or months) in some cases.
***The NCSA libh4toh5 is a library of C functions to perform a default conversion of individual HDF4 objects. This library will be available in Fall 2001.
This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA grant NAG 5-2040. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
Other support provided by NCSA and other sponsors and agencies [6].
1. Mike Folk, Robert E. McGrath, and Kent Yang, "Mapping HDF4 Objects to HDF5 Objects", H4toH5Mapping.pdf.
3. Java HDF5
4. Java HDF
5. HDF-EOS and Related Software
The data was obtained from publicly available NASA datasets.
Data From DIAL | Observation Date | Description |
avhrr8kmmonthly | 1993 | AVHRR 8KM 10 Day Composites - Southeast Asia |
avhrr1km10day (transfer failed--not converted) | 1993 | AVHRR |
avhrr8km10day (transfer failed--not converted) | 1993 | AVHRR |
tahoe-north-middle | 1998 | ASTER L1BT, Lake Tahoe |
ASTL1B_000830185 | 1998? | ASTER L1 test? |
CER_ES8_Terra-FM2_Test_SCF_016011.20000830.subset_70_20_-140_-40.20001012_204110Z | 1998? | |
CER_ES4_Terra-FM1_Beta_015013.200004 | 2000 | Preliminary data (do not use) |
CER_FSW_TRMM-PFM-VIRS_Sample_000000.199801Z06 | 2000 | Preliminary data (do not use) |
98034001632_GOES08_IMAGER | 1998? | GOES Imager? |
MISR_AM1_AS_LANDSFC_P027_O000027_01.dw | 1996 | Prelaunch Land surface |
MISR_AM1_AS_AEROSOL_P027_O000027_01_dw | 1996 | Prelaunch aerosols over ocean |
MOD021KM.A2000080.1815.002.2000083151033 | 2000 | Hudson's Bay |
MOD02HKM.A2000242.0140.002.2000247230108 | 1999 | MODIS Preliminary |
NISE_SSMIF11_19911227 | 1999 | Ice and Snow |
ballon_sp | 1996-1997 | HDFEOS Point data for balloon launches |
misr_l1a_ccd_df.new.nominal | 2001 | MISR L1A |
MOP02-19970814-L2V0.1.1 | 1997 | Sample MOPITT L2 |
Data From DAACs | Source | Observation Date | Description |
MOAPWBM1.P1.ADD2000321.002.2001034035708 | Goddard | 2000 | MODIS L4 ocean data |
MOAPWBM2.P2.ADD2000321.002.2001034035718 | Goddard | 2000 | MODIS Ocean Level 4 data |
MOAPWBME.PAR.ADD2000321.002.2001034035728 | Goddard | 2000 | MODIS L4 ocean data |
MOD03.A2000106.1540.001.2000109075312 | Goddard | 2000 | MODIS radiometric geolocation |
MOD03.A2000110.0220.002.2000193195357 | Goddard | 2000 | MODIS radiometric geolocation |
MOD08_E3.A2000337.002.2001037044240 | Goddard | 2000 | Sample MODIS L3 |
C1986151201607.L2_BRS | PODAAC | 1986 | Coastal Zone Color (hourly--all data for May, 827 granules) |
C1986151201607.L2_GAC | PODAAC | 1986 | Coastal Zone Color (hourly--all data for May, 827 granules) |
f13_Tb_01220_01D | MSFC | 2001 | SSMI Brightness temperature (All 29 passes, 1 day) |
f13_hn_01220_01D | MSFC | 2001 | SSMI geolocation (All 29 passes, 1 day) |
f13_ln_01220_01D | MSFC | 2001 | SSMI geolocation (All 29 passes, 1 day) |
Other MODIS data |
MOD04_L2.A2000242.0140.002.2000264223516 |
MOD04_L2.A2000243.1850.002.2000252164712 |
MOD05_L2.A2000243.1850.002.2000252164414 |
MOD06_L2.A2000243.1850.002.2000252173103 |
MOD35_L2.A2000243.1850.002.2000244222700 |