Fill Value and Dataset Storage Allocation Issues in HDF5

Quincey Koziol
koziol@ncsa.uiuc.edu
October 9, 2002

Document's Audience:
- Current H5 library designers and knowledgeable external developers.
Introduction:
What is a fill-value?

A fill-value is the value retrieved for a dataset element in HDF5 when no application data has been written to that element. The fill-value may be stored explicitly in the dataset by HDF5 or it may be implied in some way.

What does dataset storage allocation mean?

Dataset storage allocation is a term used to indicate that space in a file has been reserved for the raw data of a dataset.

How are fill-values and dataset storage allocation handled currently in HDF4 and HDF5?
1. HDF4
  
  These issues are specific to how the SD*() API functions operate in the latest version of HDF4; other portions of the HDF4 library may operate in different ways. Only the normal (i.e. "contiguous") and chunked storage methods are discussed; other storage methods (like external file storage or linked-block storage) are treated as normal storage in HDF4.
  1. Dataset Storage Allocation
    Allocating space to store a dataset is deferred until the space is needed. Space is only needed when application data is written to a dataset. This allows for very large datasets to be defined, and if they are not written to, the file size can stay very small. This applies to both contiguous and chunked data.
  2. Fill-values
    
    Metadata
    Metadata documenting the fill-value is always written to a file. Either the default fill-value (of zero) or the user's fill-value is written as an attribute of the dataset.
    
    Reading
    If storage for the dataset or chunk is not allocated yet, the fill-value is used to fill the buffer to return to the application and the file data is not read.
    
    Writing
    Fill-values are only written to the dataset or chunk when the entire dataset or chunk is not going to be written in a single I/O request. For example: in a contiguously stored dataset, if a hyperslab in the middle of the dataset is written by the user (and this is the first piece of data to be written to the dataset), fill-values are written to the dataset and then the user's data is written in the hyperslab location. However, if the entire dataset is going to be written in one write call, then the fill-value writing step is skipped, since the fill-values would all be immediately overwritten with the actual data.
    
    Note: Writing fill-values in HDF4 can be turned off completely by a user who either "knows" that they will be writing the entire dataset in successive calls, or who doesn't care about data outside the region(s) they are writing to in the dataset.
2. HDF5
  
  These issues apply to all datasets in HDF5. Only the contiguous and chunked storage methods are discussed; other storage methods (such as external file storage) are treated as contiguous storage in HDF5.
  1. Dataset Storage Allocation
    Space for contiguously stored data is always allocated during the creation of the dataset. Space for chunk stored data is allocated as needed, when data needs to be written to the portion of the dataset that the chunk occupies (except in the case of parallel I/O, where all the chunks for a dataset are also allocated at creation time).
  2. Fill-values
    
    Metadata
    Metadata documenting the fill-value for a dataset is only written out if the user explicitly set a fill-value for the dataset during creation. Although there is an implicit zero fill-value assumed for the dataset, this is not enforced or recorded.
    
    Reading
    Fill-values are only used for chunked storage datasets when an unallocated chunk is read from. Because contiguously stored data always allocates space in the file, the library assumes that there is always valid data to read for contiguous data.
    
    Writing
    Fill-values are only written to contiguously stored data when a dataset is created (and only if the user has set a fill-value). This occurs regardless of whether the fill-values will be overwritten by future writes to the dataset.
    
    Fill-values for chunked storage data are somewhat more controlled: they are written only when data is actually written to a particular chunk. This occurs regardless of whether the fill-values will be overwritten by future writes to the chunk.
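The HDF4 writing rule described in the HDF4 section can be sketched as a small decision function. This is only an illustrative model of the rule as stated in this document, not code from either library; the function name and parameters are hypothetical.

```python
def hdf4_should_write_fill(request_elems: int, unit_elems: int,
                           fill_enabled: bool = True) -> bool:
    """Model of the HDF4 rule: are fill-values written before the user's data?

    unit_elems    -- number of elements in the dataset (contiguous) or in the
                     chunk (chunked) that is about to be allocated
    request_elems -- how many of those elements the single I/O request covers
    fill_enabled  -- False if the user turned fill-value writing off
    """
    if not fill_enabled:
        # The user has taken responsibility for the unwritten regions.
        return False
    # Fill-values are only needed when part of the unit stays unwritten.
    return request_elems < unit_elems


# A hyperslab in the middle of a new dataset triggers a fill pass,
# a write covering the whole dataset does not.
print(hdf4_should_write_fill(10, 100))    # partial write
print(hdf4_should_write_fill(100, 100))   # full write
```

The contrast with the HDF5 behavior described above is that HDF5 does not look at the size of the pending write at all when deciding whether to write fill-values.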
Why do these issues need to be faced now?

Although we've been aware for a while of differences between HDF4 and HDF5 in the way storage space is allocated in a file and in how fill-values are treated, this hasn't been an especially burning problem that needed to be dealt with. Unfortunately, there is a bug in which file space for variable-length (VL) data is leaked when the data elements are overwritten, and it is tied to these storage and fill-value issues.

Currently, when VL data elements are overwritten in a dataset, the space for the previous piece of VL data is not released to the file for reuse; it is leaked instead. Because the previous value for the VL data would need to be read from the dataset in the file in order to be properly released, the problem ties in with the fill-values stored in the file. (For the current library design, since a heap ID is stored in the dataset for the location of the VL data, not the VL data itself, a heap ID set to all zeros is used to indicate that there is no VL data for a particular location. So currently, the only valid fill-value for VL data is an all-zero value, indicating that no VL data has been stored in the heap.)

If fill-values are not written to the file, then junk data may be read from the file as the VL data to be released, and errors may occur. Currently, the library relies on the filesystem to zero-fill blocks allocated to the file when there is no fill-value set for the dataset. We've already seen this assumption break down under Win9x, where the OS does not zero-fill file blocks and users report "junk" in datasets which have been created, but not written to.

So, VL data requires valid fill-values to be present in the file in order to be certain that reading the VL data to be overwritten is valid and contains the correct information to either free the previous VL data (in the case of non-NULL valued VL data) or not to try to free the previous VL data (for NULL valued VL data). Having junk (or the potential for junk) in the data read from the file opens the possibility for corrupting data in the file if that junk data is used to try to free the previous VL data.
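The reasoning above can be illustrated with a toy model in which the global heap is a dictionary keyed by heap ID. Everything here is hypothetical (the 16-byte ID size, the dictionary heap, the function names); it is a sketch of why the old element value must be trustworthy, not the library's implementation.

```python
HEAP_ID_SIZE = 16  # hypothetical size, chosen only for this illustration


def is_null_heap_id(heap_id: bytes) -> bool:
    """An all-zero heap ID means "no VL data stored for this element"."""
    return not any(heap_id)


def release_previous(heap: dict, old_id: bytes) -> bool:
    """Free the VL data named by the heap ID read back from the dataset.

    Returns True if something was freed, False if the element was NULL.
    Raises ValueError if the bytes read from the file are junk, i.e. they
    are neither all zeros nor the ID of an existing heap object.
    """
    if is_null_heap_id(old_id):
        return False              # fill-value: nothing to free
    if old_id not in heap:
        raise ValueError("junk heap ID read from unfilled storage")
    del heap[old_id]              # space is returned for reuse
    return True


heap = {b"\x01" * HEAP_ID_SIZE: b"previous VL sequence"}
print(release_previous(heap, bytes(HEAP_ID_SIZE)))    # NULL element
print(release_previous(heap, b"\x01" * HEAP_ID_SIZE)) # real element
```

In this toy model junk is detected and rejected; in a real file, junk bytes could happen to name a live heap object, which is the corruption risk the paragraph above describes.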

How do fill-values and VL datatypes interact?

Currently, the only valid fill-value for a VL datatype is an all-zero (0) value, which indicates that there is no VL sequence for an element.

How do fill-values and composite datatypes interact?

When a fill-value is stored for a composite datatype (compound, array or variable-length), the value stored in the "new" fill-value header message (detailed below) is stored in exactly the same way as the values in the dataset elements.

How do fill-values and compound datatypes interact?

It is possible to write to only one (out of potentially many) field in a compound dataset. This has no effect on the operation which fills in the elements of the dataset: all the elements of a dataset have the fill-value written to them, and then they are overwritten with the application values specified, which are only a part of each element in this case.
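A minimal sketch of the compound case follows, modeling each dataset element as a dictionary of fields. The field names and helper functions are invented for illustration; the point is only the ordering of "fill the whole element, then overwrite one field".

```python
def allocate_with_fill(n_elements: int, fill: dict) -> list:
    """Allocate storage and write the complete fill-value to every element."""
    return [dict(fill) for _ in range(n_elements)]


def write_field(elements: list, field: str, values: list) -> None:
    """Overwrite a single field of each element; other fields are untouched."""
    for element, value in zip(elements, values):
        element[field] = value


fill_value = {"x": -1, "y": -1.0}          # hypothetical two-field compound
dataset = allocate_with_fill(3, fill_value)
write_field(dataset, "x", [10, 20, 30])    # application writes only "x"
print(dataset[1])
```

After the partial write, every element carries the application's value in "x" and the fill-value in "y".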
Design Goals:
- Provide a method for controlling when and how fill-values are written to a dataset.
- Provide a method for controlling when space is allocated for storing a dataset.
Primary Users:

Current HDF5 users

Existing HDF5 users who are storing VL data and re-writing that data need the library to stop leaking file space. Existing HDF5 users who desire more control over how fill-values are written to their datasets and when space is allocated to store their raw data would also benefit.

New users

Additionally, there may be other users who have chosen not to use HDF5 due to the lack of the controls available, especially if they are currently using HDF4 and find these features important.
Requirements:
- The library's performance and stability should not be impacted as a result of these new features.
- Changes from these new features must operate correctly and efficiently in a parallel programming environment as well as a serial environment.
- Make as small a set of changes to the HDF5 file format and programming API as possible when implementing this feature.

Proposed Changes to Library Behavior:

At the very minimum, to be able to fix the VL data leak, valid data should be available for all datasets with VL datatypes. This is handled by requiring a fill value to be written for all datasets with a VL datatype. This means that a call to H5Dcreate will fail if the datatype contains a VL datatype (either directly, or as part of a compound or array datatype) and the fill-value has been set to "undefined".
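The creation-time check can be sketched as a recursive walk over a datatype description. The tuple encoding of datatypes below is purely hypothetical and exists only for this example; it does not correspond to any HDF5 data structure.

```python
def contains_vl(dtype: tuple) -> bool:
    """Return True if the datatype is, or nests, a variable-length type.

    Hypothetical encoding used only in this sketch:
      ("int",) / ("float",)        atomic types
      ("vlen", base)               variable-length sequence of base
      ("array", base)              fixed-size array of base
      ("compound", {name: type})   compound with named members
    """
    kind = dtype[0]
    if kind == "vlen":
        return True
    if kind == "array":
        return contains_vl(dtype[1])
    if kind == "compound":
        return any(contains_vl(member) for member in dtype[1].values())
    return False


def check_create(dtype: tuple, fill_defined: bool) -> None:
    """Reject dataset creation when VL data has an undefined fill-value."""
    if contains_vl(dtype) and not fill_defined:
        raise ValueError("VL datatype requires a defined fill-value")


record = ("compound", {"id": ("int",), "samples": ("array", ("vlen", ("float",)))})
print(contains_vl(record))
```

A compound type that holds an array of VL sequences is caught just like a top-level VL type, which matches the "either directly, or as part of a compound or array datatype" wording above.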

We can provide users with three properties to control the fill-value and allocation strategies of the library. They are "when to allocate space", "when to write the fill-value" and the actual fill-value to write.

Each property is described below:

When to allocate space:

Early - during dataset create call. Allocate storage for the dataset immediately when the dataset is created. Certain VFDs (like MPI-I/O and MPI-posix) require space to be allocated when a dataset is created, which will override the setting chosen by a user.
Late - during first write to dataset. Defer allocating space for storing the dataset until the dataset is written to. Choosing late allocation for compact dataset storage is an error.
Incremental - during first write to chunk. Defer allocating space for storing each chunk until the chunk is written to. Choosing incremental allocation for contiguous dataset storage is treated as late allocation. Choosing incremental allocation for compact dataset storage is an error.

Default - Allocate storage for the dataset as appropriate for the storage method and access method. The defaults are shown here:

	Serial I/O	Parallel I/O
Contiguous Storage	Late	Early
Chunked Storage	Incremental	Early
Compact Storage	Early	Early
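The defaults table and the rules for incremental allocation can be expressed as a lookup. This is a sketch of the rules as written in this document; the function name, the string constants, and the single `parallel` flag standing in for "a VFD that requires early allocation" are assumptions made for the example.

```python
# (storage layout, parallel I/O?) -> default allocation time, from the table
DEFAULT_ALLOC_TIME = {
    ("contiguous", False): "late",
    ("contiguous", True): "early",
    ("chunked", False): "incremental",
    ("chunked", True): "early",
    ("compact", False): "early",
    ("compact", True): "early",
}


def resolve_alloc_time(layout: str, requested: str = "default",
                       parallel: bool = False) -> str:
    """Return the allocation time that would actually be used."""
    if layout == "compact" and requested == "incremental":
        raise ValueError("incremental allocation is an error for compact storage")
    if requested == "default":
        return DEFAULT_ALLOC_TIME[(layout, parallel)]
    if parallel:
        # VFDs that need space at creation override the user's choice.
        return "early"
    if layout == "contiguous" and requested == "incremental":
        return "late"             # incremental is treated as late
    # Other combinations are passed through unchecked in this sketch.
    return requested


print(resolve_alloc_time("chunked"))
print(resolve_alloc_time("contiguous", "incremental"))
```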

When to write fill value:
1. Never - Fill value will never be written to dataset's storage.
2. Allocation - Fill value is written when space is allocated. This is the default for both chunked and contiguous data storage.
What fill value to write:
1. Undefined - no value stored.
2. Default - library defined. By default, the library defines a fill-value of all zero bytes (whatever that means for the datatype).
3. User-defined - user defined value.

Using these three properties, the library's fill-value writing behavior during the dataset create-write-close cycle is listed in the table below.

When to allocate space	When to write fill value	What fill value to write	Library create-write-close behavior
Early	Never	-----	Library allocates space when dataset is created, but never writes fill value to dataset.
Late	Never	-----	Library allocates space when dataset is written to, but never writes fill value to dataset.
Incremental	Never	-----	Library allocates space when dataset or chunk (whichever is smallest unit of space) is written to, but never writes fill value to dataset or chunk.
-----	Allocation	undefined	Error on creating dataset, dataset not created.
Early	Allocation	default or user-defined	Allocate space for dataset when dataset is created. Write fill value (default or user-defined) to entire dataset when dataset is created.
Late	Allocation	default or user-defined	Doesn't allocate space for dataset until user's data values are written to dataset. Write fill value to entire dataset before writing user's data value.
Incremental	Allocation	default or user-defined	Doesn't allocate space for dataset until user's data values are written to dataset or chunk (whichever is smallest unit of space). Write fill value to entire dataset or chunk before writing user's data value.

----- stands for any value.
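The create-write-close table reduces to two questions: when is space allocated, and is the fill-value written at that moment? A hedged sketch, with hypothetical names and string constants:

```python
# Event at which space is allocated for each allocation-time setting.
ALLOC_EVENT = {
    "early": "dataset creation",
    "late": "first write to dataset",
    "incremental": "first write to dataset or chunk",
}


def create_write_close_plan(alloc_time: str, fill_time: str, fill_value: str):
    """Return (allocation event, fill-write event or None) per the table.

    fill_time  is "never" or "allocation"
    fill_value is "undefined", "default" or "user-defined"
    """
    if fill_time == "allocation" and fill_value == "undefined":
        # Row "----- / Allocation / undefined": the dataset is not created.
        raise ValueError("cannot write an undefined fill-value at allocation")
    allocate_at = ALLOC_EVENT[alloc_time]
    # When filling is enabled it always happens together with allocation.
    fill_at = allocate_at if fill_time == "allocation" else None
    return allocate_at, fill_at


print(create_write_close_plan("late", "allocation", "default"))
print(create_write_close_plan("early", "never", "undefined"))
```

Note that an undefined fill-value is only an error when combined with the "allocation" fill time; with "never" the fill-value is irrelevant.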

During the H5Dread function call, the library's behavior depends on whether space has been allocated, whether the fill value has been written to storage, how the fill value is defined, and when the fill value is to be written.

Is space allocated?	What is the fill value?	When to write fill value?	Library read behavior
No	undefined	Allocation	Error. Dataset can't exist, no data has been written, fill value isn't defined.
No	undefined	Never	Error. Data doesn't exist, fill value isn't defined and therefore cannot be used to fill user's buffer.
No	default or user-defined	-----	Fill user's buffer with fill value.
Yes	undefined	-----	Return data from storage (dataset), trash is possible if user has not written data to portion of dataset being read.
Yes	default or user-defined	Never	Return data from storage (dataset), trash is possible if user has not written data to portion of dataset being read.
Yes	default or user-defined	Allocation	Return data from storage (dataset).

----- stands for any value.
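The read-behavior table can likewise be written as a small function. The names and return strings are hypothetical; the branches follow the table rows.

```python
def read_behavior(allocated: bool, fill_value: str, fill_time: str) -> str:
    """Describe what a read returns, following the read-behavior table.

    fill_value is "undefined", "default" or "user-defined"
    fill_time  is "never" or "allocation"
    """
    if not allocated:
        if fill_value == "undefined":
            # Nothing in the file and nothing to fill the buffer with.
            raise ValueError("no stored data and no fill-value defined")
        return "fill buffer with fill-value"
    if fill_value != "undefined" and fill_time == "allocation":
        # Storage was initialized when it was allocated.
        return "stored data"
    # Storage exists but was never initialized with a fill-value.
    return "stored data (unwritten regions may hold garbage)"


print(read_behavior(False, "default", "never"))
print(read_behavior(True, "user-defined", "never"))
```

Only the last table row guarantees that every element read back is either application data or the fill-value.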

Implementation Plans:

The work outlined in this document is already finished and checked into the library. This document describes the rationale for the changes and the exact changes implemented.
Changes Remaining:

None currently.
Advanced Features:

It may be possible in the future to specify valid VL information in the fill-value and have the library write that VL information to the file's global heap only once. Then all the references to that VL information in the dataset would share the same VL information, without excessive duplication of the VL information. Care must be taken if this is implemented, to correctly handle the reference counts necessary when re-writing dataset elements currently using the shared value.

It is possible to optimize the operation which writes the fill-value to a dataset, by only writing the fill-value to the elements which are not going to be overwritten by the application's first write to the dataset (when the fill time property is set to "allocation" and the space allocation property is set to "late"). This will improve performance in cases where the application is writing a significant portion of the dataset. Care must be taken if this is implemented, to correctly handle the cases when an application is only writing part of a compound datatype, however. Additionally, this has extra complexities in a parallel I/O environment, which would have to be carefully handled.

It is possible to optimize the operation which writes the fill-value to a dataset a bit more, by delaying writing the fill-values to the dataset until the dataset is closed. This could be done by using a selection to build up the regions of the dataset which have been written to and then only write the fill-values to the "inverse" of that region when the dataset is closed. Care must be taken if this is implemented, to correctly handle the cases when an application is only writing part of a compound datatype, however. Additionally, this has extra complexities in a parallel I/O environment, which would have to be carefully handled.

It is possible to optimize the fill-value I/O situation even further by never writing full fill-values to the dataset. Instead, the regions of the dataset which have been written to by the user are tracked with a selection and the selection is stored with the dataset in the file. Then, when an application attempts to read elements outside that region, the fill-values would be placed directly into the application's buffer, having never been actually stored in the file at all. Care must be taken if this is implemented, to correctly handle the cases when an application is only writing part of a compound datatype, however. Additionally, this has extra complexities in a parallel I/O environment, which would have to be carefully handled.
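The last two ideas can be illustrated with a one-dimensional toy dataset that records which elements have been written. A Python set stands in for the selection, and the class and method names are hypothetical; partial compound writes and parallel I/O, the hard cases noted above, are not modeled.

```python
class TrackedDataset:
    """1-D dataset that tracks written elements instead of storing fill-values."""

    def __init__(self, size: int, fill):
        self.size = size
        self.fill = fill
        self.storage = {}      # only elements the application wrote
        self.written = set()   # stand-in for the stored selection

    def write(self, start: int, values: list) -> None:
        if start < 0 or start + len(values) > self.size:
            raise IndexError("write outside dataset extent")
        for index, value in enumerate(values, start):
            self.storage[index] = value
            self.written.add(index)

    def read(self, start: int, count: int) -> list:
        # Elements outside the written region get the fill-value directly.
        return [self.storage[i] if i in self.written else self.fill
                for i in range(start, start + count)]

    def unwritten(self) -> list:
        """The "inverse" region: where fill-at-close would have to write."""
        return [i for i in range(self.size) if i not in self.written]


ds = TrackedDataset(6, fill=-1)
ds.write(2, [7, 8])
print(ds.read(0, 6))
print(ds.unwritten())
```

`read` corresponds to the "never store fill-values" variant, while `unwritten` gives the region that the "write fill-values at close" variant would fill.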
Alternate Approaches:

None proposed

File Format Changes:

The changes in this document require two object header message changes. For the data storage layout message, the "address" value has been changed to express unallocated space. A "new" fill-value message has been added with new fields to store the information contained in the new properties.

The revised data storage layout message follows, with the only changes being in the description of the "Version" and "Address" fields:

Name:

Type:

Length:

Status:

Purpose:

The array can be stored in one contiguous area of the file. The layout requires that the size of the array be constant and does not permit chunking, compression, checksums, encryption, etc. The message stores the total size of the array and the offset of an element from the beginning of the storage area is computed as in C.

The array domain can be regularly decomposed into chunks and each chunk is allocated separately. This layout supports arbitrary element traversals, compression, encryption, and checksums, and the chunks can be distributed across external raw data files (these features are described in other messages). The message stores the size of a chunk instead of the size of the entire array; the size of the entire array can be calculated by traversing the B-tree that stores the chunk addresses.

Format:

byte	byte	byte	byte
Version	Dimensionality	Layout Class	Reserved
Reserved
Address
Dimension 0
Dimension 1
...

Description:

Field Name	Description
Version	A version number for the layout message. This document describes version two (2).
Dimensionality	An array has a fixed dimensionality. This field specifies the number of dimension size fields later in the message.
Layout Class	The layout class specifies how the other fields of the layout message are to be interpreted. A value of one (1) indicates contiguous storage while a value of two (2) indicates chunked storage. Other values will be defined in the future.
Address	For contiguous storage, this is the offset of the first byte of raw data information for the dataset. This offset may contain the value "HADDR_UNDEF" (-1) to indicate the storage space has not been allocated. For chunked storage this is the offset of the B-tree that is used to look up the offsets of the chunks.
Dimension 0...n	For contiguous storage the dimensions define the entire size of the array while for chunked storage they define the size of a single chunk.
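As a rough illustration of the layout above, the following sketch packs and unpacks a version 2 layout message with Python's `struct` module. Several details are assumptions not stated in this document and are marked as such in the comments: little-endian byte order, an 8-byte address field, and 4-byte dimension fields.

```python
import struct

# ASSUMPTION: 8-byte file offsets, so "-1" is all 64 bits set.
HADDR_UNDEF = 0xFFFFFFFFFFFFFFFF

CONTIGUOUS, CHUNKED = 1, 2   # layout class values from the field description


def pack_layout(layout_class: int, address: int, dims: tuple) -> bytes:
    # Version, Dimensionality, Layout Class, Reserved byte, 4 reserved bytes.
    header = struct.pack("<BBBB4x", 2, len(dims), layout_class, 0)
    # ASSUMPTION: address is 8 bytes, each dimension is 4 bytes.
    return header + struct.pack("<Q", address) + struct.pack(f"<{len(dims)}I", *dims)


def unpack_layout(buf: bytes) -> dict:
    version, ndims, layout_class, _reserved = struct.unpack_from("<BBBB", buf, 0)
    (address,) = struct.unpack_from("<Q", buf, 8)
    dims = struct.unpack_from(f"<{ndims}I", buf, 16)
    return {
        "version": version,
        "layout_class": layout_class,
        "address": address,
        "allocated": address != HADDR_UNDEF,   # undefined address = no space yet
        "dims": dims,
    }


message = pack_layout(CONTIGUOUS, HADDR_UNDEF, (10, 20))
print(unpack_layout(message)["allocated"])
```

The only behavioral change described in this section is visible in the `allocated` entry: a reader must check for the undefined address before trying to read raw data.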

The new fill-value message follows. This is a new object header message, designed to supersede the current fill-value message. The old fill-value message lacked a "Version" field and thus could not be changed to accommodate the new information to be stored. The old fill-value message will still be written out when appropriate, to facilitate forward compatibility with new files being read by old versions of the library.

Name:

Type:

Length:

Status:

Purpose:

Format:

byte	byte	byte	byte
Version	Space allocation time	Fill-value write time	Fill-value defined?
Size
Fill-Value

Description:

Field Name	Description
Version	A version number for the fill-value message. This document describes version one (1).
Space allocation time	When to allocate storage space. Specifies whether to allocate space when the dataset is created (a value of one (1)), or when application data is written to the dataset (a value of two (2)).
Fill-value write time	When to write fill-value to dataset. A value of zero (0) indicates never to write fill-values; a value of one (1) indicates to write fill value when storage space is allocated for the dataset.
Fill-value defined?	A value of zero (0) means the fill-value is undefined for this dataset; a value of one (1) indicates the fill-value is defined (either default or user-defined). If undefined, the "Size" field will have the value of zero and the "Fill-value" field will not exist.
Size	This is the size of the "Fill-value" field in bytes.
Fill-Value	The actual fill-value. The fill-value is interpreted using the same datatype as for the dataset.
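The fill-value message layout can be sketched in the same way. The byte order (little-endian) and the width of the "Size" field (4 bytes) are assumptions made for this example; the field order and the meaning of the values follow the description above.

```python
import struct

ALLOC_EARLY, ALLOC_LATE = 1, 2          # space allocation time values
FILL_NEVER, FILL_AT_ALLOCATION = 0, 1   # fill-value write time values


def pack_fill_message(alloc_time: int, fill_time: int, fill_value) -> bytes:
    """fill_value is the raw bytes of the fill-value, or None if undefined."""
    defined = 0 if fill_value is None else 1
    payload = b"" if fill_value is None else fill_value
    # ASSUMPTION: little-endian, 4-byte Size field.
    header = struct.pack("<BBBBI", 1, alloc_time, fill_time, defined, len(payload))
    return header + payload


def unpack_fill_message(buf: bytes) -> dict:
    version, alloc_time, fill_time, defined, size = struct.unpack_from("<BBBBI", buf, 0)
    # When the fill-value is undefined, Size is zero and no value follows.
    value = bytes(buf[8:8 + size]) if defined else None
    return {"version": version, "alloc_time": alloc_time,
            "fill_time": fill_time, "fill_value": value}


message = pack_fill_message(ALLOC_LATE, FILL_AT_ALLOCATION, b"\x01\x02\x03\x04")
print(unpack_fill_message(message))
```

The raw fill-value bytes are not interpreted here; as the field description says, they are read using the dataset's own datatype.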

Changes to current API Calls:

Minor changes have been made to existing library API functions:
The following API calls have changed significantly:
New API Calls:

The following API calls have been implemented:

QAK:10/9/02
