HDF5 1.10 Documentation

Virtual Dataset (VDS) Documentation
This page briefly describes the new HDF5 Virtual Dataset (VDS) feature and serves as a gateway to the available documentation.
 
Virtual Dataset Overview

With a growing amount of data in HDF5, the need has emerged to access data stored across HDF5 files using standard HDF5 objects, such as groups and datasets, without rewriting or rearranging the data.

HDF5 has long supported building hierarchical structures across existing HDF5 files through the mounting and external link features. Until now, however, it has not been possible to present data stored in several HDF5 datasets and files as a single HDF5 dataset and to access that data via HDF5 APIs without rewriting and rearranging it.

To address this, The HDF Group has implemented a new feature called the HDF5 Virtual Dataset (VDS).

The feature is a logical next step in the development of HDF5. It lets users access and work with data stored in a collection of HDF5 files using well-known tools, existing HDF5 applications, and higher-level libraries such as h5py, MATLAB, and IDL, without changing the way the data is collected and stored.

The following examples illustrate situations that will benefit from use of virtual datasets:

  • Synchrotron centers such as DLS and DESY will be generating and storing terabytes of experimental data per day in HDF5 files. Because of the nature of the experiments and hardware constraints, the data representing, for example, an X-ray image will be stored across different HDF5 datasets in multiple HDF5 files. With VDS, the whole image may be accessed by an application without any specific knowledge of where the data for each part of the image is stored.
  • Climatologists who study and analyze climate variations (temporal changes at a given location) will be able to use the VDS feature to describe and access “data rods” – data of interest stored in a series of HDF5 files organized by time stamps – without rewriting the data into a new HDF5 file. The “data rods” will be accessible as a regular HDF5 dataset via their applications without any special knowledge “coded” into the applications.
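
The first scenario above can be sketched with h5py, one of the higher-level libraries already mentioned. This is a minimal sketch rather than a reference implementation. It assumes h5py 2.9 or later built against HDF5 1.10. The file names (part_0.h5 and so on, image.h5), the dataset names (strip, image), and the helper build_stitched_image are hypothetical and chosen only for illustration. Each source file holds one horizontal strip of an image, and the virtual dataset presents the strips as a single two-dimensional dataset.

```python
import os
import tempfile

import h5py
import numpy as np


def build_stitched_image(workdir, n_parts=4, rows=2, cols=6):
    """Write n_parts strip files plus one file holding a virtual dataset over them."""
    # The layout describes the full virtual dataset: all strips stacked by rows.
    layout = h5py.VirtualLayout(shape=(n_parts * rows, cols), dtype="i4")
    for i in range(n_parts):
        src_path = os.path.join(workdir, f"part_{i}.h5")
        with h5py.File(src_path, "w") as src:
            # Fill strip i with the value i so it is easy to recognize later.
            src.create_dataset("strip", data=np.full((rows, cols), i, dtype="i4"))
        # Map strip i onto rows [i*rows, (i+1)*rows) of the virtual dataset.
        layout[i * rows:(i + 1) * rows, :] = h5py.VirtualSource(
            src_path, "strip", shape=(rows, cols))
    vds_path = os.path.join(workdir, "image.h5")
    with h5py.File(vds_path, "w", libver="latest") as f:
        # Elements with no mapped source data would read back as the fill value.
        f.create_virtual_dataset("image", layout, fillvalue=-1)
    return vds_path


with tempfile.TemporaryDirectory() as workdir:
    path = build_stitched_image(workdir)
    with h5py.File(path, "r") as f:
        image = f["image"][...]  # read exactly like a regular dataset
    print(image.shape)
    print(image[:, 0])
```

The reading code at the end never names the strip files; it only opens image.h5 and reads the dataset called image.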

 
Virtual Dataset User and Resource Documents
HDF5 VDS User’s Guide
(This document is not yet available.)

 

Until an HDF5 Virtual Dataset User’s Guide becomes available, users may find the following resources helpful:

HDF5 VDS Project (a Confluence wiki page)
Includes illustrations of various virtual dataset use cases with links to code examples and to this page.

RFC: HDF5 Virtual Dataset (PDF)
Includes several sections illustrating the use of virtual datasets (VDS) and discussing the VDS programming model, some feature constraints, and several use cases.

Note: The current version of this document reflects the design, strategies, and general approach employed in the VDS feature, but the API implementation had to be modified from the specification. An expected update will correct this divergence.
 

 

HDF5 Library APIs
 
New VDS Functions
H5Pset_virtual Sets the mapping between virtual and source datasets.
 
H5Pget_virtual_count Retrieves the number of mappings for the virtual dataset.
 
H5Pget_virtual_vspace Retrieves a dataspace identifier for the selection within the virtual dataset used in the mapping.
 
H5Pget_virtual_srcspace Retrieves a dataspace identifier for the selection within the source dataset used in the mapping.
 
H5Pget_virtual_dsetname Retrieves the name of a source dataset used in the mapping.
 
H5Pget_virtual_filename Retrieves the filename of a source dataset used in the mapping.
 
H5Pset_virtual_printf_gap Sets the maximum number of missing source files and/or datasets with printf-style names when getting the extent of an unlimited virtual dataset.
 
H5Pget_virtual_printf_gap Returns the maximum number of missing source files and/or datasets with printf-style names when getting the extent of an unlimited virtual dataset.
 
H5Pset_virtual_view Sets the view of the virtual dataset to include or exclude missing mapped elements.
 
H5Pget_virtual_view Retrieves the view of a virtual dataset.
 
Supporting Functions
H5Sis_regular_hyperslab Determines whether a hyperslab selection is regular.
 
H5Sget_regular_hyperslab Retrieves a regular hyperslab selection.
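
The two supporting functions can be tried without creating any file. The sketch below uses h5py's low-level dataspace wrappers, which are assumed here to be available (h5py 2.9 or later with HDF5 1.10). The helper describe_selection is hypothetical.

```python
import h5py


def describe_selection(space):
    """Return (start, stride, count, block) for a regular hyperslab, else None."""
    if not space.is_regular_hyperslab():   # H5Sis_regular_hyperslab
        return None
    return space.get_regular_hyperslab()   # H5Sget_regular_hyperslab


space = h5py.h5s.create_simple((10, 10))

# Every second element in each dimension: a regular pattern.
space.select_hyperslab(start=(0, 0), count=(5, 5), stride=(2, 2), block=(1, 1))
regular = describe_selection(space)

# Adding one element that is off the pattern makes the union irregular.
space.select_hyperslab(start=(7, 3), count=(1, 1), op=h5py.h5s.SELECT_OR)
irregular = describe_selection(space)

print(regular)
print(irregular)
```

A regular selection is one that a single start, stride, count, and block tuple can describe, which is why the second call returns None.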
 
Modified Functions
H5Pset_layout Specifies the layout to be used for a dataset.
The virtual dataset layout, H5D_VIRTUAL, has been added to the list of layouts available through this function.
 
H5Pget_layout Retrieves the layout in use for a dataset.
The virtual dataset layout, H5D_VIRTUAL, has been added to the list of layouts this function can return.
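
The new layout value can be observed on a dataset creation property list alone, with no file involved. This is a small sketch using h5py's low-level property-list wrappers (assumed: h5py 2.9 or later with HDF5 1.10). The names source.h5 and data are hypothetical placeholders, and the source file is never opened.

```python
import h5py

# Setting the layout explicitly with the new value (H5Pset_layout).
explicit = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
explicit.set_layout(h5py.h5d.VIRTUAL)
explicit_layout = explicit.get_layout()        # H5Pget_layout

# Adding a mapping with H5Pset_virtual on a fresh property list.
dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
default_layout = dcpl.get_layout()             # contiguous by default
vspace = h5py.h5s.create_simple((4,))
src_space = h5py.h5s.create_simple((4,))
dcpl.set_virtual(vspace, b"source.h5", b"data", src_space)
layout_after_mapping = dcpl.get_layout()

print(explicit_layout == h5py.h5d.VIRTUAL)
print(default_layout == h5py.h5d.CONTIGUOUS)
print(layout_after_mapping == h5py.h5d.VIRTUAL, dcpl.get_virtual_count())
```

In this sketch the property list reports the virtual layout after the first mapping is added, so an explicit H5Pset_layout call is shown only to illustrate the new value.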
 
Expected Updates and Additional Documentation
The following additional documentation will be posted as it becomes available:
  • Update: “RFC: HDF5 Virtual Dataset” (see below).
    The current document reflects the design, strategies, and general approach employed in the VDS feature, but the API implementation had to be modified from the specification. The update will correct this divergence.
     
  • VDS User Guide material
     
  • Presentation materials describing the VDS feature
 
Tools
No new tools are necessary to examine or manipulate virtual datasets. Where necessary, existing HDF5 tools have been updated to be aware of the new properties, but tool operations on virtual datasets will be essentially transparent to the user.
 

Virtual Dataset Design
The Virtual Dataset design document below describes feature requirements, how the feature works, and why design choices were made.
 
RFC: HDF5 Virtual Dataset (PDF)
This document describes requirements that guided development of the Virtual Dataset (VDS) feature, feature constraints, several use cases, the VDS programming model, and some details of the implementation.

This document contains useful illustrations that provide an intuitive understanding of virtual datasets.

Note: The current version reflects the design, strategies, and general approach employed in the VDS feature, but the API implementation had to be modified from the specification. An expected update will correct this divergence.

 



Last modified: February 2017

  Copyright by The HDF Group.
All rights reserved.