Description of the three basic performance tools

iopipe

Times reads and writes to an HDF5 2-d dataset and compares that with reads and writes using POSIX I/O. Reports seven measurements in terms of CPU user time, CPU system time, elapsed time, and bandwidth:

fill raw: time it takes to memset() a buffer.
fill hdf5: time it takes to read from a dataset never written
out raw: time it takes to write using POSIX I/O
out hdf5: time it takes to write using H5Dwrite()
in raw: time it takes to read data just written using POSIX I/O
in hdf5: time it takes to H5Dread() data written with H5Dwrite()
in hdf5 partial: time it takes to H5Dread() the "center" area.

This is a pretty stupid performance test. It accesses the same area of file and memory over and over and the file size is way too small. But it is good at showing how much overhead there is in the library itself.

chunk

Determines how efficient the raw data cache is for various access patterns of a chunked dataset, both reading and writing. The access pattern is either (a) we access the entire dataset by moving a window across and down a 2-d dataset in row-major order a full window height and width at a time, or (b) we access part of a dataset by moving the window diagonally from the (0,0) corner to the opposite corner by half the window height and width at a time. The window is measured in terms of the chunk size.

The result is:
A table written to stdout that contains the window size as a fraction of the chunk size and the efficiencey of the cache (i.e., number of bytes accessed by H5Dread() or H5Dwrite() divided by the number of bytes of the dataset actually read or written by lower layers.

A gnuplot script and data files which can be displayed by running gnuplot and typing the command `load "x-gnuplot"'.

overhead

Measures the overhead used by the B-tree for indexing chunked datasets. As data is written to a chunked dataset the B-tree grows and its nodes get split. When a node splits one of three ratios are used to determine how many items from the original node go into the new left and right nodes, and these ratios affect the total size of the B-tree in a way that depends on the order that data is written to the dataset.

Invoke as `overhead usage' for more information.


Robb Matzke
Last modified: Jun 4, 2003