[yt-users] yt moving to h5py

May 6, 2009

Currently, the hdf5 operations for object serialization in yt are handled by
pytables.  However, pytables is not the only python hdf5 module out there.
For my own non-yt code, I have been using h5py (
http://code.google.com/p/h5py/).  For simple hdf5 i/o, I find h5py to be far
more intuitive and flexible than pytables.  If you want to do hdf5 in
python, I strongly recommend h5py over pytables.  It's for these reasons
that we're moving yt from using pytables as a data-serialization backend to
using h5py.
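
To give a feel for what simple hdf5 i/o looks like in h5py, here is a minimal sketch of a round trip through a file.  It is not taken from the yt source; the helper names (save_arrays, load_arrays), the file name, and the dataset names are hypothetical, and the calls assume a reasonably recent h5py release together with numpy.

```python
import os
import tempfile

import h5py
import numpy as np


def save_arrays(path, arrays):
    """Write each named array in `arrays` as a top-level dataset in a new file."""
    with h5py.File(path, "w") as f:
        for name, data in arrays.items():
            f.create_dataset(name, data=data)


def load_arrays(path):
    """Read the top-level datasets back into a dict of numpy arrays.

    Assumes the file holds only datasets at the top level (no groups).
    """
    with h5py.File(path, "r") as f:
        return {name: f[name][...] for name in f.keys()}


# Hypothetical example data: two small arrays written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
original = {
    "density": np.arange(8, dtype="float64").reshape(2, 4),
    "radius": np.linspace(0.0, 1.0, 5),
}
save_arrays(path, original)
restored = load_arrays(path)
```

The file object behaves much like a dictionary keyed by dataset name, and indexing a dataset with [...] returns its full contents as a numpy array.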

Apart from casual use, h5py offers additional benefits over pytables that
are relevant to yt.  h5py is faster and relies on fewer python objects than
pytables.  h5py works better with yt in parallel and its design is generally
better suited to yt.  Switching dependencies is something we want to do as
little as possible, but it seems worth it in this case and will most likely
continue to pay off as yt grows.  This evening, we will be committing the
switch of dependency from pytables to h5py to the yt trunk.  We have altered
the install script accordingly, so rerunning it will do everything that is
needed.

h5py can also be easily installed on its own.  Set the following environment
variables:
HDF5_DIR=path to hdf5
HDF5_API=16
Then do: sudo easy_install h5py
(You may not need to use sudo, depending on where easy_install is
installed.)
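
One quick way to confirm the install worked is to import h5py from python and print the versions it reports.  This is only a sanity check, assuming the h5py.version module found in current h5py releases; it is not part of the yt install script.

```python
import h5py

# Version of the h5py package itself.
print("h5py version:", h5py.version.version)

# Version of the underlying HDF5 library that h5py was built against.
print("hdf5 version:", h5py.version.hdf5_version)
```

If the import fails, h5py is most likely not on the path of the python that yt uses.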

None of the yt function calls have changed.  Only the internal calls to
tables functionality have been replaced with h5py calls.  Additionally, it
should be noted that this does not affect any of the dataset i/o as this is
handled by Matt's specially built hdf5 reader.  So far, we have tested this
on various machines and it seems to be working.  If anyone encounters any
problems that may be related to this or has any problems installing h5py,
please contact us as soon as possible.

Britton Smith