
Francesc Alted <falted@pytables.org> writes:
> At the beginning, you will need to export your data to a PyTables file,
... which appears to be actually an HDF5 file. Thanks for the tip. It is clear that a binary file format would be more advantageous, simply because text files are not seekable in the way needed for parallel reading.
I was thinking of using NetCDF because OpenDX does not support HDF5. Konrad Hinsen has written a Python interface for reading NetCDF files. Distributed writing is more complicated, and unfortunately this interface seems particularly unsuitable for it because the difference between definition mode and data mode is hidden. The interface also uses Numeric instead of Numarray.
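To make the seekability point concrete, here is a minimal sketch using only the Python standard library, not PyTables or NetCDF. It assumes a flat file of fixed-size float64 records; the record layout and the names `write_records`, `read_slice` and `demo` are made up for illustration. With fixed-size records each process can compute its byte offset from its rank and seek straight to its own block, which is not possible with variable-length text lines.

```python
import os
import struct
import tempfile

# Assumed layout: one little-endian float64 per record.
RECORD = struct.Struct("<d")


def write_records(path, values):
    """Write the values as fixed-size binary records."""
    with open(path, "wb") as f:
        for v in values:
            f.write(RECORD.pack(v))


def read_slice(path, rank, nprocs):
    """Read only the block of records that belongs to process `rank`."""
    nrecords = os.path.getsize(path) // RECORD.size
    # Block decomposition: the first `extra` ranks get one record more.
    base, extra = divmod(nrecords, nprocs)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    with open(path, "rb") as f:
        f.seek(start * RECORD.size)  # jump directly to this rank's data
        data = f.read(count * RECORD.size)
    return [v for (v,) in RECORD.iter_unpack(data)]


def demo():
    """Write ten records, then read them back as three simulated ranks."""
    values = [float(i) for i in range(10)]
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_records(path, values)
        return [read_slice(path, r, 3) for r in range(3)]
    finally:
        os.remove(path)
```

Here the three "ranks" are just loop iterations in one process; in a real parallel run each process would call `read_slice` with its own rank.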
An advantage of HDF5 would be that the libraries support parallel I/O via MPI-IO, but can this be utilised in PyTables? There is also the problem that there are no standard MPI bindings for Python.
I have also considered writing Python bindings for Parallel-NetCDF, but I suppose that would not be totally trivial even if the library turns out to wrap well with SWIG.