[Numpy-discussion] fast numpy i/o

Derek Homeier derek at astro.physik.uni-goettingen.de
Mon Jun 27 12:17:45 EDT 2011


On 21.06.2011, at 8:35PM, Christopher Barker wrote:

> Robert Kern wrote:
>> https://raw.github.com/numpy/numpy/master/doc/neps/npy-format.txt
> 
> Just a note. From that doc:
> 
> """
>     HDF5 is a complicated format that more or less implements
>     a hierarchical filesystem-in-a-file.  This fact makes satisfying
>     some of the Requirements difficult.  To the author's knowledge, as
>     of this writing, there is no application or library that reads or
>     writes even a subset of HDF5 files that does not use the canonical
>     libhdf5 implementation.
> """
> 
> I'm pretty sure that the NetcdfJava libs, developed by Unidata, use 
> their own home-grown code. netcdf4 is built on HDF5, so that qualifies 
> as "a library that reads or writes a subset of HDF5 files". Perhaps 
> there are lessons to be learned there. (too bad it's Java)
> 
> """
>     Furthermore, by
>     providing the first non-libhdf5 implementation of HDF5, we would
>     be able to encourage more adoption of simple HDF5 in applications
>     where it was previously infeasible because of the size of the
>     library.
> """
> 
> I suppose this point is still true -- a C lib that supported a subset of 
> hdf would be nice.
> 
> That being said, I like the simplicity of the .npy format, and I don't 
> know that anyone wants to take any of this on anyway.

Some late comments on the note. I was a bit surprised that installing HDF5 seems to be a serious hurdle for many; maybe I have just been profiting from the fink build system for OS X here. I was also not aware that the current netCDF (netCDF 4) is built on top of the HDF5 format, so that was something useful learnt again :-)

I was further confused to find that the Unidata netCDF distribution includes C and Fortran versions:
http://www.unidata.ucar.edu/software/netcdf/
but these also depend on HDF5 for netCDF 4 access. The Java version appears not to, but it provides only *read* access to those formats, so it would probably not be of much help anyway.

The netCDF4-Python package mentioned before  
http://code.google.com/p/netcdf4-python/
unfortunately builds on HDF5 again, as does the PyNIO module
http://www.pyngl.ucar.edu/Nio.shtml
which is probably explained by the above dependencies. 

Finally, the former Scientific.IO NetCDF interface is now part of scipy.io, but I assume it only supports netCDF 3 (the documentation is not specific about that). This might be the easiest option for a portable data format (if Matlab supports it). 
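
For illustration, a minimal sketch of a round trip through the scipy.io interface, assuming a SciPy version that provides scipy.io.netcdf_file (which writes netCDF 3 files). The helper name roundtrip, the variable name "data", the dimension name "x" and the temporary file path are made up for this example:

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file


def roundtrip(values, path):
    """Write a 1-D float64 array to a netCDF 3 file and read it back.

    Hypothetical helper for illustration only; names are assumptions.
    """
    values = np.asarray(values, dtype=np.float64)

    # Write: define one dimension and one double-precision variable.
    f = netcdf_file(path, "w")
    f.createDimension("x", len(values))
    var = f.createVariable("data", "d", ("x",))
    var[:] = values
    var.units = "arbitrary"  # attributes are set as plain Python attributes
    f.close()

    # Read: disable mmap and copy, so the array stays valid after close().
    f = netcdf_file(path, "r", mmap=False)
    result = f.variables["data"][:].copy()
    f.close()
    return result


path = os.path.join(tempfile.mkdtemp(), "example.nc")
original = np.linspace(0.0, 1.0, 5)
restored = roundtrip(original, path)
```

The array read back is stored big-endian, as in the file, so compare by value rather than by dtype. Whether a given Matlab version can read the resulting file is not something this sketch checks.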

Cheers,
							Derek



