
David Warde-Farley wrote:
On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote:
Actually, my vision for pyhdf5io is to have HDF5 replace numpy's own binary file formats (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy.
Because it introduces a dependency on PyTables (and the HDF5 C library), I doubt it would be something the numpy core developers would be eager to adopt.
The npy and npz formats (as best I can gather) exist so that there is _some_ way of persisting data to disk that ships with numpy. They are not necessarily meant as the best way, or as an interchange format, just as something that works "out of the box", the code for which is completely contained within numpy.
It might be worth mentioning the limitations of numpy's built-in save(), savez() and load() in the docstrings and recommending more portable alternatives, though.
David
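For readers unfamiliar with the built-in functions discussed above, here is a minimal sketch of a round trip through save(), savez() and load(). It assumes a reasonably current NumPy; the array contents, file names and temporary directory are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Two small example arrays (illustrative data only).
a = np.arange(6).reshape(2, 3)
b = np.linspace(0.0, 1.0, 5)

# Write into a throwaway directory so the example leaves no files behind
# in the working directory.
tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "a.npy")
npz_path = os.path.join(tmpdir, "arrays.npz")

np.save(npy_path, a)            # one array per .npy file
np.savez(npz_path, a=a, b=b)    # several named arrays in one .npz archive

a_loaded = np.load(npy_path)    # returns the array directly

# For an .npz file, load() returns an archive object indexed by array name.
with np.load(npz_path) as archive:
    names = sorted(archive.files)
    b_loaded = archive["b"]
```

Everything here works with numpy alone, which is the "out of the box" property described above. The files are readable by numpy, but other tools generally need their own reader for them.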
I tend to agree with David that PyTables is too big a dependency for inclusion in core Numpy. It does a lot more than simply loading and saving arrays. While I haven't tried Andrew Collette's h5py (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), memmap() could be enhanced so that saving/loading files with HDF5-like file extensions used the HDF5 format, with code based on h5py and pyhdf5io. This could, I imagine, be a relatively small/simple addition to numpy, with the only external dependency being the HDF5 libraries themselves.
Stephen
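A rough sketch of what the extension-based dispatch suggested above might look like, written as a standalone helper rather than as a change to numpy itself. The names pick_format, save_array and HDF5_EXTENSIONS are hypothetical, the set of "HDF5-like" extensions is an assumption, and the HDF5 branch assumes h5py is installed (it is imported only when that branch is taken).

```python
import os

# Assumed set of "HDF5-like" extensions; the thread does not specify one.
HDF5_EXTENSIONS = {".h5", ".hdf5"}


def pick_format(filename):
    """Choose a storage format from the file extension (hypothetical helper)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in HDF5_EXTENSIONS:
        return "hdf5"
    if ext == ".npz":
        return "npz"
    # Anything else falls back to numpy's own .npy format.
    return "npy"


def save_array(filename, arr, name="data"):
    """Save one array, dispatching on the extension (hypothetical helper)."""
    fmt = pick_format(filename)
    if fmt == "hdf5":
        # Imported lazily so numpy-only users never need h5py or HDF5.
        import h5py
        with h5py.File(filename, "w") as f:
            f.create_dataset(name, data=arr)
    elif fmt == "npz":
        import numpy as np
        np.savez(filename, **{name: arr})
    else:
        import numpy as np
        # np.save appends ".npy" if the file name does not already end in it.
        np.save(filename, arr)
    return fmt
```

With this layout, a call such as save_array("run1.h5", arr) would go through h5py, while save_array("run1.npy", arr) would stay entirely within numpy, so the HDF5 libraries remain an optional dependency.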