[Numpy-discussion] Home for pyhdf5io?
Stephen Simmons
mail at stevesimmons.com
Sun May 24 08:23:22 EDT 2009
David Warde-Farley wrote:
> On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote:
>> Actually my vision with pyhdf5io is to have hdf5 to replace numpy's
>> own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it)
>> should be the standard (binary) way to store data in scipy/numpy. A
>> bold statement, I know, but I think that it would be an improvement,
>> especially for those users who are replacing Matlab with scipy/numpy.
>>
> In that it introduces a dependency on pytables (and the hdf5 C
> library) I doubt it would be something the numpy core developers would
> be eager to adopt.
>
> The npy and npz formats (as best I can gather) exist so that there is
> _some_ way of persisting data to disk that ships with numpy. It's not
> meant necessarily as the best way, or as an interchange format, just
> as something that works "out of the box", the code for which is
> completely contained within numpy.
>
> It might be worth mentioning the limitations of numpy's built-in
> save(), savez() and load() in the docstrings and recommending more
> portable alternatives, though.
>
> David
>
I tend to agree with David that PyTables is too big a dependency for
inclusion in core NumPy. It does a lot more than simply loading and
saving arrays.
While I haven't tried Andrew Collette's h5py
(http://code.google.com/p/h5py), it looks like a very 'thin' wrapper
around the HDF5 C libraries. Maybe numpy's save(), savez(), load() and
memmap() could be enhanced so that saving/loading files with HDF5-like
file extensions used the HDF5 format, with code based on h5py and
pyhdf5io. This could, I imagine, be a relatively small/simple addition
to numpy, with the only external dependency being the HDF5 libraries
themselves.
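
To make the idea concrete, here is a rough sketch of extension-based
dispatch. It is not existing numpy behaviour: save_array(), load_array(),
is_hdf5_path(), the HDF5_EXTENSIONS set and the default dataset name
"data" are all hypothetical names chosen for illustration. The sketch
assumes h5py is installed for the HDF5 branch and falls back to numpy's
own .npy format for everything else.

```python
import os

import numpy as np

# Hypothetical list of extensions to treat as HDF5; numpy defines no such list.
HDF5_EXTENSIONS = {".h5", ".hdf5", ".hdf"}


def is_hdf5_path(path):
    """Return True if the file extension suggests an HDF5 file."""
    return os.path.splitext(path)[1].lower() in HDF5_EXTENSIONS


def save_array(path, arr, dataset="data"):
    """Save one array, choosing the on-disk format from the file extension."""
    arr = np.asarray(arr)
    if is_hdf5_path(path):
        # Imported lazily so the .npy branch depends on numpy alone.
        import h5py

        with h5py.File(path, "w") as f:
            f.create_dataset(dataset, data=arr)
    else:
        # Note: np.save appends ".npy" if the name does not already end with it.
        np.save(path, arr)


def load_array(path, dataset="data"):
    """Load one array saved by save_array()."""
    if is_hdf5_path(path):
        import h5py

        with h5py.File(path, "r") as f:
            # [...] reads the whole dataset into an in-memory ndarray.
            return f[dataset][...]
    return np.load(path)
```

With something like this, save_array("x.npy", a) would keep today's
behaviour, while save_array("x.h5", a) would write an HDF5 file with a
single dataset. A real implementation would also have to decide how
savez()-style multiple arrays and memmap() map onto HDF5 groups and
datasets, which this sketch does not attempt.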
Stephen