
On 24-May-09, at 5:22 PM, Robert Kern wrote:
While I haven't tried Andrew Collette's h5py (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), and memmap() could be enhanced so that saving/loading files with HDF5-like file extensions uses the HDF5 format, with code based on h5py and pyhdf5io. This could, I imagine, be a relatively small and simple addition to numpy, with the only external dependency being the HDF5 libraries themselves.
*libhdf5* is too big, not PyTables.
Yup. According to sloccount, numpy is roughly 210,000 lines of code, and the hdf5 library is roughly 385,000 lines. Including even a small part of libhdf5 would grow the code base significantly, and requiring it as a dependency isn't a good idea, since libhdf5 can be tricky to build right. As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 format *from scratch* (just what is required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know the details of the HDF5 license; I know it's fairly permissive, but it still might not be suitable for including any of its code in NumPy).

David
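
For reference, here is a rough sketch of the extension-based dispatch idea quoted above, i.e. choosing the on-disk format from the file extension. It is only an illustration under stated assumptions: the names pick_format, save_array, and HDF5_EXTENSIONS are hypothetical, the thread does not say which extensions would count as "HDF5-like", and this is not how numpy's save() actually behaves. The HDF5 branch relies on h5py (and therefore libhdf5) being installed, which is exactly the dependency concern raised in this thread.

```python
import os

# Hypothetical set of "HDF5-like" extensions; the thread does not name any.
HDF5_EXTENSIONS = {".h5", ".hdf5"}


def pick_format(filename):
    """Return 'hdf5' or 'npy' based on the file extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in HDF5_EXTENSIONS:
        return "hdf5"
    if ext == ".npy":
        return "npy"
    raise ValueError("unrecognized extension: %r" % ext)


def save_array(filename, arr, dataset_name="data"):
    """Save one array, choosing the on-disk format from the extension."""
    fmt = pick_format(filename)
    if fmt == "hdf5":
        # Optional dependency, imported only when an HDF5 file is requested.
        import h5py
        with h5py.File(filename, "w") as f:
            # Store the array as a single top-level dataset (a leaf node).
            f.create_dataset(dataset_name, data=arr)
    else:
        import numpy as np
        np.save(filename, arr)
```

With something like this, save_array("a.npy", arr) would need only numpy, while save_array("a.h5", arr) would pull in h5py lazily, so the HDF5 libraries stay an optional rather than a required dependency.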