[Numpy-discussion] Designing a new storage format for numpy recarrays

Dag Sverre Seljebotn dagss at student.matnat.uio.no
Fri Oct 30 10:08:20 EDT 2009


Stephen Simmons wrote:
> P.S. Maybe this will be too much work, and I'd be better off sticking
> with Pytables.....

I can't judge that, but I want to share some thoughts (rant?):

 - Are you ready not only to write the code, but to maintain it for years to
come, work through nasty bugs, and think things through when people ask
for parallelism, obscure filesystem locking functionality, or whatnot?

 - Are you ready to finish even the last, boring "10%"? Since there are
existing options in the same area, you can't expect a growing userbase to
help you with the last "10%" (unlike projects in unexplored areas).

 - When you are done, are you sure that what you finally have will really
be leaner and easier to work with than the existing options (like
PyTables)?

If not, odds are the result will in the end only be used by yourself.
Simply writing the prototype is the easy part of the job!

Perhaps needless to say, my hunch would be to try to work with PyTables to
add what you miss there. The learning curve is steeper than writing
something from scratch, but no steeper than the one others will face with
something you write from scratch.

The advantage of HDF5 is that there are lots of existing tools for
inspecting, processing and sharing the data independently of NumPy (well,
except for proprietary compression; but that's hardly worse than the entire
format being proprietary).

Dag Sverre
