[Numpy-discussion] I/O documentation and code

Skipper Seabold jsseabold at gmail.com
Sat Jun 20 18:24:41 EDT 2009

On Sat, Jun 20, 2009 at 5:33 PM, Ralf Gommers
<ralf.gommers at googlemail.com> wrote:
> Hi,
> I'm working on the I/O documentation, and have a bunch of questions.
> 1. The npy/npz formats are documented in lib.format and in the NEP
> (http://svn.scipy.org/svn/numpy/trunk/doc/neps/npy-format.txt). Is
> lib.format the right place to add relevant parts of the NEP, or would doc.io
> be better? Or create a separate page (maybe doc.npy_format)? And is the .npz
> format fixed or still in flux?
> 2. Is the .npy format version number (now at 1.0) independent of the numpy
> version numbering, when is it incremented, and will it be backwards
> compatible?
> 3. For a longer coherent overview of I/O, does that go in doc.io or
> routines.io.rst?
> 4. This page http://www.scipy.org/Data_sets_and_examples talks about
> including data sets with scipy, has this happened? Would it be possible to
> include a single small dataset in numpy for use in examples?
> 5. DataSource contains a lot of TODOs and behavior that is documented as a
> bug in the docstring. Is anyone working on this? If not, I can give it a go.

This was proposed as a GSoC project and I went through it, but that's
about all I know.  I can't find my notes now, but here are some
thoughts off the top of my head.  The code is here for the record

> TODOs that need work, or at least a yes/no decision:
> 5a. .zip and .tar support (is .tar needed?)

Would these be trivial to implement?  And since the import overhead is
deferred until it's needed I don't see the harm in including the
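
To illustrate the deferred-import point, here is a minimal sketch using only
the standard library. The helper name read_bytes is hypothetical and is not
part of DataSource; the single-member restriction for .zip files is an
assumption made to keep the example short.

```python
import os


def read_bytes(path):
    """Read a plain, .gz, or single-member .zip file and return its bytes."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".gz":
        import gzip  # deferred: only imported when a .gz file is opened
        with gzip.open(path, "rb") as f:
            return f.read()
    if ext == ".zip":
        import zipfile  # deferred in the same way
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
            if len(names) != 1:
                raise ValueError("expected one member, found %d" % len(names))
            return archive.read(names[0])
    # Anything else is treated as an uncompressed file.
    with open(path, "rb") as f:
        return f.read()
```

A .tar branch could follow the same pattern with the tarfile module, at the
cost of deciding what to do with multi-member archives.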

> 5b. URLs only work if they include 'http://' (currently documented as a bug,
> which it is not necessarily. Fix or document?)

I would say document, since we might have any number of protocols, so
it might not make sense to just default to http://
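
A minimal sketch of the "document, don't guess" behaviour, assuming a
hypothetical helper is_url (not part of DataSource) and an illustrative list of
accepted schemes: a path counts as a URL only when it carries an explicit
scheme, and nothing is silently prefixed with http://.

```python
from urllib.parse import urlparse

# Illustrative list only; which protocols to accept is a design decision.
KNOWN_SCHEMES = ("http", "https", "ftp")


def is_url(path):
    """Return True only when path has an explicit, known scheme and a host."""
    parsed = urlparse(path)
    return parsed.scheme in KNOWN_SCHEMES and bool(parsed.netloc)
```

With this check, 'www.example.com/data.txt' is treated as a local path, and
the docstring can simply state that the scheme is required.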

> 5c. _cache() does not handle compressed files, and should use
> shutil.copyfile

I never understood what this meant, but maybe I'm missing something.
If path is a compressed file then it is written to a local directory
as a compressed file.  What else does it need to handle?  Should it
instead fetch the archive, extract it (single file or archive), and
cache the result locally?
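
The two behaviours can be put side by side in a short sketch. The function
cache_file and its extract flag are hypothetical names, not DataSource's
_cache(); the source is assumed to be a local file that has already been
fetched, and only .gz is handled.

```python
import gzip
import os
import shutil


def cache_file(src, cache_dir, extract=False):
    """Copy src into cache_dir; with extract=True store a .gz file decompressed."""
    os.makedirs(cache_dir, exist_ok=True)
    name = os.path.basename(src)
    if extract and name.endswith(".gz"):
        # Strip the .gz suffix and write the decompressed bytes.
        dest = os.path.join(cache_dir, name[:-3])
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        # Behaviour described above: the compressed file is cached as-is.
        dest = os.path.join(cache_dir, name)
        shutil.copyfile(src, dest)
    return dest
```

Caching the file as-is keeps the cache small and leaves decompression to
open(); extracting on the way in trades disk space for faster repeated reads.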

> 5d. make abspath() more robust
> 5e. in open(), support for creating files and adding a 'subdir' parameter
> (needed?)

I would think there should be support for both of these.  I have some
rough scripts that I use for remote data fetching, and I like them to
create a ./tmp directory, cache the file there, and then clean up
when I'm done.
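
That create, cache, and clean up workflow might look like the following
sketch. The context manager scratch_dir is a hypothetical name, not an
existing DataSource feature, and it removes the directory only if it created
it.

```python
import contextlib
import os
import shutil


@contextlib.contextmanager
def scratch_dir(path="./tmp"):
    """Create path for cached files and delete it on exit if we created it."""
    created = not os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    try:
        yield path
    finally:
        # Leave a pre-existing directory alone; only remove our own.
        if created:
            shutil.rmtree(path, ignore_errors=True)
```

Inside a "with scratch_dir() as d:" block, fetched files would be written
under d and disappear when the block exits.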

> Does anyone have (self-contained) code using DataSource, or a suggestion for
> data on the web that can be used in examples?

I'm not sure if this is what you're after, but I've been using some of
these "classic published results" and there are some compressed


> Cheers,
> Ralf

