[Numpy-discussion] numpy.lib.npyio.load

David Cournapeau cournape at gmail.com
Mon Oct 10 04:15:04 EDT 2011


Hi Jean-Louis,

On Sun, Oct 9, 2011 at 2:37 PM, Jean-Louis Durrieu
<jean-louis at durrieu.ch> wrote:
> Hi everyone,
>
> I was just wondering about something: lately, I have had to use the load function to load arrays stored in npz files.
>
> During one session, I need to read several files (or even the same files) quite a few times, for some model training. However, I just found out that the batch processing I ran failed because of a "too many open files" error.
>
> After checking with lsof, it seems that np.load(filename), where filename is a string (i.e. the path to the file), behaves in an unexpected way. Here is what I do in an IPython 0.11 session started with the --pylab option:
>
> In [1]: np.__version__
> Out[1]: '1.6.1'
>
> In [2]: np.load
> Out[2]: <function numpy.lib.npyio.load>
>
> In [3]: struc = np.load('path/to/file.npz')
>
> In [4]: ar1 = struc['ar1']
>
> I would expect this to open the file, read the array, and close the file. However, lsof proved me wrong: I have to call struc.close() explicitly to close the file.

This is a documentation bug. If you look at the source of load, you
will see that for a zip file, load returns an NpzFile instance. This
is a file-like object, and it needs to be closed. The rationale is
that it enables lazy loading: not all arrays are loaded into memory,
only the ones you request.
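A minimal sketch of the explicit-close pattern, assuming a NumPy version where np.load on an .npz path returns an NpzFile. The file name example.npz and the array names are made up for illustration. Only the array that is indexed is read from the archive, and close() in the finally clause releases the file handle even if reading fails.

```python
import os
import tempfile

import numpy as np

# Write a small .npz archive to a temporary directory (illustrative data only).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.npz")
np.savez(path, ar1=np.arange(5), ar2=np.ones((2, 3)))

struc = np.load(path)  # returns an NpzFile; the underlying file stays open
try:
    print(struc.files)  # names of the stored arrays, taken from the zip index
    ar1 = struc["ar1"]  # only 'ar1' is read into memory here; 'ar2' is not
finally:
    struc.close()  # release the file handle explicitly

print(ar1)
```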

So for now, closing the returned NpzFile instance is the correct
solution. I added a note about this to the load docstring, and a
context manager to NpzFile, so you can also write (Python >= 2.5 only):

with load('yo.npz') as data:
    ar1 = data['ar1']  # the file is closed when the block exits
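For the batch-training case in the original question, one option is to copy every array out of the archive inside a with block, so no file handle outlives the call. This is a sketch assuming a NumPy release that includes the NpzFile context manager; load_all and model.npz are hypothetical names, and the 2000-iteration loop is only there to reload the same file more times than a common default limit of 1024 open files on Linux.

```python
import os
import tempfile

import numpy as np


def load_all(path):
    """Hypothetical helper: read every array in an .npz file into a dict.

    The NpzFile is used as a context manager, so the underlying file is
    closed before the function returns, even if reading raises.
    """
    with np.load(path) as data:
        # Index each array now; once the block exits the archive is closed.
        return {name: data[name] for name in data.files}


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "model.npz")
np.savez(path, weights=np.zeros(3), bias=np.array([1.0]))

# Reload the same file many times, as in a training loop; handles do not pile up.
for _ in range(2000):
    arrays = load_all(path)

print(sorted(arrays))
```

The trade-off is that this gives up lazy loading: every array in the file is read into memory up front.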

cheers,

David


