Hi everyone,

I was just wondering something: lately, I had to use the load function to load arrays stored in npz files.

During one session, I need to read several files (or even the same files) quite a few times, for some model training. However, I just found out that the batch processing I ran failed because of a "too many open files" problem.

After checking with lsof, it seems that np.load(filename), where filename is a string (= path to the file), works in an unexpected way. When I do the following in an ipython 0.11 session, with the --pylab option:

In [1]: np.__version__
Out[1]: '1.6.1'

In [2]: np.load
Out[2]: <function numpy.lib.npyio.load>

In [3]: struc = np.load('path/to/file.npz')

In [4]: ar1 = struc['ar1']

I would expect to have opened a file, read the array in it, and closed it. However, 'lsof' proved me wrong, and I found out that I need to explicitly do 'struc.close()' in order to close the file.

While this is not a big issue, since one has to do that anyway when opening/closing files, I was a bit surprised that I had to do it when using the load function. Maybe this behaviour should be made explicit in the documentation? (At least, in help(load), nothing is said about it, as far as I could read.)

I just wanted to share that, in case someone runs into the same kind of issue.

I hope this is going to be useful for others!

Best regards,

Jean-Louis Durrieu
Hi Jean-Louis,

On Sun, Oct 9, 2011 at 2:37 PM, Jean-Louis Durrieu <jean-louis@durrieu.ch> wrote:
Hi everyone,
I was just wondering something: lately, I had to use the load function, to load arrays stored in npz files.
During one session, I need to read several files (or even the same files) quite a few times, for some model training. However, I just found out that the batch processing I ran failed because of a "too many open files" problem.
After checking with lsof, it seems that np.load(filename), where filename is a string (= path to the file), works in an unexpected way. When I do the following in an ipython 0.11 session, with the --pylab option:
In [1]: np.__version__
Out[1]: '1.6.1'
In [2]: np.load
Out[2]: <function numpy.lib.npyio.load>
In [3]: struc = np.load('path/to/file.npz')
In [4]: ar1 = struc['ar1']
I would expect to have opened a file, read the array in it, and closed it. However, 'lsof' proved me wrong, and I found out that I need to explicitly do 'struc.close()' in order to close the file.
This is a documentation bug. If you look into the sources of load, you will see that in the case of a zip file, an NpzFile instance is returned by load. This is a file-like object, and it needs to be closed. The rationale is that it enables lazy loading (not all arrays are loaded in memory, only the one you request).

So for now, closing the returned NpzFile instance is the correct solution. I added a note about this in the load doc, and a context manager to NpzFile so you can also do (python >= 2.5 only):

with load('yo.npz') as data:
    ....

cheers,

David
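For readers who land on this thread with the same "too many open files" error, here is a minimal sketch of the two ways of closing the archive described above. It assumes a NumPy version in which NpzFile supports the with statement; the helper names load_array and load_array_explicit and the file example.npz are made up for illustration.

```python
import os
import tempfile

import numpy as np


def load_array(path, key):
    """Read one array from an .npz archive, closing the archive on exit."""
    # For an .npz path, np.load returns an NpzFile; leaving the with-block closes it.
    with np.load(path) as data:
        # Indexing reads the requested array into memory (the lazy-loading step),
        # so the returned array stays usable after the archive is closed.
        return data[key]


def load_array_explicit(path, key):
    """Same as load_array, but with an explicit close() instead of a with-block."""
    data = np.load(path)
    try:
        return data[key]
    finally:
        # Runs even if the key is missing, so the file handle is never left open.
        data.close()


# Hypothetical demo data: a small archive written to a temporary directory.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.npz")
np.savez(path, ar1=np.arange(5))

# Reading the same file many times, as in a batch job, without accumulating handles.
for _ in range(100):
    ar1 = load_array(path, "ar1")
```

Either helper can replace a bare np.load(...)[key] call inside a training loop; the with-block version is shorter, while the try/finally version also works on Python versions without the with statement.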
Participants (2): David Cournapeau, Jean-Louis Durrieu