Memory leak/fragmentation when using np.memmap

Hello,

I need to process several large (~40 GB) files. np.memmap seems ideal for this, but I have run into a problem that looks like a memory leak or memory fragmentation. The following code illustrates the problem:

    import numpy as np

    x = np.memmap('mybigfile.bin', mode='r', dtype='uint8')
    print x.shape  # prints (42940071360,) in my case
    ndat = x.shape[0]
    for k in range(1000):
        # The astype ensures that the data is read in from disk
        y = x[k*ndat/1000:(k+1)*ndat/1000].astype('float32')
        del y

One would expect such a program to have a roughly constant memory footprint, but in fact 'top' shows that the RES memory continually increases. I can see that the memory usage is real because the OS eventually starts to swap to disk. The memory usage does not seem to correspond with the total size of the file.

Has anyone seen this behavior? Is there a solution? I found this article: http://pushingtheweb.com/2010/06/python-and-tcmalloc/ which sounds similar, but it seems that the ~40 MB chunks I am loading would be using mmap anyway, so they could be returned to the OS.

I am using nearly the latest version of numpy from the git repository (np.__version__ returns 2.0.0.dev-Unknown), Python 2.7.1, and RHEL 5 on x86_64.

I appreciate any suggestions.

Thanks,
Glenn
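[Editorial note, not part of the original message: if only sequential chunked access is needed, one possible workaround is to skip the mapping entirely and read each chunk with np.fromfile at an explicit offset, so that only the current chunk is ever resident in the process. The file name and chunk count below simply mirror the example above; this is a sketch, not a fix proposed in the thread.]

    import numpy as np

    filename = 'mybigfile.bin'   # same example file as above
    nchunks = 1000

    with open(filename, 'rb') as f:
        f.seek(0, 2)             # seek to the end to find the file size
        ndat = f.tell()
        chunk = ndat // nchunks  # any remainder at the end is ignored here
        f.seek(0)
        for k in range(nchunks):
            # Read one ~40 MB chunk into an ordinary array and convert it;
            # the buffer is freed (or reused) once 'y' is deleted.
            y = np.fromfile(f, dtype='uint8', count=chunk).astype('float32')
            del y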

On Wed, 18 May 2011 15:09:31 -0700, G Jones wrote: [clip]
Your OS probably likes to keep the pages touched in memory and in swap, rather than dropping them. This happens at least on Linux. You can check that an equivalent simple C program displays the same behavior (use with file "data" with enough bytes):

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main()
    {
        unsigned long size = 2000000000;
        unsigned long i;
        char *p;
        int fd;
        char sum;

        fd = open("data", O_RDONLY);
        p = (char*)mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);

        sum = 0;
        for (i = 0; i < size; ++i) {
            sum += *(p + i);
        }

        munmap(p, size);
        close(fd);
        return 0;
    }
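[Editorial note, not part of the original message: if the only concern is that touched pages stay resident, the kernel can be hinted to drop them explicitly. The sketch below calls madvise(MADV_DONTNEED) on the mapped region through ctypes; it assumes Linux (the MADV_DONTNEED value of 4 is Linux-specific) and a memmap that starts at offset 0, so its address is page-aligned.]

    import ctypes
    import ctypes.util
    import numpy as np

    libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
    MADV_DONTNEED = 4   # value on Linux; not portable

    x = np.memmap('mybigfile.bin', mode='r', dtype='uint8')

    # Touch a slice so that its pages become resident.
    y = x[:40 * 1024 * 1024].astype('float32')
    del y

    # Tell the kernel the mapped pages are no longer needed; for a
    # read-only file mapping they are dropped from RAM and re-read
    # from disk on the next access.
    ret = libc.madvise(ctypes.c_void_p(x.ctypes.data),
                       ctypes.c_size_t(x.nbytes),
                       MADV_DONTNEED)
    if ret != 0:
        raise OSError(ctypes.get_errno(), 'madvise failed')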

Hello,

I have seen the effect you describe, and I had originally assumed this was the case, but in fact there seems to be more to the problem. If it were only the effect you mention, there should not be any memory error, because the OS would drop the pages when the memory is actually needed for something else. At least I would hope so; if not, this seems like a huge problem for Linux.

As a follow-up, I managed to install tcmalloc as described in the article I mentioned. Running the example I sent now shows a constant memory footprint, as expected. I am surprised such a solution was necessary. Certainly others must work with such large datasets using numpy/python?

Thanks,
Glenn

On Wed, May 18, 2011 at 4:21 PM, Pauli Virtanen <pav@iki.fi> wrote:

On Wed, 18 May 2011 16:36:31 -0700, G Jones wrote: [clip]
Well, your example Python code works for me without any other changes, and it shows behavior identical to the C code. Things might depend on the version of the C library and the kernel, so it is quite possible that many do not see these issues.

--
Pauli Virtanen