[Numpy-discussion] np.memmap and memory usage
Pauli Virtanen
pav at iki.fi
Wed Jul 1 05:14:39 EDT 2009
Wed, 01 Jul 2009 10:17:51 +0200, Emmanuelle Gouillart wrote:
> I'm using numpy.memmap to open big 3-D arrays of X-ray tomography
> data. After I have created a new array using memmap, I modify the
> contrast of every Z-slice (along the first dimension) inside a for loop,
> for a better visualization of the data. Although I call memmap.flush
> after each modification of a Z-slice, the memory used by IPython keeps
> increasing at every new iteration. At the end of the loop, the memory
> used by IPython is of the order of magnitude of the size of the data
> file (1.8 GB!). I would have expected that the maximum amount of memory
> used would correspond to only one Z-slice of the 3-D array. See the code
> snapshots below for more details.
>
> Is this expected behaviour? How can I reduce the amount of
> memory used by IPython and still process my data?
How do you measure the memory used? Note that on Linux, "top" includes
the size of OS caches for the memmap in the RSS size of a process.
You can try to monitor "free" instead:
    $ free
                 total       used       free     shared    buffers     cached
    Mem:      12300488   11485664     814824          0     642928    7960736
    -/+ buffers/cache:    2882000    9418488
    Swap:      7847712       2428    7845284
If the memory is used by OS caches, the "used" number on the
"-/+ buffers/cache" line should stay constant while the program runs.
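To make the arithmetic behind that line explicit, here is a minimal sketch that parses text in the /proc/meminfo format and subtracts buffers and page cache from the used figure. The function names are made up for illustration and are not part of any library, and the sample values are simply the numbers from the "free" output above rewritten in meminfo form.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def used_excluding_cache(info):
    """Memory in use after subtracting OS buffers and page cache (kB)."""
    used = info["MemTotal"] - info["MemFree"]
    return used - info["Buffers"] - info["Cached"]


# Sample values taken from the "free" output above (illustrative only).
SAMPLE = """\
MemTotal:       12300488 kB
MemFree:          814824 kB
Buffers:          642928 kB
Cached:          7960736 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    kb = used_excluding_cache(parse_meminfo(text))
    print(kb, "kB used excluding buffers/cache")
```

Calling this before and after each iteration of the loop should show whether the growth is in the process itself or only in the OS caches.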
In this case, what is most likely taking up memory is the OS buffering
the data before writing it to disk. Linux has at least some system-wide
parameters that tune how aggressively data is cached. I suppose there may
also be some file-specific settings, but I have no idea what they are.
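One set of system-wide parameters of this kind is, as far as I know, the vm.dirty_* sysctls, which Linux exposes as files under /proc/sys/vm. The sketch below only reads their current values; the helper name and the choice of which parameters to list are my own assumptions, and whether changing any of them helps in this particular case is untested.

```python
import os

# Writeback-related sysctls; this selection is an assumption for illustration.
DIRTY_PARAMS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_params(names=DIRTY_PARAMS, root="/proc/sys/vm"):
    """Return {name: int value, or None if the file is missing or unreadable}."""
    values = {}
    for name in names:
        path = os.path.join(root, name)
        try:
            with open(path) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_params().items():
        print(f"vm.{name} = {value}")
```

The root argument is there only so the function can be pointed at a different directory for testing.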
--
Pauli Virtanen