[Numpy-discussion] Change in memmap behaviour

Thouis (Ray) Jones thouis at gmail.com
Tue Jul 3 05:35:24 EDT 2012


On Mon, Jul 2, 2012 at 11:52 PM, Sveinung Gundersen <sveinugu at gmail.com> wrote:
>
> On 2. juli 2012, at 22.40, Nathaniel Smith wrote:
>
>> On Mon, Jul 2, 2012 at 6:54 PM, Sveinung Gundersen <sveinugu at gmail.com> wrote:
>>> [snip]
>>>
>>>
>>>
>>> Your actual memory usage may not have increased as much as you think,
>>> since memmap objects don't necessarily take much memory -- it sounds
>>> like you're leaking virtual memory, but your resident set size
>>> shouldn't go up as much.
>>>
>>>
>>> As I understand it, memmap objects retain the contents of the memmap in
>>> memory after it has been read the first time (in a lazy manner). Thus, when
>>> reading a slice of a 24GB file, only that part resides in memory. Our system
>>> reads a slice of a memmap, calculates something (say, the sum), and then
>>> deletes the memmap. It then loops through this for consecutive slices,
>>> retaining a low memory usage. Consider the following code:
>>>
>>> import numpy as np
>>> res = []
>>> vecLen = 3095677412
>>> for i in xrange(vecLen // 10**8 + 1):
>>>     x = i * 10**8
>>>     y = min((i+1) * 10**8, vecLen)
>>>     res.append(np.memmap('val.float64', dtype='float64')[x:y].sum())
>>>
>>> The memory usage of this code on a 24GB file (one value for each nucleotide
>>> in the human DNA!) is 23g resident memory after the loop is finished (not
>>> 24g for some reason..).
>>>
>>> Running the same code on 1.5.1rc1 gives a resident memory of 23m after the
>>> loop.
>>
>> Your memory measurement tools are misleading you. The same memory is
>> resident in both cases, just in one case your tools say it is
>> operating system disk cache (and not attributed to your app), and in
>> the other case that same memory, treated in the same way by the OS, is
>> shown as part of your app's resident memory. Virtual memory is
>> confusing...
>
> But perhaps the crucial difference is that the OS can drop the disk cache when it needs the memory, whereas application memory cannot be reclaimed in the same way and must be swapped to disk? Or am I still confused?
>
> (snip)
>
>>>
>>> Great! Any idea on whether such a patch may be included in 1.7?
>>
>> Not really, if I or you or someone else gets inspired to take the time
>> to write a patch soon then it will be, otherwise not...
>>
>> -N
>
> I have now tried to add a patch, in the way you proposed, but I may have gotten it wrong.
>
> http://projects.scipy.org/numpy/ticket/2179

I put this in a GitHub repo and added tests (author credit to Sveinung):
https://github.com/thouis/numpy/tree/mmap_children

I'm not sure which branch to issue the pull request against, though.
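
For reference, here is a minimal sketch of the chunked-read pattern discussed
above, written for Python 3 (range instead of xrange). It is not the patch
from the ticket. The helper name chunked_sums, the small temporary file
standing in for the 24GB one, and the choice to open the memmap once in
read-only mode are all assumptions made for illustration. It only shows the
access pattern and makes no claim about how resident memory is reported on
any particular numpy version.

```python
import os
import tempfile

import numpy as np


def chunked_sums(path, chunk_len, dtype="float64"):
    """Sum consecutive slices of a file-backed array, one chunk at a time."""
    # Open the mapping once, read-only; pages are loaded lazily on access.
    data = np.memmap(path, dtype=dtype, mode="r")
    sums = []
    for start in range(0, len(data), chunk_len):
        stop = min(start + chunk_len, len(data))
        # Convert to a plain Python float so the stored result
        # holds no reference back to the mapping.
        sums.append(float(data[start:stop].sum()))
    del data  # drop our own reference to the mapping
    return sums


# Small stand-in for the large file (hypothetical data, illustration only).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "val.float64")
    np.arange(1000, dtype="float64").tofile(path)
    demo_sums = chunked_sums(path, chunk_len=300)
```

Storing plain floats rather than numpy objects is a deliberate choice here:
whatever ends up in the result list cannot keep the underlying mmap alive.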


