[Numpy-discussion] How to limit the numpy.memmap's RAM usage?

braingateway braingateway at gmail.com
Sat Oct 23 15:12:13 EDT 2010


Charles R Harris:
>
>
> On Sat, Oct 23, 2010 at 10:27 AM, braingateway <braingateway at gmail.com> wrote:
>
>     Charles R Harris:
>     >
>     >
>     > On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris
>     > <charlesr.harris at gmail.com> wrote:
>     >
>     >
>     >
>     >     On Sat, Oct 23, 2010 at 9:44 AM, braingateway
>     >     <braingateway at gmail.com> wrote:
>     >
>     >         David Cournapeau:
>     >
>     >             2010/10/23 braingateway <braingateway at gmail.com>:
>     >
>     >
>     >                 Hi everyone,
>     >                 I noticed that numpy.memmap uses RAM to buffer data
>     >                 from memmapped files.
>     >                 If I have a 100 GB array in a memmap file and process
>     >                 it block by block, the RAM usage keeps increasing as
>     >                 the process runs until there is no available RAM left
>     >                 (4 GB), even though the block size is only 1 MB.
>     >                 for example:
>     >                 ####
>     >                 import numpy as npy
>     >                 a = npy.memmap('a.bin', dtype='float64', mode='r')
>     >                 blocklen = int(1e5)
>     >                 b = npy.zeros(len(a) // blocklen)
>     >                 for i in range(len(a) // blocklen):
>     >                     b[i] = npy.mean(a[i*blocklen:(i+1)*blocklen])
>     >                 ####
>     >                 Is there any way to restrict the memory usage in
>     >                 numpy.memmap?
>     >
>     >
>     >
>     >             The whole point of using memmap is to let the OS do the
>     >             buffering for you (which is likely to do a better job
>     >             than you in many cases). Which OS are you using? And how
>     >             do you measure how much memory is taken by numpy for
>     >             your array?
>     >
>     >             David
>     >
>     >
>     >         Hi David,
>     >
>     >         I agree with you about the point of using memmap. That is
>     >         why the behavior is so strange to me.
>     >         I actually measured the resident set size (pink trace in
>     >         figure 2) of the python process on Windows. I attached the
>     >         result. You can see the RAM usage is definitely not file
>     >         system cache.
>     >
>     >
>     >     Umm, a good operating system will use *all* of RAM for
>     >     buffering, because RAM is fast and it assumes you are likely to
>     >     reuse data you have already used once. If it needs some memory
>     >     for something else, it just writes a page to disk (if dirty),
>     >     reads in the new data from disk, and changes the address of the
>     >     page. Where you get into trouble is if pages can't be evicted
>     >     for some reason. Most modern OS's also have special options for
>     >     reading streaming data from disk that can lead to significantly
>     >     faster access for that sort of thing, but I don't think you can
>     >     do that with memmapped files.
>     >
>     >     I'm not sure how Windows labels its memory. IIRC, memmapping a
>     >     file leads to what is called file-backed memory; it is
>     >     essentially virtual memory. Now, I won't bet my life that there
>     >     isn't a problem, but I think a misunderstanding of the memory
>     >     information is more likely.
>     >
>     >
>     > It is also possible that something else in your program is hanging
>     > onto memory, but without knowing a lot more it is hard to tell. Are
>     > you seeing symptoms besides the memory graphs? It looks like you
>     > aren't running on Windows, actually, so what OS are you running on?
>     >
>     > Chuck
>     >
>     >
>     >
>     Hi Chuck,
>
>     Thanks a lot for the quick response. I ran the following super
>     simple script on Windows:
>
>     ####
>     import numpy as npy
>     a = npy.memmap('a.bin', dtype='float64', mode='r')
>     blocklen = int(1e5)
>     b = npy.zeros(len(a) // blocklen)
>     for i in range(len(a) // blocklen):
>         b[i] = npy.mean(a[i*blocklen:(i+1)*blocklen])
>     ####
>     Everything became super slow after Python ate all the RAM.
>     By the way, I also tried Qt's QFile::map(); there is no problem at
>     all...
>
>
> Hmm. Nothing looks suspicious. For reference, can you be specific 
> about the OS/version, python version, and numpy version?
>
> What happens if you simply do
> for i in range(len(a) // blocklen):
>     a[i*blocklen:(i+1)*blocklen].copy()
>
> Chuck
>
Hi Chuck,
Here are the versions:
print sys.version
2.6.5 (r265:79096, Mar 19 2010, 18:02:59) [MSC v.1500 64 bit (AMD64)]
print numpy.__version__
1.4.1
print sys.getwindowsversion()
(5, 2, 3790, 2, 'Service Pack 2')

Besides, a[i*blocklen:(i+1)*blocklen].copy() gave the same result.
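
One workaround that should avoid the growth is to read the blocks
directly with numpy.fromfile instead of going through memmap, so each
block is an ordinary in-RAM array. A minimal sketch (assuming the file
is raw float64, as in my test):

####
import numpy as npy

blocklen = int(1e5)  # elements per block (float64 = 8 bytes each)
means = []
with open('a.bin', 'rb') as f:
    while True:
        # read one block as a plain array; no mapping is kept around
        block = npy.fromfile(f, dtype='float64', count=blocklen)
        if block.size == 0:
            break  # end of file
        means.append(block.mean())  # note: the last block may be partial
b = npy.array(means)
####

With this, the mean loop only ever holds one block in memory at a time.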

LittleBigBrain
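
P.S. Another thing worth trying (I have not verified that it actually
releases the pages on this Windows box) is to re-open the memmap for
each block, so the previous mapping is closed and the OS is free to
reclaim its cached pages:

####
import os
import numpy as npy

blocklen = int(1e5)
nblocks = os.path.getsize('a.bin') // 8 // blocklen  # float64 = 8 bytes
b = npy.zeros(nblocks)
for i in range(nblocks):
    # fresh mapping per block; deleting the last reference closes the
    # underlying mmap, which may let the OS drop the cached pages
    a = npy.memmap('a.bin', dtype='float64', mode='r')
    b[i] = npy.mean(a[i*blocklen:(i+1)*blocklen])
    del a
####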

[Attachment: numpyMemmapAvaRAM3.png, image/png, 19835 bytes —
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20101023/9e087728/attachment.png>]
