[Numpy-discussion] Catching out-of-memory error before it happens

Nathaniel Smith njs at pobox.com
Fri Jan 24 18:09:01 EST 2014

On Fri, Jan 24, 2014 at 10:29 PM, Chris Barker <chris.barker at noaa.gov> wrote:
> On Fri, Jan 24, 2014 at 8:25 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> If your arrays are big enough that you're worried that making a stray copy
>> will ENOMEM, then you *shouldn't* have to worry about fragmentation - malloc
>> will give each array its own virtual mapping, which can be backed by
>> discontinuous physical memory. (I guess it's possible windows has a somehow
>> shoddy VM system and this isn't true, but that seems unlikely these days?)
> All I know is that when I push the limits with memory on a 32 bit Windows
> system, it often crashes out even though I've never seen more than about
> 1GB of memory use by the application -- I would have thought that would be
> plenty of overhead.
> I also know that I've reached limits on Windows32 well before OS-X 32, but
> that may be because, IIUC, Windows32 only allows 2GB per process, whereas
> OS-X 32 allows 4GB per process.
>> Memory fragmentation is more a problem if you're allocating lots of small
>> objects of varying sizes.
> It could be that's what I've been doing....
>> On 32 bit, virtual address fragmentation could also be a problem, but if
>> you're working with giant data sets then you need 64 bits anyway :-).
> well, "giant" is defined relative to the system capabilities... but yes, if
> you're  pushing the limits of a 32 bit system , the easiest thing to do is
> go to 64bits and some more memory!

Oh, yeah, common confusion. Allowing 2 GiB of address space per
process doesn't mean you can actually practically use 2 GiB of
*memory* per process, esp. if you're allocating/deallocating a mix of
large and small objects, because address space fragmentation will kill
you way before that. The memory is there, but there's nowhere to slot
it into the process's address space. So you don't need to add more
memory, just switch to a 64-bit OS.
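For what it's worth, the failure mode above at least surfaces in Python as a
catchable MemoryError, so you can degrade gracefully instead of crashing. A
minimal sketch (the `try_alloc` helper here is hypothetical, not anything
from NumPy itself):

```python
import numpy as np

def try_alloc(shape, dtype=np.float64):
    """Attempt an allocation; return None instead of dying when
    malloc can't find a large-enough hole in the address space.
    (Hypothetical helper for illustration.)"""
    try:
        return np.empty(shape, dtype=dtype)
    except MemoryError:
        return None

a = try_alloc((1000, 1000))   # ~8 MB; fine on any modern machine
if a is None:
    print("allocation failed -- fall back to smaller chunks")
```

This only catches the error *when* it happens, of course; on a fragmented
32-bit address space there is no reliable way to predict in advance whether
a given contiguous allocation will succeed.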

On 64-bit you have oodles of address space, so the memory manager can
easily slot in large objects far away from small objects, and it's
only fragmentation within each small-object arena that hurts. A good
malloc will keep this overhead down pretty low though -- certainly
less than the factor of two you're thinking about.
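To make the large-object case concrete: glibc's malloc, for instance, serves
allocations above its mmap threshold (128 KiB by default) with their own
anonymous mapping rather than carving them out of the heap, which is why
they can land anywhere in a roomy 64-bit address space. Python exposes the
same primitive directly via the mmap module, so a rough sketch of what
happens under the hood looks like:

```python
import mmap

# A large allocation typically gets its own anonymous mapping -- the OS can
# back it with whatever discontiguous physical pages are free.
buf = mmap.mmap(-1, 10 * 1024 * 1024)   # 10 MiB anonymous mapping

buf[:5] = b"hello"                      # usable like ordinary memory
assert bytes(buf[:5]) == b"hello"

# Unlike a small block inside a heap arena, closing the mapping returns
# the address range (and pages) to the OS immediately.
buf.close()
```

Small objects, by contrast, share arenas, and a freed slot can only be
reused by a similarly-sized allocation -- which is where the (modest)
fragmentation overhead on 64-bit comes from.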

