[Python-Dev] The untuned tunable parameter ARENA_SIZE

Larry Hastings larry at hastings.org
Fri Jun 2 16:05:21 EDT 2017


On 06/02/2017 02:38 AM, Antoine Pitrou wrote:
> I hope those are not the actual numbers you're intending to use ;-)
> I still think that allocating more than 1 or 2MB at once would be
> foolish.  Remember this is data that's going to be carved up into
> (tens of) thousands of small objects.  Large objects eschew the small
> object allocator (not to mention that third-party libraries like Numpy
> may be using different allocation routines when they allocate very
> large data).

Honest, I'm well aware of what obmalloc does and how it works.  I bet 
I've spent more time crawling around in it in the last year than anybody 
else on the planet.  Mainly because it works so well for CPython, nobody 
else needed to bother!

I'm also aware, for example, that if your process grows to consume 
gigabytes of memory, you're going to have tens of thousands of allocated 
arenas.  The idea that on systems with gigabytes of memory--90%+? of 
current systems running CPython--we should allocate memory forever in 
256 KB chunks is faintly ridiculous.  I agree that we should start small, 
and ramp up slowly, so Python continues to run well on small computers 
and doesn't allocate tons of memory for small programs.  But I also think 
we should *eventually* ramp up, for programs that use tens or hundreds 
of megabytes.

Also note that if we don't touch the allocated memory, smart modern OSes 
won't actually commit any resources to it.  All that happens when your 
process allocates 1 GB is that the OS changes some integers around.  It 
doesn't actually commit any memory to your process until you attempt to 
write to that memory, at which point it gets mapped in, one hardware page 
at a time (typically 4 KB or 8 KB; always power-of-2 sized).  So if we 
allocate 32 MB and only touch the first 1 MB, the other 31 MB doesn't 
consume any real resources.  I was planning on making the multi-arena 
code only touch memory when it actually needs to, similarly to the way 
obmalloc lazily consumes memory inside an allocated pool (see the 
nextoffset field in pool_header), to take advantage of this ubiquitous 
behavior.
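You can see this lazy-commit behavior from Python itself.  Here's a 
minimal sketch (not obmalloc code; the names and sizes are just 
illustrative) using an anonymous mmap: the 32 MB reservation is cheap, 
and physical pages are only faulted in as they're written:

```python
import mmap

# Illustrative sizes, mirroring the numbers discussed above.
MULTI_ARENA = 32 * 1024 * 1024  # reserve 32 MB of address space

# Anonymous mapping: the OS just "changes some integers around" here;
# no physical memory is committed yet.
buf = mmap.mmap(-1, MULTI_ARENA)

# Touch only the first 1 MB.  Each write faults in one hardware page
# (mmap.PAGESIZE bytes); the untouched 31 MB stays uncommitted.
for offset in range(0, 1024 * 1024, mmap.PAGESIZE):
    buf[offset] = 1

buf.close()
```

(Watching RSS in a process monitor while running this shows the resident 
size tracking only the touched pages, not the full reservation.)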


If I write this multi-arena code, which I might, I was thinking I'd try 
this approach:

  * leave arenas themselves at 256 KB
  * start with a 1 MB multi-arena size
  * every time I allocate a new multi-arena, multiply the size of the
    next multi-arena by 1.5 (rounding up to a multiple of 256 KB each time)
  * every time I free a multi-arena, divide the size of the next
    multi-arena by 2 (rounding up to a multiple of 256 KB each time)
  * if allocation of a multi-arena fails, use a binary search algorithm
    to allocate the largest multi-arena possible (rounding up to a
    multiple of 256 KB at each step)
  * cap the size of multi-arenas at, let's say, 32 MB

So multi-arenas would be 1 MB, 1.5 MB, 2.25 MB, 3.5 MB (round up!), etc.
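The schedule above can be sketched in a few lines of Python (this is 
just a model of the proposal, not CPython code; the function names are 
made up):

```python
ARENA_SIZE = 256 * 1024          # arenas stay at 256 KB
MULTI_ARENA_CAP = 32 * 1024 * 1024  # cap multi-arenas at 32 MB

def round_up_to_arena(n):
    """Round n up to the next multiple of ARENA_SIZE."""
    return -(-n // ARENA_SIZE) * ARENA_SIZE

def next_size_after_alloc(size):
    """Grow the next multi-arena by 1.5x, rounding up, capped at 32 MB."""
    return min(round_up_to_arena(size * 3 // 2), MULTI_ARENA_CAP)

def next_size_after_free(size):
    """Shrink the next multi-arena by 2x, never below one arena."""
    return max(round_up_to_arena(size // 2), ARENA_SIZE)

# Walk the first few steps of the growth schedule, starting at 1 MB.
size = 1024 * 1024
sizes = []
for _ in range(4):
    sizes.append(size)
    size = next_size_after_alloc(size)

print([s / (1024 * 1024) for s in sizes])  # → [1.0, 1.5, 2.25, 3.5]
```

Note how the rounding only kicks in at the fourth step: 2.25 MB * 1.5 is 
3.375 MB, which rounds up to 14 arenas, i.e. 3.5 MB.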


Fun fact: Python allocates 16 arenas at the start of the program, just 
to initialize obmalloc.  That consumes 4 MB of memory.  With the above 
multi-arena approach, that'd allocate the first three multi-arenas, 
pre-allocating 19 arenas and leaving 3 unused.  It's *mildly* tempting to 
make the first multi-arena be 4 MB, just so this is exactly right-sized, 
but... naah.
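(Checking that arithmetic: the first three multi-arenas hold 4 + 6 + 9 
arenas of 256 KB each.)

```python
# The first three multi-arenas from the schedule above: 1 MB, 1.5 MB,
# 2.25 MB.  Each is an exact multiple of the 256 KB arena size.
ARENA_SIZE = 256 * 1024
multi_arenas = [1024 * 1024, 1536 * 1024, 2304 * 1024]

arenas = sum(m // ARENA_SIZE for m in multi_arenas)
print(arenas)       # → 19 arenas pre-allocated
print(arenas - 16)  # → 3 left over after obmalloc's 16 startup arenas
```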


//arry/