[Python-Dev] The untuned tunable parameter ARENA_SIZE

INADA Naoki songofacandy at gmail.com
Thu Jun 1 05:37:17 EDT 2017

On x86, a huge page is 2MB.
And some Linux systems enable the "Transparent Huge Pages" feature.

So a 2MB arena size may be better for TLB efficiency,
especially on servers with a lot of memory.
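
To make the TLB argument concrete, here is a rough sketch of the page
arithmetic. It assumes a 4KB base page (the usual x86 default, not stated
above), and the helper names are made up for illustration. Whether the kernel
really backs an arena with a huge page also depends on alignment and on the
THP settings, which this does not model.

```python
KIB = 1024
MIB = 1024 * KIB

BASE_PAGE = 4 * KIB   # assumption: the usual x86 base page size
HUGE_PAGE = 2 * MIB   # x86 huge page size


def pages_needed(arena_size, page_size):
    """Pages (and so TLB entries) needed to map one arena, rounded up."""
    return -(-arena_size // page_size)


def is_huge_page_multiple(arena_size):
    """True if the arena size is a whole number of huge pages."""
    return arena_size % HUGE_PAGE == 0


if __name__ == "__main__":
    for size in (256 * KIB, 1 * MIB, 2 * MIB, 4 * MIB):
        print(
            f"{size // KIB:5d} KiB arena: "
            f"{pages_needed(size, BASE_PAGE):4d} base pages, "
            f"huge-page multiple: {is_huge_page_multiple(size)}"
        )
```

With these numbers a 256KB arena spans 64 base pages, while a 2MB arena
spans 512 base pages or a single huge page.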

On Thu, Jun 1, 2017 at 4:38 PM, Larry Hastings <larry at hastings.org> wrote:
> When CPython's small block allocator was first merged in late February 2001,
> it allocated memory in gigantic chunks it called "arenas".  These arenas
> were a massive 256 KILOBYTES apiece.
> This tunable parameter has not been touched in the intervening 16 years.
> Yet CPython's memory consumption continues to grow.  By the time a current
> "trunk" build of CPython reaches the REPL prompt it's already allocated 16
> arenas.
> I propose we make the arena size larger.  By how much?  I asked Victor to
> run some benchmarks with arenas of 1mb, 2mb, and 4mb.  The results with 1mb
> and 2mb were mixed, but his benchmarks with a 4mb arena size showed
> measurable (>5%) speedups on ten benchmarks and no slowdowns.
> What would be the result of making the arena size 4mb?
> CPython could no longer run on a computer where at startup it could not
> allocate at least one 4mb contiguous block of memory.
> CPython programs would die slightly sooner in out-of-memory conditions.
> CPython programs would use more memory.  How much?  Hard to say.  It depends
> on their allocation strategy.  I suspect most of the time it would be < 3mb
> additional memory.  But for pathological allocation strategies the
> difference could be significant.  (e.g., lots of allocs, followed by lots of
> frees, but the occasional object lives forever, which means that the arena
> it's in can never be freed.  If 1 out of every 16 256k arenas is kept alive
> this way, and the objects are spaced out precisely such that now it's 1 for
> every 4mb arena, max memory use would be the same but later stable memory
> use would hypothetically be 16x current)
> Many programs would be slightly faster now and then, simply because we call
> malloc() 1/16 as often.
> What say you?  Vote for your favorite color of bikeshed now!
> /arry
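
The 16x figure in the pathological case, and the 1/16 malloc() count, can be
checked with a small model. This is only a sketch: it treats memory as a flat
range of byte offsets carved into equal arenas, the 64MB peak is an arbitrary
number picked for illustration, and the function names are hypothetical (this
is not how obmalloc tracks arenas).

```python
KIB = 1024
MIB = 1024 * KIB


def arena_allocations(peak_bytes, arena_size):
    """How many arenas (one malloc() each) are needed to reach the peak."""
    return -(-peak_bytes // arena_size)


def retained_bytes(arena_size, live_offsets):
    """Bytes that cannot be freed because an arena still holds a live object."""
    pinned_arenas = {offset // arena_size for offset in live_offsets}
    return len(pinned_arenas) * arena_size


PEAK = 64 * MIB  # illustrative peak usage, not a measured value
# One long-lived object every 4 MiB: 1 in every 16 of the 256 KiB arenas,
# but exactly 1 in every 4 MiB arena.
LIVE = range(0, PEAK, 4 * MIB)

small = retained_bytes(256 * KIB, LIVE)
large = retained_bytes(4 * MIB, LIVE)

if __name__ == "__main__":
    print("stable memory, 256 KiB arenas:", small // MIB, "MiB")
    print("stable memory, 4 MiB arenas:  ", large // MIB, "MiB")
    print("ratio:", large // small)
    print(
        "arena allocations:",
        arena_allocations(PEAK, 256 * KIB),
        "vs",
        arena_allocations(PEAK, 4 * MIB),
    )
```

In this model peak usage is the same either way, but after the frees the
256KB arenas keep 4MB alive while the 4MB arenas keep all 64MB.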
