[Python-Dev] The untuned tunable parameter ARENA_SIZE

Gregory P. Smith greg at krypto.org
Mon Jun 5 16:53:57 EDT 2017

On Fri, Jun 2, 2017 at 12:33 PM Larry Hastings <larry at hastings.org> wrote:

> On 06/02/2017 02:46 AM, Victor Stinner wrote:
>> I would be curious about another test: use pymalloc for objects larger
>> than 512 bytes. For example, allocate up to 4 KB?
>> In the past, we already changed the maximum size from 256 to 512 to
>> support most common Python objects on 64-bit platforms. Since Python
>> objects contain many pointers, switching from 32-bit to 64-bit can
>> double the size of an object in the worst case.
>
> You've already seen Tim Peters' post about why we must leave pool size set
> to 4k.  Obviously this in turn means that using obmalloc for larger objects
> will waste more and more memory.
>
> For example, let's say we use obmalloc for allocations of 2048 bytes.
> Pool size is 4096 bytes, and there's a 48-byte "pool_header" structure on
> the front (on 64-bit platforms, if I counted right).  So there are only
> 4048 bytes usable per pool.  After the first 2048-byte allocation, we're
> left with 2000 bytes at the end.  You can't use that memory for another
> allocation class; that's impossible given obmalloc's design.  So those 2000
> bytes are just wasted.
>
> Currently obmalloc's maximum allocation size is 512 bytes; after 7
> allocations, this leaves 464 wasted bytes at the end.  That isn't exactly
> *great*, but it's only about 11% of the pool.
>
> Anyway, I'm not super excited by the prospect of using obmalloc for larger
> objects.  There's an inverse relation between the size of allocation and
> the frequency of allocation.  In Python there are lots of tiny allocations,
> but fewer and fewer as the size increases.  (The graph has a similar shape
> to what retailers call the "long tail".)  By no small coincidence, obmalloc
> is great at small objects, which is where we needed the help most.  Let's
> leave it at that.
>
> A more fruitful endeavor might be to try one of these fancy new
> third-party allocators in CPython, e.g. tcmalloc, jemalloc.  Try each with
> both obmalloc turned on and turned off, and see what happens to performance
> and memory usage.  (I'd try it myself, but I'm already so far behind on
> watching funny cat videos.)
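Larry's pool arithmetic is easy to check mechanically. This is a small sketch, assuming his figures (a 4096-byte pool with a 48-byte pool_header on a 64-bit build); the macro and function names below are illustrative only and are not obmalloc's own.

```c
/* Assumed figures from the quoted post, not read from obmalloc.c:
 * 4 KiB pools with a 48-byte pool_header on a 64-bit build. */
#define SKETCH_POOL_SIZE   4096u
#define SKETCH_POOL_HEADER 48u

/* Bytes available for blocks after the header. */
unsigned usable_bytes(void)
{
    return SKETCH_POOL_SIZE - SKETCH_POOL_HEADER;
}

/* How many blocks of one size class fit in a single pool. */
unsigned blocks_per_pool(unsigned block_size)
{
    return usable_bytes() / block_size;
}

/* Leftover tail: too small for another block of this size class, and
 * unusable by any other size class, so it is simply wasted. */
unsigned wasted_tail(unsigned block_size)
{
    return usable_bytes() % block_size;
}

/* Waste as a fraction of the whole pool (header included). */
double wasted_fraction(unsigned block_size)
{
    return (double)wasted_tail(block_size) / SKETCH_POOL_SIZE;
}
```

With these assumptions, 512-byte blocks give 7 per pool with 464 bytes left over (about 11% of the pool), while 2048-byte blocks give 1 per pool with 2000 bytes left over (nearly half).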

FYI: in CPython, using a different malloc instead of CPython's own obmalloc
is effectively a simple matter of adding ~three lines of code somewhere,
thanks to the allocator API that was added for tracemalloc (PEP 445):

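    /* Must run before the interpreter allocates any objects. Python 3.5+
     * calls the struct PyMemAllocatorEx (3.4 used PyMemAllocator). */
    PyMemAllocatorEx pma;
    PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &pma);  /* raw domain: system malloc by default */
    PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &pma);  /* object domain now uses it too */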

That sets the object allocator (normally obmalloc) to the "system"
malloc, which I assume your alternate malloc (tcmalloc, jemalloc, etc.)
overrides.

As to where exactly it should go, I'd have to walk through the code... I see
that it was just refactored beyond what I used to recognize, as part of
cleaning up interpreter startup.  (yay!) :)

For CPython at large, I don't want us to be in the business of shipping a
malloc implementation (any more than we already do). But it does seem worth
making what we've got tunable without having to recompile.

I liked the environment variable arena size setting idea. But there are
likely more things that could be offered up for tuning by people deploying
applications who have tested things to determine what is best for their
specific needs.
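As a rough sketch of how an arena-size knob might be read and validated at startup: everything here is hypothetical. CPython has no PYTHONARENASIZE environment variable, and parse_arena_size and arena_size_from_env are illustrative names. The sketch assumes the 256 KiB compiled-in ARENA_SIZE obmalloc used at the time and the 4 KiB pool size discussed above. It falls back to the default on any bad input rather than failing at startup.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed values: obmalloc's compiled-in ARENA_SIZE at the time (256 KiB)
 * and the 4 KiB pool size. */
#define DEFAULT_ARENA_SIZE ((size_t)256 << 10)
#define POOL_SIZE_BYTES    ((size_t)4 << 10)
#define MAX_ARENA_SIZE     ((size_t)1 << 30)   /* arbitrary 1 GiB sanity cap */

/* Parse a decimal arena size. NULL, empty, malformed, or out-of-range input
 * yields the compiled-in default, so a bad setting never breaks startup. */
size_t parse_arena_size(const char *text)
{
    char *end;
    unsigned long long value;

    if (text == NULL || *text == '\0')
        return DEFAULT_ARENA_SIZE;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return DEFAULT_ARENA_SIZE;             /* not a plain decimal number */
    if (value < POOL_SIZE_BYTES || value % POOL_SIZE_BYTES != 0)
        return DEFAULT_ARENA_SIZE;             /* must hold a whole number of pools */
    if (value > MAX_ARENA_SIZE)
        return DEFAULT_ARENA_SIZE;             /* also catches negative input,
                                                  which strtoull wraps to a huge value */
    return (size_t)value;
}

/* Hypothetical hook: PYTHONARENASIZE is NOT a real CPython variable. */
size_t arena_size_from_env(void)
{
    return parse_arena_size(getenv("PYTHONARENASIZE"));
}
```

Silently falling back is only one design choice; printing a warning, or refusing to start, would be equally defensible for people who deliberately tuned the value.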

A note about OS/HW page sizes: while hugepages and transparent hugepages
*can* give a large performance increase thanks to fewer TLB misses, they
don't fit every application. They can be painful for anyone who depends on
a fork()ed-worker application model, since a copy-on-write of a 2 MB or 1 GB
page triggered by a single refcount update is... costly.
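To put a number on "costly": after fork(), the first write to a page still shared with the parent makes the kernel copy the whole page, however small the write. A minimal sketch of that arithmetic, assuming a refcount update writes one 8-byte Py_ssize_t on a 64-bit build; cow_amplification and cow_bytes_copied are illustrative names.

```c
#include <stdint.h>

/* Assumption: a refcount update writes one 8-byte Py_ssize_t (64-bit build). */
#define REFCOUNT_WRITE_BYTES 8u

/* Bytes copied per byte actually written when a single refcount update
 * lands on a page the child has not yet copied. */
uint64_t cow_amplification(uint64_t page_size)
{
    return page_size / REFCOUNT_WRITE_BYTES;
}

/* Total bytes copied once the child has dirtied `pages_dirtied` distinct
 * shared pages: each one is copied in full. */
uint64_t cow_bytes_copied(uint64_t pages_dirtied, uint64_t page_size)
{
    return pages_dirtied * page_size;
}
```

Under these assumptions, one refcount bump copies 512 times the bytes written with 4 KiB pages, 262,144 times with 2 MiB pages, and 134,217,728 times with 1 GiB pages.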
