[Python-Dev] The untuned tunable parameter ARENA_SIZE

Sat Jun 3 22:46:09 EDT 2017

For fun, let's multiply everything by 256:

- A "pool" becomes 1 MB.
- An "arena" becomes 64 MB.

As briefly suggested before, for any given size class a pool
could pass out hundreds of times more objects before needing to fall
back on the slower code creating new pools or new arenas.
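
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python.  It assumes the obmalloc sizes of the time (4KB pools, 256KB
arenas), and the helper name blocks_per_pool is made up for
illustration.  It also ignores the small per-pool header, so the
counts are upper bounds, not exact figures.

```python
KB = 1024
MB = 1024 * KB

# obmalloc sizes at the time of writing.
POOL_SIZE = 4 * KB
ARENA_SIZE = 256 * KB

# "Multiply everything by 256."
SCALE = 256
BIG_POOL_SIZE = POOL_SIZE * SCALE      # 1 MB
BIG_ARENA_SIZE = ARENA_SIZE * SCALE    # 64 MB


def blocks_per_pool(pool_size, block_size):
    """Upper bound on the blocks of one size class a pool can hand out.

    Assumption: the per-pool header is ignored for simplicity.
    """
    return pool_size // block_size


if __name__ == "__main__":
    for block_size in (16, 64, 512):
        print(block_size,
              blocks_per_pool(POOL_SIZE, block_size),
              blocks_per_pool(BIG_POOL_SIZE, block_size))
```

Under that simplification, every size class gets exactly 256 times as
many blocks per pool.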

As an added bonus, programs would finish much sooner due to the flurry
of segfaults from Py_ADDRESS_IN_RANGE ;-)

But greatly increasing the pool size also makes a different
implementation of Py_ADDRESS_IN_RANGE much more attractive:  an obvious one.  That
is, obmalloc could model its address space with a bit vector, one bit
per pool-aligned address.  For a given address, shift it right by 20
bits (divide by 1MB) and use what remains as the bit vector index.  If
the bit is set, obmalloc manages that MB, else (or if the bit index
is out of the vector's domain) it doesn't.  The system page size would
become irrelevant to its operation, and it would play nice with
magical memory debuggers (it would never access memory obmalloc hadn't
first allocated and initialized itself).
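
A minimal sketch of that bit vector, modeled in Python with plain
integers standing in for addresses.  This is not obmalloc code:  the
class PoolMap, its method names, and the choice of a 1 TB modeled span
are all assumptions made for illustration.

```python
POOL_BITS = 20                                  # 1 MB pools: 2**20 bytes
SPAN_BITS = 40                                  # assumption: model a 1 TB span
NUM_POOLS = 1 << (SPAN_BITS - POOL_BITS)        # 1M pool-aligned addresses


class PoolMap:
    """One bit per pool-aligned address; a set bit means 'managed by us'."""

    def __init__(self):
        self.bits = bytearray(NUM_POOLS // 8)   # 128 KB, all bits clear

    def _index(self, pool_base):
        if pool_base % (1 << POOL_BITS):
            raise ValueError("pool base is not pool-aligned")
        index = pool_base >> POOL_BITS
        if index >= NUM_POOLS:
            raise ValueError("pool base is outside the modeled span")
        return index

    def mark(self, pool_base):
        """Record that the pool starting at pool_base is managed."""
        index = self._index(pool_base)
        self.bits[index >> 3] |= 1 << (index & 7)

    def unmark(self, pool_base):
        """Record that the pool starting at pool_base was released."""
        index = self._index(pool_base)
        self.bits[index >> 3] &= ~(1 << (index & 7)) & 0xFF

    def manages(self, address):
        """The address-in-range test: shift, bounds check, look at one bit."""
        index = address >> POOL_BITS
        if index >= NUM_POOLS:
            return False                        # out of the vector's domain
        return bool(self.bits[index >> 3] & (1 << (index & 7)))
```

Note that manages() only ever reads the vector itself, never the
memory at the address being asked about.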

A virtual address space span of a terabyte could hold 1M pools, so
would "only" need a 1M/8 = 128KB bit vector.  That's minor compared to
a terabyte (one bit per megabyte).

Of course using a bit per 4KB (the current pool size) is less
attractive - by a factor of 256.  Which is why that wasn't even tried.
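
The arithmetic, spelled out as a small Python sketch (the function
name bit_vector_bytes is made up here, and the 1 TB span is just the
example figure from above):

```python
KB, MB, TB = 1 << 10, 1 << 20, 1 << 40


def bit_vector_bytes(span_bytes, pool_bytes):
    """Bytes needed for one bit per pool-aligned address in the span."""
    return span_bytes // pool_bytes // 8


big_pools = bit_vector_bytes(TB, MB)        # 1 MB pools
small_pools = bit_vector_bytes(TB, 4 * KB)  # current 4 KB pools

if __name__ == "__main__":
    print(big_pools // KB, "KB with 1 MB pools")
    print(small_pools // MB, "MB with 4 KB pools")
```

So 128KB for 1MB pools, versus 32MB for 4KB pools covering the same
span.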

Note that trying to play the same trick with arenas instead would be
at best more complicated.  The system calls can't be relied on to
return arena-aligned _or_ pool-aligned addresses.  obmalloc itself
forces pool-alignment of pool base addresses, by (if necessary)
ignoring some number of the leading bytes in an arena.  That makes
useful arithmetic on pool addresses uniform and simple.
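
A rough Python model of that alignment step, again using integers for
addresses and the sizes of the time (4KB pools, 256KB arenas).  The
function names are invented for this sketch and the real C code
differs in detail.

```python
POOL_SIZE = 4 * 1024
ARENA_SIZE = 256 * 1024
POOL_SIZE_MASK = POOL_SIZE - 1


def first_pool_address(arena_base):
    """Round an arena's base address up to the next pool boundary."""
    excess = arena_base & POOL_SIZE_MASK
    if excess:
        # Ignore the leading bytes up to the next pool-aligned address.
        return arena_base + (POOL_SIZE - excess)
    return arena_base


def usable_pools(arena_base):
    """Whole pools that fit in the arena after skipping leading bytes."""
    skipped = first_pool_address(arena_base) - arena_base
    return (ARENA_SIZE - skipped) // POOL_SIZE


def pool_base(address):
    """With pool-aligned pools, finding an address's pool is one mask."""
    return address & ~POOL_SIZE_MASK
```

In this model an arena that happens to start on a pool boundary yields
64 pools, and any other arena yields 63.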