[Python-Dev] Re: radix tree arena map for obmalloc

18 Jun 2019

      And one more random clue.

The memcrunch.py attached to the earlier-mentioned bug report does
benefit a lot from changing to a "use the arena with the smallest
address" heuristic, leaving 86.6% of allocated bytes in use by objects
at the end (this is with the arena-thrashing fix, and the current
256K/4K arena/pool sizes).  Which isn't surprising, else the person
who opened that report wouldn't have attached that specific script ;-)

However, when I increase its number of starting objects by a factor of
200, changing the heuristic barely helped at all.  It managed to
reclaim a few dozen more arenas by the end (out of 13,556 total arenas
allocated), but left only 56.1% of arena space in use by objects.

For the later program I posted, which by accident-rather-than-design
is even worse for arena cycling, changing the heuristic didn't help at
all.  Either way, no arenas were released (although one empty arena is
retained in the usable_arenas list), and less than 10% of arena space
was in use at the end of phase 10.

A few things to note, then:

- For truly effective RAM releasing, we would almost certainly need to
make major changes, to release RAM at an OS page level.   256K arenas
were already too fat a granularity.

- Whether a program benefits from what we do now appears to rely by
far most on its patterns of allocation/deallocation, rather than on
the arena-releasing heuristic OR on the arena size.

- While the by-lowest-address heuristic is obviously more effective in
one known artificial program so far ;-) , it's more expensive than
what we do now.  That's a full-blown sorting problem.  Ordering by
number of free pools is not:  that's essentially a bucket-distribution
problem, since the number of _possible_ values is small.  The data
structure changes I already checked in truly take constant time, in
all cases, to restore sorted order when an arena's number of free
pools changes.  When sorting by address, the best that can be done in
general is to take O(log(A)) time to insert a new arena (where A is
the number of arenas).  And a linked list is no good for that either
(unless, e.g., it's made fancier, like a skip list).

The quick experiment I tried used one-at-time search to put an arena
in the by-address-ordered list (the submitted patch also did), and we
already know from experience that can become a timing disaster when
hundreds of thousands of arenas are in use.  OTOH, that only needs to
be done once each time an arena is added to usable_arenas, not every
time an arena's number of free pools changes.

So, overall:

- I'm not inclined to pursue changing the arena-freeing heuristic.

- Changes to keep track of OS page-level use would likely require
starting over from scratch.  But would also make using far larger
pools practical (on 64-bit boxes).

- I still haven't found a program where changing from 256K to 1M
arenas makes a lick of appreciable difference in arena recycling
effectiveness:  for a given program chewing on a given problem size,
"does pretty well" or "does horribly" has been the outcome either way.

[Python-Dev] Re: radix tree arena map for obmalloc

Tim Peters