[Python-Dev] Re: obmalloc (was Have a big machine and spare time? Here's a possible Python bug.)

7 Jun 2019

On 2019-06-06, Tim Peters wrote:
...
Like now:  if the size were passed in, obmalloc could test the size
instead of doing the `address_in_range()` dance(*).  But if it's ever
possible that the size won't be passed in, all the machinery
supporting `address_in_range()` still needs to be there, and every
obmalloc spelling of malloc/realloc needs to ensure that machinery
will work if the returned address is passed back to an obmalloc
free/realloc spelling without the size.

We can almost make it work for GC objects, since the use of obmalloc is
quite well encapsulated.  I think I intentionally designed the
PyObject_GC_New/PyObject_GC_Del/etc APIs that way.
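
To make the size-test idea concrete, here is a minimal sketch of a free
function that is told the original request size.  It is not obmalloc code:
every sketch_* name is hypothetical, the 512-byte threshold is an assumption
modelled on obmalloc's small-request limit, and both paths fall back to the
system allocator so the example stays self-contained.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: requests of at most this many bytes are "small" and would
   belong to the pool allocator (512 mirrors obmalloc's small-request limit). */
#define SKETCH_SMALL_THRESHOLD 512

/* Hypothetical counters, present only so the dispatch can be observed. */
static size_t sketch_small_frees = 0;
static size_t sketch_system_frees = 0;

/* Toy allocator: both paths use malloc() to stay self-contained.  A real
   small-object path would carve the block out of a pool instead. */
static void *sketch_malloc(size_t nbytes)
{
    return malloc(nbytes ? nbytes : 1);
}

/* Sized free: the caller promises nbytes is the size originally requested,
   so the small/large decision needs no address_in_range()-style lookup. */
static void sketch_free_sized(void *p, size_t nbytes)
{
    if (p == NULL)
        return;
    if (nbytes <= SKETCH_SMALL_THRESHOLD)
        sketch_small_frees++;   /* would return the block to its pool */
    else
        sketch_system_frees++;  /* would hand the block to the system free() */
    free(p);
}
```

The scheme only holds if every caller passes the same size at free time that
it passed at allocation time, which is exactly the guarantee in question.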

Quick and dirty experiment is here:

    https://github.com/nascheme/cpython/tree/gc_malloc_free_size

The major hitch seems to be my new gc_obj_size() function.  We can't be
sure the 'nbytes' passed to _PyObject_GC_Malloc() is the same as
what is computed by gc_obj_size().  It usually works but there are
exceptions (freelists for frame objects and tuple objects, for example).

A nasty problem is the weirdness with PyType_GenericAlloc() and the
sentinel item.  _PyObject_GC_NewVar() doesn't include space for the
sentinel but PyType_GenericAlloc() does.  When you get to
gc_obj_size(), you don't know whether you should use "nitems" or "nitems+1".
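
The mismatch can be illustrated with a small calculation.  This is a sketch
under assumptions, not the code from the branch: the sketch_* helpers are
hypothetical and only mimic the usual "basicsize + nitems * itemsize, rounded
up to pointer alignment" shape of a variable-size object.

```c
#include <stddef.h>

/* Round n up to a multiple of align (align must be a power of two). */
static size_t sketch_round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Footprint of a variable-size object: header plus nitems items,
   rounded up to pointer alignment. */
static size_t sketch_var_size(size_t basicsize, size_t itemsize, size_t nitems)
{
    return sketch_round_up(basicsize + nitems * itemsize, sizeof(void *));
}

/* Path 1: no room for a sentinel, as described for _PyObject_GC_NewVar(). */
static size_t sketch_size_newvar(size_t basicsize, size_t itemsize, size_t nitems)
{
    return sketch_var_size(basicsize, itemsize, nitems);
}

/* Path 2: one extra item for the sentinel, as described for
   PyType_GenericAlloc(). */
static size_t sketch_size_generic_alloc(size_t basicsize, size_t itemsize,
                                        size_t nitems)
{
    return sketch_var_size(basicsize, itemsize, nitems + 1);
}
```

With a 24-byte header and 8-byte items, three items come to 48 bytes on the
first path and 56 on the second, so a size recomputed at free time can
disagree with the size that was actually allocated.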

I'm not sure how to fix the sentinel issue.  Maybe a new type slot
or a type flag?  In any case, making a change like my git branch
above would almost certainly break extensions that don't play
nicely.  It won't be hard to make it a build option, like the
original gcmodule was.  Then, assuming there is a performance boost,
people can enable it if their extensions are friendly.
...
The "only" problem with address_in_range is that it limits us to a
maximum pool size of 4K.  Just for fun, I boosted that to 8K to see
how likely segfaults really are, and a Python built that way couldn't
even get to its first prompt before dying with an access violation
(Windows-speak for segfault).

If we can make the above idea work, you could set the pool size to
8K without issue.  A possible problem is that the obmalloc and
gcmalloc arenas are separate.  I suppose that affects
performance testing.
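
For readers wondering where the 4K limit comes from, the following is a
simplified model of the address_in_range() trick, not the real obmalloc
source.  The check rounds the pointer down to a pool boundary and reads an
arena index from whatever memory is there, even if obmalloc never allocated
it.  That read is only safe while a pool never extends past the OS page that
contains the pointer.  The sketch_* names, SKETCH_MAX_ARENAS, and the use of
C11 aligned_alloc() are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_POOL_SIZE  4096u                      /* assumed == OS page size */
#define SKETCH_ARENA_SIZE (64u * SKETCH_POOL_SIZE)   /* assumed 256 KiB arenas */
#define SKETCH_MAX_ARENAS 4u                         /* hypothetical small table */

struct sketch_pool_header { unsigned int arenaindex; };
struct sketch_arena { uintptr_t address; };          /* 0 means "unused slot" */

static struct sketch_arena sketch_arenas[SKETCH_MAX_ARENAS];

/* Round p down to the start of the pool-sized block that contains it. */
#define SKETCH_POOL_ADDR(p) \
    ((struct sketch_pool_header *) \
     ((uintptr_t)(p) & ~(uintptr_t)(SKETCH_POOL_SIZE - 1)))

static int sketch_address_in_range(const void *p)
{
    /* This read may return garbage when p is not ours; that is tolerated
       because the checks below reject any index that does not match. */
    unsigned int idx = SKETCH_POOL_ADDR(p)->arenaindex;
    return idx < SKETCH_MAX_ARENAS
        && (uintptr_t)p - sketch_arenas[idx].address < SKETCH_ARENA_SIZE
        && sketch_arenas[idx].address != 0;
}

/* Allocate a pool-aligned arena and stamp every pool header with idx. */
static void *sketch_new_arena(unsigned int idx)
{
    char *base = aligned_alloc(SKETCH_POOL_SIZE, SKETCH_ARENA_SIZE);
    if (base == NULL)
        return NULL;
    sketch_arenas[idx].address = (uintptr_t)base;
    for (size_t off = 0; off < SKETCH_ARENA_SIZE; off += SKETCH_POOL_SIZE)
        ((struct sketch_pool_header *)(base + off))->arenaindex = idx;
    return base;
}
```

With an 8K pool on a 4K-page system, rounding down can land on a different
page from the pointer itself, and nothing guarantees that page is mapped.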
...
We could eliminate the pool size restriction in many ways.  For
example, we could store the addresses obtained from the system
malloc/realloc - but not yet freed - in a set, perhaps implemented as
a radix tree to cut the memory burden.  But digging through 3 or 4
levels of a radix tree to determine membership is probably
significantly slower than address_in_range.

You are likely correct.  I'm hoping to benchmark the radix tree idea.
I'm not too far from having it working such that it can replace
address_in_range().  Maybe allocating gc_refs as a block would
offset the radix tree cost vs address_in_range().  If the above idea
works, we know the object size at free() and realloc(), so we don't
need address_in_range() for those code paths.
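
As a rough picture of the radix tree idea, here is a three-level membership
structure keyed on the high bits of an address.  It is a sketch under several
assumptions and not the work-in-progress implementation mentioned above:
arenas are taken to be 256 KiB and aligned to their size, only 48 address
bits are considered, and all sketch_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ARENA_BITS 18u   /* assumed 256 KiB, size-aligned arenas */
#define SKETCH_LEVEL_BITS 10u   /* 3 levels x 10 bits + 18 = 48 address bits */
#define SKETCH_FANOUT     (1u << SKETCH_LEVEL_BITS)
#define SKETCH_MASK       (SKETCH_FANOUT - 1u)

typedef struct { unsigned char live[SKETCH_FANOUT]; } sketch_leaf;
typedef struct { sketch_leaf *kids[SKETCH_FANOUT]; } sketch_mid;

static sketch_mid *sketch_root[SKETCH_FANOUT];

/* Drop the within-arena bits, then cut the rest into three 10-bit indexes. */
static void sketch_split(const void *p, unsigned *i1, unsigned *i2, unsigned *i3)
{
    uint64_t key = (uint64_t)(uintptr_t)p >> SKETCH_ARENA_BITS;
    *i3 = (unsigned)(key & SKETCH_MASK);
    *i2 = (unsigned)((key >> SKETCH_LEVEL_BITS) & SKETCH_MASK);
    *i1 = (unsigned)((key >> (2u * SKETCH_LEVEL_BITS)) & SKETCH_MASK);
}

/* Record (live != 0) or forget (live == 0) the arena containing p.
   Returns 0 on success, -1 if an interior node could not be allocated. */
static int sketch_radix_set(const void *p, int live)
{
    unsigned i1, i2, i3;
    sketch_split(p, &i1, &i2, &i3);
    if (sketch_root[i1] == NULL) {
        if (!live)
            return 0;                       /* nothing recorded, nothing to clear */
        sketch_root[i1] = calloc(1, sizeof(sketch_mid));
        if (sketch_root[i1] == NULL)
            return -1;
    }
    sketch_mid *mid = sketch_root[i1];
    if (mid->kids[i2] == NULL) {
        if (!live)
            return 0;
        mid->kids[i2] = calloc(1, sizeof(sketch_leaf));
        if (mid->kids[i2] == NULL)
            return -1;
    }
    mid->kids[i2]->live[i3] = (unsigned char)(live != 0);
    return 0;
}

/* Membership test: three dependent loads, never touching memory at p. */
static int sketch_radix_contains(const void *p)
{
    unsigned i1, i2, i3;
    sketch_split(p, &i1, &i2, &i3);
    sketch_mid *mid = sketch_root[i1];
    if (mid == NULL)
        return 0;
    sketch_leaf *leaf = mid->kids[i2];
    return leaf != NULL && leaf->live[i3];
}
```

Unlike address_in_range(), the lookup reads only memory the tree itself owns,
so it places no constraint on pool size.  The price is the chain of dependent
loads, which is the cost a benchmark would have to measure.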

Regards,

  Neil

Neil Schemenauer