On 2019-06-06, Tim Peters wrote:
Like now: if the size were passed in, obmalloc could test the size instead of doing the `address_in_range()` dance(*). But if it's ever possible that the size won't be passed in, all the machinery supporting `address_in_range()` still needs to be there, and every obmalloc spelling of malloc/realloc needs to ensure that machinery will work if the returned address is passed back to an obmalloc free/realloc spelling without the size.
We can almost make it work for GC objects, since the use of obmalloc is quite well encapsulated there. I think I intentionally designed the PyObject_GC_New/PyObject_GC_Del/etc APIs that way. A quick and dirty experiment is here:
https://github.com/nascheme/cpython/tree/gc_malloc_free_size
The major hitch seems to be my new gc_obj_size() function. We can't be sure the 'nbytes' passed to _PyObject_GC_Malloc() is the same as what is computed by gc_obj_size(). It usually works but there are exceptions (freelists for frame objects and tuple objects, for one).
A nasty problem is the weirdness with PyType_GenericAlloc() and the sentinel item. _PyObject_GC_NewVar() doesn't include space for the sentinel but PyType_GenericAlloc() does. When you get to gc_obj_size(), you don't know if you should use "nitems" or "nitems+1". I'm not sure how to fix the sentinel issue. Maybe a new type slot or a type flag?
In any case, making a change like my git branch above would almost certainly break extensions that don't play nicely. It won't be hard to make it a build option, like the original gcmodule was. Then, assuming there is a performance boost, people can enable it if their extensions are friendly.
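To make the sentinel mismatch concrete, here is a minimal standalone sketch. It is not CPython code: `toy_type`, `toy_var_size()`, `size_newvar()` and `size_generic_alloc()` are hypothetical names that only mimic the rounding done by `_PyObject_VAR_SIZE()` and the "nitems" vs "nitems+1" choice made by the two allocation paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two PyTypeObject fields that matter here. */
typedef struct {
    size_t basicsize;   /* plays the role of tp_basicsize */
    size_t itemsize;    /* plays the role of tp_itemsize */
} toy_type;

/* Round basicsize + nitems * itemsize up to pointer alignment,
 * in the spirit of _PyObject_VAR_SIZE(). */
static size_t toy_var_size(const toy_type *tp, size_t nitems)
{
    assert(tp != NULL);
    size_t align = sizeof(void *);
    size_t raw = tp->basicsize + nitems * tp->itemsize;
    return (raw + align - 1) & ~(align - 1);
}

/* Size requested by a _PyObject_GC_NewVar()-style path: no sentinel. */
static size_t size_newvar(const toy_type *tp, size_t nitems)
{
    return toy_var_size(tp, nitems);
}

/* Size requested by a PyType_GenericAlloc()-style path: one extra item. */
static size_t size_generic_alloc(const toy_type *tp, size_t nitems)
{
    return toy_var_size(tp, nitems + 1);
}
```

Given only the type and the item count at free time, a gc_obj_size()-like function cannot tell which of the two sizes was actually requested, which is why some extra per-type information (a slot or flag) seems to be needed.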
The "only" problem with address_in_range is that it limits us to a maximum pool size of 4K. Just for fun, I boosted that to 8K to see how likely segfaults really are, and a Python built that way couldn't even get to its first prompt before dying with an access violation (Windows-speak for segfault).
If we can make the above idea work, you could set the pool size to 8K without issue. A possible problem is that the obmalloc and gcmalloc arenas are separate. I suppose that affects performance testing.
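For readers who have not looked at address_in_range(), the following is a simplified toy model of the trick and of why it ties the pool size to the page size. All `toy_*` names and the constants are assumptions for illustration; the real code in obmalloc.c differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_POOL_SIZE   4096u                   /* assumed equal to the page size */
#define TOY_ARENA_SIZE  (64u * TOY_POOL_SIZE)   /* 256 KiB per arena */
#define TOY_MAX_ARENAS  4u

/* Every pool starts with a header naming the arena it belongs to. */
typedef struct { unsigned arenaindex; } toy_pool_header;

/* Base address of each arena, or 0 if the slot is unused. */
static uintptr_t toy_arenas[TOY_MAX_ARENAS];

/* Round an address down to the start of its (possible) pool. */
static toy_pool_header *toy_pool_addr(const void *p)
{
    return (toy_pool_header *)((uintptr_t)p & ~(uintptr_t)(TOY_POOL_SIZE - 1));
}

/* Read the header at the rounded-down address and check whether it names
 * an arena that really contains p.  If p did not come from us, this read
 * touches memory we do not own.  It cannot fault only as long as the
 * rounded-down address is in the same page as p, i.e. as long as the
 * pool size does not exceed the page size. */
static bool toy_address_in_range(const void *p)
{
    unsigned idx = *(volatile unsigned *)&toy_pool_addr(p)->arenaindex;
    return idx < TOY_MAX_ARENAS
        && toy_arenas[idx] != 0
        && (uintptr_t)p - toy_arenas[idx] < TOY_ARENA_SIZE;
}

/* Allocate a pool-aligned arena and stamp every pool header with idx. */
static void *toy_new_arena(unsigned idx)
{
    assert(idx < TOY_MAX_ARENAS);
    unsigned char *base = aligned_alloc(TOY_POOL_SIZE, TOY_ARENA_SIZE);
    if (base == NULL)
        return NULL;
    for (size_t off = 0; off < TOY_ARENA_SIZE; off += TOY_POOL_SIZE)
        ((toy_pool_header *)(base + off))->arenaindex = idx;
    toy_arenas[idx] = (uintptr_t)base;
    return base;
}
```

With an 8K pool on a 4K-page system, the rounded-down address can land in the page before the one holding p, and nothing guarantees that page is mapped.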
We could eliminate the pool size restriction in many ways. For example, we could store the addresses obtained from the system malloc/realloc - but not yet freed - in a set, perhaps implemented as a radix tree to cut the memory burden. But digging through 3 or 4 levels of a radix tree to determine membership is probably significantly slower than address_in_range.
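As a rough illustration of the radix tree idea, here is a minimal three-level sketch. It is an assumption-laden toy, not the code being benchmarked: the `rt_*` names are hypothetical, it tracks 4 KiB blocks, assumes addresses fit in 48 bits, and relies on calloc() producing null pointers (true on mainstream platforms).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define RT_BLOCK_BITS 12                  /* track 4 KiB blocks (assumption) */
#define RT_LEVEL_BITS 12                  /* 12 address bits per tree level */
#define RT_FANOUT     (1u << RT_LEVEL_BITS)
#define RT_MASK       ((uint64_t)RT_FANOUT - 1)

typedef struct { unsigned char present[RT_FANOUT]; } rt_leaf;
typedef struct { rt_leaf *child[RT_FANOUT]; } rt_mid;
typedef struct { rt_mid *child[RT_FANOUT]; } rt_root;

static rt_root rt_top;                    /* zero-initialized: empty set */

/* Mark or unmark the block containing p.  Returns 0 on success,
 * -1 if an interior node could not be allocated. */
static int rt_set(const void *p, bool present)
{
    uint64_t key = (uint64_t)(uintptr_t)p >> RT_BLOCK_BITS;
    assert((key >> (3 * RT_LEVEL_BITS)) == 0);   /* 48-bit address assumption */

    rt_mid **mid = &rt_top.child[(key >> (2 * RT_LEVEL_BITS)) & RT_MASK];
    if (*mid == NULL) {
        if (!present)
            return 0;                     /* nothing to clear */
        *mid = calloc(1, sizeof(rt_mid));
        if (*mid == NULL)
            return -1;
    }
    rt_leaf **leaf = &(*mid)->child[(key >> RT_LEVEL_BITS) & RT_MASK];
    if (*leaf == NULL) {
        if (!present)
            return 0;
        *leaf = calloc(1, sizeof(rt_leaf));
        if (*leaf == NULL)
            return -1;
    }
    (*leaf)->present[key & RT_MASK] = present;
    return 0;
}

/* Membership test: three dependent loads, never touching memory at p. */
static bool rt_contains(const void *p)
{
    uint64_t key = (uint64_t)(uintptr_t)p >> RT_BLOCK_BITS;
    const rt_mid *mid = rt_top.child[(key >> (2 * RT_LEVEL_BITS)) & RT_MASK];
    if (mid == NULL)
        return false;
    const rt_leaf *leaf = mid->child[(key >> RT_LEVEL_BITS) & RT_MASK];
    if (leaf == NULL)
        return false;
    return leaf->present[key & RT_MASK] != 0;
}
```

The lookup never dereferences the queried address itself, so it places no constraint on pool size; the cost is the chain of dependent loads, which is the slowdown in question.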
You are likely correct. I'm hoping to benchmark the radix tree idea. I'm not too far from having it working such that it can replace address_in_range(). Maybe allocating gc_refs as a block would offset the radix tree cost vs address_in_range(). If the above idea works, we know the object size at free() and realloc(), so we don't need address_in_range() for those code paths.
Regards,
Neil