[New-bugs-announce] [issue33607] Explicitly track object ownership (and allocator).

Tue May 22 15:19:38 EDT 2018

New submission from Eric Snow <ericsnowcurrently at gmail.com>:

An object is created relative to the current thread (and
therefore the current interpreter) and the current allocator
(part of global state).  We do not track either of these details
for the object.  It may make sense to start doing so, for the
reasons below.

Regarding tracking the interpreter, that originating interpreter
can be thought of as the owner.  Any lifecycle operations should
happen relative to that interpreter.  Furthermore, the object
should be used in C-API calls only in that interpreter (i.e.
when the current thread's PyThreadState belongs to that
interpreter).  This hasn't been an issue, both because all
interpreters in the process currently share the GIL and because
subinterpreters haven't been heavily used historically.
However, the possibility of no longer sharing the GIL suggests
that tracking the owning interpreter (and perhaps even other
"sharing" interpreters) would be important.  Furthermore,
in the last few years subinterpreters have seen increasing usage
(see OpenStack Ceph), and knowing the originating interpreter
for an object can be useful there.  Regardless, even in the
single-interpreter case, knowing the owning interpreter is
important during runtime finalization, which is currently
slightly broken in ways that affect CPython embedders.

Regarding the allocator, there used to be just a single global
one that the runtime used from start to finish.  Now the C-API
offers a way to switch the allocator, so there's no guarantee
that PyMem_Free() releases memory with the same allocator that
allocated it.  This has
already had a negative impact on efforts to clean up CPython's
runtime initialization.  It also results in problems during
finalization.  Additionally, we are looking into moving the
allocator from the global runtime state to the per-interpreter
(or even per-thread or per-context) state.  In that world
it would be essential to know which allocator was used when
creating the object.  There are other possible applications
based on knowing an object's allocator, but I'll stop there.
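
To make the allocator problem concrete, here is a toy model in
Python (not CPython code).  ToyAllocator, Runtime, and their
methods are hypothetical names invented for illustration; the
sketch only assumes that an allocator rejects blocks it did not
hand out.

```python
class ToyAllocator:
    """Hypothetical allocator that remembers the blocks it handed out."""

    def __init__(self, name):
        self.name = name
        self._live = set()
        self._next = 0

    def malloc(self):
        self._next += 1
        block = (self.name, self._next)
        self._live.add(block)
        return block

    def free(self, block):
        if block not in self._live:
            raise RuntimeError(f"{self.name}: freeing a block it did not allocate")
        self._live.remove(block)


class Runtime:
    """Hypothetical runtime whose current allocator can be switched."""

    def __init__(self, allocator):
        self.current = allocator
        self._origin = {}  # block -> allocator that created it

    def set_allocator(self, allocator):
        self.current = allocator

    def malloc(self):
        block = self.current.malloc()
        self._origin[block] = self.current  # the proposed per-object tracking
        return block

    def free_untracked(self, block):
        # Uses whatever allocator is current, which may be the wrong one.
        self.current.free(block)

    def free_tracked(self, block):
        # Dispatches to the allocator that created the block.
        self._origin.pop(block).free(block)


default, debug = ToyAllocator("default"), ToyAllocator("debug")
rt = Runtime(default)
block = rt.malloc()
rt.set_allocator(debug)      # allocator switched after allocation

try:
    rt.free_untracked(block)
    mismatch = None
except RuntimeError as exc:
    mismatch = str(exc)      # the switched-in allocator rejects the block

rt.free_tracked(block)       # succeeds: routed back to "default"
```

In the untracked case the failure depends entirely on when the
allocator was switched; recording the originating allocator
removes that dependency.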

To sort all this out we would need to track per-object:

* originating allocator (pointer or id)
* owning interpreter (pointer or id)
* (possibly) "sharing" interpreters (linked list?)

Either we'd add two pointer-sized fields to PyObject or we would
keep a separate hash table (or two) mapping each object to the
info (similar to how we've considered doing for refcounts).  To
reduce the impact on the common case (not embedded, single
interpreter, same allocator), we could default to not tracking
interpreter/allocator and take a lookup failure to mean "main
interpreter, default allocator".
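
A rough sketch of the side-table option, again as a Python model
rather than C.  OwnershipTable, ObjectInfo, MAIN_INTERPRETER, and
DEFAULT_ALLOCATOR are all hypothetical names; keying on id(obj)
stands in for keying on the object's address, and a real table
would also have to drop entries when objects are deallocated.

```python
from dataclasses import dataclass, field

MAIN_INTERPRETER = 0           # hypothetical id of the main interpreter
DEFAULT_ALLOCATOR = "default"  # hypothetical id of the default allocator


@dataclass
class ObjectInfo:
    allocator: str = DEFAULT_ALLOCATOR
    owner: int = MAIN_INTERPRETER
    sharing: list = field(default_factory=list)  # other interpreters, if any


class OwnershipTable:
    def __init__(self):
        self._info = {}  # id(obj) -> ObjectInfo

    def track(self, obj, allocator, owner):
        # Common case: store nothing, so the table stays empty.
        if allocator == DEFAULT_ALLOCATOR and owner == MAIN_INTERPRETER:
            return
        self._info[id(obj)] = ObjectInfo(allocator, owner)

    def untrack(self, obj):
        # Must be called when the object dies, since ids can be reused.
        self._info.pop(id(obj), None)

    def lookup(self, obj):
        info = self._info.get(id(obj))
        # A lookup failure means "main interpreter, default allocator".
        return info if info is not None else ObjectInfo()


table = OwnershipTable()
common, special = object(), object()
table.track(common, DEFAULT_ALLOCATOR, MAIN_INTERPRETER)
table.track(special, "arena", 3)
```

Only the unusual object costs a table entry here; the common
object is resolved by the fallback in lookup().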

----------
messages: 317330
nosy: eric.snow, ncoghlan, vstinner
priority: normal
severity: normal
status: open
title: Explicitly track object ownership (and allocator).
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue33607>
_______________________________________