On 2020-08-05 14:09, Nick Coghlan wrote:
Thanks for this write-up Petr - I think it would make a good informational PEP.
On Thu, 16 Jul 2020 at 20:39, Petr Viktorin <encukou@gmail.com> wrote:
Hello,
Several people asked me to put the long-term effort re. PEPs 489 and 573 in words, so I've written a document that I'd like to submit as an informational PEP and/or a HOWTO.
A rendered version is available at: https://hackmd.io/@encukou/module-state It's in Markdown: I've gotten so used to MD that translating to ReST once sounds easier than drafting in ReST. If you have a HackMD account and would like to co-author or fix typos directly, I can give you access.
Isolating Extension Modules
[snip]
### Non-goals: Speedups and the GIL
There is some effort to speed up CPython by making the GIL per-interpreter. While isolating interpreters helps that effort, defaulting to per-module state will be beneficial even if no speed-up is achieved.
Minor note: 'speedup up CPython on multi-core CPUs ...' would be a bit more explicit here.
OK. (It's been a while since I saw a single-core CPU capable of running CPython, though...)
[snip]
### Heap types
Traditionally, types defined in C code were static, that is,
static PyTypeObject
structures defined directly in code and initialized usingPyType_Ready()
.Such types are necessarily shared across the process. Sharing them between module objects requires paying attention to any state they own or access. To limit the possible issues, static types are immutable at the Python level: for example, you can't set
str.myattribute = 123
.[Note] Sharing truly immutable objects between interpreters is fine, as long as they don't provide access to mutable objects. But, every Python object has a mutable implementation detail: the reference count. Changes to the refcount are guarded by the GIL. Thus, code that shares any Python objects across interpreters implicitly depends on CPython's current, process-wide GIL.
An alternative to static types is *heap-allocated types*, or heap types for short. These correspond more closely to classes created by Python’s
class
statement.Heap types can be created by filling a
PyType_Spec
structure, a description or “blueprint” of a class, and callingPyType_FromModuleAndSpec()
to construct a new class object.This reminded me of an idea I had a while back that may help with migrating existing C static types to heap types: a
PyType_DEFINE_SPEC_FROM_STATIC_TYPE
macro that translated a static PyTypeObject declaration to a suitable PyType_Spec declaration at compile time. (It needs to be a macro to avoid potential problems with PyTypeObject changing size on newer interpreter versions)I think making the idea work in practice without dynamically allocating memory inside the macro would require modifying PyType_FromSpec (et al) to allow NULL in slot definitions though - that way the macro could define a static array that appropriately populated every slot available in either the defined LIMITED_API version, or else the Python version being used to compile the extension module, and the interpreter would just skip over the ones that had NULLs in them at runtime. It would also require the header file defining the macro to keep copies of all the iterations of
PyTypeObject
since the limited API was first defined, so it could appropriately cast any static type object pointers it was given.
To me, that sounds like ideas for an external library, not CPython itself. It would also conflict with the idea that NULL *undefines* the slot, like setting e.g. __hash__ to None.
Alternatively, it might be possible to write a script using pycparser that processes a C header or implementation file, and spits out appropriate PyType_Spec declarations for any PyTypeObject definitions it finds.
I think that's a better idea.
[note] Other functions, like
PyType_FromSpec()
, can also create heap types, butPyType_FromModuleAndSpec()
associates the module with the class, granting access to the module state to methods.The class should generally be stored in *both* the module state (for safe access from C) and the module's
__dict__
(for access from Python code).In these cases, it may make sense to offer an accessible-from-Python function that checks the module's internal state for consistency with the Python level state. Otherwise projects that try to inject transparent proxies around classes (e.g. some application instrumentation monitoring toolkits) will get weird behaviour, as C code will still see the original class, while Python code will see the wrapped one.
I'm not familiar with these proxies. How do they work? Can we test CPython somehow to make sure they don't break? If not, I claim that if you monkeypatch something and it breaks, you get to keep the pieces.
[snip]
### Per-Class scope
It is also not possible to attach state to *types*. While
PyHeapTypeObject
is a variable-size object (PyVarObject
), but its variable-size storage is currently consumed by slots. There will also be issues if several classes in an inheritance hierarchy need state.Perhaps mention that the simplest currently available option here is to do the same thing Python code does, and define a private attribute on the defining class, potentially prepending the class name to reduce the risk of name collisions?
I don't want to mention that because it'f unsafe: it'll break when that attribute is modified, which you can do trivially from Python.