[Import-SIG] Singleton modules (Re: PEP 489: Redesigning extension module loading)
Petr Viktorin
encukou at gmail.com
Fri Apr 24 12:19:25 CEST 2015
On 04/17/2015 12:33 PM, Petr Viktorin wrote:
> On 04/17/2015 08:51 AM, Stefan Behnel wrote:
>> Petr Viktorin wrote on 16.04.2015 at 13:05:
>>> A wart I added is "singleton modules", necessary for
>>> "PyState_FindModule"-like functionality. I wouldn't mind not including
>>> this, but it would mean the new API can't replace all use cases of the
>>> old PyInit_.
>>>
>>> Singleton Modules
>>> -----------------
>>>
>>> Modules defined by PyModuleDef may be registered with PyState_AddModule,
>>> and later retrieved with PyState_FindModule.
>>>
>>> Under the new API, there is no one-to-one mapping between PyModuleSpec
>>> and the module created from it.
>>> In particular, multiple modules may be loaded from the same description.
>>
>> Is that because a single shared library (which is what the module spec
>> refers to, right?) can contain multiple modules? Or are you referring to
>> something else here?
>
> By using Loader.create_module/Loader.exec_module directly, you can load
> an extension module without adding it to sys.modules. You can do this as
> many times as you like, and you always get a new, independent module
> object.
>
>
>>> This means that there is no "global" instance of a module object.
>>> Any C-level callbacks that need access to the module state need to be
>>> passed a reference to the module object, either directly or indirectly.
>>>
>>>
>>> However, there are some modules that really need to be only loaded once:
>>> typically ones that wrap a C library with global state.
>>> These modules should set the PyModule_EXPORT_SINGLETON flag
>>> in PyModuleDesc.flags. When this flag is set, loading an additional
>>> copy of the module after it has been loaded once will return the
>>> previously loaded object.
>>> This will be done on a low level, using _PyImport_FixupExtensionObject.
>>> Additionally, the module will be automatically registered using
>>> PyState_AddSingletonModule (see below) after execution slots are
>>> processed.
>>>
>>> Singleton modules can be retrieved, registered or unregistered with
>>> the interpreter state using three new functions, which parallel their
>>> PyModuleDef counterparts, PyState_FindModule, PyState_AddModule,
>>> and PyState_RemoveModule::
>>>
[...]
>>
>> Yes, this is totally a wart. However, I'm not sure I understand the
>> actual use case from what is written above. Could you clarify why this
>> whole special case is needed?
>
> Normally, you need to pass the module object to any C-level callbacks
> that need the module state in any way. Since in the new scheme of things
> there may be multiple modules, you can't just attach a module object to
> interpreter state and then look it up.
>
> However, consider wrapping a C library with global state. The library
> might not allow you to pass arbitrary data to your callbacks, so there's
> no proper way to get to the module object.
> So you want to load the module only once, returning the same object
> whenever it's created from the same slots, and you want a way to get to
> your module object from anywhere. That's what Python does currently,
> using PyState_FindModule to find the module.
>
> Well, that's the use case as I understand it.
>
> It would read a bit better if PyModuleDef were reused; in that variant
> it's a way to keep PyState_FindModule working. The more I look at this,
> though, the more I see using PyState_FindModule as something that should
> just be discontinued when converting a module to the new API.
>
> Perhaps it'll be better to remove the flag; there's always a possibility
> to add it in the future.

The technical reason for EXPORT_SINGLETON is to allow PyState_FindModule.
I've looked into how PyState_FindModule is used in the stdlib, and I think
that rather than adding support for it in PEP 489, its use cases should be
removed. I also don't think good solutions for those use cases are closely
tied to the loading mechanism.

In the meantime, modules that need PyState_FindModule can stick with
PyInit, and if it turns out that the flag/slot really is needed, it can be
added at any time.

That's the tl;dr version; I give the details of my reasoning below.

The upshot is: not all modules can be ported to PEP 489 (which would
become just the first iteration of the new module loading mechanism).
This includes "_csv" and "readline", which I picked to port as part of
the initial implementation, so I'll pick other ones.
(I did pick them precisely because they do complex things.)

The details:

I saw PyState_FindModule used in four scenarios.

The first is in modules that wrap a library with global state, or more
precisely, in C callbacks that don't receive any state argument. For
example, the readline library's rl_startup_hook is called with no
arguments, so the wrapping module needs to look into global state to
select the Python code to run.
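
A minimal Python model of this situation follows. It is not CPython code: ModuleState, register_module, find_module and c_startup_hook are hypothetical names, and the dictionary merely stands in for the table behind PyState_AddModule/PyState_FindModule. Because the hook, like rl_startup_hook, receives no arguments, a global lookup is the only way it can reach module state:

```python
# Minimal model, assuming a C library hook that takes no arguments
# (in the spirit of readline's rl_startup_hook). All names are hypothetical.

class ModuleState:
    """Per-module state that an extension module would keep in its own memory."""

    def __init__(self, name):
        self.name = name
        self.python_hook = None  # Python callable configured by the user


# Stand-in for the interpreter-wide table behind PyState_FindModule.
_registry = {}


def register_module(key, state):
    """Model of PyState_AddModule; in this model the latest call wins."""
    _registry[key] = state


def find_module(key):
    """Model of PyState_FindModule; returns None if nothing is registered."""
    return _registry.get(key)


def c_startup_hook():
    """Called by the library with no arguments, so a global lookup is the
    only way to reach module state."""
    state = find_module("readline")
    if state is None or state.python_hook is None:
        return 0
    return state.python_hook()


# Two independently loaded copies of the same module:
first = ModuleState("first")
second = ModuleState("second")
first.python_hook = lambda: "first hook"
second.python_hook = lambda: "second hook"

register_module("readline", first)
register_module("readline", second)

print(c_startup_hook())  # second hook
```

In this model, whichever copy registered last receives every callback and the other copy's hook is unreachable; that is the situation a singleton mechanism would have to rule out.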

This is where some kind of "singleton modules" would be useful. However,
only one readline module can work correctly in a given *process*. A
singleton mechanism would need to prevent loading such a module not only
multiple times in one interpreter, but also, somehow, across
subinterpreters.

Solving this requires designing how readline (and others) should behave
in the face of multiple interpreters. I don't think that is a job for
PEP 489, and since using the PEP 489 mechanism is supposed to mean that
the module does support subinterpreters, I now think providing singleton
module support in PEP 489 is, at best, premature.
If something like it does need to be added in the future, it will need
better semantics than my current proposal.
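
For comparison, nothing in the loader protocol itself prevents several module objects from being created from one spec. The sketch below calls importlib.util.module_from_spec (which calls the loader's create_module) and then loader.exec_module directly, so neither module object is ever added to sys.modules. It uses the pure-Python stdlib module colorsys only as a convenient, side-effect-free stand-in; the discussion here is about extension modules, which use a different loader.

```python
import importlib.util
import sys

# colorsys is only a convenient pure-Python stand-in for "some module".
spec = importlib.util.find_spec("colorsys")


def load_fresh(spec):
    """Create and execute a new module object without registering it in sys.modules."""
    module = importlib.util.module_from_spec(spec)  # calls loader.create_module()
    spec.loader.exec_module(module)                 # runs the module's code
    return module


a = load_fresh(spec)
b = load_fresh(spec)

print(a is b)                          # False: two independent module objects
print(a.rgb_to_hsv is b.rgb_to_hsv)    # False: separate function objects
print(any(m is a or m is b for m in sys.modules.values()))  # False
```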

The second use of PyState_FindModule is in module-level functions, which
(in Python 3) get the module object as an argument. This is just a
holdover from Python 2 and can be fixed rather mechanically.
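
The mechanical fix can be modeled in Python. In C, a module-level function receives the module object as its self argument; in the sketch below that argument is explicit, and Module, count_before, count_after and the registry dictionary are hypothetical stand-ins rather than any real API:

```python
# Hypothetical model: `module` plays the role of the `self` argument that
# CPython passes to every module-level C function.

class Module:
    def __init__(self):
        self.state = {"calls": 0}  # per-module state (m_size memory in C)


_registry = {}  # stand-in for the PyState_FindModule table


def count_before(module):
    """Python 2 style: ignores its argument and reaches for global state."""
    state = _registry["example"].state
    state["calls"] += 1
    return state["calls"]


def count_after(module):
    """Mechanically fixed: uses the module object it was called with."""
    state = module.state
    state["calls"] += 1
    return state["calls"]


m1, m2 = Module(), Module()
_registry["example"] = m1

count_before(m2)  # wrongly updates m1's state
count_after(m2)   # updates m2's state
print(m1.state["calls"], m2.state["calls"])  # 1 1
```

The "before" version works only as long as exactly one module object exists; the "after" version needs no global lookup at all.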

The third use is as a crutch: the module reference is not passed to
everything that needs it, so the code that needs it reaches out to
global state.

A problematic case is a method that needs to raise a module-specific
exception: _pickle.Unpickler.load wants to raise _pickle.Error.
Unfortunately, while a module's functions have a reference to the module
(m_self), the classes don't. (And it's rather difficult to store
arbitrary state on a class object; there's no m_size in
PyType_FromSpec.) So methods pretty much need to peek into global state.

Here again, I think just allowing PyState_FindModule is not the proper
solution: it unnecessarily restricts the module to being a singleton.
Also, if we ever get unloadable modules, it would become possible for a
class to outlive its module, at which point PyState_FindModule would
start failing. (And PyState_FindModule failures are usually fatal; the
error-handling story around it isn't great.)

I think the right solution would be to give classes a reference to their
module, as methods have now. I don't think this is in scope for PEP 489
either (though it may be in scope for the future class-initialization slot).
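
A minimal Python model of that idea, under the assumption that each execution of a module creates its own exception and class objects (as heap types created per module with PyType_FromSpec would be). The method receives only self, as a C method does, and finds the module-specific exception through a hypothetical defining_module attribute on its class; make_module and defining_module are illustrative names, not CPython API:

```python
import types


def _unpickler_load(self):
    # Like a C method, this receives only `self`; it reaches module state
    # through the class's (hypothetical) reference to its defining module.
    module = type(self).defining_module
    raise module.Error("could not unpickle")


def make_module(name):
    """Model of executing one module object: every call creates fresh
    exception and class objects, as per-module heap types would be."""
    module = types.ModuleType(name)
    module.Error = type("Error", (Exception,), {"__module__": name})
    module.Unpickler = type("Unpickler", (), {
        "__module__": name,
        "load": _unpickler_load,
        "defining_module": module,  # the class -> module reference
    })
    return module


first = make_module("_pickle")
second = make_module("_pickle")

try:
    first.Unpickler().load()
except second.Error:
    print("caught second.Error")  # not reached: the error belongs to `first`
except first.Error:
    print("caught first.Error")   # this branch runs
```

Since the link goes from each class to its own module, two copies of the module never see each other's exceptions, and no interpreter-wide lookup is needed.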

The fourth use is sharing internal state with other modules. The _io
module is a bit special since it's always available; non-stdlib modules
should really use capsules for that.
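
As a concrete stdlib instance of the capsule pattern: the C-accelerated datetime module publishes its C API as a capsule attribute, datetime.datetime_CAPI, which other extensions fetch with PyCapsule_Import("datetime.datetime_CAPI", 0) (that is what the PyDateTime_IMPORT macro expands to). The sketch below only inspects that capsule from Python through ctypes.pythonapi, so it assumes CPython with the _datetime accelerator available:

```python
import ctypes
import datetime

capsule = datetime.datetime_CAPI  # exported by the C module _datetime

# Declare the signatures of the two C API functions used below.
api = ctypes.pythonapi
api.PyCapsule_GetName.argtypes = [ctypes.py_object]
api.PyCapsule_GetName.restype = ctypes.c_char_p
api.PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyCapsule_IsValid.restype = ctypes.c_int

name = api.PyCapsule_GetName(capsule)
print(type(capsule).__name__)                      # PyCapsule
print(name)                                        # b'datetime.datetime_CAPI'
print(bool(api.PyCapsule_IsValid(capsule, name)))  # True
```

A non-stdlib module sharing internal state with its companion modules would follow the same pattern under a capsule name of its own.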