[Python-Dev] Pre-PEP: Redesigning extension modules

Sat Aug 31 18:49:56 CEST 2013

Oops, I had a draft from a few days ago that I was interrupted before
sending. I've finished editing the parts I believe are still relevant.

On 25 Aug 2013 21:56, "Stefan Behnel" <stefan_ml at behnel.de> wrote:
>
> Nick Coghlan, 24.08.2013 23:43:
> > On 25 Aug 2013 01:44, "Stefan Behnel" wrote:
> >> Nick Coghlan, 24.08.2013 16:22:
> >>> The new _PyImport_CreateAndExecExtensionModule function does the heavy
> >>> lifting:
> >>>
> >>> https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e6c67cf4880fca12/Python/importdl.c?at=new_extension_imports#cl-65
> >>>
> >>> One key point to note is that it *doesn't* call
> >>> _PyImport_FixupExtensionObject, which is the API that handles all the
> >>> PEP 3121 per-module state stuff. Instead, the idea will be for modules
> >>> that don't need additional C level state to just implement
> >>> PyImportExec_NAME, while those that *do* need C level state implement
> >>> PyImportCreate_NAME and return a custom object (which may or may not
> >>> be a module subtype).
> >>
> >> Is it really a common case for an extension module not to need any C
> >> level state at all? I mean, this might work for very simple accelerator
> >> modules with only a few stand-alone functions. But anything non-trivial
> >> will almost certainly have some kind of global state, cache, external
> >> library, etc., and that state is best stored at the C level for safety
> >> reasons.

In my experience, most extension authors aren't writing high performance C
accelerators; they're exposing an existing C API to Python. It's the cffi
use case rather than the Cython use case.

My primary experience of C extensions is with such wrapper modules, and for
those, the exec portion of the new API is exactly what you want. The
components of the wrapper module don't share global state, they just
translate between Python and a pre-existing externally stateless C API.

For that use case, a precreated module to populate with types and functions
is exactly what you want to keep things simple and stateless at the C level.
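
A minimal sketch of that shape, written in Python rather than C for brevity.
The names create_module, exec_module and demo_wrapper are hypothetical and
only stand in for the import system's "create" step and the module's own
"exec" step; this is not the code from the patch.

```python
import types


def create_module(name):
    # Hypothetical "create" step: the import system builds a plain module.
    return types.ModuleType(name)


def exec_module(module):
    # Hypothetical "exec" step: the wrapper only populates the module it
    # is handed and keeps no state of its own.
    def add(a, b):
        return a + b

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    module.add = add
    module.Point = Point


mod = create_module("demo_wrapper")
exec_module(mod)
```

Everything the wrapper exports lives on the module object it was given, so
there is nothing extra to clean up or to duplicate per interpreter.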

> > I'd prefer to encourage people to put that state on an exported *type*
> > rather than directly in the module global state. So while I agree we
> > need to *support* C level module globals, I'd prefer to provide a simpler
> > alternative that avoids them.
>
> But that has an impact on the API then. Why do you want the users of an
> extension module to go through a separate object (even if it's just a
> singleton, for example) instead of going through functions at the module
> level? We don't currently encourage or propose this design for Python
> modules either. Quite the contrary, it's extremely common for Python
> modules to provide most of their functionality at the function level. And
> IMHO that's a good thing.

Mutable module global state is always a recipe for obscure bugs, and not
something I will ever let through code review without a really good
rationale. Hidden process global state is never good, just sometimes a
necessary evil.

However, keep in mind my patch is currently just the part I can implement
without PEP 451 module spec objects. Once those are available, then I can
implement the initial hook that supports returning a completely custom
object.

> Note that even global functions usually hold state, be it in the form of
> globally imported modules, global caches, constants, ...

If they can be shared safely across multiple instances of the module (e.g.
immutable constants), then these can be shared at the C level. Otherwise, a
custom Python type will be needed to make them instance specific.
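
As a rough illustration of that split (in Python, with a hypothetical
Session type that is not part of any real module): the immutable constant is
shared by every instance, while the mutable cache belongs to each instance.

```python
class Session:
    """Hypothetical exported type owning what would otherwise be globals."""

    # Immutable constant: safe to share across every instance.
    DEFAULT_LIMIT = 10

    def __init__(self):
        # Mutable cache: one per instance, never shared implicitly.
        self._cache = {}

    def lookup(self, key):
        # Fill the per-instance cache on first use.
        if key not in self._cache:
            self._cache[key] = key.upper()
        return self._cache[key]


a = Session()
b = Session()
a.lookup("x")
```

Filling the cache on a leaves b untouched, which is the property a mutable
module level global cannot give you.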

> > We also need the create/exec split to properly support reloading. Reload
> > *must* reinitialize the object already in sys.modules instead of
> > inserting a different object or it completely misses the point of
> > reloading modules over deleting and reimporting them (i.e. implicitly
> > affecting the references from other modules that imported the original
> > object).
>
> Interesting. I never thought of it that way.
>
> I'm not sure this can be done in general. What if the module has threads
> running that access the global state? In that case, reinitialising the
> module object itself would almost certainly lead to a crash.

My current proposal on import-sig is to make the first hook
"prepare_module", and pass in the existing object in the reload case. For
the extension loader, this would be reflected in the signature of the C
level hook as well, so the module could decide for itself if it supported
reloading.
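
One possible reading of that idea, sketched in Python. The signature, the
SUPPORTS_RELOAD attribute and the demo_ext name are all assumptions made for
illustration; the actual hook is still under discussion on import-sig.

```python
import types


def prepare_module(name, existing=None):
    """Hypothetical first-stage hook: return the object to initialise."""
    if existing is None:
        # Initial import: nothing to reuse, so build a new module.
        return types.ModuleType(name)
    if getattr(existing, "SUPPORTS_RELOAD", False):
        # Reload: hand back the same object so existing references stay valid.
        return existing
    # The module opts out rather than risk corrupting live state.
    raise ImportError(f"{name} does not support reloading", name=name)


first = prepare_module("demo_ext")

try:
    prepare_module("demo_ext", existing=first)
    reload_refused = False
except ImportError:
    reload_refused = True

# Once the module declares support, the same object is reused.
first.SUPPORTS_RELOAD = True
second = prepare_module("demo_ext", existing=first)
```

The point is only that the hook sees the existing object and can refuse,
instead of the import system guessing on the module's behalf.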

> And what if you do "from extmodule import some_function" in a Python
> module? Then reloading couldn't replace that reference, just as for normal
> Python modules. Meaning that you'd still have to keep both modules
> properly alive in order to prevent crashes due to lost global state of the
> imported function.
>
> The difference to Python modules here is that in Python code, you'll get
> some kind of exception if state is lost during a reload. In C code, you'll
> most likely get a crash.

Agreed. This is actually my primary motivation for trying to improve the
"can this be reloaded or not?" aspects of the loader API in PEP 451.
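
The from-import problem is easy to reproduce for Python modules. The sketch
below imitates a reload by re-executing source in the existing module
namespace; demo_reload and exec_source are hypothetical names, and a real
reload would go through the import system rather than a bare exec().

```python
import types

SOURCE_V1 = "def greet():\n    return 'v1'\n"
SOURCE_V2 = "def greet():\n    return 'v2'\n"


def exec_source(module, source):
    # Run the source in the module's existing namespace, as a reload does.
    exec(compile(source, module.__name__, "exec"), module.__dict__)


mod = types.ModuleType("demo_reload")
exec_source(mod, SOURCE_V1)

module_ref = mod        # like "import demo_reload" elsewhere
greet_ref = mod.greet   # like "from demo_reload import greet" elsewhere

# "Reload": same module object, new contents.
exec_source(mod, SOURCE_V2)
```

After the second exec, module_ref.greet() sees the new function, while
greet_ref still points at the old one and keeps its original behaviour. In
Python that is merely surprising; in C, a stale function whose state has been
torn down is where the crash comes from.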

>
> How would you even make sure global state is properly cleaned up? Would
> you call tp_clear() on the module object before re-running the init code?
> Or how else would you enable the init code to do the right thing during
> both the first run (where global state is uninitialised) and subsequent
> runs (where global state may hold valid state and owned Python references)?

Up to the module. For Python modules, we just blindly overwrite things and
let the GC sort it out.

(keep in mind existing extension modules using the existing API will still
never be reloaded)

>
> Even tp_clear() may not be enough, because it's only meant to clean up
> Python references, not C-level state. Basically, for reloading to be
> correct without changing the object reference, it would have to go all the
> way through tp_dealloc(), catch the object at the very end, right before
> it gets freed, and then re-initialise it.
>
> This sounds like we need some kind of indirection (as you mentioned
> above), but without the API impact that a separate type implies. Simply
> making modules an arbitrary extension type, as I proposed, cannot solve
> this.
>
> (Actually, my intuition tells me that if it can't really be made to work
> 100% for Python modules, e.g. due to the from-import case, why bother with
> it for extension types?)

Because we need it to fix testing of the C implementation of etree using the
same model we use for other extension modules (that involves loading a second
copy rather than reloading in place, but the problems are related).

>
>
> >>> Such modules can still support reloading (e.g.
> >>> to pick up reloaded or removed module dependencies) by providing
> >>> PyImportExec_NAME as well.
> >>>
> >>> (in a PEP 451 world, this would likely be split up as two separate
> >>> functions, one for create, one for exec)
> >>
> >> Can't we just always require extension modules to implement their own
> >> type? Sure, it's a lot of boiler plate code, but that could be handled
> >> by a simple C code generator or maybe even a copy&paste example in the
> >> docs. I would like to avoid making it too easy for users in the future
> >> to get anything wrong with reloading or sub-interpreters. Most people
> >> won't test these things for their own code and the harder it is to make
> >> them not work, the more likely it is that a given set of dependencies
> >> will properly work in a sub-interpreter.
> >>
> >> If users are required to implement their own type, I think it would be
> >> more obvious where to put global module state, how to define functions
> >> (i.e. module methods), how to handle garbage collection at the global
> >> module level, etc.
> >
> > Take a look at the current example - everything gets stored in the
> > module dict for the simple case with no C level global state.
>
> Well, you're storing types there. And those types are your module API. I
> understand that it's just an example, but I don't think it matches a
> common case. As far as I can see, the types are not even interacting with
> each other, let alone doing any C-level access of each other. We should
> try to focus on the normal case that needs C-level state and C-level field
> access of extension types. Once that's solved, we can still think about
> how to make the really simple cases simpler, if it turns out that they are
> not simple enough.

Our experiences are very different - my perspective is that the normal case
either eschews C level global state in the extension module, because it
causes so many problems, or else just completely ignores subinterpreter
support and proper module cleanup.

> Keeping everything in the module dict is a design that (IMHO) is too error
> prone. C state should be kept safely at the C level, outside of the reach
> of Python code. I don't want users of my extension module to be able to
> provoke a crash by saying "extmodule._xyz = None".

So don't have global state in the *extension module*, then; keep it in the
regular C/C++ modules. (And don't use the exec-only approach if you do have
significant global state in the extension.)

> I didn't know about PyType_FromSpec(), BTW. It looks like a nice addition
> for manually written code (although useless for Cython).

This is the only way to create custom types when using the stable ABI. Can
I take your observation to mean that Cython doesn't currently offer the
option of limiting itself to the stable ABI?

Cheers,
Nick.

>
> Stefan