[Import-SIG] Proto-PEP: Redesigning extension module loading

Brett Cannon brett at python.org
Mon Feb 23 16:29:17 CET 2015


On Sat Feb 21 2015 at 7:27:19 AM Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 21 February 2015 at 00:56, Petr Viktorin <encukou at gmail.com> wrote:
> > Hello list,
> >
> > I have taken Nick's challenge of extension module loading.
>
> Thanks for tackling this!
>
> > I've read some of the relevant discussions, and bounced my ideas off Nick
> > to see if I missed anything important.
> >
> > The main idea I realized, which was not obvious from the discussion,
> > was that in addition to playing well with PEP 451 (ModuleSpec) and
> > supporting subinterpreters and multiple Py_Initialize/Py_Finalize
> > cycles, Nick's Create/Exec proposal allows executing the module in a
> > "foreign", externally created module object. The main use case for
> > that would be runpy and __main__, but lazy-loading mechanisms were
> > mentioned that would benefit as well.
>
> For everyone else's reference: this actually came up in Petr's earlier
> off-list discussions with me, when I realised I'd had the "running
> extension modules as __main__" use case in mind myself, but never
> actually written that notion down anywhere.
>
> It's the one capability of PyModuleExec_* that simply doesn't exist today.
>
> > As I was writing this down, I realized that once pre-created modules are
> > allowed, it makes no sense to insist that they actually are module
> > instances -- PyModule_Type provides little functionality above a plain
> > object subclass. I'm not sure there are any use cases for this, but I
> > don't see a reason to limit things artificially. Any bugs caused by
> > allowing non-ModuleType modules are unlikely to be subtle, unless the
> > custom object passes the "asked for it" line.
> >
> > Comments appreciated.
>
> This generally looks good to me. Some more specific feedback inline below.
>
> > PEP: XXX
> > Title: Redesigning extension module loading
>
> For the BDFL-Delegate question: Brett would you be happy tackling this one?
>

I don't know if "be happy tackling" is the right way to phrase it. =)

Honestly, I don't think I'm the best person for this PEP. My experience with
the C API and extension modules is rather limited, so I don't think I can
properly judge the impact on more complex, sane extension module use cases.


>
> > Motivation
> > ==========
> >
> > Python modules and extension modules are not being set up in the same
> > way.
> > For Python modules, the module is created and set up first, then the
> > module code is being executed (PEP 302).
> > A ModuleSpec object (PEP 451) is used to hole information about the
> > module, and pased to the relevant hooks.
>
> s/hole/hold/
> s/pased/passed/
>
> <snip>
>
> > Furthermore, the majority of currently existing extension modules has
> > problems with sub-interpreter support and/or reloading, and, while it
> > is possible with the current infrastructure to support these
> > features, doing so is neither easy nor efficient.
> > Addressing these issues was the goal of PEP 3121, but many extensions
> > took the least-effort approach to porting to Python 3, leaving many of
> > these issues unresolved.
>
> It's probably worth noting that some of those "least-effort" porting
> approaches are in the standard library: this PEP is about solving our
> own problems in addition to other people's.
>
> > Thius PEP keeps the backwards-compatible behavior, which should reduce
> > pressure and give extension authors adequate time to consider these
> > issues when porting.
>
> s/thius/this/
>
> > The proposal
> > ============
> >
> > The current extension module initialisation will be deprecated in favour
> > of a new initialisation scheme. Since the current scheme will continue
> > to be available, existing code will continue to work unchanged,
> > including binary compatibility.
> >
> > Extension modules that support the new initialisation scheme must export
> > one or both of the public symbols "PyModuleCreate_modulename" and
> > "PyModuleExec_modulename", where "modulename" is the
> > name of the shared library. This mimics the previous naming convention
> > for the "PyInit_modulename" function.
> >
> > These symbols, if defined, must resolve to C functions with the following
> > signatures, respectively::
> >
> >     PyObject* (*PyModuleCreateFunction)(PyObject* module_spec)
> >     int (*PyModuleExecFunction)(PyObject* module)
>
> For the Python level, the model we ended up with for 3.5 is:
>
> 1. create_module must exist, but may return None
> 2. exec_module must exist, but may have no effect on the module state
>
> For the new C level API, it's probably worth drawing the more explicit
> parallel to __new__ and __init__ on classes, where you can implement
> both of them if you want, but in most cases, implementing only one or
> the other will be sufficient.
>
> The reason I suggest that is because I was going to ask if we should
> make providing both APIs, or at least PyModuleExec_*, compulsory
> (based on the Python Loader API requirements), but thinking of the
> __new__/__init__ analogy made me realise that your current design
> makes sense, since dealing with it is confined specifically to the
> extension module loader implementation.
>

See, I don't like this fork from the PEP 451 API. Unless we want to change
importlib to not require exec_module(), and instead let create_module()
partially fulfill the role load_module() had by doing everything, I say
the C API should follow how the rest of the import machinery operates,
especially if the separation is mostly a refactoring of what some
combined PyModuleCreate_* would probably do anyway.
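
For reference, the Python-level PEP 451 shape being discussed can be
sketched as below. This is only an illustration: DemoLoader and
"demo_mod" are made-up names, and the sketch says nothing about how the
C-level hooks would actually be wired up.

```python
import importlib.abc
import importlib.util
import types


class DemoLoader(importlib.abc.Loader):
    """Hypothetical loader, used only to illustrate the PEP 451 split."""

    def create_module(self, spec):
        # Returning None asks the import machinery to create a default
        # module object on the loader's behalf.
        return None

    def exec_module(self, module):
        # Initialization runs on a module that already exists.
        module.answer = 42


spec = importlib.util.spec_from_loader("demo_mod", DemoLoader())
module = importlib.util.module_from_spec(spec)  # calls create_module()
spec.loader.exec_module(module)                 # runs the "exec" step

print(type(module) is types.ModuleType, module.__spec__ is spec)
```

The import-related attributes (``__spec__``, ``__loader__``, ``__name__``)
are set by module_from_spec() before exec_module() runs, so the loader
itself never has to set them.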


>
> > The PyModuleCreate function
> > ---------------------------
>
> <snip>
>
> > When called, this function must create and return a module object.
> >
> > If "PyModuleExec_module" is undefined, this function must also initialize
> > the module; see PyModuleExec_module for details on initialization.
>
> This should be clarified to point out that, as per PEP 451, the import
> machinery will still take care of setting the import related
> attributes after the loader returns the module from create_module.
>
> > There is no requirement for the returned object to be an instance of
> > types.ModuleType. Any type can be used.
>
> The requirement for the returned object to support getting and setting
> attributes (as per
> https://www.python.org/dev/peps/pep-0451/#attributes) should be
> defined here.
>
> > This follows the current
> > support for allowing arbitrary objects in sys.modules and makes it easier
> > for extension modules to define a type that exactly matches their needs
> for
> > holding module state.
>
> +1
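
As a small illustration of the existing behaviour the proposal leans on:
the import statement returns whatever object is stored in sys.modules,
module instance or not. FakeModule and "fake_demo_module" below are
invented names for the sketch.

```python
import sys
import types


class FakeModule:
    """Hypothetical stand-in that is not a types.ModuleType subclass."""

    greeting = "hello"


# Any object can be registered under a module name.
sys.modules["fake_demo_module"] = FakeModule()

import fake_demo_module  # found in sys.modules, so no finder is consulted

is_real_module = isinstance(fake_demo_module, types.ModuleType)
print(fake_demo_module.greeting, is_real_module)

# Clean up so the fake entry does not leak into later imports.
del sys.modules["fake_demo_module"]
```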
>
> > The PyModuleExec function
> > -------------------------
> >
> > This PyModuleExec function is used to implement "loader.exec_module"
> > defined in PEP 451.
> > It is called after ModuleSpec-related attributes such as ``__loader__``,
> > ``__spec__`` and ``__name__`` are set on the module.
> > (The full list is in PEP 451 [#pep-0451-attributes]_)
> >
> > The "PyModuleExec_modulename" function will be called to initialize a
> module.
> > This happens in two situations: when the module is first initialized for
> > a given (sub-)interpreter, and when the module is reloaded.
> >
> > The "module" argument receives the module object.
> > If PyModuleCreate is defined, this will be the object returned by it.
> > If PyModuleCreate is not defined, PyModuleExec is expected to operate
> > on any Python object for which attributes can be added by
> > PyObject_SetAttr* and retrieved by PyObject_GetAttr*.
> > Specifically, as the module may not be a PyModule_Type subclass,
> > PyModule_* functions should not be used on it, unless they explicitly
> > support operating on all objects.
>
> I think this is too permissive on the interpreter side of things, thus
> making things more complicated than we'd like them to be for extension
> module authors.
>
> If PyModuleCreate_* is defined, PyModuleExec_* will receive the object
> returned there, while if it isn't defined, the interpreter *will*
> provide a PyModule_Type instance, as per PEP 451.
>
> However, permitting module authors to make the PyModule_Type (or a
> subclass) assumption in their implementation does introduce a subtle
> requirement on the implementation of both the load_module method, and
> on custom PyModuleExec_* functions that are paired with a
> PyModuleCreate_* function.
>
> Firstly, we need to enforce the following constraint in load_module:
> if the underlying C module does *not* define a custom PyModuleCreate_*
> function, and we're passed a module execution environment which is
> *not* an instance of PyModule_Type, then we should throw TypeError.
>
> By contrast, in the presence of a custom PyModuleCreate_* function,
> the requirement for checking the type of the execution environment
> (and throwing TypeError if the module can't handle it) should be
> delegated to the PyModuleExec_* function, and that will need to be
> documented appropriately.
>
> That keeps things simple in the default case (extension module authors
> just using PyModuleExec_* can continue to assume the use of
> PyModule_Type or a subclass), while allowing more flexibility in the
> "power user" case of creating your own module object.
>
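
The constraint suggested above can be modelled in a few lines of Python.
This is a rough sketch of the rule only, not of the eventual loader code:
run_exec_hook, has_custom_create and the other names are hypothetical.

```python
import types


def run_exec_hook(module, exec_hook, has_custom_create):
    """Hypothetical model of the suggested check; all names are invented."""
    if not has_custom_create and not isinstance(module, types.ModuleType):
        # Default case: the loader enforces the module-type assumption
        # on behalf of the extension author.
        raise TypeError("exec-only extension needs a module instance")
    # With a custom create hook, any type check is left to the exec hook.
    exec_hook(module)
    return module


def demo_exec(module):
    module.initialized = True


class Namespace:
    """Arbitrary attribute-holding object standing in for a custom module."""


plain = run_exec_hook(types.ModuleType("demo"), demo_exec, False)
foreign = run_exec_hook(Namespace(), demo_exec, True)

try:
    run_exec_hook(Namespace(), demo_exec, False)
except TypeError:
    rejected = True
else:
    rejected = False
```

Only the last call fails: a non-module object is handed to an extension
that did not opt in to custom module objects.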
> > Usage
> > =====
> >
> > This PEP allows three new ways of creating modules, each with its
> > advantages and disadvantages.
> >
> >
> > Exec-only
> > ---------
> >
> > The preferred way to create C extensions is to define
> > "PyModuleExec_modulename" only. This brings the following advantages:
> >
> > * The extension can be loaded into a pre-created module, making it
> >   possible to run it as ``__main__``, participate in certain
> >   lazy-loading schemes [#lazy_import_concerns]_, or enable other
> >   creative uses.
> > * The module can be reloaded in the same way as Python modules.
> >
> > As Exec-only extension modules do not have C-level storage,
> > all module-local data must be stored in the module object's attributes,
> > possibly using the PyCapsule mechanism.
>
> With my suggested change above, this approach will also let module
> authors assume PyModule_Type (or a subclass), and have the interpreter
> enforce that assumption on their behalf.
>
> > Create-only
> > -----------
> >
> > Extensions defining only the "PyModuleCreate_modulename" hook behave
> > similarly to current extensions.
> >
> > This is the easiest way to create modules that require custom module
> > objects, or substantial per-module state at the C level (using positive
> > ``PyModuleDef.m_size``).
> >
> > When the PyModuleCreate function is called, the module has not yet been
> > added to sys.modules.
> > Attempts to load the module again (possibly transitively) will result in
> > an infinite loop.
> > If user code needs to be called in module initialization,
> > module authors are advised to do so from the PyModuleExec function.
> >
> > Reloading a Create-only module does nothing, except re-setting
> > ModuleSpec-related attributes described in PEP 451
> > [#pep-0451-attributes]_.
>
> Another advantage of this approach is that you don't need to worry
> about potentially being passed a module object of an arbitrary type.
>
> > Exec and Create
> > ---------------
> >
> > Extensions that need to create a custom module object,
> > and either need to run user code during initialization or support
> > reloading, should define both "PyModuleCreate_modulename" and
> > "PyModuleExec_modulename".
>
> This approach will have the downside of needing to check the type of
> the passed in module against the module implementation's assumptions.
>
> > Subinterpreters and Interpreter Reloading
> > -----------------------------------------
> >
> > Extensions using the new initialization scheme are expected to support
> > subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
> > The mechanism is designed to make this easy, but care is still required
> > on the part of the extension author.
> > No user-defined functions, methods, or instances may leak to different
> > interpreters.
> > To achieve this, all module-level state should be kept in either the
> > module dict, or in the module object.
> > A simple rule of thumb is: Do not define any static data, except
> > built-in types with no mutable or user-settable class attributes.
>
> Worth noting here that this is why we consider it desirable to provide
> a utility somewhere in the standard library to make it easy to do
> these kinds of checks.
>
> At the very least we need it in the test.support module to do our own
> tests, but it would be preferable to have it as a supported API
> somewhere in the standard library.
>
> This isn't the only area where this kind of question of making it
> easier for people to test whether or not they're implementing or
> emulating a protocol correctly has come up - it's applicable to
> testing things like total ordering support in custom objects, operand
> precedence handling, ABC compliance, code generation, exception
> traceback manipulation, etc.
>
> Perhaps we should propose a new unittest submodule for compatibility
> and compliance tests that are too esoteric for the module top level,
> but we also don't want to ask people to write for themselves?
>
> > Module Reloading
> > ----------------
> >
> > Extensions that support reloading must define PyModuleExec, which is
> > called in reload() to re-initialize the module in place.
> > The same caveats apply to reloading an extension module as to reloading
> > a Python module.
>
> Assuming you go with my suggestion regarding the PyModule_Type
> assumption above, that would be worth reiterating here.
>
> > Multiple modules in one library
> > -------------------------------
> >
> > To support multiple Python modules in one shared library, the library
> > must export all appropriate PyModuleExec_<name> or PyModuleCreate_<name>
> > hooks for each exported module.
> > The modules are loaded using a ModuleSpec with origin set to the name of
> > the library file, and name set to the module name.
> > Note that this mechanism can only be used to *load* such modules,
> > not to *find* them.
>
> If I recall correctly, Brett already updated the extension module
> finder to handle locating such modules. It's either that or there's an
> existing issue on the tracker for it.
>

Existing issue; extensions use FileFinder and do no caching or search of
what initialization functions are exported by the module.

-Brett


>
> > Open issues
> > ===========
> >
> > Now that PEP 442 is implemented, it would be nice if module finalization
> > did not set all attributes to None,
>
> Antoine added that in 3.4: http://bugs.python.org/issue18214
>
> However, it wasn't entirely effective, as several extension modules
> still need to be hit with a sledgehammer to get them to drop
> references properly. Asking "Why is that so?" is actually one of the
> things that got me started digging into this area a couple of years
> back.
>
> > In this scheme, it is not possible to create a module with C-level state,
> > which would be able to exec itself in any externally provided module
> > object, short of putting PyCapsules in the module dict.
>
> I suspect "PyCapsule in the module dict" may be the right answer here,
> in which case some suitable documentation and perhaps some convenience
> APIs could be a good way to go.
>
> Relying on PyCapsule also has the advantage of potentially supporting
> better collaboration between extension modules, without needing to
> link them with each other directly.
>
> > The proposal repurposes PyModule_SetDocString, PyModule_AddObject,
> > PyModule_AddIntMacro et al. to work on any object.
> > Would it be better to have these in the PyObject namespace?
>
> With my proposal above to keep the PyModule_Type assumption in most
> cases, I think it may be better to leave them alone entirely. If folks
> decide to allow non module types, they can decide to handle the
> consequences.
>
> > We should expose some kind of API in importlib.util (or a better place?)
> > that can be used to check that a module works with reloading and
> > subinterpreters.
>
> See comments above on that.
>
> > The runpy module will need to be modified to take advantage of PEP 451
> > and this PEP. This might be out of scope for this PEP.
>
> I think it's out of scope, but runpy *does* need an internal redesign
> to take full advantage of PEP 451. Currently it works by attempting to
> extract the code object directly in most situations, whereas PEP 451
> should let it rely almost entirely on exec_code instead (with direct
> execution used only when it's actually given a path directly to a
> Python source or bytecode file).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> https://mail.python.org/mailman/listinfo/import-sig
>