[Import-SIG] Proto-PEP: Redesigning extension module loading

Brett Cannon brett at python.org
Mon Feb 23 16:16:12 CET 2015

I mostly have grammar/typo comments and one suggestion to minimize the
number of ways of initializing a module by not letting PyModuleCreate_* do
that step on its own.

Otherwise the approach seems to be on the right track for what we need for
extension loading. Thanks for taking this on!

On Fri Feb 20 2015 at 9:57:16 AM Petr Viktorin <encukou at gmail.com> wrote:

> Hello list,
> I have taken Nick's challenge of extension module loading.
> I've read some of the relevant discussions, and bounced my ideas off Nick
> to see if I missed anything important.
> The main idea I realized, which was not obvious from the discussion,
> was that in addition to playing well with PEP 451 (ModuleSpec) and
> supporting
> subinterpreters and multiple Py_Initialize/Py_Finalize cycles,
> Nick's Create/Exec proposal allows executing the module in a "foreign",
> externally created module object. The main use case for that would be
> runpy and
> __main__, but lazy-loading mechanisms were mentioned that would benefit as
> well.
> As I was writing this down, I realized that once pre-created modules are
> allowed, it makes no sense to insist that they actually are module
> instances -- PyModule_Type provides little functionality above a plain
> object
> subclass. I'm not sure there are any use cases for this, but I don't see a
> reason to limit things artificially. Any bugs caused by allowing
> non-ModuleType modules are unlikely to be subtle, unless the custom object
> passes the "asked for it" line.
> Comments appreciated.
> ---
> Title: Redesigning extension module loading
> Version: $Revision$
> Last-Modified: $Date$
> Author: Petr Viktorin <encukou at gmail.com>, Stefan Behnel <stefan_ml
> at behnel.de>, Nick Coghlan <ncoghlan at gmail.com>
> BDFL-Delegate: "???"
> Discussions-To: "???"
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 11-Aug-2013
> Python-Version: 3.5
> Post-History: 23-Aug-2013, 20-Feb-2015
> Resolution:
> Abstract
> ========
> This PEP proposes a redesign of the way in which extension modules interact
> with the import machinery. This was last revised for Python 3.0 in PEP
> 3121, but did not solve all problems at the time. The goal is to solve them
> by bringing extension modules closer to the way Python modules behave;
> specifically to hook into the ModuleSpec-based loading mechanism
> introduced in PEP 451.
> Two ways to initialize a module, depending on the desired functionality,
> are proposed.
> The preferred form allows extension modules to be executed in pre-defined
> namespaces, paving the way for extension modules being runnable with
> Python's
> ``-m`` switch.
> Other modules can use arbitrary custom types for their module
> implementation,
> and are no longer restricted to types.ModuleType.
> Both ways make it easy to support properties at the module
> level and to safely store arbitrary global state in the module that is
> covered by normal garbage collection and supports reloading and
> sub-interpreters.
> Extension authors are encouraged to take these issues into account
> when using the new API.
> Motivation
> ==========
> Python modules and extension modules are not being set up in the same way.
> For Python modules, the module is created and set up first, then the module
> code is being executed (PEP 302).
> A ModuleSpec object (PEP 451) is used to hole information about the module,

"hole" -> "hold"

> and pased to the relevant hooks.

"pased" -> "passed"

> For extensions, i.e. shared libraries, the module
> init function is executed straight away and does both the creation and
> initialisation. The initialisation function is not passed ModuleSpec
> information about the loaded module, such as the __file__ or
> fully-qualified
> name.
> This hinders relative imports and resource loading.
> This is specifically a problem for Cython generated modules, for which it's
> not uncommon that the module init code has the same level of complexity as
> that of any 'regular' Python module. Also, the lack of __file__ and
> __name__
> information hinders the compilation of __init__.py modules, i.e. packages,
> especially when relative imports are being used at module init time.
> The other disadvantage of the discrepancy is that existing Python
> programmers
> learning C cannot effectively map concepts between the two domains.
> As long as extension modules are fundamentally different from pure Python
> ones
> in the way they're initialised, they are harder for people to pick up
> without
> relying on something like cffi, SWIG or Cython to handle the actual
> extension
> module creation.
> Currently, extension modules are also not added to sys.modules until they
> are
> fully initialized, which means that a (potentially transitive)
> re-import of the module will really try to reimport it and thus run into an
> infinite loop when it executes the module init function again.
> Without the fully qualified module name, it is not trivial to correctly add
> the module to sys.modules either.
> Furthermore, the majority of currently existing extension modules has
> problems with sub-interpreter support and/or reloading, and, while it is
> possible with the current infrastructure to support these
> features, is neither easy nor efficient.

"is neither" -> "it is neither"

> Addressing these issues was the goal of PEP 3121, but many extensions
> took the least-effort approach to porting to Python 3, leaving many of
> these
> issues unresolved.
> Thius PEP keeps the backwards-compatible behavior, which should reduce
> pressure

"Thius" -> "Thus"

> and give extension authors adequate time to consider these issues when
> porting.
> The current process
> ===================
> Currently, extension modules export an initialisation function named
> "PyInit_modulename", named after the file name of the shared library. This
> function is executed by the import machinery and must return either NULL in
> the case of an exception, or a fully initialised module object. The
> function receives no arguments, so it has no way of knowing about its
> import context.
> During its execution, the module init function creates a module object
> based on a PyModuleDef struct. It then continues to initialise it by adding
> attributes to the module dict, creating types, etc.
> In the back, the shared library loader keeps a note of the fully qualified
> module name of the last module that it loaded, and when a module gets
> created that has a matching name, this global variable is used to determine
> the fully qualified name of the module object. This is not entirely safe
> as it
> relies on the module init function creating its own module object first,
> but this assumption usually holds in practice.
> The proposal
> ============
> The current extension module initialisation will be deprecated in favour of
> a new initialisation scheme. Since the current scheme will continue to be
> available, existing code will continue to work unchanged, including binary
> compatibility.
> Extension modules that support the new initialisation scheme must export
> one
> or both of the public symbols "PyModuleCreate_modulename" and
> "PyModuleExec_modulename", where "modulename" is the
> name of the shared library. This mimics the previous naming convention for
> the "PyInit_modulename" function.
> This symbols, if defined, must resolve to C functions with the following

"This" -> "These"

> signatures, respectively::
>     PyObject* (*PyModuleCreateFunction)(PyObject* module_spec)
>     int (*PyModuleExecFunction)(PyObject* module)
> The PyModuleCreate function
> ---------------------------
> This PyModuleCreate function is used to implement "loader.create_module"
> defined in PEP 451.
> By exporting the "PyModuleCreate_modulename" symbol, an extension module
> indicates that it uses a custom module object.
> This prevents loading the extension in a pre-created module,
> but gives greater flexibility in allowing a custom C-level layout
> of the module object.
> The "module_spec" argument receives a "ModuleSpec" instance, as defined in
> PEP 451.
> When called, this function must create and return a module object.
> If "PyModuleExec_module" is undefined, this function must also initialize
> the module; see PyModuleExec_module for details on initialization.

Why conflate module creation with initialization? If one is going to have
initialization code then it can't be difficult to factor out into a
PyModuleExec_* function, so I don't see a good reason to support only
defining PyModuleCreate_*.
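
For reference, a minimal sketch of the separation I have in mind, written
against the pure-Python PEP 451 loader protocol rather than the proposed C
hooks. The loader class, module name and attribute are made up for
illustration; the point is only that create_module() builds the object and
exec_module() does all of the initialization.

```python
import importlib.abc
import importlib.util
import types


class SplitLoader(importlib.abc.Loader):
    """Hypothetical loader that keeps creation and initialization apart."""

    def create_module(self, spec):
        # Creation only: return a bare module object, populate nothing.
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # Initialization only: fill in whatever module object is handed over.
        module.answer = 42


spec = importlib.util.spec_from_loader("demo_split", SplitLoader())
module = importlib.util.module_from_spec(spec)   # calls create_module()
created_uninitialized = not hasattr(module, "answer")
spec.loader.exec_module(module)                  # separate initialization step
```

Factoring the initialization out like this is what lets the same exec step be
re-run for a reload.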

> There is no requirement for the returned object to be an instance of
> types.ModuleType. Any type can be used. This follows the current
> support for allowing arbitrary objects in sys.modules and makes it easier
> for extension modules to define a type that exactly matches their needs for
> holding module state.
> The PyModuleExec function
> -------------------------
> This PyModuleExec function is used to implement "loader.exec_module"
> defined in PEP 451.
> It is called after ModuleSpec-related attributes such as ``__loader__``,
> ``__spec__`` and ``__name__`` are set on the module.
> (The full list is in PEP 451 [#pep-0451-attributes]_)
> The "PyModuleExec_modulename" function will be called to initialize a
> module.
> This happens in two situations: when the module is first initialized for
> a given (sub-)interpreter, and when the module is reloaded.
> The "module" argument receives the module object.
> If PyModuleCreate is defined, this will be the object returned by it.
> If PyModuleCreate is not defined, PyModuleExec is epected to operate

"epected" -> "expected"

> on any Python object for which attributes can be added by PyObject_SetAttr*
> and retreived by PyObject_GetAttr*.

"retreived" -> "retrieved"

> Specifically, as the module may not be a PyModule_Type subclass,
> PyModule_* functions should not be used on it, unless they explicitly
> support
> operating on all objects.
> Helper functions
> ----------------
> For two initialization tasks previously done by PyModule_Create,
> two functions are introduced::
>     int PyModule_SetDocString(PyObject *m, const char *doc)
>     int PyModule_AddFunctions(PyObject *m, PyMethodDef *functions)
> These set the module docstring, and add the module functions, respectively.
> Both will work on any Python object that supports setting attributes.
> They return zero on success, and on failure, they set the exception
> and return -1.
> Other changes
> -------------
> The following functions and macros will be modified to work on any object
> that supports attribute access:
>     * PyModule_GetNameObject
>     * PyModule_GetName
>     * PyModule_GetFilenameObject
>     * PyModule_GetFilename
>     * PyModule_AddIntConstant
>     * PyModule_AddStringConstant
>     * PyModule_AddIntMacro
>     * PyModule_AddStringMacro
>     * PyModule_AddObject
> Usage
> =====
> This PEP allows three new ways of creating modules, each with its
> advantages and disadvantages.

> Exec-only
> ---------
> The preferred way to create C extensions is to define
> "PyModuleExec_modulename"
> only. This brings the following advantages:
> * The extension can be loaded into a pre-created module, making it possible
>   to run them as ``__main__``, participate in certain lazy-loading schemes
>   [#lazy_import_concerns]_, or enable other creative uses.
> * The module can be reloaded in the same way as Python modules.
> As Exec-only extension modules do not have C-level storage,
> all module-local data must be stored in the module object's attributes,
> possibly using the PyCapsule mechanism.
> XXX: Provide an example?
> Create-only
> -----------
> Extensions defining only the "PyModuleCreate_modulename" hook behave
> similarly
> to current extensions.

If we are going to bother with allowing module creation then I would rather
either have people stay with the old way or completely move over to the new
way and not switch over only partially. Supporting this create-and-initialize
mode also breaks with the Python analog that the rest of this PEP promotes.

> This is the easiest way to create modules that require custom module
> objects,
> or substantial per-module state at the C level (using positive
> ``PyModuleDef.m_size``).
> When the PyModuleCreate function is called, the module has not yet been
> added
> to sys.modules.
> Attempts to load the module again (possibly transitively) will result in an
> infinite loop.
> If user code needs to me called in module initialization,

"me" -> "be"

> module authors are advised to do so from the PyModuleExec function.
> Reloading a Create-only module does nothing, except re-setting
> ModuleSpec-related attributes described in PEP 0451 [#pep-0451-attributes]_.
> XXX: Provide an example? (It would be similar to the one in PEP 3121)
> Exec and Create
> ---------------
> Extensions that need to create a custom module object,
> and either need to run user code during initialization or support
> reloading,
> should define both "PyModuleCreate_modulename" and
> "PyModuleExec_modulename".
> XXX: Provide an example?

If you drop the ability for PyModuleCreate_* to also initialize then you
will really only have 1 way to import a module; it just happens to have an
optional module creation step. If you do drop it then the opening line for
this section is misleading.
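
A rough sketch of that "one way, with an optional creation step" shape, again
using the Python-level PEP 451 protocol as a stand-in for the C hooks. All of
the class, function and module names here are hypothetical. Both loaders share
the same exec step; the second one merely opts in to supplying its own module
object.

```python
import importlib.abc
import importlib.util
import types


class ExecOnlyLoader(importlib.abc.Loader):
    """Hypothetical loader with no custom creation step."""

    def create_module(self, spec):
        # Returning None asks the import machinery for a default module.
        return None

    def exec_module(self, module):
        # The single initialization path, used in every case.
        module.greeting = "hello"


class CustomModule(types.ModuleType):
    """Hypothetical custom module type."""


class CreateAndExecLoader(ExecOnlyLoader):
    """Same exec step, plus an optional creation step."""

    def create_module(self, spec):
        return CustomModule(spec.name)


def load(loader):
    # Create (default or custom), then always initialize via exec_module().
    spec = importlib.util.spec_from_loader("demo_optional_create", loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


default_mod = load(ExecOnlyLoader())
custom_mod = load(CreateAndExecLoader())
```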

> Legacy Init
> -----------
> If neither PyModuleExec nor PyModuleCreate is defined, the module is
> initialized using the PyModuleInit hook, as described in PEP 3121.
> If PyModuleExec or PyModuleCreate is defined, PyModuleInit will be ignored.
> Modules requiring compatibility with previous versions of CPython may
> implement
> PyModuleInit in addition to the new hooks.
> Subinterpreters and Interpreter Reloading
> -----------------------------------------
> Extensions using the new initialization scheme are expected to support
> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
> The mechanism is designed to make this easy, but care is still required
> on the part of the extension author.
> No user-defined functions, methods, or instances may leak to different
> interpreters.
> To achieve this, all module-level state should be kept in either the module
> dict, or in the module object.
> A simple rule of thumb is: Do not define any static data, except built-in
> types
> with no mutable or user-settable class attributes.
> Module Reloading
> ----------------
> Extensions that support reloading must define PyModuleExec, which is called
> in reload() to re-initialize the module in place.
> The same caveats apply to reloading an extension module as to reloading
> a Python module.
> Note that due to limitations in shared library loading (both dlopen on POSIX
> and LoadModuleEx on Windows), it is not generally possible to load a
> modified
> library after it has changed on disk.
> Therefore, reloading extension modules is of limited use.
> Multiple modules in one library
> -------------------------------
> To support multiple Python modules in one shared library, the library
> must export all appropriate PyModuleExec_<name> or PyModuleCreate_<name>
> hooks
> for each exported module.
> The modules are loaded using a ModuleSpec with origin set to the name of
> the
> library file, and name set to the module name.
> Note that this mechanism can only be used to *load* such modules,
> not to *find* them.
> XXX: Provide an example of how to load such modules
> Implementation
> ==============
> XXX - not started
> Open issues
> ===========
> Now that PEP 442 is implemented, it would be nice if module finalization
> did not set all attributes to None,
> In this scheme, it is not possible to create a module with C-level state,
> which would be able to exec itself in any externally provided module
> object,
> short of putting PyCapsules in the module dict.
> The proposal repurposes PyModule_SetDocString, PyModule_AddObject,
> PyModule_AddIntMacro et.al. to work on any object.
> Would it be better to have these in the PyObject namespace?

No. They are setting explicit attributes that are meant only for modules, so
renaming them would be more generalization than is necessary.

> We should expose some kind of API in importlib.util (or a better place?)
> that
> can be used to check that a module works with reloading and
> subinterpreters.

What would such an API actually check to verify that a module could be

> The runpy module will need to be modified to take advantage of PEP 451
> and this PEP. This might be out of scope for this PEP.
> Previous Approaches
> ===================
> Stefan Behnel's initial proto-PEP [#stefans_protopep]_
> had a "PyInit_modulename" hook that would create a module class,
> whose ``__init__`` would be then called to create the module.
> This proposal did not correspond to the (then nonexistent) PEP 451,
> where module creation and initialization is broken into distinct steps.
> It also did not support loading an extension into pre-existing module
> objects.
> Nick Coghlan proposed the Create and Exec hooks, and wrote a prototype
> implementation [#nicks-prototype]_.
> At this time PEP 451 was still not implemented, so the prototype
> does not use ModuleSpec.
> References
> ==========
> .. [#lazy_import_concerns]
>    https://mail.python.org/pipermail/python-dev/2013-August/128129.html
> .. [#pep-0451-attributes]
>    https://www.python.org/dev/peps/pep-0451/#attributes
> .. [#stefans_protopep]
>    https://mail.python.org/pipermail/python-dev/2013-August/128087.html
> .. [#nicks-prototype]
>    https://mail.python.org/pipermail/python-dev/2013-August/128101.html
> Copyright
> =========
> This document has been placed in the public domain.