[Import-SIG] Running C extension modules using -m switch

Fri May 19 06:24:58 EDT 2017

On 18 May 2017 at 22:50,  <gmarcel.plch at gmail.com> wrote:
> Greetings,
>
> This has already been sent to python-ideas, but since I got no
> response, I'm re-sending it to this SIG. I would welcome any
> comments.

Sorry about that - I suspect you caught a lot of other folks in the
middle of getting ready for PyCon travel, and I put it aside to have a
closer look when I had more time.

> I'm a student that has been working lately on a feature of the runpy
> module that I have been quite interested in: execution of extension
> modules using the -m switch.

This is very cool, and one of the things we were hoping to enable with
the multi-phase initialisation changes :)

> Currently this requires access to the module's code, so it only works
> for modules written in Python.
> I have a proof-of-concept implementation that adds a new
> ExtensionFileLoader method called "exec_as_main".
> The runpy module then checks if the loader has this method, and if so,
> calls it instead of getting the code and running that.
>
> This new method calls into the _imp module, which executes the module
> as a script.
> I can see two ways of doing this. Both expect that the module uses PEP
> 489 multi-phase initialization.

The main reason I didn't immediately reply is that I had a vague
recollection of thinking this could be done *without* a new method on
loaders, but I needed to refresh my memory of our plans in that
regard.

I've now done that, and I'm pretty sure the unwritten plan was to
change runpy to do something like the following:

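    import importlib.util
    import sys

    spec = importlib.util.find_spec(modname)
    # None from create_module() means the loader accepts a default module
    created = spec.loader.create_module(spec)
    if created is not None:
        raise RuntimeError("Cannot use customised module instance as __main__")
    main_mod = sys.modules["__main__"]  # the existing __main__ module
    spec.loader.exec_module(main_mod)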

That's oversimplified quite a bit, but it gives the general idea.
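
To make the create_module() check concrete, here is a minimal,
self-contained sketch. The two loader classes and the exec_in_main()
helper are hypothetical stand-ins made up for illustration (they are
not runpy or importlib APIs); only ModuleSpec and the
create_module()/exec_module() protocol come from importlib.

```python
import importlib.machinery
import types


def exec_in_main(spec, main_mod):
    """Run the spec's module code in main_mod, refusing custom modules."""
    loader = spec.loader
    # A None result means "use the default module type", so any plain
    # module object (including __main__) is acceptable to the loader.
    created = loader.create_module(spec)
    if created is not None:
        raise RuntimeError("Cannot use customised module instance as __main__")
    loader.exec_module(main_mod)
    return main_mod


class DefaultModuleLoader:
    """Hypothetical loader that accepts whatever module it is given."""

    def create_module(self, spec):
        return None

    def exec_module(self, module):
        module.greeting = "running as " + module.__name__


class CustomModuleLoader(DefaultModuleLoader):
    """Hypothetical loader that insists on its own module instance."""

    def create_module(self, spec):
        return types.ModuleType(spec.name)


main_mod = types.ModuleType("__main__")
spec = importlib.machinery.ModuleSpec("demo", DefaultModuleLoader())
exec_in_main(spec, main_mod)
print(main_mod.greeting)
```

Passing a spec that uses CustomModuleLoader to the same helper raises
RuntimeError, which is the case that would still need a helper module.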

> The first way is having a new PyModuleDef_Slot called Py_mod_main,
> which names a function to execute when run as main.
>
> The second way is running a module's Py_mod_exec inside the __main__
> module's namespace, as it's done for normal modules.
> The module would then do a `if __name__ == "__main__"` check.
> This is possible for modules that don't define Py_mod_create: they
> expect a default module object to be created for them, so we can pass
> the __main__ module to their Py_mod_exec function.
> This way would mean that, for example, modules written in Cython would
> behave like their Python counterparts.

And that's *precisely* the idea behind allowing this to work with
existing loaders, as long as they return None from create_module().
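
Seen from the module's side, the behaviour would look roughly like the
sketch below. It is pure Python standing in for C: mod_exec() is a
hypothetical placeholder playing the role of a Py_mod_exec function,
and the attribute names are made up for illustration.

```python
import types


def mod_exec(module):
    """Populate the given module, as a Py_mod_exec function would."""
    module.add = lambda a, b: a + b
    # The same check a Python module puts at the bottom of its source.
    if module.__name__ == "__main__":
        module.script_result = module.add(2, 3)


# Normal import: the module object carries the module's own name.
imported = types.ModuleType("demo")
mod_exec(imported)

# Run via -m: the very same function is handed the __main__ module.
as_main = types.ModuleType("__main__")
mod_exec(as_main)

print(hasattr(imported, "script_result"), as_main.script_result)
```

The script-only branch runs just once here, for the module named
__main__, while both module objects get the regular attributes.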

> Another possibility would be to use both, allowing both easy Cython-
> style modules and a dedicated slot for modules that need custom
> Py_mod_create.

I'm OK with continuing to have cases like the latter rely on a helper
module that imports the one that needs a custom module instance. Given
PEP 489, those can even be defined in the same shared library:
https://www.python.org/dev/peps/pep-0489/#multiple-modules-in-one-library

Independently of extension module initialisation, I also have an idea
for taking another go at providing "autorun" capabilities for main
modules, where defining a dunder function with a particular name will
execute it after __main__ finishes running (I'll do a separate post
about that).
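
As a rough sketch only: the hook name __autorun__ and the
run_with_autorun() helper below are both invented for illustration,
since no name or mechanism has been settled. The idea is just "run the
main module's code, then call the hook if one was defined".

```python
import types

AUTORUN_NAME = "__autorun__"  # hypothetical name, not an agreed one


def run_with_autorun(source, main_mod):
    """Execute source in main_mod, then call the autorun hook if present."""
    exec(compile(source, "<demo>", "exec"), main_mod.__dict__)
    hook = main_mod.__dict__.get(AUTORUN_NAME)
    if callable(hook):
        return hook()
    return None


demo_source = """
def __autorun__():
    return "hook ran in " + __name__
"""

main_mod = types.ModuleType("__main__")
result = run_with_autorun(demo_source, main_mod)
print(result)
```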

> My proof of concept uses another combination: it requires Py_mod_main
> and runs it in the __main__ namespace. But that can change based on
> discussion here.
>
> Link to the implementation:
> https://github.com/Traceur759/cpython/tree/main_c_modules
> Diff from master:
> https://github.com/python/cpython/compare/master...Traceur759:main_c_modules
>
> You can quickly test it with:
> $ ./python -m _testmultiphase
> This is an extension module named __main__

Once again, very cool!

If we don't have one already, could you file a 3.7 RFE for this on
bugs.python.org?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia