[Import-SIG] PEP 489: Redesigning extension module loading

Sat Mar 21 12:37:57 CET 2015

/me dredges up some dim dark history regarding the evolution of the
"-m" implementation :)

On 21 March 2015 at 20:30, Petr Viktorin <encukou at gmail.com> wrote:
> The idea of extending ModuleDef brings me back to the runpy problem. I don't
> think it's actually necessary for "-m" to mean "exec the module in an object
> named '__main__'". Let's provide a slot for a main function, and have runpy
> call that.
> This would mean in Cython modules the "if __name__ == '__main__'" hack won't
> work, ever (as opposed to that being a bug this PEP can help fix). Is that
> an acceptable loss?

runpy lets you set "__name__" to whatever you like, so triggering "if
__name__ ..." blocks isn't the problem - it's playing nice with other
code that assumes __main__ is a true singleton module that lasts the
entire lifecycle of the process (or at least from Py_Initialize() to
Py_Finalize()).
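A minimal sketch of the point about `__name__`, assuming only the standard library's `runpy.run_path()` and its `run_name` parameter: the guarded block fires because the code sees `__name__ == "__main__"`, while `sys.modules["__main__"]` is only swapped out for the duration of the call. The script contents and the `run_as_main` helper are hypothetical, for illustration only.

```python
import os
import runpy
import sys
import tempfile

# Hypothetical script: its guarded block only runs when __name__ == "__main__".
SCRIPT = """
triggered = False
if __name__ == "__main__":
    triggered = True
"""


def run_as_main(source):
    """Hypothetical helper: save `source` to a temporary file and run it via runpy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(source)
        # run_name sets the __name__ the code sees; run_path returns the
        # resulting module globals.
        return runpy.run_path(path, run_name="__main__")


before = sys.modules["__main__"]
result = run_as_main(SCRIPT)
print(result["__name__"], result["triggered"])  # __main__ True
# run_path only swaps sys.modules["__main__"] for the duration of the call.
print(sys.modules["__main__"] is before)  # True
```

Triggering the guard is the easy part; the harder question is what other code expects `sys.modules["__main__"]` to be.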

> (Maybe my next PEP should be letting Python modules define a
> __main__ function, and slowly deprecating the things runpy needs to do.)

One thing about __main__ that makes it genuinely special is that it's
the namespace that the interpreter drops you into as a result of
passing -i at the command line or setting PYTHONINSPECT=1 in the
environment (either beforehand or while the application is running).
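A hedged illustration of the `-i` behaviour, assuming a CPython 3 interpreter and stdin supplied through a pipe rather than a terminal: a child interpreter started with `-i -c "x = 42"` keeps running after the command and executes the statements it reads from stdin in the same namespace, i.e. the real `__main__` module registered in `sys.modules`. The `inspect_after` helper is hypothetical.

```python
import subprocess
import sys


def inspect_after(command, stdin_text):
    """Hypothetical helper: run `python -E -i -c command`, feed stdin_text, return stdout."""
    proc = subprocess.run(
        # -E ignores PYTHON* environment variables (e.g. PYTHONSTARTUP) for a clean run;
        # -i keeps the interpreter alive after the -c command finishes.
        [sys.executable, "-E", "-i", "-c", command],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


# The statements fed on stdin see the variable defined by the -c command,
# because both run in the builtin __main__ module.
out = inspect_after("x = 42", "import sys\nprint(sys.modules['__main__'].x)\n")
print(out.strip())
```

With a non-terminal stdin the interactive prompts are written to stderr, so stdout should only carry what the fed statements print.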

Earlier runpy-based implementations of -m broke that by running the
code in a separate namespace rather than in the actual builtin
__main__ module, while later implementations fixed it by using the
real __main__ to run the code.
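To make the "separate namespace" versus "real __main__" distinction concrete, here is a sketch using the current `runpy.run_module()` API (not the historical implementations described above). With `alter_sys=False` the module code runs in a fresh dictionary and the registered `__main__` is untouched; with `alter_sys=True` runpy registers a temporary module under the run name in `sys.modules` and restores the original afterwards. In neither case does the code run inside the interpreter's original `__main__` object. The module name `demo_mod` and the helper `probe` are hypothetical.

```python
import os
import runpy
import sys
import tempfile

# Hypothetical module body: reports which namespace it is executing in.
MODULE_SOURCE = """
import sys
current_main = sys.modules["__main__"]
# True if our globals are the __dict__ of whatever is registered as __main__.
runs_in_registered_main = globals() is current_main.__dict__
# True if the registered __main__ is still the interpreter's original one.
registered_main_is_original = current_main is ORIGINAL_MAIN
"""


def probe(alter_sys):
    """Run the hypothetical module `demo_mod` as "__main__" via runpy.run_module()."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_mod.py"), "w", encoding="utf-8") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            return runpy.run_module(
                "demo_mod",
                # Hand the module a reference to the original __main__ for comparison.
                init_globals={"ORIGINAL_MAIN": sys.modules["__main__"]},
                run_name="__main__",
                alter_sys=alter_sys,
            )
        finally:
            sys.path.remove(tmp)


for alter in (False, True):
    ns = probe(alter)
    print(alter, ns["runs_in_registered_main"], ns["registered_main_is_original"])
# False False True
# True True False
```

Either way, anything that later inspects the original `__main__` (such as the `-i` prompt) will not see the globals the module created.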

So if we wanted to allow -m to support execution of extension modules
with module-level state, then one key thing to do would be to add a
mechanism to replace __main__ *for real*, such that PYTHONINSPECT
dropped you into the replacement namespace, rather than the original
builtin one.

Unfortunately, you then run into the problem that various package
__init__ modules may have seen the original __main__ before runpy got
a chance to swap it out - there's certainly code out there in the wild
that assumes __main__ is reliably a true singleton module, one that
never changes identity while the interpreter is capable of running
Python bytecode. That's part of why it's the only module whose
__spec__ may change depending on the phase of bootstrapping you're in
- it starts out advertising itself as a builtin module, but that may
change later on in the startup sequence depending on exactly what you
invoked as __main__. (My vague recollection is that the largest number
of states it can run through during any given startup sequence is 3,
but the total number of different possible states is on the order of 6
or 7. It's been a while though, so I may be misremembering both
numbers.)
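One part of this that user code can observe, sketched under the assumption of a CPython 3 interpreter: `__main__.__spec__` is `None` for `-c` commands and for scripts run by path, but under `-m` it is the module's real spec, named after the module rather than `__main__`. This only shows the end state user code sees, not the intermediate bootstrapping phases. The module name `spec_demo` and the helper `main_spec_name` are hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical module that prints the spec name of the running __main__ module.
SOURCE = "import __main__\nprint(getattr(__main__.__spec__, 'name', None))\n"


def main_spec_name(*args, cwd=None):
    """Run a child interpreter with `args` and return what it printed."""
    proc = subprocess.run(
        # -E ignores PYTHON* environment variables, so -m still searches the cwd.
        [sys.executable, "-E", *args],
        capture_output=True,
        text=True,
        check=True,
        cwd=cwd,
    )
    return proc.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "spec_demo.py")
    with open(script, "w", encoding="utf-8") as f:
        f.write(SOURCE)
    print(main_spec_name("-c", SOURCE))  # None
    print(main_spec_name(script))  # None
    print(main_spec_name("-m", "spec_demo", cwd=tmp))  # spec_demo
```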

This "__main__ is __main__" assumption is one I've never been game to
even consider breaking - it's been a feature of Python since day 1,
and it seems to me that the *kinds* of breakage people would see, if
they were relying on it without knowing it, would be close to
incomprehensible.

There's a reason I went and wrote PEP 432 after making the changes
necessary to get the interpreter startup sequence to play nice with
importlib in 3.3. Parts of it are some of the oldest code in CPython,
it's all painfully hard to test properly, and it gets hard to tell the
difference between "feature people are relying on" and "quirk of the
current implementation we can safely change" :P

Getting a fresh set of eyes on that code would be wonderful though -
one of the reasons PEP 432 stagnated (aside from my getting busy with
other things) was not having anyone else familiar enough with the
entire startup sequence to really argue with me about the detailed
design. (And at this point I'm rusty enough on it myself that getting
back into it would be a voyage of rediscovery.)

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia