[Import-SIG] PEP 547: Could we implement a usable "get_code()" for extension modules?

Wed Jan 17 12:19:35 EST 2018

On 01/17/2018 05:08 PM, Nick Coghlan wrote:
> On 17 January 2018 at 23:25, Petr Viktorin <encukou at gmail.com> wrote:
>> On 01/17/2018 05:45 AM, Nick Coghlan wrote:
>>> Whether we use namespace duplication or PEP 562, both of them have the
>>> problem that attribute *rebinding* won't work, since they'll only
>>> affect the wrapper namespace. I think we can live with that, but we'll
>>> likely want to expose a dunder-name to let folks access the underlying
>>> "real" module (e.g. make the variable name "__module__" rather than
>>> "module", and then use the PEP 562 approach so there's no risk of
>>> accidentally overwriting it).
>>
>> Hm, the more I think about it, the more I don't like the namespace copy.
>> The `python -im math` is cute, but doesn't really solve any immediate
>> problem. `from math import *` practically does the same thing, and is way
>> more obvious.
>> The main rationale behind making -m work for extension modules was to make
>> them behave like pure-Python ones -- e.g. if you Cythonize something,
>> everything will keep working as before. That's not the case here, and adding
>> `__module__` would be just piling on workarounds.
> 
> Aye, the namespace copy idea is cute, but I don't think it's a path we
> want to go down due to the state consistency management problems that
> it creates.
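
A minimal sketch of the rebinding problem with the namespace copy, for reference. It assumes the copy is a plain dict update into a fresh module object; `wrapper_main` is a hypothetical stand-in for `__main__`, not something the PEP defines.

```python
import math
import types

# Hypothetical stand-in for a __main__ that received a copy of math's namespace.
wrapper = types.ModuleType("wrapper_main")
wrapper.__dict__.update(
    {name: value for name, value in vars(math).items() if not name.startswith("__")}
)

original_pi = math.pi
wrapper.pi = 3  # rebinding lands in the wrapper namespace only

# The real module never sees the new binding, so the two namespaces have diverged.
rebinding_reached_real_module = math.pi == wrapper.pi
```

Functions are shared between the two namespaces, but any name rebound after the copy exists in only one of them, which is the state consistency problem mentioned above.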
> 
> I definitely prefer the idea of handling the importlib/runpy side of
> PEP 547 via `get_code()` though - this thread was prompted by asking
> myself whether or not I'd approve the PEP in its current form, and
> deciding that runpy et al needing to be aware of the new capability in
> order to benefit from it genuinely bothered me.
> 
> So the question then is what the module execution code would need to
> look like for the following cases:
> 
> - multi-phase init with Py_mod_exec only
> - multi-phase init with Py_mod_create as well
> - single-phase init
> 
> Where things get tricky with this approach is that by the time the
> synthesised code object is running, it doesn't have access to the
> module itself any more, only the module namespace. We could get around
> that in the Py_mod_exec-only case by looking __name__ up in
> sys.modules, but that doesn't help with either of the other two cases
> where the module creation happens outside the import system's control,
> and would be a surprising discrepancy between extension modules and
> pure Python ones.
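
A rough sketch of the `sys.modules` lookup for the Py_mod_exec-only case, written in pure Python. The module name `demo_mod` and the compiled source string are assumptions made for illustration; a real synthesised code object for an extension module would be produced by the loader.

```python
import sys
import types

# "demo_mod" is a hypothetical name; a real loader would register the module it created.
module = types.ModuleType("demo_mod")
sys.modules[module.__name__] = module

# Stand-in for the synthesised code object: it only ever sees the namespace.
synthesised = compile(
    "import sys\nrecovered = sys.modules[__name__]\n", "<synthesised>", "exec"
)
try:
    exec(synthesised, module.__dict__)
finally:
    del sys.modules[module.__name__]

recovered_is_module = module.recovered is module
```

This only works because the import system created and registered the module itself, which is exactly what is not guaranteed in the Py_mod_create and single-phase cases.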
> 
> As far as I can see, that leaves us with only one potential design
> direction we haven't explored yet: what if we provided a way for an
> existing namespace to be passed in when creating a module object? If
> we did that, then it would be possible to create a hidden module in
> the synthesised code such that "globals() is
> _private_module.__dict__". That might not get us all the way to
> supporting single-phase init, but it would make it feasible to define
> a new Py_mod_create_with_namespace slot, such that "-m" would be
> supported for multi-phase modules that either didn't define
> Py_mod_create, or else defined Py_mod_create_with_namespace.
> 
> I'm fairly sure that wouldn't actually work right though, as I expect
> the descriptor protocol would lead to the "wrong" module getting
> passed in to the extension module functions :(
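
A small sketch of the two obstacles described above, as they can be observed from Python today. It assumes CPython behaviour: a module object cannot currently adopt an existing dict, and module-level extension functions expose the module they were created for as `__self__`. `_private_module` is the name used in the quoted text; nothing here is a proposed API.

```python
import math
import types

namespace = {"__name__": "__main__"}
hidden = types.ModuleType("_private_module")  # name taken from the quoted text

# Today a module always allocates its own dict; an existing one cannot be swapped in.
try:
    hidden.__dict__ = namespace
    can_adopt_namespace = True
except (AttributeError, TypeError):
    can_adopt_namespace = False

# On CPython, module-level extension functions carry the module they were created for.
bound_module = getattr(math.sqrt, "__self__", None)
```

So even with a way to pass a namespace in at creation time, the extension's functions would presumably still receive whichever module object they were bound to, not whichever module owns the namespace the code is running in.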

Let me suggest another potential direction we (or at least I) haven't 
explored yet: what about working to make the __main__ module either 
replaceable, or unused until we know what it should be?

I remember you saying that's not feasible, so I haven't tried anything, 
but I don't remember an explanation. How sure are you that that rabbit 
hole is deeper than the one we're in now?