[Import-SIG] PEP 547: Could we implement a usable "get_code()" for extension modules?

Wed Jan 17 11:08:30 EST 2018

On 17 January 2018 at 23:25, Petr Viktorin <encukou at gmail.com> wrote:
> On 01/17/2018 05:45 AM, Nick Coghlan wrote:
>> Whether we use namespace duplication or PEP 562, both of them have the
>> problem that attribute *rebinding* won't work, since they'll only
>> affect the wrapper namespace. I think we can live with that, but we'll
>> likely want to expose a dunder-name to let folks access the underlying
>> "real" module (e.g. make the variable name "__module__" rather than
>> "module", and then use the PEP 562 approach so there's no risk of
>> accidentally overwriting it).
>
> Hm, the more I think about it, the more I don't like the namespace copy.
> The `python -im math` is cute, but doesn't really solve any immediate
> problem. `from math import *` practically does the same thing, and is way
> more obvious.
> The main rationale behind making -m work for extension modules was to make
> them behave like pure-Python ones -- e.g. if you Cythonize something,
> everything will keep working as before. That's not the case here, and adding
> `__module__` would be just piling on workarounds.

Aye, the namespace copy idea is cute, but I don't think it's a path we
want to go down due to the state consistency management problems that
it creates.

I definitely prefer the idea of handling the importlib/runpy side of
PEP 547 via `get_code()` though. This thread was prompted by asking
myself whether or not I'd approve the PEP in its current form, and
realising that it genuinely bothered me that runpy et al would need
to be aware of the new capability in order to benefit from it.
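
As a rough illustration of why `get_code()` is attractive here, the
sketch below shows that runpy already works with any loader that can
hand back a code object. The loader, finder, and module name
("hypothetical_ext") are invented for the example, and the trivial
compiled source only stands in for whatever a real extension module
loader would have to synthesise.

```python
import importlib.machinery
import runpy
import sys


class SynthesisedCodeLoader:
    """Hypothetical loader standing in for an extension module loader."""

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        exec(self.get_code(module.__name__), module.__dict__)

    def get_code(self, fullname):
        # A real extension loader would have to synthesise code that
        # triggers the C-level initialisation; this just sets a global.
        return compile("RESULT = 6 * 7", "<synthesised %s>" % fullname, "exec")


class SynthesisedCodeFinder:
    """Hypothetical meta path finder that knows a single module name."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname == "hypothetical_ext":
            return importlib.machinery.ModuleSpec(fullname, SynthesisedCodeLoader())
        return None


sys.meta_path.insert(0, SynthesisedCodeFinder())

# runpy asks the loader for a code object and executes it in a fresh
# namespace; it needs no knowledge of how that code was produced.
namespace = runpy.run_module("hypothetical_ext", run_name="__main__")
print(namespace["RESULT"])
```

Note that the code runs against a plain namespace dict rather than a
module object, which is exactly where the difficulty for extension
modules comes from.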

So the question then is what the module execution code would need to
look like for the following cases:

- multi-phase init with Py_mod_exec only
- multi-phase init with Py_mod_create as well
- single-phase init

Where things get tricky with this approach is that by the time the
synthesised code object is running, it doesn't have access to the
module itself any more, only the module namespace. We could get around
that in the Py_mod_exec-only case by looking __name__ up in
sys.modules, but that doesn't help with either of the other two cases,
where the module creation happens outside the import system's control,
and it would be a surprising discrepancy between extension modules and
pure Python ones.
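
A minimal sketch of the `sys.modules` workaround, written in pure
Python for illustration. The module name "hypothetical_ext" is made
up, and a `types.ModuleType` instance stands in for a module that the
import system created and registered itself.

```python
import sys
import types

# Stand-in for a module created and registered by the import system.
mod = types.ModuleType("hypothetical_ext")
sys.modules[mod.__name__] = mod

# The synthesised code only ever sees the namespace it is executed in,
# so it has to look the module object back up by name.
code = compile(
    "import sys\n"
    "recovered = sys.modules[__name__]\n",
    "<synthesised>",
    "exec",
)
exec(code, mod.__dict__)

found_same_module = mod.recovered is mod
print(found_same_module)
```

The lookup only works because the namespace's `__name__` matches a
`sys.modules` entry for the same module; nothing guarantees that when
the module object is created somewhere the import system can't see.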

As far as I can see, that leaves us with only one potential design
direction we haven't explored yet: what if we provided a way for an
existing namespace to be passed in when creating a module object? If
we did that, then it would be possible to create a hidden module in
the synthesised code such that "globals() is
_private_module.__dict__". That might not get us all the way to
supporting single-phase init, but it would make it feasible to define
a new Py_mod_create_with_namespace slot, such that "-m" would be
supported for multi-phase modules that either didn't define
Py_mod_create, or else defined Py_mod_create_with_namespace.
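
To make the desired invariant concrete, here is a small pure-Python
sketch. It shows that "globals() is _private_module.__dict__" holds
when code is executed in the namespace of an already existing module,
and that (in CPython, as far as I know) there is currently no way to
go the other direction and hand an existing dict to a module object.
The names `private` and `existing_namespace` are illustrative only.

```python
import types

# Direction that works today: create the module first, then execute
# code in its namespace.
private = types.ModuleType("_private_module")
private.__dict__["_private_module"] = private
code = compile(
    "same = globals() is _private_module.__dict__",
    "<synthesised>",
    "exec",
)
exec(code, private.__dict__)
invariant_holds = private.same

# Direction the idea would need: wrap a namespace that already exists.
# A module's __dict__ cannot be rebound after creation.
existing_namespace = {"__name__": "already_running"}
try:
    private.__dict__ = existing_namespace
except (AttributeError, TypeError):
    rebind_allowed = False
else:
    rebind_allowed = True

print(invariant_holds, rebind_allowed)
```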

I'm fairly sure that wouldn't actually work right though, as I expect
the descriptor protocol would lead to the "wrong" module getting
passed in to the extension module functions :(
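
One way to see this kind of problem from Python (a sketch of one
possible reading of the concern, relying on CPython behaviour): the
functions of an extension module carry a reference to the module
object they were created for, and that reference does not change when
the function is reached through a different namespace. The "wrapper"
module below is hypothetical.

```python
import math
import types

# Hypothetical wrapper module whose namespace is populated from math.
wrapper = types.ModuleType("wrapper")
wrapper.__dict__.update(
    {name: value for name, value in vars(math).items()
     if not name.startswith("__")}
)

# The function object is shared, and it still refers to the module it
# was originally created for, not the one it was looked up on.
shared_function = wrapper.sqrt is math.sqrt
bound_to_math = wrapper.sqrt.__self__ is math
bound_to_wrapper = wrapper.sqrt.__self__ is wrapper

print(shared_function, bound_to_math, bound_to_wrapper)
```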

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia