metamodules (was: Re: Idea to support lazy loaded names.)

On Tue, Oct 7, 2014 at 5:12 AM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Yeah, and having had some time to think about that discussion and do some prototyping, I'm going to argue below that allowing assignment to module instances' __class__ really is the best path forward. (For those who find the below TLDR, check this out instead: https://github.com/njsmith/metamodule)
IIRC, the not-obviously-unreasonable options suggested were:
Great list! I've rearranged a bit to make my argument clearer.
1) Analogy with __new__: For packages only, if there's a __new__.py, that gets executed first. If it "returns" (not sure how that was defined) an instance of a subclass of ModuleType, that instance is used to run __init__.py instead of a normal module instance.
This is very similar to the current approach of having __init__.py reassign sys.modules[__name__]. The advantages are:

- It gives a way to enforce the rule that you have to do this assignment as the very first thing inside your module, before allowing arbitrary code to run (e.g. by importing other modules which might recursively import your module in turn, and access sys.modules[__name__] before you've modified it).

- Because __new__.py would run *before* the code in __init__.py, it avoids the headache of having to juggle two module objects, one of whose __dict__'s is already being used as the execution environment for the code that is trying to do the switcheroo.

But:

- It's a pretty complicated way to accomplish the stated goals.

- The restriction to packages is unfortunate.

- The backcompat story is terrible -- faking __new__.py support in old versions of Python would be really difficult, and the main reason I care about this stuff in the first place is because I want to be able to e.g. deprecate module attributes that are in the public API of old, widely-used packages. It will be many years before such packages can require 3.5.

So I think we should probably try to make the existing sys.modules[__name__] = ... strategy work before considering this.
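(For context, a minimal sketch of that existing strategy -- MyModule is a hypothetical ModuleType subclass, and this naive version has exactly the pitfalls analyzed in detail in the next message:)

    # foo/__init__.py -- naive sys.modules[__name__] reassignment
    import sys
    import types

    class MyModule(types.ModuleType):
        pass  # custom behaviour (properties, __getattr__, ...) would go here

    # Swap in an instance of the subclass for this module.
    sys.modules[__name__] = MyModule(__name__, __doc__)

    a = 1  # later globals land in the *original* module's __dict__, not MyModule's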
2) Analogy with metaclass= (or with 2.x __metaclass__): If a module (or a package's __init__.py) uses some new syntax or magic comment before any non-comment code, it can specify a custom type in place of ModuleType (not sure how that type gets imported and made available).
I don't see any way to solve this import problem you refer to at the end -- in most cases the code implementing the metamodule type will be defined *inside* the module/package which wants to use the metamodule, so we have a chicken-and-egg problem.
4) Make it easier to write import hooks for this special purpose.
This has the same problem as the previous -- who imports the importer?
5) Make it easier for a module to construct a ModuleType-subclass instance with a copy of the same dict and replace itself in sys.modules.
So, trying to *copy* the dict is just not going to work. Consider the package foo, with a foo/__init__.py that looks like:

    orig_dict = sys.modules[__name__].__dict__
    sys.modules[__name__] = MyModule(__name__, __doc__)
    a = 1
    from .submod import b
    c = 3
    sys.modules[__name__].__dict__.update(orig_dict)

and where foo/submod.py looks like:

    import foo
    b = foo.a + 1
    def c():
        return foo.a + 2

This won't work, because at the time we import .submod, sys.modules["foo"].__dict__ does not contain an entry for "a" -- only the original module's dict has that.

There are a bunch of different ways we could try writing our __init__.py. We might try putting the sys.modules assignment at the end:

    a = 1
    from .submod import b, c
    d = 4
    orig_dict = sys.modules[__name__].__dict__
    sys.modules[__name__] = MyModule(__name__, __doc__)
    sys.modules[__name__].__dict__.update(orig_dict)

Now when .submod re-imports the top-level module, it ends up with a reference to the original module object, which has an "a" entry, so the definition of "b" works. But now .submod.foo will continue to refer to the original module object, even after we substitute in the metamodule object. If we do 'foo.a = 5' later on, then foo.c() will continue to use the original binding of 'a'; this mutation will be invisible to it.

I guess the only way to make it work in this case is to do multiple copies, one before every nested import:

    orig_dict = sys.modules[__name__].__dict__
    sys.modules[__name__] = MyModule(__name__, __doc__)
    a = 1
    sys.modules[__name__].__dict__.update(orig_dict)
    from .submod import b, c
    d = 4
    sys.modules[__name__].__dict__.update(orig_dict)

...but this is incredibly ugly and error-prone.

What we really want to do instead is to make our new metamodule object refer directly to the original module's __dict__:

    orig_dict = sys.modules[__name__].__dict__
    sys.modules[__name__] = MyModule(__name__, __doc__)
    sys.modules[__name__].__dict__ = orig_dict
    a = 1
    from .submod import b, c
    d = 4

That way they will always be in sync. This looks like it should work great! But it has a few problems:

- Trying to assign to a module's __dict__ attribute raises "TypeError: readonly attribute".

- So I actually implemented a fix for that, and ran into a new problem: modules take jealous ownership of their __dict__. In particular, they assume that when they are deallocated they should wipe their dict clean (https://www.python.org/doc/essays/cleanup/). Obviously this is bad for us, because we are still using that dict!

- Also, in modern Python, module objects contain more state besides __dict__ -- in particular, PEP 3121-related state. There's no public API to get at this.

- Possibly future versions of Python will add more state fields again, who knows.

The easiest way to solve all these problems is to *swap* all of the internal fields between the old module object and the new metamodule object. This can be done hackishly using ctypes; this requires knowing about CPython's struct layouts, but that's okay for prototyping and for backwards compatibility hacks (which only have to support specific known versions). To do it non-hackishly, I was at first thinking that we should provide an official API for swapping module object states. But then I realized that at that point, we're basically back to...
3) Analogy with re-classing object instances: Just allow modules to set __class__ during execution (or after, if you want). Variations on this include allowing that for all non-heap types, or even getting rid of the concept of non-heap types.
...this proposal after all. And allowing __class__ assignment on modules strikes me as more aesthetic than having a sys.swap_module_contents function.

I implemented a prototype of this functionality here: https://github.com/njsmith/metamodule
The implementation is here: https://github.com/njsmith/metamodule/blob/master/metamodule.py

That file has 3 parts:

- A fancy metamodule class that handles implicit imports and warn-on-attribute-access.

- A utility function for setting up a metamodule; it tries __class__ assignment, and if that doesn't work, falls back on ctypes hackery (a rough sketch of that control flow follows below).

- The aforementioned ctypes hackery. It's pretty ugly, but if we add __class__ assignment then it will become unnecessary on future Python versions, woohoo!

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
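(A rough sketch of that setup utility's control flow, for reference -- this is not the actual metamodule.py code; install_metamodule and _ctypes_swap are invented names:)

    import sys

    def _ctypes_swap(module, metamodule_class):
        # Placeholder for the ctypes-based fallback described above: the real
        # prototype swaps the C-level module struct fields in place.  Omitted here.
        raise NotImplementedError

    def install_metamodule(name, metamodule_class):
        mod = sys.modules[name]
        try:
            # Preferred path: retarget the existing module object's type in place,
            # so its __dict__ and identity are preserved.
            mod.__class__ = metamodule_class
        except TypeError:
            # Pythons that refuse __class__ assignment on modules fall back on
            # the ctypes hackery.
            _ctypes_swap(mod, metamodule_class)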

On Thu, Oct 9, 2014 at 5:24 AM, Nathaniel Smith <njs@pobox.com> wrote: [...]
As in, a non-obvious way to do a non-obvious thing, from the user's point of view? On the implementation side it doesn't strike me as particularly complicated, am I wrong?
- The restriction to packages is unfortunate.
True, but it seems to me that you'd usually want it for the __init__.py – after all, if you need to do `from mypackage import submodule` anyway, and submodule isn't a package itself, you can usually just make `submodule` a class directly. Or have the top-level package define an import hook. Seems to me that this would really only hurt single-file top-level modules, but those are easily converted to a package.
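(A minimal illustration of the "make `submodule` a class" idea -- mypackage and the attribute names here are invented:)

    # mypackage/__init__.py -- illustrative only
    class submodule:
        # A class object standing in for a real submodule:
        # `from mypackage import submodule` hands callers this class instead.
        a = 1

        @staticmethod
        def double_a():
            return submodule.a * 2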
That seems like a terrible reason to me – if it should work nicely on older Pythons, it means a 3rd-party module would be enough. Why bother adding it to the language? Valid reasons would be to make it easier for alternative interpreters, or to catch edge cases (e.g. make it work nicely with custom importers, zip files, etc.). But at that point, we can just do it the "right way"* and leave the backcompat story to a third-party shim.

* i.e. "not paying attention to older Pythons" – I'm not saying __new__.py is necessarily the right way, I'm criticizing this reason against it

On 9 Oct 2014 13:14, "Petr Viktorin" <encukou@gmail.com> wrote:
For packages only, if there's a __new__.py, that gets executed first. If it "returns" (not sure how that was defined) an instance of a subclass of ModuleType, that instance is used to run __init__.py instead of a normal module instance.
Mostly I mean that it requires a much heavier-weight design discussion. It would add qualitatively new concepts to the language, require its own PEP, we'd have to have the debate about whether the cost of extra stat calls on every import was worth it, we'd have to figure out all the fiddly details about how exactly it should work (are variables defined in __new__.py visible in __init__.py?), etc. Usually one requirement in such debates is to demonstrate that there is no acceptable lighter-weight alternative. I think that's a hard argument to make here. The __class__ assignment approach requires minimal changes to python itself, and works pretty much just as well as the __new__.py approach; maybe better in some ways (as per below). So it's hard to justify the effort it would require to get consensus.
Yeah, this is not a showstopper, but given that we have an alternative that *doesn't* have such fiddly restrictions, it's worth mentioning.
I posted an implementation that works fine on older pythons. If you look at the code though then I think you will agree that the way it works is OK for a backcompat shim that only has to target a specific range of python versions, but... it is also something that we should feel embarrassed to recommend as a long-term solution.
My point is just that it matters whether a backcompat shim is doable. Probably there is some way to implement a backcompat shim for __new__.py, but I don't immediately know how to do it. (E.g., how do you read a file that's a sibling of the current source file in a compatible way across python versions, taking into account zipimport etc.? Is it possible to get the right namespace semantics when executing __new__.py *after* the package module object has been created? Of course it depends on what those semantics are, which is currently underspecified....) And again, this is to be compared to the __class__ assignment approach, where we know very well that a backcompat shim is doable because it is done :-). -n

On 10 Oct 2014 00:05, "Nathaniel Smith" <njs@pobox.com> wrote:
Probably there is some way to implement a backcompat shim for __new__.py, but I don't immediately know how to do it.

Eric Snow already backported the Python 3.4 importlib to Python 2.7 as importlib2, and I know of at least one large company that is planning to deploy that in order to benefit from the directory caching feature. It's a reasonably safe assumption that any future Python 3 import system changes will be made available for Python 2.7 the same way. That said...
I actually agree making modules a proxy type that allows setting __class__ on instances is likely the simpler solution here. You won't get full magic method support, but it *will* enable use of the descriptor protocol for module level attributes. Cheers, Nick.
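(A minimal sketch of the kind of thing that would enable -- hypothetical names, and it assumes __class__ assignment on module instances is allowed:)

    import sys
    import types
    import warnings

    class MyModule(types.ModuleType):
        # Because this is defined on the module's *type*, attribute access on the
        # module goes through the descriptor protocol.
        @property
        def answer(self):
            warnings.warn("answer is deprecated", DeprecationWarning, stacklevel=2)
            return 42

    # At the end of the module that wants this behaviour:
    sys.modules[__name__].__class__ = MyModule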

On Oct 9, 2014, at 15:21, Nick Coghlan <ncoghlan@gmail.com> wrote:
By "proxy type", you mean that instead of turning off the non-heap flag for ModuleType, we'd just add a (pure Python, or C but not non-heap) ModuleType that delegates to the real one? If so, that certainly sounds just as easy to backport to 2.7 with importlib2 as any of the other solutions; no need for ctypes hackery.

On Thu, Oct 9, 2014 at 11:21 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't really see how importlib2 can help here. I mean, we can't reasonably tell people "okay, this is very important -- before you run 'import numpy' (or whatever) you first have to run 'import importlib2; importlib2.hook.install()' or else it won't work". The bootstrapping problem here is nasty, because the whole idea is that __new__.py would be executed before *any* other code distributed with the package.
Like Andrew, I'm not sure what you mean here. Note that there's various code (e.g. reload()) which insists that module objects must be instances of the built-in ModuleType class. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On 10 Oct 2014 10:16, "Nathaniel Smith" <njs@pobox.com> wrote:
Yes, that's exactly what the Python 2 usage model would be if relying on an import system feature.
If folks want Python 3 import system features in Python 2.7, importlib2 is how to get them. Most single source packages won't want to do that because of the bootstrapping challenge, but that's not the same thing as being entirely incompatible with Python 2.
I'm agreeing with you. One of the main characteristics of proxy types is that type(x) and x.__class__ give different answers, which is how I interpreted your metamodule proposal. Cheers, Nick.
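(For clarity, a rough sketch -- invented names -- of a proxy type in that sense, where type(x) and x.__class__ give different answers:)

    import types

    class ModuleProxy(object):
        def __init__(self, real_module):
            object.__setattr__(self, "_real", real_module)

        @property
        def __class__(self):
            # Lie about the class: isinstance(proxy, types.ModuleType) is True
            # even though type(proxy) is ModuleProxy.
            return types.ModuleType

        def __getattr__(self, name):
            return getattr(object.__getattribute__(self, "_real"), name)

        def __setattr__(self, name, value):
            setattr(object.__getattribute__(self, "_real"), name, value)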

participants (4)
- Andrew Barnert
- Nathaniel Smith
- Nick Coghlan
- Petr Viktorin