[Python-Dev] advice needed: best approach to enabling "metamodules"?

Mark Shannon mark at hotpy.org
Sun Nov 30 23:14:48 CET 2014


Hi,

This discussion has been going on for a while, but no one has questioned
the basic premise: does this need any change to the language or the
interpreter?

I believe it does not. I've modified your original metamodule.py to
avoid ctypes and to support reloading:
https://gist.github.com/markshannon/1868e7e6115d70ce6e76
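
The general shape of a ctypes-free approach (a simplified sketch for
illustration, not the gist itself) is to build an instance of the
desired module subclass, copy the namespace across, and keep the old
module object alive so that its __dict__ is never purged:

    # At the very end of foo/__init__.py (hypothetical package name):
    import sys
    import types

    class FooModule(types.ModuleType):
        def __getattr__(self, name):
            # custom attribute behaviour goes here
            raise AttributeError(name)

    _old = sys.modules[__name__]
    _new = FooModule(_old.__name__, _old.__doc__)
    _new.__dict__.update(_old.__dict__)
    # Functions defined in this file keep _old.__dict__ as their
    # __globals__, so hold a reference to _old to stop CPython from
    # purging that dict when _old would otherwise be deallocated.
    _new._orig_module = _old
    sys.modules[__name__] = _new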

Cheers,
Mark.

On 29/11/14 01:59, Nathaniel Smith wrote:
> Hi all,
>
> There was some discussion on python-ideas last month about how to make
> it easier/more reliable for a module to override attribute access.
> This is useful for things like autoloading submodules (accessing
> 'foo.bar' triggers the import of 'bar'), or for deprecating module
> attributes that aren't functions. (Accessing 'foo.bar' emits a
> DeprecationWarning, "the bar attribute will be removed soon".) Python
> has had some basic support for this for a long time -- if a module
> overwrites its own entry in sys.modules (i.e. sys.modules[__name__]),
> then the object placed there is what 'import' will return. This allows
> one to define
> custom subclasses of module and use them instead of the default,
> similar to how metaclasses allow one to use custom subclasses of
> 'type'.
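>
> A minimal sketch of that existing mechanism, assuming a package 'foo'
> that wants to deprecate its 'bar' attribute (note that __getattr__ on
> a module subclass only fires when normal lookup fails, so 'bar' must
> be kept *out* of the module's __dict__):
>
>     # at the bottom of foo/__init__.py
>     import sys
>     import types
>     import warnings
>
>     class DeprecatingModule(types.ModuleType):
>         def __getattr__(self, name):
>             if name == "bar":
>                 warnings.warn("the bar attribute will be removed soon",
>                               DeprecationWarning, stacklevel=2)
>                 return 42  # the real value, stored elsewhere in practice
>             raise AttributeError(name)
>
>     new_module = DeprecatingModule(__name__)
>     new_module.__dict__.update(globals())
>     sys.modules[__name__] = new_module
>
> The next paragraph is about why doing exactly this is hard to get
> right.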
>
> In practice, though, it's very difficult to make this work safely and
> correctly for a top-level package. The main problem is that when you
> create a new object to stick into sys.modules, this necessarily means
> creating a new namespace dict. And now you have a mess, because now
> you have two dicts: new_module.__dict__ which is the namespace you
> export, and old_module.__dict__, which is the globals() for the code
> that's trying to define the module namespace. Keeping these in sync is
> extremely error-prone -- consider what happens, e.g., when your
> package __init__.py wants to import submodules which then recursively
> import the top-level package -- so it's difficult to justify for the
> kind of large packages that might be worried about deprecating entries
> in their top-level namespace. So what we'd really like is a way to
> somehow end up with an object that (a) has the same __dict__ as the
> original module, but (b) is of our own custom module subclass. If we
> can do this then metamodules will become safe and easy to write
> correctly.
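>
> To make the failure mode concrete, here's a hypothetical
> foo/__init__.py that does the naive copy:
>
>     import sys
>     import types
>
>     class FooModule(types.ModuleType):
>         pass
>
>     new_module = FooModule(__name__)
>     new_module.__dict__.update(sys.modules[__name__].__dict__)
>     sys.modules[__name__] = new_module
>
>     # Code below this point still runs with the *old* module's
>     # __dict__ as its globals(), so this assignment lands in the old
>     # namespace and 'import foo; foo.bar' raises AttributeError:
>     bar = 1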
>
> (There's a little demo of working metamodules here:
>     https://github.com/njsmith/metamodule/
> but it uses ctypes hacks that depend on non-stable parts of the
> CPython ABI, so it's not a long-term solution.)
>
> I've now spent some time trying to hack this capability into CPython
> and I've made a list of the possible options I can think of to fix
> this. I'm writing to python-dev because none of them are obviously The
> Right Way so I'd like to get some opinions/ruling/whatever on which
> approach to follow up on.
>
> Option 1: Make it possible to change the type of a module object
> in-place, so that we can write something like
>
>     sys.modules[__name__].__class__ = MyModuleSubclass
>
> Option 1 downside: The invariants required to make __class__
> assignment safe are complicated, and only implemented for
> heap-allocated type objects. PyModule_Type is not heap-allocated, so
> making this work would require lots of delicate surgery to
> typeobject.c. I'd rather not go down that rabbit-hole.
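>
> For concreteness, here's what Option 1 would enable; today the
> assignment fails with a TypeError along the lines of "__class__
> assignment: only for heap types":
>
>     import sys
>     import types
>
>     class MyModuleSubclass(types.ModuleType):
>         def __getattr__(self, name):
>             raise AttributeError("module has no attribute %r" % name)
>
>     # TypeError in current CPython; legal if Option 1 is implemented:
>     sys.modules[__name__].__class__ = MyModuleSubclass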
>
> ----
>
> Option 2: Make PyModule_Type into a heap type allocated at interpreter
> startup, so that the above just works.
>
> Option 2 downside: PyModule_Type is exposed as a statically-allocated
> global symbol, so doing this would involve breaking the stable ABI.
>
> ----
>
> Option 3: Make it legal to assign to the __dict__ attribute of a
> module object, so that we can write something like
>
>     new_module = MyModuleSubclass(...)
>     new_module.__dict__ = sys.modules[__name__].__dict__
>     sys.modules[__name__].__dict__ = {}     # ***
>     sys.modules[__name__] = new_module
>
> The line marked *** is necessary because the way modules are designed,
> they expect to control the lifecycle of their __dict__. When the
> module object is initialized, it fills in a bunch of stuff in the
> __dict__. When the module object (not the dict object!) is
> deallocated, it deletes everything from the __dict__. This latter
> feature in particular means that having two module objects sharing the
> same __dict__ is bad news.
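>
> A quick demonstration of that cleanup behaviour in CPython:
>
>     import types
>
>     m = types.ModuleType("demo")
>     m.x = 1
>     d = m.__dict__   # keep the dict alive past the module
>     del m            # refcount hits zero; the module is deallocated
>     print(d["x"])    # prints None: the module purged its namespace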
>
> Option 3 downside: The paragraph above. Also, there's stuff inside the
> module struct besides just the __dict__, and more stuff has appeared
> there over time.
>
> ----
>
> Option 4: Add a new function sys.swap_module_internals, which takes
> two module objects and swaps their __dict__ and other attributes. By
> making the operation a swap instead of an assignment, we avoid the
> lifecycle pitfalls from Option 3. By making it a builtin, we can make
> sure it always handles all the module fields that matter, not just
> __dict__. Usage:
>
>     new_module = MyModuleSubclass(...)
>     sys.swap_module_internals(new_module, sys.modules[__name__])
>     sys.modules[__name__] = new_module
>
> Option 4 downside: Obviously a hack.
>
> ----
>
> Options 3 and 4 both seem workable; it just depends on which way we
> prefer to hold our nose. Option 4 is slightly more correct in that it
> works for *all* modules, but OTOH at the moment the only time Option 3
> *really* fails is for compiled modules with PEP 3121 metadata, and
> compiled modules can already use a module subclass via other means
> (since they instantiate their own module objects).
>
> Thoughts? Suggestions on other options I've missed? Should I go ahead
> and write a patch for one of these?
>
> -n
>

