On 12/29/2010 03:52 PM, Nick Coghlan wrote:
Disclaimer: this is a currently half-baked idea that needs some discussion here if it is going to turn into something a bit more coherent :)
On and off, I've been pondering the problem of the way implementation details (like the real file structures of the multiprocessing and unittest packages, or whether or not an interpreter uses the pure Python or the C accelerated version of various modules) leak out into the world via the __module__ attribute on various components. This mostly comes up when discussing pickle compatibility between 2.x and 3.x, but it can show up in various guises whenever you start relying on dynamic introspection.
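To make the leak concrete, here is a small sketch, assuming a CPython version where unittest is a package (3.2 or later). It only shows that the implementation submodule name is what ends up on the class and in the pickle data; the variable names are illustrative.

```python
import pickle
import unittest

# The documented name is unittest.TestCase, but the class records the
# submodule that actually defines it.
impl_module = unittest.TestCase.__module__
print(impl_module)

# Pickling a class stores a reference by module name and class name,
# so the implementation module name is embedded in the pickle data.
data = pickle.dumps(unittest.TestCase)
leaked = impl_module.encode("ascii") in data
print(leaked)
```

If the package layout ever changed, a pickle written this way would point at a module name that might no longer exist.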
As I see it, there are 3 basic ways of dealing with the problem:
1. Allow objects to lie about their source module
   This is likely a terrible idea, since a function's global namespace reference would disagree with its module reference. I suspect much weirdness would result.

2. A pickle-specific module alias registry, since that is where the problem comes up most often
   A possible approach, but not necessarily a good one (since it isn't really a pickle-specific problem).

3. An inspect-based module alias registry
   That is, an additional query API (get_canonical_module_name?) in the inspect module that translates from the implementation detail module name to the "preferred" module name. The implementation could be as simple as a "__canonical__" attribute in the module namespace.
I actually quite like option 3, with various things (such as pydoc) updated to show both names when they're different. That way people will know where to find official documentation for objects from pseudo-packages and acceleration modules (i.e. under the canonical name), without hiding where the actual implementation came from.
Pickle generation could then be updated to only send canonical module names during normal operation, reducing the exposure of implementation details like pseudo-packages and acceleration modules.
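One way to experiment with this without changing pickle itself is a Pickler subclass. The sketch below assumes Python 3.8 or later (for Pickler.reducer_override) and uses a hypothetical CANONICAL table in place of per-module __canonical__ attributes; CanonicalPickler and dumps_canonical are illustrative names, not an existing API.

```python
import importlib
import io
import pickle
import types
import unittest

# Hypothetical alias table standing in for __canonical__ attributes.
CANONICAL = {"unittest.case": "unittest"}


class CanonicalPickler(pickle.Pickler):
    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            # Modules are not picklable by default; store them by name.
            return importlib.import_module, (obj.__name__,)
        if isinstance(obj, type):
            canonical = CANONICAL.get(obj.__module__)
            if canonical is not None and "." not in obj.__qualname__:
                public = importlib.import_module(canonical)
                # Redirect only if the canonical module really exposes
                # this exact class under the same name.
                if getattr(public, obj.__name__, None) is obj:
                    return getattr, (public, obj.__name__)
        # Everything else is pickled the usual way.
        return NotImplemented


def dumps_canonical(obj):
    buf = io.BytesIO()
    CanonicalPickler(buf).dump(obj)
    return buf.getvalue()


data = dumps_canonical(unittest.TestCase)
restored = pickle.loads(data)
print(restored is unittest.TestCase)
```

Note that this stores the class as a getattr call on the canonical module rather than as a plain global reference, so it is only an approximation of what a real change to pickle generation would do.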
Whether or not runpy should set __canonical__ on the main module would be an open question (probably not, unless runpy was also updated to add the main module to sys.modules under its real name as well as __main__).
This makes more sense now that we've discussed it a bit.
Here's a rough sketch of a context manager that temporarily overrides the __module__ attribute.
This works well for simple introspection. For example, you can use it to call inspect functions without changing them.
But pickling is recursive, so this probably wouldn't work very well for that.
from contextlib import contextmanager

# Grab the bound-method type without importing the types module.
class cls:
    def method(self):
        pass

c = cls()
InstanceMethod = type(c.method)

def _getter(self, value):
    # Report __alt_module__ in place of __module__ when it is defined.
    if value == "__module__" and hasattr(self, "__alt_module__"):
        return object.__getattribute__(self, "__alt_module__")
    return object.__getattribute__(self, value)

@contextmanager
def alt_module_getter(obj):
    # Temporarily install the override on obj's class.
    obj.__class__.__getattribute__ = InstanceMethod(_getter, obj)
    try:
        yield obj
    finally:
        del obj.__class__.__getattribute__

def get_module_name(obj):
    return obj.__module__

# gets __alt_module__ if it exists, else gets __module__
with alt_module_getter(obj) as obj:
    module_name = get_module_name(obj)