On 01/09/2011 12:39 AM, Nick Coghlan wrote:
Also consider having virtual modules, where the objects in them may have come from various *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late; code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that it can be done *right* (i.e. without having to choose which use cases to support and which ones to break).
Yes, __builtins__ is a virtual module.
Creating a module in memory...
>>> import imp
>>> new = imp.new_module("new")
>>> new
<module 'new' (built-in)>
The term "(built-in)" doesn't quite fit in this case. But I can get used to it.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'new'
And it's not in sys.modules yet. That's OK: other things can be loaded into it before it is added to sys.modules.
It's this loading part that can be improved.
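The same steps in current Python 3, as a sketch: the imp module was deprecated and removed in Python 3.12, so this uses types.ModuleType instead of imp.new_module. The module name "new" is just the placeholder from the session above.

```python
import sys
import types

# Create a module object in memory (modern replacement for imp.new_module).
new = types.ModuleType("new")

# Creating a module does not register it anywhere.
print("new" in sys.modules)

# Objects can be loaded into it before anyone is able to import it.
new.greeting = "hello"

# Adding it to sys.modules is what makes `import new` find this object.
sys.modules["new"] = new
import new as imported

print(imported is new)    # True
print(imported.greeting)  # hello
```

Until the sys.modules assignment, nothing else in the interpreter can see the module, which is the window in which its contents can be populated.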
The basic problem is that __module__ currently tries to serve two masters:
1. use cases like inspect.getsource, where we want to know where the object came from in the current interpreter
2. use cases like pickle, where we want the "official" portable location, with any implementation details (like the _functools module) hidden.
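A minimal sketch of the conflict between the two use cases, assuming two hypothetical in-memory modules (_demo_impl holds the real class, demo_public is its official home). pickle records a class by its __module__, so overriding __module__ fixes use case 2, while inspect.getmodule follows the same attribute and so now points at a module that does not contain the class's source.

```python
import inspect
import pickle
import sys
import types

# Hypothetical private implementation module, built in memory for the demo.
impl = types.ModuleType("_demo_impl")
exec("class Thing:\n    pass\n", impl.__dict__)
sys.modules["_demo_impl"] = impl

# Hypothetical public module that re-exports Thing as its official home.
public = types.ModuleType("demo_public")
public.Thing = impl.Thing
sys.modules["demo_public"] = public

# By default the pickle names the implementation module (use case 2 breaks
# if _demo_impl is later renamed or reorganised).
before = pickle.dumps(impl.Thing())

# The only current tool: override __module__ to name the public location.
impl.Thing.__module__ = "demo_public"
after = pickle.dumps(impl.Thing())

print(b"_demo_impl" in before)  # True
print(b"_demo_impl" in after)   # False
print(b"demo_public" in after)  # True

# Use case 1 now suffers: introspection resolves Thing to the public module,
# not to the module that actually defines it.
print(inspect.getmodule(impl.Thing) is public)  # True
```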
Most C extensions are written as modules, to be imported and imported from. A tool to load objects rather than import them may be better in these situations.
partial = imp.load_extern_object("_functools.partial")
A loaded object would have its __module__ attribute set to the module it is loaded into, instead of the module it came from.
Doing it this way doesn't complicate the import semantics.
It may also be useful to make it a special type, so that other tools can decide how to handle them.
Currently, the default behaviour of the interpreter is to support use case 1 and break use case 2 if any objects are defined in a different module from where they claim to live (e.g. see the pickle compatibility breakage with the 3.2 unittest implementation layout changes). The only tool currently available to module authors is to override __module__ (as functools.partial and the datetime acceleration module do), which is correct for use case 2, but breaks use case 1 (leading to misleading error messages in the C acceleration module case, and breaking otherwise valid introspection in the unittest case).
My proposed changes will:
a) make overriding __module__ significantly easier to do
b) give the introspection use cases access to the information they need, so they can do the right thing when confronted with an overridden __module__ attribute
It would be better to find solutions that don't override __module__ after it has been imported or loaded.
Does this fit some of the problems you are thinking of, where the granularity may matter?
It would take two functions to do this: one to create the virtual module, and another to pre-load its initial objects. For those objects, the loader would set obj.__module__ to the virtual module name, and also set obj.__original_module__ to the original module name. These attributes would only be seen on objects in virtual modules. A lookup of obj.__module__ would tell you the object is in a virtual module; a lookup of obj.__original_module__ would then give you the actual location it came from.
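A sketch of the two functions described above, under stated assumptions: create_virtual_module and preload are hypothetical names, __original_module__ is the attribute proposed here (not a Python feature), and the demo source module _demo_source is built in memory.

```python
import importlib
import sys
import types


def create_virtual_module(name):
    """Hypothetical helper 1: create and register an empty virtual module."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    return module


def preload(virtual, dotted_names):
    """Hypothetical helper 2: load (not import) objects into a virtual module.

    Each "module.attr" object gets __module__ set to the virtual module and
    its real origin recorded in the proposed __original_module__ attribute.
    """
    for dotted in dotted_names:
        module_name, _, attr = dotted.rpartition(".")
        obj = getattr(importlib.import_module(module_name), attr)
        try:
            obj.__original_module__ = obj.__module__
            obj.__module__ = virtual.__name__
        except (AttributeError, TypeError):
            # Built-in/C types typically reject new attributes; such objects
            # are exposed unchanged.
            pass
        setattr(virtual, attr, obj)


# Demo source module, built in memory (placeholder name).
source = types.ModuleType("_demo_source")
exec("def helper():\n    return 'hi'\n", source.__dict__)
sys.modules["_demo_source"] = source

virtual = create_virtual_module("demo_virtual")
preload(virtual, ["_demo_source.helper"])

print(virtual.helper.__module__)           # demo_virtual
print(virtual.helper.__original_module__)  # _demo_source
```

Note that preload mutates the shared object, so the change is also visible through _demo_source.helper; a real design would have to decide whether that is acceptable.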
That adds a lot of complexity though - far simpler to define a new __impl_module__ attribute on every object, retroactively fixing introspection of existing code that adjusts __module__ to make pickling work properly across different versions and implementations.
By doing it that way, most people will never need to know how these things work or even see them, i.e., it's advanced/expert Python-fu. ;-)
Most people will never need to care or worry about the difference between __module__ and __impl_module__ either - it will be hidden inside libraries like inspect, pydoc and pickle.
I think __impl_module__ should only be on objects where it would be different than __module__.
Anyway, I hope this gives you some ideas; I know you can figure out the details much better than I can.
Yeah, the idea of reusing the __module__ attribute name at the top level is an excellent one.
The hard part of all of this is separating the good, doable ideas from the good ideas that unfortunately can't be done because they would break something.