On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan wrote:
On Wed, Jan 5, 2011 at 8:52 AM, Guido van Rossum wrote:
Hmm... I starred this and am finally dug out enough to comment.
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
You can currently get a crude version of this by simply assigning to __name__ at the top of the module.
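As a minimal sketch of that crude version: classes and functions take their __module__ from whatever __name__ is in the module globals at the moment they are defined. The module names "pkg" and "pkg._impl" are hypothetical, and exec() stands in for a real import.

```python
# Source of a hypothetical implementation module, "pkg._impl".
SOURCE = '''
__name__ = "pkg"  # reassigned before any class or function is defined

class Case:
    pass

def helper():
    pass
'''

# Execute the source in a namespace that starts out with the "physical" name,
# roughly as the import system would.
namespace = {"__name__": "pkg._impl"}
exec(SOURCE, namespace)

Case = namespace["Case"]
helper = namespace["helper"]

print(Case.__module__)    # pkg
print(helper.__module__)  # pkg
```

The catch is that __name__ itself now reports "pkg" for everything in the module, which is the confusing part.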
That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__?
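The lookup order being proposed can be sketched as a plain function. This only emulates the idea: the interpreter does not consult __canonical__, and the helper name initial_module_name is made up for illustration.

```python
def initial_module_name(module_globals):
    """Return the name a new class or function would record as __module__.

    Hypothetical rule from the proposal: prefer __canonical__ if the module
    defines it, otherwise fall back to __name__.
    """
    canonical = module_globals.get("__canonical__")
    if canonical is not None:
        return canonical
    return module_globals.get("__name__")


plain = {"__name__": "unittest.case"}
relocated = {"__name__": "unittest.case", "__canonical__": "unittest"}

print(initial_module_name(plain))      # unittest.case
print(initial_module_name(relocated))  # unittest
```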
This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace.
So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__init__.py?
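A rough sketch of that explicit patching, done from the package's initialisation module. The helper export_classes and the module names "pkg" and "pkg.case" are assumptions for illustration, not how unittest is actually written; types.ModuleType objects stand in for real modules.

```python
import inspect
import types


def export_classes(package, implementation, names):
    """Re-export the named classes in `package` and rename their __module__."""
    for name in names:
        obj = getattr(implementation, name)
        if inspect.isclass(obj):
            obj.__module__ = package.__name__
        setattr(package, name, obj)


# Stand-ins for a package and one of its implementation submodules.
pkg = types.ModuleType("pkg")
impl = types.ModuleType("pkg.case")
exec("class TestCase:\n    pass\n\nclass _Private:\n    pass\n", impl.__dict__)

export_classes(pkg, impl, ["TestCase"])

print(pkg.TestCase.__module__)   # pkg
print(impl._Private.__module__)  # pkg.case
```

Only the classes that are listed get renamed, so anything that is not re-exported keeps its physical module name.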
I did think about that - for classes, it would probably be sufficient, but for functions the fact that we'd be breaking the identity that "f.__globals__ is sys.modules[f.__module__]" scares me.
Really? Why? Who would ever depend on that? (You also probably meant sys.modules[...].__dict__ -- f.__globals__ is a dict, not a module object.) Note that for classes you'd have the same issue, since each method references the module globals in its f.__globals__.
Then again, the fact that "f.__module__ != f.__globals__['__name__']" would provide exactly the indicator of "two names" that I am talking about (at least where functions are concerned) - things like pydoc and the inspect module could definitely be updated to check both module names.
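To make the identity and the "two names" indicator concrete, here is a small sketch. The helper has_two_names and the module names "pkg" and "pkg._impl" are hypothetical; json.dumps is used only as a convenient function defined directly in a stdlib module.

```python
import json
import sys


def has_two_names(func):
    """True when func.__module__ differs from the __name__ in its globals."""
    return func.__module__ != func.__globals__.get("__name__")


# The identity under discussion, with the .__dict__ correction applied.
identity_holds = json.dumps.__globals__ is sys.modules[json.dumps.__module__].__dict__
print(identity_holds)                # True
print(has_two_names(json.dumps))     # False

# Define a function in a hypothetical module, then relocate it by hand.
namespace = {"__name__": "pkg._impl"}
exec("def helper():\n    pass\n", namespace)
helper = namespace["helper"]
helper.__module__ = "pkg"

print(has_two_names(helper))               # True
print(helper.__globals__ is namespace)     # True: the globals did not move
```

A tool such as pydoc could use a check like this to decide whether to report both names.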
I think the more important question to answer first would be what you'd want pydoc and inspect to do.
On the gripping hand, there would still be problems with things like methods and nested classes and functions (unless tools were provided to recurse down through a class to update the subcomponents as well as the class itself).
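One possible shape for such a recursing tool, as a sketch only. The name relocate is made up, and it deliberately touches only functions and nested classes that were defined in the same module as the class being moved.

```python
import inspect


def relocate(obj, new_module):
    """Set __module__ on obj and, for classes, on members defined alongside it."""
    old_module = obj.__module__
    if old_module == new_module:
        return
    obj.__module__ = new_module
    if not inspect.isclass(obj):
        return
    for member in vars(obj).values():
        # Look through staticmethod/classmethod wrappers to the function.
        member = getattr(member, "__func__", member)
        if inspect.isfunction(member) or inspect.isclass(member):
            # Skip members that were imported from some other module.
            if member.__module__ == old_module:
                relocate(member, new_module)


# A class defined in a hypothetical module "pkg._impl".
namespace = {"__name__": "pkg._impl"}
exec(
    "class Outer:\n"
    "    class Inner:\n"
    "        def method(self):\n"
    "            pass\n"
    "    @staticmethod\n"
    "    def make():\n"
    "        pass\n",
    namespace,
)
Outer = namespace["Outer"]
relocate(Outer, "pkg")

print(Outer.__module__)               # pkg
print(Outer.Inner.method.__module__)  # pkg
print(Outer.make.__module__)          # pkg
```

Functions nested inside method bodies are created at call time and would still pick up the name from the module globals, so a tool like this cannot reach them.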
Well, method references (even unbound) are not picklable anyway.
So perhaps the granularity on my initial suggestion wasn't fine enough - if the "__canonical__" idea was extended to all objects with a __module__ attribute, then objects could either be relocated at creation time (by setting __canonical__ in the module globals), or after the fact by assigning to the __canonical__ attribute on the object.
BTW, I think we need to come up with a better word than __canonical__. In general I don't like using adjectives as attribute names.
OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python <filename>") might actually work.
Yes, although a related modification is needed in those cases (to actually insert the module being executed into sys.modules under its module name as well as under __main__).
That's the easy part. The hard part is to make the "real name" (i.e. not __main__) the name used by the classes and functions it defines, without breaking the "if __name__ == '__main__': main()" idiom...
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
I can't take credit for that particular observation - I've certainly heard others complain about that in the context of pickling objects over the years. It is one of the main things that got me thinking along these lines in the first place.
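The pickling complaint comes from pickle storing classes by reference, that is, by module name plus class name. A minimal sketch, assuming a made-up module name "realname_demo" registered in sys.modules to stand in for a script's importable name:

```python
import pickle
import sys
import types

# Hypothetical module standing in for a script's real, importable name.
module = types.ModuleType("realname_demo")
exec("class Point:\n    pass\n", module.__dict__)
sys.modules[module.__name__] = module

payload = pickle.dumps(module.Point())

# The class is not copied into the pickle; only its module and name are.
print(b"realname_demo" in payload)      # True

restored = pickle.loads(payload)
print(type(restored) is module.Point)   # True
```

When the same class is defined in a file run as a script, the recorded module name is "__main__" instead, and a process that loads the pickle looks the class up in its own __main__, which is usually a different program.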
Why didn't you say so in the first place? :-) I think it's easier to come up with a solution for just this case; the issue with e.g. unittest doesn't seem quite as hard (after all, "unittest.case" will always exist). We could just call it __real_name__ and use that in preference over __name__ for all __module__ attributes whenever it's set. (Or we could always set both...)

--
--Guido van Rossum (python.org/~guido)