On Wed, Jan 5, 2011 at 2:47 PM, Guido van Rossum <guido@python.org> wrote:
On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I can't take credit for that particular observation - I've certainly heard others complain about that in the context of pickling objects over the years. It is one of the main things that got me thinking along these lines in the first place.
Why didn't you say so in the first place? :-)
Well, I did put that "half-baked" disclaimer in for a reason... I'm still trying to figure out exactly what I think the real problem here is, so my expression of it is probably as clear as mud :)
I think it's easier to come up with a solution for just this case; the issue with e.g. unittest doesn't seem quite as hard (after all, "unittest.case" will always exist).
Perhaps it would focus the discussion if we picked one or two modules (in addition to __main__) as example cases.
functools comes in two pieces - partial and reduce are implemented in C in the _functools module, everything else is implemented in Python in functools itself. datetime, on the other hand, is a case of a pure acceleration module - if _datetime is available, it is expected to completely implement the datetime API.
_functools.partial and the classes in datetime all adopt the strategy of lying about their original location in __module__. This is probably the best available choice, as it makes pickling do the right thing.
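As a rough illustration of the point above (a sketch assuming CPython; other implementations may differ in the details), the advertised location can be read straight off the objects, and a pickle round trip goes through the public module name:

```python
import datetime
import functools
import pickle
import sys

# Both report the public module name, whether or not a C accelerator
# supplied the implementation.
partial_home = functools.partial.__module__
date_home = datetime.date.__module__

# Whether the accelerator module has been imported (a CPython detail).
accelerated = "_datetime" in sys.modules

# Pickle records the advertised module name, so the data can be loaded
# by an interpreter that only has the pure Python implementation.
original = datetime.date(2011, 1, 5)
payload = pickle.dumps(original)
restored = pickle.loads(payload)

print(partial_home, date_home, accelerated, restored == original)
```

The pickle payload refers to "datetime" rather than "_datetime", which is exactly why the lie in __module__ is useful here.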
The main downside with this approach is the way it confuses things like inspect.getsource. For datetime, it reports the pure Python versions as the source code for the C accelerated versions; for functools.partial, it gives a technically accurate but potentially misleading error message. If inspect could easily tell that the accelerated versions were in use, it could handle the situation a bit more gracefully.
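A small probe makes it easy to see what inspect does with such objects on a given interpreter. This is only a sketch: describe_source is a hypothetical helper name, and the outcome for any particular object depends on the Python version and on whether the accelerator is in use, so no specific result is claimed here.

```python
import functools
import inspect


def describe_source(obj):
    """Return (advertised module, what inspect.getsource did with obj)."""
    try:
        inspect.getsource(obj)
        outcome = "source found"
    except (OSError, TypeError) as exc:
        # OSError: the source could not be located;
        # TypeError: the object is a builtin with no Python source.
        outcome = "no source: " + type(exc).__name__
    return obj.__module__, outcome


print(describe_source(functools.partial))
print(describe_source(len))
```

Either way, inspect only has __module__ to go on, so it cannot tell the caller that the object actually came from an accelerator module.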
To eliminate that issue, what if, whenever we're setting a __module__ attribute (e.g. during class creation), we also set a "__real_module__" attribute? Then code could happily adjust __module__ to point to the official location (as it already does), but tools like inspect wouldn't be fooled regarding the state of the current interpreter. Most of the time, __module__ and __real_module__ will point to the same place, but the cases where they're different will be handled far more gracefully.
(I suspect that is significantly easier said than done though - I expect it would be a very manual process getting an extension module to do this correctly.)
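To make the idea concrete, here is a minimal pure Python sketch. The __real_module__ attribute is only the proposal from this thread, not something the interpreter sets today, and relocate, defining_module and FastDate are hypothetical names made up for the illustration:

```python
def relocate(official_module):
    """Class decorator: advertise official_module in __module__ while
    remembering the true definition site in __real_module__."""
    def decorator(cls):
        cls.__real_module__ = cls.__module__  # where it was really defined
        cls.__module__ = official_module      # where pickle should look
        return cls
    return decorator


def defining_module(obj):
    """What an inspect-like tool could consult: prefer __real_module__
    and fall back to __module__ when nothing was relocated."""
    return getattr(obj, "__real_module__", obj.__module__)


@relocate("datetime")
class FastDate:
    """Stand-in for a class supplied by an accelerator module."""


print(FastDate.__module__)        # the official location
print(defining_module(FastDate))  # the module this code actually ran in
print(defining_module(int))       # unchanged for ordinary objects
```

In the proposal itself the interpreter would set both attributes during class creation, so the decorator only stands in for that step.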
We could just call it __real_name__ and use that in preference over __name__ for all __module__ attributes whenever it's set. (Or we could always set both...)
The stuff I wrote above applies to pretty much everything except the __main__ module. For the __main__ module, I'm inclined to revisit Brett's idea from PEP 3122: put the real name of the __main__ module in a sys.main attribute. However, unlike that PEP, we would continue to set __name__ to "__main__" in the main module. The new attribute would be a transition step allowing manual reversal of the name mangling:
# Near top of module
running_as_main = False
if __name__ == "__main__":
    running_as_main = True
    import sys
    __name__ = sys.main  # sys.main is the proposed attribute

# Rest of module

# Near end of module
if running_as_main:
    # Actually do "main" type stuff.
    pass
Alternatively, we could just do nothing about the problem with __main__ and continue to encourage people to separate their "main" modules from the modules that define classes.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia