[Python-ideas] Module aliases and/or "real names"

Wed Jan 5 13:15:28 CET 2011

On Wed, Jan 5, 2011 at 2:47 PM, Guido van Rossum <guido at python.org> wrote:
> On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I can't take credit for that particular observation - I've certainly
>> heard others complain about that in the context of pickling objects
>> over the years. It is one of the main things that got me thinking
>> along these lines in the first place.
>
> Why didn't you say so in the first place? :-)

Well, I did put that "half-baked" disclaimer in for a reason... I'm
still trying to figure out exactly what I think the real problem here
is, so my expression of it is probably as clear as mud :)

> I think it's easier to come up with a solution for just this case; the
> issue with e.g. unittest doesn't seem quite as hard (after all,
> "unittest.case" will always exist).

Perhaps it would focus the discussion if we picked one or two modules
(in addition to __main__) as example cases.

functools comes in two pieces - partial and reduce are implemented in
C in the _functools module, everything else is implemented in Python
in functools itself.

datetime, on the other hand, is a case of a pure acceleration module -
if _datetime is available, it is expected to completely implement the
datetime API.

_functools.partial and the classes in datetime all adopt the strategy
of lying about their original location in __module__. This is probably
the best available choice, as it makes pickling do the right thing.
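
As a quick way to see this in a running interpreter, here is a minimal
sketch. It assumes a CPython build where the C accelerators are in use;
the date value is only an illustration.

```python
import datetime
import functools
import pickle

# Both types report the public module in __module__, even when the
# implementation actually lives in a C accelerator module.
print(functools.partial.__module__)
print(datetime.date.__module__)

# Pickle stores classes by reference (module name plus class name), so
# the name found in __module__ is what ends up in the payload.
payload = pickle.dumps(datetime.date(2011, 1, 5))
restored = pickle.loads(payload)
print(restored)
```

Because the payload refers to the public name, it should still load on
an interpreter where the accelerator module is not available.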

The main downside with this approach is the way it confuses things
like inspect.getsource. For datetime, it reports the pure Python
versions as the source code for the C accelerated versions; for
functools.partial, it gives a technically accurate but potentially
misleading error message. If inspect could easily *tell* that the
accelerated versions were in use, then it could handle the situation a
bit more gracefully.
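
To poke at this yourself, something like the following sketch may help.
describe_source is a hypothetical helper name, and what it reports for
each object depends on the Python version and on whether the
accelerators are in use, so no particular output is claimed here.

```python
import datetime
import functools
import inspect


def describe_source(obj):
    """Return a short note on what inspect.getsource does for obj."""
    try:
        source = inspect.getsource(obj)
    except (OSError, TypeError) as exc:
        # inspect raises one of these when it cannot locate Python source.
        return "no source: %s" % exc
    return "found %d lines of Python source" % len(source.splitlines())


for candidate in (functools.partial, datetime.date):
    print(candidate.__module__, candidate.__name__, describe_source(candidate))
```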

To eliminate that issue, what if, whenever we're setting a __module__
attribute (e.g. during class creation), we also set a
"__real_module__" attribute? Then code could happily adjust __module__
to point to the official location (as it already does), but tools like
inspect wouldn't be fooled regarding the state of the *current*
interpreter. Most of the time, __module__ and __real_module__ will
point to the same place, but the cases where they're different will be
handled far more gracefully.
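
A rough pure Python sketch of the idea follows. The "__real_module__"
attribute is only the proposal above and is not set by any current
interpreter; publish_as, defining_module and the "fastlib" name are
hypothetical, made up for this illustration.

```python
def publish_as(public_module):
    """Class decorator: advertise public_module, remember the real one."""
    def decorator(obj):
        # Record where the object was actually defined (proposed attribute).
        obj.__real_module__ = obj.__module__
        # Advertise the official location, as accelerated types do today.
        obj.__module__ = public_module
        return obj
    return decorator


def defining_module(obj):
    """What a tool like inspect could consult instead of __module__."""
    return getattr(obj, "__real_module__", obj.__module__)


@publish_as("fastlib")
class Accelerated:
    pass


print(Accelerated.__module__)        # the advertised location
print(defining_module(Accelerated))  # where it was really defined
```

Objects that never had __module__ adjusted simply fall back to
__module__, which matches the "most of the time they are the same" case.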

(I suspect that is significantly easier said than done though - I
expect it would be a very manual process getting an extension module
to do this correctly)

> We could just call it __real_name__ and use that in preference over
> __name__ for all __module__ attributes whenever it's set. (Or we could
> always set both...)

The stuff I wrote above applies to pretty much everything *except* the
__main__ module. For the __main__ module, I'm inclined to revisit
Brett's idea from PEP 3122: put the real name of the __main__ module
in a sys.main attribute. However, unlike that PEP, we would continue
to set __name__ to "__main__" in the main module. The new attribute
would be a transition step allowing manual reversal of the name
mangling:

  # Near top of module
  running_as_main = False
  if __name__ == "__main__":
    running_as_main = True
    import sys
    __name__ = sys.main  # proposed attribute, does not exist yet

  # Rest of module

  # Near end of module
  if running_as_main:
    # Actually do "main" type stuff.
    pass

Alternatively, we could just do nothing about the problem with
__main__ and continue to encourage people to separate their "main"
modules from the modules that define classes.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia