Module aliases and/or "real names"

Disclaimer: this is a currently half-baked idea that needs some discussion here if it is going to turn into something a bit more coherent :)
On and off, I've been pondering the problem of the way implementation details (like the real file structures of the multiprocessing and unittest packages, or whether or not an interpreter uses the pure Python or the C accelerated version of various modules) leak out into the world via the __module__ attribute on various components. This mostly comes up when discussing pickle compatibility between 2.x and 3.x, but it can show up in various guises whenever you start relying on dynamic introspection.
As I see it, there are 3 basic ways of dealing with the problem:
1. Allow objects to lie about their source module
   This is likely a terrible idea, since a function's global namespace reference would disagree with its module reference. I suspect much weirdness would result.
2. A pickle-specific module alias registry, since that is where the problem comes up most often
   A possible approach, but not necessarily a good one (since it isn't really a pickle-specific problem).
3. An inspect-based module alias registry
   That is, an additional query API (get_canonical_module_name?) in the inspect module that translates from the implementation detail module name to the "preferred" module name. The implementation could be as simple as a "__canonical__" attribute in the module namespace.
I actually quite like option 3, with various things (such as pydoc) updated to show *both* names when they're different. That way people will know where to find official documentation for objects from pseudo-packages and acceleration modules (i.e. under the canonical name), without hiding where the actual implementation came from.
Pickle *generation* could then be updated to only send canonical module names during normal operation, reducing the exposure of implementation details like pseudo-packages and acceleration modules.
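A minimal sketch of what option 3 might look like, assuming the "__canonical__" module attribute described above. Neither get_canonical_module_name nor __canonical__ exists in the standard library; both names come from this proposal, and the "_fastthing" / "fastthing" modules are made up for illustration.

```python
import sys
import types


def get_canonical_module_name(obj):
    """Return the preferred module name for obj (hypothetical helper).

    Looks up the module named by obj.__module__ and returns its
    __canonical__ attribute if one is declared, else the plain name.
    """
    module_name = getattr(obj, "__module__", None)
    module = sys.modules.get(module_name)
    # Fall back to the implementation name when no alias is declared.
    return getattr(module, "__canonical__", module_name)


# Stand-in for an acceleration module that declares its official name.
impl = types.ModuleType("_fastthing")
impl.__canonical__ = "fastthing"
sys.modules["_fastthing"] = impl
exec("class Gadget:\n    pass\n", impl.__dict__)

implementation_name = impl.Gadget.__module__
canonical_name = get_canonical_module_name(impl.Gadget)
print(implementation_name, "->", canonical_name)
```

A tool like pydoc could show both names whenever the two values differ, while pickle generation could emit only the canonical one.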
Whether or not runpy should set __canonical__ on the main module would be an open question (probably not, *unless* runpy was also updated to add the main module to sys.modules under its real name as well as __main__).
Cheers, Nick.

On 12/29/2010 03:52 PM, Nick Coghlan wrote:
Disclaimer: this is a currently half-baked idea that needs some discussion here if it is going to turn into something a bit more coherent :)
Sometimes half baked is good as it gets at the concept, rather than being bogged down with the details. ;-)
On and off, I've been pondering the problem of the way implementation details (like the real file structures of the multiprocessing and unittest packages, or whether or not an interpreter uses the pure Python or the C accelerated version of various modules) leak out into the world via the __module__ attribute on various components. This mostly comes up when discussing pickle compatibility between 2.x and 3.x, but it can show up in various guises whenever you start relying on dynamic introspection.
This sounds like two different separate issues to me.
One is the leaking-out of lower level details.
The other is abstracting a framework with the minimal amount of details needed.
Ron

On Thu, Dec 30, 2010 at 11:48 AM, Ron Adam rrr@ronadam.com wrote:
This sounds like two different separate issues to me.
One is the leaking-out of lower level details.
The other is abstracting a framework with the minimal amount of details needed.
Yeah, sort of. Really, the core issue is that some objects live in two places:
- where they came from right now, in the current interpreter
- where they should be retrieved from "officially" (e.g. since another interpreter may not provide an accelerated version, or because the appropriate submodule may be selected at runtime based on the current platform)
There's currently no systematic way of flagging objects or modules where the latter location differs from the former, so the components that leak the low level details (such as pickling and pydoc) have no way to avoid it. Once a system is in place to identify such objects (or perhaps just the affected modules), then the places that leak that information can be updated to deal with the situation appropriately (e.g. pickling would likely just use the official names, while pydoc would display both, indicating which one was the 'official' location, and which one reflected the current interpreter behaviour).
So it's really one core problem (non-portable module details), which then leads to an assortment of smaller problems when other parts of the standard library are forced to rely on those non-portable details because that's the only information available.
Cheers, Nick.

Hmm... I starred this and am finally dug out enough to comment.
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
You can currently get a crude version of this by simply assigning to __name__ at the top of the module.
That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__?
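The "crude version" mentioned above (assigning to __name__ at the top of the module) can be observed with a small sketch. The module names "shapes" and "shapes._impl" are made up, and the module body is executed from a string only so that the example is self-contained.

```python
import types

# Source of a stand-in implementation module. The first statement is the
# crude relocation: it overwrites the module's own __name__.
source = '''
__name__ = "shapes"

class Circle:
    pass

def area(radius):
    return 3.14159 * radius * radius
'''

impl = types.ModuleType("shapes._impl")
exec(source, impl.__dict__)

# Classes and functions pick up __module__ from the __name__ global that is
# in effect when they are created, so both report the overwritten name.
print(impl.Circle.__module__)
print(impl.area.__module__)
```

The side effect is that the module object itself now also claims to be "shapes", which is presumably part of why this approach is described as confusing.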
This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace.
So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__init__.py?
OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python <filename>") might actually work.
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
--Guido
On Thu, Dec 30, 2010 at 6:52 PM, Nick Coghlan ncoghlan@gmail.com wrote:
On Thu, Dec 30, 2010 at 11:48 AM, Ron Adam rrr@ronadam.com wrote:
This sounds like two different separate issues to me.
One is the leaking-out of lower level details.
The other is abstracting a framework with the minimal amount of details needed.
Yeah, sort of. Really, the core issue is that some objects live in two places:
- where they came from right now, in the current interpreter
- where they should be retrieved from "officially" (e.g. since another interpreter may not provide an accelerated version, or because the appropriate submodule may be selected at runtime based on the current platform)
There's currently no systematic way of flagging objects or modules where the latter location differs from the former, so the components that leak the low level details (such as pickling and pydoc) have no way to avoid it. Once a system is in place to identify such objects (or perhaps just the affected modules), then the places that leak that information can be updated to deal with the situation appropriately (e.g. pickling would likely just use the official names, while pydoc would display both, indicating which one was the 'official' location, and which one reflected the current interpreter behaviour).
So it's really one core problem (non-portable module details), which then leads to an assortment of smaller problems when other parts of the standard library are forced to rely on those non-portable details because that's the only information available.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

On 01/04/2011 04:52 PM, Guido van Rossum wrote:
Hmm... I starred this and am finally dug out enough to comment.
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
You can currently get a crude version of this by simply assigning to __name__ at the top of the module.
That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__?
This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace.
So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__init__.py?
OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python <filename>") might actually work.
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
(This is probably something that was suggested more than a few times before.)
Would it help if global name space acquired a __main__ name? Then the standard if line becomes only a slightly different "if __name__ == __main__: main()". I think that would make more sense to beginners also and it is a bit less magical.
For now, both ways could work, __main__ would be "__main__" or None, but down the road, (long enough to be sure everyone knows to drop the quotes), both __main__ and __name__ could be switched to the actual module name so that __name__ and __module__ attributes would always be correct.
Cheers, Ron

On Wed, Jan 5, 2011 at 11:48 AM, Ron Adam rrr@ronadam.com wrote:
(This is probably something that was suggested more than a few times before.)
Would it help if global name space acquired a __main__ name? Then the standard if line becomes only a slightly different "if __name__ == __main__: main()". I think that would make more sense to beginners also and it is a bit less magical.
For now, both ways could work, __main__ would be "__main__" or None, but down the road, (long enough to be sure everyone knows to drop the quotes), both __main__ and __name__ could be switched to the actual module name so that __name__ and __module__ attributes would always be correct.
If we decided to actually change the way the main module was executed, the most likely result would be to resurrect PEP 299. Changing that particular idiom is probably a Py4k scale of change though :P
Cheers, Nick.

On 01/04/2011 08:00 PM, Nick Coghlan wrote:
On Wed, Jan 5, 2011 at 11:48 AM, Ron Adam rrr@ronadam.com wrote:
(This is probably something that was suggested more than a few times before.)
Would it help if global name space acquired a __main__ name? Then the standard if line becomes only a slightly different "if __name__ == __main__: main()". I think that would make more sense to beginners also and it is a bit less magical.
For now, both ways could work, __main__ would be "__main__" or None, but down the road, (long enough to be sure everyone knows to drop the quotes), both __main__ and __name__ could be switched to the actual module name so that __name__ and __module__ attributes would always be correct.
If we decided to actually change the way the main module was executed, the most likely result would be to resurrect PEP 299. Changing that particular idiom is probably a Py4k scale of change though :P
Well, changing it in the way PEP 299 suggests is probably even a Py5k change. Which is why I didn't suggest that. ;-)
Also, PEP 299's main motivation is different from what is being discussed here.
Cheers, Ron

On Wed, Jan 5, 2011 at 8:52 AM, Guido van Rossum guido@python.org wrote:
Hmm... I starred this and am finally dug out enough to comment.
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
You can currently get a crude version of this by simply assigning to __name__ at the top of the module.
That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__?
This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace.
So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__init__.py?
I did think about that - for classes, it would probably be sufficient, but for functions the fact that we'd be breaking the identity that "f.__globals__ is sys.modules[f.__module__]" scares me. Then again, the fact that "f.__module__ != f.__globals__['__name__']" would provide exactly the indicator of "two names" that I am talking about (at least where functions are concerned) - things like pydoc and the inspect module could definitely be updated to check both module names. On the gripping hand, there would still be problems with things like methods and nested classes and functions (unless tools were provided to recurse down through a class to update the subcomponents as well as the class itself).
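A small sketch of the identity discussed above and of the "two names" indicator. It uses the form with .__dict__, since f.__globals__ is a dict rather than a module object. The module names "demo_impl" and "demo" and the helper has_two_names are hypothetical.

```python
import sys
import types

# Stand-in implementation module holding two plain functions.
impl = types.ModuleType("demo_impl")
sys.modules["demo_impl"] = impl
exec("def f():\n    return 1\n\ndef g():\n    return 2\n", impl.__dict__)
f, g = impl.f, impl.g

# Before relocation the function's globals are its module's namespace.
identity_holds_before = f.__globals__ is sys.modules[f.__module__].__dict__

# Re-export f from a public module and relocate it there.
public = types.ModuleType("demo")
public.f = f
sys.modules["demo"] = public
f.__module__ = "demo"

# After relocation the identity no longer holds for f.
identity_holds_after = f.__globals__ is sys.modules[f.__module__].__dict__


def has_two_names(func):
    """True when the claimed module differs from the defining one."""
    return func.__module__ != func.__globals__["__name__"]


print(identity_holds_before, identity_holds_after)
print(has_two_names(f), has_two_names(g))
```

This check only covers plain functions; methods and nested classes would need the kind of recursive handling mentioned above.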
So perhaps the granularity on my initial suggestion wasn't fine enough - if the "__canonical__" idea was extended to all objects with a __module__ attribute, then objects could either be relocated at creation time (by setting __canonical__ in the module globals), or after the fact by assigning to the __canonical__ attribute on the object.
OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python <filename>") might actually work.
Yes, although a related modification is needed in those cases (to actually insert the module being executed into sys.modules under its module name as well as under __main__).
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
I can't take credit for that particular observation - I've certainly heard others complain about that in the context of pickling objects over the years. It is one of the main things that got me thinking along these lines in the first place.
Cheers, Nick.

On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan ncoghlan@gmail.com wrote:
On Wed, Jan 5, 2011 at 8:52 AM, Guido van Rossum guido@python.org wrote:
Hmm... I starred this and am finally dug out enough to comment.
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
You can currently get a crude version of this by simply assigning to __name__ at the top of the module.
That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__?
This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace.
So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__init__.py?
I did think about that - for classes, it would probably be sufficient, but for functions the fact that we'd be breaking the identity that "f.__globals__ is sys.modules[f.__module__]" scares me.
Really? Why? Who would ever depend on that? (You also probably meant sys.modules[...].__dict__ -- f.__globals__ is a dict, not a module object.)
Note that for classes you'd have the same issue, since each method references the module globals in its f.__globals__.
Then again, the fact that "f.__module__ != f.__globals__['__name__']" would provide exactly the indicator of "two names" that I am talking about (at least where functions are concerned) - things like pydoc and the inspect module could definitely be updated to check both module names.
I think the more important question to answer first would be what you'd want pydoc and inspect to do.
On the gripping hand, there would still be problems with things like methods and nested classes and functions (unless tools were provided to recurse down through a class to update the subcomponents as well as the class itself).
Well, method references (even unbound) are not picklable anyway.
So perhaps the granularity on my initial suggestion wasn't fine enough - if the "__canonical__" idea was extended to all objects with a __module__ attribute, then objects could either be relocated at creation time (by setting __canonical__ in the module globals), or after the fact by assigning to the __canonical__ attribute on the object.
BTW, I think we need to come up with a better word than __canonical__. In general I don't like using adjectives as attribute names.
OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python <filename>") might actually work.
Yes, although a related modification is needed in those cases (to actually insert the module being executed into sys.modules under its module name as well as under __main__).
That's the easy part.
The hard part is to make the "real name" (i.e. not __main__) the name used by the classes and functions it defines, without breaking the "if __name__ == '__main__': main()" idiom...
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
I can't take credit for that particular observation - I've certainly heard others complain about that in the context of pickling objects over the years. It is one of the main things that got me thinking along these lines in the first place.
Why didn't you say so in the first place? :-)
I think it's easier to come up with a solution for just this case; the issue with e.g. unittest doesn't seem quite as hard (after all, "unittest.case" will always exist).
We could just call it __real_name__ and use that in preference over __name__ for all __module__ attributes whenever it's set. (Or we could always set both...)

On Wed, Jan 5, 2011 at 2:47 PM, Guido van Rossum guido@python.org wrote:
On Tue, Jan 4, 2011 at 5:55 PM, Nick Coghlan ncoghlan@gmail.com wrote:
I can't take credit for that particular observation - I've certainly heard others complain about that in the context of pickling objects over the years. It is one of the main things that got me thinking along these lines in the first place.
Why didn't you say so in the first place? :-)
Well, I did put that "half-baked" disclaimer in for a reason... I'm still trying to figure out exactly what I think the real problem here is, so my expression of it is probably as clear as mud :)
I think it's easier to come up with a solution for just this case; the issue with e.g. unittest doesn't seem quite as hard (after all, "unittest.case" will always exist).
Perhaps it would focus the discussion if we picked one or two modules (in addition to __main__) as example cases.
functools comes in two pieces - partial and reduce are implemented in C in the _functools module, everything else is implemented in Python in functools itself. datetime, on the other hand, is a case of a pure acceleration module - if _datetime is available, it is expected to completely implement the datetime API.
_functools.partial and the classes in datetime all adopt the strategy of lying about their original location in __module__. This is probably the best available choice, as it makes pickling do the right thing.
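The strategy described above can be checked directly. This is a sketch of the behaviour as seen on CPython; other interpreters may build these modules differently.

```python
import datetime
import functools
import pickle

# The C accelerated objects report the official module, not the
# underscore-prefixed implementation module.
partial_module = functools.partial.__module__
date_module = datetime.date.__module__
print(partial_module, date_module)

# Pickling by reference records whatever __module__ says, so the payload
# names "functools" and loads back to the same object.
payload = pickle.dumps(functools.partial)
restored = pickle.loads(payload)
print(restored is functools.partial)
```

Because the pickle only mentions the official name, it can be loaded by an interpreter that lacks the accelerated module.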
The main downside with this approach is the way it confuses things like inspect.getsource (for datetime, it reports the pure Python versions as the source code for the C accelerated versions, for functools.partial it gives a technically accurate, but potentially misleading error message. If inspect could easily *tell* that the accelerated versions were in use, then it could handle the situation a bit more gracefully).
To eliminate that issue, what if, whenever we're setting a __module__ attribute (e.g. during class creation), we also set a "__real_module__" attribute? Then code could happily adjust __module__ to point to the official location (as it already does), but tools like inspect wouldn't be fooled regarding the state of the *current* interpreter. Most of the time, __module__ and __real_module__ will point to the same place, but the cases where they're different will be handled far more gracefully.
(I suspect that is significantly easier said than done though - I expect it would be a very manual process getting an extension module to do this correctly)
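A rough sketch of how the "__real_module__" idea could behave, done by hand for a Python-level class. The attribute is only a proposal in this thread and is not set by any interpreter; relocate, real_module_name, and the "records" name are hypothetical.

```python
def relocate(obj, official_name):
    """Point __module__ at the official location, keeping the real one."""
    # Record the real location only once, on the object itself.
    if "__real_module__" not in vars(obj):
        obj.__real_module__ = obj.__module__
    obj.__module__ = official_name
    return obj


def real_module_name(obj):
    """Return where obj was actually defined in this interpreter."""
    return getattr(obj, "__real_module__", obj.__module__)


class Record:
    pass


class Untouched:
    pass


original_module = Record.__module__
relocate(Record, "records")  # "records" is a made-up official name

print(Record.__module__, real_module_name(Record))
```

Objects that were never relocated simply report the same name through both routes, which matches the "most of the time they point to the same place" expectation.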
We could just call it __real_name__ and use that in preference over __name__ for all __module__ attributes whenever it's set. (Or we could always set both...)
The stuff I wrote above applies to pretty much everything *except* the __main__ module. For the __main__ module, I'm inclined to revisit Brett's idea from PEP 3122: put the real name of the __main__ module in a sys.main attribute. However, unlike that PEP, we would continue to set __name__ to "__main__" in the main module. The new attribute would be a transition step allowing manual reversal of the name mangling:
    # Near top of module
    running_as_main = False
    if __name__ == "__main__":
        running_as_main = True
        import sys
        __name__ = sys.main  # proposed attribute; it does not exist today

    # Rest of module

    # Near end of module
    if running_as_main:
        # Actually do "main" type stuff.
        pass
Alternatively, we could just do nothing about the problem with __main__ and continue to encourage people to separate their "main" modules from the modules that define classes.
Cheers, Nick.

On 01/05/2011 06:15 AM, Nick Coghlan wrote:
Perhaps it would focus the discussion if we picked one or two modules (in addition to __main__) as example cases.
functools comes in two pieces - partial and reduce are implemented in C in the _functools module, everything else is implemented in Python in functools itself. datetime, on the other hand, is a case of a pure acceleration module - if _datetime is available, it is expected to completely implement the datetime API.
_functools.partial and the classes in datetime all adopt the strategy of lying about their original location in __module__. This is probably the best available choice, as it makes pickling do the right thing.
The main downside with this approach is the way it confuses things like inspect.getsource (for datetime, it reports the pure Python versions as the source code for the C accelerated versions, for functools.partial it gives a technically accurate, but potentially misleading error message. If inspect could easily *tell* that the accelerated versions were in use, then it could handle the situation a bit more gracefully).
It seems Python tries pretty hard to hide external calls, (the cause of the confusion you mention above). It makes me wonder why python doesn't have an extern type (or types). Then instead of them being a source of confusion, they would be recognisable for what they are. They could have extra attributes to enable pickle and other tools to work in a nice way.
Ron

On Tue, Jan 4, 2011 at 5:52 PM, Guido van Rossum guido@python.org wrote:
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
Not unless it were documented as an acceptable practice supported by the introspection libraries, with examples pointing to stdlib usage in places like ElementTree.
Even then it may not work out, but that is the rest of the thread; I just wanted to emphasize that this is a case where "yup, it works" isn't good enough, because of confusion over specification vs implementation vs accidentally worked this time.
-jJ

On 1/4/2011 5:52 PM, Guido van Rossum wrote:
Nick's concern does not affect me,
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
but I use this all the time. A suggested alternative and possible eventual replacement: give *every* module an attribute __main__ set to either True or False. Then the idiom would be much simpler and easier to learn and write: 'if __main__: ...'.
If there were no other use of the fake '__main__' name, the simple and unconditional replacement would be much less disruptive than, say, the int division change. But the first 10 pages of codesearch on '__main__' shows things like
django/test/_doctest.py - 107 identical elif module.__name__ == '__main__':
1850: m = sys.modules.get('__main__')
another sys.modules.get(), a sys.modules(), and
Formulator/tests/framework.py - many identical
57: if p0 and __name__ == '__main__':
58:     os.chdir(p0)
The variant conditionals are easy to patch (by hand). The sys.modules lookup suggests that the main module should continue to be keyed under '__main__', even if also keyed under its 'real' name.
[Keying modules under a canonical name would eliminate duplicate import bugs, but that is another issue.]
-- Terry Jan Reedy

On 4 January 2011 22:52, Guido van Rossum guido@python.org wrote:
Hmm... I starred this and am finally dug out enough to comment.
Would it be sufficient if the __module__ attribute of classes and functions got set to the "canonical" name rather than the "physical" name?
You can currently get a crude version of this by simply assigning to __name__ at the top of the module.
That sounds like it would be too confusing, however, so perhaps we could make it so that, when the __module__ attribute is initialized, it first looks for __canonical__ and then for __name__?
This may still be too crude though -- I looked at the one example I could think of where this might be useful, the unittest package, and realized that it would set __module__ to 'unittest' even for classes that are not actually re-exported via the unittest namespace.
So maybe it would be better in that case to just patch the __module__ attribute of all the public classes in unittest/__init__.py?
So should I do this in unittest for Python 2.7 / 3.2?
The problem this *would* solve is that pickled unittest objects from 2.7 / 3.2 can't be unpickled on earlier versions of Python.
I don't know how *real* a problem it is or whether it is worth losing / faking the __module__ information on these classes to solve it. Sure it's a problem that is likely to bite *someone* at some point, but not very many people. If someone is using __module__ information to find source code (or anything else) for a class then changing __module__ will break that, so I'm not convinced it's a worthwhile tradeoff.
All the best,
Michael
OTOH for things named __main__, setting __canonical__ (automatically, by -m or whatever other mechanism starts execution, like "python <filename>") might actually work.
On the third hand, maybe you've finally hit upon a reason why the "if __name__ == '__main__': main()" idiom is bad...
--Guido
On Thu, Dec 30, 2010 at 6:52 PM, Nick Coghlan ncoghlan@gmail.com wrote:
On Thu, Dec 30, 2010 at 11:48 AM, Ron Adam rrr@ronadam.com wrote:
This sounds like two different separate issues to me.
One is the leaking-out of lower level details.
The other is abstracting a framework with the minimal amount of details needed.
Yeah, sort of. Really, the core issue is that some objects live in two places:
- where they came from right now, in the current interpreter
- where they should be retrieved from "officially" (e.g. since another interpreter may not provide an accelerated version, or because the appropriate submodule may be selected at runtime based on the current platform)
There's currently no systematic way of flagging objects or modules where the latter location differs from the former, so the components that leak the low level details (such as pickling and pydoc) have no way to avoid it. Once a system is in place to identify such objects (or perhaps just the affected modules), then the places that leak that information can be updated to deal with the situation appropriately (e.g. pickling would likely just use the official names, while pydoc would display both, indicating which one was the 'official' location, and which one reflected the current interpreter behaviour).
So it's really one core problem (non-portable module details), which then leads to an assortment of smaller problems when other parts of the standard library are forced to rely on those non-portable details because that's the only information available.
Cheers, Nick.

On Wed, Jan 5, 2011 at 11:42 PM, Michael Foord fuzzyman@voidspace.org.uk wrote:
So should I do this in unittest for Python 2.7 / 3.2?
The problem this *would* solve is that pickled unittest objects from 2.7 / 3.2 can't be unpickled on earlier versions of Python.
I don't know how *real* a problem it is or whether it is worth losing / faking the __module__ information on these classes to solve it. Sure it's a problem that is likely to bite *someone* at some point, but not very many people. If someone is using __module__ information to find source code (or anything else) for a class then changing __module__ will break that, so I'm not convinced it's a worthwhile tradeoff.
The two examples I looked at (functools and datetime) favoured hiding the implementation details at the cost of causing introspection problems. Despite my comments in the opening post of the thread, I think that is the better trade-off to make.
Cheers, Nick.

On 5 January 2011 15:57, Nick Coghlan ncoghlan@gmail.com wrote:
On Wed, Jan 5, 2011 at 11:42 PM, Michael Foord fuzzyman@voidspace.org.uk wrote:
So should I do this in unittest for Python 2.7 / 3.2?
The problem this *would* solve is that pickled unittest objects from 2.7 / 3.2 can't be unpickled on earlier versions of Python.
I don't know how *real* a problem it is or whether it is worth losing / faking the __module__ information on these classes to solve it. Sure it's a problem that is likely to bite *someone* at some point, but not very many people. If someone is using __module__ information to find source code (or anything else) for a class then changing __module__ will break that, so I'm not convinced it's a worthwhile tradeoff.
The two examples I looked at (functools and datetime) favoured hiding the implementation details at the cost of causing introspection problems. Despite my comments in the opening post of the thread, I think that is the better trade-off to make.
Both of those are because of underlying C implementations where introspection problems would be the default anyway, which isn't quite the same situation as for unittest.
Michael
Cheers, Nick.

I'm going to have to leave this thread to you all, my main goal was to tease out a better problem description. I think that's been taken care of now. The solution will then follow.

On Thu, Jan 6, 2011 at 3:45 AM, Michael Foord fuzzyman@voidspace.org.uk wrote:
On 5 January 2011 15:57, Nick Coghlan ncoghlan@gmail.com wrote:
The two examples I looked at (functools and datetime) favoured hiding the implementation details at the cost of causing introspection problems. Despite my comments in the opening post of the thread, I think that is the better trade-off to make.
Both of those are because of underlying C implementations where introspection problems would be the default anyway, which isn't quite the same situation as for unittest.
True, but it means the precedent of using __module__ to refer to the official location rather than the actual location has already been set. That suggests to me our best way forward is to bless that as a recommended practice, then find a way to deal with the negative impact it currently has on introspection (such as a "__real_module__" attribute, as I suggested in another post).
Cheers, Nick.

On 06/01/2011 01:52, Nick Coghlan wrote:
On Thu, Jan 6, 2011 at 3:45 AM, Michael Foord fuzzyman@voidspace.org.uk wrote:
On 5 January 2011 15:57, Nick Coghlan ncoghlan@gmail.com wrote:
The two examples I looked at (functools and datetime) favoured hiding the implementation details at the cost of causing introspection problems. Despite my comments in the opening post of the thread, I think that is the better trade-off to make.
Both of those are because of underlying C implementations where introspection problems would be the default anyway, which isn't quite the same situation as for unittest.
True, but it means the precedent of using __module__ to refer to the official location rather than the actual location has already been set. That suggests to me our best way forward is to bless that as a recommended practice, then find a way to deal with the negative impact it currently has on introspection (such as a "__real_module__" attribute, as I suggested in another post).
Well, I would say set __module__ to the official location *when* we have "__real_module__" (or whatever) in place.
Changing __module__ breaks inspect.getsource:
>>> import inspect
>>> from unittest import TestCase
>>> TestCase.__module__
'unittest.case'
>>> TestCase.__module__ = 'unittest'
>>> inspect.getsource(TestCase)
Traceback (most recent call last):
  ...
IOError: could not find class definition
As the only problem this solves is a theoretical one (so far for unittest anyway), I'm not keen to do this until the introspection issue is resolved. Once this is resolved I'm fine with it.
All the best,
Michael
Cheers, Nick.

On 01/05/2011 07:52 PM, Nick Coghlan wrote:
On Thu, Jan 6, 2011 at 3:45 AM, Michael Foord fuzzyman@voidspace.org.uk wrote:
On 5 January 2011 15:57, Nick Coghlan ncoghlan@gmail.com wrote:
The two examples I looked at (functools and datetime) favoured hiding the implementation details at the cost of causing introspection problems. Despite my comments in the opening post of the thread, I think that is the better trade-off to make.
Both of those are because of underlying C implementations where introspection problems would be the default anyway, which isn't quite the same situation as for unittest.
True, but it means the precedent of using __module__ to refer to the official location rather than the actual location has already been set. That suggests to me our best way forward is to bless that as a recommended practice, then find a way to deal with the negative impact it currently has on introspection (such as a "__real_module__" attribute, as I suggested in another post).
You could add a private dictionary to sys, that is updated along with sys.modules, which maps module names to real names. And have a function in inspect to retrieve the real name for an object.
That sounds like it would do pretty much what you need and doesn't add a top level builtin or global, or change "if __name__ == '__main__': main()".
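A minimal sketch of the registry idea above, assuming a plain dictionary keyed by public module name. The names `_real_names`, `register_real_name` and `get_real_module_name` are hypothetical; nothing like them exists in sys or inspect.

```python
# Hypothetical sketch: none of these names exist in sys or inspect.
_real_names = {}  # maps a public module name to its implementation module name


def register_real_name(public_name, real_name):
    """Record that `public_name` is really implemented by `real_name`."""
    _real_names[public_name] = real_name


def get_real_module_name(obj):
    """Return the registered implementation module for `obj`.

    Falls back to obj.__module__ when nothing is registered, and to None
    when the object has no __module__ attribute at all.
    """
    public_name = getattr(obj, "__module__", None)
    return _real_names.get(public_name, public_name)
```

Because the mapping is keyed by module name, every object that claims the same __module__ gets the same answer; it cannot record that only some objects in a module come from elsewhere.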
Cheers, Ron

On Fri, Jan 7, 2011 at 12:38 PM, Ron Adam rrr@ronadam.com wrote:
You could add a private dictionary to sys, that is updated along with sys.modules, which maps module names to real names. And have a function in inspect to retrieve the real name for an object.
That sounds like it would do pretty much what you need and doesn't add a top level builtin or global, or change "if __name__ == '__main__': main()".
My original suggestion was along those lines, but I've come to the conclusion that it isn't sufficiently granular - when existing code tinkers with "__module__" it tends to do it at the object level rather than by modifying __name__ in the module globals.
To turn this into a concrete proposal, here is what I am thinking of specifying in a PEP for 3.3:
1. Implicit configuration of __module__ attributes is updated to check for a definition of "__import_name__" at the module level. If found, then this is used as the value for the __module__ attribute. Otherwise, __module__ is set to __name__ as usual.
2. Any code that currently sets a __module__ attribute (i.e. function and class definitions) will also set an __impl_module__ attribute. This attribute will always be set to the value of __name__.
3. Update and/or augment the relevant C APIs to make it easy to do this for affected extension modules
4. Update inspect.getsource() (and possibly some other introspection functions) to look at __impl_module__ rather than __module__
5. Update all acceleration (such as _datetime) and "implementation packages" (such as unittest) to set __module__ and __impl_module__ appropriately on exported objects
6. Update the __main__ execution logic (including both the builtin logic and runpy) to insert the __main__ module into sys.modules as both "__main__" and the module's real name (i.e. the name that would result in a second copy of the module ending up in sys.modules if you imported it)
7. Update the __main__ execution logic to set __import_name__ to the actual name of the module.
So we end up with two new magic attributes:
__import_name__: optional module level attribute that indicates a preferred alternative to __name__ for accessing the module's contents. Alters the value of __module__ for classes and functions defined in the module. Implicitly set for the __main__ module.
__impl_module__: implicitly set on objects with a __module__ attribute to allow __module__ to be altered to refer to an object's preferred import location without losing the actual implementation location of the object.
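The definition-time rules in points 1 and 2 can be emulated in pure Python to see what they would produce. This is only a sketch: the interpreter does not consult `__import_name__` or set `__impl_module__`, and `apply_proposed_rules` is a made-up helper that takes the module globals explicitly.

```python
# Emulation only. __import_name__ and __impl_module__ are the names proposed
# in this thread, not attributes the interpreter knows about.
def apply_proposed_rules(obj, module_globals):
    """Set __module__ and __impl_module__ on a class or function as proposed."""
    actual_name = module_globals["__name__"]
    # Point 2: __impl_module__ always records the real defining module.
    obj.__impl_module__ = actual_name
    # Point 1: __module__ prefers __import_name__ and falls back to __name__.
    obj.__module__ = module_globals.get("__import_name__", actual_name)
    return obj


# Stand-in for the globals of a module like unittest.case under the proposal.
case_globals = {"__name__": "unittest.case", "__import_name__": "unittest"}


class TestCase:
    pass


apply_proposed_rules(TestCase, case_globals)
```

After the call, `TestCase.__module__` is "unittest" while `TestCase.__impl_module__` keeps "unittest.case".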
Cheers, Nick.

On 01/06/2011 09:28 PM, Nick Coghlan wrote:
On Fri, Jan 7, 2011 at 12:38 PM, Ron Adam rrr@ronadam.com wrote:
You could add a private dictionary to sys, that is updated along with sys.modules, which maps module names to real names. And have a function in inspect to retrieve the real name for an object.
That sounds like it would do pretty much what you need and doesn't add a top level builtin or global, or change "if __name__ == '__main__': main()".
My original suggestion was along those lines, but I've come to the conclusion that it isn't sufficiently granular - when existing code tinkers with "__module__" it tends to do it at the object level rather than by modifying __name__ in the module globals.
What do you mean by *tinkers with "__module__"* ?
Do you have an example where/when that is needed?
To turn this into a concrete proposal, here is what I am thinking of specifying in a PEP for 3.3:
- Implicit configuration of __module__ attributes is updated to check for a definition of "__import_name__" at the module level. If found, then this is used as the value for the __module__ attribute. Otherwise, __module__ is set to __name__ as usual.
If __import_name__ is going to match __module__ everywhere else, why not just call it __module__ everywhere?
Would __package__ be changed in any way?
- Any code that currently sets a __module__ attribute (i.e. function and class definitions) will also set an __impl_module__ attribute. This attribute will always be set to the value of __name__.
So we will have: __package__, __module__, __import_name__, __impl_module__, and if you also include __file__ and __path__, that makes six different attributes for describing where something came from.
I don't know about you, but this bothers me a bit. :-/
How about reconsidering going the other direction:
1. Add __module__ to module level name space. +1
2. Add a module registry that uses the __module__ attribute to get a module_location_info object, which would have all the useful location info in it. (including the real name of "__main__")
If __name__ and __module__ are not changed, programs that use those won't break.
Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
Does this fit some of problems you are thinking of where the granularity may matter?
It would take two functions to do this. One to create the virtual module, and another to pre-load its initial objects. For those objects, the loader would set obj.__module__ to the virtual module name, and also set obj.__original_module__ to the original module name. These would only be seen on objects in virtual modules. A lookup on obj.__module__ will tell you it's in a virtual module. Then a lookup with obj.__original_module__ would give you the actual location info it came from.
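A rough sketch of those two functions, assuming the objects accept attribute assignment (Python-level classes and functions do; C types such as functools.partial do not). `create_virtual_module` and `preload` are invented names, and `__original_module__` is the attribute suggested above, not an existing one.

```python
import types


def create_virtual_module(name):
    """Create an empty in-memory module; it is not added to sys.modules."""
    return types.ModuleType(name)


def preload(virtual_module, objects):
    """Attach objects to the virtual module and relabel where they live."""
    for obj in objects:
        # Remember the real origin before overwriting __module__.
        obj.__original_module__ = obj.__module__
        obj.__module__ = virtual_module.__name__
        setattr(virtual_module, obj.__name__, obj)
    return virtual_module
```

A call such as `preload(create_virtual_module("toolkit"), [some_function])` would leave `some_function.__module__` as "toolkit" and keep its previous value in `__original_module__`.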
By doing it that way, most people will never need to know how these things work or even see them. ie... It's advanced/expert Python foo. ;-)
Anyway, I hope this gives you some ideas, I know you can figure out the details much better than I can.
Cheers, Ron

On 01/08/2011 03:06 AM, Ron Adam wrote:
So we will have: __package__, __module__, __import_name__, __impl_module__, and if you also include __file__ and __path__, that makes six different attributes for describing where something came from.
And also add __cached__ to that list.
I don't know about you, but this bothers me a bit. :-/

On Sat, Jan 8, 2011 at 7:06 PM, Ron Adam rrr@ronadam.com wrote:
On 01/06/2011 09:28 PM, Nick Coghlan wrote:
My original suggestion was along those lines, but I've come to the conclusion that it isn't sufficiently granular - when existing code tinkers with "__module__" it tends to do it at the object level rather than by modifying __name__ in the module globals.
What do you mean by *tinkers with "__module__"* ?
Do you have an example where/when that is needed?
>>> from inspect import getsource
>>> from functools import partial
>>> partial.__module__
'functools'
>>> getsource(partial)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/inspect.py", line 689, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python2.6/inspect.py", line 678, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python2.6/inspect.py", line 552, in findsource
    raise IOError('could not find class definition')
IOError: could not find class definition
partial is actually implemented in C in the _functools module, hence the failure of the getsource call. However, it officially lives in functools for pickling purposes (other implementations aren't obliged to provide _functools at all), so __module__ is adjusted appropriately.
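To see why the adjusted __module__ matters for pickling, here is a small sketch using two throwaway in-memory modules in place of functools and _functools. The module names "demo" and "_demo_impl" and the class `Widget` are invented for the example.

```python
import pickle
import sys
import types

# Stand-ins for an implementation module and its public front end.
impl = types.ModuleType("_demo_impl")
public = types.ModuleType("demo")

# Build the class with type() so its __qualname__ is simply "Widget",
# and advertise the public module as its home.
Widget = type("Widget", (), {"__module__": "demo"})
impl.Widget = Widget
public.Widget = Widget
sys.modules["_demo_impl"] = impl
sys.modules["demo"] = public

# Classes are pickled by reference: the module name recorded in the
# pickle is taken from Widget.__module__.
data = pickle.dumps(Widget)
restored = pickle.loads(data)
```

The pickle only ever mentions "demo", so another implementation that provides `demo.Widget` without any `_demo_impl` module could still load it.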
The other examples I have been using are the _datetime C acceleration module and the unittest pseudo-package.
- Implicit configuration of __module__ attributes is updated to check for a definition of "__import_name__" at the module level. If found, then this is used as the value for the __module__ attribute. Otherwise, __module__ is set to __name__ as usual.
If __import_name__ is going to match __module__ everywhere else, why not just call it __module__ everywhere?
Because the module level attributes for identifying the module don't serve the same purpose as the attributes identifying where functions and classes are defined. That said, calling it "__module__" would probably work, and make the naming logic a bit more intuitive. The precedent for that attribute name to refer to a string rather than a module object was set a long time ago, after all.
Would __package__ be changed in any way?
To look for __module__ before checking __name__? No, since doing that would make it unnecessarily difficult to use relative imports inside pseudo-packages.
- Any code that currently sets a __module__ attribute (i.e. function and class definitions) will also set an __impl_module__ attribute. This attribute will always be set to the value of __name__.
So we will have: __package__, __module__, __import_name__, __impl_module__, and if you also include __file__ and __path__, that makes six different attributes for describing where something came from.
I don't know about you, but this bothers me a bit. :-/
It bothers me a lot, since I probably could have avoided at least some of it by expanding the scope of PEP 366. However, it does help to split them out into the different contexts and look at how each of them are used, since it makes it clear that there are a lot of attributes because there is a fair bit of information that is used in different ways.
Module level attributes relating to location in the external environment:
__file__: typically refers to a source file, but is not required to (see PEP 302)
__path__: package attribute used to identify the directory (or directories) searched for submodules
__loader__: PEP 302 loader reference (may not exist for ordinary filesystem imports)
__cached__: if it exists, refers to a compiled bytecode file (see PEP 3147)
It is important to understand that ever since PEP 302, *there is no loader independent mapping* between any of these external environment related attributes and the module namespace. Some Python standard library code (e.g. multiprocessing) currently assumes such a mapping exists and it is broken on Windows right now as a direct result of that incorrect assumption (other code explicitly disclaims support for PEP 302 loaded modules and only works with actual files and directories).
Module level attributes relating to location within the module namespace:
__name__: actual name of current module in the current interpreter instance. Best choice for introspection of the current interpreter.
__module__ (*new*): "official" portable name for module contents (components should never include leading underscores). Best choice for information that should be portable to other interpreters (e.g. for pickling and other serialisation formats)
__package__: optional attribute used specifically to control handling of relative imports. May be explicitly set (e.g. by runpy), otherwise implicitly set to "__name__.rpartition('.')[0]" by the first relative import.
Most of the time, __name__ is consistent across all 3 use cases, in which case __package__ and __import_name__ are redundant. However, when __name__ is wrong for some reason (e.g. including an implementation detail, or adjusted to "__main__" for execution as a script), then __package__ allows relative imports to be fixed, while __import_name__ will allow pickling and other operations that should hide implementation details to be fixed.
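As a quick illustration of the implicit __package__ default quoted above, the expression can be tried on a few module names. `implied_package` is just a hypothetical wrapper around that expression, covering only the non-package case described in the text.

```python
def implied_package(module_name):
    """Package part of a dotted module name, i.e. name.rpartition('.')[0]."""
    return module_name.rpartition(".")[0]


examples = {name: implied_package(name)
            for name in ("unittest.case", "multiprocessing.pool", "__main__")}
```

For "unittest.case" the result is "unittest", but for "__main__" it is the empty string, which is why a module run as a script loses the information relative imports need unless __package__ is set explicitly.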
Object level attributes relating to location of class and function definitions:
__module__ (*updated*): refers to __module__ from originating module (if defined) and to __name__ otherwise
__impl_module__ (*new*): refers to __name__ from originating module
Looking at that write-up, I do quite like the idea of reusing __module__ for the new module level attribute.
Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late, code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done *right* (i.e. without having to choose which use cases to support and which ones to break).
The basic problem is that __module__ currently tries to serve two masters:
1. use cases like inspect.getsource, where we want to know where the object came from in the current interpreter
2. use cases like pickle, where we want the "official" portable location, with any implementation details (like the _functools module) hidden.
Currently, the default behaviour of the interpreter is to support use case 1 and break use case 2 if any objects are defined in a different module from where they claim to live (e.g. see the pickle compatibility breakage with the 3.2 unittest implementation layout changes). The only tool currently available to module authors is to override __module__ (as functools.partial and the datetime acceleration module do), which is correct for use case 2, but breaks use case 1 (leading to misleading error messages in the C acceleration module case, and breaking otherwise valid introspection in the unittest case).
My proposed changes will:
a) make overriding __module__ significantly easier to do
b) allow the introspection use cases access to the information they need so they can do the right thing when confronted with an overridden __module__ attribute
Does this fit some of problems you are thinking of where the granularity may matter?
It would take two functions to do this. One to create the virtual module, and another to pre-load its initial objects. For those objects, the loader would set obj.__module__ to the virtual module name, and also set obj.__original_module__ to the original module name. These would only be seen on objects in virtual modules. A lookup on obj.__module__ will tell you it's in a virtual module. Then a lookup with obj.__original_module__ would give you the actual location info it came from.
That adds a lot of complexity though - far simpler to define a new __impl_module__ attribute on every object, retroactively fixing introspection of existing code that adjusts __module__ to make pickling work properly across different versions and implementations.
By doing it that way, most people will never need to know how these things work or even see them. ie... It's advanced/expert Python foo. ;-)
Most people will never need to care or worry about the difference between __module__ and __impl_module__ either - it will be hidden inside libraries like inspect, pydoc and pickle.
Anyway, I hope this gives you some ideas, I know you can figure out the details much better than I can.
Yeah, the idea of reusing the __module__ attribute name at the top level is an excellent one.
Cheers, Nick.

On 01/09/2011 12:39 AM, Nick Coghlan wrote:
Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late, code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done *right* (i.e. without having to choose which use cases to support and which ones to break).
Yes, __builtins__ is a virtual module.
Creating a module in memory...
>>> import imp
>>> new = imp.new_module("new")
>>> new
<module 'new' (built-in)>
The term "(built-in)" doesn't quite fit in this case. But I can get used to it.
>>> sys.modules[new.__name__]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'new'
And it's not in sys.modules yet. That's ok, other things can be loaded into it before it's added to sys.modules.
It's this loading part that can be improved.
The basic problem is that __module__ currently tries to serve two masters:
1. use cases like inspect.getsource, where we want to know where the object came from in the current interpreter
2. use cases like pickle, where we want the "official" portable location, with any implementation details (like the _functools module) hidden.
Most C extensions are written as modules, to be imported and imported from. A tool to load objects rather than import them, may be better in these situations.
partial = imp.load_extern_object("_functools.partial")
A loaded object would have its __module__ attribute set to the module it's loaded into instead of where it came from.
By doing it this way, it doesn't complicate the import semantics.
It may also be useful to make it a special type, so that other tools can decide how to handle them.
Currently, the default behaviour of the interpreter is to support use case 1 and break use case 2 if any objects are defined in a different module from where they claim to live (e.g. see the pickle compatibility breakage with the 3.2 unittest implementation layout changes). The only tool currently available to module authors is to override __module__ (as functools.partial and the datetime acceleration module do), which is correct for use case 2, but breaks use case 1 (leading to misleading error messages in the C acceleration module case, and breaking otherwise valid introspection in the unittest case).
My proposed changes will:
a) make overriding __module__ significantly easier to do
b) allow the introspection use cases access to the information they need so they can do the right thing when confronted with an overridden __module__ attribute
It would be better to find solutions that don't override __module__ after it has been imported or loaded.
Does this fit some of problems you are thinking of where the granularity may matter?
It would take two functions to do this. One to create the virtual module, and another to pre-load its initial objects. For those objects, the loader would set obj.__module__ to the virtual module name, and also set obj.__original_module__ to the original module name. These would only be seen on objects in virtual modules. A lookup on obj.__module__ will tell you it's in a virtual module. Then a lookup with obj.__original_module__ would give you the actual location info it came from.
That adds a lot of complexity though - far simpler to define a new __impl_module__ attribute on every object, retroactively fixing introspection of existing code that adjusts __module__ to make pickling work properly across different versions and implementations.
By doing it that way, most people will never need to know how these things work or even see them. ie... It's advanced/expert Python foo. ;-)
Most people will never need to care or worry about the difference between __module__ and __impl_module__ either - it will be hidden inside libraries like inspect, pydoc and pickle.
I think __impl_module__ should only be on objects where it would be different than __module__.
Anyway, I hope this gives you some ideas, I know you can figure out the details much better than I can.
Yeah, the idea of reusing the __module__ attribute name at the top level is an excellent one.
The hard part of all of this is separating the good, doable ideas from the ideas that are good but unfortunately can't be done because they would break something.
Cheers, Ron

On Mon, Jan 10, 2011 at 3:56 AM, Ron Adam rrr@ronadam.com wrote:
On 01/09/2011 12:39 AM, Nick Coghlan wrote:
Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late, code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done *right* (i.e. without having to choose which use cases to support and which ones to break).
Yes, __builtins__ is a virtual module.
No, it's a real module, just like all the others.
>>> sys.modules[new.__name__]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'new'
And it's not in sys.modules yet. That's ok, other things can be loaded into it before it's added to sys.modules.
It's this loading part that can be improved.
I don't understand the point of this tangent. The practice of how objects are merged into modules is already established: you use "import *" or some other form of import statement. I want to *make that work properly*, not invent a new way to do it.
The basic problem is that __module__ currently tries to serve two masters:
1. use cases like inspect.getsource, where we want to know where the object came from in the current interpreter
2. use cases like pickle, where we want the "official" portable location, with any implementation details (like the _functools module) hidden.
Most C extensions are written as modules, to be imported and imported from. A tool to load objects rather than import them, may be better in these situations.
partial = imp.load_extern_object("_functools.partial")
A loaded object would have its __module__ attribute set to the module it's loaded into instead of where it came from.
By doing it this way, it doesn't complicate the import semantics.
What complication to the import semantics? I'm not touching the import semantics, just the semantics for defining functions and classes.
It may also be useful to make it a special type, so that other tools can decide how to handle them.
No. The idea is to make existing code work properly, not force people to jump through new hoops.
Currently, the default behaviour of the interpreter is to support use case 1 and break use case 2 if any objects are defined in a different module from where they claim to live (e.g. see the pickle compatibility breakage with the 3.2 unittest implementation layout changes). The only tool currently available to module authors is to override __module__ (as functools.partial and the datetime acceleration module do), which is correct for use case 2, but breaks use case 1 (leading to misleading error messages in the C acceleration module case, and breaking otherwise valid introspection in the unittest case).
My proposed changes will:
a) make overriding __module__ significantly easier to do
b) allow the introspection use cases access to the information they need so they can do the right thing when confronted with an overridden __module__ attribute
It would be better to find solutions that don't override __module__ after it has been imported or loaded.
Again, no. My aim is to make existing practices not break things, rather than trying to get people to change their practices.
Most people will never need to care or worry about the difference between __module__ and __impl_module__ either - it will be hidden inside libraries like inspect, pydoc and pickle.
I think __impl_module__ should only be on objects where it would be different than __module__.
How does introducing an inconsistency like that make anything simpler? Optional attributes are painful to deal with, so we only use them for things where we don't fully control their creation (e.g. when we add new attributes to modules, PEP 302 means we can't assume they will exist when the module code is running, as third party loaders may not include them when initialising the module namespace). That is unlikely to be the case here.
Cheers, Nick.

Am 09.01.2011 19:18, schrieb Nick Coghlan:
On Mon, Jan 10, 2011 at 3:56 AM, Ron Adam rrr@ronadam.com wrote:
On 01/09/2011 12:39 AM, Nick Coghlan wrote:
Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late, code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done *right* (i.e. without having to choose which use cases to support and which ones to break).
Yes, __builtins__ is a virtual module.
No, it's a real module, just like all the others.
__builtin__ (2.x) / builtins (3.x) is; __builtins__ you (Ron) should just forget about.
Georg

On 01/09/2011 12:18 PM, Nick Coghlan wrote:
On Mon, Jan 10, 2011 at 3:56 AM, Ron Adam rrr@ronadam.com wrote:
On 01/09/2011 12:39 AM, Nick Coghlan wrote:
Also consider having virtual modules, where objects in it may have come from different *other* locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late, code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done *right* (i.e. without having to choose which use cases to support and which ones to break).
Yes, __builtins__ is a virtual module.
No, it's a real module, just like all the others.
As Georg pointed out, it's "builtins". But you knew what I was referring to. ;-)
I wasn't saying it's not a real module, but there are differences. Mainly, builtins (and other C modules) don't have a file reference after they're imported, unlike modules written in Python.
>>> import dis
>>> dis
<module 'dis' from '/usr/local/lib/python3.2/dis.py'>
>>> dis.__file__
'/usr/local/lib/python3.2/dis.py'
>>> import builtins
>>> builtins
<module 'builtins' (built-in)>
>>> builtins.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute '__file__'
So they appear as if they don't have a source. There is probably a better term for this than virtual. I was thinking it fits well for modules constructed in memory rather than ones built up by executing Python code directly.
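The difference described above can be checked without triggering an AttributeError. A small sketch, where `source_location` is a hypothetical helper name:

```python
import builtins
import types


def source_location(module):
    """Return module.__file__ if the module has one, otherwise None."""
    return getattr(module, "__file__", None)


# A module compiled into the interpreter and one constructed in memory:
# neither carries a __file__ attribute.
builtin_location = source_location(builtins)
in_memory_location = source_location(types.ModuleType("scratch"))
```

Both lookups return None, whereas a module loaded from a .py file would normally report the path it was loaded from.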
Hmmm...
Should modules written in other languages have a __file__ attribute?
Would that help introspection or in other ways?
It's this loading part that can be improved.
I don't understand the point of this tangent. The practice of how objects are merged into modules is already established: you use "import *" or some other form of import statement. I want to *make that work properly*, not invent a new way to do it.
Sorry, I was looking for ways to avoid changing __module__.
All of the above ways will still have the __module__ attribute on objects set to the module they came from. Which again is fine, because that is what you want most of the time. Just not in the case of partial.
Setting __module__ manually is easy enough in that case.
Cheers, Nick.
I think I'm more likely to sidetrack you at this point. I am starting to get familiar with the C code, but I still have a ways to go before I understand all the different parts. Getting there though. :-)
On the Python side of things, the attributes we've been discussing almost never have anything to do with what most programs are written to do. Unless it's a program written specifically for managing Python's various parts. It's kind of like the problem of separating content, context, and presentation in web pages. Sometimes it's hard to do.
Cheers, Ron

On Mon, Jan 10, 2011 at 11:11 AM, Ron Adam rrr@ronadam.com wrote:
On the Python side of things, the attributes we've been discussing almost never have anything to do with what most programs are written to do. Unless it's a program written specifically for managing Python's various parts. It's kind of like the problem of separating content, context, and presentation in web pages. Sometimes it's hard to do.
Yep - 99.99% of python code will never care if this is ever fixed. However, the fact that we've started using acceleration modules and pseudo-packages in the standard library means that "things should just work" is being broken subtly in the stuff we're shipping ourselves (either by creating pickling problems, as in unittest, or misleading introspection results, as in functools and datetime).
And if we're going to fix it at all, we may as well fix it right :)
Cheers, Nick.

On 10 January 2011 11:26, Nick Coghlan ncoghlan@gmail.com wrote:
On Mon, Jan 10, 2011 at 11:11 AM, Ron Adam rrr@ronadam.com wrote:
On the Python side of things, the attributes we've been discussing almost never have anything to do with what most programs are written to do. Unless it's a program written specifically for managing Python's various parts. It's kind of like the problem of separating content, context, and presentation in web pages. Sometimes it's hard to do.
Yep - 99.99% of python code will never care if this is ever fixed. However, the fact that we've started using acceleration modules and pseudo-packages in the standard library means that "things should just work" is being broken subtly in the stuff we're shipping ourselves (either by creating pickling problems, as in unittest, or misleading introspection results, as in functools and datetime).
And if we're going to fix it at all, we may as well fix it right :)
I certainly don't object to fixing this, and neither do I object to adding a new class / module / function attribute to achieve it.
However... is there anything else that this fixes? (Are there more examples "in the wild" where this would help?)
The unittest problem with pickling is real but likely to only affect a very, very small number of users. The introspection problem (getsource) for functools and datetime isn't a *real* problem because the source code isn't available. If in fact getsource now points to the pure Python version even in the cases where the C versions are being used then "fixing" this seems like a step backwards...
Python 3.2:
>>> import inspect
>>> from datetime import date
>>> inspect.getsource(date)
'class date:\n """Concrete date type.\n\n ...'
Python 3.1:
>>> import inspect
>>> from datetime import date
>>> inspect.getsource(date)
Traceback (most recent call last):
  ...
IOError: source code not available
With your changes in place would Python 3.3 revert to the 3.1 behaviour here? How is this an advantage?
What I'm really asking is, is the cure (and the accompanying implementation effort and additional complexity to the Python object model) worse than the disease...
All the best,
Michael Foord
Cheers, Nick.

On Mon, Jan 10, 2011 at 9:37 PM, Michael Foord fuzzyman@voidspace.org.uk wrote:
I certainly don't object to fixing this, and neither do I object to adding a new class / module / function attribute to achieve it.
However... is there anything else that this fixes? (Are there more examples "in the wild" where this would help?)
The unittest problem with pickling is real but likely to only affect a very, very small number of users. The introspection problem (getsource) for functools and datetime isn't a *real* problem because the source code isn't available. If in fact getsource now points to the pure Python version even in the cases where the C versions are being used then "fixing" this seems like a step backwards...
unittest is actually a better example, because there *is* a solution to your pickling problem: alter __module__ to say "unittest" rather than "unittest.<whatever>", just as _functools.partial and the _datetime classes do. However, you've stated you don't want to do that because it would break introspection. That's a reasonable position to take, so the idea is to make it so you don't have to make that choice. Instead, you'll be able to happily adjust __module__ to make pickling work properly, while introspection will be able to fall back on __impl_module__ to get the correct information.
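The trade-off described above can be illustrated with a small sketch. The module names demo_pkg and demo_pkg._impl and the class Case are made up for the illustration, and the modules are registered by hand in sys.modules rather than imported from disk. It only shows that pickle records whatever __module__ says, so pointing __module__ at the public name keeps the implementation module out of the pickle.

```python
import pickle
import sys
import types

# Hypothetical pseudo-package: "demo_pkg" re-exports a class that really
# lives in "demo_pkg._impl". Both modules are built by hand for the demo.
pkg = types.ModuleType("demo_pkg")
impl = types.ModuleType("demo_pkg._impl")
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg._impl"] = impl
pkg._impl = impl


class Case:
    pass


# Pretend the class was defined inside the implementation module.
Case.__module__ = "demo_pkg._impl"
Case.__qualname__ = "Case"
impl.Case = Case
pkg.Case = Case  # the public re-export

# Pickling a class stores a reference built from __module__ and __qualname__.
leaky = pickle.dumps(Case, protocol=0)

# Adjusting __module__ makes the pickle refer to the public name instead.
Case.__module__ = "demo_pkg"
clean = pickle.dumps(Case, protocol=0)

print(leaky)
print(clean)
```

Both pickles still load here because the class is reachable under either name; what the adjustment loses is any record of where the class was really defined, which is the gap an attribute like __impl_module__ is meant to fill.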
Python 3.2:
>>> import inspect
>>> from datetime import date
>>> inspect.getsource(date)
'class date:\n """Concrete date type.\n\n ...'
Python 3.1:
>>> import inspect
>>> from datetime import date
>>> inspect.getsource(date)
Traceback (most recent call last):
  ...
IOError: source code not available
With your changes in place would Python 3.3 revert to the 3.1 behaviour here? How is this an advantage?
It's an improvement because the current answer is misleading: that source code is *not* what is currently running. You can change that source to your heart's content and it will do exactly *squat* when it comes to changing the interpreter's behaviour.
That said, one of the benefits of this proposal is that we aren't restricted to the either/or behaviour. Since the interpreter will provide both pieces of information, we have plenty of opportunity to make inspect smarter about the situation. (e.g. only looking in __impl_module__ by default, but offering a flag to also check __module__ if no source is available from the implementation module).
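A rough sketch of that lookup order, purely as an illustration: __impl_module__ is the attribute proposed in this thread and does not exist in Python, and candidate_modules and its check_public flag are hypothetical names, not part of inspect.

```python
def candidate_modules(obj, check_public=False):
    """Return the module names to search for source, implementation first.

    __impl_module__ is the attribute proposed in this thread (hypothetical).
    When it is absent, __module__ is the only information available.
    """
    impl = getattr(obj, "__impl_module__", None)
    public = getattr(obj, "__module__", None)
    names = []
    if impl is not None:
        names.append(impl)
        # Optional fallback: also try the public name if it differs.
        if check_public and public is not None and public != impl:
            names.append(public)
    elif public is not None:
        names.append(public)
    return names


class Date:
    """Stand-in for a C-accelerated class such as datetime.date."""


Date.__module__ = "datetime"
Date.__impl_module__ = "_datetime"


class Plain:
    """A class that carries no implementation-module information."""


print(candidate_modules(Date))
print(candidate_modules(Date, check_public=True))
print(candidate_modules(Plain))
```

With the default, a tool would only look in the implementation module and report that no source is available; with the flag set, it could fall back to the pure Python version while still knowing that it is not what is running.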
What I'm really asking is, is the cure (and the accompanying implementation effort and additional complexity to the Python object model) worse than the disease...
Potentially, but I see enough merit in the idea to follow up with a PEP for it.
Cheers, Nick.

On 01/10/2011 05:26 AM, Nick Coghlan wrote:
On Mon, Jan 10, 2011 at 11:11 AM, Ron Adam rrr@ronadam.com wrote:
On the python side of things, the attributes we've been discussing almost never have anything to do with what most programs are written to do. Unless it's a program written specifically for managing pythons various parts. It's kind of like the problem of separating content, context, and presentation in web pages. Sometimes it's hard to do.
Yep - 99.99% of python code will never care if this is ever fixed. However, the fact that we've started using acceleration modules and pseudo-packages in the standard library means that "things should just work" is being broken subtly in the stuff we're shipping ourselves (either by creating pickling problems, as in unittest, or misleading introspection results, as in functools and datetime).
And if we're going to fix it at all, we may as well fix it right :)
Fixing it right means taking a longer viewpoint. What would we like all this stuff to look like two or more versions down the road? (Probably Python 3.5 or 3.6.)
Doing the minimum to fix just the immediate problems is a short-term view. That will work, but if we can align it with a longer-view solution, it would be better.
If we can't decide what the long-term solution might be, then we may be better off using private attributes and methods for now for these isolated situations.
How about making __module__ a property on accelerated objects that looks for a global flag, then returns either _module__ or _alt_module__ depending on the flag? (Or some other way of storing those values.)
Pickle could set the flag so it can get what it needs from __module__, then unset it when it's done.
Cheers, Ron

On 01/10/2011 11:55 AM, Ron Adam wrote:
How about making __module__ a property on accelerated objects that looks for a global flag, then returns either _module__ or _alt_module__ depending on the flag? (Or some other way of storing those values.)
Pickle could set the flag so it can get what it needs from __module__, then unset it when it's done.
Or maybe this should be the other way around. When a module name begins with an underscore, it should be considered a private implementation detail. So __module__, in the case of partial, is already set to the correct value.
But when the actual name is needed instead of the official name, a global flag can be set. Then, when __module__ is a property, it will return the actual name instead of the official name.
Cheers, Ron

On 12/29/2010 03:52 PM, Nick Coghlan wrote:
Disclaimer: this is a currently half-baked idea that needs some discussion here if it is going to turn into something a bit more coherent :)
On and off, I've been pondering the problem of the way implementation details (like the real file structures of the multiprocessing and unittest packages, or whether or not an interpreter uses the pure Python or the C accelerated version of various modules) leak out into the world via the __module__ attribute on various components. This mostly comes up when discussing pickle compatibility between 2.x and 3.x, but it can show up in various guises whenever you start relying on dynamic introspection.
As I see it, there are 3 basic ways of dealing with the problem:
- Allow objects to lie about their source module
  This is likely a terrible idea, since a function's global namespace reference would disagree with its module reference. I suspect much weirdness would result.
- A pickle-specific module alias registry, since that is where the problem comes up most often
  A possible approach, but not necessarily a good one (since it isn't really a pickle-specific problem).
- An inspect-based module alias registry
  That is, an additional query API (get_canonical_module_name?) in the inspect module that translates from the implementation detail module name to the "preferred" module name. The implementation could be as simple as a "__canonical__" attribute in the module namespace.
I actually quite like option 3, with various things (such as pydoc) updated to show *both* names when they're different. That way people will know where to find official documentation for objects from pseudo-packages and acceleration modules (i.e. under the canonical name), without hiding where the actual implementation came from.
Pickle *generation* could then be updated to only send canonical module names during normal operation, reducing the exposure of implementation details like pseudo-packages and acceleration modules.
Whether or not runpy should set __canonical__ on the main module would be an open question (probably not, *unless* runpy was also updated to add the main module to sys.modules under its real name as well as __main__).
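For reference, the third option quoted above could be as small as the following sketch. get_canonical_module_name and __canonical__ are the names floated in the proposal, not an existing inspect API, and the _demo_accel module is made up and registered by hand for the example.

```python
import sys
import types


def get_canonical_module_name(module_name):
    """Translate an implementation module name to its preferred name.

    Falls back to the name itself when the module is not loaded or does
    not define __canonical__ (the attribute proposed in this thread).
    """
    module = sys.modules.get(module_name)
    return getattr(module, "__canonical__", module_name)


# Hypothetical acceleration module that declares its public counterpart.
accel = types.ModuleType("_demo_accel")
accel.__canonical__ = "demo"
sys.modules["_demo_accel"] = accel

print(get_canonical_module_name("_demo_accel"))
print(get_canonical_module_name("sys"))
```

A tool such as pydoc could then show both names whenever the two differ.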
This makes more sense now that we've discussed it a bit.
Here's a rough sketch of a context manager that temporarily overrides the __module__ attribute.
This works well for simple introspection. For example, you can use it to call inspect functions without changing them.
But pickling is recursive, so this probably wouldn't work very well for that.
Cheers, Ron
#-------------------------------
from contextlib import contextmanager

class cls:
    def method(self):
        pass

c = cls()
InstanceMethod = type(c.method)  # the bound-method type

def _getter(self, value):
    # Return __alt_module__ in place of __module__ when it exists.
    if value == "__module__" and hasattr(self, "__alt_module__"):
        return object.__getattribute__(self, "__alt_module__")
    return object.__getattribute__(self, value)

@contextmanager
def alt_module_getter(obj):
    # Temporarily install the override on obj's class.
    obj.__class__.__getattribute__ = InstanceMethod(_getter, obj)
    try:
        yield obj
    finally:
        del obj.__class__.__getattribute__

def get_module_name(obj):
    return obj.__module__

# gets __alt_module__ if it exists, else gets __module__
with alt_module_getter(c) as obj:
    module_name = get_module_name(obj)
participants (8)
- Eric Smith
- Georg Brandl
- Guido van Rossum
- Jim Jewett
- Michael Foord
- Nick Coghlan
- Ron Adam
- Terry Reedy