On Sat, Jan 8, 2011 at 7:06 PM, Ron Adam firstname.lastname@example.org wrote:
On 01/06/2011 09:28 PM, Nick Coghlan wrote:
My original suggestion was along those lines, but I've come to the conclusion that it isn't sufficiently granular - when existing code tinkers with "__module__" it tends to do it at the object level rather than by modifying __name__ in the module globals.
What do you mean by tinkers with "__module__" ?
Do you have an example where/when that is needed?
>>> from inspect import getsource
>>> from functools import partial
>>> partial.__module__
'functools'
>>> getsource(partial)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/inspect.py", line 689, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python2.6/inspect.py", line 678, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python2.6/inspect.py", line 552, in findsource
    raise IOError('could not find class definition')
IOError: could not find class definition
partial is actually implemented in C in the _functools module, hence the failure of the getsource call. However, it officially lives in functools for pickling purposes (other implementations aren't obliged to provide _functools at all), so __module__ is adjusted appropriately.
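The same point can be seen directly in a modern Python 3 interpreter: partial reports its "official" public home rather than the C module that defines it, and pickle relies on that portable name to round-trip instances.

```python
import pickle
from functools import partial

# partial is implemented in C in _functools, but reports its "official"
# public home so that pickling works on any implementation:
print(partial.__module__)  # 'functools'

# Round-tripping through pickle relies on that portable name:
parse_binary = partial(int, base=2)
restored = pickle.loads(pickle.dumps(parse_binary))
print(restored("10"))  # 2
```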
The other examples I have been using are the _datetime C acceleration module and the unittest pseudo-package.
If __import_name__ is going to match __module__ everywhere else, why not just call it __module__ everywhere?
Because the module level attributes for identifying the module don't serve the same purpose as the attributes identifying where functions and classes are defined. That said, calling it "__module__" would probably work, and make the naming logic a bit more intuitive. The precedent for that attribute name to refer to a string rather than a module object was set a long time ago, after all.
Would __package__ be changed in any way?
To look for __module__ before checking __name__? No, since doing that would make it unnecessarily difficult to use relative imports inside pseudo-packages.
So we will have: __package__, __module__, __import_name__, __impl_name__, and if you also include __file__ and __path__, that makes six different attributes for describing where something came from.
I don't know about you, but this bothers me a bit. :-/
It bothers me a lot, since I probably could have avoided at least some of it by expanding the scope of PEP 366. However, it does help to split them out into the different contexts and look at how each of them are used, since it makes it clear that there are a lot of attributes because there is a fair bit of information that is used in different ways.
Module level attributes relating to location in the external environment:

__file__: typically refers to a source file, but is not required to (see PEP 302)
__path__: package attribute used to identify the directory (or directories) searched for submodules
__loader__: PEP 302 loader reference (may not exist for ordinary filesystem imports)
__cached__: if it exists, refers to a compiled bytecode file (see PEP 3147)
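For a concrete look at those attributes, inspecting any ordinary filesystem package shows all four (json is used here purely for illustration):

```python
import json  # an ordinary filesystem package, used purely for illustration

# All of these are loader-dependent (PEP 302): none of them is
# guaranteed to map back to the module namespace in a portable way.
print(json.__file__)                      # source file location
print(json.__path__)                      # submodule search path (packages only)
print(json.__loader__)                    # the loader that created the module
print(getattr(json, "__cached__", None))  # bytecode cache file, if any
```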
It is important to understand that ever since PEP 302, there is no loader independent mapping between any of these external environment related attributes and the module namespace. Some Python standard library code (e.g. multiprocessing) currently assumes such a mapping exists, and it is broken on Windows right now as a direct result of that incorrect assumption (other code explicitly disclaims support for PEP 302 loaded modules and only works with actual files and directories).
Module level attributes relating to location within the module namespace:

__name__: actual name of the current module in the current interpreter instance. Best choice for introspection of the current interpreter.
__module__ (new): "official" portable name for the module contents (components should never include leading underscores). Best choice for information that should be portable to other interpreters (e.g. for pickling and other serialisation formats).
__package__: optional attribute used specifically to control handling of relative imports. May be explicitly set (e.g. by runpy), otherwise implicitly set to __name__.rpartition('.')[0] by the first relative import.
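For an ordinary submodule in the standard library, the existing namespace attributes all agree, and __package__ matches the implicit default (the proposed module-level __module__ attribute does not exist in current CPython, so only the existing attributes are shown):

```python
import json.decoder as mod

# For an ordinary submodule, the namespace attributes agree:
print(mod.__name__)     # 'json.decoder'
print(mod.__package__)  # 'json'

# The implicit __package__ default the import system computes:
print(mod.__name__.rpartition('.')[0])  # 'json'
```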
Most of the time, __name__ is consistent across all 3 use cases, in which case __package__ and __import_name__ are redundant. However, when __name__ is wrong for some reason (e.g. including an implementation detail, or adjusted to "__main__" for execution as a script), then __package__ allows relative imports to be fixed, while __import_name__ will allow pickling and other operations that should hide implementation details to be fixed.
Object level attributes relating to location of class and function definitions:

__module__ (updated): refers to __module__ from the originating module (if defined), otherwise to __name__
__impl_module__ (new): refers to __name__ from the originating module
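The lookup rule above can be sketched as a small function. This is a hypothetical model only: neither the module-level __module__ attribute nor __impl_module__ exists in current CPython, and definition_attributes is an invented name for illustration.

```python
def definition_attributes(module_globals):
    """Model of the proposed rule: an object's __module__ would come
    from the defining module's __module__ if set, falling back to
    __name__, while __impl_module__ would always record __name__."""
    impl_name = module_globals["__name__"]
    portable_name = module_globals.get("__module__", impl_name)
    return {"__module__": portable_name, "__impl_module__": impl_name}

# An acceleration module that officially lives in "functools":
accel = {"__name__": "_functools", "__module__": "functools"}
print(definition_attributes(accel))
# {'__module__': 'functools', '__impl_module__': '_functools'}

# An ordinary module, where both names coincide:
plain = {"__name__": "mymodule"}
print(definition_attributes(plain))
# {'__module__': 'mymodule', '__impl_module__': 'mymodule'}
```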
Looking at that write-up, I do quite like the idea of reusing __module__ for the new module level attribute.
Also consider having virtual modules, where objects in it may have come from different other locations. A virtual module would need a way to keep track of that. (I'm not sure this is a good idea.)
It's too late, code already does that. This is precisely the use case I am trying to fix (objects like functools.partial that deliberately lie in their __module__ attribute), so that this can be done right (i.e. without having to choose which use cases to support and which ones to break).
The basic problem is that __module__ currently tries to serve two masters:

1. introspection: identifying where an object is actually defined, so that tools like inspect and pydoc can locate its source
2. serialisation: providing a portable "official" name for an object, so that pickle and friends can find it again across versions and implementations
Currently, the default behaviour of the interpreter is to support use case 1 and break use case 2 if any objects are defined in a different module from where they claim to live (e.g. see the pickle compatibility breakage with the 3.2 unittest implementation layout changes). The only tool currently available to module authors is to override __module__ (as functools.partial and the datetime acceleration module do), which is correct for use case 2, but breaks use case 1 (leading to misleading error messages in the C acceleration module case, and breaking otherwise valid introspection in the unittest case).
My proposed changes will:
a) make overriding __module__ significantly easier to do
b) allow the introspection use cases access to the information they need, so they can do the right thing when confronted with an overridden __module__ attribute
Does this fit some of problems you are thinking of where the granularity may matter?
It would take two functions to do this: one to create the virtual module, and another to pre-load its initial objects. For those objects, the loader would set obj.__module__ to the virtual module name, and also set obj.__original_module__ to the original module name. These would only be seen on objects in virtual modules. A lookup on obj.__module__ will tell you it is in a virtual module, and a lookup of obj.__original_module__ would then give you the actual location it came from.
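The two-function scheme above can be sketched in a few lines. This is a hypothetical illustration only: __original_module__ is an invented attribute name and "myapi" an invented module name, neither part of any PEP or implementation.

```python
import types

def make_virtual_module(name):
    """Create an empty virtual module (not registered in sys.modules)."""
    return types.ModuleType(name)

def preload(virtual, *objects):
    """Move objects into the virtual module, remembering where they came from."""
    for obj in objects:
        obj.__original_module__ = obj.__module__  # actual defining module
        obj.__module__ = virtual.__name__         # claimed official home
        setattr(virtual, obj.__name__, obj)

def example():
    pass

api = make_virtual_module("myapi")
preload(api, example)
print(api.example.__module__)           # 'myapi'
print(api.example.__original_module__)  # wherever example() was really defined
```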
That adds a lot of complexity though - far simpler to define a new __impl_module__ attribute on every object, retroactively fixing introspection of existing code that adjusts __module__ to make pickling work properly across different versions and implementations.
By doing it that way, most people will never need to know how these things work, or even see them. i.e. it's advanced/expert Python foo. ;-)
Most people will never need to care or worry about the difference between __module__ and __impl_module__ either - it will be hidden inside libraries like inspect, pydoc and pickle.
Anyway, I hope this gives you some ideas. I know you can figure out the details much better than I can.
Yeah, the idea of reusing the __module__ attribute name at the top level is an excellent one.
-- Nick Coghlan | email@example.com | Brisbane, Australia