On 2/7/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
After exploring this a bit further on comp.lang.python, I was able to organize these ideas better. The more I thought about it, the more '+'s I found, and about the only '-' I can think of is the work required to actually make a patch to do it.
It's also good to keep in mind that since most people still rely on the old relative import behavior, most people have not run into some of the issues I mention here. But they will at some point.
I did mean to keep this short, but clarity won out. (At least it's clear to me, but that's an entirely subjective opinion on my part.)
Maybe someone will adopt this and make a real PEP out of it. :-)
Cheers, Ron
PROPOSAL
========
Make Python's concept of a package (currently an informal type) stronger than that of the underlying file system search path and directory structure.
So you mean make packages more of an official thing than just having a __path__ attribute on a module, right?
Currently in Python 2.5, __path__ attributes exist only in the namespaces of imported packages. Running a module doesn't set a __path__ attribute, just the __file__ attribute.
True.
It would be nice if __path__ were set on all modules in packages no matter how they are started.
There is a slight issue with that as the __path__ attribute represents the top of a package and thus that it has an __init__ module. It has some significance in terms of how stuff works at the moment.
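The package-versus-module distinction described here can be checked on any installed package. The sketch below uses the standard library's json package purely as a convenient example; the helper name has_path is made up for illustration.

```python
import json
import json.decoder


def has_path(module):
    """Return True if the module object carries a __path__ attribute."""
    return hasattr(module, "__path__")


# json is a package (a directory with an __init__ module), so it has __path__.
print(has_path(json))
# json.decoder is an ordinary module inside that package, so it does not.
print(has_path(json.decoder))
```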
The real name could be worked out by comparing __path__ and __file__ if someone needs that. But I think it would be better to just go ahead and add a __realname__ attribute for when __name__ is "__main__".
__name__ == "__main__" can stay the same and still serve its purpose to tell whether a script was started directly or imported.
I think the whole __main__ thing is the wrong thing to be trying to keep alive for this. I know it would break things, but it is probably better to come up with a better way for a module to know when it is being executed or to denote what code should only be run when it is executed.
Where the following hold true in python 3.X, or when absolute_import behavior is imported from __future__ in python 2.X:
(1) Python first determines if a module or package is part of a package and then runs that module or package in the context of the package it belongs to. (see items below)
Don't quite follow this statement. What do you mean by "runs" here? You mean when using runpy or something and having the name set to '__main__'?
Yes
(2) import this_package.module
    import this_package.sub_package
If this_package is the same name as the current package, then do not look on sys.path. Use the location of this_package.
Already does this (at least in my pure Python implementation). Searches are done on __path__ when you are within a package.
Cool! I don't think it's like that for the non-pure version, but it may do it that way if "from __future__ import absolute_import" is used.
It does do it both ways, there is just a fallback on the classic import semantics in terms of trying it both as a relative and absolute import. But I got the semantics from the current implementation so it is not some great inspiration of mine. =)
Are you setting __path__ for each module imported in a package too?
No. As I said above, having __path__ set has some special meaning in how imports work at the moment. It stays on packages and not modules within packages.
(3) import other_package.module
    import other_package.sub_package
If other_package is a different name from the current package (this_package), then do not look in this_package and exclude searches in sys.path locations that are inside this_package including the current directory.
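As a rough illustration of the "exclude sys.path locations that are inside this_package" part of item (3), the sketch below filters a list of search-path entries against a package directory. It assumes plain filesystem paths only and does nothing for other kinds of importers; the function name search_path_outside_package is hypothetical and is not part of any import API.

```python
import os


def search_path_outside_package(search_path, package_dir):
    """Return the entries of search_path that do not lie inside package_dir.

    An empty entry is treated as the current directory, as on sys.path.
    """
    package_dir = os.path.abspath(package_dir)
    kept = []
    for entry in search_path:
        entry_abs = os.path.abspath(entry or os.curdir)
        try:
            common = os.path.commonpath([package_dir, entry_abs])
        except ValueError:
            # Raised on Windows when the two paths are on different drives.
            common = None
        # An entry is inside the package if the package directory is the
        # common prefix of both paths (this also covers the package itself).
        if common != package_dir:
            kept.append(entry)
    return kept
```

In this sketch an import of other_package would then search only the filtered list, for example search_path_outside_package(sys.path, this_package_dir).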
This change would require importers to do more. Since the absolute import semantics automatically make this kind of import start at the top-level (i.e., sys.path), each import for an entry on sys.path would need to be told what package it is currently in, check if it handles that package, and then skip it if it does have it.
I don't think it will be as hard as this. See below.
That seems like a lot of work that I know I don't want to have to implement for every importer I ever write.
Only getting the correct package location for the first module executed in the package will be a bit of work. (But not that much.) After that, it can be passed around.
Here's something I used recently to get the full dotted name without importing. It could also return the base package path as well. You probably don't need the cache. These could be combined and shortened further for just finding a root package location.
    import os

    def path_type(path):
        """
        Determine what kind of thing path is.

        Returns -> 'module'|'package'|'dir'| None
        """
        if os.path.isfile(path) \
           and (path[-3:] == '.py' or \
                path[-4:] in ('.pyw', '.pyc', '.pyd', '.pyo')):
            return 'module'
        if os.path.isdir(path):
            for end in ['', 'w', 'c', 'o']:
                if os.path.isfile(os.path.join(path, '__init__.py' + end)):
                    return 'package'
            return 'dir'

    def dotted_name(path, cache={}):
        """
        Get a full dotted module or package name from a path name.

        Returns -> fully qualified (dotted) name | None
        """
        if path in cache:
            return cache[path]
        if path_type(path) in ('package', 'module'):
            parent, name = os.path.split(path)
            name, _ = os.path.splitext(name)
            while 1:
                if path_type(parent) != 'package':
                    break
                parent, nextname = os.path.split(parent)
                name = '.'.join([nextname, name])
            cache[path] = name
            return name

Let's see... (untested)
    def package_path(path):
        """
        Get the package location of a module.
        """
        package = None
        if path_type(path) in ('package', 'module'):
            parent, name = os.path.split(path)
            while 1:
                if path_type(parent) != 'package':
                    break
                package = os.path.join(parent, name)
                parent, name = os.path.split(parent)
        return package

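Since package_path is marked untested, here is a self-contained sketch of the same idea: walk up from a path to the outermost enclosing package directory. It records a directory as the result only after confirming that directory is a package, which appears to be what the version above intends. The names is_package_dir and root_package_path are hypothetical, and only a plain __init__.py is checked, so compiled-only packages and custom importers are not handled.

```python
import os


def is_package_dir(path):
    """Return True if path is a directory containing an __init__.py file."""
    return os.path.isfile(os.path.join(path, "__init__.py"))


def root_package_path(path):
    """Return the directory of the outermost package containing path.

    path may be a module file or a package directory. Returns None when
    path is not inside any package.
    """
    # Start at the package directory itself, or at the module's directory.
    parent = path if is_package_dir(path) else os.path.dirname(path)
    root = None
    while is_package_dir(parent):
        root = parent
        up = os.path.dirname(parent)
        if up == parent:
            # Reached the filesystem root; nothing further to climb.
            break
        parent = up
    return root
```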
Or you could have copied the code I wrote for the filesystem importer's find_module method that already does this classification. =) Part of the problem of working backwards from path to dotted name is that it might not import that way. __path__ can be tweaked, importers and loaders can be written to interpret the directory structure or file names differently, etc. Plus what about different file types like .ptl files from Quixote?
(4) import module
    import package
Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.
This is already done. Absolute imports would cause this to do a shallow check on sys.path for the module or package name.
Great! 2 down. Almost half way there. :-)
But will it check the current directory if you run a module directly? Currently it doesn't know if it's part of a package. Is that correct?
Absolute import semantics go straight to sys.path, period.
(5) For behaviors other than these, like when you do actually want to run a module belonging to a package in a different context, a mechanism such as a command line switch, or a settable import attribute should be used.
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery? =) I could be labeled as jaded.
Well, I know it's not an easy thing to do. But it's not finding the paths and/or whether files are modules etc... that is hard. From what I understand the hard part is making it work so it can be extended and customized.
Is that correct?
Yes. I really think ditching this whole __main__ name thing is going to be the only solid solution. Defining a __main__() method for modules that gets executed makes the most sense to me. Just import the module and then execute the function if it exists. That allows runpy to have the name be set properly and does away with import problems without mucking with import semantics. Still have the name problem if you specify a file directly on the command line, though.
There may also be some added security benefits as well because it would be much harder for someone to create a same named module or package and insert it by putting it on the path. Or by altering sys.path to do the same. [*]
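A minimal sketch of the "import the module, then execute the function if it exists" idea. The helper name run_module_main is hypothetical, and a module-level function called __main__ is only the convention suggested in this thread, not something Python itself recognizes.

```python
import importlib


def run_module_main(module_name, main_name="__main__"):
    """Import module_name normally, then call its main function if present.

    Because the module is imported under its real dotted name, __name__
    inside it is the real name rather than '__main__'.
    Returns whatever the main function returns, or None if there is none.
    """
    module = importlib.import_module(module_name)
    main = getattr(module, main_name, None)
    if callable(main):
        return main()
    return None
```

As noted, this does nothing for the case where a file path rather than a module name is given on the command line.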
[* - If this can happen there are probably more serious security issues, but not everyone has the most secure setup, so this point is still probably a good point. General reliable execution of modules is the first concern, this may be a side benefit of that.]
(B) Reduce the need for special checks and editing sys.path.
Currently some authors have to edit sys.path or do special if os.path.exists() checks to ensure proper operations in some situations such as running tests. These suggestions would reduce the need for such special testing and modifications.
This might minimize some sys.path hacks in some instances, but it also complicates imports overall in terms of implementation and semantics.
I'm not sure why it would make it so much more complicated. The context in which the imports are done will need to be determined for package imports, relative package imports, and modules in any case. It's just a matter of determining which one to use from the start. I guess I need to look into how Python's imports work in a little more detail.
Where is point C?
Woops... I could make one up if you really want one. ;-)
No, that's okay. =)
(It was moved elsewhere and I forgot to reletter.)
(D) Easier editing and testing.
While you are editing modules in a package, you could then run the module directly (as you can with old-style relative imports) and still get the correct package-relative behavior instead of something else (like an exception or wrong output). Many editors support running the file being edited, including IDLE. It can also be difficult to write scripts for the editors to determine the correct context to run a module in.
How is this directly solved, though? You mentioned "running" a module as if it is in a package, but there is no direct explanation of how you would want to change the import machinery to pull this off. Basically you need a way to have either modules with the name __main__ be able to get the canonical name for import purposes. Or you need to leave __name__ alone and set some other global or something to flag that it is the __main__ module.
Leave __name__ alone, yes. Add a __path__ attribute for all modules that is set to the base package location. Add a __realname__ attribute only to modules whose __name__ is set to '__main__'.
I don't like this idea of having one attribute have the same meaning as another attribute. I don't think a good backwards-compatible solution is going to crop up.
The import machinery could then use those to determine how to handle imports in that module.
Is that clearer?
It is, but I don't like it. =)
If __path__ exists, then it's a module in a package. If __realname__ exists, then it was run as a script, but here's the actual name anyway.
If __name__ is '__main__' then do what scripts do when __name__ == '__main__'.
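The three rules above can be written out as a small decision function. This is only a sketch of the proposal: __realname__ does not exist in Python, __path__ on ordinary modules is likewise part of the proposal rather than current behavior, and the function name describe_module and its result keys are made up for illustration.

```python
def describe_module(namespace):
    """Classify a module's global namespace using the proposed attributes.

    namespace is a dict such as a module's globals().
    """
    # Proposed rule: __path__ present means the module lives in a package.
    in_package = "__path__" in namespace
    # Unchanged rule: __name__ == '__main__' means it was run as a script.
    run_as_script = namespace.get("__name__") == "__main__"
    # Proposed rule: __realname__, when present, carries the importable name.
    import_name = namespace.get("__realname__", namespace.get("__name__"))
    return {
        "in_package": in_package,
        "run_as_script": run_as_script,
        "import_name": import_name,
    }
```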
Regardless, I am not seeing how you are proposing to go about solving this problem.
Discussing it is a good start to doing that, isn't it? ;-)
Yep. -Brett