After exploring this a bit further on comp.lang.python, I was able to organize these ideas better. The more I thought about it, the more '+'s I found, and about the only '-' I can think of is the work required to actually make a patch to do it.

It's also good to keep in mind that since most people still rely on the old relative import behavior, most people have not run into some of the issues I mention here. But they will at some point.

I did mean to keep this short, but clarity won out. (At least it's clear to me, but that's an entirely subjective opinion on my part.)

Maybe someone will adopt this and make a real PEP out of it. :-)

Cheers, Ron

PROPOSAL
========

Make Python's concept of a package (currently an informal type) stronger than that of the underlying file system search path and directory structure.

Where the following hold true in Python 3.x, or when absolute_import behavior is imported from __future__ in Python 2.x:

(1) Python first determines if a module or package is part of a package and then runs that module or package in the context of the package it belongs to. (see items below)

(2) import this_package.module
    import this_package.sub_package

If this_package has the same name as the current package, then do not look on sys.path. Use the location of this_package.

(3) import other_package.module
    import other_package.sub_package

If other_package has a different name from the current package (this_package), then do not look in this_package, and exclude searches in sys.path locations that are inside this_package, including the current directory.

(4) import module
    import package

Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.

(5) For behaviors other than these, like when you do actually want to run a module belonging to a package in a different context, a mechanism such as a command line switch or a settable import attribute should be used.

MOTIVATION
==========

(A) Added reliability.

There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.

There may also be some added security benefits, because it would be much harder for someone to create a same-named module or package and insert it by putting it on the path, or by altering sys.path to do the same. [*]

[* - If this can happen there are probably more serious security issues, but not everyone has the most secure setup, so this is still probably a good point. General reliable execution of modules is the first concern; this may be a side benefit of that.]

(B) Reduce the need for special checks and editing sys.path.

Currently some authors have to edit sys.path or add special if os.path.exists() checks to ensure proper operation in some situations, such as running tests. These suggestions would reduce the need for such special testing and modifications.

(D) Easier editing and testing.

While you are editing modules in a package, you could then run the module directly (as you can with old style relative imports) and still get the correct package-relative behavior instead of something else (like an exception or wrong output). Many editors support running the file being edited, including IDLE. It can also be difficult to write scripts for the editors to determine the correct context to run a module in.

(E) Consistency with from ... import ... relative imports.

A relative import also needs to find its home package(s). These suggestions are consistent with relative import needs and would also enable relative imports to work if a module is run directly from an editor (like IDLE) while editing it. [*]

[* - Consistency isn't a major issue, but it's nice to have.]

(F) It would make things much easier for me. ;-)

(Insert "Me Too's" here.)

DISCUSSION
==========

(I) Python has a certain minimalist quality where it tries to do a lot with a minimum amount of resources. (Which I generally love.) But in the case of packages, that might not be the best thing. It is not difficult for Python to detect if a module is located in a package. With the move to explicit absolute/relative imports, it would make sense if Python also were a little smarter in this area. Packages are being used more often, and so it may also be useful to formalize the concept of a package in a stronger way.

(II) Many of the problems associated with imports are a side effect of using the OS's directory structure to represent a Python "package" structure. This creates some external dependence on the operating system that can affect how Python programs run. Some of these issues include:

- Importing the wrong package or module.
- Getting an error due to a package or module not being found.
- Getting an error due to a package not being loaded or initialized first.
- Having to run modules or packages within a very specific OS file context.
- Needing a package location to be in the system's search path.

By making the concept of a package have priority over the OS's search path and directory structure, the dependence on the OS's environment is lessened, and it would ensure a module runs in the correct context or give meaningful exceptions in more cases than presently.

(III) If a package were represented as a combined single file, then the working directory would always be the package directory. The suggestions presented here would have that effect and also reduce or eliminate most if not all of these problem situations.

(IV) The suggested changes would change the precise meaning of an absolute import. Given the following example of an un-dotted import:

>>> import foo

The current meaning is: "A module or package that is located in sys.path or the current directory". But maybe a narrower interpretation of "absolute import" would be better: "A module or package found in a specific package." I believe that this latter definition is what most people will think of while editing their programs. When dotted imports are used, the left-most part of the name is always a top level package or module in this case.

(V) Requirements to be on the search path.

It is quite reasonable to have Python modules and packages not in the search path. Conversely, it is not reasonable to require all Python modules or packages to be in locations listed in sys.path. While this isn't a true requirement, it is often put forth as a solution to some of the problems that occur with respect to imports.

(VI) Clearer error messages.

In cases where a wrong module or package is imported you often get attribute exceptions further in the code. These changes would move that up to the import statement, because the wrong module would not be imported.

(VII) Setting a __package__ attribute.

Would it be a good idea to have a simple way for modules to determine parent packages, and their absolute locations? Python could set these when it starts or imports a module. That may make it easier to write alternate importers that are package aware.

PROBLEMS AND ISSUES:

- Someone needs to make it happen.

I really can't think of any more than that. But I'm sure there are some, as most things like this are usually a trade-off of something.
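To make rules (1) through (4) concrete, here is a rough sketch of the lookup order they describe. This is an illustration only, not actual import machinery; the helper name candidate_dirs and the idea of passing in the package root are assumptions of the sketch.

import os
import sys

def candidate_dirs(top_name, current_pkg_root):
    """ Sketch of the search-order rules above.

        top_name         -- left-most part of the dotted import name
        current_pkg_root -- directory of the importing module's root
                            package, or None if it is not in a package
    """
    if current_pkg_root is not None:
        if top_name == os.path.basename(current_pkg_root):
            # Rule (2): same package name -- use the package's own
            # location, never sys.path.
            return [os.path.dirname(current_pkg_root)]
        # Rules (3)/(4): a different top-level name must not be found
        # inside this package, so drop sys.path entries (including the
        # current directory, the '' entry) that point into it.
        root = os.path.abspath(current_pkg_root)
        return [d for d in sys.path
                if not os.path.abspath(d or '.').startswith(root)]
    # Not in any package: a plain shallow sys.path search.
    return list(sys.path)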
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
After exploring this a bit further on comp.lang.python, I was able to organize these ideas better. The more I thought about it, the more '+'s I found, and about the only '-' I can think of is the work required to actually make a patch to do it.
It's also good to keep in mind that since most people still rely on the old relative import behavior, most people have not run into some of the issues I mention here. But they will at some point.
I did mean to keep this short, but clarity won out. (At least it's clear to me, but that's an entirely subjective opinion on my part.)
Maybe someone will adopt this and make a real PEP out of it. :-)
Cheers, Ron
PROPOSAL
========
Make Python's concept of a package (currently an informal type) stronger than that of the underlying file system search path and directory structure.
So you mean make packages more of an official thing than just having a __path__ attribute on a module, right?
Where the following hold true in Python 3.x, or when absolute_import behavior is imported from __future__ in Python 2.x:
(1) Python first determines if a module or package is part of a package and then runs that module or package in the context of the package it belongs to. (see items below)
Don't quite follow this statement. What do you mean by "runs" here? You mean when using runpy or something and having the name set to '__main__'?
(2) import this_package.module
    import this_package.sub_package

If this_package has the same name as the current package, then do not look on sys.path. Use the location of this_package.
Already does this (at least in my pure Python implementation). Searches are done on __path__ when you are within a package.
(3) import other_package.module
    import other_package.sub_package

If other_package has a different name from the current package (this_package), then do not look in this_package, and exclude searches in sys.path locations that are inside this_package, including the current directory.
This change would require importers to do more. Since the absolute import semantics automatically make this kind of import start at the top-level (i.e., sys.path), each importer for an entry on sys.path would need to be told what package it is currently in, check if it handles that package, and then skip it if it does have it. That seems like a lot of work that I know I don't want to have to implement for every importer I ever write.
(4) import module
    import package
Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.
This is already done. Absolute imports would cause this to do a shallow check on sys.path for the module or package name.
(5) For behaviors other than these, like when you do actually want to run a module belonging to a package in a different context, a mechanism such as a command line switch, or a settable import attribute should be used.
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery. =) I could be labeled as jaded.
There may also be some added security benefits, because it would be much harder for someone to create a same-named module or package and insert it by putting it on the path, or by altering sys.path to do the same. [*]

[* - If this can happen there are probably more serious security issues, but not everyone has the most secure setup, so this is still probably a good point. General reliable execution of modules is the first concern; this may be a side benefit of that.]
(B) Reduce the need for special checks and editing sys.path.
Currently some authors have to edit sys.path or add special if os.path.exists() checks to ensure proper operation in some situations, such as running tests. These suggestions would reduce the need for such special testing and modifications.
This might minimize some sys.path hacks in some instances, but it also complicates imports overall in terms of implementation and semantics.

Where is point C?
(D) Easier editing and testing.
While you are editing modules in a package, you could then run the module directly (as you can with old style relative imports) and still get the correct package-relative behavior instead of something else (like an exception or wrong output). Many editors support running the file being edited, including IDLE. It can also be difficult to write scripts for the editors to determine the correct context to run a module in.
How is this directly solved, though? You mentioned "running" a module as if it is in a package, but there is no direct explanation of how you would want to change the import machinery to pull this off. Basically you need a way for modules with the name __main__ to be able to get the canonical name for import purposes, or you need to leave __name__ alone and set some other global or something to flag that it is the __main__ module. Regardless, I am not seeing how you are proposing to go about solving this problem. I understand the desire to fix this __main__ issue with absolute imports and I totally support it, but I just need a more concrete solution in front of me (assuming I am not totally blind and it is actually in this doc). -Brett
Brett Cannon wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
After exploring this a bit further on comp.lang.python, I was able to organize these ideas better. The more I thought about it, the more '+'s I found, and about the only '-' I can think of is the work required to actually make a patch to do it.
It's also good to keep in mind that since most people still rely on the old relative import behavior, most people have not run into some of the issues I mention here. But they will at some point.
I did mean to keep this short, but clarity won out. (At least it's clear to me, but that's an entirely subjective opinion on my part.)
Maybe someone will adopt this and make a real PEP out of it. :-)
Cheers, Ron
PROPOSAL
========
Make Python's concept of a package (currently an informal type) stronger than that of the underlying file system search path and directory structure.
So you mean make packages more of an official thing than just having a __path__ attribute on a module, right?
Currently in Python 2.5, __path__ attributes are only in the imported package namespaces. Running a module doesn't set a __path__ attribute, just the __file__ attribute. It would be nice if __path__ were set on all modules in packages no matter how they are started. The real name could be worked out by comparing __path__ and __file__ if someone needs that. But I think it would be better to just go ahead and add a __realname__ attribute for when __name__ is "__main__". __name__ == "__main__" can stay the same and still serve its purpose to tell whether a script was started directly or imported.
Where the following hold true in Python 3.x, or when absolute_import behavior is imported from __future__ in Python 2.x:
(1) Python first determines if a module or package is part of a package and then runs that module or package in the context of the package it belongs to. (see items below)
Don't quite follow this statement. What do you mean by "runs" here? You mean when using runpy or something and having the name set to '__main__'?
Yes
(2) import this_package.module
    import this_package.sub_package

If this_package has the same name as the current package, then do not look on sys.path. Use the location of this_package.
Already does this (at least in my pure Python implementation). Searches are done on __path__ when you are within a package.
Cool! I don't think it's like that for the non-pure version, but it may do it that way if "from __future__ import absolute_import" is used. Are you setting __path__ for each module imported in a package too?
(3) import other_package.module
    import other_package.sub_package

If other_package has a different name from the current package (this_package), then do not look in this_package, and exclude searches in sys.path locations that are inside this_package, including the current directory.
This change would require importers to do more. Since the absolute import semantics automatically make this kind of import start at the top-level (i.e., sys.path), each importer for an entry on sys.path would need to be told what package it is currently in, check if it handles that package, and then skip it if it does have it.
I don't think it will be as hard as this. See below.
That seems like a lot of work that I know I don't want to have to implement for every importer I ever write.
Only getting the correct package location for the first module executed in the package will be a bit of work. (But not that much.) After that, it can be passed around.

Here's something I used recently to get the full dotted name without importing. It could also return the base package path as well. You probably don't need the cache. These could be combined and shortened further for just finding a root package location.

import os

def path_type(path):
    """ Determine what kind of thing path is.

        Returns -> 'module' | 'package' | 'dir' | None
    """
    if os.path.isfile(path) \
       and (path[-3:] == '.py' or
            path[-4:] in ('.pyw', '.pyc', '.pyd', '.pyo')):
        return 'module'
    if os.path.isdir(path):
        for end in ['', 'w', 'c', 'o']:
            if os.path.isfile(os.path.join(path, '__init__.py' + end)):
                return 'package'
        return 'dir'

def dotted_name(path, cache={}):
    """ Get a full dotted module or package name from a path name.

        Returns -> fully qualified (dotted) name | None
    """
    if path in cache:
        return cache[path]
    if path_type(path) in ('package', 'module'):
        parent, name = os.path.split(path)
        name, _ = os.path.splitext(name)
        while 1:
            if path_type(parent) != 'package':
                break
            parent, nextname = os.path.split(parent)
            name = '.'.join([nextname, name])
        cache[path] = name
        return name

Let's see... (untested)

def package_path(path):
    """ Get the root package location of a module. """
    package = None
    if path_type(path) in ('package', 'module'):
        parent, name = os.path.split(path)
        while 1:
            if path_type(parent) != 'package':
                break
            # track the closest enclosing directory that is a package;
            # when the loop stops this is the root package location
            package = parent
            parent, name = os.path.split(parent)
    return package
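For example, with a hypothetical layout /home/user/pkg1/spkg1/m3.py where pkg1 and spkg1 both contain an __init__.py, these helpers would give:

>>> dotted_name('/home/user/pkg1/spkg1/m3.py')
'pkg1.spkg1.m3'
>>> package_path('/home/user/pkg1/spkg1/m3.py')
'/home/user/pkg1'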
(4) import module
    import package
Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.
This is already done. Absolute imports would cause this to do a shallow check on sys.path for the module or package name.
Great! 2 down. Almost half way there. :-) But will it check the current directory if you run a module directly, because currently it doesn't know if it's part of a package? Is that correct?
(5) For behaviors other than these, like when you do actually want to run a module belonging to a package in a different context, a mechanism such as a command line switch, or a settable import attribute should be used.
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery. =) I could be labeled as jaded.
Well, I know it's not an easy thing to do. But it's not finding the paths, or whether files are modules, etc., that is hard. From what I understand, the hard part is making it work so it can be extended and customized. Is that correct?
There may also be some added security benefits, because it would be much harder for someone to create a same-named module or package and insert it by putting it on the path, or by altering sys.path to do the same. [*]

[* - If this can happen there are probably more serious security issues, but not everyone has the most secure setup, so this is still probably a good point. General reliable execution of modules is the first concern; this may be a side benefit of that.]
(B) Reduce the need for special checks and editing sys.path.
Currently some authors have to edit sys.path or add special if os.path.exists() checks to ensure proper operation in some situations, such as running tests. These suggestions would reduce the need for such special testing and modifications.
This might minimize some sys.path hacks in some instances, but it also complicates imports overall in terms of implementation and semantics.
I'm not sure why it would make it so much more complicated. The context an import is done in will need to be determined for package imports, relative package imports, and plain modules in any case. It's just a matter of determining which one to use from the start. I guess I need to look into how Python's imports work in a little more detail.
Where is point C?
Woops... I could make one up if you really want one. ;-) (It was moved elsewhere and I forgot to reletter.)
(D) Easier editing and testing.
While you are editing modules in a package, you could then run the module directly (as you can with old style relative imports) and still get the correct package-relative behavior instead of something else (like an exception or wrong output). Many editors support running the file being edited, including IDLE. It can also be difficult to write scripts for the editors to determine the correct context to run a module in.
How is this directly solved, though? You mentioned "running" a module as if it is in a package, but there is no direct explanation of how you would want to change the import machinery to pull this off. Basically you need a way for modules with the name __main__ to be able to get the canonical name for import purposes, or you need to leave __name__ alone and set some other global or something to flag that it is the __main__ module.
Leave __name__ alone, yes. Add a __path__ attribute for all modules that is set to the base package location. Add a __realname__ attribute only to modules whose __name__ is set to '__main__'. The import machinery could then use those to determine how to handle imports in that module. Is that clearer? If __path__ exists, then it's a module in a package. If __realname__ exists, then it was run as a script, but here's the actual name anyway. If __name__ is '__main__', then do what scripts do when __name__ == '__main__'.
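A minimal sketch of that decision logic, assuming the proposed __path__-on-every-module and __realname__ attributes existed (neither does today):

def module_context(mod):
    """ Classify a module the way the proposed machinery might.

        Purely illustrative; __realname__ is the proposed attribute,
        not part of current Python.
    """
    in_package = hasattr(mod, '__path__')
    if hasattr(mod, '__realname__'):
        # run as a script: __name__ is '__main__', but the canonical
        # dotted name is still available for import decisions
        return mod.__realname__, in_package
    return mod.__name__, in_package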
Regardless, I am not seeing how you are proposing to go about solving this problem.
Discussing it is a good start to doing that, isn't it? ;-)
I understand the desire to fix this __main__ issue with absolute imports and I totally support it, but I just need a more concrete solution in front of me (assuming I am not totally blind and it is actually in this doc).
-Brett
I only outlined the behavioral rules that probably need to be agreed on first. After that it's a matter of writing it. As I said above, these behaviors are not the hard part; making it extendable in a nice clean way is. Cheers, Ron
On 2/7/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
After exploring this a bit further on comp.lang.python, I was able to organize these ideas better. The more I thought about it, the more '+'s I found, and about the only '-' I can think of is the work required to actually make a patch to do it.
It's also good to keep in mind that since most people still rely on the old relative import behavior, most people have not run into some of the issues I mention here. But they will at some point.
I did mean to keep this short, but clarity won out. (At least it's clear to me, but that's an entirely subjective opinion on my part.)
Maybe someone will adopt this and make a real PEP out of it. :-)
Cheers, Ron
PROPOSAL
========
Make Python's concept of a package (currently an informal type) stronger than that of the underlying file system search path and directory structure.
So you mean make packages more of an official thing than just having a __path__ attribute on a module, right?
Currently in Python 2.5, __path__ attributes are only in the imported package namespaces. Running a module doesn't set a __path__ attribute, just the __file__ attribute.
True.
It would be nice if __path__ were set on all modules in packages no matter how they are started.
There is a slight issue with that, as the __path__ attribute represents the top of a package and thus indicates that it has an __init__ module. It has some significance in terms of how stuff works at the moment.
The real name could be worked out by comparing __path__ and __file__ if someone needs that. But I think it would be better to just go ahead and add a __realname__ attribute for when __name__ is "__main__".
__name__ == "__main__" can stay the same and still serve it's purpose to tell weather a script was started directly or imported.
I think the whole __main__ thing is the wrong thing to be trying to keep alive for this. I know it would break things, but it is probably better to come up with a better way for a module to know when it is being executed or to denote what code should only be run when it is executed.
Where the following hold true in Python 3.x, or when absolute_import behavior is imported from __future__ in Python 2.x:
(1) Python first determines if a module or package is part of a package and then runs that module or package in the context of the package it belongs to. (see items below)
Don't quite follow this statement. What do you mean by "runs" here? You mean when using runpy or something and having the name set to '__main__'?
Yes
(2) import this_package.module
    import this_package.sub_package

If this_package has the same name as the current package, then do not look on sys.path. Use the location of this_package.
Already does this (at least in my pure Python implementation). Searches are done on __path__ when you are within a package.
Cool! I don't think it's like that for the non-pure version, but it may do it that way if "from __future__ import absolute_import" is used.
It does do it both ways; there is just a fallback on the classic import semantics in terms of trying it both as a relative and absolute import. But I got the semantics from the current implementation so it is not some great inspiration of mine. =)
Are you setting __path__ for each module imported in a package too?
No. As I said above, having __path__ set has some special meaning in how imports work at the moment. It stays on packages and not modules within packages.
(3) import other_package.module
    import other_package.sub_package

If other_package has a different name from the current package (this_package), then do not look in this_package, and exclude searches in sys.path locations that are inside this_package, including the current directory.
This change would require importers to do more. Since the absolute import semantics automatically make this kind of import start at the top-level (i.e., sys.path), each importer for an entry on sys.path would need to be told what package it is currently in, check if it handles that package, and then skip it if it does have it.
I don't think it will be as hard as this. See below.
That seems like a lot of work that I know I don't want to have to implement for every importer I ever write.
Only getting the correct package location for the first module executed in the package will be a bit of work. (But not that much.) After that, it can be passed around.
Here's something I used recently to get the full dotted name without importing. It could also return the base package path as well. You probably don't need the cache. These could be combined and shortened further for just finding a root package location.
import os

def path_type(path):
    """ Determine what kind of thing path is.

        Returns -> 'module' | 'package' | 'dir' | None
    """
    if os.path.isfile(path) \
       and (path[-3:] == '.py' or
            path[-4:] in ('.pyw', '.pyc', '.pyd', '.pyo')):
        return 'module'
    if os.path.isdir(path):
        for end in ['', 'w', 'c', 'o']:
            if os.path.isfile(os.path.join(path, '__init__.py' + end)):
                return 'package'
        return 'dir'

def dotted_name(path, cache={}):
    """ Get a full dotted module or package name from a path name.

        Returns -> fully qualified (dotted) name | None
    """
    if path in cache:
        return cache[path]
    if path_type(path) in ('package', 'module'):
        parent, name = os.path.split(path)
        name, _ = os.path.splitext(name)
        while 1:
            if path_type(parent) != 'package':
                break
            parent, nextname = os.path.split(parent)
            name = '.'.join([nextname, name])
        cache[path] = name
        return name

Let's see... (untested)

def package_path(path):
    """ Get the root package location of a module. """
    package = None
    if path_type(path) in ('package', 'module'):
        parent, name = os.path.split(path)
        while 1:
            if path_type(parent) != 'package':
                break
            # track the closest enclosing directory that is a package;
            # when the loop stops this is the root package location
            package = parent
            parent, name = os.path.split(parent)
    return package
Or you could have copied the code I wrote for the filesystem importer's find_module method that already does this classification. =) Part of the problem of working backwards from path to dotted name is that it might not import that way. __path__ can be tweaked, importers and loaders can be written to interpret the directory structure or file names differently, etc. Plus what about different file types like .ptl files from Quixote?
(4) import module
    import package
Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.
This is already done. Absolute imports would cause this to do a shallow check on sys.path for the module or package name.
Great! 2 down. Almost half way there. :-)
But will it check the current directory if you run a module directly, because currently it doesn't know if it's part of a package? Is that correct?
Absolute import semantics go straight to sys.path, period.
(5) For behaviors other than these, like when you do actually want to run a module belonging to a package in a different context, a mechanism such as a command line switch, or a settable import attribute should be used.
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery. =) I could be labeled as jaded.
Well, I know it's not an easy thing to do. But it's not finding the paths, or whether files are modules, etc., that is hard. From what I understand, the hard part is making it work so it can be extended and customized.
Is that correct?
Yes. I really think ditching this whole __main__ name thing is going to be the only solid solution. Defining a __main__() method for modules that gets executed makes the most sense to me. Just import the module and then execute the function if it exists. That allows runpy to have the name be set properly and does away with import problems without mucking with import semantics. Still have the name problem if you specify a file directly on the command line, though.
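A rough sketch of the runner behavior described here; the __main__() convention is the proposal itself, not current behavior, and run_as_script is a made-up name:

import sys

def run_as_script(name):
    """ Import a module under its real (dotted) name, then call its
        __main__() function if it defines one. Sketch only.
    """
    __import__(name)
    module = sys.modules[name]   # handles dotted names too
    main = getattr(module, '__main__', None)
    if callable(main):
        main()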
There may also be some added security benefits, because it would be much harder for someone to create a same-named module or package and insert it by putting it on the path, or by altering sys.path to do the same. [*]

[* - If this can happen there are probably more serious security issues, but not everyone has the most secure setup, so this is still probably a good point. General reliable execution of modules is the first concern; this may be a side benefit of that.]
(B) Reduce the need for special checks and editing sys.path.
Currently some authors have to edit sys.path or add special if os.path.exists() checks to ensure proper operation in some situations, such as running tests. These suggestions would reduce the need for such special testing and modifications.
This might minimize some sys.path hacks in some instances, but it also complicates imports overall in terms of implementation and semantics.
I'm not sure why it would make it so much more complicated. The context an import is done in will need to be determined for package imports, relative package imports, and plain modules in any case. It's just a matter of determining which one to use from the start. I guess I need to look into how Python's imports work in a little more detail.
Where is point C?
Woops... I could make one up if you really want one. ;-)
No, that's okay. =)
(It was moved elsewhere and I forgot to reletter.)
(D) Easier editing and testing.
While you are editing modules in a package, you could then run the module directly (as you can with old style relative imports) and still get the correct package-relative behavior instead of something else (like an exception or wrong output). Many editors support running the file being edited, including IDLE. It can also be difficult to write scripts for the editors to determine the correct context to run a module in.
How is this directly solved, though? You mentioned "running" a module as if it is in a package, but there is no direct explanation of how you would want to change the import machinery to pull this off. Basically you need a way for modules with the name __main__ to be able to get the canonical name for import purposes, or you need to leave __name__ alone and set some other global or something to flag that it is the __main__ module.
Leave __name__ alone, yes. Add a __path__ attribute for all modules that is set to the base package location. Add a __realname__ attribute only to modules whose __name__ is set to '__main__'.
I don't like this idea of having one attribute have the same meaning as another attribute. I don't think a good backwards-compatible solution is going to crop up.
The import machinery could then use those to determine how to handle imports in that module.
Is that clearer?
It is, but I don't like it. =)
If __path__ exists, then it's a module in a package. If __realname__ exists, then it was run as a script, but here's the actual name anyway.
If __name__ is '__main__' then do what scripts do when __name__ == '__main__'.
Regardless, I am not seeing how you are proposing to go about solving this problem.
Discussing it is a good start to doing that, isn't it? ;-)
Yep. -Brett
Brett Cannon wrote:
On 2/7/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
It would be nice if __path__ were set on all modules in packages no matter how they are started.
There is a slight issue with that, as the __path__ attribute represents the top of a package and thus indicates that it has an __init__ module. It has some significance in terms of how stuff works at the moment.
Yes, and after some reading I found __path__ isn't exactly what I was thinking. It could be that it's only a matter of getting that first import right. An example of this is this recipe by Nick: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/307772
The real name could be worked out by comparing __path__ and __file__ if someone needs that. But I think it would be better to just go ahead and add a __realname__ attribute for when __name__ is "__main__".
__name__ == "__main__" can stay the same and still serve it's purpose to tell weather a script was started directly or imported.
I think the whole __main__ thing is the wrong thing to be trying to keep alive for this. I know it would break things, but it is probably better to come up with a better way for a module to know when it is being executed or to denote what code should only be run when it is executed.
I was trying to suggest things that would do the least harm as far as changing things in the eyes of the users. If not keeping the "__main__" name in Python 3k is a real option, then yes, there may be more options. Is it a real option? Or is Guido set on keeping it?

If you remove the "__main__" name, then you will still need to have some attribute for python to determine the same thing. What you would end up doing is just moving the [if __name__=="__main__": __main__()] line off the end of the program so that all programs have it automatically. We just won't see it. And instead of checking __name__, the interpreter would check some other attribute.

So what and where would that other attribute be? Would it be exposed so we add if __ismain__: <body> to our programs for initialization purposes? Or you could just replace it with an __ismain__ attribute; then we can name our main functions anything we want... like test().

if __ismain__: test()

That is shorter and maybe less confusing than the __name__ check.
(2) import this_package.module
    import this_package.sub_package

If this_package has the same name as the current package, then do not look on sys.path. Use the location of this_package.
Already does this (at least in my pure Python implementation). Searches are done on __path__ when you are within a package.
Cool! I don't think it's like that for the non-pure version, but it may do it that way if "from __future__ import absolute_import" is used.
It does do it both ways; there is just a fallback on the classic import semantics in terms of trying it both as a relative and absolute import. But I got the semantics from the current implementation so it is not some great inspiration of mine. =)
I think there shouldn't be a fallback... that will just confuse things. Raise an exception here, because most likely falling back is not what you want. If someone wants to import a module external to a package that has the same name as the package (or modules in some other package with the same name), then there needs to be an explicit way to do that. But I really don't think this will come up that often. <clipped general examples>
Or you could have copied the code I wrote for the filesystem importer's find_module method that already does this classification. =)
Part of the problem of working backwards from path to dotted name is that it might not import that way.
Maybe it should work that way? If someone wants other than that behavior, then maybe there can be other ways to get it?

Here's an example of a situation where you might think it would be a problem, but it isn't:

pkg1:
    __init__.py
    m1.py
    spkg1:
        __init__.py
        m3.py
    dirA:
        m4.py
        pkg2:
            __init__.py
            m5.py

You might think it wouldn't work for pkg2.m5, but that's actually ok. pkg2 is a package just being stored in dirA which just happens to be located inside another package. Running m5.py directly will run it as a submodule of pkg2, which is what you want. It's not in a sub-package of pkg1. And m4.py is just a regular module. Or are you thinking of other relationships?
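Assuming dirA has no __init__.py while pkg1, spkg1, and pkg2 all do, the helper functions from earlier would classify these files like so (hypothetical session):

>>> dotted_name('pkg1/spkg1/m3.py')
'pkg1.spkg1.m3'
>>> dotted_name('pkg1/dirA/m4.py')        # dirA is not a package
'm4'
>>> dotted_name('pkg1/dirA/pkg2/m5.py')   # pkg2 restarts the dotted name
'pkg2.m5'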
__path__ can be tweaked, importers and loaders can be written to interpret the directory structure or file names differently, etc.
Yes, and they will need a basic set of well defined default behaviors to build on. After that, it's up to them to be sure their interpretation does what they want.
Plus what about different file types like .ptl files from Quixote?
This is really a matter of using a corresponding file reader to get at its contents or its real (relative to python) type... I.e., is it really a module, a package, or a module in a package, or some other thing ... living inside of a zip, or some other device (or file) like container?
(4) import module
    import package
Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.
This is already done. Absolute imports would cause this to do a shallow check on sys.path for the module or package name.
Great! 2 down. Almost half way there. :-)
But will it check the current directory if you run a module directly, because currently it doesn't know if it's part of a package? Is that correct?
Absolute import semantics go straight to sys.path, period.
Which includes the current directory. So in effect it will fall back to a relative type of behavior if a module with the same name as the one being imported exists in the current (inside-this-package) directory, *if* you execute the module directly. I think this should also give an error; it is the inverse of the situation above (#2). In most cases (if not all) it's not what you want. You wanted a module that is not part of this module's package, and got one that is.
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery. =) I could be labeled as jaded.
Well, I know it's not an easy thing to do. But it's not finding the paths, or whether files are modules, etc., that is hard. From what I understand, the hard part is making it work so it can be extended and customized.
Is that correct?
Yes. I really think ditching this whole __main__ name thing is going to be the only solid solution. Defining a __main__() method for modules that gets executed makes the most sense to me. Just import the module and then execute the function if it exists. That allows runpy to have the name be set properly and does away with import problems without mucking with import semantics. Still have the name problem if you specify a file directly on the command line, though.
I'll have to see more details of how this would work, I think. Part of me says it sounds good. And another part says, isn't this just moving stuff around? And what exactly does that solve?
The import machinery could then use those to determine how to handle imports in that module.
Is that clearer?
It is, but I don't like it. =)
It doesn't exactly have to work that way. ;-) It's the "does it do what I designed it to do" behavioral stuff of packages and modules that I want. If the module, however it is run, gives an error or does something other than what I intended, then that's a problem. Ron
On 2/8/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/7/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
It would be nice if __path__ were set on all modules in packages no matter how they are started.
There is a slight issue with that, as the __path__ attribute represents the top of a package and thus indicates that it has an __init__ module. It has some significance in terms of how stuff works at the moment.
Yes, and after some reading I found __path__ isn't exactly what I was thinking.
It could be that it's only a matter of getting that first import right. An example of this is this recipe by Nick.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/307772
But Nick already rolled this stuff into 2.5 when package support was added to runpy.
The real name could be worked out by comparing __path__ and __file__ if someone needs that. But I think it would be better to just go ahead and add a __realname__ attribute for when __name__ is "__main__".
__name__ == "__main__" can stay the same and still serve it's purpose to tell weather a script was started directly or imported.
I think the whole __main__ thing is the wrong thing to be trying to keep alive for this. I know it would break things, but it is probably better to come up with a better way for a module to know when it is being executed or to denote what code should only be run when it is executed.
I was trying to suggest things that would do the least harm as far as changing things in the eyes of the users. If not keeping the "__main__" name in Python 3k is a real option, then yes, there may be more options. Is it a real option? Or is Guido set on keeping it?
Beats me. Wouldn't be hard to have 2to3 change ``if __name__ == '__main__'`` to a function definition instead.
If you remove the "__main__" name, then you will still need to have some attribute for python to determine the same thing.
Why? There is nothing saying we can't follow most other languages and just have a reserved function name that gets executed if the module is executed.
What you would end up doing is just moving the [if __name__=="__main__": __main__()] line off the end of the program so that all programs have it automatically. We just won't see it. And instead of checking __name__, the interpreter would check some other attribute.
So what and where would that other attribute be?
If a thing was done like that it would be in the global namespace of the module just like __name__ is.
Would it be exposed so we add if __ismain__: <body> to our programs for initialization purposes?
Or you could just replace it with an __ismain__ attribute; then we can name our main functions anything we want... like test().
if __ismain__: test()
That is shorter and maybe less confusing than the __name__ check.
(2) import this_package.module
    import this_package.sub_package

If this_package has the same name as the current package, then do not look on sys.path. Use the location of this_package.
Already does this (at least in my pure Python implementation). Searches are done on __path__ when you are within a package.
Cool! I don't think it's like that for the non-pure version, but it may do it that way if "from __future__ import absolute_import" is used.
It does do it both ways; there is just a fallback on the classic import semantics in terms of trying it both as a relative and absolute import. But I got the semantics from the current implementation so it is not some great inspiration of mine. =)
I think there shouldn't be a fallback... that will just confuse things. Raise an exception here, because most likely falling back is not what you want.
The fallback is the old way, so don't worry about it.
If someone wants to import a module external to a package that has the same name as the package (or modules in some other package with the same name), then there needs to be an explicit way to do that. But I really don't think this will come up that often.
<clipped general examples>
Or you could have copied the code I wrote for the filesystem importer's find_module method that already does this classification. =)
Part of the problem of working backwards from path to dotted name is that it might not import that way.
Maybe it should work that way? If someone wants other than that behavior, then maybe there can be other ways to get it?
That's my point; the "other way" needs to work and the default can be based on the path.
Here's an example of a situation where you might think it would be a problem, but it isn't:
pkg1:
    __init__.py
    m1.py
    spkg1:
        __init__.py
        m3.py
    dirA:
        m4.py
        pkg2:
            __init__.py
            m5.py
You might think it wouldn't work for pkg2.m5, but that's actually ok. pkg2 is a package just being stored in dirA which just happens to be located inside another package.
Running m5.py directly will run it as a submodule of pkg2, which is what you want. It's not in a sub-package of pkg1. And m4.py is just a regular module.
Or are you thinking of other relationships?
I am thinking of a package's __path__ being set to a specific directory based on the platform or something. That totally changes the search order for the package so that it does not correspond to its directory location.
__path__ can be tweaked, importers and loaders can be written to interpret the directory structure or file names differently, etc.
Yes, and they will need a basic set of well defined default behaviors to build on. After that, it's up to them to be sure their interpretation does what they want.
Plus what about different file types like .ptl files from Quixote?
This is really a matter of using a corresponding file reader to get at it's contents or it's real (relative to python) type... Ie, is it really a module, a package, or a module in a package, or some other thing ... living inside of a zip, or some other device (or file) like container?
(4) import module
    import package
Module and package are not in a package, so don't look in any packages, even this one or sys.path locations inside of packages.
This is already done. Absolute imports would cause this to do a shallow check on sys.path for the module or package name.
Great! 2 down. Almost half way there. :-)
But will it check the current directory if you run a module directly, because currently it doesn't know if it's part of a package? Is that correct?
Absolute import semantics go straight to sys.path, period.
Which includes the current directory. So in effect it will fall back to a relative type of behavior if a module with the same name as the one being imported exists in the current (inside-this-package) directory, *if* you execute the module directly.
I think this should also give an error; it is the inverse of the situation above (#2). In most cases (if not all) it's not what you want.
You wanted a module that is not part of this module's package, and got one that is.
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery. =) I could be labeled as jaded.
Well, I know it's not an easy thing to do. But it's not finding the paths, or whether files are modules, etc., that is hard. From what I understand, the hard part is making it work so it can be extended and customized.
Is that correct?
Yes. I really think ditching this whole __main__ name thing is going to be the only solid solution. Defining a __main__() method for modules that gets executed makes the most sense to me. Just import the module and then execute the function if it exists. That allows runpy to have the name be set properly and does away with import problems without mucking with import semantics. Still have the name problem if you specify a file directly on the command line, though.
I'll have to see more details of how this would work, I think. Part of me says it sounds good. And another part says, isn't this just moving stuff around? And what exactly does that solve?
It is moving things around, but so what? Moving it keeps __name__ sane. At worst, a global could be set to the name of the module that started the execution, or there could be an alias in sys.modules for the '__main__' key to the module being executed. The point of the solution it provides is it doesn't muck with import semantics. It allows the execution stuff to be external to imports and be its own thing. Guido has rejected this idea before (see PEP 299 : http://www.python.org/dev/peps/pep-0299/ ), but then again there was not this issue before. Now I see why Nick said he wouldn't touch this in PEP 338. =) -Brett
Brett Cannon wrote:
On 2/8/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/7/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
It would be nice if __path__ were set on all modules in packages no matter how they are started.
There is a slight issue with that, as the __path__ attribute represents the top of a package and thus indicates that it has an __init__ module. It has some significance in terms of how stuff works at the moment.
Yes, and after some reading I found __path__ isn't exactly what I was thinking.
It could be that it's only a matter of getting that first import right. An example of this is this recipe by Nick.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/307772
But Nick already rolled this stuff into 2.5 when package support was added to runpy.
I'll take a look at runpy today sometime.
If you remove the "__main__" name, then you will still need to have some attribute for python to determine the same thing.
Why? There is nothing saying we can't follow most other languages and just have a reserved function name that gets executed if the module is executed.
Yes, but this is where Python is different from other languages. In a way, Python's main *is* the whole module from the top to bottom. And so the '__main__' name is referring to the whole module and not just a function in it. A more specific function would be needed to get the context right. Maybe __script__(), or __run__(). Or if you want to be consistent with classes, how about adding __call__() to modules? Then the main body of the module effectively works the same way as it does in a class. =) Hey, I think that has some cool possibilities, it makes modules callable in general. So if I want to run a module's __call__(), AKA main() as you call it, after importing I would just do...

import module
module()

And it would just work. ;-)
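Modules aren't callable today, but the effect can be approximated by replacing a module's entry in sys.modules with an instance of a ModuleType subclass; a minimal sketch (CallableModule and main are made-up names):

import sys
import types

class CallableModule(types.ModuleType):
    """ Module wrapper whose instances can be called. """
    def __init__(self, wrapped):
        types.ModuleType.__init__(self, wrapped.__name__)
        # copy the wrapped module's namespace into the wrapper
        self.__dict__.update(wrapped.__dict__)
    def __call__(self):
        # dispatch to a conventional entry point, if one is defined
        main = getattr(self, 'main', None)
        if callable(main):
            return main()

# At the bottom of a module that wants to be callable:
#     sys.modules[__name__] = CallableModule(sys.modules[__name__])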
What you would end up doing is just moving the [if __name__=="__main__": __main__()] line off the end of the program so that all programs have it automatically. We just won't see it. And instead of checking __name__, the interpreter would check some other attribute.
So what and where would that other attribute be?
If a thing was done like that it would be in the global namespace of the module just like __name__ is.
Forget this, I like the idea above much better! It's fully consistent with classes and so it would be easy to explain as well. A step towards unification of classes and modules. The __name__ attribute isn't changed either. ;-)
If someone wants to import a module external to a package that has the same name as the package (or modules in some other package with the same name), then there needs to be an explicit way to do that. But I really don't think this will come up that often.
<clipped general examples>
Or you could have copied the code I wrote for the filesystem importer's find_module method that already does this classification. =)
Part of the problem of working backwards from path to dotted name is that it might not import that way.
Maybe it should work that way? If someone wants other than that behavior, then maybe there can be other ways to get it?
That's my point; the "other way" needs to work and the default can be based on the path.
We need to get much more specific on this, i.e., examples. I don't think we will get anywhere trying to generalize this point.
Here's an example of a situation where you might think it would be a problem, but it isn't:
pkg1:
    __init__.py
    m1.py
    spkg1:
        __init__.py
        m3.py
    dirA:
        m4.py
        pkg2:
            __init__.py
            m5.py
You might think it wouldn't work for pkg2.m5, but that's actually ok. pkg2 is a package just being stored in dirA which just happens to be located inside another package.
Running m5.py directly will run it as a submodule of pkg2, which is what you want. It's not in a sub-package of pkg1. And m4.py is just a regular module.
Or are you thinking of other relationships?
I am thinking of a package's __path__ being set to a specific directory based on the platform or something. That totally changes the search order for the package so that it does not correspond to its directory location.
In that case, I think the developer and anyone who tries to run the script in a way the developer did not intend will have to be on their own. For example, if I add a directory to __path__ to include a module that normally lives someplace else, that's ok. If I execute any of 'my' modules in 'my' package, it will import __init__.py and set the __path__ accordingly, and everything will still work. But if I execute the 'other' module directly, then Python needs to run it in whatever context it normally lives in. We shouldn't try to figure out what 'other' packages it may be used in, because it may be used in many packages. So the only thing to do is run it in the context it is in where we find it, and not this 'special' context we put it in.

For situations where we might have several subdirs in our package that may be chosen from depending on platform (or other things), we may be able to put a hint in the directory, such as a _init__.py file. (Notice the single underscore.) Or some variation if that's too subtle. The idea is that it's an inactive sub-package, and the main package's __init__ file could activate a 'reserved' sub-package using some method like renaming the _init__.py to __init__.py (but I really don't like renaming as a way to do that; it would be better to have some other way). Then we could possibly still do the search up to find the root package by including _init__.py files in our search in those cases as well.
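The platform-specific case can already be handled by a package's __init__.py editing its own __path__, which is the existing mechanism closest to this idea; a minimal sketch (the plat_win directory name is made up):

# hypothetical __init__.py of a package with platform sub-directories
import os
import sys

if sys.platform.startswith('win'):
    # make modules in the otherwise-inactive sub-directory importable
    # as if they were directly in this package
    __path__.append(os.path.join(os.path.dirname(__file__), 'plat_win'))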
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to path/import conflicts which are sometimes difficult to diagnose.
Probably, but I don't know if the implementation complexity warrants worrying about this. But then again how many people have actually needed to implement the import machinery. =) I could be labeled as jaded.
Well, I know it's not an easy thing to do. But it's not finding the paths, or whether files are modules, etc., that is hard. From what I understand, the hard part is making it work so it can be extended and customized.
Is that correct?
Yes. I really think ditching this whole __main__ name thing is going to be the only solid solution. Defining a __main__() method for modules that gets executed makes the most sense to me. Just import the module and then execute the function if it exists. That allows runpy to have the name be set properly and does away with import problems without mucking with import semantics. Still have the name problem if you specify a file directly on the command line, though.
I'll have to see more details of how this would work, I think. Part of me says it sounds good. And another part says, isn't this just moving stuff around? And what exactly does that solve?
It is moving things around, but so what? Moving it keeps __name__ sane. At worst, a global could be set to the name of the module that started the execution, or an alias could be kept in sys.modules for the '__main__' key to the module being executed.
Or just use __call__(). It already behaves in the way you want for classes. It could be reused, I think, for modules. The only difference is it won't have a self argument, which I think is not a problem.
The point of this solution is that it doesn't muck with import semantics. It allows the execution stuff to be external to imports and be its own thing.
Guido has rejected this idea before (see PEP 299: http://www.python.org/dev/peps/pep-0299/ ), but then again this issue didn't exist back then.
Now I see why Nick said he wouldn't touch this in PEP 338. =)
I read the thread, and backwards compatibility as well as Guido just not liking it were the reasons it was rejected. Backwards compatibility is less of a problem for py3k, but I also agree with his reasons for not liking it. I think a reserved __call__() function for modules may be a little easier to sell. It's already reserved in other situations for very much the same purpose as well. Cheers, Ron
On 2/9/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/8/07, Ron Adam <rrr@ronadam.com> wrote:
[SNIP]
If you remove the "__main__" name, then you will still need to have some attribute for python to determine the same thing.
Why? There is nothing saying we can't follow most other languages and just have a reserved function name that gets executed if the module is executed.
Yes, but this is where python is different from other languages. In a way, python's main *is* the whole module from the top to bottom. And so the '__main__' name is referring to the whole module and not just a function in it.
A more specific function would be needed to get the context right. Maybe __script__(), or __run__().
Or if you want to be consistent with classes, how about adding __call__() to modules? Then the main body of the module effectively works the same way as it does in a class. =)
Hey, I think that has some cool possibilities; it makes modules callable in general. So if I want to run a module's __call__(), AKA main() as you call it, after importing I would just do...
import module
module()
And it would just work. ;-)
I like this idea. Makes it very obvious. You just say "when a specific module is specified at the command line, it is called." Could even have it take sys.argv[1:] (which I think was supposed to turn into sys.args or sys.arg or something at some point). What do other people think? -Brett
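The semantics are easy to play with today by swapping the entry in sys.modules for a callable object. A toy sketch (the module name 'demo' is made up):

    import sys
    import types

    class CallableModule(types.ModuleType):
        def __call__(self, *args):
            # Stands in for the module's "main body" under the proposal.
            print('demo called with %r' % (args,))

    # Register the instance first, so "import demo" just binds it.
    sys.modules['demo'] = CallableModule('demo')

    import demo
    demo(1, 2)    # prints: demo called with (1, 2)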
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/8/07, Ron Adam <rrr@ronadam.com> wrote:
[SNIP]
I like this idea. Makes it very obvious. You just say "when a specific module is specified at the command line, it is called." Could even have it take sys.argv[1:] (which I think was supposed to turn into sys.args or sys.arg or something at some point).
What do other people think?
I don't like it. Much of my dislike comes from personal aesthetics, but then there is a logical disconnect. When an instance of a class is created, its __call__ method is not automatically called. By using the semantic of 'the __call__ function in the module namespace is automatically executed if the module is "run" from the command line', we are introducing a different instance creation semantic (an imported module is an instance of ModuleType).

I think we should just stick with what has been proposed for *years*, a __main__ function that is automatically executed after the module has been imported if its __name__ == '__main__'. Even better, anyone who wants to write code compatible with the updated syntax can include the following literal block at the end of their files...

if __name__ == '__main__':
    try:
        __main__
    except NameError:
        pass
    else:
        try:
            __main__()
        finally:
            try:
                from __future__ import disable_run
            except SyntaxError:
                #we are using an older Python
                pass
            else:
                #we are using a new Python, and
                #disabling automatic running succeeded
                pass

With such a semantic, current users of Python could include the above literal block and it would *just work*...then again, the new semantic wouldn't really be useful if people started using the above literal block.

- Josiah
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/8/07, Ron Adam <rrr@ronadam.com> wrote:
[SNIP]
I don't like it. Much of my dislike comes from personal aesthetics, but then there is a logical disconnect. When an instance of a class is created, its __call__ method is not automatically called. By using the semantic of 'the __call__ function in the module namespace is automatically executed if the module is "run" from the command line', we are introducing a different instance creation semantic (an imported module is an instance of ModuleType).
But I don't see the leap of how specifying a module to execute on the command line is any different than doing ``Class()()`` for instantiation with an immediate call. It would still be a separate step.
I think we should just stick with what has been proposed for *years*, a __main__ function that is automatically executed after the module has been imported if its __name__ == '__main__'.
But that does not solve the problem Ron has been trying to deal with; setting __name__ to __main__ prevents the execution of a module that uses relative imports because the import machinery can then no longer infer what package the module is in. -Brett
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Ron Adam <rrr@ronadam.com> wrote: [snip]
I don't like it. Much of my dislike comes from personal aesthetics, but then there is a logical disconnect. When an instance of a class is created, its __call__ method is not automatically called. By using the semantic of 'the __call__ function in the module namespace is automatically executed if the module is "run" from the command line', we are introducing a different instance creation semantic (an imported module is an instance of ModuleType).
But I don't see the leap of how specifying a module to execute on the command line is any different than doing ``Class()()`` for instantiation with an immediate call. It would still be a separate step.
I feel that there is a disconnect, but maybe it's personal aesthetics again.
I think we should just stick with what has been proposed for *years*, a __main__ function that is automatically executed after the module has been imported if its __name__ == '__main__'.
But that does not solve the problem Ron has been trying to deal with; setting __name__ to __main__ prevents the execution of a module that uses relative imports because the import machinery can then no longer infer what package the module is in.
I may have missed it, but how are either of the following ambiguous...

from . import foo
from ..bar import baz

The leading dot tells me 'relative to the path of the current module (or __init__.py module in a package), look for packages/modules named [everything else without a single leading dot]'.

Now, I tried the first of those lines in Python 2.5 and I was surprised that having two files foo and goo, goo importing foo via the first example above, didn't work. What is even worse is that over a year ago I was working on an import semantic for relative imports that would have made the above do as I would have expected.

This leads me to believe that *something* about relative imports is broken, but being that I mostly skipped the earlier portions of this particular thread, I'm not certain what it is. I would *guess* that it has to do with the way the current importer comes up with package relative imports, and I believe it could be fixed by switching to a path-relative import. That is, when module goo is performing 'from . import foo', you don't look at goo.__name__ to determine where to look for foo, you look at goo.__file__ .

With that change in semantic, both of the above cases work just fine, including 'from ..bar import baz', even without the current module being part of a package. That is, running goo.py in the following tree would succeed with the above two imports...

.../
    torun/        #include __init__.py if you want this to be a package
        goo.py
        foo.py
    bar/
        __init__.py
        baz.py    #I don't know if it would make sense to require __init__.py here

It is still possible to handle import hooks, as the relative import stuff is only really applicable to getting the base path from which to start searching for sub-packages (after you have stripped off all leading periods). It also naturally leads to a __name__ semantic that Guido had suggested to me when I was talking about relative imports:

goo.__name__ == '__main__'
foo.__name__ == '__main__.foo'
baz.__name__ == '__main__..bar.baz'

Which could more or less be used with the current importer; it just needs a special-casing of 'from . import ...' in the __main__ module. It may also make sense to do path/__name__ normalization relative to __main__ for any relative imports, so if in baz.py above you did 'from ..torun import foo', it gave you the previously existing foo and not a new copy.

I've got a bunch of code that implements the above name/path semantic, but it was never tested (being that I was using Python 2.4 at the time, and relative imports weren't proper syntax).

- Josiah
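A sketch of the path-relative rule described above (this is the proposed semantic, not how CPython actually resolves relative imports): each leading dot beyond the first walks one directory up from the importing module's __file__.

    import os

    def resolve_by_path(importer_file, relative_name):
        # 'from . import foo' in torun/goo.py maps to ('torun/goo.py', '.foo');
        # 'from ..bar import baz' maps to ('torun/goo.py', '..bar.baz').
        dots = len(relative_name) - len(relative_name.lstrip('.'))
        base = os.path.dirname(os.path.abspath(importer_file))
        for _ in range(dots - 1):
            base = os.path.dirname(base)
        rest = relative_name.lstrip('.')
        return os.path.join(base, *rest.split('.')) if rest else base

Under this rule, resolve_by_path('torun/goo.py', '..bar.baz') lands on .../bar/baz whether or not torun is a package.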
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Ron Adam <rrr@ronadam.com> wrote: [snip]
I think we should just stick with what has been proposed for *years*, a __main__ function that is automatically executed after the module has been imported if its __name__ == '__main__'.
But that does not solve the problem Ron has been trying to deal with; setting __name__ to __main__ prevents the execution of a module that uses relative imports because the import machinery can then no longer infer what package the module is in.
I may have missed it, but how are either of the following ambiguous...
from . import foo
from ..bar import baz
The leading dot tells me that 'relative to the path of the current module (or __init__.py module in a package), look for packages/modules named [everything else without a single leading dot].
Right, but "path" in this case is not a file path but an ambiguous one. Relative imports work for modules stored in file systems, files (e.g., zip files), databases, etc.
Now, I tried the first of those lines in Python 2.5 and I was surprised that having two files foo and goo, goo importing foo via the first example above, didn't work. What is even worse is that over a year ago I was working on an import semantic for relative imports that would have made the above do as I would have expected.
This leads me to believe that *something* about relative imports is broken, but being that I mostly skipped the earlier portions of this particular thread, I'm not certain what it is. I would *guess* that it has to do with the way the current importer comes up with package relative imports, and I believe it could be fixed by switching to a path-relative import.
That is, when module goo is performing 'from . import foo', you don't look at goo.__name__ to determine where to look for foo, you look at goo.__file__ .
But what about modules stored in a sqlite3 database? How is that supposed to work? What made basing relative imports off of __name__ so nice is that it allowed the import machinery to figure out what the resulting absolute module name was. That allowed the modules to be stored any way that you wanted, without making any assumptions about how the modules are stored or how their location is determined by an importer's find_module method.
With that change in semantic, both of the above cases work just fine, including 'from ..bar import baz', even without the current module being part of a package. That is, running goo.py in the following tree would succeed with the above two imports...
.../
    torun/        #include __init__.py if you want this to be a package
        goo.py
        foo.py
    bar/
        __init__.py
        baz.py    #I don't know if it would make sense to require __init__.py here
It is still possible to handle import hooks, as the relative import stuff is only really applicable to getting the base path from which to start searching for sub packages (after you have stripped off all leading periods).
But it isn't a file path, it's an absolute module name that you are after.
It also naturally leads to a __name__ semantic that Guido had suggested to me when I was talking about relative imports:
goo.__name__ == '__main__'
foo.__name__ == '__main__.foo'
baz.__name__ == '__main__..bar.baz'
Which could more or less be used with the current importer; it just needs a special-casing of 'from . import ...' in the __main__ module.
And I am trying to avoid special-casing for this. -Brett
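For contrast, a sketch of what 'basing relative imports off of __name__' amounts to: pure string arithmetic on dotted names, which is storage-agnostic. The helper below assumes the importing module's real dotted name is intact, which is exactly what breaks once __name__ is replaced by '__main__':

    def resolve_name(importer_name, importer_is_pkg, level, target):
        # 'from ..bar import baz' inside pkg.sub.mod -> level=2, target='bar.baz'
        package = importer_name if importer_is_pkg else importer_name.rpartition('.')[0]
        if not package:
            raise ImportError('relative import with no known parent package')
        bits = package.rsplit('.', level - 1)
        if len(bits) < level:
            raise ValueError('relative import beyond top-level package')
        return bits[0] + '.' + target if target else bits[0]

    # resolve_name('pkg.sub.mod', False, 2, 'bar.baz') -> 'pkg.bar.baz'
    # resolve_name('__main__', False, 1, 'foo') -> ImportError: the mangled
    # name carries no package information.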
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote: [snip]
Now, I tried the first of those lines in Python 2.5 and I was surprised that having two files foo and goo, goo importing foo via the first example above, didn't work. What is even worse is that over a year ago I was working on an import semantic for relative imports that would have made the above do as I would have expected.
This leads me to believe that *something* about relative imports is broken, but being that I mostly skipped the earlier portions of this particular thread, I'm not certain what it is. I would *guess* that it has to do with the way the current importer comes up with package relative imports, and I believe it could be fixed by switching to a path-relative import.
That is, when module goo is performing 'from . import foo', you don't look at goo.__name__ to determine where to look for foo, you look at goo.__file__ .
But what about modules stored in a sqlite3 database? How is that supposed to work? What made basing relative imports off of __name__ so nice is that it allowed the import machinery figure out what the resulting absolute module name was. That allowed the modules to be stored any way that you wanted without making any assumptions about how the modules are stored or their location is determined by an importer's find_module method.
How is it done now? Presumably if you have some module imported from a database (who does this, really?), it gets a name like dbimported.name1.name2, which an import hook can recognize as being imported from a database. Now, is dbimported.name1.name2 really the content of an __init__.py file (if it was a package), or is it a module in dbimported.name1?

Right now, we can't distinguish (based on __name__) between the cases of...

foo/
    __init__.py

and

foo.py

But we can, trivially, distinguish just by *examining* __file__ (at least for files from a filesystem). For example:

>>> import bar
>>> import baz
>>> bar.__name__, bar.__file__
('bar', 'bar.py')
>>> baz.__name__, baz.__file__
('baz', 'baz\\__init__.py')
>>>

It's pretty obvious to me which one is a package and which one is a module. If we codify the requirement that __file__ must end with '.../__init__.X' (where X can be: py, pyw, pyc, so, dll, pyd, etc.) if the thing we imported is a package, then the import hooks don't need to use the __file__ attribute for anything other than discerning between "is this a package, or is this a module", and can then handle the __name__ mangling as per import semantics. The only trick is if someone were to specifically import an __init__ module (from .package import __init__), but even then, the results are garbage (you get the module's __init__ method).
But it isn't a file path, it's an absolute module name that you are after.
If one were to just do "path" manipulations, afterwards you can translate that to an absolute name (perhaps based on the path of __main__). Pretending that you have a path can make the semantic non-ambiguous, but I prefer the alternative I just described.
It also naturally leads to a __name__ semantic that Guido had suggested to me when I was talking about relative imports:
goo.__name__ == '__main__'
foo.__name__ == '__main__.foo'
baz.__name__ == '__main__..bar.baz'
Which could more or less be used with the current importer; it just needs a special-casing of 'from . import ...' in the __main__ module.
And I am trying to avoid special-casing for this.
And it's only really an issue because we can't currently discern between module or package, right? So let us choose some semantic for determining "is this a module or package?", perhaps set by whatever did the importing, and skip the special cases for __main__, etc. We can use the __file__ semantic I described earlier. Or we can specify a new attribute on modules: __package__. If __package__ == __name__, then the current module is a package. If __package__ != __name__, then the current module is not a package. Regardless, when doing relative imports, the 'name' we start out with is __package__.

For example, say we have a file called foo.py that we have run from the command line. Its __name__ should be '__main__', as per Python history. However, __package__ will be ''. When foo.py performs 'from . import goo', we know precisely what "package" we are in, the same package (and "path") as the '__main__' module. Two remaining cases:

1) If goo is a module; goo.py sits next to foo.py (or equivalently in a database, etc.):
   goo.__package__ == ''
   goo.__name__ == 'goo'
2) If goo is a package:
   goo.__package__ == 'goo'
   goo.__name__ == 'goo'

On the other hand, say foo.py sits in 'bin' and did 'from ..pa import bar', but bar did 'from ..bin import foo'; we now have an issue. How do we determine that the foo that bar imports is the same foo that was run from the command line? However, this is a problem regardless of your 'package or module' semantic, if you fix 'from .. import baz' being run from the '__main__' module.

- Josiah
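A sketch of how a relative import could be resolved from the proposed __package__ attribute instead of from __name__ (assumed semantics, following the two goo cases above):

    def resolve_with_package(package, level, target):
        # level counts the leading dots: 'from . import goo' -> level=1.
        if level > 1:
            bits = package.rsplit('.', level - 1)
            if len(bits) < level:
                raise ValueError('relative import beyond top-level package')
            package = bits[0]
        if package and target:
            return package + '.' + target
        return package or target

In foo.py run from the command line, __name__ would be '__main__' but __package__ would be '', so 'from . import goo' resolves to plain 'goo' without consulting __name__ at all.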
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote: [snip]
How is it done now?
Importers and loaders; see PEP 302.
Presumably if you have some module imported from a database (who does this, really?),
No one at the moment. But think of it as an alternative to zip files if you use sqlite3 for the DB storage. Several people have told me privately they would like to see an importer and loader for this.
it gets a name like dbimported.name1.name2, which an import hook can recognize as being imported from a database. Now, is dbimported.name1.name2 really the content of an __init__.py file (if it was a package), or is it a module in dbimported.name1?
Right now, we can't distinguish (based on __name__) between the cases of...
foo/
    __init__.py
and foo.py
Right, but you can, based on whether __path__ is defined; that is how import tells.
But we can, trivially, distinguish just by *examining* __file__ (at least for files from a filesystem). For example:
>>> import bar
>>> import baz
>>> bar.__name__, bar.__file__
('bar', 'bar.py')
>>> baz.__name__, baz.__file__
('baz', 'baz\\__init__.py')
>>>
It's pretty obvious to me which one is a package and which one is a module.
Right, but as I said, this is what __path__ is for; to tell when a module is just a module or a package. And then the __name__ attribute is used to tell if it is a top-level module or not.
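For instance (a session sketch, assuming bar.py and a baz package both sit on the path):

    >>> import bar, baz
    >>> hasattr(bar, '__path__')
    False
    >>> hasattr(baz, '__path__')
    True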
If we codify the requirement that __file__ must end with '.../__init__.X' (where X can be; py, pyw, pyc, so, dll, pyd, etc.) if the thing we imported is a package, then the import hooks don't need to use the __file__ attribute for anything other than discerning between "is this a package, or is this a module", and can then handle the __name__ mangling as per import semantics. The only trick is if someone were to specifically import an __init__ module (from .package import __init__), but even then, the results are garbage (you get the module's __init__ method).
But it isn't a file path, it's an absolute module name that you are after.
If one were to just do "path" manipulations, afterwards you can translate that to an absolute name (perhaps based on the path of __main__). Pretending that you have a path can make the semantic non-ambiguous, but I prefer the alternative I just described.
It also naturally leads to a __name__ semantic that Guido had suggested to me when I was talking about relative imports:
goo.__name__ == '__main__'
foo.__name__ == '__main__.foo'
baz.__name__ == '__main__..bar.baz'
Which could more or less be used with the current importer; it just needs a special-casing of 'from . import ...' in the __main__ module.
And I am trying to avoid special-casing for this.
And it's only really an issue because we can't currently discern between module or package, right?
Basically. It's also where within a package the module is, so it isn't quite that simple. The issue comes down to the fact that we lose where a module is contained within a package hierarchy once __name__ gets set to __main__. If we came up with another way to delineate that a module was being executed, or some other way to specify where a module was in a package, then the problem would be solved. I prefer trying to change the former, since the latter is perfectly handled with __name__ as-is when it isn't changed to __main__. -Brett
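As an aside, a minimal PEP 302-style skeleton for the sqlite3 storage idea floated above (the table schema and storage format are assumptions for illustration; no such package exists):

    import sqlite3
    import sys
    import types

    class DBImporter(object):
        """Find and load modules stored as source text in a sqlite3 table."""

        def __init__(self, db_path):
            self.conn = sqlite3.connect(db_path)

        def find_module(self, fullname, path=None):
            row = self.conn.execute('SELECT 1 FROM modules WHERE name = ?',
                                    (fullname,)).fetchone()
            return self if row else None

        def load_module(self, fullname):
            source, is_pkg = self.conn.execute(
                'SELECT source, is_package FROM modules WHERE name = ?',
                (fullname,)).fetchone()
            mod = sys.modules.setdefault(fullname, types.ModuleType(fullname))
            mod.__loader__ = self
            if is_pkg:
                mod.__path__ = []   # presence of __path__ marks a package
            exec(source, mod.__dict__)
            return mod

    # sys.meta_path.append(DBImporter('modules.db'))

Note how the loader never produces a meaningful file path; only the dotted name identifies the module, which is the point being made here.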
On 2/10/07, Brett Cannon <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Presumably if you have some module imported from a database (who does this, really?),
No one at the moment. But think of it as an alternative to zip files if you use sqlite3 for the DB storage. Several people have told me privately they would like to see an importer and loader for this.
I did stuff like this a while back, writing __import__ functions to pull modules from a database or directly from an SVN repository. I'm not saying it's a good idea, but it is being done. Collin Winter
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Josiah Carlson <jcarlson@uci.edu> wrote:
[snip]
How is it done now?
Importers and loaders; see PEP 302.
Wow, I should have started there when I was doing my relative import work. That's much nicer to work with. [snip]
And it's only really an issue because we can't currently discern between module or package, right?
Basically. It's also where within a package the module is, so it isn't quite that simple.
The issue comes down to the fact that we lose where a module is contained within a package hierarchy once __name__ gets set to __main__. If we came up with another way to delineate that a module was being executed, or some other way to specify where a module was in a package, then the problem would be solved. I prefer trying to change the former, since the latter is perfectly handled with __name__ as-is when it isn't changed to __main__.
Kind-of. It still seems to break in a few cases, which I believe to be the result of current __name__ semantics (which can be fixed if __name__ could be anything, but I'll talk about that again later).

Say I wanted to do relative imports in paths below the current __main__ module. The following works, and gives me the __main__-derived package names that Guido suggested.

>>> __path__ = [os.getcwd()]
>>> from . import bar
>>> bar
<module '__main__.bar' from 'D:\Projects\python\py25\bar.pyc'>

So far so good. What about "cousin" relative imports (an issue I've had to deal with myself)?

>>> from ..py25b import baz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Relative importpath too deep

No dice. Sub-packages seem to be ok with this hack to __main__, but nothing that goes to a parent of the "root" package of __main__. My attempts to hack __main__ with...

__name__ = '.'.join(os.getcwd().split('\\')[1:])

just got me "parent package not loaded". Adding fake parent packages all the way to the root of the drive, leaving the above name mangling in place, allowed me to do the 'from ..py25b import baz' import in the "main" module.

I like Guido's suggestion; allow multiple trailing dots off of __main__. That is, suppose that we were able to do the "from ..py25b import baz" import, the __name__ of the imported object should be __main__..py25b.baz . Allowing that particular semantic would let us keep the "if __name__ == '__main__':" thing we've been doing. This, however, would require a bunch of extra coding.

Anyways...I hear where you are coming from with your statements of 'if __name__ could be anything, and we could train people to use ismain(), then all of this relative import stuff could *just work*'. It would require inserting a bunch of (fake?) packages in valid Python name parent paths (just in case people want to do cousin, etc., imports from __main__).

You have convinced me.

- Josiah
Josiah Carlson <jcarlson@uci.edu> wrote:
Anyways...I hear where you are coming from with your statements of 'if __name__ could be anything, and we could train people to use ismain(), then all of this relative import stuff could *just work*'. It would require inserting a bunch of (fake?) packages in valid Python name parent paths (just in case people want to do cousin, etc., imports from __main__).
You have convinced me.
And in that vein, I have implemented a bit of code that mangles the __name__ of the __main__ module, sets up pseudo-packages for parent paths with valid Python names, imports __init__.py modules in ancestor packages, adds an ismain() function to builtins, etc. It allows for crazy things like...

from ..uncle import cousin
from ..parent import sibling
#the above equivalent to: from . import sibling
from .sibling import nephew

...all executed within the __main__ module (which gets a new __name__). Even better, it works with vanilla Python 2.5, and doesn't even require an import hook.

The only unfortunate thing is that because you cannot predict how far up the tree relative imports go, you cannot know how far up the paths one should go in creating the ancestral packages. My current (simple) implementation goes as far up as the root, or the parent of the deepest path with an __init__.py[cw] .

If you are curious, I can send you a copy off-list.

- Josiah
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Josiah Carlson <jcarlson@uci.edu> wrote:
Anyways...I hear where you are coming from with your statements of 'if __name__ could be anything, and we could train people to use ismain(), then all of this relative import stuff could *just work*'. It would require inserting a bunch of (fake?) packages in valid Python name parent paths (just in case people want to do cousin, etc., imports from __main__).
You have convinced me.
And in that vein, I have implemented a bit of code that mangles the __name__ of the __main__ module, sets up pseudo-packages for parent paths with valid Python names, imports __init__.py modules in ancestor packages, adds an ismain() function to builtins, etc.
It allows for crazy things like...
from ..uncle import cousin
from ..parent import sibling
#the above equivalent to: from . import sibling
from .sibling import nephew
...all executed within the __main__ module (which gets a new __name__). Even better, it works with vanilla Python 2.5, and doesn't even require an import hook.
The only unfortunate thing is that because you cannot predict how far up the tree relative imports go, you cannot know how far up the paths one should go in creating the ancestral packages. My current (simple) implementation goes as far up as the root, or the parent of the deepest path with an __init__.py[cw] .
Just to make sure that I understand this correctly, __name__ is set to __main__ for the module that is being executed. Then other modules in the package are also called __main__, but with the proper dots and such to resolve to the proper depth in the package?
If you are curious, I can send you a copy off-list.
I have way too much on my plate to dive into it right now, but I assume the patch is either against runpy or my import code? -Brett
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
[snip]
The only unfortunate thing is that because you cannot predict how far up the tree relative imports go, you cannot know how far up the paths one should go in creating the ancestral packages. My current (simple) implementation goes as far up as the root, or the parent of the deepest path with an __init__.py[cw] .
Just to make sure that I understand this correctly, __name__ is set to __main__ for the module that is being executed. Then other modules in the package are also called __main__, but with the proper dots and such to resolve to the proper depth in the package?
No. Say, for example, that you had a tree like the following.

.../
    pk1/
        pk2/
            __init__.py
            pk3/
                __init__.py
                run.py

Also say that run.py was run from the command line, and the relative import code that I have written gets executed. The following assumes that at least a "dummy" module is inserted into sys.modules['__main__'].

1) A fake package called 'pk1' with __path__ == ['../pk1'] is inserted into sys.modules.
2) 'pk1.pk2' is imported as per package rules (__init__.py is executed), and gets a __path__ == ['../pk1/pk2/'] .
3) 'pk1.pk2.pk3' is imported as per package rules (__init__.py is executed), and gets a __path__ == ['../pk1/pk2/pk3'] .
4) We fetch sys.modules['__main__'], give it a new __name__ of 'pk1.pk2.pk3.__main__', but don't give it a path. Also insert the module into sys.modules['pk1.pk2.pk3.__main__'].
5) Add ismain() to builtins.
6) The remainder of run.py is executed.
If you are curious, I can send you a copy off-list.
I have way too much on my plate to dive into it right now, but I assume the patch is either against runpy or my import code?
No. It's actually a standalone module. When imported (presumably from __main__ as the first thing it does), it performs the mangling, importing, etc. I'm sure I could modify runpy to do all of this, but only if alter_sys was True. I could probably do the same with your import code, where can I find it? One reason *not* to do the __main__..uncle.cousin namings is that it is not clear how one should go about removing those __main__ trailing dots without examining __main__'s __file__ all the time, especially with non-filesystem imports with nonsensical __file__. - Josiah
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Josiah Carlson <jcarlson@uci.edu> wrote:
[snip]
No. Say, for example, that you had a tree like the following.
.../
    pk1/
        pk2/
            __init__.py
            pk3/
                __init__.py
                run.py
Also say that run.py was run from the command line, and the relative import code that I have written gets executed. The following assumes that at least a "dummy" module is inserted into sys.modules['__main__']
1) A fake package called 'pk1' with __path__ == ['../pk1'] is inserted into sys.modules.
2) 'pk1.pk2' is imported as per package rules (__init__.py is executed), and gets a __path__ == ['../pk1/pk2/'] .
3) 'pk1.pk2.pk3' is imported as per package rules (__init__.py is executed), and gets a __path__ == ['../pk1/pk2/pk3'] .
4) We fetch sys.modules['__main__'], give it a new __name__ of 'pk1.pk2.pk3.__main__', but don't give it a path. Also insert the module into sys.modules['pk1.pk2.pk3.__main__'].
5) Add ismain() to builtins.
6) The remainder of run.py is executed.
Ah, OK. Didn't realize you had gone ahead and done step 5.
If you are curious, I can send you a copy off-list.
I have way too much on my plate to dive into it right now, but I assume the patch is either against runpy or my import code?
No. It's actually a standalone module. When imported (presumably from __main__ as the first thing it does), it performs the mangling, importing, etc. I'm sure I could modify runpy to do all of this, but only if alter_sys was True.
I could probably do the same with your import code, where can I find it?
It's in the sandbox under import_in_py if you want the Python version. -Brett
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
[snip]
Just to make sure that I understand this correctly, __name__ is set to __main__ for the module that is being executed. Then other modules in the package are also called __main__, but with the proper dots and such to resolve to the proper depth in the package?
No. Say, for example, that you had a tree like the following.
.../
    pk1/
        pk2/
            __init__.py
            pk3/
                __init__.py
                run.py
Also say that run.py was run from the command line, and the relative import code that I have written gets executed. The following assumes that at least a "dummy" module is inserted into sys.modules['__main__']
1) A fake package called 'pk1' with __path__ == ['../pk1'] is inserted into sys.modules.
2) 'pk1.pk2' is imported as per package rules (__init__.py is executed), and gets a __path__ == ['../pk1/pk2/'] .
3) 'pk1.pk2.pk3' is imported as per package rules (__init__.py is executed), and gets a __path__ == ['../pk1/pk2/pk3'] .
4) We fetch sys.modules['__main__'], give it a new __name__ of 'pk1.pk2.pk3.__main__', but don't give it a path. Also insert the module into sys.modules['pk1.pk2.pk3.__main__'].
5) Add ismain() to builtins.
6) The remainder of run.py is executed.
Ah, OK. Didn't realize you had gone ahead and done step 5.
Yep, it was easy:

def ismain():
    try:
        raise ZeroDivisionError()
    except ZeroDivisionError:
        f = sys.exc_info()[2].tb_frame.f_back
        try:
            return sys.modules[f.f_globals['__name__']] is sys.modules['__main__']
        except KeyError:
            return False

With the current semantics, reload would also need to be changed to update both __main__ and whatever.__main__ in sys.modules.
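For comparison, the same check can be written without raising an exception by using the CPython-specific sys._getframe to grab the caller's frame (a sketch, not part of Josiah's module):

    import sys

    def ismain():
        f = sys._getframe(1)    # the caller's frame
        try:
            return sys.modules[f.f_globals['__name__']] is sys.modules['__main__']
        except KeyError:
            return False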
It's in the sandbox under import_in_py if you want the Python version.
Great, found it.

One issue with the code that I've been writing is that it more or less relies on the idea of a "root package", and that discovering the root package can be done in a straightforward way. In a filesystem import, it looks at the path in which the __main__ module lies and ancestors up to the root, or the parent path of a path with an __init__.py[cw] module. For code in which __init__.py[cw] modules aren't merely placeholders to turn a path into a package, this could result in "undesirable" code being run prior to the __main__ module.

It is also ambiguous when confronted with database imports in which the command line is something like 'python -m dbimport.sub1.sub2.runme'. Do we also create/insert pseudo-packages for the current path in the filesystem, potentially changing the "name" to something like "pkg1.pkg2.dbimport.sub1.sub2.runme"? And really, this question is applicable to any 'python -m' command line.

We obviously have a few options. Among them:
1) make the above behavior optional with a __future__ import, which must be done at the top of a __main__ module (ignored in all other cases)
2) along with 1, only perform the above when we use imports in a filesystem (zip imports are fine)
3) allow for a module variable to define how many ancestral paths are inserted (to prevent unwanted/unnecessary __init__ modules from being executed)
4) come up with a semantic for database and other non-filesystem imports
5) toss the stuff I've hacked and more or less proposed

Having read PEP 328 again, it doesn't specify how non-filesystem imports should be handled, nor how to handle things like 'python -m', so we may want to just ignore them, and do the mangling just prior to the execution of the code for the __main__ module.

- Josiah
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
[snip]
One issue with the code that I've been writing is that it more or less relies on the idea of a "root package", and that discovering the root package can be done in a straightforward way. In a filesystem import, it looks at the path in which the __main__ module lies and ancestors up to the root, or the parent path of a path with an __init__.py[cw] module.
For code in which __init__.py[cw] modules aren't merely placeholders to turn a path into a package, this could result in "undesirable" code being run prior to the __main__ module.
It is also ambiguous when confronted with database imports in which the command line is something like 'python -m dbimport.sub1.sub2.runme'. Do we also create/insert pseudo packages for the current path in the filesystem, potentially changing the "name" to something like "pkg1.pkg2.dbimport.sub1.sub2.runme"? And really, this question is applicable to any 'python -m' command line.
We obviously have a few options. Among them:
1) make the above behavior optional with a __future__ import, which must be done at the top of a __main__ module (ignored in all other cases)
2) along with 1, only perform the above when we use imports in a filesystem (zip imports are fine)
3) allow for a module variable to define how many ancestral paths are inserted (to prevent unwanted/unnecessary __init__ modules from being executed)
4) come up with a semantic for database and other non-filesystem imports
5) toss the stuff I've hacked and more or less proposed
Beats me. =) My brain is fried at the moment, so I don't have a good answer right now. -Brett
Josiah Carlson wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/11/07, Josiah Carlson <jcarlson@uci.edu> wrote:
Also say that run.py was run from the command line, and the relative import code that I have written gets executed. The following assumes that at least a "dummy" module is inserted into sys.modules['__main__']
1) A fake package called 'pk1' with __path__ == ['../pk1'] is inserted into sys.modules.
For some reason I don't like the idea of fake packages. Seems too much like a hack to me. That could be just me though.
[snip]
One issue with the code that I've been writing is that it more or less relies on the idea of a "root package", and that discovering the root package can be done in a straightforward way. In a filesystem import, it looks at the path in which the __main__ module lies and ancestors up to the root, or the parent path of a path with an __init__.py[cw] module.
For code in which __init__.py[cw] modules aren't merely placeholders to turn a path into a package, this could result in "undesirable" code being run prior to the __main__ module.
I think the idea of a "root package" is good. I agree with that.

Think of it this way... You aren't running a module in a package as if it were a top level module; you are entering a package from a different access point. The package should still be cohesive.

If someone wants to run a module as if it were not in a package, but have it within a package's directory structure, then they can put it in a subdirectory that doesn't have an __init__ file, and add that directory to sys.path. Those modules would then be treated as top level modules if you execute them directly. You would also import them as if they were top level modules, with no package prefix. (They are on sys.path.)

Then "package" modules can always use a "package" module importer to start them, and modules not part of packages can always use a simpler "module" importer. It would be good if there was no question as to which is which, and which importer to use.

(This part repeats some things I wrote earlier.) If you are running a module that depends on the __init__ to add its path to the package, then it's not part of the package until the __init__ is executed. And of course the module can't know that before then, so it should be executed as it is, where it is, if there's not an __init__ in the same directory. It should then be treated as a top level module. It should be up to the package designer to take this into account, and not the import designer to determine what the package designer intended. Modules used in such ways can't know how many packages, or which packages, will use them in this indirect way.
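The "search up for the root package" rule being converged on here is straightforward to sketch for filesystem imports (a sketch only; real code would also need to consider .pyc/.pyw variants, as Josiah's implementation does):

    import os

    def find_root_package(module_file):
        # Walk parent directories for as long as an __init__.py is present;
        # the first directory without one is the effective sys.path root.
        path = os.path.dirname(os.path.abspath(module_file))
        parts = []
        while os.path.exists(os.path.join(path, '__init__.py')):
            parts.insert(0, os.path.basename(path))
            path = os.path.dirname(path)
        return path, '.'.join(parts)

    # In the pk1/pk2/pk3 example above, find_root_package('.../pk3/run.py')
    # returns ('.../pk1', 'pk2.pk3'), since pk1 itself has no __init__.py.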
It is also ambiguous when confronted with database imports in which the command line is something like 'python -m dbimport.sub1.sub2.runme'. Do we also create/insert pseudo packages for the current path in the filesystem, potentially changing the "name" to something like "pkg1.pkg2.dbimport.sub1.sub2.runme"? And really, this question is applicable to any 'python -m' command line.
Is 'python -m' meant to run a module in a package as if it were a top level module? Or is it meant, as I believe, to allow you to use the Python module name instead of the file name? Help isn't clear on this point.

-m mod : run library module as a script (terminates option list)

One of my main points when starting this thread was...
Make pythons concept of a package, (currently an informal type), be stronger than that of the underlying file system search path and directory structure.
Which would mean to me that the __init__(s) in packages should always be run before modules in packages. That would simplify the import problem, I think, and dummy packages would not be needed. Also, this only needs to be done for the first module in the package that is imported. After that, any additional modules added to the package by the __init__ become importable. You just can't use those modules as an entry point to the package. It also makes the code clearer from a reading standpoint, as it becomes easier to determine what the relationships are.
We obviously have a few options. Among them:
1) make the above behavior optional with a __future__ import, which must be done at the top of a __main__ module (ignored in all other cases)
2) along with 1, only perform the above when we use imports in a filesystem (zip imports are fine)
3) allow for a module variable to define how many ancestral paths are inserted (to prevent unwanted/unnecessary __init__ modules from being executed)
4) come up with a semantic for database and other non-filesystem imports
5) toss the stuff I've hacked and more or less proposed
Having read PEP 328 again, it doesn't specify how non-filesystem imports should be handled, nor how to handle things like 'python -m', so we may want to just ignore them, and do the mangling just prior to the execution of the code for the __main__ module.
Brett said things get simpler if __name__ is always the module name. How about adding a pair of attributes to modules:

__package__ -> package name    # full package.sub-packages... etc.
__module__ -> module name      # is "" if it's a package.

If a module isn't in a package, then __package__ is "". Then __name__ == "__main__" could still work until ismain() is introduced. __name__ wouldn't be needed for import uses in this case.

Cheers, Ron
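A sketch of how the two proposed attributes would fit together (hypothetical semantics, using the names as defined above):

    def full_module_name(mod):
        # Rebuild the dotted name from the proposed attributes.
        if not mod.__module__:              # it's a package
            return mod.__package__
        if mod.__package__:                 # a module inside a package
            return mod.__package__ + '.' + mod.__module__
        return mod.__module__               # a top-level module

    # pkg/sub/mod.py -> __package__ == 'pkg.sub', __module__ == 'mod'
    # pkg/sub/       -> __package__ == 'pkg.sub', __module__ == ''
    # top-level mod  -> __package__ == '',        __module__ == 'mod'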
Josiah Carlson wrote:
"Brett Cannon" <brett@python.org> wrote:
On 2/9/07, Ron Adam <rrr@ronadam.com> wrote:
Brett Cannon wrote:
On 2/8/07, Ron Adam <rrr@ronadam.com> wrote: [SNIP]
If you remove the "__main__" name, then you will still need to have some attribute for python to determine the same thing. Why? There is nothing saying we can't follow most other languages and just have a reserved function name that gets executed if the module is executed. Yes, but this is where python is different from other languages. In a way, python's main *is* the whole module from the top to bottom. And so the '__main__' name is referring to the whole module and not just a function in it.
A more specific function would be needed to get the context right. Maybe __script__(), or __run__().
Or if you want to be consistent with classes, how about adding __call__() to modules? Then the main body of the module effectively works the same way as it does in a class. =)
Hey, I think that has some cool possibilities, it makes modules callable in general. So if I want to run a module's __call__(), AKA main() as you call it, after importing I would just do...
    import module
    module()
And it would just work. ;-)
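For what it's worth, something close to this can be emulated in today's Python by swapping the module's entry in sys.modules for a callable wrapper; a rough sketch (the wrapper class and its behavior are invented for illustration):

    # At the bottom of a module, replace its sys.modules entry with a
    # callable proxy, so "import module; module()" works as described.
    import sys

    class _CallableModule(object):
        def __init__(self, wrapped):
            self._wrapped = wrapped
        def __getattr__(self, name):
            # Delegate normal attribute access to the real module.
            return getattr(self._wrapped, name)
        def __call__(self):
            # The module's "main body" behavior would go here.
            print("running %s as a script" % self._wrapped.__name__)

    sys.modules[__name__] = _CallableModule(sys.modules[__name__])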
I like this idea. It makes it very obvious. You just say "when a specific module is specified at the command line, it is called." It could even take sys.argv[1:] (which I think was supposed to turn into sys.args or sys.arg or something at some point).
What do other people think?
I don't like it. Much of my dislike comes from personal aesthetics, ....
Fair enough. I have one aesthetic dislike myself, but it's really minor. Currently you can always look at the bottom of a file to see what it will do if you execute it. With a main() function of any type, that may be someplace else. But I can live with that if the name is always the same and doesn't change.
.... but then there is a logical disconnect. When an instance of a class is created, its __call__ method is not automatically called. By using the semantic of 'the __call__ function in the module namespace is automatically executed if the module is "run" from the command line', we are introducing a different instance creation semantic (an imported module is an instance of ModuleType).
I'm not sure I follow. Importing a module would not call its __call__() function. As I've shown above, it's an extra step. Running a module isn't the same as importing one. A lot of stuff goes on under the covers, so there really is no logical disconnect. What it really does is just add one additional step to the "run from a command line" sequence of events.
I think we should just stick with what has been proposed for *years*, a __main__ function that is automatically executed after the module has been imported if its __name__ == '__main__'. Even better, anyone who wants to write code compatible with the updated syntax can include the following literal block at the end of their files...
    if __name__ == '__main__':
        try:
            __main__
        except NameError:
            pass
        else:
            try:
                __main__()
            finally:
                try:
                    from __future__ import disable_run
                except SyntaxError:
                    # we are using an older Python
                    pass
                else:
                    # we are using a new Python, and
                    # disabling automatic running succeeded
                    pass
With such a semantic, current users of Python could include the above literal block and it would *just work*...then again, the new semantic wouldn't really be useful if people started using the above literal block.
That seems to be too much for some reason. If __name__ returns the real name and a function ismain() is a new builtin, then these short one-liners would work:

    # Put at bottom of new programs so they work with older python too.
    if __name__ == '__main__': __call__()

    # Makes old programs work with newer python.
    # Add this above "if __name__=='__main__'".
    if hasattr(__builtins__, 'ismain') and ismain(): __name__ = '__main__'

Cheers, Ron
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
[SNIP]
For all the complexity of module attributes and global import hooks, I think there's something that'll start to strip some of it away: make __import__() into a method of ModuleType, then ensure there's a way to load python modules using subclasses of ModuleType.

You could have entirely different semantics, independent of sys.modules. You could load from a zip file that's entirely private, while still allowing modules within it to reach each other using relative imports.

Once you break free of sys.modules and global hooks it becomes much easier to design something to replace them.

I have some ideas on how to do that, but they don't seem nearly as important as this base functionality.

--
Adam Olsen, aka Rhamphoryncus
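A rough sketch of the shape being described, with every specific invented for illustration (none of this is an existing API): a ModuleType subclass whose __import__ checks a private registry before falling back to the global machinery:

    from types import ModuleType

    class PrivateModule(ModuleType):
        # A module whose imports resolve against a private registry
        # before falling back to the normal global import.
        def __init__(self, name, registry):
            ModuleType.__init__(self, name)
            self._registry = registry      # private name -> module map

        def __import__(self, name):
            if name in self._registry:
                return self._registry[name]
            return __import__(name)        # the builtin, not recursion

    # Hypothetical usage: two modules that can only see each other.
    registry = {}
    a = PrivateModule('a', registry)
    b = PrivateModule('b', registry)
    registry.update(a=a, b=b)
    assert a.__import__('b') is b          # resolved privately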
Adam Olsen wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
[SNIP]
For all the complexity of module attributes and global import hooks I think there's something that'll start to strip some of it away: make __import__() into a method of ModuleType, then ensure there's a way to load python modules using subclasses of ModuleType.
Seems to me, whether the importer is part of this module or the new module doesn't make a difference. It's just a matter of ordering. So if you want to modify the import mechanism:

Would you create a new ModuleType, modify its import mechanisms, and then pass it a name?

Or would it be better to subclass this module's import, modify it, and pass it along with a name (to be imported) to the new ModuleType?

Or modify this module's import and then subclass this ModuleType, which will inherit the new importer, so that ModuleType(__name__) will then initialize itself? (But this would inherit a lot of other things you may not want.)
You could have entirely different semantics, independent of sys.modules. You could load from a zip file that's entirely private, while still allowing modules within it to reach each other using relative imports.
Once you break free of sys.modules and global hooks it becomes much easier to design something to replace them.
I have some ideas on how to do that, but they don't seem nearly as important as this base functionality.
Then you are further along than I am. I just know what I want it to do. But I am trying to learn how the actual imports function. Maybe in a few days I can give Brett some suggestions that are a little more concrete.

Cheers, Ron
On 2/8/07, Adam Olsen <rhamph@gmail.com> wrote:
On 2/4/07, Ron Adam <rrr@ronadam.com> wrote:
[SNIP]
For all the complexity of module attributes and global import hooks I think there's something that'll start to strip some of it away: make __import__() into a method of ModuleType, then ensure there's a way to load python modules using subclasses of ModuleType.
And so __import__ is to be bound to what module's method? Or is it a class method?
You could have entirely different semantics, independent of sys.modules.
You can have that now if you ignore 'reload'. You just need to delete a module from sys.modules as soon as you import it. Then you can set up your imports to do what you want.
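A minimal sketch of that trick (the helper name is made up): import a module, then immediately drop it from sys.modules so the loaded copy stays private:

    import sys

    def private_import(name):
        # Save any module already cached under this name.
        saved = sys.modules.pop(name, None)
        try:
            module = __import__(name)
            # Drop our freshly loaded copy so it stays private.
            sys.modules.pop(name, None)
            return module
        finally:
            # Restore whatever was cached before we started.
            if saved is not None:
                sys.modules[name] = saved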
You could load from a zip file that's entirely private, while still allowing modules within it to reach each other using relative imports.
So are you saying that imports in a module are to use the module's __import__ method? So instantiate a module and then use its method to initialize the module itself?

-Brett
Brett, Ron: nevermind my idea. When trying to explain it further I found myself going around in circles. I need a clear set of requirements before I can make a half-decent proposal, and the existing docs and implementations don't make them easy to determine. -- Adam Olsen, aka Rhamphoryncus
participants (5)

- Adam Olsen
- Brett Cannon
- Collin Winter
- Josiah Carlson
- Ron Adam