My objections to implicit package directories

It seems the consensus at the PyCon US sprints is that implicit package directories are a wonderful idea and we should have more of those. I still disagree (emphatically), but am prepared to go along with it so long as my documented objections are clearly and explicitly addressed in the new combined PEP, and the benefits ascribed to implicit package directories in the new PEP are more compelling than "other languages do it that way, so we should too". To save people having to trawl around various mailing list threads and read through PEP 395, I'm providing those objections in a consolidated form here. If reading these objections in one place causes people to have second thoughts about the wisdom of implicit package directories, even better.

1. Implicit package directories go against the Zen of Python

Getting this one out of the way first. As I see it, implicit package directories violate at least 4 of the design principles in the Zen:

- Explicit is better than implicit (my calling them "implicit package directories" is a deliberate rhetorical ploy to harp on this point, although it's also an accurate name)
- If the implementation is hard to explain, it's a bad idea (see the section on backwards compatibility challenges)
- Readability counts (see the section on introducing ambiguity into filesystem layouts)
- Errors should never pass silently (see the section on implicit relative imports from __main__)

2. Implicit package directories pose awkward backwards compatibility challenges

It concerns me gravely that the consensus proposal MvL posted is *backwards incompatible with Python 3.2*, as it deliberately omits one of the PEP 402 features that provided that backwards compatibility. Specifically, under the consensus, a subdirectory "foo" of a directory on sys.path will shadow a "foo.py" or "foo/__init__.py" that appears later on sys.path.
As Python 3.2 would have found that latter module/package correctly, this is an unacceptable breach of the backwards compatibility requirements. PEP 402 at least got this right by always executing the first "foo.py" or "foo/__init__.py" it found, even if another "foo" directory was found earlier in sys.path. We can't just wave that additional complexity away if an implicit package directory proposal is going to remain backwards compatible with current layouts (e.g. if an application's starting directory included a "json" subfolder containing JSON files rather than Python code, the consensus approach as posted by MvL would render the standard library's json module inaccessible).

3. Implicit package directories introduce ambiguity into filesystem layouts

With the current Python package design, there is a clear 1:1 mapping between the filesystem layout and the module hierarchy. For example:

    parent/                  # This directory goes on sys.path
        project/             # The "project" package
            __init__.py      # Explicit package marker
            code.py          # The "project.code" module
            tests/           # The "project.tests" package
                __init__.py  # Explicit package marker
                test_code.py # The "project.tests.test_code" module

Any explicit package directory approach will preserve this 1:1 mapping.
For example, under PEP 382:

    parent/                  # This directory goes on sys.path
        project.pyp/         # The "project" package
            code.py          # The "project.code" module
            tests.pyp/       # The "project.tests" package
                test_code.py # The "project.tests.test_code" module

With implicit package directories, you can no longer tell purely from the code structure which directory is meant to be added to sys.path, as there are at least two valid mappings to the Python module hierarchy:

    parent/                  # This directory goes on sys.path
        project/             # The "project" package
            code.py          # The "project.code" module
            tests/           # The "project.tests" package
                test_code.py # The "project.tests.test_code" module

    parent/
        project/             # This directory goes on sys.path
            code.py          # The "code" module
            tests/           # The "tests" package
                test_code.py # The "tests.test_code" module

What are implicit package directories buying us in exchange for this inevitable ambiguity? What can we do with them that can't be done with explicit package directories? And no, "Java does it that way" is not a valid argument.

4. Implicit package directories will permanently entrench current newbie-hostile behaviour in __main__

It's a fact of life that Python beginners learn that they can do a quick sanity check on modules they're writing by including an "if __name__ == '__main__':" section at the end and doing one of the following:

- run "python mymodule.py"
- hit F5 (or the relevant hot key) in their IDE
- double click the module in their filesystem browser
- start the Python REPL and do "import mymodule"

However, there are some serious caveats to that as soon as you move the module inside a package:

- if you use explicit relative imports, you can import it, but not run it directly using any of the above methods
- if you rely on implicit relative imports, the above direct execution methods should work most of the time, but you won't be able to import it
- if you use absolute imports for your own package, nothing will work (unless the parent directory for your package is already on sys.path)
- if you only use absolute imports for *other* packages, everything should be fine

The errors you get in these cases are *horrible*. The interpreter doesn't really know what is going on, so it gives the user bad error messages. In large part, the "Why are my imports broken?" section in PEP 395 exists because I sat down to try to document what does and doesn't work when you attempt to directly execute a module from inside a package directory. In building the list of what would work properly ("python -m" from the parent directory of the package) and what would sometimes break (everything else), I realised that instead of documenting the entire hairy mess, the 1:1 mapping from the filesystem layout to the Python module hierarchy meant we could *just fix it* to not do the wrong thing by default.
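The first caveat above (explicit relative imports break direct execution) can be demonstrated in a few lines. This is a minimal sketch, not taken from PEP 395 itself, and all package, module, and variable names here are made up for illustration:

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway package containing a module that uses an explicit
# relative import, then try both execution styles described above.
with tempfile.TemporaryDirectory() as parent:
    pkg = os.path.join(parent, "pkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "mymodule.py"), "w") as f:
        f.write("from . import helper\n"
                "if __name__ == '__main__':\n"
                "    print(helper.VALUE)\n")

    # Direct execution dies with "attempted relative import" before the
    # __main__ guard is ever reached.
    direct = subprocess.run(
        [sys.executable, os.path.join(pkg, "mymodule.py")],
        capture_output=True, text=True)

    # "python -m pkg.mymodule" from the package's parent directory works.
    via_m = subprocess.run(
        [sys.executable, "-m", "pkg.mymodule"],
        capture_output=True, text=True, cwd=parent)

    print("direct failed:", direct.returncode != 0)
    print("-m output:", via_m.stdout.strip())
```

The error message from the direct run is exactly the kind of thing the text above complains about: it talks about relative imports and parent packages, not about the actual mistake (running a package submodule as a script).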
If implicit package directories are blessed for inclusion in Python 3.3, that opportunity is lost forever - with the loss of the unambiguous 1:1 mapping from the filesystem layout to the module hierarchy, it's no longer possible for the interpreter to figure out the right thing to do without guessing.

PJE proposed that newbies be instructed to add the following boilerplate to their modules if they want to use "if __name__ == '__main__':" for sanity checking:

    import pkgutil
    pkgutil.script_module(__name__, 'project.code.test_code')

This completely defeats the purpose of having explicit relative imports in the language, as it embeds the absolute name of the module inside the module itself. If a package subtree is ever moved or renamed, you will have to manually fix every script_module() invocation in that subtree. Double-keying data like this is just plain bad design. The package structure should be recorded explicitly in exactly one place: the filesystem.

PJE has other objections to the PEP 395 proposal, specifically relating to its behaviour on package layouts where the directories added to sys.path contain __init__.py files, such that the developer's intent is not accurately reflected in their filesystem layout. Such layouts are *broken*, and the misbehaviour under PEP 395 won't be any worse than the misbehaviour with the status quo (sys.path[0] is set incorrectly in either case; it will just be fixable under PEP 395 by removing the extraneous __init__.py files). A similar argument applies to cases where a parent package __init__ plays games with sys.path (although the PEP 395 algorithm could likely be refined to better handle that situation).

Regardless, if implicit package directories are accepted into Python 3.3 in any form, I *will* be immediately marking PEP 395 as Rejected due to incompatibility with an accepted PEP.
I'll then (eventually, once I'm less annoyed about the need to do so) write a new PEP to address a subset of the issues previously covered by PEP 395 that omits any proposals that rely on explicit package directories.

Also, I consider it a requirement that any implicit packages PEP include an update to the tutorial to explain to beginners what will and won't work when they attempt to directly execute a module from inside a Python package. After all, such a PEP is closing off any possibility of ever fixing the problem: it should have to deal with the consequences.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Gah, wrong list. Please don't reply here - that message will be showing up on import-sig shortly.

On Mon, Mar 12, 2012 at 5:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hi Nick,

The write-up was a little unclear on a main point and I think that's contributed to some confusion here.

The path search will continue to work in exactly the same way as it does now, with one difference: instead of the current ImportError when nothing matches, the mechanism for namespace packages would be used. That mechanism would create a namespace package with a __path__ matching the paths corresponding to all namespace package "portions". The likely implementation will simply track the namespace package __path__ during the initial (normal) path search and use it only when there are no matching modules nor regular packages. Packages without __init__.py would only be allowed for namespace packages. So effectively namespace packages would be problematic for PEP 395, but not normal packages.

Ultimately this is a form of PEP 402 without so much complexity. The trade-off is that it requires a new kind of package.

As far as I understand them, most of your concerns are based on the idea that namespace packages would be included in the initial traversal of sys.path, which is not the case. It sounds like there are a couple of points you made that may still need attention, but hopefully this at least helps clarify what we talked about.

-eric

On Tue, Mar 13, 2012 at 10:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It has been pointed out that the above is based on a misreading of MvL's email. So, consider the following backwards compatibility concern instead.

Many projects use the following snippet to find a json module:

    try:
        import json
    except ImportError:
        import simplejson as json

Now, this particular snippet should still work fine with implicit package directories (even if a non-Python json directory exists on sys.path), since there *will* be a real json module in the standard library to find and the simplejson fallback won't be needed. However, for the general case:

    try:
        import foo
    except ImportError:
        import foobar as foo

implicit package directories pose a backwards compatibility problem: if "foo" does not exist as a module or explicit package on sys.path, but there is a non-Python "foo/" directory, then "foo" will silently be created as an empty package rather than falling back to "foobar". Sure, the likelihood of that actually affecting anyone is fairly remote (although all it really takes is one broken uninstaller leaving a "foo" dir in site-packages), but we've rejected proposals in the past over smaller concerns than this.

*Now*, my original comment about the consensus view rejecting complexity from PEP 402 by disregarding backwards compatibility concerns becomes accurate. PEP 402 addressed this issue specifically by disallowing direct imports of implicit packages (only finding them later when searching for submodules). This is in fact the motivating case given for that behaviour in the PEP: http://www.python.org/dev/peps/pep-0402/#backwards-compatibility-and-perform...

So, *why* are we adopting implicit packages again, given all the challenges they pose? What, exactly, is the problem with a ".pyp" extension that makes all this additional complexity the preferred choice? So far, I've only heard two *positive* statements in favour of implicit package directories:

1. Java/Perl/etc do it that way.
I've already made it clear that I don't care about that argument. If it was all that compelling, we'd have implicit self by now. (However, clearly Guido favours it in this case, given his message that arrived while I was writing this one.)

2. It (arguably) makes it easier to convert an existing package into a namespace package.

With implicit package directories, you just delete your empty __init__.py file to turn an existing package into a namespace package. With a PEP 382 style directory suffix, you have to change your directory name to append the ".pyp" (and, optionally, delete your __init__.py file, since it's now going to be ignored anyway).

Barry's also tried to convince me that ".pyp" directories are somehow harder for distributions to deal with, but his only example looked like trying to use "yield from" in Python 3.2 and then complaining when it didn't work.

However, so long as the backwards compatibility measures from PEP 402 are incorporated, and the new PEP proposes a specific addition to the tutorial documenting the "never CD into a package, never double-click a file in a package to execute it, always use -m to execute modules from inside packages" guideline (and makes it clear that you may get strange and unpredictable behaviour if you ever break it), then I can learn to live with it. IDLE should also be updated to allow correct execution of submodules via F5 (I guess it will need some mechanism to be told what working directories to add to sys.path).

It still seems to me that moving to a marker *suffix* (rather than a marker file) as PEP 382 proposes brings all the real practical benefits of implicit package directories (i.e. no empty __init__.py files wasting space) and absolutely *none* of the pain (i.e. no backwards compatibility concerns, no ambiguity in the filesystem to module hierarchy mapping, still able to fix direct execution of modules inside packages rather than having to explain forevermore why it doesn't work), but Guido clearly feels otherwise.
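The fallback concern described earlier in this message can be reproduced on any interpreter that implements the eventual PEP 420 style semantics (Python 3.3+). This is a sketch only; the "foo"/"foobar" names are the made-up ones from the snippet above:

```python
import importlib
import os
import sys
import tempfile

# A stray, non-Python "foo" directory on sys.path (e.g. debris left
# behind by a broken uninstaller) imports successfully as an empty
# namespace package, so the ImportError fallback never runs.
with tempfile.TemporaryDirectory() as parent:
    os.mkdir(os.path.join(parent, "foo"))  # no __init__.py, no .py files
    sys.path.insert(0, parent)
    importlib.invalidate_caches()
    try:
        try:
            import foo                     # silently succeeds
        except ImportError:
            import foobar as foo           # never reached
        # A namespace package has a __path__ but no source file.
        print(type(foo).__name__, list(foo.__path__))
    finally:
        sys.path.remove(parent)
```

Under pre-3.3 semantics the inner import would raise ImportError and the fallback would run; under implicit package directories, "foo" is bound to an empty namespace package instead.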
Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Mar 13, 2012 at 4:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think this paragraph really gets to the heart of what I'm objecting to. I agree wholeheartedly with the objective of eliminating __init__.py files; there's no need to convince me of that. However, *two* proposals were made to that end:

PEP 382 kept the explicit marker, simply changing it to a directory suffix rather than a separate file. Simple, clean, straightforward, minimalist, effective.

PEP 402 threw away the marker entirely, and then had to patch the package finding algorithm with a whole series of complications to avoid breaking backwards compatibility with Python 3.2. It also has the side effect of eliminating the 1:1 mapping between the filesystem and the module hierarchy. Once we lose that, there's no going back.

What I really want out of the new PEP is a clear rationale for why the horrible package finding algorithm hacks needed to make the PEP 402 approach work in a backwards compatible way are to be preferred to the explicitly marked PEP 382 approach, which *doesn't pose a backwards compatibility problem in the first place*.

The other thing to keep in mind is that if, for whatever reason, we decided further down the road that the explicit directory suffix solution wasn't good enough, then *we could change our minds* and allow implicit package directories after all (just as the formats for valid C extension module names have changed over time). There's no such freedom with implicit package directories - once they're in, they're in, and we can never introduce a requirement for an explicit marker again without breaking working packages.

Is it so bad that I want us to take baby steps here, rather than jumping straight to the implicit solution?

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Mar 12, 2012 at 11:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think it comes down to this: I really, really, really hate directories with a suffix.

I'd like to point out that the suffix is also introducing a backwards incompatibility: everybody will have to teach their tools, IDEs, and brains about .pyp directories, and they will also have to *rename* their directories (*if* they want to benefit from the new feature). Renaming directories is a huge pain -- I counted over 400 directories in Django, so that would mean over 400 renames. In my experience renaming a directory is a huge pain no matter which version control system you use -- yes, it can be done, and modern VCSes have some support for renaming, but it's still a huge mess. Importing patches will be painful. Producing diffs across the renames will be hugely painful. I just think there are too many tools that won't know how to deal with this.

(I just did a little experiment: I cloned a small project using Hg and renamed one directory. Then I made a small change to one of the files whose parent was renamed. I have not figured out how to get a diff between the latest version of that file and any version before the mass renaming; the renaming is shown as a delete of the entire old file and an add of the entire new file. Even if you can tell me how to do this, my point stays: it's not easy to figure out. Similarly for logs: by default, "hg log" stops at the rename. You must add --follow to see logs across the rename.)

And regardless of which PEP we adopt, there will still be two types of package directories: PEP 382 still maintains backwards compatibility with directories that don't have a suffix but do have an __init__.py. So the unification still remains elusive.

And at the end of the day I still really, really, really hate directories with a suffix.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum <guido@python.org> writes:
And at the end of the day I still really, really, really hate directories with a suffix.
+1

--
“Good morning, Pooh Bear”, said Eeyore gloomily. “If it is a good morning”, he said. “Which I doubt”, said he. —A. A. Milne, _Winnie-the-Pooh_

Ben Finney

On 13 Mar, 2012, at 9:15, Guido van Rossum wrote:
Directories with a suffix have the advantage that you could teach GUIs to treat them differently; file managers could, for example, show a ".pyp" directory as a folder with a Python logo, just like ".py" files are shown as documents with a Python logo. With the implicit approach it is much harder to recognize Python packages as such without detailed knowledge about the import algorithm and the Python search path.

Ronald

On 3/20/2012 11:49 AM, Ronald Oussoren wrote:
Package directories are files and can be imported to make modules. I think it would have been nice to use .pyp from the beginning. It would make Python easier to learn. Also, 'import x' would simply mean "search sys.path directories for a file named 'x.py*'", with no need for either the importer (or human reader) to look within directories for the magic __init__.py file. Sorting a directory listing by extension would sort all packages together.

--
Terry Jan Reedy

On Mon, Mar 26, 2012 at 1:45 AM, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Yes. On what platform are you? On unixy platforms filename extensions are just a naming convention that can just as easily be used with directories.
IIUC that's how almost all filesystems treat them. However, desktop software often assigns specific meanings to them -- the user can configure these, but there's a large set of predefined bindings too, and many key applications also play this game (since there is, frankly, not much else to go by -- some important file types are not easily guessable by reading their content, either because it's some esoteric binary format, or because it's something too universal, like XML). I know that's how it works on Windows and Mac, but I believe the Linux desktop things (the things I kill off or at least ignore as soon as I log in :-) have the same idea.

--
--Guido van Rossum (python.org/~guido)

On Tue, Mar 20, 2012 at 11:49, Ronald Oussoren <ronaldoussoren@mac.com>wrote:
OS X has made me dislike that possibility. Some git tools will make directories ending in .git be considered an opaque object in the file system, forcing me to drop into a shell or right-click and choose to inspect the directory in order to see its contents.

-Brett

On 21 Mar, 2012, at 15:22, Brett Cannon wrote:
That's probably because those tools define ".git" directories as a package in their metadata, and the Finder won't show package contents by default (you can use the Finder's context menu to inspect the contents of packages, but that won't work in the file open/save panels). I'd have to experiment to be sure, but IIRC it is possible to assign icons to a suffix without making directories into packages.

Ronald

On Mar 13, 2012, at 09:15 AM, Guido van Rossum wrote:
And at the end of the day I still really, really, really hate directories with a suffix.
I completely agree, for all the reasons you stated. Especially because it would be extremely difficult to handle migrations from a pre-namespace-packages world to a post-namespace-packages world with directory suffixes.

For example, let's say Debian/Ubuntu supports Python 3.2 and 3.3. We can continue to craft __init__.py files with the old-style namespace package code at installation time and pretty much do what we're currently doing. It's painful, but the technology is there, so it doesn't change much for us. But when we can drop support for < 3.3 (or we back-port namespace package support to 3.2), then we can simply drop the code that creates these __init__.py files at installation time and we'll magically <wink> gain support for new-style namespace packages.

With directory suffixes, I don't see how this is possible. I shudder to think what OS vendors will have to do to rename all the directories of *installed* packages, let alone have to rebuild all Python 3 packages to support the renamed directories, when they make the switch to a new-style world.

catching-up-ly y'rs,
-Barry

On 13 March 2012 06:07, Nick Coghlan <ncoghlan@gmail.com> wrote:
Whoa! I'm not sure I can. I just recently got bitten badly by this for real. The following was what I was doing:

1. I'm writing a package.
2. I'm trying to do the tests-as-I-develop approach (yes, I know I should have been doing this for years - so sue me :-)).
3. I have my tests as a subpackage of the main package.
4. I use the command line.
5. I cd to the tests directory as that's the easiest way to edit tests: gvim test_dow<TAB> to edit test_download_binaries.py.

And yes, I had endless trouble trying to work out why I can't then run the tests from the command line. I consider the trouble I have as a bug - it *should* work, in my view. I understand why what I'm doing is an edge case, but intuitively, I don't like it not working.

I can change my practices, or use an IDE, or something. But my workflow will be less comfortable for me, and I don't feel that I understand why I should have to. I *think* that what Nick is proposing is a fix for this (and if it isn't, can you fix this scenario too, please Nick? :-)), and the idea that it's going to get documented as "don't do that" strikes me as unfriendly, if not outright wrong. I also don't think it should be documented in the tutorial - it's not something a new developer hits; it's later on, when people are writing bigger, more complex code, that they hit it.

In summary, I guess I support Nick's objections 3 and 4. We need a better response than just "don't do that", IMHO.

Paul.

On Tue, Mar 13, 2012 at 8:35 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Oh, but there are several other solutions. For example, if you set PYTHONPATH to the directory *containing* your toplevel package, new code could be added that will discover the true toplevel no matter where you are. This code doesn't exist today, but Nick is proposing something similar looking for __init__.py files; the code that tries to find the script directory as a subdirectory of some path on sys.path could be added there.

Also, the code Nick currently proposes for PEP 395 is still useful if you add __init__.py files to your package -- if you did that you wouldn't have to set PYTHONPATH (assuming we do add this code to Python 3.3, of course).

--
--Guido van Rossum (python.org/~guido)
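The PYTHONPATH workaround Guido describes can be sketched as follows. All names here ("project", "tests", "test_code.py") are made up to mirror the layouts discussed earlier in the thread; the child process stands in for running a test module from inside its own directory:

```python
import os
import subprocess
import sys
import tempfile

# Build a project/tests package tree, then run a test module directly
# from inside the tests directory. With PYTHONPATH pointing at the
# directory *containing* the top-level package, the absolute import of
# the package succeeds even though the cwd is deep inside it.
with tempfile.TemporaryDirectory() as parent:
    tests = os.path.join(parent, "project", "tests")
    os.makedirs(tests)
    for d in (os.path.join(parent, "project"), tests):
        open(os.path.join(d, "__init__.py"), "w").close()
    with open(os.path.join(tests, "test_code.py"), "w") as f:
        f.write("import project.tests\nprint('ok')\n")

    env = dict(os.environ, PYTHONPATH=parent)
    result = subprocess.run(
        [sys.executable, os.path.join(tests, "test_code.py")],
        capture_output=True, text=True, env=env, cwd=tests)
    print(result.stdout.strip())
```

Without the PYTHONPATH entry, the same direct run fails with ImportError, since only the tests directory itself lands on sys.path as sys.path[0].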

On 13 March 2012 15:57, Guido van Rossum <guido@python.org> wrote:
I tend not to use PYTHONPATH - I'm on Windows, and environment variables aren't the obvious solution there, so I tend to forget. Also, I tend to initially develop projects in subdirectories of my "junk" directory, which has all sorts of cruft in it, including .py files with random names. So setting PYTHONPATH to that could introduce all sorts of things into my namespace, which is a bit less than ideal.

OTOH, I don't have a problem with __init__.py files, so something that correctly autodetects the right thing to add to sys.path based on the presence of __init__.py files would be fine.

All of which assumes that me simply being more organised isn't the real answer here :-)

Paul.

On Mar 14, 2012 5:24 AM, "Paul Moore" <p.f.moore@gmail.com>
I set up my projects the same way you do - it's a good, self-contained structure. And beginners (at least the ones that used Stack Overflow when I was spending time there) seemed to like it as well. That's the reason PEP 395 uses it as its main example.

Over on import-sig, Eric Snow suggested a revised implicit-package-tolerant search algorithm that's too slow to use on interpreter start up, but should be fine for generating better error messages if __main__ or an interactive import fails with ImportError, so I'll likely revise PEP 395 to propose that instead.

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)

On Mon, Mar 12, 2012 at 11:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Honestly, I don't really care about "compatibility" with Java or Perl. However, that *both* of those languages do it this way (BTW, what does Ruby do?) is an argument that this is a *natural* or *intuitive* way of setting things up. In fact, Python today also uses this: a package P lives in a directory named P. Plain and simple. Users can immediately understand this.

Collapsing multiple directories named P along sys.path is also pretty natural, given that we already have the behavior of searching along sys.path. The requirement of having an __init__.py file, however, is a wart.
I hope I've added some indication that it's also harder to deal with in version control systems.
Those are all sensible requests.
I expect pain in different places. -- --Guido van Rossum (python.org/~guido)

Oh, shit. Nick posted a bunch of messages to python-ideas instead of import-sig, and I followed up there. Instead of reposting, I'm just going to suggest that people interested in this discussion will, unfortunately, have to follow both lists. -- --Guido van Rossum (python.org/~guido)

On Wed, Mar 14, 2012 at 2:24 AM, Guido van Rossum <guido@python.org> wrote:
I hope I've added some indication that it's also harder to deal with in version control systems.
Yeah, given that part of my argument when I updated PEP 414 was "the lack of explicit unicode literals creates useless noise in version control diffs", I can hardly fault you for using a similar argument against changing package directory names!

Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan <ncoghlan@gmail.com> writes:
Are you convinced by the argument that a directory representing a package should be named exactly the same as the package? That's the most convincing reason I can see (though many other reasons are strong too) for not introducing special cases for the name of the package's directory.
Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved.
Agreed. Thanks for encouraging discussion and recording it, Eric.

--
“Pinky, are you pondering what I'm pondering?” “I think so, Brain, but Zero Mostel times anything will still give you Zero Mostel.” —_Pinky and The Brain_

Ben Finney

I've always had trouble understanding and explaining the complexities and intricacies of Python packaging. Is there a most basic but comprehensive list of use cases? IIUC they are:

* E.g. the standard library - import from a list of paths to be searched.
* E.g. this project - import from a relative path based on this file's current directory (which Python has an odd syntax for).
* E.g. distributed packages and virtualenv - import from a relative path based on an anchor directory.

If we were to start completely from scratch, would this problem be an easy one?

Yuval Greenfield

On Wed, Mar 14, 2012 at 7:34 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
I've always had trouble understanding and explaining the complexities and intricacies of python packaging.
+1
I am a big proponent of the user story/use case first approach, but somebody needs to show everyone how to do this properly. I've created a draft at http://wiki.python.org/moin/CodeDiscoveryUseCases - feel free to improve it.
If we were to start completely from scratch would this problem be an easy one?
With a list of user stories - yes.

--
anatoly t.

Gah, wrong list. Please don't reply here - that message will be showing up on import-sig shortly.

On Mon, Mar 12, 2012 at 5:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hi Nick, The write-up was a little unclear on a main point and I think that's contributed to some confusion here. The path search will continue to work in exactly the same way as it does now, with one difference. Instead of the current ImportError when nothing matches, the mechanism for namespace packages would be used. The mechanism would create a namespace package with a __path__ matching the paths corresponding to all namespace package "portions". The likely implementation will simply track the namespace package __path__ during the initial (normal) path search and use it only when there are no matching modules nor regular packages. Packages without __init__.py would only be allowed for namespace packages. So effectively namespace packages would be problematic for PEP 395, but not normal packages. Ultimately this is a form of PEP 402 without so much complexity. The trade-off is it requires a new kind of package. As far as I understand them, most of your concerns are based on the idea that namespace packages would be included in the initial traversal of sys.path, which is not the case. It sounds like there are a couple points you made that may still need attention, but hopefully this at least helps clarify what we talked about. -eric

On Tue, Mar 13, 2012 at 10:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It has been pointed out that the above is based on a misreading of MvL's email. So, consider the following backwards compatibility concern instead: Many projects use the following snippet to find a json module: try: import json except ImportError: import simplejson as json Now, this particular snippet should still work fine with implicit package directories (even if a non-Python json directory exists on sys.path), since there *will* be a real json module in the standard library to find and the simplejson fallback won't be needed. However, for the general case: try: import foo except ImportError: import foobar as foo Then implicit package directories pose a backwards compatibility problem (specifically, if "foo" does not exist as a module or explicit package on sys.path, but there is a non-Python "foo/" directory, then "foo" will be silently be created as an empty package rather than falling back to "foobar"). Sure, the likelihood of that actually affecting anyone is fairly remote (although all it really takes is one broken uninstaller leaving a "foo" dir in site-packages), but we've rejected proposals in the past over smaller concerns than this. *Now*, my original comment about the consensus view rejecting complexity from PEP 402 by disregarding backwards compatibility concerns becomes accurate. PEP 402 addressed this issue specifically by disallowing direct imports of implicit packages (only finding them later when searching for submodules). This is in fact the motivating case given for that behaviour in the PEP: http://www.python.org/dev/peps/pep-0402/#backwards-compatibility-and-perform... So, *why* are we adopting implicit packages again, given all the challenges they pose? What, exactly, is the problem with a ".pyp" extension that makes all this additional complexity the preferred choice? So far, I've only heard two *positive* statements in favour of implicit package directories: 1. Java/Perl/etc do it that way. 
I've already made it clear that I don't care about that argument. If it was all that compelling, we'd have implicit self by now. (However, clearly Guido favours it in this case, given his message that arrived while I was writing this one)

2. It (arguably) makes it easier to convert an existing package into a namespace package

With implicit package directories, you just delete your empty __init__.py file to turn an existing package into a namespace package. With a PEP 382 style directory suffix, you have to change your directory name to append the ".pyp" (and, optionally, delete your __init__.py file, since it's now going to be ignored anyway).

Barry's also tried to convince me that ".pyp" directories are somehow harder for distributions to deal with, but his only example looked like trying to use "yield from" in Python 3.2 and then complaining when it didn't work.

However, so long as the backwards compatibility from PEP 402 is incorporated, and the new PEP proposes a specific addition to the tutorial documenting the "never cd into a package, never double-click a file in a package to execute it, always use -m to execute modules from inside packages" guideline (and makes it clear that you may get strange and unpredictable behaviour if you ever break it), then I can learn to live with it. IDLE should also be updated to allow correct execution of submodules via F5 (I guess it will need some mechanism to be told what working directories to add to sys.path).

It still seems to me that moving to a marker *suffix* (rather than a marker file), as PEP 382 proposes, brings all the real practical benefits of implicit package directories (i.e. no empty __init__.py files wasting space) and absolutely *none* of the pain (i.e. no backwards compatibility concerns, no ambiguity in the filesystem-to-module-hierarchy mapping, and we'd still be able to fix direct execution of modules inside packages rather than having to explain forevermore why it doesn't work), but Guido clearly feels otherwise.
Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Mar 13, 2012 at 4:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think this paragraph really gets to the heart of what I'm objecting to. I agree wholeheartedly with the objective of eliminating __init__.py files; there's no need to convince me of that. However, *two* proposals were made to that end:

PEP 382 kept the explicit marker, simply changing it to a directory suffix rather than a separate file. Simple, clean, straightforward, minimalist, effective.

PEP 402 threw away the marker entirely, and then had to patch the package finding algorithm with a whole series of complications to avoid breaking backwards compatibility with Python 3.2. It also has the side effect of eliminating the 1:1 mapping between the filesystem and the module hierarchy. Once we lose that, there's no going back.

What I really want out of the new PEP is a clear rationale for why the horrible package finding algorithm hacks needed to make the PEP 402 approach work in a backwards compatible way are to be preferred to the explicitly marked PEP 382 approach, which *doesn't pose a backwards compatibility problem in the first place*.

The other thing to keep in mind is that if, for whatever reason, we decided further down the road that the explicit directory suffix solution wasn't good enough, then *we could change our minds* and allow implicit package directories after all (just as the formats for valid C extension module names have changed over time). There's no such freedom with implicit package directories - once they're in, they're in, and we can never introduce a requirement for an explicit marker again without breaking working packages. Is it so bad that I want us to take baby steps here, rather than jumping straight to the implicit solution?

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Mar 12, 2012 at 11:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think it comes down to this: I really, really, really hate directories with a suffix. I'd like to point out that the suffix is also introducing a backwards incompatibility: everybody will have to teach their tools, IDEs, and brains about .pyp directories, and they will also have to *rename* their directories (*if* they want to benefit from the new feature).

Renaming directories is a huge pain -- I counted over 400 directories in Django, so that would mean over 400 renames. In my experience renaming a directory is a huge pain no matter which version control system you use -- yes, it can be done, and modern VCSes have some support for renaming, but it's still a huge mess. Importing patches will be painful. Producing diffs across the renames will be hugely painful. I just think there are too many tools that won't know how to deal with this.

(I just did a little experiment: I cloned a small project using Hg and renamed one directory. Then I made a small change to one of the files whose parent was renamed. I have not figured out how to get a diff between the latest version of that file and any version before the mass renaming; the renaming is shown as a delete of the entire old file and an add of the entire new file. Even if you can tell me how to do this, my point stays: it's not easy to figure out. Similarly for logs: by default, "hg log" stops at the rename. You must add --follow to see logs across the rename.)

And regardless of which PEP we adopt, there will still be two types of package directories: PEP 382 still maintains backwards compatibility with directories that don't have a suffix but do have an __init__.py. So the unification still remains elusive. And at the end of the day I still really, really, really hate directories with a suffix. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum <guido@python.org> writes:
And at the end of the day I still really, really, really hate directories with a suffix.
+1 -- \ “Good morning, Pooh Bear”, said Eeyore gloomily. “If it is a | `\ good morning”, he said. “Which I doubt”, said he. —A. A. Milne, | _o__) _Winnie-the-Pooh_ | Ben Finney

On 13 Mar, 2012, at 9:15, Guido van Rossum wrote:
Directories with a suffix have the advantage that you could teach GUIs to treat them differently; file managers could, for example, show a ".pyp" directory as a folder with a Python logo, just like ".py" files are shown as documents with a Python logo. With the implicit approach it is much harder to recognize Python packages as such without detailed knowledge of the import algorithm and the Python search path. Ronald

On 3/20/2012 11:49 AM, Ronald Oussoren wrote:
Package directories are files too, and can be imported to make modules. I think it would have been nice to use .pyp from the beginning. It would make Python easier to learn. Also, 'import x' would then simply mean "search sys.path directories for a file named 'x.py*'", with no need for either the importer or a human reader to look within directories for the magic __init__.py file. Sorting a directory listing by extension would sort all packages together. -- Terry Jan Reedy

On Mon, Mar 26, 2012 at 1:45 AM, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Yes. On what platform are you? On unixy platforms filename extensions are just a naming convention that can just as easily be used with directories.
IIUC that's how almost all filesystems treat them. However, desktop software often assigns specific meanings to them -- the user can configure these, but there's a large set of predefined bindings too, and many key applications also play this game (since there is, frankly, not much else to go by -- some important file types are not easily guessable by reading their content, either because it's some esoteric binary format, or because it's something too universal, like XML). I know that's how it works on Windows and Mac, but I believe the Linux desktop things (the things I kill off or at least ignore as soon as I log in :-) have the same idea. -- --Guido van Rossum (python.org/~guido)

On Tue, Mar 20, 2012 at 11:49, Ronald Oussoren <ronaldoussoren@mac.com>wrote:
OS X has made me dislike that possibility. Some git tools cause directories ending in .git to be treated as an opaque object in the file system, forcing me to drop into a shell, or right-click and choose to inspect the directory, in order to see its contents. -Brett

On 21 Mar, 2012, at 15:22, Brett Cannon wrote:
That's probably because those tools define ".git" directories as a package in their metadata and the finder won't show package contents by default (you can use the context menu of the finder to inspect the contents of packages, but that won't work in the file open/save panels). I'd have to experiment to be sure, but IIRC it is possible to assign icons to a suffix without making directories into packages. Ronald

On Mar 13, 2012, at 09:15 AM, Guido van Rossum wrote:
And at the end of the day I still really, really, really hate directories with a suffix.
I completely agree, for all the reasons you stated. Especially because it would be extremely difficult to handle migrations from a pre-namespace-packages world to a post-namespace-packages world with directory suffixes.

For example, let's say Debian/Ubuntu supports Python 3.2 and 3.3. We can continue to craft __init__.py files with the old-style namespace package code at installation time and pretty much do what we're currently doing. It's painful, but the technology is there, so it doesn't change much for us. But when we can drop support for < 3.3 (or we back port namespace package support to 3.2), then we can simply drop the code that creates these __init__.py files at installation time and we'll magically <wink> gain support for new-style namespace packages.

With directory suffixes, I don't see how this is possible. I shudder to think what OS vendors will have to do to rename all the directories of *installed* packages, let alone have to rebuild all Python 3 packages to support the renamed directories, when they make the switch to a new-style world. catching-up-ly y'rs, -Barry

On 13 March 2012 06:07, Nick Coghlan <ncoghlan@gmail.com> wrote:
Whoa! I'm not sure I can. I just recently got bitten badly by this for real. The following was what I was doing:

1. I'm writing a package.
2. I'm trying to do the tests-as-I-develop approach (yes, I know I should have been doing this for years - so sue me :-))
3. I have my tests as a subpackage of the main package.
4. I use the command line.
5. I cd to the tests directory as that's the easiest way to edit tests: gvim test_dow<TAB> to edit test_download_binaries.py.

And yes, I had endless trouble trying to work out why I couldn't then run the tests from the command line. I consider the trouble I had to be a bug - it *should* work, in my view. I understand why what I'm doing is an edge case, but intuitively, I don't like it not working. I can change my practices, or use an IDE, or something. But my workflow will be less comfortable for me, and I don't feel that I understand why I should have to.

I *think* that what Nick is proposing is a fix for this (and if it isn't, can you fix this scenario too, please Nick? :-)), and the idea that it's going to get documented as "don't do that" strikes me as unfriendly, if not outright wrong. I also don't think it should be documented in the tutorial - it's not something a new developer hits; it's later on, when people are writing bigger, more complex code, that they hit it. In summary, I guess I support Nick's objections 3 and 4. We need a better response than just "don't do that", IMHO. Paul.
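The failure Paul describes is easy to reproduce. The following is a hedged sketch (package and module names are illustrative): a test module that uses a relative import runs fine via "-m" from the directory containing the package, but fails when executed directly from inside the tests directory.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Build a throwaway project/tests layout like the one in the email.
root = tempfile.mkdtemp()
tests = os.path.join(root, "project", "tests")
os.makedirs(tests)
open(os.path.join(root, "project", "__init__.py"), "w").close()
open(os.path.join(tests, "__init__.py"), "w").close()
with open(os.path.join(root, "project", "code.py"), "w") as f:
    f.write("ANSWER = 42\n")
with open(os.path.join(tests, "test_code.py"), "w") as f:
    f.write(textwrap.dedent("""\
        from ..code import ANSWER
        print("ok", ANSWER)
    """))

# From the directory *containing* the package, "-m" works:
good = subprocess.run([sys.executable, "-m", "project.tests.test_code"],
                      cwd=root, capture_output=True, text=True)
# From inside the package, direct execution breaks the relative import:
bad = subprocess.run([sys.executable, "test_code.py"],
                     cwd=tests, capture_output=True, text=True)
print(good.returncode, bad.returncode)
```

Running the file directly makes the tests directory itself sys.path[0], so the interpreter never learns that test_code is part of the "project" package, and the relative import fails with "no known parent package".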

On Tue, Mar 13, 2012 at 8:35 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Oh, but there are several other solutions. For example, if you set PYTHONPATH to the directory *containing* your toplevel package, new code could be added that will discover the true toplevel no matter where you are. This code doesn't exist today, but Nick is proposing something similar looking for __init__.py files; the code that tries to find the script directory as a subdirectory of some path on sys.path could be added there. Also, the code Nick currently proposes for PEP 395 is still useful if you add __init__.py files to your package -- if you did that you wouldn't have to set PYTHONPATH (assuming we do add this code to Python 3.3, of course). -- --Guido van Rossum (python.org/~guido)

On 13 March 2012 15:57, Guido van Rossum <guido@python.org> wrote:
I tend not to use PYTHONPATH - I'm on Windows, and environment variables aren't the obvious solution there so I tend to forget. Also, I tend to initially develop projects in subdirectories of my "junk" directory, which has all sorts of cruft in it, including .py files with random names. So setting PYTHONPATH to that could introduce all sorts to my namespace, which is a bit less than ideal. OTOH, I don't have a problem with __init__.py files, so something that correctly autodetects the right thing to add to sys.path based on the presence of __init__ files would be fine. All of which assumes that me simply being more organised isn't the real answer here :-) Paul.

On Mar 14, 2012 5:24 AM, "Paul Moore" <p.f.moore@gmail.com>
I set up my projects the same way you do - it's a good, self-contained structure. And beginners (at least the ones that used Stack Overflow when I was spending time there) seemed to like it as well. That's the reason PEP 395 uses it as its main example. Over on import-sig, Eric Snow suggested a revised implicit package tolerant search algorithm that's too slow to use on interpreter start up, but should be fine for generating better error messages if __main__ or an interactive import fails with ImportError, so I'll likely revise 395 to propose that instead. Cheers, Nick. -- Sent from my phone, thus the relative brevity :)

On Mon, Mar 12, 2012 at 11:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Honestly, I don't really care about "compatibility" with Java or Perl. However that *both* of those languages do it this way (BTW what does Ruby do?) is an argument that this is a *natural* or *intuitive* way of setting things up. In fact, Python today also uses this: a package P lives in a directory named P. Plain and simple. Users can immediately understand this. Collapsing multiple directories named P along sys.path is also pretty natural, given that we already have the behavior of searching along sys.path. The requirement of having an __init__.py file however is a wart.
I hope I've added some indication that it's also harder to deal with in version control systems.
Those are all sensible requests.
I expect pain in different places. -- --Guido van Rossum (python.org/~guido)

Oh, shit. Nick posted a bunch of messages to python-ideas instead of import-sig, and I followed up there. Instead of reposting, I'm just going to suggest that people interested in this discussion will, unfortunately, have to follow both lists. -- --Guido van Rossum (python.org/~guido)

On Wed, Mar 14, 2012 at 2:24 AM, Guido van Rossum <guido@python.org> wrote:
I hope I've added some indication that it's also harder to deal with in version control systems.
Yeah, given that part of my argument when I updated PEP 414 was "the lack of explicit unicode literals creates useless noise in version control diffs", I can hardly fault you for using a similar argument against changing package directory names! Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan <ncoghlan@gmail.com> writes:
Are you convinced by the argument that a directory representing a package should be named exactly the same as the package? That's the most convincing reason I can see (though many other reasons are strong too) for not introducing special cases for the name of the package's directory.
Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved.
Agreed. Thanks for encouraging discussion and recording it, Eric. -- \ “Pinky, are you pondering what I'm pondering?” “I think so, | `\ Brain, but Zero Mostel times anything will still give you Zero | _o__) Mostel.” —_Pinky and The Brain_ | Ben Finney

I've always had trouble understanding and explaining the complexities and intricacies of Python packaging. Is there a most basic but comprehensive list of use cases? IIUC they are:

* E.g. the standard library - import from a list of paths to be searched.
* E.g. this project - import from a relative path based on this file's current directory (which Python has an odd syntax for).
* E.g. distributed packages and virtualenv - import from a relative path based on an anchor directory.

If we were to start completely from scratch, would this problem be an easy one? Yuval Greenfield
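Those three use cases can be roughly illustrated as follows (paths and comments are illustrative, not a proposal):

```python
import sys

# Use case 1 -- the standard library: "import json" searches the
# directories listed on sys.path, in order, and binds the first match.
import json

# Use case 2 -- this project: the explicit relative form (the "odd
# syntax"), which is only legal inside a package:
#     from . import sibling
#     from ..subpackage import helper

# Use case 3 -- distributed packages / virtualenvs: imports are anchored
# by placing the environment's root ahead of everything else on the
# search path (the path below is hypothetical).
sys.path.insert(0, "/path/to/env/site-packages")

print(json.__name__)
```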

On Wed, Mar 14, 2012 at 7:34 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
I've always had trouble understanding and explaining the complexities and intricacies of python packaging.
+1
I am a big proponent of the user-story/use-case-first approach, but somebody needs to show everyone how to do this properly. I've created a draft at http://wiki.python.org/moin/CodeDiscoveryUseCases - feel free to improve it.
If we were to start completely from scratch would this problem be an easy one?
With a list of user stories - yes. -- anatoly t.
participants (13)
- anatoly techtonik
- Barry Warsaw
- Ben Finney
- Brett Cannon
- Chris Rebert
- Eric Snow
- Guido van Rossum
- Nick Coghlan
- Paul Moore
- Ronald Oussoren
- Sven Marnach
- Terry Reedy
- Yuval Greenfield