My objections to implicit package directories

It seems the consensus at the PyCon US sprints is that implicit package directories are a wonderful idea and we should have more of those. I still disagree (emphatically), but am prepared to go along with it so long as my documented objections are clearly and explicitly addressed in the new combined PEP, and the benefits ascribed to implicit package directories in the new PEP are more compelling than "other languages do it that way, so we should too". To save people having to trawl around various mailing list threads and read through PEP 395, I'm providing those objections in a consolidated form here. If reading these objections in one place causes people to have second thoughts about the wisdom of implicit package directories, even better.

1. Implicit package directories go against the Zen of Python

Getting this one out of the way first. As I see it, implicit package directories violate at least 4 of the design principles in the Zen:

- Explicit is better than implicit (my calling them "implicit package directories" is a deliberate rhetorical ploy to harp on this point, although it's also an accurate name)
- If the implementation is hard to explain, it's a bad idea (see the section on backwards compatibility challenges)
- Readability counts (see the section on introducing ambiguity into filesystem layouts)
- Errors should never pass silently (see the section on implicit relative imports from __main__)

2. Implicit package directories pose awkward backwards compatibility challenges

It concerns me gravely that the consensus proposal MvL posted is *backwards incompatible with Python 3.2*, as it deliberately omits one of the PEP 402 features that provided that backwards compatibility. Specifically, under the consensus, a subdirectory "foo" of a directory on sys.path will shadow a "foo.py" or "foo/__init__.py" that appears later on sys.path.
As Python 3.2 would have found that latter module/package correctly, this is an unacceptable breach of the backwards compatibility requirements. PEP 402 at least got this right by always executing the first "foo.py" or "foo/__init__.py" it found, even if another "foo" directory was found earlier in sys.path. We can't just wave that additional complexity away if an implicit package directory proposal is going to remain backwards compatible with current layouts (e.g. if an application's starting directory included a "json" subfolder containing JSON files rather than Python code, the consensus approach as posted by MvL would render the standard library's json module inaccessible).

3. Implicit package directories introduce ambiguity into filesystem layouts

With the current Python package design, there is a clear 1:1 mapping between the filesystem layout and the module hierarchy. For example:

    parent/                  # This directory goes on sys.path
        project/             # The "project" package
            __init__.py      # Explicit package marker
            code.py          # The "project.code" module
            tests/           # The "project.tests" package
                __init__.py  # Explicit package marker
                test_code.py # The "project.tests.test_code" module

Any explicit package directory approach will preserve this 1:1 mapping.
For example, under PEP 382:

    parent/                  # This directory goes on sys.path
        project.pyp/         # The "project" package
            code.py          # The "project.code" module
            tests.pyp/       # The "project.tests" package
                test_code.py # The "project.tests.test_code" module

With implicit package directories, you can no longer tell purely from the code structure which directory is meant to be added to sys.path, as there are at least two valid mappings to the Python module hierarchy:

    parent/                  # This directory goes on sys.path
        project/             # The "project" package
            code.py          # The "project.code" module
            tests/           # The "project.tests" package
                test_code.py # The "project.tests.test_code" module

    parent/
        project/             # This directory goes on sys.path
            code.py          # The "code" module
            tests/           # The "tests" package
                test_code.py # The "tests.test_code" module

What are implicit package directories buying us in exchange for this inevitable ambiguity? What can we do with them that can't be done with explicit package directories? And no, "Java does it that way" is not a valid argument.

4. Implicit package directories will permanently entrench current newbie-hostile behaviour in __main__

It's a fact of life that Python beginners learn that they can do a quick sanity check on modules they're writing by including an "if __name__ == '__main__':" section at the end and doing one of the following:

- run "python mymodule.py"
- hit F5 (or the relevant hot key) in their IDE
- double click the module in their filesystem browser
- start the Python REPL and do "import mymodule"

However, there are some serious caveats to that as soon as you move the module inside a package:

- if you use explicit relative imports, you can import it, but not run it directly using any of the above methods
- if you rely on implicit relative imports, the above direct execution methods should work most of the time, but you won't be able to import it
- if you use absolute imports for your own package, nothing will work (unless the parent directory for your package is already on sys.path)
- if you only use absolute imports for *other* packages, everything should be fine

The errors you get in these cases are *horrible*. The interpreter doesn't really know what is going on, so it gives the user bad error messages. In large part, the "Why are my imports broken?" section in PEP 395 exists because I sat down to try to document what does and doesn't work when you attempt to directly execute a module from inside a package directory. In building the list of what would work properly ("python -m" from the parent directory of the package) and what would sometimes break (everything else), I realised that instead of documenting the entire hairy mess, the 1:1 mapping from the filesystem layout to the Python module hierarchy meant we could *just fix it* to not do the wrong thing by default.
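The first caveat above (explicit relative imports break direct execution) can be demonstrated in a few lines. This is a minimal sketch, not taken from PEP 395 itself, and all package, module, and variable names here are made up for illustration:

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway package containing a module that uses an explicit
# relative import, then try both execution styles described above.
with tempfile.TemporaryDirectory() as parent:
    pkg = os.path.join(parent, "pkg")
    os.mkdir(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "helper.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "mymodule.py"), "w") as f:
        f.write("from . import helper\n"
                "if __name__ == '__main__':\n"
                "    print(helper.VALUE)\n")

    # Direct execution dies with "attempted relative import" before the
    # __main__ guard is ever reached.
    direct = subprocess.run(
        [sys.executable, os.path.join(pkg, "mymodule.py")],
        capture_output=True, text=True)

    # "python -m pkg.mymodule" from the package's parent directory works.
    via_m = subprocess.run(
        [sys.executable, "-m", "pkg.mymodule"],
        capture_output=True, text=True, cwd=parent)

    print("direct failed:", direct.returncode != 0)
    print("-m output:", via_m.stdout.strip())
```

The error message from the direct run is exactly the kind of thing the text above complains about: it talks about relative imports and parent packages, not about the actual mistake (running a package submodule as a script).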
If implicit package directories are blessed for inclusion in Python 3.3, that opportunity is lost forever - with the loss of the unambiguous 1:1 mapping from the filesystem layout to the module hierarchy, it's no longer possible for the interpreter to figure out the right thing to do without guessing.

PJE proposed that newbies be instructed to add the following boilerplate to their modules if they want to use "if __name__ == '__main__':" for sanity checking:

    import pkgutil
    pkgutil.script_module(__name__, 'project.code.test_code')

This completely defeats the purpose of having explicit relative imports in the language, as it embeds the absolute name of the module inside the module itself. If a package subtree is ever moved or renamed, you will have to manually fix every script_module() invocation in that subtree. Double-keying data like this is just plain bad design. The package structure should be recorded explicitly in exactly one place: the filesystem.

PJE has other objections to the PEP 395 proposal, specifically relating to its behaviour on package layouts where the directories added to sys.path contain __init__.py files, such that the developer's intent is not accurately reflected in their filesystem layout. Such layouts are *broken*, and the misbehaviour under PEP 395 won't be any worse than the misbehaviour with the status quo (sys.path[0] is set incorrectly in either case; it will just be fixable under PEP 395 by removing the extraneous __init__.py files). A similar argument applies to cases where a parent package __init__ plays games with sys.path (although the PEP 395 algorithm could likely be refined to better handle that situation).

Regardless, if implicit package directories are accepted into Python 3.3 in any form, I *will* be immediately marking PEP 395 as Rejected due to incompatibility with an accepted PEP.
I'll then (eventually, once I'm less annoyed about the need to do so) write a new PEP to address a subset of the issues previously covered by PEP 395 that omits any proposals that rely on explicit package directories.

Also, I consider it a requirement that any implicit packages PEP include an update to the tutorial to explain to beginners what will and won't work when they attempt to directly execute a module from inside a Python package. After all, such a PEP is closing off any possibility of ever fixing the problem: it should have to deal with the consequences.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Gah, wrong list. Please don't reply here - that message will be showing up on import-sig shortly.

On Mon, Mar 12, 2012 at 5:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hi Nick,

The write-up was a little unclear on a main point and I think that's contributed to some confusion here.

The path search will continue to work in exactly the same way as it does now, with one difference: instead of the current ImportError when nothing matches, the mechanism for namespace packages would be used. That mechanism would create a namespace package with a __path__ matching the paths corresponding to all namespace package "portions". The likely implementation will simply track the namespace package __path__ during the initial (normal) path search and use it only when there are no matching modules nor regular packages. Packages without __init__.py would only be allowed for namespace packages. So effectively namespace packages would be problematic for PEP 395, but not normal packages.

Ultimately this is a form of PEP 402 without so much complexity. The trade-off is that it requires a new kind of package.

As far as I understand them, most of your concerns are based on the idea that namespace packages would be included in the initial traversal of sys.path, which is not the case. It sounds like there are a couple of points you made that may still need attention, but hopefully this at least helps clarify what we talked about.

-eric

On Tue, Mar 13, 2012 at 10:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It has been pointed out that the above is based on a misreading of MvL's email. So, consider the following backwards compatibility concern instead.

Many projects use the following snippet to find a json module:

    try:
        import json
    except ImportError:
        import simplejson as json

Now, this particular snippet should still work fine with implicit package directories (even if a non-Python json directory exists on sys.path), since there *will* be a real json module in the standard library to find and the simplejson fallback won't be needed. However, for the general case:

    try:
        import foo
    except ImportError:
        import foobar as foo

implicit package directories pose a backwards compatibility problem: if "foo" does not exist as a module or explicit package on sys.path, but there is a non-Python "foo/" directory, then "foo" will silently be created as an empty package rather than falling back to "foobar". Sure, the likelihood of that actually affecting anyone is fairly remote (although all it really takes is one broken uninstaller leaving a "foo" dir in site-packages), but we've rejected proposals in the past over smaller concerns than this.

*Now*, my original comment about the consensus view rejecting complexity from PEP 402 by disregarding backwards compatibility concerns becomes accurate. PEP 402 addressed this issue specifically by disallowing direct imports of implicit packages (only finding them later when searching for submodules). This is in fact the motivating case given for that behaviour in the PEP: http://www.python.org/dev/peps/pep-0402/#backwards-compatibility-and-perform...

So, *why* are we adopting implicit packages again, given all the challenges they pose? What, exactly, is the problem with a ".pyp" extension that makes all this additional complexity the preferred choice? So far, I've only heard two *positive* statements in favour of implicit package directories:

1. Java/Perl/etc do it that way.
I've already made it clear that I don't care about that argument. If it was all that compelling, we'd have implicit self by now. (However, clearly Guido favours it in this case, given his message that arrived while I was writing this one.)

2. It (arguably) makes it easier to convert an existing package into a namespace package.

With implicit package directories, you just delete your empty __init__.py file to turn an existing package into a namespace package. With a PEP 382 style directory suffix, you have to change your directory name to append the ".pyp" (and, optionally, delete your __init__.py file, since it's now going to be ignored anyway).

Barry's also tried to convince me that ".pyp" directories are somehow harder for distributions to deal with, but his only example looked like trying to use "yield from" in Python 3.2 and then complaining when it didn't work.

However, so long as the backwards compatibility measures from PEP 402 are incorporated, and the new PEP proposes a specific addition to the tutorial documenting the "never CD into a package, never double-click a file in a package to execute it, always use -m to execute modules from inside packages" guideline (and makes it clear that you may get strange and unpredictable behaviour if you ever break it), then I can learn to live with it. IDLE should also be updated to allow correct execution of submodules via F5 (I guess it will need some mechanism to be told what working directories to add to sys.path).

It still seems to me that moving to a marker *suffix* (rather than a marker file) as PEP 382 proposes brings all the real practical benefits of implicit package directories (i.e. no empty __init__.py files wasting space) and absolutely *none* of the pain (i.e. no backwards compatibility concerns, no ambiguity in the filesystem to module hierarchy mapping, still able to fix direct execution of modules inside packages rather than having to explain forevermore why it doesn't work), but Guido clearly feels otherwise.
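The fallback concern described earlier in this message can be reproduced on any interpreter that implements the eventual PEP 420 style semantics (Python 3.3+). This is a sketch only; the "foo"/"foobar" names are the made-up ones from the snippet above:

```python
import importlib
import os
import sys
import tempfile

# A stray, non-Python "foo" directory on sys.path (e.g. debris left
# behind by a broken uninstaller) imports successfully as an empty
# namespace package, so the ImportError fallback never runs.
with tempfile.TemporaryDirectory() as parent:
    os.mkdir(os.path.join(parent, "foo"))  # no __init__.py, no .py files
    sys.path.insert(0, parent)
    importlib.invalidate_caches()
    try:
        try:
            import foo                     # silently succeeds
        except ImportError:
            import foobar as foo           # never reached
        # A namespace package has a __path__ but no source file.
        print(type(foo).__name__, list(foo.__path__))
    finally:
        sys.path.remove(parent)
```

Under pre-3.3 semantics the inner import would raise ImportError and the fallback would run; under implicit package directories, "foo" is bound to an empty namespace package instead.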
Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Mar 13, 2012 at 4:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think this paragraph really gets to the heart of what I'm objecting to. I agree wholeheartedly with the objective of eliminating __init__.py files; there's no need to convince me of that. However, *two* proposals were made to that end:

PEP 382 kept the explicit marker, simply changing it to a directory suffix rather than a separate file. Simple, clean, straightforward, minimalist, effective.

PEP 402 threw away the marker entirely, and then had to patch the package finding algorithm with a whole series of complications to avoid breaking backwards compatibility with Python 3.2. It also has the side effect of eliminating the 1:1 mapping between the filesystem and the module hierarchy. Once we lose that, there's no going back.

What I really want out of the new PEP is a clear rationale for why the horrible package finding algorithm hacks needed to make the PEP 402 approach work in a backwards compatible way are to be preferred to the explicitly marked PEP 382 approach, which *doesn't pose a backwards compatibility problem in the first place*.

The other thing to keep in mind is that if, for whatever reason, we decided further down the road that the explicit directory suffix solution wasn't good enough, then *we could change our minds* and allow implicit package directories after all (just as the formats for valid C extension module names have changed over time). There's no such freedom with implicit package directories - once they're in, they're in, and we can never introduce a requirement for an explicit marker again without breaking working packages.

Is it so bad that I want us to take baby steps here, rather than jumping straight to the implicit solution?

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Mar 12, 2012 at 11:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think it comes down to this: I really, really, really hate directories with a suffix.

I'd like to point out that the suffix is also introducing a backwards incompatibility: everybody will have to teach their tools, IDEs, and brains about .pyp directories, and they will also have to *rename* their directories (*if* they want to benefit from the new feature). Renaming directories is a huge pain -- I counted over 400 directories in Django, so that would mean over 400 renames. In my experience renaming a directory is a huge pain no matter which version control system you use -- yes, it can be done, and modern VCSes have some support for renaming, but it's still a huge mess. Importing patches will be painful. Producing diffs across the renames will be hugely painful. I just think there are too many tools that won't know how to deal with this.

(I just did a little experiment: I cloned a small project using Hg and renamed one directory. Then I made a small change to one of the files whose parent was renamed. I have not figured out how to get a diff between the latest version of that file and any version before the mass renaming; the renaming is shown as a delete of the entire old file and an add of the entire new file. Even if you can tell me how to do this, my point stays: it's not easy to figure out. Similarly for logs: by default, "hg log" stops at the rename. You must add --follow to see logs across the rename.)

And regardless of which PEP we adopt, there will still be two types of package directories: PEP 382 still maintains backwards compatibility with directories that don't have a suffix but do have an __init__.py. So the unification still remains elusive.

And at the end of the day I still really, really, really hate directories with a suffix.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum <guido@python.org> writes:
And at the end of the day I still really, really, really hate directories with a suffix.
+1

--
“Good morning, Pooh Bear”, said Eeyore gloomily. “If it is a good morning”, he said. “Which I doubt”, said he. —A. A. Milne, _Winnie-the-Pooh_

Ben Finney

On 13 Mar, 2012, at 9:15, Guido van Rossum wrote:
Directories with a suffix have the advantage that you could teach GUIs to treat them differently; file managers could, for example, show a ".pyp" directory as a folder with a Python logo, just like ".py" files are shown as documents with a Python logo. With the implicit approach it is much harder to recognize Python packages as such without detailed knowledge about the import algorithm and the Python search path.

Ronald

On 3/20/2012 11:49 AM, Ronald Oussoren wrote:
Package directories are files and can be imported to make modules. I think it would have been nice to use .pyp from the beginning. It would make Python easier to learn. Also, 'import x' would simply mean "search sys.path directories for a file named 'x.py*'", with no need for either the importer (or human reader) to look within directories for the magic __init__.py file. Sorting a directory listing by extension would sort all packages together.

--
Terry Jan Reedy

On Mon, Mar 26, 2012 at 1:45 AM, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Yes. On what platform are you? On unixy platforms filename extensions are just a naming convention that can just as easily be used with directories.
IIUC that's how almost all filesystems treat them. However, desktop software often assigns specific meanings to them -- the user can configure these, but there's a large set of predefined bindings too, and many key applications also play this game (since there is, frankly, not much else to go by -- some important file types are not easily guessable by reading their content, either because it's some esoteric binary format, or because it's something too universal, like XML). I know that's how it works on Windows and Mac, but I believe the Linux desktop things (the things I kill off or at least ignore as soon as I log in :-) have the same idea.

--
--Guido van Rossum (python.org/~guido)

On Tue, Mar 20, 2012 at 11:49, Ronald Oussoren <ronaldoussoren@mac.com>wrote:
OS X has made me dislike that possibility. Some git tools will make directories ending in .git be considered an opaque object in the file system, forcing me to drop into a shell or right-click and choose to inspect the directory in order to see its contents.

-Brett

On 21 Mar, 2012, at 15:22, Brett Cannon wrote:
That's probably because those tools define ".git" directories as a package in their metadata, and the Finder won't show package contents by default (you can use the Finder's context menu to inspect the contents of packages, but that won't work in the file open/save panels). I'd have to experiment to be sure, but IIRC it is possible to assign icons to a suffix without making directories into packages.

Ronald

On Mar 13, 2012, at 09:15 AM, Guido van Rossum wrote:
And at the end of the day I still really, really, really hate directories with a suffix.
I completely agree, for all the reasons you stated. Especially because it would be extremely difficult to handle migrations from a pre-namespace-packages world to a post-namespace-packages world with directory suffixes.

For example, let's say Debian/Ubuntu supports Python 3.2 and 3.3. We can continue to craft __init__.py files with the old-style namespace package code at installation time and pretty much do what we're currently doing. It's painful, but the technology is there, so it doesn't change much for us. But when we can drop support for < 3.3 (or we back-port namespace package support to 3.2), then we can simply drop the code that creates these __init__.py files at installation time and we'll magically <wink> gain support for new-style namespace packages.

With directory suffixes, I don't see how this is possible. I shudder to think what OS vendors will have to do to rename all the directories of *installed* packages, let alone have to rebuild all Python 3 packages to support the renamed directories, when they make the switch to a new-style world.

catching-up-ly y'rs,
-Barry

On 13 March 2012 06:07, Nick Coghlan <ncoghlan@gmail.com> wrote:
Whoa! I'm not sure I can. I just recently got bitten badly by this for real. The following was what I was doing:

1. I'm writing a package.
2. I'm trying to do the tests-as-I-develop approach (yes, I know I should have been doing this for years - so sue me :-)).
3. I have my tests as a subpackage of the main package.
4. I use the command line.
5. I cd to the tests directory as that's the easiest way to edit tests: gvim test_dow<TAB> to edit test_download_binaries.py.

And yes, I had endless trouble trying to work out why I can't then run the tests from the command line. I consider the trouble I have as a bug - it *should* work, in my view. I understand why what I'm doing is an edge case, but intuitively, I don't like it not working.

I can change my practices, or use an IDE, or something. But my workflow will be less comfortable for me, and I don't feel that I understand why I should have to. I *think* that what Nick is proposing is a fix for this (and if it isn't, can you fix this scenario too, please Nick? :-)), and the idea that it's going to get documented as "don't do that" strikes me as unfriendly, if not outright wrong. I also don't think it should be documented in the tutorial - it's not something a new developer hits; it's later on, when people are writing bigger, more complex code, that they hit it.

In summary, I guess I support Nick's objections 3 and 4. We need a better response than just "don't do that", IMHO.

Paul.

On Tue, Mar 13, 2012 at 8:35 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Oh, but there are several other solutions. For example, if you set PYTHONPATH to the directory *containing* your toplevel package, new code could be added that will discover the true toplevel no matter where you are. This code doesn't exist today, but Nick is proposing something similar looking for __init__.py files; the code that tries to find the script directory as a subdirectory of some path on sys.path could be added there.

Also, the code Nick currently proposes for PEP 395 is still useful if you add __init__.py files to your package -- if you did that you wouldn't have to set PYTHONPATH (assuming we do add this code to Python 3.3, of course).

--
--Guido van Rossum (python.org/~guido)
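The PYTHONPATH workaround Guido describes can be sketched as follows. All names here ("project", "tests", "test_code.py") are made up to mirror the layouts discussed earlier in the thread; the child process stands in for running a test module from inside its own directory:

```python
import os
import subprocess
import sys
import tempfile

# Build a project/tests package tree, then run a test module directly
# from inside the tests directory. With PYTHONPATH pointing at the
# directory *containing* the top-level package, the absolute import of
# the package succeeds even though the cwd is deep inside it.
with tempfile.TemporaryDirectory() as parent:
    tests = os.path.join(parent, "project", "tests")
    os.makedirs(tests)
    for d in (os.path.join(parent, "project"), tests):
        open(os.path.join(d, "__init__.py"), "w").close()
    with open(os.path.join(tests, "test_code.py"), "w") as f:
        f.write("import project.tests\nprint('ok')\n")

    env = dict(os.environ, PYTHONPATH=parent)
    result = subprocess.run(
        [sys.executable, os.path.join(tests, "test_code.py")],
        capture_output=True, text=True, env=env, cwd=tests)
    print(result.stdout.strip())
```

Without the PYTHONPATH entry, the same direct run fails with ImportError, since only the tests directory itself lands on sys.path as sys.path[0].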

On 13 March 2012 15:57, Guido van Rossum <guido@python.org> wrote:
I tend not to use PYTHONPATH - I'm on Windows, and environment variables aren't the obvious solution there, so I tend to forget. Also, I tend to initially develop projects in subdirectories of my "junk" directory, which has all sorts of cruft in it, including .py files with random names. So setting PYTHONPATH to that could introduce all sorts of things into my namespace, which is a bit less than ideal.

OTOH, I don't have a problem with __init__.py files, so something that correctly autodetects the right thing to add to sys.path based on the presence of __init__.py files would be fine.

All of which assumes that me simply being more organised isn't the real answer here :-)

Paul.

On Mar 14, 2012 5:24 AM, "Paul Moore" <p.f.moore@gmail.com>
I set up my projects the same way you do - it's a good, self-contained structure. And beginners (at least the ones that used Stack Overflow when I was spending time there) seemed to like it as well. That's the reason PEP 395 uses it as its main example.

Over on import-sig, Eric Snow suggested a revised implicit-package-tolerant search algorithm that's too slow to use on interpreter start up, but should be fine for generating better error messages if __main__ or an interactive import fails with ImportError, so I'll likely revise PEP 395 to propose that instead.

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)

On Mon, Mar 12, 2012 at 11:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Honestly, I don't really care about "compatibility" with Java or Perl. However, that *both* of those languages do it this way (BTW, what does Ruby do?) is an argument that this is a *natural* or *intuitive* way of setting things up. In fact, Python today also uses this: a package P lives in a directory named P. Plain and simple. Users can immediately understand this.

Collapsing multiple directories named P along sys.path is also pretty natural, given that we already have the behavior of searching along sys.path. The requirement of having an __init__.py file, however, is a wart.
I hope I've added some indication that it's also harder to deal with in version control systems.
Those are all sensible requests.
I expect pain in different places. -- --Guido van Rossum (python.org/~guido)

Oh, shit. Nick posted a bunch of messages to python-ideas instead of import-sig, and I followed up there. Instead of reposting, I'm just going to suggest that people interested in this discussion will, unfortunately, have to follow both lists. -- --Guido van Rossum (python.org/~guido)

On Wed, Mar 14, 2012 at 2:24 AM, Guido van Rossum <guido@python.org> wrote:
I hope I've added some indication that it's also harder to deal with in version control systems.
Yeah, given that part of my argument when I updated PEP 414 was "the lack of explicit unicode literals creates useless noise in version control diffs", I can hardly fault you for using a similar argument against changing package directory names!

Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan <ncoghlan@gmail.com> writes:
Are you convinced by the argument that a directory representing a package should be named exactly the same as the package? That's the most convincing reason I can see (though many other reasons are strong too) for not introducing special cases for the name of the package's directory.
Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved.
Agreed. Thanks for encouraging discussion and recording it, Eric.

--
“Pinky, are you pondering what I'm pondering?” “I think so, Brain, but Zero Mostel times anything will still give you Zero Mostel.” —_Pinky and The Brain_

Ben Finney

I've always had trouble understanding and explaining the complexities and intricacies of Python packaging. Is there a most basic but comprehensive list of use cases? IIUC they are:

* E.g. the standard library - import from a list of paths to be searched.
* E.g. this project - import from a relative path based on this file's current directory (which Python has an odd syntax for).
* E.g. distributed packages and virtualenv - import from a relative path based on an anchor directory.

If we were to start completely from scratch, would this problem be an easy one?

Yuval Greenfield

On Wed, Mar 14, 2012 at 7:34 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
I've always had trouble understanding and explaining the complexities and intricacies of python packaging.
+1
I am a big proponent of the user story/use case first approach, but somebody needs to show everyone how to do this properly. I've created a draft at http://wiki.python.org/moin/CodeDiscoveryUseCases - feel free to improve it.
If we were to start completely from scratch would this problem be an easy one?
With a list of user stories - yes.

--
anatoly t.

Gah, wrong list. Please don't reply here - that message will be showing up on import-sig shortly.

On Mon, Mar 12, 2012 at 5:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hi Nick, The write-up was a little unclear on a main point and I think that's contributed to some confusion here. The path search will continue to work in exactly the same way as it does now, with one difference. Instead of the current ImportError when nothing matches, the mechanism for namespace packages would be used. The mechanism would create a namespace package with a __path__ matching the paths corresponding to all namespace package "portions". The likely implementation will simply track the namespace package __path__ during the initial (normal) path search and use it only when there are no matching modules nor regular packages. Packages without __init__.py would only be allowed for namespace packages. So effectively namespace packages would be problematic for PEP 395, but not normal packages. Ultimately this is a form of PEP 402 without so much complexity. The trade-off is it requires a new kind of package. As far as I understand them, most of your concerns are based on the idea that namespace packages would be included in the initial traversal of sys.path, which is not the case. It sounds like there are a couple points you made that may still need attention, but hopefully this at least helps clarify what we talked about. -eric

On Tue, Mar 13, 2012 at 10:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
It has been pointed out that the above is based on a misreading of MvL's email. So, consider the following backwards compatibility concern instead: Many projects use the following snippet to find a json module: try: import json except ImportError: import simplejson as json Now, this particular snippet should still work fine with implicit package directories (even if a non-Python json directory exists on sys.path), since there *will* be a real json module in the standard library to find and the simplejson fallback won't be needed. However, for the general case: try: import foo except ImportError: import foobar as foo Then implicit package directories pose a backwards compatibility problem (specifically, if "foo" does not exist as a module or explicit package on sys.path, but there is a non-Python "foo/" directory, then "foo" will be silently be created as an empty package rather than falling back to "foobar"). Sure, the likelihood of that actually affecting anyone is fairly remote (although all it really takes is one broken uninstaller leaving a "foo" dir in site-packages), but we've rejected proposals in the past over smaller concerns than this. *Now*, my original comment about the consensus view rejecting complexity from PEP 402 by disregarding backwards compatibility concerns becomes accurate. PEP 402 addressed this issue specifically by disallowing direct imports of implicit packages (only finding them later when searching for submodules). This is in fact the motivating case given for that behaviour in the PEP: http://www.python.org/dev/peps/pep-0402/#backwards-compatibility-and-perform... So, *why* are we adopting implicit packages again, given all the challenges they pose? What, exactly, is the problem with a ".pyp" extension that makes all this additional complexity the preferred choice? So far, I've only heard two *positive* statements in favour of implicit package directories: 1. Java/Perl/etc do it that way. 
I've already made it clear that I don't care about that argument. If it was all that compelling, we'd have implicit self by now. (However, clearly Guido favours it in this case, given his message that arrived while I was writing this one)

2. It (arguably) makes it easier to convert an existing package into a namespace package

With implicit package directories, you just delete your empty __init__.py file to turn an existing package into a namespace package. With a PEP 382 style directory suffix, you have to change your directory name to append the ".pyp" (and, optionally, delete your __init__.py file, since it's now going to be ignored anyway).

Barry's also tried to convince me that ".pyp" directories are somehow harder for distributions to deal with, but his only example looked like trying to use "yield from" in Python 3.2 and then complaining when it didn't work.

However, so long as the backwards compatibility from PEP 402 is incorporated, and the new PEP proposes a specific addition to the tutorial documenting the "never cd into a package, never double-click a file in a package to execute it, always use -m to execute modules from inside packages" guideline (and makes it clear that you may get strange and unpredictable behaviour if you ever break it), then I can learn to live with it. IDLE should also be updated to allow correct execution of submodules via F5 (I guess it will need some mechanism to be told what working directories to add to sys.path).

It still seems to me that moving to a marker *suffix* (rather than a marker file), as PEP 382 proposes, brings all the real practical benefits of implicit package directories (i.e. no empty __init__.py files wasting space) and absolutely *none* of the pain (i.e. no backwards compatibility concerns, no ambiguity in the filesystem-to-module-hierarchy mapping, and we'd still be able to fix direct execution of modules inside packages rather than having to explain forevermore why it doesn't work), but Guido clearly feels otherwise.
Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Mar 13, 2012 at 4:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think this paragraph really gets to the heart of what I'm objecting to. I agree wholeheartedly with the objective of eliminating __init__.py files; there's no need to convince me of that. However, *two* proposals were made to that end:

PEP 382 kept the explicit marker, simply changing it to a directory suffix rather than a separate file. Simple, clean, straightforward, minimalist, effective.

PEP 402 threw away the marker entirely, and then had to patch the package finding algorithm with a whole series of complications to avoid breaking backwards compatibility with Python 3.2. It also has the side effect of eliminating the 1:1 mapping between the filesystem and the module hierarchy. Once we lose that, there's no going back.

What I really want out of the new PEP is a clear rationale for why the horrible package finding algorithm hacks needed to make the PEP 402 approach work in a backwards compatible way are to be preferred to the explicitly marked PEP 382 approach, which *doesn't pose a backwards compatibility problem in the first place*.

The other thing to keep in mind is that if, for whatever reason, we decided further down the road that the explicit directory suffix solution wasn't good enough, then *we could change our minds* and allow implicit package directories after all (just as the formats for valid C extension module names have changed over time). There's no such freedom with implicit package directories - once they're in, they're in, and we can never introduce a requirement for an explicit marker again without breaking working packages. Is it so bad that I want us to take baby steps here, rather than jumping straight to the implicit solution?

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Mar 12, 2012 at 11:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I think it comes down to this: I really, really, really hate directories with a suffix. I'd like to point out that the suffix is also introducing a backwards incompatibility: everybody will have to teach their tools, IDEs, and brains about .pyp directories, and they will also have to *rename* their directories (*if* they want to benefit from the new feature).

Renaming directories is a huge pain -- I counted over 400 directories in Django, so that would mean over 400 renames. In my experience renaming a directory is a huge pain no matter which version control system you use -- yes, it can be done, and modern VCSes have some support for renaming, but it's still a huge mess. Importing patches will be painful. Producing diffs across the renames will be hugely painful. I just think there are too many tools that won't know how to deal with this.

(I just did a little experiment: I cloned a small project using Hg and renamed one directory. Then I made a small change to one of the files whose parent was renamed. I have not figured out how to get a diff between the latest version of that file and any version before the mass renaming; the renaming is shown as a delete of the entire old file and an add of the entire new file. Even if you can tell me how to do this, my point stays: it's not easy to figure out. Similarly for logs: by default, "hg log" stops at the rename. You must add --follow to see logs across the rename.)

And regardless of which PEP we adopt, there will still be two types of package directories: PEP 382 still maintains backwards compatibility with directories that don't have a suffix but do have an __init__.py. So the unification still remains elusive. And at the end of the day I still really, really, really hate directories with a suffix. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum <guido@python.org> writes:
And at the end of the day I still really, really, really hate directories with a suffix.
+1 -- \ “Good morning, Pooh Bear”, said Eeyore gloomily. “If it is a | `\ good morning”, he said. “Which I doubt”, said he. —A. A. Milne, | _o__) _Winnie-the-Pooh_ | Ben Finney

On 13 Mar, 2012, at 9:15, Guido van Rossum wrote:
Directories with a suffix have the advantage that you could teach GUIs to treat them differently; file managers could, for example, show a ".pyp" directory as a folder with a Python logo, just like ".py" files are shown as documents with a Python logo. With the implicit approach it is much harder to recognize Python packages as such without detailed knowledge of the import algorithm and the Python search path. Ronald

On 3/20/2012 11:49 AM, Ronald Oussoren wrote:
Package directories are files too, and can be imported to make modules. I think it would have been nice to use .pyp from the beginning. It would make Python easier to learn. Also, 'import x' would then simply mean "search sys.path directories for a file named 'x.py*'", with no need for either the importer or a human reader to look within directories for the magic __init__.py file. Sorting a directory listing by extension would sort all packages together. -- Terry Jan Reedy

On Mon, Mar 26, 2012 at 1:45 AM, Ronald Oussoren <ronaldoussoren@mac.com> wrote:
Yes. On what platform are you? On unixy platforms filename extensions are just a naming convention that can just as easily be used with directories.
IIUC that's how almost all filesystems treat them. However, desktop software often assigns specific meanings to them -- the user can configure these, but there's a large set of predefined bindings too, and many key applications also play this game (since there is, frankly, not much else to go by -- some important file types are not easily guessable by reading their content, either because it's some esoteric binary format, or because it's something too universal, like XML). I know that's how it works on Windows and Mac, but I believe the Linux desktop things (the things I kill off or at least ignore as soon as I log in :-) have the same idea. -- --Guido van Rossum (python.org/~guido)

On Tue, Mar 20, 2012 at 11:49, Ronald Oussoren <ronaldoussoren@mac.com>wrote:
OS X has made me dislike that possibility. Some git tools cause directories ending in .git to be treated as an opaque object in the file system, forcing me to drop into a shell, or right-click and choose to inspect the directory, in order to see its contents. -Brett

On 21 Mar, 2012, at 15:22, Brett Cannon wrote:
That's probably because those tools define ".git" directories as a package in their metadata and the finder won't show package contents by default (you can use the context menu of the finder to inspect the contents of packages, but that won't work in the file open/save panels). I'd have to experiment to be sure, but IIRC it is possible to assign icons to a suffix without making directories into packages. Ronald

On Mar 13, 2012, at 09:15 AM, Guido van Rossum wrote:
And at the end of the day I still really, really, really hate directories with a suffix.
I completely agree, for all the reasons you stated. Especially because it would be extremely difficult to handle migrations from a pre-namespace-packages world to a post-namespace-packages world with directory suffixes.

For example, let's say Debian/Ubuntu supports Python 3.2 and 3.3. We can continue to craft __init__.py files with the old-style namespace package code at installation time and pretty much do what we're currently doing. It's painful, but the technology is there, so it doesn't change much for us. But when we can drop support for < 3.3 (or we back port namespace package support to 3.2), then we can simply drop the code that creates these __init__.py files at installation time and we'll magically <wink> gain support for new-style namespace packages.

With directory suffixes, I don't see how this is possible. I shudder to think what OS vendors will have to do to rename all the directories of *installed* packages, let alone have to rebuild all Python 3 packages to support the renamed directories, when they make the switch to a new-style world. catching-up-ly y'rs, -Barry

On 13 March 2012 06:07, Nick Coghlan <ncoghlan@gmail.com> wrote:
Whoa! I'm not sure I can. I just recently got bitten badly by this for real. The following was what I was doing:

1. I'm writing a package.
2. I'm trying to do the tests-as-I-develop approach (yes, I know I should have been doing this for years - so sue me :-))
3. I have my tests as a subpackage of the main package.
4. I use the command line.
5. I cd to the tests directory as that's the easiest way to edit tests: gvim test_dow<TAB> to edit test_download_binaries.py.

And yes, I had endless trouble trying to work out why I couldn't then run the tests from the command line. I consider the trouble I had to be a bug - it *should* work, in my view. I understand why what I'm doing is an edge case, but intuitively, I don't like it not working. I can change my practices, or use an IDE, or something. But my workflow will be less comfortable for me, and I don't feel that I understand why I should have to.

I *think* that what Nick is proposing is a fix for this (and if it isn't, can you fix this scenario too, please Nick? :-)), and the idea that it's going to get documented as "don't do that" strikes me as unfriendly, if not outright wrong. I also don't think it should be documented in the tutorial - it's not something a new developer hits; it's later on, when people are writing bigger, more complex code, that they hit it. In summary, I guess I support Nick's objections 3 and 4. We need a better response than just "don't do that", IMHO. Paul.
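The failure Paul describes is easy to reproduce. The following is a hedged sketch (package and module names are illustrative): a test module that uses a relative import runs fine via "-m" from the directory containing the package, but fails when executed directly from inside the tests directory.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Build a throwaway project/tests layout like the one in the email.
root = tempfile.mkdtemp()
tests = os.path.join(root, "project", "tests")
os.makedirs(tests)
open(os.path.join(root, "project", "__init__.py"), "w").close()
open(os.path.join(tests, "__init__.py"), "w").close()
with open(os.path.join(root, "project", "code.py"), "w") as f:
    f.write("ANSWER = 42\n")
with open(os.path.join(tests, "test_code.py"), "w") as f:
    f.write(textwrap.dedent("""\
        from ..code import ANSWER
        print("ok", ANSWER)
    """))

# From the directory *containing* the package, "-m" works:
good = subprocess.run([sys.executable, "-m", "project.tests.test_code"],
                      cwd=root, capture_output=True, text=True)
# From inside the package, direct execution breaks the relative import:
bad = subprocess.run([sys.executable, "test_code.py"],
                     cwd=tests, capture_output=True, text=True)
print(good.returncode, bad.returncode)
```

Running the file directly makes the tests directory itself sys.path[0], so the interpreter never learns that test_code is part of the "project" package, and the relative import fails with "no known parent package".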

On Tue, Mar 13, 2012 at 8:35 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Oh, but there are several other solutions. For example, if you set PYTHONPATH to the directory *containing* your toplevel package, new code could be added that will discover the true toplevel no matter where you are. This code doesn't exist today, but Nick is proposing something similar looking for __init__.py files; the code that tries to find the script directory as a subdirectory of some path on sys.path could be added there. Also, the code Nick currently proposes for PEP 395 is still useful if you add __init__.py files to your package -- if you did that you wouldn't have to set PYTHONPATH (assuming we do add this code to Python 3.3, of course). -- --Guido van Rossum (python.org/~guido)

On 13 March 2012 15:57, Guido van Rossum <guido@python.org> wrote:
I tend not to use PYTHONPATH - I'm on Windows, and environment variables aren't the obvious solution there so I tend to forget. Also, I tend to initially develop projects in subdirectories of my "junk" directory, which has all sorts of cruft in it, including .py files with random names. So setting PYTHONPATH to that could introduce all sorts to my namespace, which is a bit less than ideal. OTOH, I don't have a problem with __init__.py files, so something that correctly autodetects the right thing to add to sys.path based on the presence of __init__ files would be fine. All of which assumes that me simply being more organised isn't the real answer here :-) Paul.

On Mar 14, 2012 5:24 AM, "Paul Moore" <p.f.moore@gmail.com>
I set up my projects the same way you do - it's a good, self-contained structure. And beginners (at least the ones that used Stack Overflow when I was spending time there) seemed to like it as well. That's the reason PEP 395 uses it as its main example. Over on import-sig, Eric Snow suggested a revised implicit package tolerant search algorithm that's too slow to use on interpreter start up, but should be fine for generating better error messages if __main__ or an interactive import fails with ImportError, so I'll likely revise 395 to propose that instead. Cheers, Nick. -- Sent from my phone, thus the relative brevity :)

On Mon, Mar 12, 2012 at 11:07 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Honestly, I don't really care about "compatibility" with Java or Perl. However that *both* of those languages do it this way (BTW what does Ruby do?) is an argument that this is a *natural* or *intuitive* way of setting things up. In fact, Python today also uses this: a package P lives in a directory named P. Plain and simple. Users can immediately understand this. Collapsing multiple directories named P along sys.path is also pretty natural, given that we already have the behavior of searching along sys.path. The requirement of having an __init__.py file however is a wart.
I hope I've added some indication that it's also harder to deal with in version control systems.
Those are all sensible requests.
I expect pain in different places. -- --Guido van Rossum (python.org/~guido)

Oh, shit. Nick posted a bunch of messages to python-ideas instead of import-sig, and I followed up there. Instead of reposting, I'm just going to suggest that people interested in this discussion will, unfortunately, have to follow both lists. -- --Guido van Rossum (python.org/~guido)

On Wed, Mar 14, 2012 at 2:24 AM, Guido van Rossum <guido@python.org> wrote:
I hope I've added some indication that it's also harder to deal with in version control systems.
Yeah, given that part of my argument when I updated PEP 414 was "the lack of explicit unicode literals creates useless noise in version control diffs", I can hardly fault you for using a similar argument against changing package directory names! Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan <ncoghlan@gmail.com> writes:
Are you convinced by the argument that a directory representing a package should be named exactly the same as the package? That's the most convincing reason I can see (though many other reasons are strong too) for not introducing special cases for the name of the package's directory.
Hopefully Eric can capture this clearly in the new PEP so future readers will have a clear understanding of the trade-offs involved.
Agreed. Thanks for encouraging discussion and recording it, Eric. -- \ “Pinky, are you pondering what I'm pondering?” “I think so, | `\ Brain, but Zero Mostel times anything will still give you Zero | _o__) Mostel.” —_Pinky and The Brain_ | Ben Finney

I've always had trouble understanding and explaining the complexities and intricacies of Python packaging. Is there a most basic but comprehensive list of use cases? IIUC they are:

* E.g. the standard library - import from a list of paths to be searched.
* E.g. this project - import from a relative path based on this file's current directory (which Python has an odd syntax for).
* E.g. distributed packages and virtualenv - import from a relative path based on an anchor directory.

If we were to start completely from scratch, would this problem be an easy one? Yuval Greenfield
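Those three use cases can be roughly illustrated as follows (paths and comments are illustrative, not a proposal):

```python
import sys

# Use case 1 -- the standard library: "import json" searches the
# directories listed on sys.path, in order, and binds the first match.
import json

# Use case 2 -- this project: the explicit relative form (the "odd
# syntax"), which is only legal inside a package:
#     from . import sibling
#     from ..subpackage import helper

# Use case 3 -- distributed packages / virtualenvs: imports are anchored
# by placing the environment's root ahead of everything else on the
# search path (the path below is hypothetical).
sys.path.insert(0, "/path/to/env/site-packages")

print(json.__name__)
```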

On Wed, Mar 14, 2012 at 7:34 PM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
I've always had trouble understanding and explaining the complexities and intricacies of python packaging.
+1
I am a big proponent of the user-story/use-case-first approach, but somebody needs to show everyone how to do this properly. I've created a draft at http://wiki.python.org/moin/CodeDiscoveryUseCases - feel free to improve it.
If we were to start completely from scratch would this problem be an easy one?
With a list of user stories - yes. -- anatoly t.
participants (13)
- anatoly techtonik
- Barry Warsaw
- Ben Finney
- Brett Cannon
- Chris Rebert
- Eric Snow
- Guido van Rossum
- Nick Coghlan
- Paul Moore
- Ronald Oussoren
- Sven Marnach
- Terry Reedy
- Yuval Greenfield