[Import-SIG] My objections to implicit package directories

PJ Eby pje at telecommunity.com
Thu Mar 15 21:17:26 CET 2012


On Thu, Mar 15, 2012 at 1:50 PM, Guido van Rossum <guido at python.org> wrote:

> How would you implement that anyway?
>

From the PEP:

"""If the parent package does not exist, or exists but lacks a
__path__ attribute,
an attempt is first made to create a "virtual path" for the parent package
(following the algorithm described in the section on virtual
paths<http://www.python.org/dev/peps/pep-0402/#virtual-paths>,
below)."""

This is actually a pretty straightforward change to the import process; I
drafted a patch for importlib at one point, and somebody else created
another.

(The main difference from the new proposal is that you do have to go back
over the path list a second time in the event the parent package isn't
found; but there's no reason why the protocols in the PEP wouldn't allow
you to build and cache a virtual path while doing the first search, if
you're worried about the performance.)
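For illustration, here's a stripped-down sketch of that first step: collecting matching subdirectories from a path to serve as the virtual __path__. The helper name is made up, and the real protocol in the PEP delegates to per-path-entry importers rather than touching the filesystem directly, but the shape of the operation is the same:

```python
import os
import sys

def build_virtual_path(name, search_path=None):
    """Sketch of the virtual-path step: collect every directory on the
    search path whose name matches the package being imported.  (This is
    a hypothetical helper; PEP 402's actual protocol asks each path
    entry's importer for portions instead of stat()ing directly.)"""
    entries = search_path if search_path is not None else sys.path
    portions = []
    for entry in entries:
        candidate = os.path.join(entry, name)
        if os.path.isdir(candidate):
            portions.append(candidate)
    # A non-empty result would become the virtual package's __path__;
    # an empty one means the parent can't be created, so import fails.
    return portions
```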


> The import logic always tries to
> import the parent module before importing the child module. So the
> import attempt for "foo" has no idea whether it is imported as *part*
> of "import foo.bar", or as plain "import foo", or perhaps as part of
> "from foo import bar".
>

Actually, this isn't entirely true.  __import__ is called with 'foo.bar'
when you import foo.bar.  In importlib, it recursively invokes __import__
with the parent portions, and in import.c, it loops left to right over the
parents.  Either way, the full dotted name is known throughout the process,
and it's fairly straightforward to backtrack and create the parent modules
once the submodule import succeeds.
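A rough illustration of that left-to-right walk, using importlib.import_module rather than the actual import.c internals (this is a simplification for exposition, not the real machinery):

```python
import importlib

def import_dotted(name):
    """Sketch of the left-to-right walk import.c does for 'foo.bar':
    the full dotted name is known up front, so each parent is imported
    (or, under the PEP, could be created as a virtual package) before
    the final submodule."""
    parts = name.split('.')
    module = None
    for i in range(1, len(parts) + 1):
        prefix = '.'.join(parts[:i])
        module = importlib.import_module(prefix)
    return module  # the innermost submodule, e.g. foo.bar
```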


> It would also be odd to find that
>
>  import foo
>  import foo.bar
>
> would fail, whereas
>
>  import foo.bar
>  import foo
>
> would succeed, because as a side effect of "import foo.bar", a module
> object for foo is created and inserted as sys.modules['foo'].
>

Assuming we know that the foo subdirectories actually exist, the
ImportError would simply say, "Can't import namespace package 'foo' before
one of its modules or subpackages is imported".
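The asymmetry Guido describes is easy to observe with ordinary packages today: importing a submodule binds its parent in sys.modules as a side effect, so once "import foo.bar" has run, a bare "import foo" can no longer fail.  A quick check with a stdlib package (nothing PEP-specific here):

```python
import sys

# Only the submodule is named in the import statement...
import xml.sax

# ...but the parent package was created and cached as a side effect,
# which is why "import xml" cannot fail afterwards.
assert 'xml.sax' in sys.modules
assert 'xml' in sys.modules
```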

Granted, that does seem a bit crufty.  I erred in this direction to avoid
pitchforks coming from the backward-compatibility direction: without this
restriction, something can get messed up at a distance, in a way that may
be hard to identify, if a piece of code is using package presence to
control optional features.

IOW, neither proposal produces a perfectly clean result for everybody.
It's a choice of which group to upset.  One group is developers fiddling
with their import order (who get an error message that says how to fix
it).  The other group is people whose code suddenly crashes or behaves
differently because somebody created a directory somewhere they shouldn't
have (and which they might not be able to delete or remove from sys.path
for one reason or another), and which was there and worked okay until they
installed a new version of the application built on a new version of
Python.

That is, the backward compatibility problem can break an app in the field
that worked perfectly in the developer's testing, and only fails when it
gets to the end user who has no way of even knowing it could be a problem.

It's up to you to decide which of those groups' pitchforks to face; I just
want to be clear about why the tradeoff was proposed the way it was.  It's
not that the backward compatibility problem harms a lot of people, so much
as that when it harms them, it can harm them a lot (e.g. crashing), and at
*runtime*, compared to tweaking your import sequence during *development*
and getting a clear and immediate "don't do that."

Why crashing?  Because "try: import json" will succeed, and then the app
does json.foobar() and boom, an unexpected AttributeError.  Far-fetched?
Perhaps, but the worst runtime import-ordering problem I can think of is a
bad import that works only due to a global import ordering determined at
runtime by plugin loading.  But if you have that problem, you correct the
bad import in the plugin and it can never happen again.
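This failure mode is reproducible today under Python 3.3+'s namespace packages, which behave like the proposal on this point.  A sketch, using a made-up package name (optionaljson) so it can't collide with anything real:

```python
import os
import sys
import tempfile

# A stray directory whose name matches a package the app probes for.
scratch = tempfile.mkdtemp()
os.mkdir(os.path.join(scratch, "optionaljson"))  # hypothetical name
sys.path.insert(0, scratch)

try:
    import optionaljson   # succeeds: it becomes an (empty) namespace package
    have_feature = True
except ImportError:
    have_feature = False

print(have_feature)       # True -- the presence probe is fooled
# ...and then, far from the import, the app would crash:
# optionaljson.loads(...)  ->  AttributeError

# The fix suggested above: probe for something *inside* the package.
try:
    from optionaljson import loads
except ImportError:
    have_feature = False

print(have_feature)       # False -- the empty directory no longer fools us
```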

Granted, directory naming conflicts can *also* be fixed by changing your
imports; you can (and should) "try: from json import foobar" instead.  But
there isn't any way for us to give the user or developer an error message
that *tells* them that, or even clues them in as to why the json module on
that user's machine seems to be borked whenever they run the app from a
certain directory...



> Finally, in your example, why on earth would unittest/mock/ exist as
> an empty directory???
>

It's definitely true that the impact is limited in scope; the things most
likely to be affected are generically-named top-level packages like json,
email, text, xml, html, etc., that could collide with other directories
lying around, AND only when it's a package name you try importing to test
for its presence.

As I said, though, it's just that when it happens, it can happen to an
*end user*, whereas import-order crankiness can essentially only happen
during actual coding.  Also, nobody's come up with examples of breakage
caused by trying to import the namespace, because there aren't many use
cases for importing an empty namespace, vs. use cases for having a 'json'
directory or some such.  ;-)

All this being said, if you're happy with the tradeoff, I'm happy with the
tradeoff.  I'm not the one they're gonna come after with the pitchforks.
;-)