[Import-SIG] My objections to implicit package directories
Guido van Rossum
guido at python.org
Thu Mar 15 22:44:59 CET 2012
On Thu, Mar 15, 2012 at 1:17 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Thu, Mar 15, 2012 at 1:50 PM, Guido van Rossum <guido at python.org> wrote:
>> How would you implement that anyway?
> From the PEP:
> """If the parent package does not exist, or exists but lacks
> a __path__ attribute, an attempt is first made to create a "virtual path"
> for the parent package (following the algorithm described in the section
> on virtual paths, below)."""
> This is actually a pretty straightforward change to the import process; I
> drafted a patch for importlib at one point, and somebody else created
> (The main difference from the new proposal is that you do have to go back
> over the path list a second time in the event the parent package isn't
> found; but there's no reason why the protocols in the PEP wouldn't allow you
> to build and cache a virtual path while doing the first search, if you're
> worried about the performance.)
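PJ Eby's description of the PEP 402 virtual-path step can be sketched roughly as below. This is a simplified illustration, not code from PEP 402 or importlib. `build_virtual_path` is a hypothetical helper that only looks at plain filesystem directories (zip files and custom importers are ignored). It collects every matching directory in a single pass, which is the kind of result the parenthetical above suggests could be built and cached during the first search.

```python
import os
import tempfile
from typing import List, Sequence


def build_virtual_path(package_name: str, search_path: Sequence[str]) -> List[str]:
    """Return every directory on search_path named after package_name.

    Hypothetical simplification of a PEP 402-style "virtual path": for a
    top-level name, search_path would be sys.path; for "foo.bar" it would be
    the parent's __path__. Only plain filesystem directories are considered.
    """
    last_part = package_name.rpartition(".")[2]  # "foo.bar" -> "bar"
    virtual_path = []
    for entry in search_path:
        candidate = os.path.join(entry, last_part)
        if os.path.isdir(candidate):
            virtual_path.append(candidate)
    return virtual_path


# Demonstration: two of three path entries contain a "foo" directory.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b, \
        tempfile.TemporaryDirectory() as c:
    os.mkdir(os.path.join(a, "foo"))
    os.mkdir(os.path.join(c, "foo"))
    demo_path = build_virtual_path("foo", [a, b, c])
    assert demo_path == [os.path.join(a, "foo"), os.path.join(c, "foo")]
```

The order of the result follows the order of the search path, so the first matching directory stays first, as it would on `sys.path`.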
>> The import logic always tries to
>> import the parent module before importing the child module. So the
>> import attempt for "foo" has no idea whether it is imported as *part*
>> of "import foo.bar", or as plain "import foo", or perhaps as part of
>> "from foo import bar".
> Actually, this isn't entirely true. __import__ is called with 'foo.bar'
> when you import foo.bar. In importlib, it recursively invokes __import__
> with parent portions, and in import.c, it loops left to right for the
> parents. Either way, it knows the difference throughout the process, and
> it's fairly straightforward to backtrack and create the parent modules when
> the submodule import succeeds.
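PJ's point that the import machinery sees the whole dotted name can be illustrated with a small sketch of a left-to-right loop like the one he attributes to import.c. The helpers `parent_chain` and `import_steps` are hypothetical and only compute names; they do not import anything.

```python
from typing import List, Tuple


def parent_chain(fullname: str) -> List[str]:
    """Names a left-to-right import of fullname visits, outermost first."""
    parts = fullname.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def import_steps(fullname: str) -> List[Tuple[str, bool]]:
    """Pair each visited name with whether it is the requested target.

    Because the loop is driven by the full dotted name, every step knows
    whether it is importing an intermediate parent or the module actually
    asked for, so a parent that only has a virtual path can be deferred.
    """
    return [(name, name == fullname) for name in parent_chain(fullname)]


print(import_steps("foo.bar.baz"))
# [('foo', False), ('foo.bar', False), ('foo.bar.baz', True)]
```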
>> It would also be odd to find that
>>
>>     import foo
>>     import foo.bar
>>
>> would fail, whereas
>>
>>     import foo.bar
>>     import foo
>>
>> would succeed, because as a side effect of "import foo.bar", a module
>> object for foo is created and inserted as sys.modules['foo'].
> Assuming we know that the foo subdirectories actually exist, the ImportError
> would simply say, "Can't import namespace package 'foo' before one of its
> modules or subpackages is imported".
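The order dependence Guido describes, together with the error message PJ proposes, can be modeled with a toy importer. Everything here is hypothetical: `ToyImporter` and its `import_` method are illustrative names, and a dict stands in for `sys.modules`. It mimics the PEP 402 rule under discussion, where a directory without `__init__.py` only becomes a package as a side effect of importing something inside it, and it does not touch the real import system.

```python
from typing import Dict, List, Set


class ToyImporter:
    """Hypothetical model of the PEP 402 import-order rule.

    `modules` are names backed by real code (a .py file or a package with
    __init__.py); `virtual_dirs` are directories with no __init__.py.
    """

    def __init__(self, modules: Set[str], virtual_dirs: Set[str]) -> None:
        self.modules = modules
        self.virtual_dirs = virtual_dirs
        self.sys_modules: Dict[str, str] = {}  # stand-in for sys.modules

    def import_(self, fullname: str) -> str:
        parts = fullname.split(".")
        pending: List[str] = []  # virtual parents, created only on success
        for i in range(1, len(parts) + 1):
            name = ".".join(parts[:i])
            if name in self.sys_modules:
                continue  # already imported, e.g. as an earlier side effect
            if name in self.modules:
                # Real code found: commit the deferred virtual parents first.
                for parent in pending:
                    self.sys_modules[parent] = "virtual package " + parent
                pending.clear()
                self.sys_modules[name] = "module " + name
            elif name in self.virtual_dirs and name != fullname:
                pending.append(name)  # intermediate parent: defer creation
            elif name in self.virtual_dirs:
                raise ImportError(
                    f"Can't import namespace package {name!r} before one of "
                    "its modules or subpackages is imported"
                )
            else:
                raise ImportError(f"No module named {name!r}")
        return self.sys_modules[fullname]


importer = ToyImporter(modules={"foo.bar"}, virtual_dirs={"foo"})
importer.import_("foo.bar")  # creates "foo" as a side effect
print(importer.import_("foo"))  # virtual package foo
```

Calling `import_("foo")` on a fresh `ToyImporter` raises the ImportError instead, which is exactly the asymmetry in the quoted example.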
> Granted, that does seem a bit crufty. I erred in this direction to avoid
> pitchforks from the backward-compatibility camp: without this condition,
> something can easily get messed up at a distance, in a way that may be hard
> to identify, whenever a piece of code uses package presence to control
> optional features.
> IOW, it's not like either proposal gives a perfectly clean result for
> everybody. It's a choice of which group to upset. One group is developers
> fiddling with their import order (and getting an error message that says
> how to fix it). The other group is people whose code suddenly crashes or
> behaves differently because somebody created a directory somewhere they
> shouldn't have (one they might not be able to delete or remove from
> sys.path for one reason or another), a directory that was there all along
> and caused no trouble until they installed a new version of the
> application built on a new version of Python.
> That is, the backward compatibility problem can break an app in the field
> that worked perfectly in the developer's testing, and only fails when it
> gets to the end user who has no way of even knowing it could be a problem.
> It's up to you to decide which of those groups' pitchforks to face; I just
> want to be clear about why the tradeoff was proposed the way it was. It's
> not that the backward compatibility problem harms a lot of people, so much
> as that when it harms them, it can harm them a lot (e.g. crashing), and at
> *runtime*, compared to tweaking your import sequence during *development*
> and getting a clear and immediate "don't do that."
> Why crashing? Because "try: import json" will succeed, and then the app
> does json.foobar() and boom, an unexpected AttributeError. Far-fetched?
> Perhaps, but compare the worst runtime import-ordering problem I can think
> of: a bad import that happens to work because of a global import order
> determined at runtime by plugin loading. If you have that problem, you
> correct the bad import in the plugin and it can never happen again.
> Granted, directory naming conflicts can *also* be fixed by changing your
> imports; you can (and should) "try: from json import foobar" instead. But
> there isn't any way for us to give the user or developer an error message
> that *tells* them that, or even clues them in as to why the json module on
> that user's machine seems to be borked whenever they run the app from a
> certain directory...
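PJ's `json.foobar()` scenario can be reproduced with the rules Python actually adopted (PEP 420, implemented in Python 3.3), with one caveat: under PEP 420 a regular module or package found anywhere on `sys.path` takes precedence over a bare directory, so an empty `json/` directory no longer hides the standard-library `json`. This sketch therefore uses a hypothetical, uninstalled package name (`pep_demo_optional_feature`) and a hypothetical attribute `foobar`. It shows that a bare `try: import` succeeds on an empty directory, while the `from ... import foobar` form PJ recommends still raises ImportError.

```python
import importlib
import os
import sys
import tempfile

NAME = "pep_demo_optional_feature"  # hypothetical, not an installed package

with tempfile.TemporaryDirectory() as root:
    # An empty directory (no __init__.py) that happens to sit on sys.path.
    os.mkdir(os.path.join(root, NAME))
    sys.path.insert(0, root)
    importlib.invalidate_caches()  # let the path finders see the new directory
    try:
        # Fragile presence test: succeeds because the empty directory is
        # imported as a namespace package.
        mod = None
        try:
            mod = importlib.import_module(NAME)
            feature_available = True
        except ImportError:
            feature_available = False
        has_foobar = mod is not None and hasattr(mod, "foobar")

        # Safer test: ask for the name you actually need.
        try:
            from pep_demo_optional_feature import foobar  # noqa: F401
            foobar_importable = True
        except ImportError:
            foobar_importable = False
    finally:
        sys.path.remove(root)
        sys.modules.pop(NAME, None)

print(feature_available, has_foobar, foobar_importable)  # True False False
```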
>> Finally, in your example, why on earth would unittest/mock/ exist as
>> an empty directory???
> It's definitely true that the impact is limited in scope; the things most
> likely to be affected are generically named top-level packages like json,
> email, text, xml, html, etc., and only when such a name collides with some
> other directory lying around AND is one you try/import to test for the
> presence of the package.
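For code that does use a package's presence to switch optional features on, a more defensive check is to ask the import system what a name would resolve to before importing it. This is a hedged sketch, not something proposed in the thread: `is_backed_by_code` is a hypothetical helper built on `importlib.util.find_spec` (available since Python 3.4), and it treats an implicit namespace package, i.e. a bare directory, as "not installed". The directory name in the demonstration is hypothetical too.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def is_backed_by_code(name: str) -> bool:
    """True if `name` resolves to a module or a regular package.

    Returns False when the name is not found at all, or when it would only
    be imported as an implicit namespace package (a bare directory).
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # ImportError: e.g. a parent package is missing; ValueError: the
        # name is in sys.modules with no __spec__.
        return False
    if spec is None:
        return False
    # Namespace packages have search locations but no origin file: origin is
    # None on Python 3.7+ (older 3.x releases used the string "namespace").
    is_namespace = (
        spec.submodule_search_locations is not None
        and spec.origin in (None, "namespace")
    )
    return not is_namespace


# Demonstration with a hypothetical empty directory on sys.path.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "hypothetical_empty_pkg"))
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        empty_dir_result = is_backed_by_code("hypothetical_empty_pkg")
    finally:
        sys.path.remove(root)

print(is_backed_by_code("json"), empty_dir_result)  # True False
```

Note that `find_spec` imports the parent package when given a dotted name, so checking a submodule is not entirely side-effect free.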
> As I said though, it's just that when it happens, it can happen to an *end
> user*, whereas import order crankiness can essentially only happen during
> actual coding. Also, nobody's come up with examples of breakage caused by
> trying to import the namespace, since there aren't many use cases for
> importing an empty namespace, compared with use cases for having a 'json'
> directory or some such. ;-)
> All this being said, if you're happy with the tradeoff, I'm happy with the
> tradeoff. I'm not the one they're gonna come after with the pitchforks.
Yeah, I'm still happy with the tradeoff, even though it's a case of
picking your poison. In this case I much prefer having simpler rules
going forward than bending over *too* far for backward compatibility
-- even if we still have two types of packages, those with an
__init__.py and those without.
Also, it's not like there aren't already 50 ways to break things with
odd paths or modules -- I don't know if it's more likely that a user
would create an unrelated directory named json or an unrelated module
named json.py. At least we're now also *removing* a way to break
things: forgetting an empty __init__.py. Gentlemen, sharpen your pitchforks.
--Guido van Rossum (python.org/~guido)