[Import-SIG] My objections to implicit package directories

Thu Mar 15 22:44:59 CET 2012

On Thu, Mar 15, 2012 at 1:17 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Thu, Mar 15, 2012 at 1:50 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> How would you implement that anyway?
>
>
> From the PEP:
>
> """If the parent package does not exist, or exists but lacks
> a __path__ attribute, an attempt is first made to create a "virtual path"
> for the parent package (following the algorithm described in the section
> on virtual paths, below)."""
>
> This is actually a pretty straightforward change to the import process; I
> drafted a patch for importlib at one point, and somebody else created
> another.
>
> (The main difference from the new proposal is that you do have to go back
> over the path list a second time in the event the parent package isn't
> found; but there's no reason why the protocols in the PEP wouldn't allow you
> to build and cache a virtual path while doing the first search, if you're
> worried about the performance.)
>
>>
>> The import logic always tries to
>> import the parent module before importing the child module. So the
>> import attempt for "foo" has no idea whether it is imported as *part*
>> of "import foo.bar", or as plain "import foo", or perhaps as part of
>> "from foo import bar".
>
>
> Actually, this isn't entirely true.   __import__ is called with 'foo.bar'
> when you import foo.bar.  In importlib, it recursively invokes __import__
> with parent portions, and in import.c, it loops left to right for the
> parents.  Either way, it knows the difference throughout the process, and
> it's fairly straightforward to backtrack and create the parent modules when
> the submodule import succeeds.
>
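To make the parent-before-child ordering concrete, here is a minimal
sketch of how it can be observed in current CPython (3.3 and later).
The package name demo_pkg_a and its child module are hypothetical,
made up for this illustration, and the temporary directory simply
stands in for an entry on sys.path. It is not the patch discussed
above, only a way to watch the existing behaviour.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout, created in a throwaway directory:
#   <tmp>/demo_pkg_a/__init__.py
#   <tmp>/demo_pkg_a/child.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg_a")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "child.py"), "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# The import machinery is handed the full dotted name...
child = importlib.import_module("demo_pkg_a.child")

# ...and imports the parent first, so both modules end up in sys.modules
# and the child is bound as an attribute on the parent.
parent_loaded = "demo_pkg_a" in sys.modules
child_loaded = "demo_pkg_a.child" in sys.modules
child_bound_on_parent = sys.modules["demo_pkg_a"].child is child
```

Because the full name is available from the start, the machinery can
tell "import foo.bar" apart from a bare "import foo" at any point.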
>
>> It would also be odd to find that
>>
>>  import foo
>>  import foo.bar
>>
>> would fail, whereas
>>
>>  import foo.bar
>>  import foo
>>
>> would succeed, because as a side effect of "import foo.bar", a module
>> object for foo is created and inserted as sys.modules['foo'].
>
>
> Assuming we know that the foo subdirectories actually exist, the ImportError
> would simply say, "Can't import namespace package 'foo' before one of its
> modules or subpackages is imported".
>
> Granted, that does seem a bit crufty.  I erred in this direction in order to
> avoid pitchforks coming from the backward-compatibility direction, on
> account of the ease with which something can get messed up at a distance
> without this condition, and in a way that may be hard to identify, if a
> piece of code is using package presence to control optional features.
>
> IOW, it's not like either proposal results in a perfectly clean result for
> everybody.  It's a choice of which group to upset, where one group is
> developers fiddling with their import order (and getting an error message
> that says how to fix it), and the other group is people whose code suddenly
> crashes or behaves differently because somebody created a directory
> somewhere they shouldn't have (and which they might not be able to delete or
> remove from sys.path for one reason or another), and which was there and
> worked okay before until they installed a new version of the application
> that's built on a new version of Python.
>
> That is, the backward compatibility problem can break an app in the field
> that worked perfectly in the developer's testing, and only fails when it
> gets to the end user who has no way of even knowing it could be a problem.
>
> It's up to you to decide which of those groups' pitchforks to face; I just want
> to be clear about why the tradeoff was proposed the way it was.  It's not
> that the backward compatibility problem harms a lot of people, so much as
> that when it harms them, it can harm them a lot (e.g. crashing), and at
> *runtime*, compared to tweaking your import sequence during *development*
> and getting a clear and immediate "don't do that."
>
> Why crashing?  Because "try: import json" will succeed, and then the app
> does json.foobar() and boom, an unexpected AttributeError.  Far fetched?
> Perhaps, but the worst runtime import ordering problem I can think of is if
> you have a bad import that's working due to a global import ordering that's
> determined at runtime because of plugin loading.  But if you have that
> problem, you correct the bad import in the plugin and it can never happen
> again.
>
> Granted, directory naming conflicts can *also* be fixed by changing your
> imports; you can (and should) "try: from json import foobar" instead.  But
> there isn't any way for us to give the user or developer an error message
> that *tells* them that, or even clues them in as to why the json module on
> that user's machine seems to be borked whenever they run the app from a
> certain directory...
>
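The "try: import json" hazard described above can be sketched as
follows, assuming a Python where a directory without __init__.py is
importable as a package (3.3 and later). The name stray_dir_demo is a
hypothetical stand-in for a stray "json" directory, and foobar is the
placeholder attribute from the text; neither refers to a real module.

```python
import importlib
import os
import sys
import tempfile

# A stray, empty directory with no __init__.py (hypothetical name).
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "stray_dir_demo"))
sys.path.insert(0, root)
importlib.invalidate_caches()

# Pattern 1: the presence test passes, because the bare directory
# is importable as a package.
try:
    import stray_dir_demo
    feature_available = True
except ImportError:
    feature_available = False

# The failure is then deferred until the attribute is first used.
try:
    stray_dir_demo.foobar()
    late_failure = None
except AttributeError as exc:
    late_failure = exc

# Pattern 2: importing the specific name fails at import time, where
# an "except ImportError" can still handle it.
try:
    from stray_dir_demo import foobar
    from_import_ok = True
except ImportError:
    from_import_ok = False
```

With the second pattern, the optional-feature check and the failure
happen in the same place, which is why it is the safer idiom.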
>
>>
>> Finally, in your example, why on earth would unittest/mock/ exist as
>> an empty directory???
>
>
> It's definitely true that the impact is limited in scope; the things most
> likely to be affected are generically-named top-level packages like json,
> email, text, xml, html, etc., that could collide with other directories
> lying around, AND it's a package name you try/import to test for the
> presence of.
>
> As I said though, it's just that when it happens, it can happen to an *end
> user*, whereas import order crankiness can essentially only happen during
> actual coding.  Also, nobody's come up with examples of breakage caused by
> trying to import the namespace, on account of there aren't many use cases
> for importing an empty namespace, vs use cases for having a 'json' directory
> or some such.  ;-)
>
> All this being said, if you're happy with the tradeoff, I'm happy with the
> tradeoff.  I'm not the one they're gonna come after with the pitchforks.
> ;-)

Yeah, I'm still happy with the tradeoff, even though it's a case of
picking your poison. In this case I much prefer having simpler rules
going forward than bending over *too* far for backward compatibility
-- even if we still have two types of packages, those with an
__init__.py and those without.

Also, it's not like there aren't already 50 ways to break things with
odd paths or modules -- I don't know if it's more likely that a user
would create an unrelated directory named json or an unrelated module
named json.py. At least we're now also *removing* a way to break
things: forgetting an empty __init__.py. Gentlemen, sharpen your
pitchforks! :)

-- 
--Guido van Rossum (python.org/~guido)