[Python-Dev] PEP 402: Simplified Package Layout and Partitioning

Glyph Lefkowitz glyph at twistedmatrix.com
Fri Aug 12 23:03:47 CEST 2011


On Aug 12, 2011, at 2:33 PM, P.J. Eby wrote:

> At 01:09 PM 8/12/2011 -0400, Glyph Lefkowitz wrote:
>> Upon further reflection, PEP 402 _will_ make dealing with namespace packages from this code considerably easier: we won't need to do AST analysis to look for a __path__ attribute, or anything gross like that, to improve correctness; we can just look in various directories on sys.path and accurately predict what __path__ will be synthesized to be.
> 
> The flip side of that is that you can't always know whether a directory is a virtual package without deep inspection: one consequence of PEP 402 is that any directory that contains a Python module (of whatever type), however deeply nested, will be a valid package name.  So, you can't rule out that a given directory *might* be a package, without walking its entire reachable subtree.  (Within the subset of directory names that are valid Python identifiers, of course.)
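
To make the cost of that check concrete, here is a minimal sketch of the "walk the entire reachable subtree" test described above. The function name and the suffix list are assumptions for illustration; a real importer may recognise more module types, and str.isidentifier() is Python 3 only.

```python
import os

# Assumption: only these suffixes count as "a Python module" here;
# a real importer may also accept extension modules and others.
MODULE_SUFFIXES = (".py", ".pyc")


def might_be_package(directory):
    """Return True if a module is reachable below `directory` through
    subdirectories whose names are valid Python identifiers."""
    for _root, dirs, files in os.walk(directory):
        # Prune subdirectories that could never appear in a dotted name.
        dirs[:] = [d for d in dirs if d.isidentifier()]
        for filename in files:
            stem, ext = os.path.splitext(filename)
            if ext in MODULE_SUFFIXES and stem.isidentifier():
                return True
    return False
```

In the worst case (no module anywhere below the directory) this visits every reachable subdirectory before it can answer False.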

Are there any rules about passing invalid identifiers to __import__ though, or is that just less likely? :)

> However, you *can* quickly tell that a directory *might* be a package or is *probably* one: if it contains modules, or is the same name as an already-discovered module, it's a pretty safe bet that you can flag it as such.
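
The quick check could look roughly like the sketch below. It only inspects the directory itself and its parent, so it is a heuristic, not the full rule; the function name is hypothetical and only ".py" files are considered.

```python
import os


def probably_package(directory):
    """Cheap heuristic: True if `directory` directly contains a .py module,
    or if a sibling module with the same name exists (foo.py next to foo/)."""
    directory = os.path.abspath(directory)
    parent, name = os.path.split(directory)
    # Same name as a module that sits beside the directory.
    if os.path.isfile(os.path.join(parent, name + ".py")):
        return True
    # Contains at least one importable-looking module directly.
    for entry in os.listdir(directory):
        stem, ext = os.path.splitext(entry)
        if (ext == ".py" and stem.isidentifier()
                and os.path.isfile(os.path.join(directory, entry))):
            return True
    return False
```

A False result here does not rule the directory out; a module may still exist deeper in the subtree.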

I still like the idea of a 'marker' file.  It would be great if there were a new marker like "__package__.py".  I say this more for the benefit of users looking at a directory on their filesystem and trying to understand whether this is a package or not than I do for my own programmatic tools though; it's already hard enough to understand the package-ness of a part of your filesystem and its interactions with PYTHONPATH; making directories mysteriously and automatically become packages depending on context will worsen that situation, I think.

I also have this not-terribly-well-defined idea that it would be handy for different providers of the _contents_ of namespace packages to provide their own instrumentation to be made aware that they've been added to the __path__ of a particular package.  This may be a solution in search of a problem, but I imagine that each __package__.py would be executed in the same module namespace.  This would allow namespace packages to do things like set up compatibility aliases, lazy imports, plugin registrations, etc., as they currently do with __init__.py.  Perhaps it would be better to define its relationship to the package-module namespace in a more sensible way than "execute all over each other in no particular order".

Also, if I had my druthers, Python would raise an exception if someone added a directory marked as a package to sys.path, to refuse to import things from it, and when a submodule was run as a script, add the nearest directory not marked as a package to sys.path, rather than the script's directory itself.  The whole "__name__ is wrong because your current directory was wrong when you ran that command" thing is so confusing to explain that I hope we can eventually consign it to the dustbin of history.  But if you can't even reasonably guess whether a directory is supposed to be an entry on sys.path or a package, that's going to be really hard to do.
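
A rough sketch of the "nearest directory not marked as a package" lookup, to show why it depends on a marker file. The marker tuple is an assumption: "__init__.py" is what Python recognises today, and "__package__.py" is only the hypothetical name floated above. The function names are made up for illustration.

```python
import os

# "__package__.py" is hypothetical; "__init__.py" is the existing marker.
MARKERS = ("__init__.py", "__package__.py")


def is_marked_package(directory):
    return any(os.path.isfile(os.path.join(directory, m)) for m in MARKERS)


def sys_path_entry_for(script):
    """Return (directory, dotted_name): the nearest ancestor directory that is
    not marked as a package, and the module name relative to it."""
    directory, filename = os.path.split(os.path.abspath(script))
    parts = [os.path.splitext(filename)[0]]
    while is_marked_package(directory):
        directory, name = os.path.split(directory)
        if not name:  # reached the filesystem root
            break
        parts.append(name)
    return directory, ".".join(reversed(parts))
```

Without any marker, is_marked_package() has nothing cheap to test, and the loop above cannot decide where to stop.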

> In any case, you probably should *not* do the building of a virtual path yourself; the protocols and APIs added by PEP 402 should allow you to simply ask for the path to be constructed on your behalf.  Otherwise, you are going to be back in the same business of second-guessing arbitrary importer backends again!

What do you mean "building of a virtual path"?

> (E.g. note that PEP 402 does not say virtual package subpaths must be filesystem or zipfile subdirectories of their parents - an importer could just as easily allow you to treat subdirectories named 'twisted.python' as part of a virtual package with that name!)
> 
> Anyway, pkgutil defines some extra methods that importers can implement to support module-walking, and part of the PEP 402 implementation should be to make this support virtual packages as well.

The more that this can focus on module-walking without executing code, the happier I'll be :).

>> This code still needs to support Python 2.4, but I will make a note of this for future reference.
> 
> A suggestion: just take the pkgutil code and bundle it for Python 2.4 as something._pkgutil.  There's very little about it that's 2.5+ specific, at least when I wrote the bits that do the module walking.
> 
> Of course, the main disadvantage of pkgutil for your purposes is that it currently requires packages to be imported in order to walk their child modules.  (IIRC, it does *not*, however, require them to be imported in order to discover their existence.)

One of the stipulations of this code is that it might give different results when the modules are loaded and not.  So it's fine to inspect that first and then invoke pkgutil only in the 'loaded' case, with the knowledge that the not-loaded case may be incorrect in the face of certain configurations.
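
Roughly, the "only invoke pkgutil in the loaded case" approach could be sketched like this. The function name is hypothetical; pkgutil.iter_modules() and sys.modules are the real standard-library pieces, and the not-loaded fallback is left to the caller.

```python
import pkgutil
import sys


def loaded_child_module_names(package_name):
    """Return the sorted names of a package's immediate children, but only if
    the package is already imported; return None otherwise."""
    module = sys.modules.get(package_name)
    if module is None or not hasattr(module, "__path__"):
        # Not loaded (or not a package): the caller falls back to guessing
        # from the filesystem, which may be wrong in some configurations.
        return None
    # iter_modules() lists children on the given path without importing them.
    return sorted(name for _finder, name, _ispkg
                  in pkgutil.iter_modules(module.__path__))
```

This never triggers an import itself; it only reads the __path__ of a package that some other code has already loaded.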

>> In that case, I guess it's a good thing; these bugs should be dealt with.  Thanks for pointing them out.  My opinion of PEP 402 has been completely reversed - although I'd still like to see a section about the module system from a library/tools author point of view rather than a time-traveling Perl user's narrative :).
> 
> LOL.
> 
> If you will propose the wording you'd like to see, I'll be happy to check it for any current-and-or-future incorrect assumptions.  ;-)

If I can come up with anything I will definitely send it along.

-glyph