[Python-Dev] PEP 402: Simplified Package Layout and Partitioning

P.J. Eby pje at telecommunity.com
Thu Aug 11 20:30:51 CEST 2011


At 04:39 PM 8/11/2011 +0200, Éric Araujo wrote:
>Hi,
>
>I've read PEP 402 and would like to offer comments.

Thanks.

>Minor: I would reserve "packaging" for
>packaging/distribution/installation/deployment matters, not Python
>modules.  I suggest "Python package semantics".

Changing to "Python package import semantics" to hopefully be even 
clearer.  ;-)

(Nitpick: I was somewhat intentionally ambiguous because we are 
talking here about how a package is physically implemented in the 
filesystem, and that actually *is* kind of a packaging issue.  But 
it's not necessarily a *useful* intentional ambiguity, so I've no 
problem with removing it.)


>Minor: In the UNIX world, or with version control tools, moving and
>renaming are the same one thing (hg mv spam.py spam/__init__.py for
>example).  Also, if you turn a module into a package, you may want to
>move code around, change imports, etc., so I'm not sure the renaming
>part is such a big step.  Anyway, if the import-sig people say that
>users think it's a complex or costly operation, I can believe it.

It's not that it's complex or costly in anything other than *mental* 
overhead -- you have to remember to do it and it's not particularly 
obvious.  (But people on import-sig did mention this and other things 
covered by the PEP as being a frequent root cause of beginner 
inquiries on #python, Stack Overflow, et al.)


> > (By the way, both of these additions to the import protocol (i.e. the
> > dynamically-added ``__path__``, and dynamically-created modules)
> > apply recursively to child packages, using the parent package's
> > ``__path__`` in place of ``sys.path`` as a basis for generating a
> > child ``__path__``.  This means that self-contained and virtual
> > packages can contain each other without limitation, with the caveat
> > that if you put a virtual package inside a self-contained one, it's
> > gonna have a really short ``__path__``!)
>I don't understand the caveat or its implications.

By default, each package's __path__ is the same length as or shorter 
than its parent's.  So if you put a virtual package inside a 
self-contained one, it will be functionally no different from a 
self-contained one, in that it will have only one path entry.  It's 
therefore not really useful to put a virtual package inside a 
self-contained one, even though you can do it.  (Apart from it 
letting you avoid a superfluous __init__ module, assuming it's indeed 
superfluous.)
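
To make the "short __path__" point concrete, here is a rough sketch of how a child path could be derived from a parent path.  It is only an illustration, not the PEP's reference algorithm: the name sketch_virtual_path is made up for this example, and it assumes plain filesystem directories (no zip files or custom importers).

```python
import os
import sys


def sketch_virtual_path(modulename, parent_path=None):
    """Illustrative only: list the subdirectories named after the last
    component of `modulename` under each entry of the parent path.

    Assumes every path entry is an ordinary filesystem directory.
    """
    if parent_path is None:
        parent_path = sys.path  # top-level packages are based on sys.path
    name = modulename.rpartition(".")[2]
    result = []
    for entry in parent_path:
        candidate = os.path.join(entry, name)
        if os.path.isdir(candidate):
            result.append(candidate)
    return result
```

Each parent entry contributes at most one child entry, so the child's path can never be longer than the parent's.  A self-contained parent has a one-entry __path__, and so any virtual package inside it gets at most one entry too.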


> > In other words, we don't allow pure virtual packages to be imported
> > directly, only modules and self-contained packages.  (This is an
> > acceptable limitation, because there is no *functional* value to
> > importing such a package by itself.  After all, the module object
> > will have no *contents* until you import at least one of its
> > subpackages or submodules!)
> >
> > Once ``zc.buildout`` has been successfully imported, though, there
> > *will* be a ``zc`` module in ``sys.modules``, and trying to import it
> > will of course succeed.  We are only preventing an *initial* import
> > from succeeding, in order to prevent false-positive import successes
> > when clashing subdirectories are present on ``sys.path``.
>I find that limitation acceptable.  After all, there is no zc project,
>and no zc module, just a zc namespace.  I'll just regret that it's not
>possible to provide a module docstring to inform that this is a
>namespace package used for X and Y.

It *is* possible - you'd just have to put it in a "zc.py" file.  IOW, 
this PEP still allows "namespace-defining packages" to exist, as was 
requested by early commenters on PEP 382.  It just doesn't *require* 
them to exist in order for the namespace contents to be importable.


> > The resulting list (whether empty or not) is then stored in a
> > ``sys.virtual_package_paths`` dictionary, keyed by module name.
>This was probably said on import-sig, but here I go: yet another import
>artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
>up all this.

Well, I rather *like* having them there, personally, vs. having to 
learn yet another API, but oh well, whatever.  AFAIK, ImportEngine 
isn't going to do away with the need for the global ones to live 
somewhere, at least not in 3.3.


> > * A new ``extend_virtual_paths(path_entry)`` function, to extend
> >   existing, already-imported virtual packages' ``__path__`` attributes
> >   to include any portions found in a new ``sys.path`` entry.  This
> >   function should be called by applications extending ``sys.path``
> >   at runtime, e.g. when adding a plugin directory or an egg to the
> >   path.
>Let's imagine my application Spam has a namespace spam.ext for plugins.
>  To use a custom directory where plugins are stored, or a zip file with
>plugins (I don't use eggs, so let me talk about zip files here), I'd
>have to call sys.path.append *and* pkgutil.extend_virtual_paths?

As written in the current proposal, yes.  There was some discussion 
on Python-Dev about having this happen automatically, and I proposed 
that it could be done by making virtual packages' __path__ attributes 
an iterable proxy object, rather than a list:

   http://mail.python.org/pipermail/python-dev/2011-July/112429.html

(This is an open option that hasn't been added to the PEP as yet, 
because I wanted to know Guido's thoughts on the proposal as it 
stands before burdening it with more implementation detail for a 
feature (automatic updates) that he might not be very keen on to 
begin with, even if it does make the semantics that much more familiar 
for Perl or PHP users.)
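
For illustration, the proxy idea could look something like the sketch below.  The class name VirtualPathProxy and its constructor arguments are hypothetical, and the sketch again assumes plain filesystem directories; the actual proposal in the linked message may differ in its details.

```python
import os
import sys


class VirtualPathProxy:
    """Hypothetical iterable stand-in for a virtual package's __path__.

    Instead of storing a fixed list, it recomputes the entries from the
    parent path on every iteration, so later additions to the parent
    path are picked up without any explicit "extend" call.
    """

    def __init__(self, name, parent_path=None):
        self._name = name                # last component of the package name
        self._parent_path = parent_path  # None means "follow sys.path"

    def __iter__(self):
        parent = sys.path if self._parent_path is None else self._parent_path
        for entry in parent:
            candidate = os.path.join(entry, self._name)
            if os.path.isdir(candidate):  # assumes filesystem directories only
                yield candidate
```

With something like this, appending a plugin directory to sys.path would be enough on its own, at the cost of rescanning the parent path each time the proxy is iterated.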


> > * ``ImpImporter.iter_modules()`` should be changed to also detect and
> >   yield the names of modules found in virtual packages.
>Is there any value in providing an argument to get the pre-PEP behavior?
>  Or to look at it from a different place, how can Python code know that
>some module is a virtual or pure virtual package, if that is even a
>useful thing to know?

Is it a useful thing?  Dunno.  That's why it's open for comment.  If 
the auto-update approach is used, then the __path__ of virtual 
packages will have a distinguishable type().


> > Last, but not least, the ``imp`` module (or ``importlib``, if
> > appropriate) should expose the algorithm described in the `virtual
> > paths`_ section above, as a
> > ``get_virtual_path(modulename, parent_path=None)`` function, so that
> > creators of ``__import__`` replacements can use it.
>If I'm not mistaken, the rule of thumb these days is that imp is edited
>when it's absolutely necessary, otherwise code goes into importlib (more
>easily written, read and maintained).
>
>I wonder if importlib.import_module could implement the new import
>semantics all by itself, so that we can benefit from this PEP in older
>Pythons (importlib is on PyPI).

AFAIK, *that* importlib doesn't include a reimplementation of the 
full import process, though I suppose I could be wrong.  My personal 
plan was just to create a specific pep382 module to include with 
future versions of setuptools, but as things worked out, I'm not sure 
if that'll be sanely doable for pep402.


> > * If you are changing a currently self-contained package into a
> >   virtual one, it's important to note that you can no longer use its
> >   ``__file__`` attribute to locate data files stored in a package
> >   directory.  Instead, you must search ``__path__`` or use the
> >   ``__file__`` of a submodule adjacent to the desired files, or
> >   of a self-contained subpackage that contains the desired files.
>Wouldn't pkgutil.get_data help here?

Not so long as you passed it a package name instead of a module 
name.  This issue exists today with namespace packages; it's not new 
to virtual packages.
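
As a sketch of the "search __path__" alternative mentioned in the PEP text quoted above: the helper name find_package_file is made up for this example, and it only handles packages whose __path__ entries are ordinary directories.

```python
import os


def find_package_file(package, filename):
    """Return the first existing path to `filename` among the entries of
    `package.__path__`, or None if no entry contains it.

    Illustrative only: assumes __path__ entries are filesystem
    directories, so it will not find files inside zip archives.
    """
    for directory in getattr(package, "__path__", ()):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Because it never touches __file__, this works the same way whether the package is self-contained or virtual.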


>Besides, putting data files in a Python package is held very poorly by
>some (mostly people following the File Hierarchy Standard),

ISTM that anybody who thinks that is being inconsistent in 
considering the Python code itself to not be a "data file" by that 
same criterion...  especially since one of the more common uses for 
such "data" files are for e.g. HTML templates (which usually contain 
some sort of code) or GUI resources (which are pretty tightly bound 
to the code).

Are those same people similarly concerned when a Firefox extension 
contains image files as well as JavaScript?  And if not, why is 
Python different?

IOW, I think that those people are being confused by our use of the 
term "data" and thus think of it as an entirely different sort of 
"data" than what is meant by "package data" in the Python world.  I 
am not sure what word would unconfuse (defuse?) them, but we simply 
mean "files that are part of the package but are not of a type that 
Python can import by default," not "user-modifiable data" or "data 
that has meaning or usefulness to code other than the code it was 
packaged with."

Perhaps "package-embedded resources" would be a better 
phrase?  Certainly, it implies that they're *supposed* to be embedded 
there.  ;-)


> > * XXX what is the __file__ of a "pure virtual" package?  ``None``?
> >   Some arbitrary string?  The path of the first directory with a
> >   trailing separator?  No matter what we put, *some* code is
> >   going to break, but the last choice might allow some code to
> >   accidentally work.  Is that good or bad?
>A pure virtual package having no source file, I think it should have no
>__file__ at all.  I don't know if that would break more code than using
>an empty string for example, but it feels righter.
>
> > For those implementing PEP \302 importer objects:
>Minor: Here I think a link would not be a nuisance (IOW remove the
>backslash).

Done. 


