[Python-Dev] PEP 402: Simplified Package Layout and Partitioning

Éric Araujo merwok at netwok.org
Thu Aug 11 16:39:34 CEST 2011


Hi,

I’ve read PEP 402 and would like to offer comments.

I know a bit about the import system, but not down to the nitty-gritty
details of PEP 302 and __path__ computations and all this fun stuff (by
which I mean, not fun at all).  As such, I can’t find nasty issues in
dark corners, but I can offer feedback as a user.  I think it’s a very
well-written explanation of a very useful feature: +1 from me.  If it is
accepted, the docs will certainly be much more concise, but the PEP as a
thought process is a useful document to read.

> When new users come to Python from other languages, they are often
> confused by Python's packaging semantics.
Minor: I would reserve “packaging” for
packaging/distribution/installation/deployment matters, not Python
modules.  I suggest “Python package semantics”.

> On the negative side, however, it is non-intuitive for beginners, and
> requires a more complex step to turn a module into a package.  If
> ``Foo`` begins its life as ``Foo.py``, then it must be moved and
> renamed to ``Foo/__init__.py``.
Minor: In the UNIX world, or with version control tools, moving and
renaming are one and the same thing (hg mv spam.py spam/__init__.py for
example).  Also, if you turn a module into a package, you may want to
move code around, change imports, etc., so I’m not sure the renaming
part is such a big step.  Anyway, if the import-sig people say that
users think it’s a complex or costly operation, I can believe it.

> (By the way, both of these additions to the import protocol (i.e. the
> dynamically-added ``__path__``, and dynamically-created modules)
> apply recursively to child packages, using the parent package's
> ``__path__`` in place of ``sys.path`` as a basis for generating a
> child ``__path__``.  This means that self-contained and virtual
> packages can contain each other without limitation, with the caveat
> that if you put a virtual package inside a self-contained one, it's
> gonna have a really short ``__path__``!)
I don’t understand the caveat or its implications.

> In other words, we don't allow pure virtual packages to be imported
> directly, only modules and self-contained packages.  (This is an
> acceptable limitation, because there is no *functional* value to
> importing such a package by itself.  After all, the module object
> will have no *contents* until you import at least one of its
> subpackages or submodules!)
> 
> Once ``zc.buildout`` has been successfully imported, though, there
> *will* be a ``zc`` module in ``sys.modules``, and trying to import it
> will of course succeed.  We are only preventing an *initial* import
> from succeeding, in order to prevent false-positive import successes
> when clashing subdirectories are present on ``sys.path``.
I find that limitation acceptable.  After all, there is no zc project,
and no zc module, just a zc namespace.  I’ll just regret that it’s not
possible to provide a module docstring to inform that this is a
namespace package used for X and Y.

> The resulting list (whether empty or not) is then stored in a
> ``sys.virtual_package_paths`` dictionary, keyed by module name.
This was probably said on import-sig, but here I go: yet another import
artifact in the sys module!  I hope we get ImportEngine in 3.3 to clean
up all this.

> * A new ``extend_virtual_paths(path_entry)`` function, to extend
>   existing, already-imported virtual packages' ``__path__`` attributes
>   to include any portions found in a new ``sys.path`` entry.  This
>   function should be called by applications extending ``sys.path``
>   at runtime, e.g. when adding a plugin directory or an egg to the
>   path.
Let’s imagine my application Spam has a namespace spam.ext for plugins.
To use a custom directory where plugins are stored, or a zip file with
plugins (I don’t use eggs, so let me talk about zip files here), I’d
have to call sys.path.append *and* pkgutil.extend_virtual_paths?
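
A minimal sketch of what that double call might look like, assuming the
PEP is accepted as written.  extend_virtual_paths is only proposed by the
PEP and is not in pkgutil today, so the lookup is guarded;
add_plugin_location is a made-up helper name.

```python
import pkgutil
import sys


def add_plugin_location(path_entry):
    """Append path_entry to sys.path and refresh virtual packages.

    add_plugin_location is a hypothetical helper.  extend_virtual_paths
    is only proposed by PEP 402, so it is looked up defensively.
    Returns True if the proposed function was found and called.
    """
    if path_entry not in sys.path:
        sys.path.append(path_entry)
    extend = getattr(pkgutil, "extend_virtual_paths", None)
    if extend is None:
        return False
    # Proposed API: add portions found in path_entry to the __path__ of
    # already-imported virtual packages (spam.ext in the example above).
    extend(path_entry)
    return True
```

An application would then call add_plugin_location once per plugin
directory or zip file instead of touching sys.path directly.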

> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>   yield the names of modules found in virtual packages.
Is there any value in providing an argument to get the pre-PEP behavior?
Or to look at it from a different place, how can Python code know that
some module is a virtual or pure virtual package, if that is even a
useful thing to know?
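
One possible heuristic, shown only as a sketch: it assumes that a pure
virtual package ends up with a __path__ but without a usable __file__,
which the PEP has not settled.  The function name is invented for the
example.

```python
def looks_like_pure_virtual_package(module):
    """Guess whether module is a package that has no source file.

    Assumption (not decided by the PEP): pure virtual packages have a
    __path__ attribute and either no __file__ or __file__ set to None.
    """
    has_path = hasattr(module, "__path__")
    has_file = getattr(module, "__file__", None) is not None
    return has_path and not has_file
```

If the PEP settles on a placeholder string for __file__ instead, this
check would need to change accordingly.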

> Last, but not least, the ``imp`` module (or ``importlib``, if
> appropriate) should expose the algorithm described in the `virtual
> paths`_ section above, as a
> ``get_virtual_path(modulename, parent_path=None)`` function, so that
> creators of ``__import__`` replacements can use it.
If I’m not mistaken, the rule of thumb these days is that imp is edited
when it’s absolutely necessary, otherwise code goes into importlib (more
easily written, read and maintained).

I wonder if importlib.import_module could implement the new import
semantics all by itself, so that we can benefit from this PEP in older
Pythons (importlib is on PyPI).

> * If you are changing a currently self-contained package into a
>   virtual one, it's important to note that you can no longer use its
>   ``__file__`` attribute to locate data files stored in a package
>   directory.  Instead, you must search ``__path__`` or use the
>   ``__file__`` of a submodule adjacent to the desired files, or
>   of a self-contained subpackage that contains the desired files.
Wouldn’t pkgutil.get_data help here?
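
For reference, a small self-contained sketch of pkgutil.get_data reading
a file that sits next to a package's __init__.py.  The package and file
names are throwaway examples.  As far as I can tell get_data itself
resolves the resource relative to the package's __file__, so with a
virtual package one would presumably pass the name of a self-contained
subpackage.

```python
import os
import pkgutil
import sys
import tempfile


def demo_get_data():
    """Create a throwaway package with a data file and read it back."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "spam_demo_pkg")  # example name
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("")
        with open(os.path.join(pkg_dir, "defaults.cfg"), "wb") as f:
            f.write(b"answer = 42\n")
        sys.path.insert(0, root)
        try:
            # Returns the resource as bytes, or None when the package
            # cannot be located or its loader lacks get_data().
            return pkgutil.get_data("spam_demo_pkg", "defaults.cfg")
        finally:
            sys.path.remove(root)
            sys.modules.pop("spam_demo_pkg", None)
```

The caller never looks at __file__ or __path__ itself; it only names the
package and a relative resource path.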

Besides, putting data files in a Python package is viewed very poorly by
some (mostly people following the Filesystem Hierarchy Standard), and in
distutils2/packaging, we (will) have a resources system that’s as
convenient for users and more flexible for OS packagers.  Using __file__
for more than information on the module is frowned upon for other
reasons anyway (I talked with a Debian developer about this one day but
forgot the details), so I think the limitation is okay.

> * XXX what is the __file__ of a "pure virtual" package?  ``None``?
>   Some arbitrary string?  The path of the first directory with a
>   trailing separator?  No matter what we put, *some* code is
>   going to break, but the last choice might allow some code to
>   accidentally work.  Is that good or bad?
A pure virtual package having no source file, I think it should have no
__file__ at all.  I don’t know if that would break more code than using
an empty string for example, but it feels more right.
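
Whatever is decided, code that wants to survive the change can avoid
assuming __file__ exists.  A sketch of such defensive code, with a
made-up function name, preferring __path__ and falling back to __file__:

```python
import os


def candidate_data_dirs(module):
    """Return the directories where module's data files might live.

    Works for packages (uses __path__), plain modules (uses the directory
    of __file__) and modules with neither attribute (returns []).
    """
    path = getattr(module, "__path__", None)
    if path is not None:
        return list(path)
    filename = getattr(module, "__file__", None)
    if filename:
        return [os.path.dirname(os.path.abspath(filename))]
    return []
```

Code written this way does not care whether a pure virtual package has
no __file__ attribute or has it set to None.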

> For those implementing PEP \302 importer objects:
Minor: Here I think a link would not be a nuisance (IOW remove the
backslash).

Regards

