[Python-Dev] PEP 376

Paul Moore p.f.moore at gmail.com
Fri Jul 3 15:54:13 CEST 2009


2009/7/3 Łukasz Langa <lukasz.langa at stxnext.pl>:
>
> On 2009-06-22 at 14:23, Tarek Ziadé wrote:
>
>> Hello,
>>
>> We have polished out PEP 376 and its code prototype at Distutils-SIG.
>> It now seems to fulfill all the requirements,
>> so I am mailing it here again, for a new round of feedback, if needed.
>
>
> Hello.
> I have read the current version of the PEP from trunk and would like to make
> some comments about it.
>
> The .egg-info as a directory
> ============================
> At our company, after much fiddling about the current distutils and
> setuptools, we have developed a solution that enables us to serve
> pseudo-eggs. What I mean by that is that by plugging into the PEP 302
> infrastructure, we were able to make non-filesystem-based repositories
> behave like `.egg` files. There are a couple of use cases for that
> implementation, the most important ones being:
> * being able to deploy encrypted and sealed blobs with a binary loader, to
> discourage fiddling around the source code by its users
> * being able to serve modules from a database back-end instead of the
> filesystem
>
> This solution depends on the `EGG-INFO` to be fetched from the `.egg`
> itself. We wouldn't want to use a static filesystem-based folder because it
> breaks the encryption use-case by revealing the `SOURCES.txt` information.
> It also breaks the database back-end use case by presenting static versioning
> information whereas the whole point of using a database backed "egg" is to
> have the newest version installed at all times.

This is a good point. Distutils only installs files in the filesystem
- it has no facilities for installing packages based on any other sort
of PEP 302 based importer. Hence, PEP 376 in principle should only
relate to filesystem-based distributions. But it also mentions
zipfile-based distributions: "Notice that the API is organized in five
classes that work with directories and Zip files (so it works with
files included in Zip files, see PEP 273 for more details [8])."

This is wrong. The PEP should either (1) restrict itself to filesystem
based implementations (leaving the problem of other PEP 302 loaders to
systems that manage these) or (2) be defined in a sufficiently general
way that it can be implemented for *any* PEP 302 based loader - which
probably means extending the PEP 302 protocols - and supplying zipfile
functions as an example of how this is used.

I believe that (1) is unlikely to be sufficient for real world use.
Zip files (eggs, py2exe embedded modules, etc) are far too important a
real world use case to ignore. The problem with (2) is that it
requires significant extra work. But special-casing zip files (as the
present implementation appears to do) will break as soon as any other
PEP 302 compliant format becomes popular.

> Moreover, the proposed ``egginfo_dirname()`` routine is a step back from the
> ``pkg_resources`` approach where we don't enforce resources to reside on a
> traditional filesystem.

On the other hand, pkgutil.get_data is the standard library means of
reading resources from a package. This is PEP 302 compliant now. This
new PEP doesn't affect that.
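
As a rough illustration of the point above, the sketch below reads a
resource through pkgutil.get_data from a package that lives inside a zip
file rather than in a directory. The package name, file names and helper
function are made up for the example; only pkgutil.get_data itself is the
standard library call under discussion.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Hypothetical names, used only for this demonstration.
PACKAGE = "pep376_demo_pkg"
PAYLOAD = "hello from inside a zip file"


def read_resource_from_zip():
    """Build a throwaway zip archive holding a package plus a data file,
    put the archive on sys.path, and read the data file back through
    pkgutil.get_data (which delegates to the loader's get_data method)."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "bundle.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(PACKAGE + "/__init__.py", "")
            zf.writestr(PACKAGE + "/data.txt", PAYLOAD)
        sys.path.insert(0, archive)
        try:
            return pkgutil.get_data(PACKAGE, "data.txt")
        finally:
            # Undo every side effect so the demo leaves no trace behind.
            sys.path.remove(archive)
            sys.modules.pop(PACKAGE, None)
            sys.path_importer_cache.pop(archive, None)


data = read_resource_from_zip()
```

The caller never needs to know that the package sits in an archive; the
loader found for the package does the reading.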

> The uninstall feature
> =====================
> This is a great feature but I don't seem to understand why there seems to be
> a consensus here that it's good for it not to even warn about dependencies,
> let alone manage them. ``easy_install`` and co. do manage dependencies
> while installing, why shouldn't it be symmetrical in that regard? Moreover,
> wouldn't dependency management be useful while using ``easy_install`` to
> *upgrade* an installation of a package?

easy_install is not a standard library feature - the fact that it has
no uninstall facility is not relevant.

python setup.py install is the standard library feature. The new
uninstall is intended to correspond to that.

> More high-level remarks
> =======================
> This probably isn't the best place to cover this but may serve as a
> rationale for the above .egg-info problems we have with the proposed
> changes.
>
> It's great that you push the metadata functionality to core distutils. At
> the same time, it would be feasible to think over the whole `egg` concept.
> We would argue that having the `egg` format as a container more or less
> agnostic to the underlying data storage structure would be **very** useful.
> Imagine installable `egg` support packages which would enable you to:
> * use other compression formats beside ZIP
> * use other means of module storage beside the filesystem
> * use sealed and encrypted archives for deployment
> * use programmatic `egg`s for licensing, etc.

PEP 302 gives you (in theory) all of this now. I'd recommend you look
at it - and at Brett's importlib, which will make experimenting with
such things far easier.
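
To make that concrete, here is a minimal sketch of an importer that
serves module source from a dictionary, standing in for a database or
some other non-filesystem store. It uses the find_spec/exec_module
spelling of the import protocol from current importlib rather than the
original PEP 302 find_module/load_module names, and the class and module
names are invented for the example.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader that imports modules from a dict
    mapping module names to source text."""

    def __init__(self, sources):
        self._sources = sources

    def find_spec(self, fullname, path=None, target=None):
        # Only claim the modules we actually hold; otherwise let the
        # remaining finders on sys.meta_path have a go.
        if fullname in self._sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def exec_module(self, module):
        source = self._sources[module.__name__]
        code = compile(source, "<dict:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


importer = DictImporter({"pep302_demo_mod": "ANSWER = 42\n"})
sys.meta_path.append(importer)
try:
    stored_mod = importlib.import_module("pep302_demo_mod")
finally:
    sys.meta_path.remove(importer)
```

Nothing here touches the filesystem, which is exactly why a tool that
only looks for directories on disk cannot see such a module.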

What PEP 302 doesn't provide is package management. But Python itself
doesn't provide package management, except in the form of distutils
setup.py install - which is solely filesystem based.

Maybe there's a case for extending PEP 302 and distutils to allow
integrated management of other forms of importer format, but that's a
huge other project, which no-one to my knowledge is even looking at.

> * ``easy_install`` from different protocols, archive formats, etc.

easy_install is not a standard library feature, so is not relevant here.

> Decoupling the `egg` format from the concrete "zip-based" or
> "directory-based" implementations is a step forward in that direction. In
> that regard, the solution PEP 376 is proposing isn't ultimately solving the
> issues we're having.

Eggs are fundamentally a PEP 302 zip file format. There are some extra
bits of metadata for setuptools/easy_install in there (as I understand
things) but essentially they are zip files. When you say "decoupling
the egg format", I assume you mean "decoupling the egg metadata" -
which is fine, but to properly decouple, you need API level access to
the metadata. PEP 376 offers read-only access, but as you rightly
point out, it is only for filesystem data (and some form of zip file,
which appears to be limited in some way, as it isn't PEP 302 based,
and the actual format isn't defined anywhere).

The basic point here is that PEP 376 needs to define precisely how
pkgutil.get_distributions() scans sys.path looking for ".egg-info
directories". What does it do for sys.path entries that don't
correspond to filesystem directories? (Note - these may or may not be
zip files. Even if they are zip files, an earlier entry on
sys.path_hooks could have taken precedence. At the very least, you
should only process path entries as zip files if their importer - in
sys.path_importer_cache or via an explicit path hook scan - is a
zipimporter object.)
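
A sketch of the check described above, assuming a helper named
classify_path_entry (a made-up name, not part of PEP 376 or its
prototype). It relies on pkgutil.get_importer, which consults
sys.path_importer_cache and falls back to scanning sys.path_hooks, and it
treats an entry as a zip file only when the importer that actually claimed
it is a zipimporter.

```python
import os
import pkgutil
import tempfile
import zipfile
import zipimport
from importlib.machinery import FileFinder


def classify_path_entry(entry):
    """Classify a sys.path entry by the importer that handles it,
    not by what the path string happens to look like."""
    importer = pkgutil.get_importer(entry)
    if isinstance(importer, zipimport.zipimporter):
        return "zip"
    if isinstance(importer, FileFinder):
        return "directory"
    # No importer at all, or one supplied by a custom path hook.
    return "unsupported"


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_mod.py", "VALUE = 1\n")
    kinds = {
        "plain directory": classify_path_entry(tmp),
        "zip archive": classify_path_entry(archive),
        "made-up URL": classify_path_entry("db://packages"),
    }
```

Because the decision is based on the importer, a zip file claimed by an
earlier custom path hook would come back as "unsupported" rather than
being opened as a zip behind that hook's back.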

To be honest, this is a major can of worms. But if PEP 376 is not
going to support PEP 302, then it must state that fact explicitly, to
avoid giving people false expectations - particularly with Brett's
importlib in Python 3.1, which will make it far easier for people to
experiment with new packaging formats such as the ones Łukasz mentions
above. And it MUST fail gracefully in the face of unsupported importer
types.
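
One possible shape for "failing gracefully", purely as a sketch: a scan
that collects .egg-info directories from path entries that are real
directories and reports everything else as skipped instead of raising.
The function name and return convention are assumptions made for the
example; this is not the PEP 376 API.

```python
import os
import tempfile


def find_egg_info_dirs(path_entries):
    """Return (found, skipped): the .egg-info directories located in
    filesystem entries, and the entries that could not be scanned."""
    found, skipped = [], []
    for entry in path_entries:
        if not os.path.isdir(entry):
            # Not a directory on disk (zip file, database URL, ...):
            # record it and move on rather than blowing up.
            skipped.append(entry)
            continue
        try:
            names = sorted(os.listdir(entry))
        except OSError:
            skipped.append(entry)
            continue
        for name in names:
            full = os.path.join(entry, name)
            if name.endswith(".egg-info") and os.path.isdir(full):
                found.append(full)
    return found, skipped


with tempfile.TemporaryDirectory() as site_dir:
    os.mkdir(os.path.join(site_dir, "demo-1.0.egg-info"))
    found, skipped = find_egg_info_dirs([site_dir, "db://packages"])
```

Returning the skipped entries lets a caller tell the user which parts of
sys.path were not covered, which addresses the "false expectations"
concern.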

Paul.

