
2009/7/3 Łukasz Langa <lukasz.langa@stxnext.pl>:
On 2009-06-22 at 14:23, Tarek Ziadé wrote:
Hello,
We have polished PEP 376 and its code prototype at Distutils-SIG. It now seems to fulfill all the requirements, so I am mailing it here again for a new round of feedback, if needed.
Hello. I have read the current version of the PEP from trunk and would like to make some comments about it.
The .egg-info as a directory
============================

At our company, after much fiddling with the current distutils and setuptools, we have developed a solution that enables us to serve pseudo-eggs. What I mean is that, by plugging into the PEP 302 infrastructure, we were able to make non-filesystem-based repositories behave like `.egg` files. There are a couple of use cases for that implementation, the most important ones being:

* being able to deploy encrypted and sealed blobs with a binary loader, to discourage users from fiddling with the source code
* being able to serve modules from a database back-end instead of the filesystem
This solution depends on `EGG-INFO` being fetched from the `.egg` itself. We wouldn't want to use a static filesystem-based folder because it breaks the encryption use case by revealing the `SOURCES.txt` information. It also breaks the database back-end use case by presenting static versioning information, whereas the whole point of using a database-backed "egg" is to have the newest version installed at all times.
This is a good point. Distutils only installs files in the filesystem - it has no facilities for installing packages based on any other sort of PEP 302 based importers. Hence, PEP 376 in principle should only relate to filesystem-based distributions. But it also mentions zipfile-based distributions:

"Notice that the API is organized in five classes that work with directories and Zip files (so it works with files included in Zip files, see PEP 273 for more details [8])."

This is wrong. The PEP should either (1) restrict itself to filesystem based implementations (leaving the problem of other PEP 302 loaders to systems that manage these) or (2) be defined in a sufficiently general way that it can be implemented for *any* PEP 302 based loader - which probably means extending the PEP 302 protocols - and supplying zipfile functions as an example of how this is used.

I believe that (1) is unlikely to be sufficient for real world use. Zip files (eggs, py2exe embedded modules, etc) are far too important a real world use case to ignore. The problem with (2) is that it requires significant extra work. But special-casing zip files (as the present implementation appears to do) will break as soon as any other PEP 302 compliant format becomes popular.
Moreover, the proposed ``egginfo_dirname()`` routine is a step back from the ``pkg_resources`` approach, which does not require resources to reside on a traditional filesystem.
On the other hand, pkgutil.get_data is the standard library means of reading resources from a package. This is PEP 302 compliant now. This new PEP doesn't affect that.
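As a minimal sketch of that point (not taken from the PEP; the archive, package, and file names below are made up for illustration), `pkgutil.get_data` can read a resource from a package that lives inside a zip archive. It goes through whatever loader imported the package rather than reading the filesystem directly:

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile


def read_resource_from_zip():
    """Build a throwaway zip holding a package plus a data file, then read
    the data file back through the package's loader."""
    tmp = tempfile.mkdtemp()
    archive = os.path.join(tmp, "bundle.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_zip_pkg/__init__.py", "")
        zf.writestr("demo_zip_pkg/data.txt", "hello from the archive")

    # Putting the archive on sys.path lets zipimport handle the package.
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        # get_data() delegates to the loader's own get_data() method.
        return pkgutil.get_data("demo_zip_pkg", "data.txt")
    finally:
        sys.path.remove(archive)
        sys.modules.pop("demo_zip_pkg", None)
```

The same call should work unchanged for any loader that implements `get_data`; `pkgutil.get_data` returns `None` when the loader does not provide it.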
The uninstall feature
=====================

This is a great feature, but I don't understand why there seems to be a consensus here that it's good for it not to even warn about dependencies, let alone manage them. ``easy_install`` and co. do manage dependencies while installing, so why shouldn't uninstalling be symmetrical in that regard? Moreover, wouldn't dependency management be useful while using ``easy_install`` to *upgrade* an installation of a package?
easy_install is not a standard library feature - the fact that it has no uninstall facility is not relevant. python setup.py install is the standard library feature. The new uninstall is intended to correspond to that.
More high-level remarks
=======================

This probably isn't the best place to cover this, but it may serve as a rationale for the above .egg-info problems we have with the proposed changes.
It's great that you push the metadata functionality to core distutils. At the same time, it would be worth thinking over the whole `egg` concept. We would argue that having the `egg` format as a container more or less agnostic to the underlying data storage structure would be **very** useful. Imagine installable `egg` support packages which would enable you to:

* use other compression formats besides ZIP
* use other means of module storage besides the filesystem
* use sealed and encrypted archives for deployment
* use programmatic `egg`s for licensing, etc.
PEP 302 gives you (in theory) all of this now. I'd recommend you look at it - and at Brett's importlib, which will make experimenting with such things far easier. What PEP 302 doesn't provide is package management. But Python itself doesn't provide package management, except in the form of distutils setup.py install - which is solely filesystem based. Maybe there's a case for extending PEP 302 and distutils to allow integrated management of other forms of importer format, but that's a huge other project, which no-one to my knowledge is even looking at.
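To illustrate the kind of experiment this allows, here is a minimal sketch of a non-filesystem importer that serves module source from an in-memory dictionary standing in for a database. It uses the current `importlib.abc` interfaces (`find_spec` and `exec_module`), which postdate this thread, rather than the original PEP 302 `find_module`/`load_module` protocol. `MODULE_STORE`, `StoreImporter`, `load_demo`, and `dbmod_demo` are hypothetical names chosen for the example.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for a database table: module name -> source text.
MODULE_STORE = {
    "dbmod_demo": (
        "VERSION = '1.2'\n"
        "def greet():\n"
        "    return 'hello from the store'\n"
    ),
}


class StoreImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader in one object, backed by a dict instead of files."""

    def __init__(self, store):
        self.store = store

    def find_spec(self, fullname, path=None, target=None):
        # Claim only the names we hold; let other finders handle the rest.
        if fullname in self.store:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        source = self.store[module.__name__]
        code = compile(source, "<store:%s>" % module.__name__, "exec")
        exec(code, module.__dict__)


# Appended last, so regular filesystem imports keep their precedence.
sys.meta_path.append(StoreImporter(MODULE_STORE))


def load_demo():
    """Import the module that exists only in MODULE_STORE."""
    return importlib.import_module("dbmod_demo")
```

Note that this covers importing only. As the paragraph above says, nothing here provides package management (installation records, versions, uninstall) for such a store.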
* ``easy_install`` from different protocols, archive formats, etc.
easy_install is not a standard library feature, so is not relevant here.
Decoupling the `egg` format from the concrete "zip-based" or "directory-based" implementations is a step forward in that direction. In that regard, the solution PEP 376 is proposing isn't ultimately solving the issues we're having.
Eggs are fundamentally a PEP 302 zip file format. There are some extra bits of metadata for setuptools/easy_install in there (as I understand things) but essentially they are zip files.

When you say "decoupling the egg format", I assume you mean "decoupling the egg metadata" - which is fine, but to properly decouple, you need API level access to the metadata. PEP 376 offers read-only access, but as you rightly point out, it is only for filesystem data (and some form of zip file, which appears to be limited in some way, as it isn't PEP 302 based, and the actual format isn't defined anywhere).

The basic point here is that PEP 376 needs to define precisely how pkgutil.get_distributions() scans sys.path looking for ".egg-info directories". What does it do for sys.path entries that don't correspond to filesystem directories? (Note - these may or may not be zip files. Even if they are zip files, an earlier entry on sys.path_hooks could have taken precedence. At the very least, you should only process path entries as zip files if their importer - in sys.path_importer_cache or via an explicit path hook scan - is a zipimporter object.)

To be honest, this is a major can of worms. But if PEP 376 is not going to support PEP 302, then it must state that fact explicitly, to avoid giving people false expectations - particularly with Brett's importlib in Python 3.1, which will make it far easier for people to experiment with new packaging formats such as the ones Lukasz mentions above. And it MUST fail gracefully in the face of unsupported importer types.

Paul.
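
The importer check suggested above could be sketched as follows. This is only an illustration under stated assumptions, not the PEP 376 implementation: `classify_path_entry`, its return labels, and `demo` are hypothetical, and `pkgutil.get_importer` is used here because it consults `sys.path_importer_cache` and falls back to scanning `sys.path_hooks`.

```python
import importlib.machinery
import os
import pkgutil
import tempfile
import zipfile
import zipimport


def classify_path_entry(entry):
    """Decide how a sys.path entry should be scanned, based on its importer
    rather than on what the path string happens to look like."""
    importer = pkgutil.get_importer(entry)  # cache lookup, then hook scan
    if isinstance(importer, zipimport.zipimporter):
        return "zip"
    if isinstance(importer, importlib.machinery.FileFinder):
        return "directory"
    # Any other importer (or none at all): skip the entry instead of failing.
    return "unsupported"


def demo():
    """Classify a real directory, a real zip file, and a made-up entry."""
    tmp = tempfile.mkdtemp()
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("placeholder.txt", "x")
    entries = (tmp, archive, "db://modules")
    return [classify_path_entry(entry) for entry in entries]
```

A scanner built this way would treat an entry as a zip file only when a `zipimporter` actually claimed it, and would ignore entries handled by importers it does not understand.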