Hello,

We have polished PEP 376 and its code prototype at Distutils-SIG. It now seems to fulfill all the requirements, so I am mailing it here again for a new round of feedback, if needed.

- the pep : http://svn.python.org/projects/peps/trunk/pep-0376.txt
- the code prototype : http://bitbucket.org/tarek/pep376/src/tip/pkgutil.py

Notice that if the PEP is accepted at this point, I will:

- focus on making the code work as fast as possible for directory browsing
- work on the backport and the required patches for setuptools and pip at the same time, and see if I can get some beta-testers who are willing to switch to this new version to test it extensively before 2.7/3.2 are out.

Regards
Tarek

-- 
Tarek Ziadé | http://ziade.org
Hello, Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
so I am mailing it here again, for a new round of feedback, if needed.
- the pep : http://svn.python.org/projects/peps/trunk/pep-0376.txt
Some comments:

- the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo` generated files will not have a hash.

Why the exception for pyc and pyo files?

- `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file paths are relative to it.

Is that a general rule? That is, are the paths in RECORD always relative to the RECORD file's grandparent directory?

The RECORD format
-----------------

The `RECORD` file is a CSV file, composed of records, one line per installed file. The ``csv`` module is used to read the file, with the `excel` dialect, which uses these options to read the file:

- field delimiter : `,`
- quoting char : `"`.
- line terminator : `\r\n`

Wouldn't it be better to use the native line terminator on the current platform? (Someone might want to edit, or at least view, the file.)

What is the character encoding for non-ASCII filenames? UTF-8?

Are the RECORD file's contents somehow part of the DistributionMetadata?

- ``DistributionDirectories``: manages ``EggInfoDirectory`` instances.

What is an EggInfoDirectory? A plural class name looks strange (I think it's the first time I've seen one in the CPython codebase). How about another name? (DistributionPool, DistributionMap, WorkingSet, etc.)

- ``get_egginfo_file(path, binary=False)`` -> file object
  Returns a file located under the `.egg-info` directory. Returns a ``file`` instance for the file pointed by ``path``.

Is it always a file instance, or just a file-like object? (Zipped distributions come to mind.) Is it opened read-only?

- ``owner(path)`` -> ``Distribution`` instance or None
  If ``path`` is used by only one ``Distribution`` instance, returns it. Otherwise returns None.

This is a bit confusing. If it returns None, it doesn't distinguish between the case where several Distributions refer to the path and the case where no distribution refers to it, does it? Is there any reason to have this method while file_users(path) already exists?

A new class called ``DistributionDirectories`` is created. It's a collection of ``DistributionDirectory`` and ``ZippedDistributionDirectory`` instances. The constructor takes one optional argument ``use_cache`` set to ``True`` by default.

You forgot to describe the constructor's signature and what it does exactly.

``EggInfoDirectories`` also provides the following methods besides the ones from ``dict``::

What is EggInfoDirectories?

- ``append(path)``
  Creates an ``DistributionDirectory`` (or ``ZippedDistributionDirectory``) instance for ``path`` and adds it in the mapping.
- ``load(paths)``
  Creates and adds ``DistributionDirectory`` (or ``ZippedDistributionDirectory``) instances corresponding to ``paths``.

Why are these methods named completely differently although they do almost the same thing? Besides, append() makes it look like ordering matters. Does it? (For a naming suggestion: e.g. load(path) and load_all(paths). Or, even simpler, load(*paths) or load(paths).)

- ``get_file_users(path)`` -> Iterator of ``Distribution`` (or ``ZippedDistribution``) instances.

This method is named file_users in another class. Perhaps the naming should be consistent?

All these functions use the same global instance of ``DistributionDirectories`` to use the cache.

Is the global instance available to users?

    >>> for path, hash, size in dist.get_installed_files()::
    ...     print '%s %s %d %s' % (path, hash, size)

There's one too many "%s" here.

Thanks for your work!

Antoine.
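To make the RECORD questions concrete, here is a minimal Python 3 sketch of reading a RECORD file with the ``csv`` module's `excel` dialect. It assumes three columns per row (path, MD5 hex digest, size) and empty hash/size fields for generated files; `read_record` is a hypothetical helper, not the PEP's API, and the sample rows are illustrative only (the second hash is a placeholder). The final loop uses a format string with three placeholders for the three values.

```python
import csv
import io


def read_record(text):
    """Parse RECORD contents into (path, md5_hex, size) tuples.

    Hypothetical helper: assumes three columns per row, with empty hash and
    size fields for generated .pyc/.pyo files (returned as None).
    """
    entries = []
    # newline='' keeps the \r\n terminators intact for the csv module.
    for row in csv.reader(io.StringIO(text, newline=''), dialect='excel'):
        if not row:
            continue  # tolerate blank lines
        path, md5_hex, size = row
        entries.append((path, md5_hex or None, int(size) if size else None))
    return entries


# Illustrative rows; the second hash is a made-up placeholder.
record = (
    'zlib/__init__.py,d41d8cd98f00b204e9800998ecf8427e,0\r\n'
    'zlib/include/zlib.h,0123456789abcdef0123456789abcdef,66188\r\n'
)

for path, md5_hex, size in read_record(record):
    # Three placeholders for three values.
    print('%s %s %d' % (path, md5_hex, size))
```

Keeping the parsing in the ``csv`` module, rather than splitting on commas, means paths that contain commas or quotes are handled by the dialect's quoting rules.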
On Mon, Jun 22, 2009 at 4:59 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
As in PEP 262, since they are produced automatically from a py file, checking the py file is enough to decide if the file has changed.
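A minimal Python 3 sketch of the check this implies, assuming an uninstaller compares each installed file's current MD5 hex digest with the one stored in RECORD and simply skips `.pyc`/`.pyo` files. `file_md5` and `is_modified` are hypothetical names used for illustration only.

```python
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex-encoded MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def is_modified(path, recorded_md5):
    """Hypothetical check: has an installed file changed since installation?

    Generated .pyc/.pyo files have no hash in RECORD; since they are rebuilt
    from their .py source, checking the .py file is enough, so they are never
    reported as modified here.
    """
    if os.path.splitext(path)[1] in ('.pyc', '.pyo'):
        return False
    return file_md5(path) != recorded_md5
```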
No, they can be located anywhere on the system. But when the paths are located in the same directory as the .egg-info directory, a relative path is used (see the section before this example). I'll add an example that contains files located elsewhere on the system (like a script and a data file).
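Under the rule described in this reply, a consumer of RECORD has to resolve each entry before touching the file. A minimal sketch, assuming relative entries are '/'-separated and relative to the directory that contains the `.egg-info` directory, while files installed elsewhere are stored as absolute paths; `resolve_record_path` is a hypothetical helper.

```python
import os


def resolve_record_path(record_path, egginfo_dir):
    """Turn a RECORD entry into an absolute filesystem path.

    Assumption (hypothetical helper): relative entries are '/'-separated and
    relative to the directory containing the .egg-info directory; files
    installed elsewhere (scripts, data files) are stored as absolute paths.
    """
    if os.path.isabs(record_path):
        return record_path
    base = os.path.dirname(os.path.abspath(egginfo_dir))
    return os.path.join(base, *record_path.split('/'))
```

Splitting on '/' and rejoining with `os.path.join` keeps the stored paths platform-neutral while producing native paths on each platform.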
Good idea, I'll change that.
What is the character encoding for non-ASCII filenames? UTF-8?
Yes, there's a constant in Distutils, called PKG_INFO_ENCODING that will be used for the file generation.
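A minimal Python 3 sketch of writing RECORD along the lines discussed here: UTF-8 for non-ASCII filenames, and optionally the platform's native line terminator instead of the excel dialect's `\r\n`. `write_record` and `RECORD_ENCODING` are hypothetical names; the encoding value is assumed to match distutils' PKG_INFO_ENCODING.

```python
import csv

# Assumption: the same value as distutils' PKG_INFO_ENCODING ('utf-8').
RECORD_ENCODING = 'utf-8'


def write_record(path, rows, native_newlines=True):
    r"""Write RECORD rows (path, md5_hex, size) as UTF-8 CSV.

    Hypothetical helper. With native_newlines=True each row is terminated by
    '\n', which the text layer translates to os.linesep; otherwise the excel
    dialect's '\r\n' is written verbatim.
    """
    if native_newlines:
        newline, terminator = None, '\n'
    else:
        newline, terminator = '', '\r\n'
    with open(path, 'w', encoding=RECORD_ENCODING, newline=newline) as f:
        writer = csv.writer(f, dialect='excel', lineterminator=terminator)
        writer.writerows(rows)
```

Letting the text layer translate `\n` is only safe because file paths never contain embedded newlines; for arbitrary CSV data the ``csv`` documentation recommends `newline=''`.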
Are the RECORD file's contents somehow part of the DistributionMetadata?
The DistributionMetadata corresponds to the fields defined in PEP 345, i.e. those written in the PKG-INFO file, which is listed in the RECORD file. We are reworking PEP 345 as well, to add some fields. What did you have in mind?
- ``DistributionDirectories``: manages ``EggInfoDirectory`` instances.
What is an EggInfoDirectory ?
Typo (old name); fixing this.
Sure, WorkingSet is nice; it's the name used in setuptools.
It's opened read-only, in either "r" or "rb" mode, and in the case of a zip file it returns a file-like object obtained via zipfile.ZipFile.open.
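A minimal Python 3 sketch of the behaviour described in this answer: read-only access to a file under an `.egg-info` directory, returning a real file for a plain directory and a file-like object from `zipfile.ZipFile.open` for a zip archive. `open_egginfo_file` and its parameters are hypothetical and differ from the PEP's `get_egginfo_file()` signature; UTF-8 text mode is an assumption.

```python
import io
import os
import zipfile


def open_egginfo_file(location, egginfo_name, path, binary=False):
    """Open a file under an .egg-info directory, read-only.

    Hypothetical helper: `location` is a directory or a zip archive (e.g. a
    zipped site-packages), `egginfo_name` names the .egg-info directory and
    `path` is '/'-separated.
    """
    if zipfile.is_zipfile(location):
        archive = zipfile.ZipFile(location)
        # ZipFile.open() returns a binary, read-only file-like object; it
        # keeps the archive open until the member itself is closed.
        member = archive.open(egginfo_name + '/' + path)
        if binary:
            return member
        return io.TextIOWrapper(member, encoding='utf-8')
    full_path = os.path.join(location, egginfo_name, *path.split('/'))
    if binary:
        return open(full_path, 'rb')
    return open(full_path, 'r', encoding='utf-8')
```

Callers should only rely on the file-like protocol (`read()`, iteration, `close()`), since the zip case never yields a real file object.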
The idea of this API is to find out if a distribution "owns" a file, i.e. is the only distribution that uses it, so that it can be safely removed.
Is there any reason to have this method while file_users(path) already exists?
It's just a helper for uninstallers, but file_users() could probably be enough; I can remove owner() if people find it confusing.
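The relationship between the two methods, and the ambiguity Antoine points out, can be sketched in a few lines of Python 3. Here `installed` is a hypothetical mapping of distribution name to the set of paths listed in its RECORD, standing in for the real Distribution objects; the `roman-fork` entry is made up.

```python
def file_users(path, installed):
    """Yield names of distributions whose RECORD lists `path`."""
    for name, paths in installed.items():
        if path in paths:
            yield name


def owner(path, installed):
    """Return the only distribution using `path`, else None."""
    users = list(file_users(path, installed))
    return users[0] if len(users) == 1 else None


installed = {
    'docutils': {'docutils/__init__.py', 'roman.py'},
    'roman-fork': {'roman.py'},
}
print(owner('docutils/__init__.py', installed))  # docutils
print(owner('roman.py', installed))    # None: two distributions use it
print(owner('missing.py', installed))  # None: no distribution uses it
```

Since `owner()` collapses "shared" and "unknown" into the same None, an uninstaller that needs to tell them apart would call `file_users()` directly, which supports the idea that the latter is enough.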
I'll add that.
That's DistributionDirectories, one more instance I forgot to rename. I'll fix that.
Right, I'll fix that.
Right, it used to be get_*; that's a typo. I'll fix it.
No, I didn't make it available, to avoid concurrency problems.
Fixing it too.
Thanks for your work!
Thanks for the feedback, I'll commit the fixes asap. Tarek
A WorkingSet and a DistributionDirectories (or whatever it gets renamed to) are different things, though, no? A WorkingSet is "a collection of active distributions", where each distribution might come from a different distribution directory:

http://peak.telecommunity.com/DevCenter/PkgResources#workingset-objects

Whereas DistributionDirectories is a dictionary of locations where distributions are installed. The WorkingSet may be composed of distributions from several different locations, and each location may contain the same or different versions of the same distribution (as far as I understand things...).

I can't really think of a better name for a dict of distribution locations... but then I'm not averse to a pluralized class name.

Overall, though, I think PEP 376 is starting to look very good!
On Tue, Jun 23, 2009 at 3:41 AM, Kevin Teague<kevin@bud.ca> wrote:
DistributionDirectories can contain directories that are not located in the same parent directory, so I find it rather similar, except that the "active" feature doesn't exist in Python (yet).

In any case, maybe picking a name that is not from setuptools will be less confusing for people who use WorkingSet classes nowadays.

What about using the same names as Python's site module? "sitedir" is the name it uses for what we named DistributionDirectory. So what about:

DistributionDirectory -> SiteDir
DistributionDirectories -> SiteDirMap

++
Tarek
Tarek Ziadé wrote:
'site' has too many connections to existing concepts for my liking (site.py, sitesetup.py, site-packages). Something like DistributionDirectoryMap should cover it.

You could probably get away with shortening "Directory" to "Dir" in the class names, though:

- Distribution
- ZippedDistribution
- DistributionDir
- ZippedDistributionDir
- DistributionDirMap

(Shortening Distribution to Dist might also be a possibility, but I don't think that works well for the two basic classes, and if those use the long form then shortening it for the *Dir and *DirMap classes would just look odd.)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
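A rough Python 3 skeleton of how the shortened names might fit together, combined with Antoine's earlier load(*paths) suggestion. Everything here is hypothetical illustration: only plain directories are handled (no zipped variant), and whether insertion order should matter is left open, as in the thread.

```python
import os


class DistributionDir(object):
    """Hypothetical stand-in for one location holding installed
    distributions (plain directories only; no zipped variant here)."""

    def __init__(self, path):
        self.path = path
        # Names of the .egg-info directories found directly in this location.
        self.egginfo_names = sorted(
            name for name in os.listdir(path)
            if name.endswith('.egg-info')
            and os.path.isdir(os.path.join(path, name)))


class DistributionDirMap(dict):
    """Hypothetical dict mapping a location path to its DistributionDir."""

    def load(self, *paths):
        # One method for one or many paths; already-loaded paths are kept.
        for path in paths:
            if path not in self:
                self[path] = DistributionDir(path)
```

Subclassing `dict` matches the PEP text's description of a mapping that offers extra methods "besides the ones from ``dict``".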
On 2009-06-22 at 14:23, Tarek Ziadé wrote:
Hello.

I have read the current version of the PEP from trunk and would like to make some comments about it.

The .egg-info as a directory
============================

At our company, after much fiddling with the current distutils and setuptools, we have developed a solution that enables us to serve pseudo-eggs. What I mean by that is that by plugging into the PEP 302 infrastructure, we were able to make non-filesystem-based repositories behave like `.egg` files. There are a couple of use cases for that implementation, the most important ones being:

* being able to deploy encrypted and sealed blobs with a binary loader, to discourage users from fiddling with the source code
* being able to serve modules from a database back-end instead of the filesystem

This solution depends on the `EGG-INFO` being fetched from the `.egg` itself. We wouldn't want to use a static filesystem-based folder, because it breaks the encryption use case by revealing the `SOURCES.txt` information. It also breaks the database back-end use case by presenting static versioning information, whereas the whole point of using a database-backed "egg" is to have the newest version installed at all times.

Moreover, the proposed ``egginfo_dirname()`` routine is a step back from the ``pkg_resources`` approach, where we don't force resources to reside on a traditional filesystem.

The uninstall feature
=====================

This is a great feature, but I don't understand why there seems to be a consensus here that it's good for it not to even warn about dependencies, let alone manage them. ``easy_install`` and co. do manage dependencies while installing, so why shouldn't uninstalling be symmetrical in that regard? Moreover, wouldn't dependency management be useful when using ``easy_install`` to *upgrade* an installation of a package?

The second issue is that while ``python -m distutils.uninstall`` is the preferred way to go, I would argue that enabling an ``uninstall`` entry point in ``setup.py`` is desirable as well, if only to retain an intuitive, symmetrical system where I can do ``python setup.py install`` and ``python setup.py uninstall``. I'm not forced to download the sources just to uninstall the package (``distutils.uninstall`` covers that use case), but if I do have the source code available, it would feel very natural to use the ``setup.py`` provided. There already is the precedent of ``setup.py develop --uninstall``. Having a ``setup.py uninstall`` could handle this use case as well.

More high-level remarks
=======================

This probably isn't the best place to cover this, but it may serve as a rationale for the .egg-info problems we have with the proposed changes, described above.

It's great that you push the metadata functionality to core distutils. At the same time, it would be feasible to rethink the whole `egg` concept. We would argue that having the `egg` format as a container more or less agnostic to the underlying data storage structure would be **very** useful. Imagine installable `egg` support packages which would enable you to:

* use other compression formats besides ZIP
* use other means of module storage besides the filesystem
* use sealed and encrypted archives for deployment
* use programmatic `egg`s for licensing, etc.
* ``easy_install`` from different protocols, archive formats, etc.

Decoupling the `egg` format from the concrete "zip-based" or "directory-based" implementations is a step forward in that direction. In that regard, the solution PEP 376 proposes doesn't ultimately solve the issues we're having.

Thanks for your time.

-- 
Best regards,
Łukasz Langa
Senior Developer
STX-Next Sp. z o.o.
tel: +48 791 080 144
skype: lukaszlanga
2009/7/3 Łukasz Langa <lukasz.langa@stxnext.pl>:
This is a good point. Distutils only installs files in the filesystem; it has no facilities for installing packages based on any other sort of PEP 302 based importer. Hence, PEP 376 in principle should only relate to filesystem-based distributions. But it also mentions zipfile-based distributions:

"Notice that the API is organized in five classes that work with directories and Zip files (so it works with files included in Zip files, see PEP 273 for more details [8])."

This is wrong. The PEP should either

(1) restrict itself to filesystem-based implementations (leaving the problem of other PEP 302 loaders to systems that manage these), or
(2) be defined in a sufficiently general way that it can be implemented for *any* PEP 302 based loader, which probably means extending the PEP 302 protocols, and supplying zipfile functions as an example of how this is used.

I believe that (1) is unlikely to be sufficient for real-world use. Zip files (eggs, py2exe embedded modules, etc.) are far too important a real-world use case to ignore. The problem with (2) is that it requires significant extra work. But special-casing zip files (as the present implementation appears to do) will break as soon as any other PEP 302 compliant format becomes popular.
On the other hand, pkgutil.get_data is the standard library means of reading resources from a package. This is PEP 302 compliant now. This new PEP doesn't affect that.
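A minimal Python 3 illustration of that point: `pkgutil.get_data()` asks the package's PEP 302 loader for the bytes, so the same call works for a package imported from a zip archive. The package name `demo_pkg376` and its data file are made up for the example, and the snippet inserts the archive into `sys.path`.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a throwaway zip holding a package and a data file (made-up names).
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, 'demo.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('demo_pkg376/__init__.py', '')
    zf.writestr('demo_pkg376/data.txt', 'hello from a zip\n')

# zipimport handles the archive once it is on sys.path.
sys.path.insert(0, archive)

# get_data() goes through the package's loader (here a zipimporter), so
# nothing in this call assumes the resource lives on the filesystem.
data = pkgutil.get_data('demo_pkg376', 'data.txt')
print(data)
```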
easy_install is not a standard library feature - the fact that it has no uninstall facility is not relevant. python setup.py install is the standard library feature. The new uninstall is intended to correspond to that.
PEP 302 gives you (in theory) all of this now. I'd recommend you look at it - and at Brett's importlib, which will make experimenting with such things far easier. What PEP 302 doesn't provide is package management. But Python itself doesn't provide package management, except in the form of distutils setup.py install - which is solely filesystem based. Maybe there's a case for extending PEP 302 and distutils to allow integrated management of other forms of importer format, but that's a huge other project, which no-one to my knowledge is even looking at.
* ``easy_install`` from different protocols, archive formats, etc.
easy_install is not a standard library feature, so is not relevant here.
Eggs are fundamentally a PEP 302 zip file format. There are some extra bits of metadata for setuptools/easy_install in there (as I understand things), but essentially they are zip files. When you say "decoupling the egg format", I assume you mean "decoupling the egg metadata", which is fine, but to properly decouple, you need API-level access to the metadata. PEP 376 offers read-only access, but as you rightly point out, it is only for filesystem data (and some form of zip file, which appears to be limited in some way, as it isn't PEP 302 based, and the actual format isn't defined anywhere).

The basic point here is that PEP 376 needs to define precisely how pkgutil.get_distributions() scans sys.path looking for ".egg-info directories". What does it do for sys.path entries that don't correspond to filesystem directories? (Note: these may or may not be zip files. Even if they are zip files, an earlier entry on sys.path_hooks could have taken precedence. At the very least, you should only process path entries as zip files if their importer, in sys.path_importer_cache or via an explicit path hook scan, is a zipimporter object.)

To be honest, this is a major can of worms. But if PEP 376 is not going to support PEP 302, then it must state that fact explicitly, to avoid giving people false expectations, particularly with Brett's importlib in Python 3.1, which will make it far easier for people to experiment with new packaging formats such as the ones Lukasz mentions above. And it MUST fail gracefully in the face of unsupported importer types.

Paul.
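A minimal Python 3 sketch of the rule suggested here: classify each sys.path entry by the importer the import system would actually use, and skip anything that is neither a standard directory finder nor a zipimporter. It relies on `pkgutil.get_importer()`, which consults `sys.path_importer_cache` and otherwise scans `sys.path_hooks` (caching any importer it finds). `classify_path_entry` and its return values are hypothetical.

```python
import importlib.machinery
import os
import pkgutil
import zipimport


def classify_path_entry(entry):
    """Decide how (or whether) to scan a sys.path entry for .egg-info data.

    Hypothetical helper: a zip file is only treated as such if a
    zipimporter actually handles the entry, so an earlier custom path hook
    takes precedence, as it would during import.
    """
    importer = pkgutil.get_importer(entry)
    if isinstance(importer, zipimport.zipimporter):
        return 'zip'
    if (isinstance(importer, importlib.machinery.FileFinder)
            and os.path.isdir(entry)):
        return 'dir'
    # Custom PEP 302 importers, or entries no hook accepts: skip gracefully.
    return 'unsupported'
```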
Paul Moore wrote:
importlib is in 2.7 as well.

I agree that even if the reference implementation of PEP 376 only handles normal directories and zipfiles, the PEP itself needs to explain how the developer of a custom PEP 302 importer or loader can hook into the mechanism in order to provide metadata that distutils can understand. Or, as you suggest, state explicitly that the proposal at this stage is specifically limited to filesystem and zipfile packages, and defer extension to arbitrary PEP 302 importers and loaders to a later PEP.

To be honest, I'd personally be OK with the latter strategy: while other PEP 302 loaders and importers do exist (as Lukasz's post shows), it would be unfortunate to unduly hold up improved installation metadata for the vast majority of typical use cases while we try to figure out ways to support the more esoteric use cases. Supporting both would obviously be better, but given the choice between the status quo and partial support, I believe this is a case where partial support would still be worthwhile.

I will note (and I believe this is also the main point that Lukasz was making) that having the distribution metadata outside the distribution, as currently proposed in PEP 376, is going to make any eventual PEP 302 integration much harder. 302 importers only provide a mechanism for accessing files inside the distribution, not "adjacent" to them, so the mechanism in the PEP doesn't generalise properly. I suspect this limitation of the PEP 302 APIs is the origin of the setuptools format that embeds the metadata inside the distribution: it lets you get at the metadata without having to assume that it exists directly on the filesystem anywhere.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Jul 3, 2009 at 4:28 PM, Nick Coghlan<ncoghlan@gmail.com> wrote:
Maybe we could rework the pkgutil classes for PEP 376 so they look like an implementation of the PEP 302 protocol for directories and zip files, with an extra "get_metadata()" API and state that it could be an extension for PEP 302 later.
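A rough Python 3 sketch of what such an extension might look like, assuming an importer exposes the optional PEP 302 `get_data()` plus a proposed `get_metadata()` method. `DictImporter` is a hypothetical in-memory backend, standing in for something like the database-backed eggs described earlier, and `read_pkg_info` shows a caller that skips importers lacking the extension.

```python
class DictImporter(object):
    """Hypothetical non-filesystem importer storing files in a dict keyed by
    '/'-separated paths (think of a database-backed "egg")."""

    def __init__(self, files):
        self._files = dict(files)

    def get_data(self, path):
        # PEP 302's optional get_data(): raw bytes for `path`, or IOError
        # (an alias of OSError in Python 3) if it does not exist.
        try:
            return self._files[path]
        except KeyError:
            raise OSError(path)

    def get_metadata(self, dist_dir, name):
        # The *proposed* extension: read PKG-INFO, RECORD, etc. from a
        # distribution's .egg-info directory without touching the filesystem.
        return self.get_data(dist_dir + '/' + name).decode('utf-8')


def read_pkg_info(importer, dist_dir):
    """Return PKG-INFO text, or None if the importer lacks the extension."""
    get_metadata = getattr(importer, 'get_metadata', None)
    if get_metadata is None:
        return None  # unsupported importer type: fail gracefully
    return get_metadata(dist_dir, 'PKG-INFO')
```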
2009/7/3 Tarek Ziadé <ziade.tarek@gmail.com>:
That sounds like an excellent approach. And given that PEP 302 is nothing more than an API spec, it's nice and easy to extend PEP 302 to say that importers can/should implement this (after all, the only 2 "real" cases in the wild are done, via PEP 376).

Warning: I've not thought all this through fully, so I may be missing some subtleties! The ultimate spec needs to be clearly worded, and as a consequence of this discussion it occurs to me that the current PEP 376 is very far from clear in defining exactly what the naming, layout and location of egg-info directories are. It makes a lot of assumptions that may break for more general path importers.

I'll do some thinking, and maybe come up with some examples of PEP 302 edge cases, so that we can at least be sure that they've not been ignored (rejected as too stupid to care about, on the other hand, I'm happy with :-))

Paul.
2009/7/3 Nick Coghlan <ncoghlan@gmail.com>:
You put it more clearly than I did. That's basically what I think, with the one proviso that we should make sure that PEP 376 doesn't specify something that out and out breaks the more esoteric PEP 302 cases. When Just and I were developing PEP 302, we found that the best way to do that was to leave anything that didn't need to be specified, as unspecified (hence the fact that there's so little defined in PEP 302!). It's easier to add things later than to remove or change them. That's why I was recommending to Tarek that he take out of the PEP any details about classes or APIs that couldn't be directly obtained from the core API. The same applies here (just talking in terms of a duck-typed notional Distribution class, allows people to implement their own URLDistribution or SqliteDistribution, or whatever, and not have to emulate any more of a filesystem than necessary).
Again, agreed. But remember that PEP 302 is driven by looking up a module or package name. PEP 376 is looking up a *distribution* name. The docutils example in the PEP shows this:

- docutils/
- roman.py
- docutils-0.5-py2.6.egg-info/

There are 3 things here: a package (docutils), a module (roman) and a distribution (also named docutils, but it could be called George for all it matters). So none of the PEP 302 lookup mechanisms (which work on package/module name) apply. The need for a separate concept (a "distribution") is unfortunate, as it adds complexity, but there are enough real-life cases that make it clear that it's necessary.

Hmm, I suspect that the implication here is that PEP 302 could do with an overhaul, to extend it to encompass the concept of a distribution. I'd be willing to have a look at that.

Paul.
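The mismatch between distribution names and importable names can be shown with the docutils example. A minimal Python 3 sketch, assuming a distribution's relative, '/'-separated RECORD paths are available; `top_level_names` is a hypothetical helper that only recognises package directories and plain `.py` modules.

```python
def top_level_names(record_paths):
    """Return the importable top-level names (packages and modules) that a
    distribution installs, given its relative '/'-separated RECORD paths.

    Hypothetical helper: ignores the .egg-info directory itself and only
    recognises package directories and plain .py modules.
    """
    names = set()
    for path in record_paths:
        first = path.split('/', 1)[0]
        if first.endswith('.egg-info'):
            continue
        if '/' in path:
            names.add(first)  # a package directory
        elif first.endswith('.py'):
            names.add(first[:-3])  # a top-level module
    return names


# The docutils example: one distribution, one package and one module.
record = ['docutils/__init__.py', 'roman.py',
          'docutils-0.5-py2.6.egg-info/PKG-INFO']
print(sorted(top_level_names(record)))  # ['docutils', 'roman']
```

Going from a module name back to its distribution therefore needs a reverse index over every RECORD, which no PEP 302 lookup provides.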
At 12:28 AM 7/4/2009 +1000, Nick Coghlan wrote:
I think you have this backwards; it's setuptools that doesn't care where (or whether) the metadata exists on the file system; it delegates metadata operations to a "metadata provider" that's usually an adapter over a PEP 302 "loader". See http://peak.telecommunity.com/DevCenter/PkgResources#supporting-custom-impor... for the API details of how to register support for arbitrary PEP 302 importers and loaders. (Which presumably, Lukasz is using. I didn't know that anybody was actually using it, but it's nice to know that the documentation is apparently sufficient for *some* people. ;-) )
2009/7/3 Paul Moore <p.f.moore@gmail.com>:
Right. While it would be feasible to make pkgutil work with PEP 302 loaders, we would still need to define the content of the RECORD files in a generic way. Right now it works for directories and zipped files, since it's expressed with '/'-separated paths. And if I understand PEP 302 correctly, any backend would be able to handle these paths no matter how they are stored, as long as it implements APIs like get_data().
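A minimal Python 3 sketch of that idea for the zip case: `zipimport.zipimporter.get_data()` accepts archive-relative names, so the '/'-separated paths stored in RECORD can be fed straight back to the importer. The archive, its files and the RECORD row are made up for the example.

```python
import csv
import hashlib
import os
import tempfile
import zipfile
import zipimport

# A throwaway zipped site-packages holding an .egg-info directory
# (file names and contents are made up for this example).
archive = os.path.join(tempfile.mkdtemp(), 'site-packages.zip')
with zipfile.ZipFile(archive, 'w') as zf:
    zf.writestr('zlib/__init__.py', '')
    zf.writestr('zlib-2.5.2.egg-info/RECORD',
                'zlib/__init__.py,d41d8cd98f00b204e9800998ecf8427e,0\r\n')

importer = zipimport.zipimporter(archive)
# RECORD itself is read through the importer, as bytes.
record = importer.get_data('zlib-2.5.2.egg-info/RECORD').decode('utf-8')

for path, md5_hex, size in csv.reader(record.splitlines()):
    # Each '/'-separated RECORD path is passed straight to get_data().
    data = importer.get_data(path)
    print(path, hashlib.md5(data).hexdigest() == md5_hex, len(data) == int(size))
```

Any other backend that implements `get_data()` for the same names could be checked in the same way, which is the property the reply relies on.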
That sounds like a fully-featured package management system, which is, imho, out of scope. And I don't see PEP 376 making it impossible for someone to build such a packaging system on top of distutils. I've started one myself for the sake of experimentation, with built-in multiversion support, as a full replacement for site-packages.
Also, PEP 376's goal is to define a single standard location of egg-info files for filesystem data. The zip form was built so it could work with zipped site-packages directories, like the ones the py2app project uses.
I'll add more details on that part. Right now it visits directories and zip files.

Tarek
![](https://secure.gravatar.com/avatar/db5f70d2f2520ef725839f046bdc32fb.jpg?s=120&d=mm&r=g)
Hello, Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
so I am mailing it here again, for a new round of feedback, if needed.
- the pep : http://svn.python.org/projects/peps/trunk/pep-0376.txt
Some comments: - the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo` generated files will not have a hash. Why the exception for pyc and pyo files? - `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file paths are relative to it. Is it a general rule? That is, the paths in RECORD are always relative to its grandparent directory? The RECORD format ----------------- The `RECORD` file is a CSV file, composed of records, one line per installed file. The ``csv`` module is used to read the file, with the `excel` dialect, which uses these options to read the file: - field delimiter : `,` - quoting char : `"`. - line terminator : `\r\n` Wouldn't it be better to use the native line terminator on the current platform? (someone might want to edit or at least view the file) What is the character encoding for non-ASCII filenames? UTF-8? Are the RECORD file's contents somehow part of the DistributionMetadata? - ``DistributionDirectories``: manages ``EggInfoDirectory`` instances. What is an EggInfoDirectory ? A plural class name looks strange (I think it's the first time I see one in the CPython codebase). How about another name? (DistributionPool, DistributionMap, WorkingSet etc.). - ``get_egginfo_file(path, binary=False)`` -> file object Returns a file located under the `.egg-info` directory. Returns a ``file`` instance for the file pointed by ``path``. Is it always a file instance or just a file-like object? (zipped distributions come to mind). Is it opened read-only? - ``owner(path)`` -> ``Distribution`` instance or None If ``path`` is used by only one ``Distribution`` instance, returns it. Otherwise returns None. This is a bit confusing. If it returns None, it doesn't distinguish between the case where several Distributions refer to the path, and the case where no distributions refer to it, does it? Is there any reason to have this method while file_users(path) already exists? 
A new class called ``DistributionDirectories`` is created. It's a collection of ``DistributionDirectory`` and ``ZippedDistributionDirectory`` instances. The constructor takes one optional argument ``use_cache`` set to ``True`` by default. You forgot to describe the constructor's signature and what it does exactly. ``EggInfoDirectories`` also provides the following methods besides the ones from ``dict``:: What is EggInfoDirectories? - ``append(path)`` Creates an ``DistributionDirectory`` (or ``ZippedDistributionDirectory``) instance for ``path`` and adds it in the mapping. - ``load(paths)`` Creates and adds ``DistributionDirectory`` (or ``ZippedDistributionDirectory``) instances corresponding to ``paths``. Why are these methods named completely differently although they do almost the same thing? Besides, append() makes it look like ordering matters. Does it? (for a naming suggestion, e.g. load(path) and load_all(paths). Or, even simpler, load(*paths) or load(paths)) - ``get_file_users(path)`` -> Iterator of ``Distribution`` (or ``ZippedDistribution``) instances. This method is named file_users in another class. Perhaps the naming should be consistent? All these functions use the same global instance of ``DistributionDirectories`` to use the cache. Is the global instance available to users? >>> for path, hash, size in dist.get_installed_files():: ... print '%s %s %d %s' % (path, hash, size) There's one too many "%s" here. Thanks for your work! Antoine.
![](https://secure.gravatar.com/avatar/5e5142d6a1a578f02e2d94c4d6d31088.jpg?s=120&d=mm&r=g)
On Mon, Jun 22, 2009 at 4:59 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
As in PEP 262, since they are produced automatically from a py file, checking the py file is enough to decide if the file has changed.
no, they can be located anywhere on the system. But when the paths are located in the same directory where the .egg-info directory is located, a relative path is used. (see the section before this example) I'll add an example that contains files located elswhere in the system (like a script and a data file)
Good idea, I'll change that,
What is the character encoding for non-ASCII filenames? UTF-8?
Yes, there's a constant in Distutils, called PKG_INFO_ENCODING that will be used for the file generation.
Are the RECORD file's contents somehow part of the DistributionMetadata?
The DistributionMetadata correspond to the fields defined in PEP 345, e.g. written in the PKG-INFO file, which is mentioned in the RECORD file. We are reworking PEP 345 as well, to add some fields. What did you have in mind ?
- ``DistributionDirectories``: manages ``EggInfoDirectory`` instances.
What is an EggInfoDirectory ?
Typo (old name), fixing this..
Sure, WorkingSet is nice, it's the name used in setuptools,
It's in read-only mode, either "r" either "rb" and in case of a zip file, it returns a file-like object using zipfile.ZipFile.open.
The idea of this API is to find out of a distribution "owns" a file, e.g. is the only distribution that uses it, so it can be safely removed.
Is there any reason to have this method while file_users(path) already exists?
Its just a helper for uninstallers, but file_users() could probably be enough, I can remove "owns" if people find it confusing,
I'll add that,
This is a DistributionDirectories, one more instance I forgot to rename, I'll fix that
Right, I'll fix that,
Right, it used to be get_*, that's a typo. I'll fix it,
No I didn't made it available to avoid concurrency problems,
Fixing it too,
Thanks for your work!
Thanks for the feedback, I'll commit the fixes asap. Tarek
![](https://secure.gravatar.com/avatar/35aa6fee222660ce1382d45a7a9a92fd.jpg?s=120&d=mm&r=g)
A WorkingSet and a DistributionDirectories (or whatever it gets named to) are different things though, no? A WorkingSet is "a collection of active distributions", where each distribution might come from different distribution directories: http://peak.telecommunity.com/DevCenter/PkgResources#workingset-objects Where as DistributionDirectories is a dictionary of locations where distributions are installed. The WorkingSet may be comprised of distributions from several different locations, and each location may contain the same or different versions of the same distribution. (as far as I understand things ...) I can't really think of a better name for a dict of distribution locations ... but then I'm not averse to a pluralized class name. Overall though, I think PEP 376 is starting to look very good!
![](https://secure.gravatar.com/avatar/5e5142d6a1a578f02e2d94c4d6d31088.jpg?s=120&d=mm&r=g)
On Tue, Jun 23, 2009 at 3:41 AM, Kevin Teague<kevin@bud.ca> wrote:
DistributionDirectories can contain directories that are not located in the same parent directory, so I find it rather similar besides the "active" feature in Python doesn't exist (yet) In any case, maybe picking up a name that is not from setuptools will be less confusing for people that uses WorkingSet classes nowadays. What about using the same names used in Python's site module: "sitedir" is the name used for a directory we named DistributionDirectory. So what about : DistributionDirectory -> SiteDir DistributionDirectories -> SiteDirMap ++ Tarek
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
Tarek Ziadé wrote:
'site' has too many connections to existing concepts for my liking (site.py, sitesetup.py, site-packages). Something like DistributionDirectoryMap should cover it. You could probably get away with shortening "Directory" to "Dir" in the class names though: - Distribution - ZippedDistribution - DistributionDir - ZippedDistributionDir - DistributionDirMap (Shortening Distribution to Dist might also be a possibility, but I don't think that works well for the two basic classes, and if those use the long form then shortening it for the *Dir and *DirMap classes would just look odd) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
![](https://secure.gravatar.com/avatar/2ee41cc7bcaacf6fcdcb7a2269e97b86.jpg?s=120&d=mm&r=g)
On 2009-06-22 at 14:23, Tarek Ziadé wrote:
Hello. I have read the current version of the PEP from trunk and would like to make some comments about it.

The .egg-info as a directory
============================

At our company, after much fiddling with the current distutils and setuptools, we have developed a solution that enables us to serve pseudo-eggs. What I mean by that is that by plugging into the PEP 302 infrastructure, we were able to make non-filesystem-based repositories behave like `.egg` files. There are a couple of use cases for that implementation, the most important ones being:

* being able to deploy encrypted and sealed blobs with a binary loader, to discourage users from fiddling with the source code
* being able to serve modules from a database back-end instead of the filesystem

This solution depends on the `EGG-INFO` being fetched from the `.egg` itself. We wouldn't want to use a static filesystem-based folder because it breaks the encryption use case by revealing the `SOURCES.txt` information. It also breaks the database back-end use case by presenting static versioning information, whereas the whole point of using a database-backed "egg" is to have the newest version installed at all times. Moreover, the proposed ``egginfo_dirname()`` routine is a step back from the ``pkg_resources`` approach, where we don't require resources to reside on a traditional filesystem.

The uninstall feature
=====================

This is a great feature, but I don't understand why there seems to be a consensus here that it's good for it not to even warn about dependencies, let alone manage them. ``easy_install`` and co. do manage dependencies while installing, so why shouldn't uninstall be symmetrical in that regard? Moreover, wouldn't dependency management be useful while using ``easy_install`` to *upgrade* an installation of a package?
The second issue is that while ``python -m distutils.uninstall`` is the preferred way to go, I would argue that enabling an ``uninstall`` entry point in ``setup.py`` is desirable as well, if only to retain an intuitive, symmetrical system where I can do both ``python setup.py install`` and ``python setup.py uninstall``. I'm not forced to download the sources just to uninstall the package (``distutils.uninstall`` covers this use case), but if I do have the source code available, it would feel very natural to use the ``setup.py`` provided. There already is the precedent of ``setup.py develop --uninstall``; a ``setup.py uninstall`` could handle this use case as well.

More high-level remarks
=======================

This probably isn't the best place to cover this, but it may serve as a rationale for the .egg-info problems we have with the proposed changes. It's great that you are pushing the metadata functionality into core distutils. At the same time, it would be worth rethinking the whole `egg` concept. We would argue that having the `egg` format as a container more or less agnostic to the underlying data storage structure would be **very** useful. Imagine installable `egg` support packages which would enable you to:

* use other compression formats besides ZIP
* use other means of module storage besides the filesystem
* use sealed and encrypted archives for deployment
* use programmatic `egg`s for licensing, etc.
* ``easy_install`` from different protocols, archive formats, etc.

Decoupling the `egg` format from the concrete "zip-based" or "directory-based" implementations is a step forward in that direction. In that regard, the solution PEP 376 proposes doesn't ultimately solve the issues we're having.

Thanks for your time.

--
Best regards,
Łukasz Langa
Senior Developer
STX-Next Sp. z o.o.
tel: +48 791 080 144
skype: lukaszlanga
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
2009/7/3 Łukasz Langa <lukasz.langa@stxnext.pl>:
This is a good point. Distutils only installs files in the filesystem - it has no facilities for installing packages based on any other sort of PEP 302 based importer. Hence, PEP 376 in principle should only relate to filesystem-based distributions. But it also mentions zipfile-based distributions:

"Notice that the API is organized in five classes that work with directories and Zip files (so it works with files included in Zip files, see PEP 273 for more details [8])."

This is wrong. The PEP should either (1) restrict itself to filesystem-based implementations (leaving the problem of other PEP 302 loaders to systems that manage these) or (2) be defined in a sufficiently general way that it can be implemented for *any* PEP 302 based loader - which probably means extending the PEP 302 protocols - and supply zipfile functions as an example of how this is used.

I believe that (1) is unlikely to be sufficient for real-world use. Zip files (eggs, py2exe embedded modules, etc.) are far too important a real-world use case to ignore. The problem with (2) is that it requires significant extra work. But special-casing zip files (as the present implementation appears to do) will break as soon as any other PEP 302 compliant format becomes popular.
On the other hand, pkgutil.get_data is the standard library means of reading resources from a package. This is PEP 302 compliant now. This new PEP doesn't affect that.
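Paul's point that `pkgutil.get_data()` already works through PEP 302 loaders can be checked with a small experiment. The sketch below builds a throwaway zip archive holding a made-up package `demo_pkg` and reads a data file from it; the archive and package names are hypothetical, and only standard library calls are used.

```python
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path


def make_zipped_package(directory):
    """Write bundle.zip containing a package 'demo_pkg' with one data file."""
    archive = Path(directory) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "")
        zf.writestr("demo_pkg/data.txt", "hello from a zip")
    return str(archive)


with tempfile.TemporaryDirectory() as tmp:
    archive = make_zipped_package(tmp)
    sys.path.insert(0, archive)
    try:
        # get_data() asks the package's PEP 302 loader (a zipimporter here)
        # for the bytes; no real file called data.txt exists on disk.
        data = pkgutil.get_data("demo_pkg", "data.txt")
    finally:
        # Undo the changes to the import system's global state.
        sys.path.remove(archive)
        sys.path_importer_cache.pop(archive, None)
        sys.modules.pop("demo_pkg", None)

print(data)  # b'hello from a zip'
```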
easy_install is not a standard library feature - the fact that it has no uninstall facility is not relevant. python setup.py install is the standard library feature. The new uninstall is intended to correspond to that.
PEP 302 gives you (in theory) all of this now. I'd recommend you look at it - and at Brett's importlib, which will make experimenting with such things far easier. What PEP 302 doesn't provide is package management. But Python itself doesn't provide package management, except in the form of distutils setup.py install - which is solely filesystem based. Maybe there's a case for extending PEP 302 and distutils to allow integrated management of other forms of importer format, but that's a huge other project, which no-one to my knowledge is even looking at.
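As a rough illustration of what PEP 302 (via importlib) already allows, the sketch below installs a meta path finder that serves module source from an in-memory dict. The dict stands in for Łukasz's database back-end; `DictSourceFinder` and `db_module` are hypothetical names, and the code uses importlib's modern `find_spec()`/`exec_module()` protocol rather than the original `find_module()`/`load_module()` methods.

```python
import importlib.abc
import importlib.util
import sys


class DictSourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve module source from a dict instead of the filesystem.

    The dict is a hypothetical stand-in for a database back-end; a real
    implementation would fetch (and perhaps decrypt) the source instead.
    """

    def __init__(self, sources):
        self.sources = sources  # module name -> source text

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours: let the next finder on sys.meta_path try
        return importlib.util.spec_from_loader(
            fullname, self, origin=f"dict:{fullname}", is_package=False
        )

    def create_module(self, spec):
        return None  # use the import system's default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


finder = DictSourceFinder({"db_module": "GREETING = 'served from a dict'\n"})
sys.meta_path.insert(0, finder)
try:
    import db_module
finally:
    sys.meta_path.remove(finder)
    sys.modules.pop("db_module", None)

print(db_module.GREETING)  # served from a dict
```

This covers only the importing side; as Paul says, nothing here records or manages installation metadata.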
* ``easy_install`` from different protocols, archive formats, etc.
easy_install is not a standard library feature, so is not relevant here.
Eggs are fundamentally a PEP 302 zip file format. There are some extra bits of metadata for setuptools/easy_install in there (as I understand things), but essentially they are zip files. When you say "decoupling the egg format", I assume you mean "decoupling the egg metadata" - which is fine, but to properly decouple, you need API-level access to the metadata. PEP 376 offers read-only access, but as you rightly point out, it is only for filesystem data (and some form of zip file, which appears to be limited in some way, as it isn't PEP 302 based, and the actual format isn't defined anywhere).

The basic point here is that PEP 376 needs to define precisely how pkgutil.get_distributions() scans sys.path looking for ".egg-info directories". What does it do for sys.path entries that don't correspond to filesystem directories? (Note: these may or may not be zip files. Even if they are zip files, an earlier entry on sys.path_hooks could have taken precedence. At the very least, you should only process path entries as zip files if their importer, found in sys.path_importer_cache or via an explicit path hook scan, is a zipimporter object.)

To be honest, this is a major can of worms. But if PEP 376 is not going to support PEP 302, then it must state that fact explicitly, to avoid giving people false expectations - particularly with Brett's importlib in Python 3.1, which will make it far easier for people to experiment with new packaging formats such as the ones Lukasz mentions above. And it MUST fail gracefully in the face of unsupported importer types.

Paul.
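Paul's rule about path entries can be sketched as follows: look up each entry's importer in `sys.path_importer_cache`, or scan `sys.path_hooks` the way the import system does, and treat the entry as a zip file only if that importer really is a `zipimport.zipimporter`. The function names and the three category strings are made up for illustration; returning "unsupported" is one way to fail gracefully.

```python
import sys
import zipimport
from importlib.machinery import FileFinder


def importer_for(path_entry):
    """Return the PEP 302 importer for a sys.path entry, or None.

    Consult sys.path_importer_cache first, then try each hook in
    sys.path_hooks in order (a hook signals "not mine" with ImportError).
    """
    if path_entry in sys.path_importer_cache:
        return sys.path_importer_cache[path_entry]  # may itself be None
    for hook in sys.path_hooks:
        try:
            return hook(path_entry)
        except ImportError:
            continue
    return None


def classify_path_entry(path_entry):
    """Decide how a metadata scanner should treat one sys.path entry."""
    importer = importer_for(path_entry)
    if isinstance(importer, zipimport.zipimporter):
        return "zip"
    if isinstance(importer, FileFinder):
        return "directory"
    # Any other importer (or none at all): skip it instead of guessing its layout.
    return "unsupported"


for entry in sys.path:
    print(classify_path_entry(entry), entry)
```

Unlike the real import machinery, this sketch does not store newly created importers in `sys.path_importer_cache`.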
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
Paul Moore wrote:
importlib is in 2.7 as well.

I agree that even if the reference implementation of PEP 376 only handles normal directories and zipfiles, the PEP itself needs to explain how the developer of a custom PEP 302 importer or loader can hook into the mechanism in order to provide metadata that distutils can understand. Or, as you suggest, it should state explicitly that the proposal at this stage is specifically limited to filesystem and zipfile packages, and defer extension to arbitrary PEP 302 importers and loaders to a later PEP.

To be honest, I'd personally be OK with the latter strategy - while other PEP 302 loaders and importers do exist (as Lukasz's post shows), it would be unfortunate to unduly hold up improved installation metadata for the vast majority of typical use cases while we try to figure out ways to support the more esoteric use cases. Supporting both would obviously be better, but given the choice between the status quo and partial support, I believe this is a case where partial support would still be worthwhile.

I will note (and I believe this is also the main point that Lukasz was making) that having the distribution metadata outside the distribution, as currently proposed in PEP 376, is going to make any eventual PEP 302 integration much harder. PEP 302 importers only provide a mechanism for accessing files inside the distribution, not "adjacent" to them, so the mechanism in the PEP doesn't generalise properly. I suspect this limitation of the PEP 302 APIs is the origin of the setuptools format that embeds the metadata inside the distribution: it lets you get at the metadata without having to assume that it exists directly on the filesystem anywhere.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
![](https://secure.gravatar.com/avatar/5e5142d6a1a578f02e2d94c4d6d31088.jpg?s=120&d=mm&r=g)
On Fri, Jul 3, 2009 at 4:28 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Maybe we could rework the pkgutil classes for PEP 376 so they look like an implementation of the PEP 302 protocol for directories and zip files, with an extra "get_metadata()" API and state that it could be an extension for PEP 302 later.
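A minimal sketch of Tarek's suggestion, assuming a distribution installed as a `.egg-info` directory next to its code: the object exposes the PEP 302 `get_data()` method plus an extra `get_metadata()` method. `DirectoryDistributionLoader`, the `get_metadata()` name and signature, and the UTF-8 decoding are all assumptions, not anything the PEP specifies.

```python
import os
import tempfile


class DirectoryDistributionLoader:
    """Hypothetical loader-like object for one installed distribution.

    get_data() mirrors the optional PEP 302 method; get_metadata() is the
    extra method suggested in the thread (its name and signature are assumed).
    """

    def __init__(self, site_dir, egginfo_name):
        self.site_dir = site_dir
        self.egginfo_dir = os.path.join(site_dir, egginfo_name)

    def get_data(self, path):
        # PEP 302 style: return the raw bytes stored at `path`.
        with open(path, "rb") as f:
            return f.read()

    def get_metadata(self, name):
        # Read a file such as PKG-INFO from the .egg-info directory.
        data = self.get_data(os.path.join(self.egginfo_dir, name))
        return data.decode("utf-8")  # encoding is an assumption


with tempfile.TemporaryDirectory() as site:
    egginfo = os.path.join(site, "docutils-0.5-py2.6.egg-info")
    os.mkdir(egginfo)
    with open(os.path.join(egginfo, "PKG-INFO"), "w", encoding="utf-8", newline="\n") as f:
        f.write("Name: docutils\nVersion: 0.5\n")
    loader = DirectoryDistributionLoader(site, "docutils-0.5-py2.6.egg-info")
    pkg_info = loader.get_metadata("PKG-INFO")

print(pkg_info.splitlines()[0])  # Name: docutils
```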
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
2009/7/3 Tarek Ziadé <ziade.tarek@gmail.com>:
That sounds like an excellent approach. And given that PEP 302 is nothing more than an API spec, it's nice and easy to extend PEP 302 to say that importers can/should implement this (after all, the only 2 "real" cases in the wild, directories and zip files, are handled by PEP 376).

Warning: I've not thought all this through fully, so I may be missing some subtleties! The ultimate spec needs to be clearly worded, and as a consequence of this discussion it occurs to me that the current PEP 376 is very far from clear in defining exactly what the naming, layout and location of egg-info directories are - it makes a lot of assumptions that may break for more general path importers.

I'll do some thinking, and maybe come up with some examples of PEP 302 edge cases, so that we can at least be sure that they've not been ignored (rejected as too stupid to care about, on the other hand, I'm happy with :-))

Paul.
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
2009/7/3 Nick Coghlan <ncoghlan@gmail.com>:
You put it more clearly than I did. That's basically what I think, with the one proviso that we should make sure that PEP 376 doesn't specify something that out-and-out breaks the more esoteric PEP 302 cases. When Just and I were developing PEP 302, we found that the best way to do that was to leave anything that didn't need to be specified unspecified (hence the fact that there's so little defined in PEP 302!). It's easier to add things later than to remove or change them.

That's why I was recommending to Tarek that he take out of the PEP any details about classes or APIs that couldn't be directly obtained from the core API. The same applies here: talking only in terms of a duck-typed, notional Distribution class allows people to implement their own URLDistribution or SqliteDistribution, or whatever, without having to emulate any more of a filesystem than necessary.
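To illustrate the duck-typing point, here is a hypothetical `SqliteDistribution` whose metadata lives in a SQLite table. The attribute names (`name`, `version`, `get_metadata()`) and the table layout are assumptions about what a notional Distribution interface might contain; code that relies only on them never needs to know whether a filesystem exists.

```python
import sqlite3


class SqliteDistribution:
    """Hypothetical distribution whose metadata is stored in SQLite.

    It offers the same duck-typed surface assumed for a filesystem-based
    Distribution: a name, a version and a get_metadata() lookup.
    """

    def __init__(self, conn, name):
        self.conn = conn
        self.name = name

    def get_metadata(self, key):
        row = self.conn.execute(
            "SELECT value FROM metadata WHERE dist = ? AND key = ?",
            (self.name, key),
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    @property
    def version(self):
        return self.get_metadata("Version")


def describe(dist):
    # Works for any object with .name and .version, wherever it is stored.
    return f"{dist.name} {dist.version}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (dist TEXT, key TEXT, value TEXT)")
conn.execute("INSERT INTO metadata VALUES ('docutils', 'Version', '0.5')")
print(describe(SqliteDistribution(conn, "docutils")))  # docutils 0.5
```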
Again, agreed. But remember that PEP 302 is driven by looking up a module or package name. PEP 376 is looking up a *distribution* name. The docutils example in the PEP shows this:

- docutils/
- roman.py
- docutils-0.5-py2.6.egg-info/

There are 3 things here: a package (docutils), a module (roman) and a distribution (also named docutils, but it could be called George for all it matters). So none of the PEP 302 lookup mechanisms (which work on package/module names) apply. The need for a separate concept (a "distribution") is unfortunate, as it adds complexity, but there are enough real-life cases that make it clear that it's necessary.

Hmm, I suspect the implication here is that PEP 302 could do with an overhaul, to extend it to encompass the concept of a distribution. I'd be willing to have a look at that.

Paul.
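The gap between distribution names and module names shows up as soon as you list what a distribution installed. The sketch below assumes a RECORD-like list of '/'-separated paths relative to the site directory (the file list is hypothetical apart from the layout of the PEP's docutils example) and recovers the importable top-level names; the distribution name matches one of them only by coincidence.

```python
from pathlib import PurePosixPath


def top_level_names(installed_paths):
    """Return the importable top-level names among installed files.

    `installed_paths` are '/'-separated paths relative to the site
    directory. This is a simplification: it ignores extension modules,
    namespace packages and other corner cases.
    """
    names = set()
    for raw in installed_paths:
        parts = PurePosixPath(raw).parts
        first = parts[0]
        if first.endswith(".egg-info"):
            continue  # metadata, not importable code
        if len(parts) == 1:
            if first.endswith(".py"):
                names.add(first[:-3])  # a top-level module such as roman.py
        else:
            names.add(first)  # a package directory such as docutils/
    return names


record = [
    "docutils/__init__.py",
    "docutils/core.py",
    "roman.py",
    "docutils-0.5-py2.6.egg-info/PKG-INFO",
]
print(sorted(top_level_names(record)))  # ['docutils', 'roman']
```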
![](https://secure.gravatar.com/avatar/eaa875d37f5e9ca7d663f1372efa1317.jpg?s=120&d=mm&r=g)
At 12:28 AM 7/4/2009 +1000, Nick Coghlan wrote:
I think you have this backwards; it's setuptools that doesn't care where (or whether) the metadata exists on the file system; it delegates metadata operations to a "metadata provider" that's usually an adapter over a PEP 302 "loader". See http://peak.telecommunity.com/DevCenter/PkgResources#supporting-custom-impor... for the API details of how to register support for arbitrary PEP 302 importers and loaders. (Which presumably, Lukasz is using. I didn't know that anybody was actually using it, but it's nice to know that the documentation is apparently sufficient for *some* people. ;-) )
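The adapter idea described here can be shown with a toy registry that maps loader classes to metadata-provider factories. This is a simplified, self-contained imitation written for illustration, not pkg_resources' code (pkg_resources exposes its real mechanism through `register_loader_type()`); `register_provider`, `get_provider` and `ZipMetadataProvider` are hypothetical names.

```python
import zipimport

# Toy registry: PEP 302 loader class -> factory that builds a metadata provider.
_provider_factories = {}


def register_provider(loader_type, factory):
    """Associate a metadata-provider factory with a loader class."""
    _provider_factories[loader_type] = factory


def get_provider(loader):
    """Build a provider for `loader`, checking its class and then its base classes."""
    for cls in type(loader).__mro__:
        factory = _provider_factories.get(cls)
        if factory is not None:
            return factory(loader)
    raise TypeError(f"no metadata provider registered for {type(loader).__name__}")


class ZipMetadataProvider:
    """Adapts a zipimporter so metadata is read from inside the archive."""

    def __init__(self, loader):
        self.loader = loader

    def get_metadata(self, relpath):
        # zipimporter.get_data() accepts a path relative to the archive root.
        return self.loader.get_data(relpath).decode("utf-8")


register_provider(zipimport.zipimporter, ZipMetadataProvider)
```

Supporting a new storage format then means registering one more factory; callers keep asking `get_provider(loader)` for metadata without caring where it lives.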
![](https://secure.gravatar.com/avatar/5e5142d6a1a578f02e2d94c4d6d31088.jpg?s=120&d=mm&r=g)
2009/7/3 Paul Moore <p.f.moore@gmail.com>:
Right. While it would be feasible to make pkgutil work with PEP 302 loaders, we would still need to define the content of the RECORD files in a generic way. Right now it works for directories and zipped files since it's expressed with '/'-separated paths. And if I understand PEP 302 correctly, any back-end would be able to handle these paths no matter how they are stored, as long as it implements APIs like get_data().
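The '/'-separated path convention can be sketched with the `csv` module's excel dialect, which the PEP names for reading RECORD. The column layout assumed here (path, then an MD5 hex digest that may be empty, then anything else) and both function names are assumptions for illustration; paths stay in POSIX form until they are mapped to a concrete location.

```python
import csv
import io
from pathlib import Path, PurePosixPath


def read_record(lines):
    """Parse RECORD rows using the csv module's excel dialect.

    `lines` is any iterable of text lines, e.g. a file opened with
    newline="". Each row is assumed to start with a '/'-separated path
    followed by an MD5 hex digest; an empty digest (as for .pyc files)
    becomes None and any further columns are ignored.
    """
    entries = []
    for row in csv.reader(lines, dialect="excel"):
        if not row:
            continue  # tolerate blank lines
        path = PurePosixPath(row[0])
        md5 = row[1] if len(row) > 1 else ""
        entries.append((path, md5 or None))
    return entries


def to_local_path(site_dir, record_path):
    """Map a '/'-separated RECORD path onto the native filesystem."""
    return Path(site_dir).joinpath(*record_path.parts)


# "0123abcd" is a placeholder, not a real MD5 digest.
sample = io.StringIO("zlib/__init__.py,0123abcd,42\r\nzlib/__init__.pyc,,\r\n")
for path, md5 in read_record(sample):
    print(to_local_path("site-packages", path), md5)
```

A non-filesystem back-end could skip `to_local_path()` and hand the '/'-separated string to its loader's `get_data()` instead.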
Sounds like a fully-featured package management system, which is, IMHO, out of scope. And I don't see PEP 376 making it impossible for someone to build such a packaging system on top of distutils. I've started one myself for the sake of experimentation, with built-in multiversion support, as a full replacement for site-packages.
Also, PEP 376's goal is to define a single standard location for egg-info files for filesystem data. The zip form was built so it could work with zipped site-packages directories, like the ones the py2app project uses.
I'll add more details on that part. Right now it visits directories and zip files.

Tarek
participants (8)

- Antoine Pitrou
- Kevin Teague
- Nick Coghlan
- P.J. Eby
- Paul Moore
- Sridhar Ratnakumar
- Tarek Ziadé
- Łukasz Langa