
On Sat, Apr 05, 2008 at 10:49:24PM -0400, Phillip J. Eby wrote:
At 02:18 AM 4/6/2008 +0100, Floris Bruynooghe wrote:
On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
(One comment, though: I really don't like the idea of extending PKG-INFO to include installation data; it's only incidentally related and there are other contexts in which we use PKG-INFO where having that data included would make no sense. Plus, it's really not an ideal file format for including data about a potentially rather large number of files.)
That's fair. Bloating the files that carry the PKG-INFO information could hurt performance: rfc822 in the stdlib reads everything into memory, AFAIK.
Secondly I'm not sure how useful it is for the version number to be encoded in the filename.
It's very useful for setuptools, as it avoids the need to open and parse the file when searching for a suitable version of a desired package.
Hmm, it's not that much work to read the contents of a .egg-info. Just seems odd to me to have this info in two places so close to each other.
It allows pkg_resources to grok the entire contents of a directory using only a single listdir operation -- not an unbounded number of open-and-read operations.
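To make the single-listdir point concrete, here is a minimal sketch and not the actual pkg_resources code. It assumes a simplified filename layout of `<name>-<version>[-py<X.Y>].egg-info` with no hyphens inside the name or version; the `EGG_INFO_RE` pattern and the `scan_versions` helper are hypothetical names introduced for illustration.

```python
import os
import re

# Assumed, simplified layout: "<name>-<version>[-py<X.Y>].egg-info".
# Real tools handle more cases; this pattern is for illustration only.
EGG_INFO_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-py(?P<pyver>[^-]+))?\.egg-info$"
)


def scan_versions(directory):
    """Map project name -> list of version strings using one listdir call."""
    found = {}
    for entry in os.listdir(directory):
        match = EGG_INFO_RE.match(entry)
        if match:
            # No file is opened: name and version come from the filename.
            found.setdefault(match.group("name"), []).append(
                match.group("version")
            )
    return found
```

The cost is one directory listing per sys.path entry regardless of how many projects are installed there, which is the trade-off being argued for.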
I'm still not thrilled. To quote the "Rejected Suggestions" section of PEP 262: "First, performance is probably not an extremely pressing concern as the database is only used when installing or removing software, a relatively infrequent task."
Still, it's a done deal, so there's no point in me complaining about it - I'll live with it.
The second part was introducing a "virtual project" for pure namespace packages, where the project name would have to be the same as the package name in order to find it.
I think there would also need to be some prefix to the name, to prevent confusion in the event that there exists a normal project name that happens to use that package name. (Again: the two namespaces are unrelated, so a new/reserved namespace would be required for these virtual projects.)
Sounds sensible.
AFAIK this should cover namespace packages.
Unfortunately, this doesn't fix the problem, since either *some* package has to own the __init__.py, or there has to be a way for Python to treat the directory as a package without one. And for system package managers (esp. on Linux), some *one* system package must own the file - it can't be owned by multiple system packages.
With the format I suggested, a package tool could detect on install whether a required pure namespace package was already installed or still needed to be installed/created. Similarly, on removal of a sub-package it is possible to detect whether the pure namespace package is still required (by checking whether its directory contains any files other than those provided by the namespace package).
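The removal check described here could look roughly like the following. This is only a sketch under stated assumptions: the pure namespace package is assumed to own just `__init__.py` (plus its byte-compiled variants), and `namespace_still_needed` is a hypothetical helper, not part of distutils or setuptools.

```python
import os


def namespace_still_needed(namespace_dir, owned_files=("__init__.py",)):
    """Return True if the directory holds anything the namespace package
    itself does not own, i.e. some sub-package is still installed."""
    owned = set(owned_files)
    for entry in os.listdir(namespace_dir):
        if entry in owned or entry == "__pycache__":
            continue
        # Ignore byte-compiled versions of owned files (e.g. __init__.pyc).
        stem, ext = os.path.splitext(entry)
        if ext in (".pyc", ".pyo") and stem + ".py" in owned:
            continue
        return True
    return False
```

A tool would call this after removing a sub-package and delete the namespace directory only when it returns False.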
Again... some system packaging folks need to speak up on this, because my understanding is that some tools simply can't do something like this. They need to make explicit what a given package depends on, and install that, not dynamically decide what dependencies something has. (And then there is the possibility of a problem if a non-system packager installs the namespace, and then you install a system package for something that includes packages in that namespace.)
As for dpkg, it will just overwrite an existing __init__.py in the namespace package if it doesn't own it. It won't even tell you it did so (I was surprised at this).
However -- and I know you don't like this -- this is still no problem. What we are concerned with here is that a user- or sysadmin-owned directory on sys.path can be managed sanely. dpkg and co. will keep out of those; they have /usr/lib to play in, and sysadmins or users should stay out of /usr/lib in their turn.
What is needed to cooperate with system packagers is:
1. Detect existing packages in other directories of sys.path and accept them as satisfying dependencies of the distribution being installed.
2. Find a solution for a namespace package spread out over two directories of sys.path.
Maybe we're making it too hard by wanting to cover *every* file installed by python projects? The main reason for this installdb, as I understand it, is so that a package tool can install a sub-project in a namespace package installed by someone else. And similarly that someone else doesn't wipe away the sub-package when it thinks it can remove the namespace package.
It's not just about namespace packages, it's about any package or module. We also want to know about installed scripts, data, etc., so that they can be cleaned up by a tool that does uninstalls.
No, it's only about namespace packages. Everything else is easy: each tool can keep its own database of installed packages in a suitable location if it wants to do that. If you didn't install a file you don't remove it.
Ah, this makes me think of the people who complain on comp.lang.python that Python namespaces are too tightly bound to files and directories... It all makes sense now, we wouldn't even be having this discussion if a package could declare its namespace in the code! ;-)
Or if you could import from directories without needing there to be an __init__.py, and Python supported namespace packages by default.
Also a good point. I'm sure people can come up with negative side effects of this, but I can't think of any myself right now. So, any takers? Is this a possible option to solve the problem? What is the reason for requiring __init__.py?
The longer this discussion goes on, the less I like the idea of a full PEP 262 style database (I do admit that at first it seemed like a reasonable idea to me). One issue I've always had with it is that it suddenly stores management data in library directories (it should live in /var). The .egg-info files already do this, but they only really provide the sort of information that can be found in the .so files of shared libraries, just for Python files.
To summarise what I think are the issues:
* Python packaging tools (distutils, setuptools) need to be able to detect packages on all sys.path directories and use them to satisfy dependencies. AIUI this is already done in Python 2.5 with the .egg-info files.
* Python packaging tools need to be able to share namespace packages in a user-owned sys.path/site-packages directory. Installation and removal of the __init__.py needs coordination between the different tools. This is what PEP 262 could solve, but it's not necessarily the best or most loved solution.
* Namespace packages need to be able to be spread over multiple sys.path directories so that the system can provide part of it, the sysadmin some more and the user yet another sub-package.
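For the last point, one existing mechanism is the stdlib's `pkgutil.extend_path`, which lets each copy of a package's `__init__.py` extend `__path__` across all sys.path directories. The sketch below is only a demonstration under assumptions: the package name `demo_ns`, the module names, and the two temporary directories standing in for a system and a user location are all made up for illustration, and every portion still ships its own `__init__.py`, so the file-ownership problem discussed above remains.

```python
import importlib
import os
import sys
import tempfile

# Each portion's __init__.py asks pkgutil to merge same-named
# package directories found on sys.path into __path__.
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, namespace, module, value):
    """Create <root>/<namespace>/__init__.py and <root>/<namespace>/<module>.py."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write("VALUE = %r\n" % value)


def demo():
    """Import two sub-modules of one namespace from two sys.path entries."""
    system_dir = tempfile.mkdtemp()  # stands in for a system directory
    user_dir = tempfile.mkdtemp()    # stands in for a user directory
    make_portion(system_dir, "demo_ns", "system_part", "system")
    make_portion(user_dir, "demo_ns", "user_part", "user")
    sys.path[:0] = [system_dir, user_dir]
    try:
        importlib.invalidate_caches()
        system_part = importlib.import_module("demo_ns.system_part")
        user_part = importlib.import_module("demo_ns.user_part")
        return system_part.VALUE, user_part.VALUE
    finally:
        sys.path.remove(system_dir)
        sys.path.remove(user_dir)
```

Calling `demo()` imports both sub-modules even though they live under different sys.path entries; the temporary directories are left in place for inspection.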