At 02:18 AM 4/6/2008 +0100, Floris Bruynooghe wrote:
On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
(One comment, though: I really don't like the idea of extending PKG-INFO to include installation data; it's only incidentally related and there are other contexts in which we use PKG-INFO where having that data included would make no sense. Plus, it's really not an ideal file format for including data about a potentially rather large number of files.)
That's fair. Blowing up the files that hold the PKG-INFO information could hurt performance, since rfc822 in the stdlib reads everything into memory AFAIK.
Secondly I'm not sure how useful it is for the version number to be encoded in the filename.
It's very useful for setuptools, as it avoids the need to open and parse the file when searching for a suitable version of a desired package.
Hmm, it's not that much work to read the contents of a .egg-info. Just seems odd to me to have this info in two places so close to each other.
It allows pkg_resources to grok the entire contents of a directory using only a single listdir operation -- not an unbounded number of open-and-read operations. Of course, if we're going to *also* have a properly-named .egg-info file, then using just the project name is sufficient for the install db.
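To make the single-listdir point concrete, here is a minimal sketch, not the actual pkg_resources code. It assumes a `<project>-<version>[-pyX.Y].egg-info` filename layout; the `scan_directory` helper and the regular expression are hypothetical names introduced only for illustration.

```python
import os
import re

# Assumed filename layout: <project>-<version>[-pyX.Y].egg-info
_EGG_INFO = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-py(?P<pyver>[^-]+))?\.egg-info$"
)


def scan_directory(path):
    """Map project name -> list of version strings, using one listdir call.

    No metadata file is opened or parsed; everything comes from the names.
    """
    found = {}
    for entry in os.listdir(path):
        match = _EGG_INFO.match(entry)
        if match:
            found.setdefault(match.group("name"), []).append(
                match.group("version")
            )
    return found
```

If the version were not in the filename, the same scan would need one open-and-read per metadata file before it could answer "which versions of this project are here?".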
[...]
All of this is moot, since project/distribution names are unrelated to package names.
So this means there is a flat namespace for all project names and nested namespace for modules. When I was saying that project names "steal" names from modules that is because they end up in the same directory. I.e. project "foo" with foo_1.0.egg-info provides module "bar", while project "bar" with bar_1.0.egg-info provides module "bar2". Not ideal.
I have no idea what you're saying here. There is absolutely no relationship between project names and the Python package/module namespace. None. Thus, any attempt to talk about them as though they are related is just noise to me.
The second part was introducing a "virtual project" for pure namespace packages, where the project name would have to be the same as the package name in order to find it.
I think there would also need to be some prefix to the name, to prevent confusion in the event that there exists a normal project name that happens to use that package name. (Again: the two namespaces are unrelated, so a new/reserved namespace would be required for these virtual projects.)
AFAIK this should cover namespace packages.
Unfortunately, this doesn't fix the problem, since either *some* package has to own the __init__.py, or there has to be a way for Python to treat the directory as a package without one. And for system package managers (esp. on Linux), some *one* system package must own the file - it can't be owned by multiple system packages.
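For illustration, the ownership problem shows up with the stdlib's `pkgutil.extend_path` idiom, where every distribution contributing to a namespace ships its own copy of the same `__init__.py`. The sketch below builds two throwaway install roots; the namespace `demo_ns` and the modules `part_a` and `part_b` are hypothetical names.

```python
import os
import shutil
import sys
import tempfile

# The boilerplate that each distribution's copy of demo_ns/__init__.py carries.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, namespace, module, body):
    """Create <root>/<namespace>/__init__.py and <root>/<namespace>/<module>.py."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


def demo():
    """Install two portions of the namespace in separate roots and import both."""
    roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    make_portion(roots[0], "demo_ns", "part_a", "VALUE = 'a'\n")
    make_portion(roots[1], "demo_ns", "part_b", "VALUE = 'b'\n")
    sys.path[:0] = roots
    try:
        from demo_ns import part_a, part_b
        return part_a.VALUE, part_b.VALUE
    finally:
        # Undo the path change and remove the throwaway directories.
        for root in roots:
            sys.path.remove(root)
            shutil.rmtree(root, ignore_errors=True)
```

Here the two copies live in different roots, so nothing collides. Installed into one shared site-packages directory, both portions would want to write the same `demo_ns/__init__.py`, which is the file that a single system package has to own.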
With the format I suggested, a package tool could detect at install time whether a required pure namespace package was already installed or still needed to be installed/created. Similarly, when removing a sub-package, it is possible to detect whether the pure namespace package is still required (by checking whether its directory contains any files other than those provided by the namespace package).
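The removal check described above could look roughly like this. It is a sketch only: it assumes the install record can supply the list of entries the pure namespace package itself provides, and the function name is hypothetical.

```python
import os


def namespace_dir_still_needed(namespace_dir, owned_by_namespace):
    """Return True if the directory holds anything the namespace package does not own.

    owned_by_namespace: names of the entries installed by the pure namespace
    package itself (assumed to come from the install record).
    """
    owned = set(owned_by_namespace)
    for entry in os.listdir(namespace_dir):
        if entry not in owned:
            return True
    return False
```

A real tool would also have to account for generated files such as compiled bytecode, which this sketch treats like any other foreign entry unless they are listed as owned.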
Again... some system packaging folks need to speak up on this, because my understanding is that some tools simply can't do something like this. They need to make explicit what a given package depends on, and install that, not dynamically decide what dependencies something has. (And then there is the possibility of a problem if a non-system packager installs the namespace, and then you install a system package for something that includes packages in that namespace.)
Lastly --and I'm not sure how happy I am about this, I should have thought of it earlier-- the Python packaging tools need to support giving away ownership at install time! Since Debian, Red Hat, etc. just call setup.py, each package they install would be owned by distutils/setuptools/... That's bad.
I propose that setup.py needs to honour an environment variable: PYI_OWNER so that distros can set this to their custom name (dpkg, rpm, ...).
A command-line option to 'install' that's inherited by 'install_egg_info' would handle this; I don't think an environment variable is a good idea for this -- too implicit. Note that bdist_rpm, for example, would need to encode this as a command-line option in the .spec file, anyway.
I picked an environment variable here because then it would be possible to call setup.py identically whether or not it provides this new installdb. Passing a nonexistent command-line option tends to cause more problems.
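As a rough sketch of the environment-variable idea: only the name PYI_OWNER comes from the proposal above. The helper and the default owner string are hypothetical, since the thread does not say what the default would be.

```python
import os

# Hypothetical default; the thread does not specify one.
DEFAULT_OWNER = "distutils"


def install_owner(environ=None):
    """Return the owner name to record, preferring PYI_OWNER when it is set."""
    if environ is None:
        environ = os.environ
    # An unset or empty variable falls back to the default owner.
    return environ.get("PYI_OWNER") or DEFAULT_OWNER
```

An older setup.py that knows nothing about the installdb would simply ignore the variable, whereas an unknown command-line option would make it exit with an error.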
How so, if this is going into a new version of Python?
Not wanting to blow up PKG-INFO is laudable, but OTOH separating out the data is dubious, as is replicating data (PKG-INFO data in .egg-info AND the installdb). PKG-INFO was simply the easy choice, as it's already there and tools can already use it.
Maybe we're making it too hard by wanting to cover *every* file installed by python projects? The main reason for this installdb, as I understand it, is so that a package tool can install a sub-project in a namespace package installed by someone else. And similarly that someone else doesn't wipe away the sub-package when it thinks it can remove the namespace package.
It's not just about namespace packages, it's about any package or module. We also want to know about installed scripts, data, etc., so that they can be cleaned up by a tool that does uninstalls.
Ah, this makes me think of the people who complain on comp.lang.python that Python namespaces are too tightly bound to files and directories... It all makes sense now, we wouldn't even be having this discussion if a package could declare its namespace in the code! ;-)
Or if you could import from directories without needing there to be an __init__.py, and Python supported namespace packages by default.