
On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote: (One comment, though: I really don't like the idea of extending PKG-INFO to include installation data; it's only incidentally related and there are other contexts in which we use PKG-INFO where having that data included would make no sense. Plus, it's really not an ideal file format for including data about a potentially rather large number of files.)
That's fair. Bloating the files that hold the PKG-INFO information could hurt performance; rfc822 in the stdlib reads everything into memory, AFAIK.
Secondly I'm not sure how useful it is for the version number to be encoded in the filename.
It's very useful for setuptools, as it avoids the need to open and parse the file when searching for a suitable version of a desired package.
Hmm, it's not that much work to read the contents of a .egg-info. Just seems odd to me to have this info in two places so close to each other.
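To make the trade-off concrete, here is a minimal sketch of looking up installed versions from filenames alone. It assumes the "<name>_<version>.egg-info" layout used in the examples in this mail; the helper names are made up for illustration and this is not how setuptools itself parses names.

```python
import os


def split_egg_info_name(filename, sep="_"):
    """Split 'name<sep>version.egg-info' into (name, version).

    The '<name>_<version>.egg-info' layout is an assumption taken from the
    examples in this mail, not a description of what setuptools does.
    """
    stem, ext = os.path.splitext(filename)
    if ext != ".egg-info":
        raise ValueError("not an .egg-info entry: %r" % (filename,))
    # Split on the last separator so project names may contain it too.
    name, found, version = stem.rpartition(sep)
    if not found or not name or not version:
        raise ValueError("no version in filename: %r" % (filename,))
    return name, version


def installed_versions(entries, project):
    """Return the versions of `project` found in a directory listing,
    without opening any file."""
    versions = []
    for entry in entries:
        try:
            name, version = split_egg_info_name(entry)
        except ValueError:
            continue  # not an .egg-info entry, or no version encoded
        if name == project:
            versions.append(version)
    return versions
```

The alternative would open and parse every candidate file, which is the extra work the filename encoding avoids.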
[...]
All of this is moot, since project/distribution names are unrelated to package names.
So this means there is a flat namespace for all project names and a nested namespace for modules. When I said that project names "steal" names from modules, I meant that they end up in the same directory. I.e. project "foo" with foo_1.0.egg-info provides module "bar", while project "bar" with bar_1.0.egg-info provides module "bar2". Not ideal.
What I was trying to get at was prefixing the names of projects that provide a sub-module of a namespace with the namespace module's name (inside the hypothetical installdb, that is). But maybe that just makes the project namespace vs. module namespace distinction more confusing. Thinking about it, it's definitely a bad idea if the sub-package's project also installs a script or similar.
The second part was introducing a "virtual project" for pure namespace packages, where the project name would have to be the same as the package name in order to find it.
AFAIK this should cover namespace packages.
Unfortunately, this doesn't fix the problem, since either *some* package has to own the __init__.py, or there has to be a way for Python to treat the directory as a package without one. And for system package managers (esp. on Linux), some *one* system package must own the file - it can't be owned by multiple system packages.
With the format I suggested, a package tool could detect at install time whether a required pure namespace package was already installed or still needed to be installed/created. Similarly, when a sub-package is removed, the tool can detect whether the pure namespace package is still required by checking whether its directory contains any files other than those provided by the namespace package.
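The removal check could look roughly like the sketch below. The function name and the idea of passing in the list of files owned by the namespace package are assumptions for illustration; a real tool would read that list from the installdb.

```python
import os


def namespace_still_needed(directory, namespace_files):
    """Return True if `directory` holds anything besides the files that
    belong to the pure namespace package itself.

    `namespace_files` is the (hypothetical) list of entry names the
    installdb records for the namespace package, e.g. ['__init__.py'].
    """
    owned = set(namespace_files)
    for entry in os.listdir(directory):
        if entry not in owned:
            return True  # some sub-package (or stray file) is still there
    return False
```

If this returns False after a sub-package has been removed, the namespace package can be removed as well.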
My guess is that this is true, *even if* the file is automatically generated. Some system packaging folks will need to chime in here.
System packagers would create 2 packages out of a package requiring a namespace package. One the pure namespace, the other with the sub-package. Other sub-packages then just need to depend on the pure namespace one.
Lastly --and I'm not sure how happy I am about this, I should have thought of it earlier-- the Python packaging tools need to support giving away ownership at install time! Since Debian, Red Hat etc. just call setup.py, each package they install would be recorded as owned by distutils/setuptools/... That's bad.
I propose that setup.py needs to honour an environment variable: PYI_OWNER so that distros can set this to their custom name (dpkg, rpm, ...).
A command-line option to 'install' that's inherited by 'install_egg_info' would handle this; I don't think an environment variable is a good idea for this -- too implicit. Note that bdist_rpm, for example, would need to encode this as a command-line option in the .spec file, anyway.
I picked an environment variable here because then setup.py can be called identically whether or not it provides this new installdb. Passing a non-existent command-line option tends to cause more problems.
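As a rough sketch of what I mean, assuming the installdb code falls back to a default owner when the variable is unset (the "distutils" default and the function name are made up for illustration):

```python
import os

# Assumed fallback owner when no distro tool announces itself; hypothetical.
DEFAULT_OWNER = "distutils"


def install_owner(environ=None):
    """Return the owner to record in the installdb.

    Reads the proposed PYI_OWNER environment variable; an unset or empty
    value falls back to DEFAULT_OWNER.
    """
    env = os.environ if environ is None else environ
    return env.get("PYI_OWNER") or DEFAULT_OWNER
```

A distro build would then run something like "PYI_OWNER=dpkg python setup.py install", and an old setup.py that knows nothing about the installdb would simply ignore the variable.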
Phew, thanks for reading this far! I hope this is useful. If it is, we should probably start writing the text for the new PEP 262 on a wiki somewhere while we discuss details.
The major issues at the moment are that 1) your spec is confused about packages vs. projects or distributions (and thus needs to be revamped with that in mind),
See clarification above.
and 2) PKG-INFO is a really lousy place to put this, from a formatting perspective. It's one thing to include the PKG-INFO in the install DB, and another thing entirely to include the install db into the PKG-INFO! I think PEP 262 had the right idea, even though I'm not overjoyed by its proposed format, either.
Not wanting to blow up PKG-INFO is laudable, but OTOH separating out the data is dubious, as is replicating it (PKG-INFO data in .egg-info AND in the installdb). Using PKG-INFO was just the simple option, since it's already there and tools can already use it.
Maybe we're making it too hard by wanting to cover *every* file installed by Python projects? The main reason for this installdb, as I understand it, is so that a package tool can install a sub-project into a namespace package installed by someone else, and so that this someone else doesn't wipe away the sub-package when it thinks it can remove the namespace package. Ah, this makes me think of the people who complain on comp.lang.python that Python namespaces are too tightly bound to files and directories... It all makes sense now: we wouldn't even be having this discussion if a package could declare its namespace in the code! ;-)
Regards,
Floris