Hello

On Fri, Mar 28, 2008 at 11:02:19AM -0400, Alexander Michael wrote:
> I'll continue my foolhardy effort [1] to build a concrete proposal for "a database of installed packages" by offering up a sketch of a possible straw-man "solution". I realize that this is likely oversimplified to a fault, but I hope it will help us move forward. Apologies if the equivalent of this has been proposed and rejected before. My proposal is basically to make PKG-INFO functional and usable by:
>
> * Fixing the technical issues with requirements (i.e. dependencies) and naming as specified by PEP 314/345.
> * Modifying distutils to install PKG-INFO alongside each module file or package directory as a side-car file of the same name but with a special extension (.pyi or whatever). These files would be the place to include the optional list of installed files as well as the optional md5sums, if desired by the installer. Files in the package will be listed using relative paths, while far-flung files (bin, shared, etc) will get full paths so that there is full allowance for relocating simple (nothing in bin or shared) modules and packages. Although optional, "python setup.py install" will include the installed file list by default.
>
> That's it.
This proposal has been here about a week now, with no comments on it. I take that as positive, as no one has had major objections. :-)

Personally I think it is a good proposal: it does basically what an installation database would have to do while being minimally invasive. The important question, however, is: is this enough for setuptools to work without doing all its path magic? Would this be a workable solution for setuptools?

Now my own thoughts about the technicalities (sorry this got long)...

Distutils does create a ${pkgname}-${version}.egg-info file right now with the PKG-INFO data in it. From earlier discussions it seems the .egg-info extension is not much loved, so a change to .pyi could be made (also, it has little to do with eggs). Secondly, I'm not sure how useful it is for the version number to be encoded in the filename.

It seems the .egg-info file currently gets installed in the site-packages root. This will likely cause conflicts once we start using namespace packages. We can't put the .pyi *in* the package since then we lose support for simple modules, so we have to place it *next* to the package. So if "bar" is a namespace package inside "foo" then we would have:

    site-packages/foo/bar.pyi
    site-packages/foo/bar/__init__.py

This means any package tool will need to recursively scan the site-packages directory to find the files, but that doesn't seem like too much of a penalty?

The alternative is to have a separate directory for the installdb files:

    site-packages/foo/bar/__init__.py
    site-packages/install.db/foo/bar.pyi

This could significantly reduce the scanning time since there are far fewer files to walk. I chose a name with a "." for install.db so we're not stealing a possible module or package name. Other than that, the name of the directory can be anything we manage to agree on.
:-)

Using this approach might create confusion about relative paths mentioned in .pyi files though (is the root the current directory, or do we pretend the .pyi was actually next to the package/module?).

Distributions not providing a package/module, or with a different distribution name than the package(s)/module(s) provided, would end up in the top level of the database (in both scenarios), effectively stealing package/module names, but that seems to be the current behaviour of distutils already anyway. Namespace sub-distributions (bar in the example above) whose distribution name differs from the package/module name would steal names from their namespace.

Namespace packages are not fully handled yet: there is still the issue of who owns site-packages/foo/__init__.py. That would logically be defined by site-packages/foo.pyi, but we don't want the user to have to install yet another package for this. So for a namespace package the .pyi could look like this:

    Metadata-Version: 1.0
    Name: foo
    ...
    Owner: setuptools
    Namespace: True
    Directory: foo/
    File: foo/__init__.py

It might be possible that a namespace package doesn't need an owner, so that a different tool is allowed to clean it up, but I'm not sure about that.

When "bar" now gets uninstalled, it should know whether it can clean up its namespace "foo" too (if it is empty). So bar.pyi should have:

    Metadata-Version: 1.0
    Name: bar
    ...
    Owner: setuptools
    Requires-Namespaces: foo
    Directory: foo/bar/
    File: foo/bar/__init__.py

Here "foo" could also have been a dotted name: "foo.sub.package". So the "foo.sub" package would have both the Namespace: and Requires-Namespaces: fields in its .pyi. AFAIK this should cover namespace packages.

So the new headers to turn the PKG-INFO into a .pyi would be:

Owner:
    The owner of this distribution. This would be any string representing the package tool, e.g.: distutils, setuptools, zc.buildout, rpm, dpkg, etc.

Provides:
    Copied from PEP262. Don't like this in its original form since it's ambiguous.
    So this lists the *distributions* provided by this package on top of its native name in the Name: field. Optional (and very rare).

Modules:
    List of packages/modules provided. If no packages or modules are installed this doesn't need to be present. You could argue that this should be called Packages: or so. Derived from PEP262.

Namespace:
    The value of this doesn't matter; when present it indicates this .pyi file describes a namespace package.

Requires:
    Copied from PEP262 (also ambiguous). Optional. Distributions that must be installed for this distribution to work.

Requires-Modules:
    Optional list of packages/modules required. No need to list modules in the standard library. (Figuring out if this site-packages tree is of the correct Python version is of no use for the installdb.) Derived from PEP262.

Requires-Namespaces:
    This package requires a namespace. The value is a list of dotted names of the namespace packages, as they would appear in an import statement.

Directory:
    A directory from this package. Relative to this .pyi or absolute. For directories inside site-packages they *should* be relative; for those outside site-packages they *should* be absolute.

File:
    The value of this is first an optional MD5 hash (or SHA1?) of the file, followed by the path of the installed file (absolute or relative, same rules as for Directory: above). The only restriction this makes on a filename is that you can't have a file in the current directory whose name is also a valid hash and which does not have a hash itself. You can work around this by prepending ./ to the filename, however. But why would you want such a file?

The only issue I can think of right now is with File:. It is not very extensible if a tool wants to keep track of extra info like file permissions. AFAIK RFC822 requires you to keep the order of the fields; if so we could make this:

    File: foo.py
    File-MD5: XXXXXXXXXXXX
    X-MyTool-File-perms: -rw-rw-r-
    File: bar.py
    ...
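To make the ordered-fields idea a bit more concrete, here is a rough Python sketch of how a tool might read such a file. It is only an illustration under assumptions: the function name parse_pyi and the SAMPLE text are made up for this example, and it assumes that any File-* or X-* field following a File: field belongs to that file. It leans on the standard library email parser, whose Message.items() returns repeated fields in the order they appear.

```python
import email

# Hypothetical .pyi content, modelled on the File:/File-MD5: example above.
SAMPLE = """\
Metadata-Version: 1.0
Name: bar
Owner: setuptools
File: foo.py
File-MD5: XXXXXXXXXXXX
X-MyTool-File-perms: -rw-rw-r-
File: bar.py
"""


def parse_pyi(text):
    """Split .pyi text into general headers and per-file groups.

    Returns (headers, files): headers is a list of (name, value) pairs,
    files is a list of dicts with the file path and the extra fields
    that followed it.
    Assumption: File-* and X-* fields attach to the preceding File: field.
    """
    msg = email.message_from_string(text)
    headers, files = [], []
    current = None
    # items() keeps duplicates and the original field order.
    for name, value in msg.items():
        value = str(value).strip()
        if name == "File":
            # Each File: field starts a new group.
            current = {"path": value, "extra": []}
            files.append(current)
        elif current is not None and name.startswith(("File-", "X-")):
            current["extra"].append((name, value))
        else:
            headers.append((name, value))
    return headers, files


if __name__ == "__main__":
    headers, files = parse_pyi(SAMPLE)
    for entry in files:
        print(entry["path"], entry["extra"])
```

The point of the sketch is only that grouping by position works as long as every tool preserves field order when it rewrites the file; a tool that sorted or merged fields would break the association between a File: and its File-MD5:.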
Lastly (and I'm not sure how happy I am about this; I should have thought of it earlier), the Python packaging tools need to support giving away ownership at install time! Since Debian, Redhat, etc. just call setup.py, each package they install would be owned by distutils/setuptools/... That's bad. I propose that setup.py needs to honour an environment variable, PYI_OWNER, so that distros can set this to their custom name (dpkg, rpm, ...). Although I can imagine that in Debian's case it's better to change the dh_py* tools to go and modify the .pyi files. So if all distros are happy with having to modify installed files this might not be necessary.

Another nice/required feature for distros would be to ask the tool to install only the namespace package, or to omit the namespace package. This could just be a command line switch to setup.py, I think. Again this is not a hard requirement: I can imagine Debian's dh_py* tools scanning the .pyi files, detecting namespace packages and (re)moving them as required. But once more, I don't know enough about other distros.

Phew, thanks for reading this far! I hope this is useful; if it is, we should probably start writing the text for the new PEP262 on a wiki somewhere while we discuss details.

Regards
Floris

--
Debian GNU/Linux -- The Power of Freedom
www.debian.org | www.gnu.org | www.kernel.org