Re: [Distutils] A Modest Proposal for "A Database of Installed Packages"
At 05:05 PM 4/7/2008 -0400, Alexander Michael wrote:
a. I believe that having side-car files that sit alongside packages because they have the same base name makes the database more transparent to the uninitiated.
I'm not aware that this was ever a stated design goal, nor why it should have any priority. OTOH, files named by distribution would be at least as, if not even *more* transparent than package names, so I don't see any particular benefit to this.
Just browsing a directory of python packages will allow you to see what's going on. Moving like-named files around manually maintains the integrity and availability of the data.
Moving anything manually, other than the *entire* directory, will be unlikely to retain any form of integrity, so it's best not to give the false impression that it would.
I think that having magic entries in an essentially "hidden" directory somewhere will cause all sorts of trouble that could be avoided at the cost of a small bit of duplication.
b. I assume, perhaps incorrectly, that most distributions contain only a single package.
Very incorrectly, unless you mean a single top-level package. Odds are fairly good that if there's a package, there's probably at least a subpackage, too, like perhaps a tests subpackage.
That said, I do agree that if you are primarily interested in a database of *distributions* (as opposed to *packages*) then something like what is proposed in PEP 262 makes more sense (but it would have to be per directory and not site-wide due to the dynamic nature of the python path).
That's exactly what I want. The only reason I didn't just implement easy_install using a per-directory form of PEP 262 is that I wanted something done rather more immediately. That was years ago, so I can afford to be more patient now. :)
This is a trade-off between putting the metadata up front in an obvious and easy to understand way so that it will hopefully have a better chance of being noticed and maintained, versus tucking it away hidden someplace so that even though it is broken, it doesn't bother anyone until they care enough to fix it. *It is this trade-off that I am exploring with this strawman "counter" proposal to PEP 262.*
Someone would have to be crazy to maintain this information by hand. So I'd actually consider it an advantage if the file format made this fact plain, by using something that's difficult for a human being to maintain, like say a pickle. ;-) OTOH, it's possible that some system packagers will not wish to use Python to generate the files, so using something a bit less complex would be a good idea. The format proposed by PEP 262 isn't really that bad of a trade-off in those terms.
2. The strawman proposal did not explicitly address how optional add-on tools (like setuptools) might manage namespace packages.
I think there's some misunderstanding here about the proposal's goals. If the proposal doesn't work for setuptools, it doesn't work, period. The entire point is to allow setuptools to do its work without annoying the people who don't want to use it.
I agree with Floris that the best way to avoid magic is to actually have the sub-packages in a namespace share the same parent directory on disk.
I agree with this also. The issue is that an __init__.py must exist for this to happen, but most system packaging tools (e.g. RPM) require that a given file be owned by at most one system package (i.e., distribution), whereas the contents of a namespace package are assembled from multiple distributions. That's the problem that needs solving, not runtime support for the namespace itself.
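As a rough illustration of the file-ownership problem described above (not code from this thread), the sketch below takes a per-distribution list of installed files and reports any path claimed by more than one distribution. The distribution names `ns.alpha` and `ns.beta`, the `ns` namespace, and the `find_shared_files` helper are all hypothetical.

```python
from collections import defaultdict


def find_shared_files(manifests):
    """Return {path: [owners]} for every path claimed by more than one distribution.

    `manifests` maps a distribution name to the relative paths it installs.
    """
    owners = defaultdict(set)
    for dist, paths in manifests.items():
        for path in paths:
            owners[path].add(dist)
    return {path: sorted(dists) for path, dists in owners.items() if len(dists) > 1}


# Two hypothetical distributions that both contribute to the "ns" namespace
# package, so each one ships its own copy of ns/__init__.py.
manifests = {
    "ns.alpha": ["ns/__init__.py", "ns/alpha/__init__.py"],
    "ns.beta": ["ns/__init__.py", "ns/beta/__init__.py"],
}

conflicts = find_shared_files(manifests)
for path, dists in conflicts.items():
    print(path, "is claimed by", ", ".join(dists))
```

Only the namespace's `__init__.py` is reported, which is exactly the file a one-owner-per-file system packager would reject.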
4. Concerns were raised about the performance penalty for using the side-car style files without version numbers possibly not all of which were located at the top-most level of the directory listed in the python path.
Any add-on tool that actually used the data would very likely need to build a cache of the data using a more efficient representation,
This is a misunderstanding of the point I raised. Floris merely asked why there were version numbers in .egg-info files, and I answered him. That doesn't actually have much, if anything, to do with the package database proposal. It's merely how installed distributions' versions can be recognized quickly at runtime, not anything to do with how potential installation conflicts are handled at installation time.
easy_install uses eggs for installation simply because it need never worry about file ownership conflicts. There is a direct mapping from a distribution to its files: the contents of a zipfile or subdirectory. This also allows for (relatively) straightforward uninstallation.
The goal of the proposal, then, is to have a way for easy_install to have another way to map from a distribution to its owned files (and vice versa), so that eggs are not necessary for normal, single-version installations.
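To make the "distribution to owned files (and vice versa)" idea concrete, here is a minimal in-memory sketch, assuming one record per installation directory. It is neither the PEP 262 format nor anything easy_install implements; the `InstallRecord` class, its method names, and the example distribution are hypothetical.

```python
class InstallRecord:
    """Toy per-directory record of which distribution owns which files."""

    def __init__(self):
        self._files = {}  # distribution name -> set of relative paths
        self._owner = {}  # relative path -> distribution name

    def register(self, dist, paths):
        """Record that `dist` owns `paths`; refuse paths owned by another distribution."""
        for path in paths:
            current = self._owner.get(path)
            if current is not None and current != dist:
                raise ValueError(f"{path} is already owned by {current}")
        self._files.setdefault(dist, set()).update(paths)
        for path in paths:
            self._owner[path] = dist

    def files_of(self, dist):
        """Distribution -> files it owns."""
        return sorted(self._files.get(dist, ()))

    def owner_of(self, path):
        """File -> owning distribution, or None if the file is unknown."""
        return self._owner.get(path)

    def unregister(self, dist):
        """Forget `dist` and return the paths an uninstaller would delete."""
        paths = self._files.pop(dist, set())
        for path in paths:
            del self._owner[path]
        return sorted(paths)


record = InstallRecord()
record.register("Example", ["example/__init__.py", "example/core.py"])
print(record.owner_of("example/core.py"))
print(record.unregister("Example"))
```

With both directions of the mapping available, uninstallation and conflict detection no longer depend on each distribution living in its own zipfile or subdirectory.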
On Mon, Apr 7, 2008 at 11:18 PM, Phillip J. Eby <pje@telecommunity.com> wrote:
At 05:05 PM 4/7/2008 -0400, Alexander Michael wrote:
a. I believe that having side-car files that sit alongside packages because they have the same base name makes the database more transparent to the uninitiated.
I'm not aware that this was ever a stated design goal, nor why it should have any priority. OTOH, files named by distribution would be at least as, if not even *more* transparent than package names, so I don't see any particular benefit to this.
You're right. I didn't state this in my rationale. I'm suggesting it *partly* because I thought that subverting the "database" aspect by inverting the relationship between distributions and packages would help solve the social problem. Now it could be that a technical solution to a social problem is suspect, but the other part of why I suggested it this way was an honest attempt at improving transparency by putting the metadata up front. I did indeed read PEP 262 and upon reading it decided it sounded too much like a "system packager" and not enough like a way to get the missing metadata out of PKG-INFO and into the installed packages.
I appreciate your patience with me. I know I've earned little social capital in this community and am likely trying too hard to be helpful, but I'm earnestly trying to be helpful and I have read up a fair bit in attempting to do so.
That said, I think you're probably right and inverting the distribution/package name relationship is too problematic, if only because of the redundancy it results in for distributions that contain more than one package (or worse, a slew of modules). This pretty much leaves us with the egg-info files I've been reading about. Since I use setuptools for everything but wxPython (whose WinXP installer doesn't seem to include them) and these aren't included in the standard library except for wsgiref, I don't really see these files, even though I see the code to produce them in distutils. But this is perhaps beside the point.
Just browsing a directory of python packages will allow you to see what's going on. Moving like-named files around manually maintains the integrity and availability of the data.
Moving anything manually, other than the *entire* directory, will be unlikely to retain any form of integrity, so it's best not to give the false impression that it would.
I disagree with this. Certainly it decreases in likelihood when the side-car files are named by distribution and not package and if the distribution contains more than one package, but other than that, it seems pretty easy (e.g. "Hmm, maybe I should move the mypkg.pkg-info file along with the mypkg directory. Let's look inside. Oh! I see how this works!").
I think that having magic entries in an essentially "hidden" directory somewhere will cause all sorts of trouble that could be avoided at the cost of a small bit of duplication.
b. I assume, perhaps incorrectly, that most distributions contain only a single package.
Very incorrectly, unless you mean a single top-level package. Odds are fairly good that if there's a package, there's probably at least a subpackage, too, like perhaps a tests subpackage.
I do mean single top-level package (but one setup.py), thanks for clarifying.
That said, I do agree that if you are primarily interested in a database of *distributions* (as opposed to *packages*) then something like what is proposed in PEP 262 makes more sense (but it would have to be per directory and not site-wide due to the dynamic nature of the python path).
That's exactly what I want. The only reason I didn't just implement easy_install using a per-directory form of PEP 262 is that I wanted something done rather more immediately. That was years ago, so I can afford to be more patient now. :)
It's ironic how impatience is rewarded! ;)
This is a trade-off between putting the metadata up front in an obvious and easy to understand way so that it will hopefully have a better chance of being noticed and maintained, versus tucking it away hidden someplace so that even though it is broken, it doesn't bother anyone until they care enough to fix it. *It is this trade-off that I am exploring with this strawman "counter" proposal to PEP 262.*
Someone would have to be crazy to maintain this information by hand. So I'd actually consider it an advantage if the file format made this fact plain, by using something that's difficult for a human being to maintain, like say a pickle. ;-) OTOH, it's possible that some system packagers will not wish to use Python to generate the files, so using something a bit less complex would be a good idea. The format proposed by PEP 262 isn't really that bad of a trade-off in those terms.
What made you think people were rational? :) I do think that being able to maintain it by hand will aid in transparency and clarity, which Linux users just love, so maybe it will help win them over to the idea of Python knowing a little bit about itself. ;)
2. The strawman proposal did not explicitly address how optional add-on tools (like setuptools) might manage namespace packages.
I think there's some misunderstanding here about the proposal's goals. If the proposal doesn't work for setuptools, it doesn't work, period.
The entire point is to allow setuptools to do its work without annoying the people who don't want to use it.
This is my misunderstanding. I was trying to take setuptools out of the equation (if only to avoid the social backlash of "trying to get setuptools into the standard library") by providing a proposal that met the objectives of my installation-tool-agnostic rationale.
4. Concerns were raised about the performance penalty for using the side-car style files without version numbers possibly not all of which were located at the top-most level of the directory listed in the python path.
Any add-on tool that actually used the data would very likely need to build a cache of the data using a more efficient representation,
This is a misunderstanding of the point I raised. Floris merely asked why there were version numbers in .egg-info files, and I answered him. That doesn't actually have much, if anything, to do with the package database proposal. It's merely how installed distributions' versions can be recognized quickly at runtime, not anything to do with how potential installation conflicts are handled at installation time.
Apologies for confusing point of fact with objective.
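The point about version numbers in .egg-info names can be illustrated with a short sketch: if the version is part of the file name, a directory listing is enough to learn what is installed, with no need to open any metadata file. The sketch assumes names shaped like `Name-Version[-pyX.Y].egg-info`, with hyphens inside the name or version already escaped to underscores; the helper names and sample file names are hypothetical, and real tools should rely on setuptools' own parsing.

```python
def parse_egg_info_name(filename):
    """Return (name, version) from an .egg-info file name, or None if it does not fit.

    Assumes the form Name-Version[-pyX.Y].egg-info, with any hyphens inside
    the name or version already escaped (for example as underscores).
    """
    suffix = ".egg-info"
    if not filename.endswith(suffix):
        return None
    parts = filename[: -len(suffix)].split("-")
    if len(parts) < 2:
        return None  # no version component in the file name
    return parts[0], parts[1]


def installed_versions(filenames):
    """Map distribution name -> version using only a directory listing."""
    found = {}
    for filename in filenames:
        parsed = parse_egg_info_name(filename)
        if parsed is not None:
            name, version = parsed
            found[name] = version
    return found


# A hypothetical listing of one directory on the python path.
listing = ["Example-1.2-py2.5.egg-info", "Other_Thing-0.3.egg-info", "example", "README.txt"]
versions = installed_versions(listing)
```

Because only file names are inspected, the lookup cost is one directory listing per path entry.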
easy_install uses eggs for installation simply because it need never worry about file ownership conflicts. There is a direct mapping from a distribution to its files: the contents of a zipfile or subdirectory. This also allows for (relatively) straightforward uninstallation.
I actually like zipped eggs (much more than easy_install as package manager), but that is beside the point since the BDFL vetoed them.
The goal of the proposal, then, is to have a way for easy_install to have another way to map from a distribution to its owned files (and vice versa), so that eggs are not necessary for normal, single-version installations.
This is where we misunderstood each other and where I've probably gone astray as I wasn't trying to propose anything at all for easy_install (it wasn't in my attempt at a rationale), but a generic common ground that tools like easy_install (but not exclusively) could use without stepping on people's toes like easy_install does. I really wanted the proposal to stand alone from easy_install so that easy_install haters wouldn't have to fear it as well as provide some utility for those who don't even use such tools. But this is probably just tilting at windmills.
Thanks again for your patience. You must be overwhelmed by all of the opinions and misdirected attempts to help (including mine). I did wish we could have come up with a setuptools-agnostic support layer for installation managers suitable for inclusion in the standard library, but for some reason that doesn't seem desired by too many people. Apologies (and thanks for all your hard work).