On Tue, Nov 20, 2012 at 9:35 AM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Daniel Holth <dholth <at> gmail.com> writes:

> Mostly it seems a bit silly to have so many conversations about parts of the
> PEP that remain unchanged from previously accepted versions...

I don't agree with the suggestion that we shouldn't discuss it because it was
accepted in a previous version. Perhaps it didn't receive the right scrutiny at
that time, but since it hasn't been implemented, it's reasonable to discuss it.

ISTM that implementing it as suggested in the PEP can lead to certain problems,
since it is a multi-valued field. If it is left in, then something should be
said in the PEP about the potential difficulties and if/how they can be
resolved.

The difficulties I am talking about relate to dependency resolution. Given the
current definition of Provides-Dist, it is possible for a package A on PyPI to
"Provide" all of e.g. "A (1.0)", "B (1.2)" and "C (1.5)", and it is also
possible for packages B and C on PyPI to provide the same (or slightly
different) versions of the logical packages A, B, and C. This will likely lead
to the need for a sophisticated dependency resolver because the dependency
graph can get quite convoluted. (Remember, we might need to do this resolution
when removing packages as well as when installing them.) I know there are SAT
solvers and such, but I'm not sure we need that level of sophistication, or
whether its complexity cost is outweighed by any benefit. Remember, we are
managing fine without multi-valued Provides-Dist, and while a case has been
made for virtual packages and forks (which just require a single-valued field),
no compelling case has been made for bundling packages in general (I understand
that such requirements might sometimes arise in certain corporate environments,
but they don't seem to be a mainstream use case). Hence, no strong case has
been made for a multi-valued "Provides" field.
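To make the resolution problem concrete, here is a minimal Python sketch of the first thing a resolver would need: a "who provides X?" map built by inverting every dist's Provides-Dist entries. The INDEX contents, the version numbers, and all helper names are hypothetical. They were chosen to mirror the A/B/C example above, and real metadata would come from an index rather than a dict.

```python
from collections import defaultdict
from itertools import product

# Hypothetical index mirroring the A/B/C example: each dist maps every
# logical name it claims to provide (its own name included) to a version.
INDEX = {
    "A": {"A": "1.0", "B": "1.2", "C": "1.5"},
    "B": {"A": "1.0", "B": "1.2", "C": "1.4"},
    "C": {"A": "0.9", "B": "1.2", "C": "1.5"},
}


def parse_version(version):
    """Naive numeric version key; assumes dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def build_provider_map(index):
    """Invert the index: logical name -> list of (dist, version) providers."""
    providers = defaultdict(list)
    for dist, provided in index.items():
        for name, version in provided.items():
            providers[name].append((dist, version))
    return dict(providers)


def providers_satisfying(providers, name, min_version):
    """Sorted dists whose claimed version of `name` is >= min_version."""
    floor = parse_version(min_version)
    return sorted(
        dist for dist, version in providers.get(name, [])
        if parse_version(version) >= floor
    )


def candidate_assignments(providers, required):
    """Every way of picking one provider per required logical name."""
    return list(product(*(providers[name] for name in required)))


if __name__ == "__main__":
    pm = build_provider_map(INDEX)
    print(sorted(pm["A"]))                             # three candidates for "A"
    print(providers_satisfying(pm, "C", "1.5"))        # ['A', 'C']
    print(len(candidate_assignments(pm, ["A", "C"])))  # 9
```

Even this toy case yields nine raw combinations for two requirements. Each choice also carries its own claims about the other logical names, and those claims must be checked against every other requirement, and checked again on removal. That cross-checking is where a SAT-style solver would come in.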

If we have a good index and packaging infrastructure, there is no general need
for packages to bundle other packages, unless those bundled packages are changed
in some way to suit the bundler's needs. In that case, I don't know how you
could be sure that a bundled "A (1.0)" hasn't diverged from the equivalent
package on PyPI.

The "Provides" field seems essentially useless in a metadata index: when
asked to install D, which has a dependency on A, you would download and install
A to resolve it rather than B or C, and I can't see when you would want to
query the index to say "who provides A?" and then use some heuristic to pick
e.g. B or C rather than A.
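The lookup order implied by that argument can be sketched as follows. This is a hypothetical policy for illustration, not distlib's behaviour. INDEX and choose_dist are made-up names, and the provider fallback uses an arbitrary alphabetical tie-break because any rule at that point is a heuristic.

```python
# Hypothetical index: dist name -> set of logical names it provides
# (its own Name included). "Z" has no dist of its own; only "D" provides it.
INDEX = {
    "A": {"A"},
    "B": {"A", "B", "C"},
    "C": {"A", "B", "C"},
    "D": {"D", "Z"},
}


def choose_dist(requirement, index):
    """Return the dist to download for `requirement`.

    A dist whose own name matches wins outright; the "who provides X?"
    lookup is only consulted when no such dist exists.
    """
    if requirement in index:
        return requirement
    providers = sorted(d for d, names in index.items() if requirement in names)
    if not providers:
        raise LookupError(f"nothing provides {requirement!r}")
    # Arbitrary tie-break: any rule for choosing among providers is a heuristic.
    return providers[0]
```

Here `choose_dist("A", INDEX)` returns "A" even though B and C also claim to provide it. The Provides data only matters for a name like "Z" that has no dist of its own.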

distlib currently contains support for the multi-valued "Provides", but I'm
not confident that will work as expected given pathological cases like the
example I suggested, without getting "complicated" in the Zen of Python sense.
I'm not convinced that the maintenance burden of a complicated solution is
worth the heretofore unnecessary ability to bundle stuff in arbitrary ways.

If you don't have Provides-Dist, then distribute must continue to bundle an extra .egg-info directory to emulate the feature. This is more than enough justification for me. The Name: field is essentially an alias for Provides-Dist: (or vice versa), so there is no such thing as a single-valued Provides-Dist. Having two names for a package is just as complicated as having twenty.
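One reading of the "Name: is an alias" point: the set of names a distribution answers to is its Name plus whatever it lists in Provides-Dist. A resolver therefore has to handle several names per dist as soon as Provides-Dist exists at all. Below is a minimal sketch using the standard library's email.parser, since metadata files use RFC 822-style headers. provided_names is a hypothetical helper, and the versions in METADATA are placeholders rather than distribute's actual metadata.

```python
from email.parser import Parser

# Illustrative PKG-INFO-style metadata; the version numbers are placeholders.
METADATA = """\
Metadata-Version: 1.2
Name: distribute
Version: 0.6.30
Provides-Dist: setuptools (0.6c11)
"""


def provided_names(metadata_text):
    """Names a dist answers to: its Name plus every Provides-Dist entry."""
    msg = Parser().parsestr(metadata_text)
    names = {msg["Name"].strip()}
    for entry in msg.get_all("Provides-Dist", []):
        # Drop any environment marker (";...") and version ("(...)").
        name = entry.split(";", 1)[0].split("(", 1)[0].strip()
        if name:
            names.add(name)
    return names


print(sorted(provided_names(METADATA)))  # ['distribute', 'setuptools']
```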

You should not implement Provides-Dist by searching PyPI for every Provides-Dist: name. You should only use it to decide whether setuptools needs to be downloaded when distribute is already installed and a package depends on setuptools.
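That narrower use amounts to a check against installed metadata only. The sketch below is minimal: INSTALLED, already_satisfied and needs_download are hypothetical names, and version constraints are deliberately ignored.

```python
# Hypothetical snapshot of installed dists: dist name -> names it provides
# (its own Name plus any Provides-Dist entries).
INSTALLED = {
    "distribute": {"distribute", "setuptools"},
    "six": {"six"},
}


def already_satisfied(requirement, installed):
    """True if some installed dist provides `requirement`.

    Only local metadata is consulted; the index is never searched for
    Provides-Dist claims.
    """
    return any(requirement in names for names in installed.values())


def needs_download(requirement, installed):
    """Fall through to a normal by-Name download only if nothing local provides it."""
    return not already_satisfied(requirement, installed)
```

With distribute installed, a dependency on setuptools is treated as met, so nothing is downloaded. Any requirement not provided locally falls through to an ordinary lookup by Name.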

The "bundling" term was bad wording on the part of the PEP. No one should ever include non-renamed copies of other dists in their own dists (compare "import six" with "import django.utils.six"). I've suggested a new wording in this thread.