[Distutils] What metadata does pip actually need about sdists?

Donald Stufft donald at stufft.io
Mon Oct 12 07:49:47 CEST 2015


On October 12, 2015 at 12:54:07 AM, Nathaniel Smith (njs at pobox.com) wrote:
I dunno -- I'm sure there must exist some other ways forward that 
don't require dropping the dream of static dependencies. At one 
extreme, we had a birds-of-a-feather at SciPy this year on "the future 
of numpy", and the most vocal audience contingent was in favor of 
numpy simply dropping upstream support for pip/wheels/pypi entirely 
and requiring all downstream packages/users to switch to conda or 
building by hand. It sounds like a terrible idea to me. But I would 
find it easier to believe in the pypi/pip ecosystem if there were some 
concrete plan for how this all-static world was actually going to 
work, and that it wasn't just chasing rainbows.

FTR, I plan on making some sort of auto builder for PyPI, so it’s possible that we can get pip to a point where almost everything it downloads is a binary wheel and we don’t need to worry too much about optimizing the sdist case. I also think that part of the problem with egg-info, setup.py, and pip’s attempted use of them comes down to the setup.py interface being pretty horrible and setup.py serving too many masters: it needs to be the VCS entry point, the sdist metadata entry point, the installation entry point, and the wheel building entry point all at once.

I also think that it would be a terrible idea to have the science stack leave the “standard” Python packaging ecosystem and go its own way, and I think it’d make the science packages essentially useless to the bulk of the non-science users. I think a lot of the chasing-rainbows feel comes mostly from this: we have some desires based on our experiences, but we haven’t yet taken the time to fully flesh out what the impact of those desires is. Nor are any of us science stack users (or contributors), to my knowledge, so we don’t deal with the complexities of that much [1].

One possible idea that I’ve thought of here, which may or may not be a good idea:

Require packages to declare their conditional dependencies up front (I’m assuming the list of dependencies that a project *could* have is both finite and known ahead of time) and let them give groups of these dependencies a name. I’m thinking of something similar to setuptools extras, where you can assign a list of dependencies to a named group. The build interface could include a way for whatever is calling the build tool to say “I require the feature represented by this named group of dependencies”[2], and then the build tool can hard fail if it detects that it can’t be built in a way that requires those dependencies at runtime. When the final build is done, it could put into the Wheel a list of all of the additional named groups of dependencies it was built with. The runtime dependencies of a wheel would then be the combination of all of those named groups of dependencies plus the typical install_requires dependencies. This could possibly even be presented nicely on PyPI as a sort of “here are the extra features you can get with this library, and what that does to its dependencies”.
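To make the idea above a bit more concrete, here is a minimal sketch in Python. Nothing in it is an existing pip, setuptools, or wheel API: the group names, the dependency names (“openblas-runtime”, “fftw-runtime”), and the functions select_groups and wheel_runtime_requires are all made up for illustration, and it assumes the build tool can somehow report which groups it is able to build.

```python
# Hypothetical sketch: every name below is invented for illustration.

# Static declarations that would ship inside the sdist.
INSTALL_REQUIRES = ["six"]
CONDITIONAL_GROUPS = {
    "blas": ["openblas-runtime"],
    "fft": ["fftw-runtime>=3.3"],
}


class BuildError(Exception):
    """Raised when the caller's request cannot be honoured by the build."""


def select_groups(declared, detected, required=(), rejected=()):
    """Return the sorted names of the groups the build will enable.

    declared: mapping of group name -> dependency list (from the sdist)
    detected: groups the build tool found it is able to build
    required: groups the caller insists on (hard failure if unavailable)
    rejected: groups the caller does not want, even if available
    """
    requested = set(required) | set(rejected) | set(detected)
    unknown = requested - set(declared)
    if unknown:
        raise BuildError("undeclared groups: %s" % sorted(unknown))
    conflicting = set(required) & set(rejected)
    if conflicting:
        raise BuildError("both required and rejected: %s" % sorted(conflicting))
    missing = set(required) - set(detected)
    if missing:
        raise BuildError("cannot build required groups: %s" % sorted(missing))
    return sorted(set(detected) - set(rejected))


def wheel_runtime_requires(install_requires, declared, built_groups):
    """Combine install_requires with the dependencies of every built group."""
    requires = list(install_requires)
    for name in built_groups:
        requires.extend(declared[name])
    return requires


built = select_groups(
    CONDITIONAL_GROUPS,
    detected={"blas", "fft"},
    required=["blas"],
    rejected=["fft"],
)
print(built)
print(wheel_runtime_requires(INSTALL_REQUIRES, CONDITIONAL_GROUPS, built))
```

The point of the sketch is that everything in CONDITIONAL_GROUPS is static and could be read without running a build, while the list returned by select_groups is only known once a particular wheel has been built.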

Would something like that solve Numpy’s dependency needs? Or is it the case that you don’t know ahead of time what the entire list of dependency specifiers could be (or that it would be unreasonable to have to declare them all up front)? I think I recall someone saying that something might depend on something like “Numpy >= 1.0” in its sdist, but once it’s been built it’ll need to depend on something like “Numpy >= $VERSION_IT_WAS_BUILT_AGAINST”. If this is something that’s needed, then we might not be able to satisfy this particular thing.
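For illustration only, the “built against” case could look something like the sketch below. The function tighten_lower_bound is hypothetical, it only understands the simple “name >= version” form, and the version “1.9.2” is just an example value. What it shows is that the resulting specifier depends on the build environment, so it can’t be picked from a list declared ahead of time.

```python
def tighten_lower_bound(sdist_requirement, built_against):
    """Rewrite a 'name >= version' requirement to the version built against.

    Hypothetical helper: it only handles a single '>=' clause.
    """
    name, separator, _declared_minimum = sdist_requirement.partition(">=")
    if not separator:
        raise ValueError("expected a 'name >= version' requirement")
    return "%s >= %s" % (name.strip(), built_against)


# The sdist says "Numpy >= 1.0"; the wheel was built against 1.9.2.
print(tighten_lower_bound("Numpy >= 1.0", "1.9.2"))
```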

I think I should point out too that I’m not dead set on having static dependency information inside of a sdist/source wheel/whatever. What I am dead set on is that all of the metadata *inside* of the sdist/source wheel/whatever should be static, and that it should include as much as possible of what isn’t specific to a particular wheel. Things like name, version, summary, description, project URLs, etc. are obviously (to me anyways) not going to be specific to a wheel, so they should be kept static inside of the sdist (and then copied over to the resulting wheel as static as well [3]). Conversely, you simply can’t get information out of a sdist that is inherent to wheels. Obvious (again, to me) examples of data like that are the build number, the ABI the wheel was compiled against, etc. Is the list of runtime dependencies one of the things that are specific to a particular wheel? I don’t know; it’s not obvious to me, but maybe it is. [4]
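A rough sketch of the kind of check this split (and footnote [3]) implies, assuming metadata is available as plain dictionaries. The field names and the function check_static_metadata_preserved are made up for this example and do not correspond to an existing tool.

```python
# Hypothetical field names: metadata that should not vary between wheels.
STATIC_FIELDS = ("name", "version", "summary", "description", "project_urls")


def check_static_metadata_preserved(sdist_meta, wheel_meta):
    """Raise ValueError if any static field differs between sdist and wheel.

    Wheel-only keys (for example a build number or ABI tag) are ignored.
    """
    changed = [
        field
        for field in STATIC_FIELDS
        if sdist_meta.get(field) != wheel_meta.get(field)
    ]
    if changed:
        raise ValueError("static metadata changed during build: %s" % changed)


sdist_meta = {
    "name": "example",
    "version": "1.0",
    "summary": "An example project",
    "description": "Longer text.",
    "project_urls": {"Home": "https://example.invalid/"},
}
# The wheel copies the static fields and adds its own wheel-specific data.
wheel_meta = dict(sdist_meta, build="1", abi="cp35m")
check_static_metadata_preserved(sdist_meta, wheel_meta)
```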

[1] That being said, I’d love it if someone who does deal with these things would become more involved with the “standard” ecosystem so we can have experts who deal with that side of things as well, because I do think it’s an important use case.

[2] And probably: “Even if you can do the feature suggested by this named group of dependencies, I don’t want it”

[3] This rule should be set up so that we can put an assertion into place that this data will remain exactly the same when building a wheel from a sdist.

[4] I think there’s possibly some confusion about what is causing problems. I think that the entirety of ``setup.py`` is full of problems with executing in different environments and isn’t very well designed to enable robust packages. A better-designed interface would likely resolve a good number of the problems that pip currently has either way. It is true, however, that needing to download the sdist prior to resolving dependencies makes things a lot slower, but I also think that doing it correctly is more important than doing it quickly. We could possibly resolve some of this by pushing more people to publish Wheels, expanding the cases where Wheels can be uploaded, and creating a wheel builder service. These things could reduce the situations where you *need* to download + build locally to do a dependency resolution. I think that something like the pip wheel building cache can reduce the need for this as well, since we’ll already have built wheels available locally in the wheel cache that we can query for dependency information.
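As a sketch of what “query for dependency information” could mean for an already-built wheel: a wheel is a zip archive containing a ``*.dist-info/METADATA`` file whose ``Requires-Dist`` headers list its dependencies, so they can be read without running any build. The function name wheel_requires_dist is made up here, and this is not a description of how pip’s cache actually does it.

```python
import zipfile
from email.parser import Parser


def wheel_requires_dist(wheel_file):
    """Return the Requires-Dist entries from a wheel's METADATA file.

    wheel_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel_file) as wheel:
        candidates = [
            name
            for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one .dist-info/METADATA file")
        text = wheel.read(candidates[0]).decode("utf-8")
    # METADATA uses RFC 822 style headers, so the email parser can read it.
    message = Parser().parsestr(text)
    return message.get_all("Requires-Dist") or []
```

Something like ``wheel_requires_dist("example-1.0-py2.py3-none-any.whl")`` (a hypothetical filename) would then return the list of dependency specifiers recorded when that wheel was built.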

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

