
On 12 October 2015 at 18:36, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
Again trying to split out some more focused discussion from the big thread about sdists...
One big theme there has been the problem of "sources of truth": e.g. in current sdists, there is a PKG-INFO file that has lots of static metadata in it, but because the "real" version of that metadata is in setup.py, everyone ignores PKG-INFO.
A clear desideratum for a new sdist format is that we avoid this problem, by having static metadata that is actually trustworthy. I see two fundamentally different strategies that we might use to accomplish this. In time honored mailing list tradition, these are of course the one that I hear other people advocating and the one that I like ;-).
The first strategy is: sdists and the wheels they generate logically share the same metadata; so, we need some mechanism to enforce that
This is false: they don't share the same metadata. Some portions are the same, but deps, supported platforms, those will differ (and perhaps more than that). In particular, an sdist doesn't have a dependency on an ABI, and a wheel doesn't have a dependency on an API. Some APIs are ABIs (approximately true for all pure Python packages, for instance), but some are not (numpy).
The second strategy is: put static metadata in both sdists and wheels, but treat them as logically distinct things: the static metadata in sdists is the source of truth for information *about that sdist* (sdist name, sdist version, sdist description, sdist authors, etc.), and the static metadata in wheels is the source of truth for information about that wheel, but we think of these as distinct things and don't pretend that we can statically guarantee that they will match. I mean, in practice, they basically always will match.
The analogous current data won't match for pbr-using packages when we fix https://bugs.launchpad.net/pbr/+bug/1502692 (older pips don't support PEP-426 environment markers, but don't error when they are used either, leading to silent failure to install dependencies). Now, you might say 'hey, but the new shiny will support markers from day one'. Well, the problem is backwards compat: we're going to have future things that change, and the more we split things out, the more likely those changes are to need skewed results like this to deal with them. ...
the sdist name instead of the wheel name, it can actually do it
But the sdist and the wheel have to have the same name. Or do you mean the filename on disk, vs the distribution name?
reliably in a totally static way, without having to run arbitrary code to validate this. OTOH pip will always have to be prepared to handle the possibility of mismatch between what it was expecting based on the sdist metadata and what it actually got after building it, so we might as well acknowledge that in our mental model.
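A minimal sketch of the kind of post-build check described here, assuming both the sdist's PKG-INFO and the built wheel's METADATA are available as text in the usual RFC 822-style header format. The function names, the sample data, and the name-normalization rule are assumptions for illustration, not pip's actual behaviour.

```python
import re
from email.parser import Parser


def normalize(name):
    # Assumed comparison rule: case-insensitive, with runs of "-", "_"
    # and "." treated as equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_mismatches(sdist_pkg_info, wheel_metadata):
    """Return a list of differences in name and version (empty if they agree)."""
    expected = Parser().parsestr(sdist_pkg_info)
    built = Parser().parsestr(wheel_metadata)
    problems = []
    if normalize(expected["Name"]) != normalize(built["Name"]):
        problems.append(
            "name: expected %r, built %r" % (expected["Name"], built["Name"])
        )
    if expected["Version"] != built["Version"]:
        problems.append(
            "version: expected %r, built %r"
            % (expected["Version"], built["Version"])
        )
    return problems


# Hypothetical metadata, made up for this example.
SDIST = "Metadata-Version: 1.1\nName: Example_Dist\nVersion: 1.0\n"
WHEEL_OK = "Metadata-Version: 2.0\nName: example-dist\nVersion: 1.0\n"
WHEEL_BAD = "Metadata-Version: 2.0\nName: example-dist\nVersion: 1.1.dev3\n"

print(find_mismatches(SDIST, WHEEL_OK))
print(find_mismatches(SDIST, WHEEL_BAD))
```

The static read tells the installer what to expect; the comparison after the build is what catches a setup.py that computed something different.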
One potential advantage of this approach is that we might be able to talk ourselves into trusting the existing PKG-INFO as providing static metadata about the sdist, and thus PyPI at least could start trusting it for things like the "description" field, and if we define a new
The challenge is the 40K broken packages up there on PyPI. Basically pip has a bugfix for any of:

- sdists built using distutils
- sdists built using random build systems that don't understand what an sdist is (e.g. automake)
- sdists built using versions of setuptools that had a bug in this area

There is no corrective mechanism for broken packages other than route-around-it-while-you-ask-the-author-to-upload-a-fix. So I think to tackle the 'please trust the metadata in the sdist' problem, one needs to have a graceful ramp-up of that trust with robust backoff mechanisms that don't involve 50% of PyPI users hating on that one old project in the corner everyone has a dep on but that is actually moribund and not doing uploads. I can imagine several such routes, including a crowdsourced blacklist, but it's going to be (like we're already dealing with for the automatic wheel cache) years of bug reports until things age out.
sdist format then it would be possible to generate its static metadata from current setup.py files (e.g. by modifying setuptools's sdist command). Contrast this with the other approach, where getting any kind of static source-of-truth would require rewriting almost all existing setup.py files.
We already generate static metadata from current setup.py files: setup.py egg_info does precisely that. There, bug fixed ;).
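As a rough illustration of consuming that output without running any project code: PKG-INFO is a block of RFC 822-style headers, so the standard library's email parser can read it. The sample contents and the helper name below are hypothetical.

```python
from email.parser import Parser

# Hypothetical PKG-INFO contents, as "setup.py egg_info" might write them.
SAMPLE_PKG_INFO = """\
Metadata-Version: 1.1
Name: example-dist
Version: 1.0.2
Summary: A made-up package used only for illustration
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
"""


def read_static_metadata(pkg_info_text):
    """Parse PKG-INFO text into a plain dict; no setup.py is executed."""
    msg = Parser().parsestr(pkg_info_text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "summary": msg["Summary"],
        # Multi-valued fields come back as a list via get_all().
        "classifiers": msg.get_all("Classifier") or [],
    }


meta = read_static_metadata(SAMPLE_PKG_INFO)
print(meta["name"], meta["version"], len(meta["classifiers"]))
```

Whether the values read this way can be trusted is, of course, exactly the question under discussion.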
The challenge, of course, is that there are a few places where pip actually does need to know something about wheels based on examining an sdist -- in particular name and version and (controversially) dependencies. But this can/should be addressed explicitly, e.g. by writing down a special rule about the name and version fields.
I'm sorry, I don't follow.

-Rob

-- 
Robert Collins <rbtcollins@hp.com>
Distinguished Technologist
HP Converged Cloud