[Distutils] Sources of truth

Robert Collins robertc at robertcollins.net
Mon Oct 12 08:00:51 CEST 2015


On 12 October 2015 at 18:36, Nathaniel Smith <njs at pobox.com> wrote:
> Hi all,
>
> Again trying to split out some more focused discussion from the big
> thread about sdists...
>
> One big theme there has been the problem of "sources of truth": e.g.
> in current sdists, there is a PKG-INFO file that has lots of static
> metadata in it, but because the "real" version of that metadata is in
> setup.py, everyone ignores PKG-INFO.
>
> A clear desideratum for a new sdist format is that we avoid this
> problem, by having static metadata that is actually trustworthy. I see
> two fundamentally different strategies that we might use to accomplish
> this. In time honored mailing list tradition, these are of course the
> one that I hear other people advocating and the one that I like ;-).



> The first strategy is: sdists and the wheels they generate logically
> share the same metadata; so, we need some mechanism to enforce that

This is false: they don't share the same metadata. Some portions are
the same, but deps, supported platforms, those will differ (and
perhaps more than that).

In particular, an sdist doesn't have a dependency on an ABI, and a
wheel doesn't have a dependency on an API. Some APIs are ABIs
(approximately true for all pure Python packages, for instance), but
some are not (numpy).

> The second strategy is: put static metadata in both sdists and wheels,
> but treat them as logically distinct things: the static metadata in
> sdists is the source of truth for information *about that sdist*
> (sdist name, sdist version, sdist description, sdist authors, etc.),
> and the static metadata in wheels is the source of truth for
> information about that wheel, but we think of these as distinct things
> and don't pretend that we can statically guarantee that they will
> match. I mean, in practice, they basically always will match.

The analogous current data won't match for pbr-using packages when we
fix https://bugs.launchpad.net/pbr/+bug/1502692 (older versions of pip
don't support PEP-426 environment markers, but don't error when they
are used either, leading to silent failure to install dependencies).

Now, you might say 'hey, but the new shiny will support markers from
day one'. Well, the problem is backwards compat: things are going to
keep changing in future, and the more we split things out, the more
likely those changes are to need skewed results like this to deal
with them.

...
> the sdist name instead of the wheel name, it can actually do it

But the sdist and the wheel have to have the same name. Or do you mean
the filename on disk, as opposed to the distribution name?

> reliably in a totally static way, without having to run arbitrary code
> to validate this. OTOH pip will always have to be prepared to handle
> the possibility of mismatch between what it was expecting based on the
> sdist metadata and what it actually got after building it, so we might
> as well acknowledge that in our mental model.
>
> One potential advantage of this approach is that we might be able to
> talk ourselves into trusting the existing PKG-INFO as providing static
> metadata about the sdist, and thus PyPI at least could start trusting
> it for things like the "description" field, and if we define a new

The challenge is the 40K broken packages up there on PyPI. Basically
pip has a bugfix for any of:
 - sdists built using distutils
 - sdists built using random build systems that don't understand what
   an sdist is (e.g. automake)
 - sdists built using versions of setuptools that had a bug in this
   area

There is no corrective mechanism for broken packages other than
route-around-it-while-you-ask-the-author-to-upload-a-fix.

So I think to tackle the 'please trust the metadata in the sdist'
problem, one needs to have a graceful ramp-up of that trust with
robust backoff mechanisms that don't involve 50% of PyPI users hating
on that one old project in the corner everyone has a dep on but that
is actually moribund and not doing uploads. I can imagine several such
routes, including a crowdsourced blacklist, but it's going to be (as
we're already seeing with the automatic wheel cache) years of bug
reports until things age out.
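
To make the blacklist route concrete, here is a rough sketch of what
such a fallback policy might look like. It is purely illustrative: the
function names, the UNTRUSTED set and the project names are all
hypothetical, and nothing like this exists in pip today. The only thing
borrowed from elsewhere is the name normalization, which follows PEP 503.

```python
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def metadata_source(project, untrusted, has_static_metadata):
    """Decide where an installer should get metadata from (hypothetical policy).

    Returns "static" to trust the metadata shipped in the sdist, or
    "build" to fall back to running the build to discover it.
    """
    if not has_static_metadata:
        return "build"
    if normalize(project) in untrusted:
        # Known-bad sdist metadata: route around it instead of failing.
        return "build"
    return "static"


# Hypothetical crowdsourced list of projects with broken sdist metadata.
UNTRUSTED = {normalize("Old_Moribund.Project")}

print(metadata_source("old-moribund-project", UNTRUSTED, True))  # build
print(metadata_source("well-behaved", UNTRUSTED, True))          # static
```

The point of the fallback is that a bad entry degrades to today's
behaviour (run the build) rather than breaking installs for everyone
who depends on that project.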

> sdist format then it would be possible to generate its static metadata
> from current setup.py files (e.g. by modifying setuptools's sdist
> command). Contrast this with the other approach, where getting any
> kind of static source-of-truth would require rewriting almost all
> existing setup.py files.

We already generate static metadata from current setup.py files:
setup.py egg_info does precisely that. There, bug fixed ;).
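
For anyone who hasn't looked inside one: the PKG-INFO that ends up in
the egg-info directory is a set of RFC 822 style headers, so the
standard library can read it. A minimal sketch, assuming single-valued
fields only; the sample content below is made up for illustration and
read_static_metadata is a hypothetical helper, not part of any tool.

```python
from email.parser import Parser

# Made-up PKG-INFO content, used only for this sketch.
SAMPLE_PKG_INFO = """\
Metadata-Version: 1.1
Name: example-dist
Version: 1.0.0
Summary: A made-up distribution used only for this sketch
"""


def read_static_metadata(text):
    """Parse PKG-INFO style headers into a plain dict.

    Repeated fields (e.g. Classifier) would collapse to their last
    value here; a real reader would need to keep all of them.
    """
    msg = Parser().parsestr(text, headersonly=True)
    return {key: value for key, value in msg.items()}


metadata = read_static_metadata(SAMPLE_PKG_INFO)
print(metadata["Name"], metadata["Version"])
```

Whether that data can be trusted is of course the whole question; the
sketch only shows that reading it requires no code execution.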

> The challenge, of course, is that there are a few places where pip
> actually does need to know something about wheels based on examining
> an sdist -- in particular name and version and (controversially)
> dependencies. But this can/should be addressed explicitly, e.g. by
> writing down a special rule about the name and version fields.

I'm sorry, I don't follow.

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

