[Distutils] Sources of truth

Nathaniel Smith njs at pobox.com
Mon Oct 12 09:23:46 CEST 2015

On Sun, Oct 11, 2015 at 11:00 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> On 12 October 2015 at 18:36, Nathaniel Smith <njs at pobox.com> wrote:
>> the sdist name instead of the wheel name, it can actually do it
> but the sdist and the wheel have to have the same name - or do you mean
> the filename on disk, vs the distribution name?

I mean the distribution name - there's no way to guarantee that
building foo-1.0.zip won't spit out bar-7.4.whl, where by "no way" I
mean "it's literally undecidable". I mean, if someone actually did
this it would be super weird and we would all shun them, but our code
and specs still need to be prepared for the possibility. IIUC this is
why PyPI can't trust PKG-INFO: 99.9% of the time the metadata in
PKG-INFO matches what you will get when you run setup.py, but right
now PyPI wants to know what setup.py will do, and there's no way to
know if it will be the same as what PKG-INFO says, so it just doesn't
trust PKG-INFO.

OTOH if we redefine PyPI's goal as being, figure out what's in
PKG-INFO (or whatever replaces it), and declare that it's okay (for
PyPI's purposes) if that doesn't match what the build system will
eventually do, then that's a viable way forward.

>> reliably in a totally static way, without having to run arbitrary code
>> to validate this. OTOH pip will always have to be prepared to handle
>> the possibility of mismatch between what it was expecting based on the
>> sdist metadata and what it actually got after building it, so we might
>> as well acknowledge that in our mental model.
>> One potential advantage of this approach is that we might be able to
>> talk ourselves into trusting the existing PKG-INFO as providing static
>> metadata about the sdist, and thus PyPI at least could start trusting
>> it for things like the "description" field, and if we define a new
> The challenge is the 40K broken packages up there on PyPI. Basically
> pip has a bugfix for any of:
>  - sdists built using distutils
>  - sdists built using random build systems that don't understand what an
>    sdist is (e.g. automake)
>  - sdists built using versions of setuptools that had a bug in this area
> There is no corrective mechanism for broken packages other than
> route-around-it-while-you-ask-the-author-to-upload-a-fix.

IIUC what PyPI wants to do with PKG-INFO is read out stuff like the
description and trove classifiers fields. Are there really 40K sdists
on PyPI that have PKG-INFO files and where those files contain
incorrect descriptions and so forth? I mean, obviously someone would
have to check :-) But it seems unlikely, since almost everyone uploads
by running 'sdist upload' or twine or something similarly automated.
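
To make "read out the description and trove classifiers" concrete, here is
a minimal sketch of a purely static read, assuming PKG-INFO keeps its usual
RFC 822-style header layout. The sample contents and the
read_static_metadata name are made up for illustration; nothing here runs
setup.py.

```python
from email.parser import Parser

# Hypothetical PKG-INFO contents, for illustration only.
SAMPLE_PKG_INFO = """\
Metadata-Version: 1.1
Name: foo
Version: 1.0
Summary: An example package
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
"""


def read_static_metadata(pkg_info_text):
    """Parse PKG-INFO text as headers; no build code is executed."""
    msg = Parser().parsestr(pkg_info_text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "summary": msg["Summary"],
        # Classifier may appear many times, so collect every occurrence.
        "classifiers": msg.get_all("Classifier") or [],
    }


meta = read_static_metadata(SAMPLE_PKG_INFO)
print(meta["name"], meta["version"], len(meta["classifiers"]))
```

Whether the values read this way deserve trust is exactly the open
question; the sketch only shows that reading them needs no code execution.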

> So I think to tackle the 'please trust the metadata in the sdist'
> problem, one needs to have a graceful ramp-up of that trust with
> robust backoff mechanisms that don't involve 50% of PyPI users hating
> on that one old project in the corner everyone has a dep on but that
> is actually moribund and not doing uploads. I can imagine several such
> routes, including a crowdsourced blacklist - but it's going to be (like
> we're dealing with with the automatic wheel cache already) years of
> bug reports until things age out.
>> sdist format then it would be possible to generate its static metadata
>> from current setup.py files (e.g. by modifying setuptools's sdist
>> command). Contrast this with the other approach, where getting any
>> kind of static source-of-truth would require rewriting almost all
>> existing setup.py files.
> We already generate static metadata from current setup.py files:
> setup.py egg_info does precisely that. There, bug fixed ;).

I'm pretty sure that merely making it so 'setup.py sdist' created a
file that contained the output from egg_info would not solve the
current problem. That's pretty much exactly what the existing PKG-INFO
*is*, isn't it? Yet apparently no-one trusts it.

>> The challenge, of course, is that there are a few places where pip
>> actually does need to know something about wheels based on examining
>> an sdist -- in particular name and version and (controversially)
>> dependencies. But this can/should be addressed explicitly, e.g. by
>> writing down a special rule about the name and version fields.
> I'm sorry, I don't follow.

E.g., we can document that if you have an sdist foo-1.0, then pip and
similar tools will expect this to generate a foo-1.0 wheel (but be
prepared to do something sensible if this doesn't happen, like give an
error message or whatever). That's really all pip needs, right?
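
A rough sketch of what that rule could look like on the tool side, assuming
the wheel filename layout name-version(-build)-python-abi-platform.whl. The
function names are hypothetical, this is not pip's actual code, and the
version comparison is a plain string match rather than real version
normalization.

```python
import re


def parse_wheel_filename(filename):
    """Return (name, version) taken from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Five fields, or six when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    return parts[0], parts[1]


def normalize(name):
    # Compare names case-insensitively, treating '-', '_' and '.' alike.
    return re.sub(r"[-_.]+", "-", name).lower()


def check_built_wheel(expected_name, expected_version, wheel_filename):
    """Raise if the wheel we got is not the one the sdist promised."""
    name, version = parse_wheel_filename(wheel_filename)
    if normalize(name) != normalize(expected_name) or version != expected_version:
        raise RuntimeError(
            "sdist %s-%s produced unexpected wheel %s-%s"
            % (expected_name, expected_version, name, version)
        )
    return wheel_filename


# The expected case passes; the weird case gets a clear error.
check_built_wheel("foo", "1.0", "foo-1.0-py2.py3-none-any.whl")
try:
    check_built_wheel("foo", "1.0", "bar-7.4-py2.py3-none-any.whl")
except RuntimeError as exc:
    print(exc)
```

The point is only that the mismatch is checked after the build, so the
sdist's name and version act as an expectation, not a guarantee.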


Nathaniel J. Smith -- http://vorpus.org
