[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Sat Oct 3 00:47:15 CEST 2015

On Fri, Oct 2, 2015 at 3:26 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 2 October 2015 at 22:45, Nathaniel Smith <njs at pobox.com> wrote:
>>> I'm uncomfortable with the fact that the proposed sdist format has
>>> more or less no metadata of its own (even the filename format is only
>>> a recommendation) so (for example) if someone does "pip install
>>> foo==1.0" I don't see how pip can find a suitable sdist, if no wheel
>>> is available.
>> About the filename thing:
>> The reason that the draft makes the inclusion of package/version info
>> a SHOULD instead of a MUST is that regardless of what the spec says,
>> all decent installation tools are going to support doing things like
>>     curl -L https://github.com/numpy/numpy/archive/master.zip -o numpy-master.zip
>>     pip install numpy-master.zip
>> So we can either handle that by saying that "numpy-master.zip" is an
>> sdist, just not one that we would allow on PyPI (which is what the
>> current draft does), or we could handle it by saying that
>> numpy-master.zip is almost-but-not-quite an sdist, and handling it is
>> a commonly supported extension to the standard. Doesn't really matter
>> that much either way -- just a matter of terminology. Either way the
>> sdists on PyPI are obviously going to be named
>> <package>-<version>.<ext>.
> OK, that's a good point, and I never felt it was crucial that the
> name/version be encoded in the filename. But having them in some form
> of static metadata should be mandatory. Your _pypackage.cfg doesn't
> contain the package name or version, so how would I get them without
> running code? That's my real point.

Well, first, it's just not possible for a devel snapshot like
numpy-master.zip or a VCS checkout to contain static version metadata,
since the actual version of the generated wheels *will* be determined
by running arbitrary code (e.g. 'git rev-parse HEAD'). So we're only
talking about tagged/released source trees.
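
As a rough illustration of why a snapshot cannot carry a static version,
here is a minimal sketch of how a build step might compute one. The
function name, the base_version argument, and the '.dev0+g<hash>' layout
are assumptions made up for this example, not something the draft
specifies:

```python
import subprocess


def snapshot_version(base_version, source_dir=".", run=subprocess.check_output):
    """Return a development version such as '1.11.0.dev0+g1a2b3c4'.

    Hypothetical helper: base_version and the '+g<hash>' suffix are
    illustrative choices, not part of any spec discussed here.
    """
    # The commit hash is only knowable by asking the VCS, i.e. by running code.
    out = run(["git", "rev-parse", "--short", "HEAD"], cwd=source_dir)
    commit = out.decode("ascii").strip()
    return "{}.dev0+g{}".format(base_version, commit)
```

The run parameter exists only so the sketch can be exercised without a
real checkout; the point is that the result depends on repository state
that no file inside the archive can record ahead of time.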

Then the argument would be, what are you going to do with that
name/version information? If the answer is "decide to install it",
then (a) if you want to support installation from VCS snapshots (and
you do) then your tool already has to support running arbitrary code
to get the version number, and (b) installing the package will also
certainly require running arbitrary code even if you have a nice
official-release sdist, so static metadata gains you little there.

OTOH the twine upload case that Donald mentioned is a good example of
an operation that might actually want some metadata from release
sdists specifically :-). I'm not opposed to adding it if there's a
clear use case, I just don't think we should try to shove every piece
of wheel metadata into the sdist without a clear understanding of how
they make sense and solve a problem *for sdists*.

>>> I would rather see an sdist format that can be introspected *without*
>>> running code or a build tool. Installers and packaging tools like pip
>>> need to be able to do that - one of the biggest issues with pip's
>>> current sdist handling is that it can't make any meaningful decisions
>>> before building at least the egg-info.
>> Another way to look at this is to say that pip's current handling is
>> proof that the build-to-get-metadata strategy is viable :-).
> Not if you look at the bug reports for pip that can be traced back to
> needing to run setup.py egg-info to get metadata, or other variations
> on not having static introspectable metadata in sdists.

That sounds interesting! Can you elaborate? Links?

I know that one unpleasant aspect of the current design is that the
split between egg-info and actual building creates the possibility for
time-of-definition-to-time-of-use bugs, where the final wheel
hopefully matches what egg-info said it would, but in practice there
could be skew. (Of course this is true in any system which represents
this information in more than one place -- e.g. in sdist metadata and
also in wheel metadata -- but right now it's particularly bad in cases
where you can't actually get all the arguments you want to pass to
setup() without running some code, but the code you need to run needs
to be fetched via setup_requires=..., so you have to lie to setup()
during the egg-info operation and hope that everything will work out
in the end.) This is the motivation for the draft PEP dropping
egg-info as a separate operation. But there are certainly people who
know more about the internal details of what pip needs than I do, and
I'd love to hear more.
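
To make the skew concrete, here is a minimal sketch of the kind of check
a tool could run after the build. The function name, the plain-dict
representation, and the list of fields are all assumptions for
illustration; this is not how pip works today:

```python
def metadata_skew(declared, built, fields=("name", "version", "install_requires")):
    """Return {field: (declared_value, built_value)} for every mismatch.

    Hypothetical helper: 'declared' stands for what the egg-info step
    reported, 'built' for what ended up in the final wheel.
    """
    skew = {}
    for field in fields:
        before = declared.get(field)
        after = built.get(field)
        if before != after:
            skew[field] = (before, after)
    return skew


# Example: egg-info ran before setup_requires was available, so the
# dependency list it reported was incomplete.
declared = {"name": "foo", "version": "1.0", "install_requires": []}
built = {"name": "foo", "version": "1.0", "install_requires": ["numpy"]}
print(metadata_skew(declared, built))
```

An empty result means the two phases agreed; anything else is exactly
the kind of mismatch that a single build step would rule out by
construction.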


Nathaniel J. Smith -- http://vorpus.org
