[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Sat Oct 3 04:27:33 CEST 2015

Hi Donald,

Thanks for taking the time to make such detailed comments! Thoughts below.

On Fri, Oct 2, 2015 at 4:04 PM, Donald Stufft <donald at stufft.io> wrote:
> On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs at pobox.com) wrote:
>> Distutils delenda est.
> I think that you should drop (from this PEP) the handling of a VCS/arbitrary
> directories and focus solely on creating a format for source distributions. A
> source distribution can (and should) be fairly strict and well defined exactly
> where all of the files go, what files exist and don't exist, and things of that
> nature (more on this later).

Hmm. Okay, I think this really helps clarify our key point of difference!

For me, an important requirement is that there continue to be a single
standard command that end-users can use to install a VCS checkout.
This is a really important usability property -- everyone knows
"setup.py install". Unfortunately we can't keep "setup.py install"
given its entanglement with distutils and the desire to split building
and installation, so the obvious answer is that this should become
'pip install <directory>', and from that everything else follows.

Having a standard way to install from a VCS checkout is also useful
for things like requirements files... and in fact it's required by our
current standards. PEP 440 has this as an example of a valid
dependency specification:
  pip @ git+https://github.com/pypa/pip.git@7921be1537eac1e97bc40179a57f0349c2aee67d
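
To make the shape of that specification concrete, here is a rough sketch
of pulling one apart. This is a deliberate simplification for
illustration, not pip's actual parser; the DirectReference type and
parse_direct_reference function are hypothetical names, and the "last @
after the URL is a revision" rule is an assumption that ignores many
corner cases.

```python
from typing import NamedTuple, Optional


class DirectReference(NamedTuple):
    name: str                 # project name to the left of " @ "
    vcs: Optional[str]        # e.g. "git" for "git+https://...", else None
    url: str                  # location without the VCS prefix or revision
    revision: Optional[str]   # trailing "@<rev>" if one is present


def parse_direct_reference(spec: str) -> DirectReference:
    """Split a 'name @ url' direct reference into its parts (simplified)."""
    name, sep, target = spec.partition(" @ ")
    if not sep:
        raise ValueError("not a direct reference: %r" % spec)
    name, target = name.strip(), target.strip()

    # A VCS reference looks like "<vcs>+<transport>://...".
    scheme = target.partition("://")[0]
    vcs, plus, _transport = scheme.partition("+")
    if plus:
        url = target[len(vcs) + 1:]
    else:
        vcs, url = None, target

    # Assumption: a final "@something" with no "/" in it names a revision.
    # This keeps "user@host" in the URL from being mistaken for one.
    head, at, rev = url.rpartition("@")
    if at and "/" not in rev and "://" in head:
        return DirectReference(name, vcs, head, rev)
    return DirectReference(name, vcs, url, None)
```

Applied to the example above, this yields the name "pip", the VCS
"git", the repository URL, and the commit hash as the revision.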

So I'm extremely reluctant to give up on standardizing how to handle
VCS checkouts. And if we're going to have a standard for that, then it
would sure be nice if we could share the work between this standard
and the one for sdists, given how similar they are.

> I don't believe that Python should develop anything like the Debian ability to
> have a single source "package" create multiple binary packages. The metadata of
> the Wheel *must* strictly match the metadata of the sdist (except for things
> that are Wheel specific). This includes things like name, version, etc. Trying
> to go down this path I think will make things a lot more complicated since we
> have a segmented archive where people have to claim particular names, otherwise
> how do you prevent me from registering the name "foobar" on PyPI and saying it
> produces the "Django" wheel?

What prevents it in the current draft is that there's no way for
foobar to say any such thing :-). If you ask for Django, then the only
sdist it will look at is the one in the Django segment. This is an
intentionally limited solution, based on the intuition that multiple
wheels from a single sdist will tend to be a relatively rare case, that
when they do occur there will generally be one "main" wheel that
people will want to depend on, and that people should be uploading
wheels anyway rather than relying on sdists.

(Part of the intuition for the last part is that we also have a
not-terribly-secret-conspiracy here for writing a PEP to get Linux
wheels onto PyPI and at least achieve feature parity with Windows / OS
X. Obviously there will always be weird platforms -- iOS and FreeBSD
and Linux-without-glibc and ... -- but this should dramatically reduce
the frequency with which people need sdist dependencies.)

If that proves inadequate, then the obvious extension would be to add
some metadata to the sdist similar to Debian's, where an sdist has a
list of all the wheels that it (might) produce when built, PyPI would
grow an API by which pip-or-whoever could query for all sdists that
claim to be able to produce wheel X, and at the same time PyPI would
start enforcing the rule that if you want to upload an sdist that
claims to produce wheel X then you have to own the name X. (After all,
you need to own that name anyway so you can upload the wheels.) Or
alternatively people could just split up their packages, as would be
required by your proposal anyway :-).
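
As a rough illustration of that possible extension, here is a toy
in-memory model of the ownership rule and the reverse lookup. Everything
here is hypothetical: ToyIndex and its methods are made-up names, not a
real or proposed PyPI API, and it skips things a real index would need
(name normalization, versions, authentication).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class ToyIndex:
    # project name -> users allowed to upload under that name
    owners: Dict[str, Set[str]] = field(default_factory=dict)
    # wheel name -> names of sdists that claim to produce it
    producers: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, user: str, name: str) -> None:
        self.owners.setdefault(name, set()).add(user)

    def upload_sdist(self, user: str, sdist_name: str,
                     produces: Iterable[str]) -> None:
        """Accept the sdist only if `user` owns every name it claims."""
        claimed = {sdist_name, *produces}
        not_owned = sorted(
            n for n in claimed if user not in self.owners.get(n, set())
        )
        if not_owned:
            raise PermissionError(
                "%s does not own: %s" % (user, ", ".join(not_owned))
            )
        for wheel_name in produces:
            self.producers.setdefault(wheel_name, set()).add(sdist_name)

    def sdists_producing(self, wheel_name: str) -> List[str]:
        """The query pip-or-whoever would make when it needs a wheel."""
        return sorted(self.producers.get(wheel_name, set()))
```

With this rule, someone who owns only "foobar" gets a PermissionError
when uploading a foobar sdist that claims to produce "Django", which is
exactly the case you were worried about.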

So I sorta doubt it will be a problem in practice, but even if it
becomes one, it won't be hard to fix.

(And to be clear, the multiple-wheels-from-one-sdist thing is not a
primary goal of this proposal -- the main reason we put it in is that
once you've given up on having static wheel metadata inside the sdist
then supporting multiple-wheels-from-one-sdist is trivial, so you
might as well do it, esp. since it's a case that does seem to come up
with some regularity in real life and you don't want to make people
fight with their tools when it's unnecessary.)

> Since I think this should only deal with source distributions, then the primary
> thing we need is an operation that will take an unpacked source distribution
> that is currently sitting on the filesystem and turn it into a wheel located
> in a specific location.
> The layout for a source distribution should be specified, I think something
> like:
>     .
>     ├── meta
>     │   ├── DESCRIPTION.rst
>     │   ├── FORMAT-VERSION
>     │   ├── LICENSE.txt
>     │   └── METADATA.json
>     └── src
>         ├── my-cool-build-tool.cfg
>         └── mypackage
>             └── __init__.py
> I don't particularly care about the exact names, but this layout gives us two
> top level directories (and only two), one is a place where all of the source
> distribution metadata goes, and one is a src directory where all of the files
> for the project should go, including any relevant configuration for the build
> tool in use by the project. Having two directories like this eliminates the
> need to worry about naming collisions between the metadata files and the
> project itself.
> We should probably give this a new name instead of "sdist" and give it a
> dedicated extension. Perhaps we should call them "source wheels" and have the
> extension be something like .swhl or .src.whl. This means we don't need to
> worry about making the same artifact compatible with both the legacy toolchain
> and a toolchain that supports "source wheels".
> We should also probably specify a particular container format to be used for
> a .whl/.src.whl. It probably makes sense to simply use zip since that is what
> wheels use and it supports different compression algorithms internally. We
> probably want to at least suggest limiting compression algorithms used to
> Deflate and None, if not mandate that one of those two is used.
> We should include absolutely as much metadata as part of the static metadata
> inside the sdist as we can. I don't think there is any case to be made that
> things like name, version, summary, description, classifiers, license,
> keywords, contact information (author/maintainers), project URLs, etc. are
> Wheel specific. I think there are other things which are arguably able to be
> specified in the sdist, but I'd need to fiddle with it to be sure. Basically
> any metadata that isn't included as static information will not be able to be
> displayed on PyPI.

I feel like this idea of "source wheels" makes some sense if we want
something that looks like a wheel, but without the ABI compatibility
issues of wheels. I'm uncertain how well it can be made to work in
practice, or how urgent it is once we have a 95% solution in place for
Linux wheels, but it's certainly an interesting idea. To me it feels
rather different from a traditional sdist, and obviously there's still
the problem of having a standard way to build from a VCS checkout.
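
For what it's worth, the strictness of the layout quoted above is easy
to check mechanically. A minimal sketch, assuming the archive is a zip
whose only top-level directories are "meta" and "src" and whose members
use Deflate or no compression, as in the quoted proposal. The function
names are hypothetical, and none of this comes from an accepted spec.

```python
import io
import zipfile

# Assumptions taken from the quoted proposal, not from any accepted spec.
ALLOWED_TOP_LEVEL = {"meta", "src"}
ALLOWED_COMPRESSION = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def check_source_wheel(archive):
    """Return a list of problems found in the archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Everything must live under one of the two top-level dirs.
            top = info.filename.split("/", 1)[0]
            if top not in ALLOWED_TOP_LEVEL:
                problems.append(
                    "unexpected top-level entry: %s" % info.filename
                )
            if info.compress_type not in ALLOWED_COMPRESSION:
                problems.append(
                    "disallowed compression: %s" % info.filename
                )
    return problems


def make_archive(filenames):
    """Build a throwaway in-memory zip with empty files, for trying it out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in filenames:
            zf.writestr(name, "")
    buf.seek(0)
    return buf
```

An archive holding only meta/... and src/... members passes, while one
with a stray top-level setup.py is reported as a problem.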

It might even make sense to have standard methods to go:
   VCS checkout -> sdist -> (wheels and/or source wheels)

> The metadata should directly include the specifiers inside of it and shouldn't
> propagate the meme that pip's requirements.txt format is anything but a way
> to recreate a specific environment with pip.

Yeah, there's a big question mark next to the requirements.txt stuff
in the draft PEP, because something more standard and structured would
certainly be nice. But requirements.txt is wildly popular, and for a
good reason -- it provides a simple terse syntax that does what people
want. (By comparison, the PEP 426 JSON syntax for requirements with
extras and environment specifiers is extremely cumbersome yet less
featureful.) And semantically, what we want here is a way to say "to
build *this* I need an environment that looks like *this*", which is
pretty close to what requirements.txt is actually designed for. So I
dunno -- instead of fighting the meme maybe we should embrace it :-).

But obviously this is a tangent to the main questions.

> I don't think there's ever going to be a world where pip depends on virtualenv
> or pyvenv.

Huh, really? Can you elaborate on why not? The standard doesn't have
to require the use of clean build environments (I was thinking of the
text in the standard as applying with the "as if rule" -- a valid
sdist is one that can be built the way described, if you have some way
that will work to build such sdists then your way is valid too). But
using clean environments by default is really the only way that we're
going to get a world where most packages have accurate build


Nathaniel J. Smith -- http://vorpus.org
