[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Mon Oct 5 08:29:31 CEST 2015

On Sat, Oct 3, 2015 at 10:50 AM, Donald Stufft <donald at stufft.io> wrote:
> I feel like you have some sort of "a sdist is just a tarball of a VCS"
> mentality and I don't think that idea of a sdist is generally useful.

Hmm, so, between thinking this over more on the plane today and reading
your and Paul's great replies, I think I see where a lot of this
disjunction might be arising. I'll try to reply to those in more detail
later, but first let me try to lay this out and see if it makes things
clearer.

First, let's drop the word "sdist", it's confusing.

I'm starting from the long history and conventions around how people make
what I'll call "source releases" (and in a few paragraphs will contrast
with "source wheels"). 'Everyone knows' that when you release a new version
of some package (in pretty much any language), then one key step is to put
together a file called <package>-<version>.<zip or tar.gz>. And 'everyone
knows' that if you see a file that follows this naming convention, and you
download it, then what you'll find inside is: a single directory called
<package>-<version>/, and inside this directory will be something that's
almost like a VCS checkout -- it'll probably contain a README, source files
in convenient places to be edited or grepped, etc. The differences from a
VCS checkout (if any) will be little convenience stuff -- like ./autogen.sh
will have been run already, or there will be an extra file containing a
fixed version number instead of it being autodetected, or -DNDEBUG will be
in the default CFLAGS, or Cython files will have been pre-translated to C
-- but fundamentally it will be similar to a VCS checkout, and switching
back and forth between them won't be too jarring. 95% of the time there
will be a standard way to build the thing ('./configure && make && make
install', or 'python setup.py install', or similar). And these kinds of
source releases have a rich ecosystem around them and serve a wide range of
uses: they provide a low-tech archival record (while VCSes come and go),
they end up in deb and rpm "original source" bundles, they get downloaded
by users and built by hand (maybe with weird configury on top, like a hack
to enable cross-compilation) or poked around in by hand, etc. etc. When
sdists were originally designed, then "source releases" is what the
designers were thinking about.
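
As a rough illustration of the naming and layout convention described
above, here is a minimal Python sketch that checks whether a .tar.gz
archive unpacks to a single <package>-<version>/ directory. The names
looks_like_source_release and make_demo_archive are made up for this
example and are not part of any existing tool.

```python
import io
import tarfile


def looks_like_source_release(fileobj, filename):
    """Return True if the archive holds exactly one top-level directory
    named after the archive itself (<package>-<version>/)."""
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        return False
    expected_root = filename[: -len(suffix)]
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        # Collect the first path component of every archive member.
        roots = {member.name.split("/", 1)[0] for member in tar.getmembers()}
    return roots == {expected_root}


def make_demo_archive(paths):
    """Build a small in-memory .tar.gz containing empty files at `paths`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in paths:
            tar.addfile(tarfile.TarInfo(name=path), io.BytesIO(b""))
    buf.seek(0)
    return buf
```

For instance, an archive named foo-1.0.tar.gz whose members all live
under foo-1.0/ passes the check, while one with files at the top level
does not.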

Then, easy_install came along, and pulled off a clever hack where when you
asked for some package by name, then it would try to automagically go out
and track down any relevant source releases and build them all. And it
works great, except when it doesn't. And it creates massive headaches for
everyone trying to work on python packaging afterwards, because source
releases were not designed to be used this way.

My hypothesis is that the requirements that were confusing me are based
around the idea that an sdist should be something designed to slot into
this particular use case: i.e., something that pip can automatically grab
and work with while solving a dependency resolution problem. Therefore it
really needs to have a static name, and static version number, and static
dependencies, and must produce exactly one binary wheel that shares all
that metadata. Let's call this a "source wheel" -- what we're really
looking for here is a way to ship extension modules inside something that
acts like an architecture-neutral "none-any" wheel.
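
To make the "static metadata" requirement concrete, here is a small
sketch of why it matters to a resolver: if name, version, and
dependencies are static, the dependency graph can be walked without
running any build code. The StaticMetadata class, its field names, and
collect_requirements are hypothetical, invented for this example; the
sketch also ignores version constraints entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticMetadata:
    # Hypothetical fields: the only assumption is that name, version and
    # dependencies are known without executing anything.
    name: str
    version: str
    requires: tuple = ()


def collect_requirements(root, index):
    """Walk dependencies using only static metadata (no build step is run).

    `index` maps a project name to its StaticMetadata. A real resolver
    would also handle version constraints, which this sketch ignores.
    """
    seen = {}
    pending = [root]
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        meta = index[name]
        seen[name] = meta.version
        pending.extend(meta.requires)
    return seen
```

A classic source release cannot promise this, since its dependencies may
only be known after running its build script.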

So: the email that started this thread was a proposal for how to
standardize the format of "source releases", and Donald's counter was a
proposal for how to standardize the format of "source wheels". Does that
sound right?

If so, then some follow-up thoughts:

1) If we design a source wheel format, then I am 100% in favor of the
suggestion of giving it a unique extension like "swhl". I'm still a young
whippersnapper compared to some, but I've been downloading files named
<package>-<version>.<zip or tar.gz> for 20 years, and AFAICR every one of
those files unpacked to make a single directory that was laid out like a
VCS checkout. Obviously we can change what goes inside, but we should
change the naming convention at the same time because otherwise we're just
going to confuse people.

2) I think there's a strong case to be made that Python actually needs
standards for *both* source releases and source wheels. There's certainly
no logical contradiction -- they're conceptually different things. It
sounds like we all agree that "pip" should continue to have a way to build
and install an arbitrary VCS checkout, and extending that standard to cover
building and installing a classic "source release" would be... almost
difficult *not* to do.

And I think that there will continue to be a clear need for source releases
even in a world where source wheels exist, because of all those places
where source releases get used that aren't
automagic-easy_install/pip-builds. For example, most pure Python packages
(the ones that already make "none-any" wheels) have no need at all for
source wheels, but they still need source releases to serve as archival
snapshots. And more complex packages that need build-time configuration
(e.g. numpy) will continue to require source releases that can be
configured to build wheels that have a variety of different properties
(e.g., different dependency metadata), so they can't get by with source
wheels alone -- but you can imagine that such projects might reasonably *in
addition* provide a source wheel that locks down the same default
configuration that gets used for their uploaded binary wheel builds, and is
designed for pip to use when trying to resolve dependencies on platforms
where a regular binary wheel is unavailable.

Pictorially, this world would look like:

VCS checkout -> source release
     \              \
      --------------------+--> in-place install
                          +--> wheels -> install
                          +--> source wheels -> wheels -> install

3) It sounds like we all agree that
  - 'pip install <VCS checkout>' should work
  - that there is some crucial metadata that VCS checkouts won't be able to
provide without running arbitrary code (e.g. dependencies and version
numbers)
  - that what metadata they do provide (e.g., which arbitrary code to run)
should be specified in a human-friendly configuration file

Given this, in the big picture it sounds like the only really essentially
controversial things about the original proposal might be:
  - that 'pip install <tarball of VCS checkout>' should work the same as
'pip install <VCS checkout>' (does anyone actually disagree?)
  - the 1 paragraph describing the "transitional plan" allowing pip to
automatically install from these source releases *as part of a dependency
resolution plan* (as opposed to when a VCS-checkout-or-equivalent is
explicitly given as an install target). Which honestly I don't like either;
it's just there as a backcompat measure so that these source releases don't
create a regression versus existing sdists -- note that one of the goals of
the design was that current sdists could be upgraded into this format by
dropping in a single static file (or by an install tool "virtually"
dropping in this file when it encounters an old-style sdist -- so you don't
need to keep around code to handle both cases separately).
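
The "virtual drop-in" idea in the last point could look roughly like the
sketch below: the install tool treats an old-style sdist as if the static
file were present, so only one code path is needed afterwards. The file
name and its default contents are placeholders invented for this example,
and the source tree is modelled as a plain dict of path -> contents.

```python
# Hypothetical file name and contents, chosen only for illustration.
STATIC_FILE = "build-config.cfg"
DEFAULT_CONTENTS = "[build]\nbackend = legacy-setup-py\n"


def with_virtual_config(files):
    """Return a view of an unpacked source tree (path -> contents) in
    which the static config file always exists.

    New-style trees are returned unchanged; old-style trees get a copy
    with the default file added, leaving the original mapping untouched.
    """
    if STATIC_FILE in files:
        return files
    upgraded = dict(files)
    upgraded[STATIC_FILE] = DEFAULT_CONTENTS
    return upgraded
```

Everything downstream can then assume the file is there, whichever kind
of archive it started from.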

Does that sound right?

(Other features of the original proposal include stuff like the lack of
trivial metadata like "name" and "description", and the support for
generating multiple wheels from one directory. I am explicitly calling
these "inessential".)


Nathaniel J. Smith -- http://vorpus.org