[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Donald Stufft donald at stufft.io
Fri Oct 2 13:58:50 CEST 2015

On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs at pobox.com) wrote:
> We realized that actually as far as we could tell, it wouldn't be that
> hard at this point to clean up how sdists work so that it would be
> possible to migrate away from distutils. So we wrote up a little draft
> proposal.
> The main question is, does this approach seem sound?

I've just read over your proposal, but I've also just woken up so I might be
a little slow still! After reading what you have, I don't think that this
proposal is the right way to go about improving sdists.

The first thing that immediately stood out to me is that it recommends that
downstream redistributors like Debian, Fedora, etc. build their packages from
Wheels instead of from the sdist. However, that is not really going to fly
with most (all?) of the downstream redistributors. Debian, for instance, has a
policy that requires all of its packages to be built from source, not from
anything else, and Wheels are not a source package. While it can theoretically
work for pure Python packages, it quickly devolves into a mess when you factor
in packages that have any C code whatsoever.

Overall, this feels more like a sidegrade than an upgrade. One major theme
throughout the PEP is that we're going to push to rely heavily on wheels as
the primary format of installation. While that works well for things like
Debian, I don't think it's going to work as well for us. If we were only
distributing pure Python packages, then yes, absolutely; however, given that we
are not, we have to worry about ABI issues. Given that there are so many
different environments that a particular package might be installed into, all
with different ABIs, we have to assume that installing from source is still
going to be a primary path for end users and that we are never going to have a
world where we can assume a Wheel exists in a repository.

One of the problems with the current system is that we have no mechanism by
which to determine the dependencies of a source distribution without
downloading the file and executing some potentially untrusted code. This makes
dependency resolution harder and much, much slower than if we could read that
information statically from a source distribution. This PEP doesn't offer
anything in the way of solving this problem.

To a similar tune, this PEP also doesn't make it possible to really get at
any other metadata without executing software. This makes it practically
impossible to safely inspect an unknown or untrusted package to determine what
it is and to get information about it. Right now PyPI relies on the uploading
tool to send that information alongside of the file it is uploading, but
honestly what it should be doing is extracting that information from within the
file. This is sort of possible right now since distutils and setuptools both
create a static metadata file within the source distribution, but we don't rely
on that within PyPI because that information may or may not be accurate and may
or may not exist. However, the twine uploading tool *does* rely on that, and
this PEP would break the ability for twine to upload a package without
executing arbitrary code.

Overall, I don't think that this really solves most of the foundational
problems with the current format. Largely it feels that what it achieves is
shuffling around some logic (you need to create a hook that you reference from
within a .cfg file instead of creating a setuptools extension or so) but
without fixing most of the problems. The largest benefit I see to switching to
this right now is that it would enable us to have build time dependencies that
were controlled by pip rather than installed implicitly via the execution of
the setup.py. That doesn't feel like a big enough benefit to me to do a mass
shakeup of what we recommend and tell people to do. Having people adjust and
change and do something new requires effort, and we need something to justify
that effort to other people and I don't think that this PEP has something we
can really use to justify that effort.

I *do* think that there is a core of some ideas here that are valuable, and in
fact are similar to some ideas I've had. The main flaw I see here is that it
doesn't really fix sdists, it takes a solution that would work for VCS
checkouts and then reuses it for sdists. In my mind, the supported flow for
package installation would be:

    VCS/Bare Directory -> Source Distribution -> Wheel

This would (eventually) be the only path that was supported for installation
but you could "enter" the path at any stage. For example, if there is a Wheel
already available, then you jump right on at the end and just install that, if
there is a sdist available then pip first builds it into a wheel and then
installs that, etc.
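
As a toy model of "entering" that path at any stage, something like the
following captures the idea. The stage names and the steps_to_install
function are invented for this sketch; this is not how pip is implemented.

```python
# Stages of the single supported path, earliest first.
STAGES = ["directory", "sdist", "wheel"]


def steps_to_install(available):
    """List the steps needed to install, given which forms are available.

    We enter the path at the latest available stage and walk forward
    until we have a wheel, which is the only thing that gets installed.
    """
    for index in range(len(STAGES) - 1, -1, -1):
        if STAGES[index] in available:
            entry = index
            break
    else:
        raise ValueError("nothing to install from")

    steps = [
        "build %s from %s" % (STAGES[i + 1], STAGES[i])
        for i in range(entry, len(STAGES) - 1)
    ]
    steps.append("install wheel")
    return steps


if __name__ == "__main__":
    for forms in ({"wheel"}, {"sdist"}, {"directory"}):
        print(sorted(forms), "->", steps_to_install(forms))
```

The point of the sketch is that there is exactly one chain, and every kind of
input just joins it at a different place rather than having its own install
logic.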

I think your PEP is something like what the VCS/Bare Directory to sdist tooling
could look like, but I don't think it's what the sdist to wheel path should
look like. 

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
