[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Donald Stufft donald at stufft.io
Fri Oct 2 23:30:29 CEST 2015


On October 2, 2015 at 4:20:00 PM, Nathaniel Smith (njs at pobox.com) wrote:
> On Fri, Oct 2, 2015 at 4:58 AM, Donald Stufft wrote:
> >
> > One of the problems with the current system, is that we have no mechanism by
> > which to determine dependencies of a source distribution without downloading
> > the file and executing some potentially untrusted code. This makes dependency
> > resolution harder and much much slower than if we could read that information
> > statically from a source distribution. This PEP doesn't offer anything in the
> > way of solving this problem.
>  
> What are the "dependencies of a source distribution"? Do you mean the
> runtime dependencies of the wheels that will be built from a source
> distribution?
>  
> If you need that metadata to be statically in the sdist, then you
> might as well give up now because it's simply impossible.

I don’t believe this is impossible.

>  
> As the very simplest example, every package that uses the numpy C API
> gets a runtime dependency on "numpy >= [whatever version happened to
> be installed on the *build* machine]".

A quick, off-the-cuff idea here is to allow additional ABI declarations and
stop trying to use the same system for API and ABI. A source distribution
can't have ABI dependencies, only wheels can, and an installation has to be
valid for both the API requirements and any relevant ABI requirements.
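
To make that concrete, a rough sketch (using entirely hypothetical field
names, since nothing like this exists today) might have the sdist carry only
the API-level requirement, with the build step recording the ABI constraint
in the wheel it produces:

    Requires-Dist: numpy          (static in the sdist; plain API requirement)
    Requires-ABI: numpy-c-api-9   (hypothetical field, written into the built
                                   wheel to record what it was compiled against)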

> There are plenty of more
> complex examples too (e.g. ones that involve build/configure-time
> decisions about whether to rely on particular system libraries, or
> build/configure-time decisions about whether particular packages
> should even be built).

I don't think build/configure-time decisions are a great idea, as it's near
impossible to actually depend on them. For example, take Pillow: Pillow will
conditionally compile against libraries that let it muck around with PNGs.
However, if I *need* Pillow with PNG support, I don't have any mechanism to
declare that. If instead builds were *not* conditional, and Pillow split its
PNG capabilities out into its own package called, say, Pillow-PNG which
compiled against libpng unconditionally rather than conditionally, then we
could add something like having Pillow declare a "weak" dependency on
Pillow-PNG, where we attempt to install it by default if possible but skip it
if we can't locate/build it. If you combine this with Extras, you could then
easily make it so that people can depend on particular conditional features
by putting something like ``Pillow[PNG]`` in their dependency metadata.
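
As a rough sketch of the Extras half of that (the "weak"/install-by-default
part would need new metadata that doesn't exist yet), a hypothetical Pillow
setup.py could declare the split-out package as an extra using the existing
setuptools mechanism:

    from setuptools import setup

    setup(
        name="Pillow",
        # ...
        extras_require={
            # "pip install Pillow[PNG]" pulls in the unconditionally built
            # Pillow-PNG package; plain "pip install Pillow" does not.
            "PNG": ["Pillow-PNG"],
        },
    )

A downstream project that genuinely needs PNG support would then put
``Pillow[PNG]`` in its own install_requires rather than just ``Pillow``.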


>  
> For comparison, here's the Debian source package metadata:
> https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-debiansourcecontrolfiles  
> Note that the only mandatory fields are format version / package name
> / package version / maintainer / checksums. The closest they come to
> making promises about the built packages are the Package-List and
> Binary fields which provide an optional hint about what binary packages
> will be built, and are allowed to contain lies (e.g. they explicitly
> don't guarantee that all the binary packages named will actually be
> produced on every architecture). The only kind of dependencies that a
> source package can declare are build-depends.

Debian doesn't really have "source packages" like we do, but inside the
debian/ directory is the control file, which lists all of the dependency
information (or explicitly lists a placeholder wherever something can't be
statically declared).
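
For example, reconstructed from memory rather than copied from the real
pillow packaging, the binary package stanzas in debian/control use
substitution variables for anything that is only knowable at build time:

    Source: pillow
    Build-Depends: debhelper (>= 9), libjpeg-dev, zlib1g-dev

    Package: python-pil
    Architecture: any
    Depends: ${misc:Depends}, ${python:Depends}, ${shlibs:Depends}

The ``${shlibs:Depends}`` placeholder is what ends up carrying the "compiled
against these shared libraries" information once a build has actually
happened.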

>  
> > To a similar tune, this PEP also doesn't make it possible to really get at
> > any other metadata without executing software. This makes it practically
> > impossible to safely inspect an unknown or untrusted package to determine what
> > it is and to get information about it. Right now PyPI relies on the uploading
> > tool to send that information alongside of the file it is uploading, but
> > honestly what it should be doing is extracting that information from within the
> > file. This is sort of possible right now since distutils and setuptools both
> > create a static metadata file within the source distribution, but we don't rely
> > on that within PyPI because that information may or may not be accurate and may
> > or may not exist. However the twine uploading tool *does* rely on that, and
> > this PEP would break the ability for twine to upload a package without
> > executing arbitrary code.
>  
> Okay, what metadata do you need? We certainly could put name / version
> kind of stuff in there. We left it out because we weren't sure what
> was necessary and it's easy to add later, but anything that's needed
> by twine fits neatly into the existing text saying that we should
> "include extra metadata in source distributions if it helps solve
> specific problems that are unique to distribution" -- twine uploads
> definitely count.

Everything that isn't specific to a built wheel. Look at the previously
accepted metadata specs as well as PEP 426. If you're not including a field
that was included in one of those, there should be a rationale for why that
field is no longer being included.
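
Concretely, I'd expect at least the kind of fields that already live in a
PKG-INFO file today (Metadata 1.2 / PEP 345 style), for example:

    Metadata-Version: 1.2
    Name: example
    Version: 1.0
    Summary: An example distribution
    Author: ...
    License: MIT
    Classifier: Programming Language :: Python :: 3
    Requires-Python: >=2.7

plus whatever dependency information can be stated statically.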

>  
> > Overall, I don't think that this really solves most of the foundational
> > problems with the current format. Largely it feels that what it achieves is
> > shuffling around some logic (you need to create a hook that you reference from
> > within a .cfg file instead of creating a setuptools extension or so) but
>  
> numpy.distutils is the biggest distutils/setuptools extension around,
> and everyone involved in maintaining it wants to kill it with fire
> :-). That's a problem…

Well, it's not really what I'd call a setuptools extension, because it
doesn't use the extension points in setuptools to do its work. It expects you
to just ``import numpy.distutils`` at the top of your ``setup.py`` and use
that, which means it breaks things like pip because we don't have any way to
know that we need to install numpy first.
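
The pattern in question looks roughly like this (a simplified sketch, not any
particular project's real setup.py):

    # The import at the top is the problem: pip has to *run* this file to
    # learn anything about the project, and running it fails outright
    # unless numpy is already installed.
    from numpy.distutils.core import setup, Extension

    setup(
        name="example",
        version="1.0",
        ext_modules=[Extension("example._speedups",
                               ["example/_speedups.c"])],
    )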

>  
> > without fixing most of the problems. The largest benefit I see to switching to
> > this right now is that it would enable us to have build time dependencies that
> > were controlled by pip rather than installed implicitly via the execution of
> > the setup.py.
>  
> Yes, this problem means that literally every numerical python package
> currently has a broken setup.py.

Because numpy.distutils wasn't written to plug into setuptools. If it had
been, they wouldn't be.

>  
> > That doesn't feel like a big enough benefit to me to do a mass
> > shakeup of what we recommend and tell people to do. Having people adjust and
> > change and do something new requires effort, and we need something to justify
> > that effort to other people and I don't think that this PEP has something we
> > can really use to justify that effort.
>  
> The end-user adjustment is teaching people to switch to always using
> pip to install packages -- this seems like something we will certainly
> do sooner or later, so we might as well get started.
>  
> And it's already actually the right thing to do -- if you use
> 'setup.py install' then you get a timebomb in your venv where later
> upgrades may leave you with a broken package :-(. (This is orthogonal
> to the actual PEP.) In the long run, the idea that every package has
> to contain code that knows how to implement installation in every
> possible configuration (--user? --single-version-externally-managed?)
> is clearly broken, and teaching people to use 'pip install' is
> obviously the only sensible alternative.
>  

Sorry, "end-user" was ambiguous in my original statement. I don't mean the
end-end-users (e.g. the people executing ``pip install``), I mean the
packagers.

I'm going to re-read the original proposal and try to point out more
actionable feedback shortly.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



