[Distutils] Second draft of a plan for a new source tree / sdist format

Nathaniel Smith njs at pobox.com
Mon Oct 26 02:04:18 EDT 2015

Hi all,

Here's a second round of text towards making a build-system
independent interface between pip and source trees/sdists. My idea
this time is to take a divide-and-conquer approach: this text tries to
summarize all the stuff that it seemed like we had mostly reached
consensus on in the previous thread + call, with blank chunks marked
"TBD" where there are specific points that still need To Be
Determined. So my hope is that everyone will read what's here and
agree that it's great as far as it goes, and then we can go through
and fill in each missing piece one at a time.


PEP: ??
Title: A build-system independent format for source trees and
       source distributions
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs at pobox.com>
Status: Draft
Type: Standards-Track
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History: 1 Oct 2015, 25 Oct 2015
Discussions-To: <distutils-sig at python.org>


Distutils delenda est.

Extended abstract

While ``distutils`` / ``setuptools`` have taken us a long way, they
suffer from three serious problems: (a) they're missing important
features like autoconfiguration and usable build-time dependency
declaration, (b) extending them is quirky, complicated, and fragile,
(c) it's very difficult to use anything else, because they provide the
standard interface for installing python packages expected by both
users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted
to solve problems (a) and/or (b). We propose to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being
a gatekeeper for Python build systems. If you want to use distutils,
great; if you want to use something else, then that should be easy to
do using standardized methods. The difficulty of interfacing with
distutils means that there aren't many such systems right now, but to
give a sense of what we're thinking about see `flit
<https://github.com/takluyver/flit>`_ or `bento
<https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
solved many of the hard problems here -- e.g. it's no longer necessary
that a build system also know about every possible installation
configuration -- so pretty much all we really need from a build system
is that it have some way to spit out standard-compliant wheels.

We therefore propose a new, relatively minimal interface for
installation tools like ``pip`` to interact with package source trees
and source distributions.

In addition, we propose a wheel-inspired static metadata format for
sdists, suitable for tools like PyPI and pip's resolver.

Terminology and goals

A *source tree* is something like a VCS checkout. We need a standard
interface for installing from this format, to support usages like
``pip install some-directory/``.

A *source distribution* is a static snapshot representing a particular
release of some source code, like ``lxml-3.4.4.zip``. Source
distributions serve many purposes: they form an archival record of
releases, they provide a stupid-simple de facto standard for tools
that want to ingest and process large corpora of code, possibly
written in many languages (e.g. code search), they act as the input to
downstream packaging systems like Debian/Fedora/Conda/..., and so
forth. In the Python ecosystem they additionally have a particularly
important role to play, because packaging tools like ``pip`` are able
to use source distributions to fulfill binary dependencies, e.g. if
there is a distribution ``foo.whl`` which declares a dependency on
``bar``, then we need to support the case where ``pip install bar`` or
``pip install foo`` automatically locates the sdist for ``bar``,
downloads it, builds it, and installs the resulting package.

Source distributions are also known as "sdists" for short.

Source trees

We retroactively declare the legacy source tree format involving
``setup.py`` to be "version 0". We don't try to specify it further;
its de facto specification is encoded in the source code and
documentation of ``distutils``, ``setuptools``, ``pip``, and other
tools.

A "version 1" (or greater) source tree is any directory which contains
a file named ``pypackage.cfg``, which will -- in some manner whose
details are TBD -- describe the package's build dependencies and how
to invoke the build system. This mechanism:

- Will allow for both static and dynamic specification of build dependencies

- Will have some degree of isolation of different builds from each
other, so that it will be possible for a single run of pip to install
one package that build-depends on ``foo == 1.1`` and another package
that build-depends on ``foo == 1.2``.

- Will leave the actual installation of the package in the hands of
the build/installation tool (i.e. individual package build systems
will not need to know about things like --user versus --global or make
decisions about when and how to modify .pth files)
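To make the version 0 / version 1 distinction concrete, here is a minimal sketch of how an installation tool might classify a directory. The function name ``source_tree_version`` is hypothetical, and since the contents of ``pypackage.cfg`` are still TBD, the sketch only checks for the file's presence rather than reading a version number out of it:

```python
import tempfile
from pathlib import Path


def source_tree_version(path):
    """Classify a directory as a "version 0" or "version 1+" source tree.

    Returns 1 if ``pypackage.cfg`` is present (meaning "version 1 or
    greater"; the file's contents are TBD, so we cannot be more precise),
    0 if only a legacy ``setup.py`` is present, and None otherwise.
    """
    tree = Path(path)
    if (tree / "pypackage.cfg").is_file():
        return 1
    if (tree / "setup.py").is_file():
        return 0
    return None


# Small demonstration using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "setup.py").touch()
    print(source_tree_version(tmp))  # legacy tree
    (Path(tmp) / "pypackage.cfg").touch()
    print(source_tree_version(tmp))  # new-style tree
```

Note that a directory containing both files counts as version 1 here, which is what lets a ``pypackage.cfg`` be dropped into an existing checkout.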

[TBD: the exact set of operations to be supported and their detailed semantics]

[TBD: should builds be performed in a fully isolated environment, or
should they get access to packages that are already installed in the
target install environment? The former simplifies a number of things,
but Robert was skeptical it would be possible.]

[TBD: the form of the communication channel between an installation
tool like ``pip`` and the build system, over which these operations
are requested]

[TBD: the syntactic details of the configuration file format itself.
We can change the name too if we want, I just think it's useful to
have a single name to refer to it for now, and this is the last and
least interesting thing to figure out.]

Source distributions

[possibly this should get split off into a separate PEP, but I'll keep
it together for now for ease of discussion]

A "version 1" (or greater) source distribution is a file meeting the
following criteria:

- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where
{PACKAGE} is the package name, {VERSION} is a PEP 440-compliant
version number, and {EXT} is a compliant archive format.

  The set of compliant archive formats is: zip, [TBD]

  [QUESTION: should we continue to allow .tar.gz and friends? In
practice by "allow" I mean something like "accept new-style sdists on
PyPI in this format". I'm inclined not to -- zip is the most
universally supported format around, it allows file-based random
access (unlike tar-based things) which is useful for pulling out
metadata without decompressing the whole thing, and standardizing on
one format dodges distracting and pointless discussions about which
format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to
continue to support other archive formats when passed explicitly on
the command line. Any objections?]

  Similar to wheels, the archive is Unicode, and the filenames inside
the archive are encoded in UTF-8.

- When unpacked, it MUST contain a single directory tree
named ``{PACKAGE}-{VERSION}``.

- This directory tree MUST be a valid version 1 (or greater) source
tree as defined above.

- It MUST additionally contain a directory named
``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the
following contents:

  - ``SDIST``: Mandatory. Same record-oriented format as a wheel's
``WHEEL`` file, but with different fields::

      SDist-Version: 1.0
      Generator: setuptools sdist 20.1

    ``SDist-Version`` is the version number of this specification.
Software that processes sdists should warn if ``SDist-Version`` is
greater than the version it supports, and must fail if
``SDist-Version`` has a greater major version than the version it
supports.

    ``Generator`` is the name and optionally the version of the
software that produced the archive.

  - ``RECORD``: Mandatory. A list of all files contained in the sdist
(except for the RECORD file itself and any signature files) together
with their hashes, as specified in PEP 427.

  - ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as
specified in PEP 427.

  - ``METADATA``: Mandatory. Metadata version 1.1 or greater format
metadata, with an additional rule that fields may contain the special
sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value
of this field cannot be determined until build time. If a "multiple
use field" is present with the value ``__SDIST_DYNAMIC__``, then this
field MUST occur exactly once, e.g.::

       # Okay:
       Requires-Dist: lxml (> 3.3)
       Requires-Dist: requests

       # no Requires-Dist lines at all is okay
       # (meaning: this package's requirements are the empty set)

       # Okay, requirements will be determined at build time:
       Requires-Dist: __SDIST_DYNAMIC__

       # NOT okay:
       Requires-Dist: lxml (> 3.3)
       Requires-Dist: __SDIST_DYNAMIC__

    (The use of a special token allows us to distinguish between
multiple use fields whose value is statically the empty list versus
one whose value is dynamic; it also allows us to distinguish between
optional fields which are statically not present versus ones whose
value is dynamic.)

    When this sdist is built, the resulting wheel MUST have metadata
which is identical to the metadata present in this file, except that
any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have
arbitrary values in the wheel.

    A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for
the package name or version (i.e., these must be given statically),
and these MUST match the {PACKAGE} and {VERSION} of the sdist as
described above.

    [TBD: do we want to forbid the use of dynamic metadata for any
other fields? I assume PyPI will enforce some stricter rules at least,
but I don't know if we want to make that part of the spec, or just
part of PyPI's administrative rules.]
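As a rough illustration of the ``METADATA`` rules above, the following sketch checks the two constraints that are already settled: a field that uses ``__SDIST_DYNAMIC__`` must occur exactly once, and the name and version must be static and match the sdist's {PACKAGE} and {VERSION}. The function name ``check_sdist_metadata`` is made up for this example, and comparing names by plain string equality (with no normalization) is a simplifying assumption:

```python
from collections import defaultdict
from email.parser import Parser

DYNAMIC = "__SDIST_DYNAMIC__"


def check_sdist_metadata(text, package, version):
    """Return a list of problems found in an sdist METADATA file."""
    # METADATA uses RFC 822-style "Field: value" lines, so the stdlib
    # email parser can read it.
    message = Parser().parsestr(text)
    values = defaultdict(list)
    for field, value in message.items():
        values[field].append(value.strip())

    problems = []
    # A field marked dynamic must not also carry static values.
    for field, field_values in values.items():
        if DYNAMIC in field_values and len(field_values) > 1:
            problems.append(f"{field}: {DYNAMIC} must occur exactly once")
    # Name and Version must be static and match the archive name.
    for field, expected in (("Name", package), ("Version", version)):
        if values.get(field, []) != [expected]:
            problems.append(
                f"{field}: expected {expected!r}, got {values.get(field, [])!r}"
            )
    return problems


example = (
    "Metadata-Version: 1.1\n"
    "Name: foo\n"
    "Version: 1.0\n"
    "Requires-Dist: lxml (> 3.3)\n"
    "Requires-Dist: __SDIST_DYNAMIC__\n"
)
for problem in check_sdist_metadata(example, "foo", "1.0"):
    print(problem)
```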

The ``.sdist-info`` directory is intentionally a close analogue of a
wheel's ``.dist-info`` directory; the intention is that as future
metadata standards are defined, the specifications for the
``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.
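The naming rule and the ``SDist-Version`` warn/fail rule described for sdists could be implemented along these lines. This is only a sketch: the helper names are hypothetical, zip is assumed to be the only compliant archive format (the rest of that set is still TBD), and the version string is not validated against PEP 440:

```python
import warnings
from email.parser import Parser

SUPPORTED_SDIST_VERSION = (1, 0)  # assumption: this tool implements 1.0


def split_sdist_filename(filename):
    """Split ``{PACKAGE}-{VERSION}.zip`` into (package, version).

    Only the overall shape is checked; PEP 440 validation of the
    version is left out of this sketch.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext != "zip":
        raise ValueError(f"not a compliant archive name: {filename!r}")
    package, _, version = stem.rpartition("-")
    if not package or not version:
        raise ValueError(f"expected PACKAGE-VERSION, got {stem!r}")
    return package, version


def check_sdist_version(sdist_text, supported=SUPPORTED_SDIST_VERSION):
    """Apply the warn/fail rule to the contents of an ``SDIST`` file."""
    raw = Parser().parsestr(sdist_text)["SDist-Version"]
    if raw is None:
        raise ValueError("SDIST file has no SDist-Version field")
    found = tuple(int(part) for part in raw.strip().split("."))
    if found[0] > supported[0]:
        # Greater major version: processing must fail.
        raise ValueError(f"unsupported SDist-Version {raw}")
    if found > supported:
        # Same major version but newer overall: warn and carry on.
        warnings.warn(f"SDist-Version {raw} is newer than supported")
    return found


print(split_sdist_filename("lxml-3.4.4.zip"))
print(check_sdist_version("SDist-Version: 1.0\nGenerator: setuptools sdist 20.1\n"))
```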

Evolutionary notes

A goal here is to make it as simple as possible to convert old-style
sdists to new-style sdists. (E.g., this is one motivation for
supporting dynamic build requirements.) The ideal would be that there
would be a single static pypackage.cfg that could be dropped into any
"version 0" VCS checkout to convert it to the new shiny. This is
probably not 100% possible, but we can get close, and it's important
to keep track of how close we are... hence this section.

A rough plan would be: Create a build system package
(``setuptools_pypackage`` or whatever) that knows how to speak
whatever hook language we come up with, and converts those hooks into
setuptools calls. This will probably require some sort of hooking or
monkeypatching to setuptools to provide a way to extract the
``setup_requires=`` argument when needed, and to provide a new version
of the sdist command that generates the new-style format. This all
seems doable and sufficient for a large proportion of packages (though
obviously we'll want to prototype such a system before we finalize
anything here). (Alternatively, these changes could be made to
setuptools itself rather than going into a separate package.)

But there remain two obstacles that mean we probably won't be able to
automatically upgrade packages to the new format:

1) There currently exist packages which insist on particular packages
being available in their environment before setup.py is executed. This
means that if we decide to execute build scripts in an isolated
virtualenv-like environment, then projects will need to check whether
they do this, and if so then when upgrading to the new system they
will have to start explicitly declaring these dependencies (either via
``setup_requires=`` or via static declaration in ``pypackage.cfg``).

2) There currently exist packages which do not declare consistent
metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different
``install_requires=``). When upgrading to the new system, projects
will have to evaluate whether this applies to them, and if so they
will need to either stop doing that, or else add ``__SDIST_DYNAMIC__``
annotations at appropriate places.

   We'll also presumably need some API for packages to describe which
parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for
the packages that need it (a new argument to ``setup()`` or some
setting in ``setup.cfg`` or something).
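For obstacle 2, a project could get a first idea of which fields need the annotation by comparing the metadata it produces through two different routes. The sketch below assumes the metadata has already been collected into plain dicts mapping field names to lists of values; how to obtain those dicts, and the helper name ``inconsistent_fields``, are not part of this proposal:

```python
def inconsistent_fields(first, second):
    """Return the sorted names of fields whose values differ.

    Each argument maps a metadata field name to the list of values that
    field takes (multiple-use fields have several entries). Ordering of
    values within a field is ignored.
    """
    fields = set(first) | set(second)
    return sorted(
        field
        for field in fields
        if sorted(first.get(field, [])) != sorted(second.get(field, []))
    )


# Hypothetical metadata as seen via egg_info and via bdist_wheel.
from_egg_info = {"Name": ["foo"], "Requires-Dist": ["requests"]}
from_bdist_wheel = {
    "Name": ["foo"],
    "Requires-Dist": ["requests", "lxml (> 3.3)"],
}

# Fields listed here are candidates for __SDIST_DYNAMIC__.
print(inconsistent_fields(from_egg_info, from_bdist_wheel))
```

A field showing up in this list means the project must either make it consistent or mark it ``__SDIST_DYNAMIC__``; an empty list from one comparison does not prove the metadata is static on every platform.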

Nathaniel J. Smith -- http://vorpus.org
