[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Nathaniel Smith njs at pobox.com
Fri Oct 2 06:53:39 CEST 2015


Hi all,

We realized that, as far as we can tell, it wouldn't be that hard at
this point to clean up how sdists work so that it becomes possible to
migrate away from distutils. So we wrote up a little draft proposal.

The main question is, does this approach seem sound?

-n

---

PEP: ??
Title: Standard interface for interacting with source trees
       and source distributions
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs at pobox.com>
        Thomas Kluyver <takowl at gmail.com>
Status: Draft
Type: Standards-Track
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History:
Discussions-To: <distutils-sig at python.org>

Abstract
========

Distutils delenda est (distutils must be destroyed).


Extended abstract
=================

While ``distutils`` / ``setuptools`` have taken us a long way, they
suffer from three serious problems: (a) they're missing important
features like autoconfiguration and usable build-time dependency
declaration, (b) extending them is quirky, complicated, and fragile,
and (c) you are forced to use them anyway, because they provide the
standard interface for installing Python packages expected by both
users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted
to solve problems (a) and/or (b). We propose to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being
a gatekeeper for Python build systems. If you want to use distutils,
great; if you want to use something else, then the more the merrier.
The difficulty of interfacing with distutils means that there aren't
many such systems right now, but to give a sense of what we're
thinking about, see `flit <https://github.com/takluyver/flit>`_ or
`bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
solved many of the hard problems here -- e.g. it's no longer necessary
that a build system also know about every possible installation
configuration -- so pretty much all we really need from a build system
is that it have some way to spit out standard-compliant wheels.

We therefore propose a new, relatively minimal interface for
installation tools like ``pip`` to interact with package source trees
and source distributions.


Synopsis and rationale
======================

To limit the scope of our design, we adopt several principles.

First, we distinguish between a *source tree* (e.g., a VCS checkout)
and a *source distribution* (e.g., an official snapshot release like
``lxml-3.4.4.zip``).

There isn't a whole lot that *source trees* can be assumed to have in
common. About all you know is that they can -- via some more or less
Rube-Goldbergian process -- produce one or more binary distributions.
In particular, you *cannot* tell via simple static inspection:

- What version number will be attached to the resulting packages (e.g.
  it might be determined programmatically by consulting VCS metadata --
  I have here a build of numpy version "1.11.0.dev0+4a9ad17")
- What build- or run-time dependencies are required (e.g. these may
  depend on arbitrarily complex configuration settings that are
  determined via a mix of manual settings and auto-probing)
- Or even how many distinct binary distributions will be produced
  (e.g. a source distribution may always produce wheel A, but only
  produce wheel B when built on Unix-like systems).

Therefore, when dealing with source trees, our goal is just to provide
a standard UX for the core operations that are commonly performed on
other people's packages; anything fancier and more developer-centric
we leave at the discretion of individual package developers. So our
source trees just provide some simple hooks to let a tool like
``pip``:

- query for build dependencies
- run a build, producing wheels as output
- set up the current source tree so that it can be placed on
  ``sys.path`` in "develop mode"

and that's it. We teach users that the standard way to install a
package from a VCS checkout is now ``pip install .`` instead of
``python setup.py install``. (This is already a good idea anyway --
e.g., pip can do reliable uninstall / upgrades.)

Next, we note that pretty much all the operations that you might want
to perform on a *source distribution* are also operations that you
might want to perform on a source tree, and via the same UX. The only
thing you do with source distributions that you don't do with source
trees is, well, distribute them. There are all kinds of metadata you
could imagine including in a source distribution, but each piece of
metadata puts an increased burden on source distribution generation
tools, and most operations will still have to work without this
metadata. So we only include extra metadata in source distributions if
it helps solve specific problems that are unique to distribution. If
you want wheel-style metadata, get a wheel and look at it -- they're
great and getting better.

Therefore, our source distributions are basically just source trees +
a mechanism for signing.

Finally: we explicitly do *not* have any concept of "depending on a
source distribution". As in other systems like Debian, dependencies
are always phrased in terms of binary distributions (wheels), and when
a user runs something like ``pip install <package>``, then the
long-run plan is that <package> and all its transitive dependencies
should be available as wheels in a package index. But this is not yet
realistic, so as a transitional / backwards-compatibility measure, we
provide a simple mechanism for ``pip install <package>`` to handle
cases where <package> is provided only as a source distribution.


Source trees
============

We retroactively declare the legacy source tree format involving
``setup.py`` to be "version 0". We don't try to specify it further;
its de facto specification is encoded in the source code of
``distutils``, ``setuptools``, ``pip``, and other tools.

A version 1-or-greater format source tree can be identified by the
presence of a file ``_pypackage/_pypackage.cfg``.

If both ``_pypackage/_pypackage.cfg`` and ``setup.py`` are present,
then the tree is treated as a version 1+ source tree and ``setup.py`` is ignored.
This is necessary because we anticipate that version 1+ source trees
may want to contain a ``setup.py`` file for backwards compatibility,
e.g.::

    #!/usr/bin/env python
    import sys
    print("Don't call setup.py directly!")
    print("Use 'pip install .' instead!")
    print("(You might have to upgrade pip first.)")
    sys.exit(1)
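
To make the detection rule concrete, here's a minimal Python sketch of how an installer might classify a source tree. The function name ``source_tree_format`` and its return values are hypothetical, chosen only for illustration:

```python
from pathlib import Path


def source_tree_format(root):
    """Classify a source tree using the detection rules in this draft.

    Returns "1+" if _pypackage/_pypackage.cfg exists (even when a
    setup.py is also present), "0" for a legacy setup.py-only tree,
    and None if neither file is found.
    """
    root = Path(root)
    if (root / "_pypackage" / "_pypackage.cfg").is_file():
        return "1+"  # any setup.py is just a backwards-compat stub; ignore it
    if (root / "setup.py").is_file():
        return "0"  # legacy format, specified only by existing tools' code
    return None
```

Checking for ``_pypackage.cfg`` first is what lets a version 1+ tree ship the "don't call setup.py" stub without being mistaken for a legacy tree.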

In the current version of the specification, the one file
``_pypackage/_pypackage.cfg`` is where pretty much all the action is
(though see below). The motivation for putting it into a subdirectory
is that:

- the way of all standards is that cruft accumulates over time, so
  this way we pre-emptively have a place to put it,
- real-world projects often accumulate build system cruft as well, so
  we might as well provide one obvious place to put it too.

Of course this then creates the possibility of collisions between
standard files and user files, and trying to teach arbitrary users not
to scatter files around willy-nilly never works, so we adopt the
convention that names starting with an underscore are reserved for
official use, and non-underscored names are available for
idiosyncratic use by individual projects.

The alternative would be to simply place the main configuration file
at the top-level, create the subdirectory only when specifically
needed (most trees won't need it), and let users worry about finding
their own place for their cruft. Not sure which is the best approach.
Plus we can have a nice bikeshed about the names in general (FIXME).

_pypackage.cfg
--------------

The ``_pypackage.cfg`` file contains various settings. Another good
bike-shed topic is which file format to use for storing these (FIXME),
but for purposes of this draft I'll write examples using `toml
<https://github.com/toml-lang/toml>`_, because you'll instantly be
able to understand the semantics, it has similar expressivity to JSON
while being more human-friendly (e.g., it supports comments and
multi-line strings), it's better-specified than ConfigParser, and it's
much simpler than YAML. Rust's package manager uses toml for similar
purposes.

Here's an example ``_pypackage/_pypackage.cfg``::

    # Version of the "pypackage format" that this file uses.
    # Optional. If not present then 1 is assumed.
    # All version changes indicate incompatible changes; backwards
    # compatible changes are indicated by just having extra stuff in
    # the file.
    version = 1

    [build]
    # An inline requirements file. Optional.
    # (FIXME: I guess this means we need a spec for requirements files?)
    requirements = """
        mybuildtool >= 2.1
        special_windows_tool ; sys_platform == "win32"
    """
    # The path to an out-of-line requirements file. Optional.
    requirements-file = "build-requirements.txt"
    # A hook that will be called to query build requirements. Optional.
    requirements-dynamic = "mybuildtool:get_requirements"

    # A hook that will be called to build wheels. Required.
    build-wheels = "mybuildtool:do_build"

    # A hook that will be called to do an in-place build (see below).
    # Optional.
    build-in-place = "mybuildtool:do_inplace_build"

    # The "x" namespace is reserved for third-party extensions.
    # To use x.foo you should own the name "foo" on pypi.
    [x.mybuildtool]
    spam = ["spam", "spam", "spam"]

All paths are relative to the ``_pypackage/`` directory (so e.g. the
build.requirements-file value above refers to a file named
``_pypackage/build-requirements.txt``).

A *hook* is a Python object that is looked up using the same rules as
traditional setuptools entry_points: a dotted module name, followed by
a colon, followed by a dotted name that is looked up within that
module. *Running a hook* means: first, find or create a python
interpreter which is executing in the current venv, whose working
directory is set to the ``_pypackage/`` directory, and which has the
``_pypackage/`` directory on ``sys.path``. Then, inside this
interpreter, look up the hook object, and call it, with arguments as
specified below.
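
Hook lookup can be sketched with ``importlib``. This hypothetical ``load_hook`` resolves a spec like ``mybuildtool:do_build`` and puts ``_pypackage/`` on ``sys.path``. For simplicity it runs in the current interpreter and leaves choosing the venv and setting the working directory to the caller:

```python
import importlib
import sys
from functools import reduce


def load_hook(spec, pypackage_dir):
    """Resolve an entry-point-style "dotted.module:dotted.name" string."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"hook must look like 'module:name', got {spec!r}")
    # _pypackage/ goes on sys.path so in-tree helper modules can be hooks.
    pypackage_dir = str(pypackage_dir)
    if pypackage_dir not in sys.path:
        sys.path.insert(0, pypackage_dir)
    obj = importlib.import_module(module_name)
    # Walk the dotted name after the colon, e.g. "Builder.build".
    return reduce(getattr, attr_path.split("."), obj)
```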

A build command like ``pip wheel <source tree>`` performs the following steps:

1) Validate the ``_pypackage.cfg`` version number.

2) Create an empty virtualenv / venv that matches the environment
   that the installer is targeting (e.g. if you want wheels for CPython
   3.4 on 64-bit Windows, then you make a CPython 3.4 64-bit Windows
   venv).

3) If the build.requirements key is present, then in this venv run the
   equivalent of ``pip install -r <a file containing its value>``, using
   whatever index settings are currently in effect.

4) If the build.requirements-file key is present, then in this venv
   run the equivalent of ``pip install -r <the named file>``, using
   whatever index settings are currently in effect.

5) If the build.requirements-dynamic key is present, then in this venv
   run the hook with no arguments, capture its stdout, and pipe it into
   ``pip install -r -``, using whatever index settings are currently in
   effect. If the hook raises an exception, then abort the build with an
   error.

   Note: because these steps are performed in sequence, the
   build.requirements-dynamic hook is allowed to use packages that are
   listed in build.requirements or build.requirements-file.

6) In this venv, run the build.build-wheels hook. This should be a
   Python function which takes one argument.

   This argument is an arbitrary dictionary intended to contain
   user-specified configuration, specified via some install-tool-specific
   mechanism. The intention is that tools like ``pip`` should provide
   some way for users to specify key/value settings that will be passed
   in here, analogous to the legacy ``--install-option`` and
   ``--global-option`` arguments.

   To make it easier for packages to transition from version 0 to
   version 1 sdists, we suggest that ``pip`` and other tools that have
   such existing option-setting interfaces SHOULD map them to entries in
   this dictionary -- e.g.::

       pip install --global-option=a --install-option=b --install-option=c .

   could produce a dict like::

       {"--global-option": ["a"], "--install-option": ["b", "c"]}

   The hook's return value is a list of pathnames relative to the
   scratch directory. Each entry names a wheel file created by this
   build.

   Errors are signaled by raising an exception.
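
Steps 3 to 6 might be driven by something like the following sketch. All function names are hypothetical. ``call_hook`` stands in for "run a hook in the build venv", and the dynamic hook's stdout is captured in-process here rather than from a separate interpreter:

```python
import contextlib
import io
from pathlib import Path


def requirement_batches(build, pypackage_dir, call_hook):
    """Yield build requirements (as requirements-file text) in spec order.

    ``call_hook`` is a hypothetical callable that runs a hook spec in the
    build venv and returns the hook's stdout.
    """
    if "requirements" in build:  # step 3: inline requirements
        yield build["requirements"]
    if "requirements-file" in build:  # step 4: path relative to _pypackage/
        yield (Path(pypackage_dir) / build["requirements-file"]).read_text()
    if "requirements-dynamic" in build:  # step 5: the hook's stdout
        # An exception raised by the hook propagates and aborts the build.
        yield call_hook(build["requirements-dynamic"])


def capture_hook_stdout(hook):
    """Call a zero-argument hook and return whatever it printed.

    In-process simplification; with a real build venv this would mean
    reading a subprocess's stdout instead.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        hook()
    return buf.getvalue()


def legacy_options_to_config(args):
    """Map legacy ``--name=value`` options to the build-wheels config dict.

    Repeated options accumulate into a list, matching the dict example in
    step 6. Only the ``=`` spelling is handled in this sketch.
    """
    config = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected --name=value, got {arg!r}")
        config.setdefault(name, []).append(value)
    return config


def run_build_wheels(hook, config):
    """Step 6: call a resolved build-wheels hook and sanity-check the result."""
    wheels = hook(config)  # errors are signaled by the hook raising
    if not isinstance(wheels, list) or not all(
        isinstance(w, str) and w.endswith(".whl") for w in wheels
    ):
        raise TypeError("build-wheels hook must return a list of .whl paths")
    return wheels
```

Making ``requirement_batches`` a generator is one way to keep the ordering guarantee from the note in step 5. Nothing from a later step runs until the caller has installed the previous batch.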

When performing an in-place build (e.g. for ``pip install -e .``),
then the same steps are followed, except that instead of the
build.build-wheels hook, we call the build.build-in-place hook, and
instead of returning a list of wheel files, it returns the name of a
directory that should be placed onto ``sys.path`` (usually this will
be the source tree itself, but may not be, e.g. if a build system
wants to enforce a rule where the source is always kept pristine, then
it could symlink the .py files into a build directory, place the
extension modules and dist-info there, and return that). This
directory must contain importable versions of the code in the source
tree, along with appropriate .dist-info directories.
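
For the in-place case, one plausible way for an installer to put the returned directory onto ``sys.path`` is a ``.pth`` file in site-packages. That mechanism is our assumption here, not part of the proposal, and ``install_develop_dir`` is a hypothetical name:

```python
import os
import sysconfig


def install_develop_dir(directory, name, site_packages=None):
    """Put the directory returned by build-in-place onto sys.path.

    Assumed mechanism: write a <name>.pth file into site-packages. At
    interpreter startup the stdlib ``site`` module adds each existing
    directory listed in a .pth file to sys.path.
    """
    directory = os.path.abspath(directory)
    # The directory must carry .dist-info metadata alongside the code.
    if not any(entry.endswith(".dist-info") for entry in os.listdir(directory)):
        raise ValueError(f"no .dist-info directory in {directory}")
    if site_packages is None:
        site_packages = sysconfig.get_paths()["purelib"]
    pth_path = os.path.join(site_packages, name + ".pth")
    with open(pth_path, "w") as f:
        f.write(directory + "\n")
    return pth_path
```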

(FIXME: in-place builds are useful but intrinsically kinda broken --
e.g. extensions / source / metadata can all easily get out of sync --
so while I think this paragraph provides a reasonable hack that
preserves current functionality, maybe we should defer specifying them
until after we've thought through the issues more?)

When working with source trees, build tools like ``pip`` are
encouraged to cache and re-use virtualenvs for performance.


Other contents of _pypackage/
-----------------------------

- ``_RECORD``, ``_RECORD.jws``, ``_RECORD.p7s``: see below.

- ``_x/<pypi name>/``: reserved for use by tools (e.g.
  ``_x/mybuildtool/build/``, ``_x/pip/venv-cache/cp34-none-linux_x86_64/``)


Source distributions
====================

A *source distribution* is a file in a well-known archive format such
as zip or tar.gz, which contains a single directory, and this
directory is a source tree (in the sense defined in the previous
section).

The ``_pypackage/`` directory in a source distribution SHOULD also
contain a _RECORD file, as defined in PEP 427, and MAY also contain
_RECORD.jws and/or _RECORD.p7s signature files.
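
Here is a minimal layout check for an sdist, sketched with the stdlib ``zipfile`` and ``tarfile`` modules (function names hypothetical). It doesn't normalize the ``./`` prefixes that some tar tools add, and it only reports whether ``_RECORD`` is present rather than verifying hashes or signatures:

```python
import tarfile
import zipfile


def sdist_member_names(path):
    """List archive members for a .zip or any tarfile-readable archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    with tarfile.open(path) as tf:  # default mode detects gz/bz2/xz itself
        return tf.getnames()


def check_sdist_layout(path):
    """Require exactly one top-level directory; report whether it has a _RECORD.

    Returns (top_level_dir, has_record). Raises ValueError on a bad layout.
    """
    names = [n.rstrip("/") for n in sdist_member_names(path)]
    roots = {n.split("/", 1)[0] for n in names if n}
    if len(roots) != 1:
        raise ValueError(f"expected one top-level directory, found {sorted(roots)}")
    root = roots.pop()
    if not any(n.startswith(root + "/") for n in names):
        raise ValueError(f"top-level entry {root!r} is not a populated directory")
    # _RECORD is a SHOULD, not a MUST, so report it rather than fail.
    has_record = f"{root}/_pypackage/_RECORD" in names
    return root, has_record
```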

For official releases, source distributions SHOULD be named as
``<package>-<version>.<ext>``, and the directory they contain SHOULD
be named ``<package>-<version>``, and building this source tree SHOULD
produce a wheel named ``<package>-<version>-<compatibility tag>.whl``
(though it may produce other wheels as well).
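
The naming convention can be checked mechanically. The sketch below uses hypothetical names and only two archive extensions. It splits an sdist filename on its last hyphen and looks for the matching wheel, applying PEP 427's escaping of the distribution name and version:

```python
import re

SDIST_EXTENSIONS = (".tar.gz", ".zip")  # illustrative subset of archive formats


def split_sdist_name(filename):
    """Split '<package>-<version>.<ext>' into (package, version, dirname)."""
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"unrecognized sdist extension: {filename!r}")
    # Split on the last hyphen so names like 'python-dateutil' survive;
    # this assumes the version itself contains no hyphen.
    package, sep, version = stem.rpartition("-")
    if not (sep and package and version):
        raise ValueError(f"expected <package>-<version>: {filename!r}")
    return package, version, stem  # stem is also the expected directory name


def _wheel_escape(component):
    # PEP 427: runs of characters other than letters, digits, '.' become '_'.
    return re.sub(r"[^\w\d.]+", "_", component)


def has_matching_wheel(sdist_filename, wheel_filenames):
    """True if the build produced '<package>-<version>-<tag>.whl'."""
    package, version, _ = split_sdist_name(sdist_filename)
    prefix = f"{_wheel_escape(package)}-{_wheel_escape(version)}-"
    return any(w.startswith(prefix) and w.endswith(".whl") for w in wheel_filenames)
```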

(FIXME: maybe we should add that if you want your sdist on PyPI then
you MUST include a proper _RECORD file and use the proper naming
convention?)

Installation tools like ``pip`` SHOULD take advantage of this
convention by applying the following heuristic: when seeking a package
<package>, if no appropriate wheel can be found, but an sdist named
<package>-<version>.<ext> is found, then:

1) build the sdist
2) add the resulting wheels to the package search space
3) retry the original operation
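
The fallback heuristic amounts to a small retry loop. In this sketch the index is modeled as two dicts keyed by package name, and ``build_sdist`` is a hypothetical callable that returns the wheels a build produced. The ``tried`` set keeps a build that never yields the wanted wheel from looping forever:

```python
def resolve(name, wheels, sdists, build_sdist, tried=None):
    """Return a wheel for ``name``, building sdists as a fallback.

    wheels:      dict, package name -> wheel filename (the search space)
    sdists:      dict, package name -> sdist filename
    build_sdist: hypothetical callable, sdist filename -> {name: wheel}
    """
    if tried is None:
        tried = set()
    if name in wheels:
        return wheels[name]
    sdist = sdists.get(name)
    if sdist is None or sdist in tried:
        raise LookupError(f"no wheel or buildable sdist for {name!r}")
    tried.add(sdist)
    wheels.update(build_sdist(sdist))  # 1) build, 2) extend the search space
    return resolve(name, wheels, sdists, build_sdist, tried)  # 3) retry
```

In the foo/bar scenario, asking for ``foo`` builds ``foo-1.0.zip`` once and makes ``bar`` available as well. Asking for ``bar`` first fails, because nothing maps ``bar`` to ``foo-1.0.zip``.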

This handles a variety of simple and complex cases -- for example, if
we need a package 'foo', and we find foo-1.0.zip which builds foo.whl
and bar.whl, and foo.whl depends on bar.whl, then everything will work
out. There remain other cases that are not handled, e.g. if we start
out searching for bar.whl we will never discover foo-1.0.zip. We take
the perspective that this is nonetheless sufficient for a transitional
heuristic, and anyone who runs into this problem should just upload
wheels already. If this turns out to be inadequate in practice, then
it will be addressed by future extensions.


Examples
========

**Example 1:** While we assume that installation tools will have to
continue supporting version 0 sdists for the indefinite future, it's a
useful check to make sure that our new format can continue to support
packages using distutils / setuptools as their build system. We assume
that a future version of ``pip`` will take its existing knowledge of
distutils internals and expose it as the appropriate hooks, and then
existing distutils / setuptools packages can be ported forward by
using the following ``_pypackage/_pypackage.cfg``::

    [build]
    requirements = """
      pip >= whatever
      wheel
    """
    # Applies monkeypatches, then does 'setup.py dist_info' and
    # extracts the setup_requires
    requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
    # Applies monkeypatches, then does 'setup.py bdist_wheel'
    build-wheels = "pip.pypackage_hooks:build_wheels"
    # Applies monkeypatches, then does:
    #    setup.py dist_info && setup.py build_ext -i
    build-in-place = "pip.pypackage_hooks:build_in_place"

This is also useful for any other installation tools that may want to
support version 0 sdists without having to implement bug-for-bug
compatibility with pip -- if no ``_pypackage/_pypackage.cfg`` is
present, they can use this as a default.
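
To show how little the setuptools shim in Example 1 needs to do, here is a sketch of a ``build_wheels`` hook. It is not pip's code (``pip.pypackage_hooks`` is itself hypothetical) and it skips the monkeypatching step. It also returns absolute paths, because the draft doesn't yet pin down the "scratch directory" that return values are relative to:

```python
import glob
import os
import subprocess
import sys
import tempfile


def bdist_wheel_command(config, dist_dir):
    """Build argv for 'setup.py [global options] bdist_wheel --dist-dir DIR'."""
    argv = [sys.executable, "setup.py"]
    # Legacy --global-option values go before the command name.
    argv += config.get("--global-option", [])
    argv += ["bdist_wheel", "--dist-dir", dist_dir]
    # --install-option has no natural bdist_wheel equivalent; ignored here.
    return argv


def build_wheels(config):
    """Hypothetical build-wheels hook for a setuptools-based project."""
    # Hooks run with the working directory set to _pypackage/, so the
    # project's setup.py lives one level up.
    source_tree = os.path.dirname(os.getcwd())
    dist_dir = tempfile.mkdtemp(prefix="wheels-")
    # check=True turns a failed build into an exception, which is how
    # hooks signal errors under this proposal.
    subprocess.run(bdist_wheel_command(config, dist_dir), cwd=source_tree, check=True)
    return sorted(glob.glob(os.path.join(dist_dir, "*.whl")))
```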

**Example 2:** For packages using numpy.distutils. This is identical
to the distutils / setuptools example above, except that numpy is
moved into the list of static build requirements. Right now, most
projects using numpy.distutils don't bother trying to declare this
dependency, and instead simply error out if numpy is not already
installed. This is because currently the only way to declare a build
dependency is via the ``setup_requires`` argument to the ``setup``
function, and in this case the ``setup`` function is
``numpy.distutils.setup``, which... obviously doesn't work very well.
Drop this ``_pypackage.cfg`` into an existing project like this and it
will become robustly pip-installable with no further changes::

    [build]
    requirements = """
      numpy
      pip >= whatever
      wheel
    """
    requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
    build-wheels = "pip.pypackage_hooks:build_wheels"
    build-in-place = "pip.pypackage_hooks:build_in_place"

**Example 3:** `flit <https://github.com/takluyver/flit>`_ is a tool
designed to make distributing simple packages simple, but it currently
has no support for sdists, and for convenience includes its own
installation code that's redundant with that in pip. These 4 lines of
boilerplate make any flit-using source tree pip-installable, and let
flit get out of the package installation business::

    [build]
    requirements = "flit"
    build-wheels = "flit.pypackage_hooks:build_wheels"
    build-in-place = "flit.pypackage_hooks:build_in_place"


FAQ
===

**Why is it version 1 instead of version 2?** Because the legacy sdist
format is barely a format at all, and to `remind us to keep things
simple <https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_second-system_effect>`_.

**What about cross-compilation?** Standardizing an interface for
cross-compilation seems premature given how complicated the
configuration required can be, the lack of an existing de facto
standard, and the inexperience of this PEP's authors with
cross-compilation. This would be a great target for future extensions,
though. In the meantime, there's no requirement that
``_pypackage/_pypackage.cfg`` contain the *only* entry points to a
project's build system -- packages that want to support
cross-compilation can still do so, they'll just need to include a
README explaining how to do it.

**PEP 426 says that the new sdist format will support automatically
creating policy-compliant .deb/.rpm packages. What happened to that?**
Step 1: enhance the wheel format as necessary so that a wheel can be
automatically converted into a policy-compliant .deb/.rpm package (see
PEP 491). Step 2: make it possible to automatically turn sdists into
wheels (this PEP). Step 3: we're done.

**What about automatically running tests?** Arguably this is another
thing that should be pushed off to wheel metadata instead of sdist
metadata: it's good practice to include tests inside your built
distribution so that end-users can test their install (and see above
re: our focus here being on stuff that end-users want to do, not
dedicated package developers), there are lots of packages that have to
be built before they can be tested anyway (e.g. because of binary
extensions), and in any case it's good practice to test against an
installed version in order to make sure your install code works
properly. But even if we do want this in sdists, it's hardly
urgent (e.g. there is no ``pip test`` that people will miss), so we
defer it to a future extension to avoid blocking the core
functionality.

-- 
Nathaniel J. Smith -- http://vorpus.org

