[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Daniel Holth dholth at gmail.com
Fri Oct 2 16:17:52 CEST 2015

Thank you for your work on this. We have to kill distutils to make progress
in packaging.

On Fri, Oct 2, 2015 at 10:12 AM Daniel Holth <dholth at gmail.com> wrote:

> One way to do sdist 2.0 would be to have the package-1.0.dist-info
> directory in there (most sdists contain setuptools metadata) and to have a
> flag static-metadata=1 in setup.cfg asserting that setup.py [if present]
> does not alter the list of dependencies.
> In the old MEBS design the package could suggest a build system, but pip
> would invoke a list of build plugins to inspect the directory and return
> True if they were able to build the package. This would allow for ignoring
> the package's suggested build system. Instead of defining a command line
> interface for setup.py MEBS would define a set of methods on the build
> plugin.
> I thought Robert Collins had a working setup-requires implementation
> already? I have a worse but backwards compatible one too at
> https://bitbucket.org/dholth/setup-requires/src/tip/setup.py
> On Fri, Oct 2, 2015 at 9:42 AM Marcus Smith <qwcode at gmail.com> wrote:
>> Can you clarify the relationship to PEP426 metadata?
>> There's no standard for metadata in here other than what's required to
>> run a build hook.
>> Does that imply you would have each build tool enforce its own
>> convention for where metadata is found?
>> On Thu, Oct 1, 2015 at 9:53 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>> Hi all,
>>> We realized that actually as far as we could tell, it wouldn't be that
>>> hard at this point to clean up how sdists work so that it would be
>>> possible to migrate away from distutils. So we wrote up a little draft
>>> proposal.
>>> The main question is, does this approach seem sound?
>>> -n
>>> ---
>>> PEP: ??
>>> Title: Standard interface for interacting with source trees
>>>        and source distributions
>>> Version: $Revision$
>>> Last-Modified: $Date$
>>> Author: Nathaniel J. Smith <njs at pobox.com>
>>>         Thomas Kluyver <takowl at gmail.com>
>>> Status: Draft
>>> Type: Standards-Track
>>> Content-Type: text/x-rst
>>> Created: 30-Sep-2015
>>> Post-History:
>>> Discussions-To: <distutils-sig at python.org>
>>> Abstract
>>> ========
>>> Distutils delenda est.
>>> Extended abstract
>>> =================
>>> While ``distutils`` / ``setuptools`` have taken us a long way, they
>>> suffer from three serious problems: (a) they're missing important
>>> features like autoconfiguration and usable build-time dependency
>>> declaration, (b) extending them is quirky, complicated, and fragile,
>>> (c) you are forced to use them anyway, because they provide the
>>> standard interface for installing python packages expected by both
>>> users and installation tools like ``pip``.
>>> Previous efforts (e.g. distutils2 or setuptools itself) have attempted
>>> to solve problems (a) and/or (b). We propose to solve (c).
>>> The goal of this PEP is to get distutils-sig out of the business of being
>>> a gatekeeper for Python build systems. If you want to use distutils,
>>> great; if you want to use something else, then the more the merrier.
>>> The difficulty of interfacing with distutils means that there aren't
>>> many such systems right now, but to give a sense of what we're
>>> thinking about see `flit <https://github.com/takluyver/flit>`_ or
>>> `bento
>>> <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
>>> solved many of the hard problems here -- e.g. it's no longer necessary
>>> that a build system also know about every possible installation
>>> configuration -- so pretty much all we really need from a build system
>>> is that it have some way to spit out standard-compliant wheels.
>>> We therefore propose a new, relatively minimal interface for
>>> installation tools like ``pip`` to interact with package source trees
>>> and source distributions.
>>> Synopsis and rationale
>>> ======================
>>> To limit the scope of our design, we adopt several principles.
>>> First, we distinguish between a *source tree* (e.g., a VCS checkout)
>>> and a *source distribution* (e.g., an official snapshot release like
>>> ``lxml-3.4.4.zip``).
>>> There isn't a whole lot that *source trees* can be assumed to have in
>>> common. About all you know is that they can -- via some more or less
>>> Rube-Goldbergian process -- produce one or more binary distributions.
>>> In particular, you *cannot* tell via simple static inspection:
>>> - What version number will be attached to the resulting packages (e.g.
>>> it might be determined programmatically by consulting VCS metadata --
>>> I have here a build of numpy version "1.11.0.dev0+4a9ad17")
>>> - What build- or run-time dependencies are required (e.g. these may
>>> depend on arbitrarily complex configuration settings that are
>>> determined via a mix of manual settings and auto-probing)
>>> - Or even how many distinct binary distributions will be produced
>>> (e.g. a source distribution may always produce wheel A, but only
>>> produce wheel B when built on Unix-like systems).
>>> Therefore, when dealing with source trees, our goal is just to provide
>>> a standard UX for the core operations that are commonly performed on
>>> other people's packages; anything fancier and more developer-centric
>>> we leave at the discretion of individual package developers. So our
>>> source trees just provide some simple hooks to let a tool like
>>> ``pip``:
>>> - query for build dependencies
>>> - run a build, producing wheels as output
>>> - set up the current source tree so that it can be placed on
>>> ``sys.path`` in "develop mode"
>>> and that's it. We teach users that the standard way to install a
>>> package from a VCS checkout is now ``pip install .`` instead of
>>> ``python setup.py install``. (This is already a good idea anyway --
>>> e.g., pip can do reliable uninstall / upgrades.)
>>> Next, we note that pretty much all the operations that you might want
>>> to perform on a *source distribution* are also operations that you
>>> might want to perform on a source tree, and via the same UX. The only
>>> thing you do with source distributions that you don't do with source
>>> trees is, well, distribute them. There's all kinds of metadata you
>>> could imagine including in a source distribution, but each piece of
>>> metadata puts an increased burden on source distribution generation
>>> tools, and most operations will still have to work without this
>>> metadata. So we only include extra metadata in source distributions if
>>> it helps solve specific problems that are unique to distribution. If
>>> you want wheel-style metadata, get a wheel and look at it -- they're
>>> great and getting better.
>>> Therefore, our source distributions are basically just source trees +
>>> a mechanism for signing.
>>> Finally: we explicitly do *not* have any concept of "depending on a
>>> source distribution". As in other systems like Debian, dependencies
>>> are always phrased in terms of binary distributions (wheels), and when
>>> a user runs something like ``pip install <package>``, then the
>>> long-run plan is that <package> and all its transitive dependencies
>>> should be available as wheels in a package index. But this is not yet
>>> realistic, so as a transitional / backwards-compatibility measure, we
>>> provide a simple mechanism for ``pip install <package>`` to handle
>>> cases where <package> is provided only as a source distribution.
>>> Source trees
>>> ============
>>> We retroactively declare the legacy source tree format involving
>>> ``setup.py`` to be "version 0". We don't try to specify it further;
>>> its de facto specification is encoded in the source code of
>>> ``distutils``, ``setuptools``, ``pip``, and other tools.
>>> A version 1-or-greater format source tree can be identified by the
>>> presence of a file ``_pypackage/_pypackage.cfg``.
>>> If both ``_pypackage/_pypackage.cfg`` and ``setup.py`` are present,
>>> then we have a version 1+ source tree, i.e., ``setup.py`` is ignored.
>>> This is necessary because we anticipate that version 1+ source trees
>>> may want to contain a ``setup.py`` file for backwards compatibility,
>>> e.g.::
>>>     #!/usr/bin/env python
>>>     import sys
>>>     print("Don't call setup.py directly!")
>>>     print("Use 'pip install .' instead!")
>>>     print("(You might have to upgrade pip first.)")
>>>     sys.exit(1)
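The detection rule above can be sketched as follows (a minimal illustration using only the stdlib; the function name is hypothetical, not part of the spec):

```python
from pathlib import Path

def source_tree_version(tree):
    """Classify a source tree: _pypackage/_pypackage.cfg marks a
    version 1-or-greater tree (any setup.py is then ignored);
    otherwise a setup.py marks a legacy "version 0" tree."""
    tree = Path(tree)
    if (tree / "_pypackage" / "_pypackage.cfg").is_file():
        return 1   # version 1 or greater; setup.py, if present, is ignored
    if (tree / "setup.py").is_file():
        return 0   # legacy distutils/setuptools tree
    return None    # not a recognized source tree
```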
>>> In the current version of the specification, the one file
>>> ``_pypackage/_pypackage.cfg`` is where pretty much all the action is
>>> (though see below). The motivation for putting it into a subdirectory
>>> is that:
>>> - the way of all standards is that cruft accumulates over time, so
>>> this way we pre-emptively have a place to put it,
>>> - real-world projects often accumulate build system cruft as well, so
>>> we might as well provide one obvious place to put it too.
>>> Of course this then creates the possibility of collisions between
>>> standard files and user files, and trying to teach arbitrary users not
>>> to scatter files around willy-nilly never works, so we adopt the
>>> convention that names starting with an underscore are reserved for
>>> official use, and non-underscored names are available for
>>> idiosyncratic use by individual projects.
>>> The alternative would be to simply place the main configuration file
>>> at the top-level, create the subdirectory only when specifically
>>> needed (most trees won't need it), and let users worry about finding
>>> their own place for their cruft. Not sure which is the best approach.
>>> Plus we can have a nice bikeshed about the names in general (FIXME).
>>> _pypackage.cfg
>>> --------------
>>> The ``_pypackage.cfg`` file contains various settings. Another good
>>> bike-shed topic is which file format to use for storing these (FIXME),
>>> but for purposes of this draft I'll write examples using `toml
>>> <https://github.com/toml-lang/toml>`_, because you'll instantly be
>>> able to understand the semantics, it has similar expressivity to JSON
>>> while being more human-friendly (e.g., it supports comments and
>>> multi-line strings), it's better-specified than ConfigParser, and it's
>>> much simpler than YAML. Rust's package manager uses toml for similar
>>> purposes.
>>> Here's an example ``_pypackage/_pypackage.cfg``::
>>>     # Version of the "pypackage format" that this file uses.
>>>     # Optional. If not present then 1 is assumed.
>>>     # All version changes indicate incompatible changes; backwards
>>>     # compatible changes are indicated by just having extra stuff in
>>>     # the file.
>>>     version = 1
>>>     [build]
>>>     # An inline requirements file. Optional.
>>>     # (FIXME: I guess this means we need a spec for requirements files?)
>>>     requirements = """
>>>         mybuildtool >= 2.1
>>>         special_windows_tool ; sys_platform == "win32"
>>>     """
>>>     # The path to an out-of-line requirements file. Optional.
>>>     requirements-file = "build-requirements.txt"
>>>     # A hook that will be called to query build requirements. Optional.
>>>     requirements-dynamic = "mybuildtool:get_requirements"
>>>     # A hook that will be called to build wheels. Required.
>>>     build-wheels = "mybuildtool:do_build"
>>>     # A hook that will be called to do an in-place build (see below).
>>>     # Optional.
>>>     build-in-place = "mybuildtool:do_inplace_build"
>>>     # The "x" namespace is reserved for third-party extensions.
>>>     # To use x.foo you should own the name "foo" on PyPI.
>>>     [x.mybuildtool]
>>>     spam = ["spam", "spam", "spam"]
>>> All paths are relative to the ``_pypackage/`` directory (so e.g. the
>>> build.requirements-file value above refers to a file named
>>> ``_pypackage/build-requirements.txt``).
>>> A *hook* is a Python object that is looked up using the same rules as
>>> traditional setuptools entry_points: a dotted module name, followed by
>>> a colon, followed by a dotted name that is looked up within that
>>> module. *Running a hook* means: first, find or create a python
>>> interpreter which is executing in the current venv, whose working
>>> directory is set to the ``_pypackage/`` directory, and which has the
>>> ``_pypackage/`` directory on ``sys.path``. Then, inside this
>>> interpreter, look up the hook object, and call it, with arguments as
>>> specified below.
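The hook-lookup rule described above mirrors setuptools entry points; a minimal sketch of that resolution (the helper name is hypothetical):

```python
import importlib

def load_hook(spec):
    """Resolve a hook string such as "mybuildtool:do_build": a dotted
    module name, then a colon, then a dotted attribute path that is
    looked up within that module (same rules as entry_points)."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj
```

Note that in the spec the lookup happens inside the build venv's interpreter with ``_pypackage/`` on ``sys.path``; the sketch only shows the name-resolution step.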
>>> A build command like ``pip wheel <source tree>`` performs the following
>>> steps:
>>> 1) Validate the ``_pypackage.cfg`` version number.
>>> 2) Create an empty virtualenv / venv, that matches the environment
>>> that the installer is targeting (e.g. if you want wheels for CPython
>>> 3.4 on 64-bit windows, then you make a CPython 3.4 64-bit windows
>>> venv).
>>> 3) If the build.requirements key is present, then in this venv run the
>>> equivalent of ``pip install -r <a file containing its value>``, using
>>> whatever index settings are currently in effect.
>>> 4) If the build.requirements-file key is present, then in this venv
>>> run the equivalent of ``pip install -r <the named file>``, using
>>> whatever index settings are currently in effect.
>>> 5) If the build.requirements-dynamic key is present, then in this venv
>>>  run the hook with no arguments, capture its stdout, and pipe it into
>>> ``pip install -r -``, using whatever index settings are currently in
>>> effect. If the hook raises an exception, then abort the build with an
>>> error.
>>>    Note: because these steps are performed in sequence, the
>>> build.requirements-dynamic hook is allowed to use packages that are
>>> listed in build.requirements or build.requirements-file.
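Step 5 could be sketched like this (a hypothetical helper, not pip code; the subprocess call stands in for ``pip install -r -``):

```python
import contextlib
import io
import subprocess
import sys

def capture_hook_output(hook):
    """Run the requirements-dynamic hook with no arguments and capture
    its stdout; the captured text is treated as a requirements file."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        hook()  # any exception here aborts the build with an error
    return buf.getvalue()

def install_dynamic_requirements(hook):
    # Pipe the hook's output into the equivalent of 'pip install -r -',
    # using whatever index settings are currently in effect.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r", "-"],
        input=capture_hook_output(hook), text=True, check=True,
    )
```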
>>> 6) In this venv, run the build.build-wheels hook. This should be a
>>> Python function which takes one argument.
>>>    This argument is an arbitrary dictionary intended to contain
>>> user-specified configuration, specified via some install-tool-specific
>>> mechanism. The intention is that tools like ``pip`` should provide
>>> some way for users to specify key/value settings that will be passed
>>> in here, analogous to the legacy ``--install-option`` and
>>> ``--global-option`` arguments.
>>>    To make it easier for packages to transition from version 0 to
>>> version 1 sdists, we suggest that ``pip`` and other tools that have
>>> such existing option-setting interfaces SHOULD map them to entries in
>>> this dictionary -- e.g.::
>>>        pip --global-option=a --install-option=b --install-option=c
>>>    could produce a dict like::
>>>        {"--global-option": ["a"], "--install-option": ["b", "c"]}
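That suggested mapping can be sketched as follows (hypothetical helper name; only the two legacy flags from the example are handled):

```python
def options_to_config(argv):
    """Collect legacy --global-option / --install-option flags into
    the config dict that is passed to the build-wheels hook."""
    config = {}
    for arg in argv:
        flag, sep, value = arg.partition("=")
        if sep and flag in ("--global-option", "--install-option"):
            # Repeated flags accumulate into a list, preserving order.
            config.setdefault(flag, []).append(value)
    return config
```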
>>>    The hook's return value is a list of pathnames relative to the
>>> scratch directory. Each entry names a wheel file created by this
>>> build.
>>>    Errors are signaled by raising an exception.
>>> When performing an in-place build (e.g. for ``pip install -e .``),
>>> then the same steps are followed, except that instead of the
>>> build.build-wheels hook, we call the build.build-in-place hook, and
>>> instead of returning a list of wheel files, it returns the name of a
>>> directory that should be placed onto ``sys.path`` (usually this will
>>> be the source tree itself, but may not be, e.g. if a build system
>>> wants to enforce a rule where the source is always kept pristine then
>>> it could symlink the .py files into a build directory, place the
>>> extension modules and dist-info there, and return that). This
>>> directory must contain importable versions of the code in the source
>>> tree, along with appropriate .dist-info directories.
>>> (FIXME: in-place builds are useful but intrinsically kinda broken --
>>> e.g. extensions / source / metadata can all easily get out of sync --
>>> so while I think this paragraph provides a reasonable hack that
>>> preserves current functionality, maybe we should defer specifying them
>>> until after we've thought through the issues more?)
>>> When working with source trees, build tools like ``pip`` are
>>> encouraged to cache and re-use virtualenvs for performance.
>>> Other contents of _pypackage/
>>> -----------------------------
>>> _RECORD, _RECORD.jws, _RECORD.p7s: see below.
>>> _x/<pypi name>/: reserved for use by tools (e.g.
>>> _x/mybuildtool/build/, _x/pip/venv-cache/cp34-none-linux_x86_64/)
>>> Source distributions
>>> ====================
>>> A *source distribution* is a file in a well-known archive format such
>>> as zip or tar.gz, which contains a single directory, and this
>>> directory is a source tree (in the sense defined in the previous
>>> section).
>>> The ``_pypackage/`` directory in a source distribution SHOULD also
>>> contain a _RECORD file, as defined in PEP 427, and MAY also contain
>>> _RECORD.jws and/or _RECORD.p7s signature files.
>>> For official releases, source distributions SHOULD be named as
>>> ``<package>-<version>.<ext>``, and the directory they contain SHOULD
>>> be named ``<package>-<version>``, and building this source tree SHOULD
>>> produce a wheel named ``<package>-<version>-<compatibility tag>.whl``
>>> (though it may produce other wheels as well).
>>> (FIXME: maybe we should add that if you want your sdist on PyPI then
>>> you MUST include a proper _RECORD file and use the proper naming
>>> convention?)
>>> Integration tools like ``pip`` SHOULD take advantage of this
>>> convention by applying the following heuristic: when seeking a package
>>> <package>, if no appropriate wheel can be found, but an sdist named
>>> <package>-<version>.<ext> is found, then:
>>> 1) build the sdist
>>> 2) add the resulting wheels to the package search space
>>> 3) retry the original operation
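The heuristic above might be sketched like this (hypothetical data shapes: dicts mapping package names to artifacts, and a build callable returning the wheels an sdist produces):

```python
def resolve(package, wheels, sdists, build_sdist):
    """Find a wheel for 'package'; if only an sdist is available,
    build it, add the resulting wheels to the search space, retry."""
    if package in wheels:
        return wheels[package]
    if package in sdists:
        wheels.update(build_sdist(sdists[package]))  # steps 1 and 2
        if package in wheels:                        # step 3: retry
            return wheels[package]
    raise LookupError(package)
```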
>>> This handles a variety of simple and complex cases -- for example, if
>>> we need a package 'foo', and we find foo-1.0.zip which builds foo.whl
>>> and bar.whl, and foo.whl depends on bar.whl, then everything will work
>>> out. There remain other cases that are not handled, e.g. if we start
>>> out searching for bar.whl we will never discover foo-1.0.zip. We take
>>> the perspective that this is nonetheless sufficient for a transitional
>>> heuristic, and anyone who runs into this problem should just upload
>>> wheels already. If this turns out to be inadequate in practice, then
>>> it will be addressed by future extensions.
>>> Examples
>>> ========
>>> **Example 1:** While we assume that installation tools will have to
>>> continue supporting version 0 sdists for the indefinite future, it's a
>>> useful check to make sure that our new format can continue to support
>>> packages using distutils / setuptools as their build system. We assume
>>> that a future version ``pip`` will take its existing knowledge of
>>> distutils internals and expose them as the appropriate hooks, and then
>>> existing distutils / setuptools packages can be ported forward by
>>> using the following ``_pypackage/_pypackage.cfg``::
>>>     [build]
>>>     requirements = """
>>>       pip >= whatever
>>>       wheel
>>>     """
>>>     # Applies monkeypatches, then does 'setup.py dist_info' and
>>>     # extracts the setup_requires
>>>     requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
>>>     # Applies monkeypatches, then does 'setup.py wheel'
>>>     build-wheels = "pip.pypackage_hooks:build_wheels"
>>>     # Applies monkeypatches, then does:
>>>     #    setup.py dist_info && setup.py build_ext -i
>>>     build-in-place = "pip.pypackage_hooks:build_in_place"
>>> This is also useful for any other installation tools that may want to
>>> support version 0 sdists without having to implement bug-for-bug
>>> compatibility with pip -- if no ``_pypackage/_pypackage.cfg`` is
>>> present, they can use this as a default.
>>> **Example 2:** For packages using numpy.distutils. This is identical
>>> to the distutils / setuptools example above, except that numpy is
>>> moved into the list of static build requirements. Right now, most
>>> projects using numpy.distutils don't bother trying to declare this
>>> dependency, and instead simply error out if numpy is not already
>>> installed. This is because currently the only way to declare a build
>>> dependency is via the ``setup_requires`` argument to the ``setup``
>>> function, and in this case the ``setup`` function is
>>> ``numpy.distutils.setup``, which... obviously doesn't work very well.
>>> Drop this ``_pypackage.cfg`` into an existing project like this and it
>>> will become robustly pip-installable with no further changes::
>>>     [build]
>>>     requirements = """
>>>       numpy
>>>       pip >= whatever
>>>       wheel
>>>     """
>>>     requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
>>>     build-wheels = "pip.pypackage_hooks:build_wheels"
>>>     build-in-place = "pip.pypackage_hooks:build_in_place"
>>> **Example 3:** `flit <https://github.com/takluyver/flit>`_ is a tool
>>> designed to make distributing simple packages simple, but it currently
>>> has no support for sdists, and for convenience includes its own
>>> installation code that's redundant with that in pip. These 4 lines of
>>> boilerplate make any flit-using source tree pip-installable, and let
>>> flit get out of the package installation business::
>>>     [build]
>>>     requirements = "flit"
>>>     build-wheels = "flit.pypackage_hooks:build_wheels"
>>>     build-in-place = "flit.pypackage_hooks:build_in_place"
>>> FAQ
>>> ===
>>> **Why is it version 1 instead of version 2?** Because the legacy sdist
>>> format is barely a format at all, and to `remind us to keep things
>>> simple <
>>> https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_second-system_effect
>>> >`_.
>>> **What about cross-compilation?** Standardizing an interface for
>>> cross-compilation seems premature given how complicated the
>>> configuration required can be, the lack of an existing de facto
>>> standard, and this PEP's authors' inexperience with
>>> cross-compilation. This would be a great target for future extensions,
>>> though. In the mean time, there's no requirement that
>>> ``_pypackage/_pypackage.cfg`` contain the *only* entry points to a
>>> project's build system -- packages that want to support
>>> cross-compilation can still do so, they'll just need to include a
>>> README explaining how to do it.
>>> **PEP 426 says that the new sdist format will support automatically
>>> creating policy-compliant .deb/.rpm packages. What happened to that?**
>>> Step 1: enhance the wheel format as necessary so that a wheel can be
>>> automatically converted into a policy-compliant .deb/.rpm package (see
>>> PEP 491). Step 2: make it possible to automatically turn sdists into
>>> wheels (this PEP). Step 3: we're done.
>>> **What about automatically running tests?** Arguably this is another
>>> thing that should be pushed off to wheel metadata instead of sdist
>>> metadata: it's good practice to include tests inside your built
>>> distribution so that end-users can test their install (and see above
>>> re: our focus here being on stuff that end-users want to do, not
>>> dedicated package developers), there are lots of packages that have to
>>> be built before they can be tested anyway (e.g. because of binary
>>> extensions), and in any case it's good practice to test against an
>>> installed version in order to make sure your install code works
>>> properly. But even if we do want this in sdist, then it's hardly
>>> urgent (e.g. there is no ``pip test`` that people will miss), so we
>>> defer that for a future extension to avoid blocking the core
>>> functionality.
>>> --
>>> Nathaniel J. Smith -- http://vorpus.org
>>> _______________________________________________
>>> Distutils-SIG maillist  -  Distutils-SIG at python.org
>>> https://mail.python.org/mailman/listinfo/distutils-sig