[Distutils] Towards a simple and standard sdist format that isn't intertwined with distutils

Marcus Smith qwcode at gmail.com
Fri Oct 2 07:45:51 CEST 2015


Can you clarify the relationship to PEP 426 metadata?
There's no standard for metadata in here other than what's required to run
a build hook.
Does that imply you would have each build tool enforce their own convention
for where metadata is found?

On Thu, Oct 1, 2015 at 9:53 PM, Nathaniel Smith <njs at pobox.com> wrote:

> Hi all,
>
> We realized that actually as far as we could tell, it wouldn't be that
> hard at this point to clean up how sdists work so that it would be
> possible to migrate away from distutils. So we wrote up a little draft
> proposal.
>
> The main question is, does this approach seem sound?
>
> -n
>
> ---
>
> PEP: ??
> Title: Standard interface for interacting with source trees
>        and source distributions
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nathaniel J. Smith <njs at pobox.com>
>         Thomas Kluyver <takowl at gmail.com>
> Status: Draft
> Type: Standards-Track
> Content-Type: text/x-rst
> Created: 30-Sep-2015
> Post-History:
> Discussions-To: <distutils-sig at python.org>
>
> Abstract
> ========
>
> Distutils delenda est.
>
>
> Extended abstract
> =================
>
> While ``distutils`` / ``setuptools`` have taken us a long way, they
> suffer from three serious problems: (a) they're missing important
> features like autoconfiguration and usable build-time dependency
> declaration, (b) extending them is quirky, complicated, and fragile,
> (c) you are forced to use them anyway, because they provide the
> standard interface for installing python packages expected by both
> users and installation tools like ``pip``.
>
> Previous efforts (e.g. distutils2 or setuptools itself) have attempted
> to solve problems (a) and/or (b). We propose to solve (c).
>
> The goal of this PEP is to get distutils-sig out of the business of being
> a gatekeeper for Python build systems. If you want to use distutils,
> great; if you want to use something else, then the more the merrier.
> The difficulty of interfacing with distutils means that there aren't
> many such systems right now, but to give a sense of what we're
> thinking about see `flit <https://github.com/takluyver/flit>`_ or
> `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now
> solved many of the hard problems here -- e.g. it's no longer necessary
> that a build system also know about every possible installation
> configuration -- so pretty much all we really need from a build system
> is that it have some way to spit out standard-compliant wheels.
>
> We therefore propose a new, relatively minimal interface for
> installation tools like ``pip`` to interact with package source trees
> and source distributions.
>
>
> Synopsis and rationale
> ======================
>
> To limit the scope of our design, we adopt several principles.
>
> First, we distinguish between a *source tree* (e.g., a VCS checkout)
> and a *source distribution* (e.g., an official snapshot release like
> ``lxml-3.4.4.zip``).
>
> There isn't a whole lot that *source trees* can be assumed to have in
> common. About all you know is that they can -- via some more or less
> Rube-Goldbergian process -- produce one or more binary distributions.
> In particular, you *cannot* tell via simple static inspection:
>
> - What version number will be attached to the resulting packages (e.g.
>   it might be determined programmatically by consulting VCS metadata --
>   I have here a build of numpy version "1.11.0.dev0+4a9ad17")
> - What build- or run-time dependencies are required (e.g. these may
>   depend on arbitrarily complex configuration settings that are
>   determined via a mix of manual settings and auto-probing)
> - Or even how many distinct binary distributions will be produced
>   (e.g. a source distribution may always produce wheel A, but only
>   produce wheel B when built on Unix-like systems).
>
> Therefore, when dealing with source trees, our goal is just to provide
> a standard UX for the core operations that are commonly performed on
> other people's packages; anything fancier and more developer-centric
> we leave at the discretion of individual package developers. So our
> source trees just provide some simple hooks to let a tool like
> ``pip``:
>
> - query for build dependencies
> - run a build, producing wheels as output
> - set up the current source tree so that it can be placed on
> ``sys.path`` in "develop mode"
>
> and that's it. We teach users that the standard way to install a
> package from a VCS checkout is now ``pip install .`` instead of
> ``python setup.py install``. (This is already a good idea anyway --
> e.g., pip can do reliable uninstall / upgrades.)
>
> Next, we note that pretty much all the operations that you might want
> to perform on a *source distribution* are also operations that you
> might want to perform on a source tree, and via the same UX. The only
> thing you do with source distributions that you don't do with source
> trees is, well, distribute them. There's all kinds of metadata you
> could imagine including in a source distribution, but each piece of
> metadata puts an increased burden on source distribution generation
> tools, and most operations will still have to work without this
> metadata. So we only include extra metadata in source distributions if
> it helps solve specific problems that are unique to distribution. If
> you want wheel-style metadata, get a wheel and look at it -- they're
> great and getting better.
>
> Therefore, our source distributions are basically just source trees +
> a mechanism for signing.
>
> Finally: we explicitly do *not* have any concept of "depending on a
> source distribution". As in other systems like Debian, dependencies
> are always phrased in terms of binary distributions (wheels), and when
> a user runs something like ``pip install <package>``, then the
> long-run plan is that <package> and all its transitive dependencies
> should be available as wheels in a package index. But this is not yet
> realistic, so as a transitional / backwards-compatibility measure, we
> provide a simple mechanism for ``pip install <package>`` to handle
> cases where <package> is provided only as a source distribution.
>
>
> Source trees
> ============
>
> We retroactively declare the legacy source tree format involving
> ``setup.py`` to be "version 0". We don't try to specify it further;
> its de facto specification is encoded in the source code of
> ``distutils``, ``setuptools``, ``pip``, and other tools.
>
> A version 1-or-greater format source tree can be identified by the
> presence of a file ``_pypackage/_pypackage.cfg``.
>
> If both ``_pypackage/_pypackage.cfg`` and ``setup.py`` are present,
> then we have a version 1+ source tree, i.e., ``setup.py`` is ignored.
> This is necessary because we anticipate that version 1+ source trees
> may want to contain a ``setup.py`` file for backwards compatibility,
> e.g.::
>
>     #!/usr/bin/env python
>     import sys
>     print("Don't call setup.py directly!")
>     print("Use 'pip install .' instead!")
>     print("(You might have to upgrade pip first.)")
>     sys.exit(1)
>
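
The precedence rule above can be sketched in a few lines. This is only an illustration, not part of the proposal: the function name ``source_tree_format`` and its return labels are made up here, and the only facts taken from the draft are the two file names and the rule that ``setup.py`` is ignored when the ``_pypackage.cfg`` marker exists.

```python
import os


def source_tree_format(tree):
    """Classify a source tree as "v1+", "v0", or None (unrecognized).

    Hypothetical helper: the labels are illustrative, not defined by the draft.
    """
    marker = os.path.join(tree, "_pypackage", "_pypackage.cfg")
    # The marker file wins even when a setup.py is also present.
    if os.path.isfile(marker):
        return "v1+"
    if os.path.isfile(os.path.join(tree, "setup.py")):
        return "v0"
    return None
```

A tree carrying both files is reported as ``"v1+"``, so the stub ``setup.py`` shown above never gets run by the installer.
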
> In the current version of the specification, the one file
> ``_pypackage/_pypackage.cfg`` is where pretty much all the action is
> (though see below). The motivation for putting it into a subdirectory
> is that:
>
> - the way of all standards is that cruft accumulates over time, so
>   this way we pre-emptively have a place to put it,
> - real-world projects often accumulate build system cruft as well, so
>   we might as well provide one obvious place to put it too.
>
> Of course this then creates the possibility of collisions between
> standard files and user files, and trying to teach arbitrary users not
> to scatter files around willy-nilly never works, so we adopt the
> convention that names starting with an underscore are reserved for
> official use, and non-underscored names are available for
> idiosyncratic use by individual projects.
>
> The alternative would be to simply place the main configuration file
> at the top-level, create the subdirectory only when specifically
> needed (most trees won't need it), and let users worry about finding
> their own place for their cruft. Not sure which is the best approach.
> Plus we can have a nice bikeshed about the names in general (FIXME).
>
> _pypackage.cfg
> --------------
>
> The ``_pypackage.cfg`` file contains various settings. Another good
> bike-shed topic is which file format to use for storing these (FIXME),
> but for purposes of this draft I'll write examples using `toml
> <https://github.com/toml-lang/toml>`_, because you'll instantly be
> able to understand the semantics, it has similar expressivity to JSON
> while being more human-friendly (e.g., it supports comments and
> multi-line strings), it's better-specified than ConfigParser, and it's
> much simpler than YAML. Rust's package manager uses toml for similar
> purposes.
>
> Here's an example ``_pypackage/_pypackage.cfg``::
>
>     # Version of the "pypackage format" that this file uses.
>     # Optional. If not present then 1 is assumed.
>     # All version changes indicate incompatible changes; backwards
>     # compatible changes are indicated by just having extra stuff in
>     # the file.
>     version = 1
>
>     [build]
>     # An inline requirements file. Optional.
>     # (FIXME: I guess this means we need a spec for requirements files?)
>     requirements = """
>         mybuildtool >= 2.1
>         special_windows_tool ; sys_platform == "win32"
>     """
>     # The path to an out-of-line requirements file. Optional.
>     requirements-file = "build-requirements.txt"
>     # A hook that will be called to query build requirements. Optional.
>     requirements-dynamic = "mybuildtool:get_requirements"
>
>     # A hook that will be called to build wheels. Required.
>     build-wheels = "mybuildtool:do_build"
>
>     # A hook that will be called to do an in-place build (see below).
>     # Optional.
>     build-in-place = "mybuildtool:do_inplace_build"
>
>     # The "x" namespace is reserved for third-party extensions.
>     # To use x.foo you should own the name "foo" on pypi.
>     [x.mybuildtool]
>     spam = ["spam", "spam", "spam"]
>
> All paths are relative to the ``_pypackage/`` directory (so e.g. the
> build.requirements-file value above refers to a file named
> ``_pypackage/build-requirements.txt``).
>
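
As a rough sketch of how a tool might interpret two of these settings once the file has been parsed into a dict (the parsing itself is left out, since the file format is still marked FIXME): the helper names and ``SUPPORTED_VERSIONS`` are assumptions for illustration. Only the "1 is assumed" default, the "all version changes are incompatible" rule, and the path rule come from the draft.

```python
import posixpath

# Assumption: a tool written against this draft understands version 1 only.
SUPPORTED_VERSIONS = {1}


def config_version(config):
    """Return the declared format version; 1 is assumed when absent."""
    version = config.get("version", 1)
    if version not in SUPPORTED_VERSIONS:
        # Every version change is incompatible, so refuse rather than guess.
        raise ValueError("unsupported _pypackage.cfg version: %r" % (version,))
    return version


def resolve_config_path(tree, value):
    """Resolve a path from the config relative to <tree>/_pypackage/."""
    return posixpath.normpath(posixpath.join(tree, "_pypackage", value))
```

With the example file above, ``resolve_config_path("proj", "build-requirements.txt")`` gives ``proj/_pypackage/build-requirements.txt``.
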
> A *hook* is a Python object that is looked up using the same rules as
> traditional setuptools entry_points: a dotted module name, followed by
> a colon, followed by a dotted name that is looked up within that
> module. *Running a hook* means: first, find or create a python
> interpreter which is executing in the current venv, whose working
> directory is set to the ``_pypackage/`` directory, and which has the
> ``_pypackage/`` directory on ``sys.path``. Then, inside this
> interpreter, look up the hook object, and call it, with arguments as
> specified below.
>
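
The lookup half of that definition can be sketched with the standard library alone. ``load_hook`` is a hypothetical name. The sketch covers only the "module, colon, dotted name" resolution and does not set up the interpreter, working directory, or ``sys.path`` described above.

```python
import importlib
from functools import reduce


def load_hook(spec):
    """Resolve a "dotted.module:dotted.name" string to a Python object."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError("hook must look like 'module:name', got %r" % (spec,))
    module = importlib.import_module(module_name)
    # Walk the dotted name one attribute at a time, starting from the module.
    return reduce(getattr, attr_path.split("."), module)
```

For instance, ``load_hook("os.path:join")`` returns the same object as ``os.path.join``. A string such as ``"mybuildtool:do_build"`` would resolve the same way once ``mybuildtool`` is importable in the build venv.
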
> A build command like ``pip wheel <source tree>`` performs the following
> steps:
>
> 1) Validate the ``_pypackage.cfg`` version number.
>
> 2) Create an empty virtualenv / venv, that matches the environment
> that the installer is targeting (e.g. if you want wheels for CPython
> 3.4 on 64-bit windows, then you make a CPython 3.4 64-bit windows
> venv).
>
> 3) If the build.requirements key is present, then in this venv run the
> equivalent of ``pip install -r <a file containing its value>``, using
> whatever index settings are currently in effect.
>
> 4) If the build.requirements-file key is present, then in this venv
> run the equivalent of ``pip install -r <the named file>``, using
> whatever index settings are currently in effect.
>
> 5) If the build.requirements-dynamic key is present, then in this venv
> run the hook with no arguments, capture its stdout, and pipe it into
> ``pip install -r -``, using whatever index settings are currently in
> effect. If the hook raises an exception, then abort the build with an
> error.
>
>    Note: because these steps are performed in sequence, the
> build.requirements-dynamic hook is allowed to use packages that are
> listed in build.requirements or build.requirements-file.
>
> 6) In this venv, run the build.build-wheels hook. This should be a
> Python function which takes one argument.
>
>    This argument is an arbitrary dictionary intended to contain
> user-specified configuration, specified via some install-tool-specific
> mechanism. The intention is that tools like ``pip`` should provide
> some way for users to specify key/value settings that will be passed
> in here, analogous to the legacy ``--install-option`` and
> ``--global-option`` arguments.
>
>    To make it easier for packages to transition from version 0 to
> version 1 sdists, we suggest that ``pip`` and other tools that have
> such existing option-setting interfaces SHOULD map them to entries in
> this dictionary, e.g.::
>
>        pip --global-option=a --install-option=b --install-option=c
>
>    could produce a dict like::
>
>        {"--global-option": ["a"], "--install-option": ["b", "c"]}
>
>    The hook's return value is a list of pathnames relative to the
> scratch directory. Each entry names a wheel file created by this
> build.
>
>    Errors are signaled by raising an exception.
>
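
Two pieces of the sequence above lend themselves to a small sketch: the fixed order in which the three requirement sources are processed (steps 3 to 5), and the suggested mapping from legacy options to the configuration dict. Both function names and the step labels are invented for illustration; nothing here actually creates a venv or invokes pip.

```python
def requirement_steps(build_cfg):
    """List the requirement-installation steps in the order described above."""
    steps = []
    if "requirements" in build_cfg:
        steps.append(("install-inline", build_cfg["requirements"]))
    if "requirements-file" in build_cfg:
        steps.append(("install-file", build_cfg["requirements-file"]))
    if "requirements-dynamic" in build_cfg:
        # Runs last, so the hook may import packages installed by earlier steps.
        steps.append(("install-from-hook", build_cfg["requirements-dynamic"]))
    return steps


def legacy_options_to_config(argv):
    """Collect --global-option / --install-option values into a dict."""
    config = {}
    for arg in argv:
        name, sep, value = arg.partition("=")
        if sep and name in ("--global-option", "--install-option"):
            config.setdefault(name, []).append(value)
    return config
```

Feeding ``["--global-option=a", "--install-option=b", "--install-option=c"]`` to ``legacy_options_to_config`` yields the dict shown in the draft.
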
> When performing an in-place build (e.g. for ``pip install -e .``),
> then the same steps are followed, except that instead of the
> build.build-wheels hook, we call the build.build-in-place hook, and
> instead of returning a list of wheel files, it returns the name of a
> directory that should be placed onto ``sys.path`` (usually this will
> be the source tree itself, but may not be, e.g. if a build system
> wants to enforce a rule where the source is always kept pristine then
> it could symlink the .py files into a build directory, place the
> extension modules and dist-info there, and return that). This
> directory must contain importable versions of the code in the source
> tree, along with appropriate .dist-info directories.
>
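
One way an installer could sanity-check what a build-in-place hook returns is sketched below. The draft requires the directory to carry "appropriate .dist-info directories" but does not say how a tool should verify that, so the check and the name ``check_in_place_result`` are assumptions.

```python
import os


def check_in_place_result(directory):
    """Return the names of the .dist-info directories in an in-place result."""
    if not os.path.isdir(directory):
        raise ValueError("hook returned a non-directory: %r" % (directory,))
    dist_infos = sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(".dist-info")
        and os.path.isdir(os.path.join(directory, name))
    )
    if not dist_infos:
        raise ValueError("no .dist-info directory in %r" % (directory,))
    return dist_infos
```

Only after a check like this passes would the tool go on to place the directory on ``sys.path``.
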
> (FIXME: in-place builds are useful but intrinsically kinda broken --
> e.g. extensions / source / metadata can all easily get out of sync --
> so while I think this paragraph provides a reasonable hack that
> preserves current functionality, maybe we should defer specifying them
> to until after we've thought through the issues more?)
>
> When working with source trees, build tools like ``pip`` are
> encouraged to cache and re-use virtualenvs for performance.
>
>
> Other contents of _pypackage/
> -----------------------------
>
> _RECORD, _RECORD.jws, _RECORD.p7s: see below.
>
> _x/<pypi name>/: reserved for use by tools (e.g.
> _x/mybuildtool/build/, _x/pip/venv-cache/cp34-none-linux_x86_64/)
>
>
> Source distributions
> ====================
>
> A *source distribution* is a file in a well-known archive format such
> as zip or tar.gz, which contains a single directory, and this
> directory is a source tree (in the sense defined in the previous
> section).
>
> The ``_pypackage/`` directory in a source distribution SHOULD also
> contain a _RECORD file, as defined in PEP 427, and MAY also contain
> _RECORD.jws and/or _RECORD.p7s signature files.
>
> For official releases, source distributions SHOULD be named as
> ``<package>-<version>.<ext>``, and the directory they contain SHOULD
> be named ``<package>-<version>``, and building this source tree SHOULD
> produce a wheel named ``<package>-<version>-<compatibility tag>.whl``
> (though it may produce other wheels as well).
>
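
The naming convention can be turned into a small parser. This sketch assumes just two extensions (``.tar.gz`` and ``.zip``) and assumes the version contains no hyphen; neither restriction is stated in the draft, and the function names are hypothetical.

```python
# Assumption: only these two archive extensions are considered.
SDIST_EXTENSIONS = (".tar.gz", ".zip")


def split_sdist_name(filename):
    """Split "<package>-<version>.<ext>" into (package, version, ext)."""
    for ext in SDIST_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError("unrecognized sdist extension: %r" % (filename,))
    # Assumption: the version is everything after the last hyphen.
    package, sep, version = stem.rpartition("-")
    if not sep or not package or not version:
        raise ValueError("not of the form <package>-<version>: %r" % (filename,))
    return package, version, ext


def expected_names(filename):
    """Names the convention says should go with this sdist."""
    package, version, _ = split_sdist_name(filename)
    stem = "%s-%s" % (package, version)
    return {"directory": stem, "wheel_prefix": stem + "-"}
```

For ``lxml-3.4.4.zip`` this gives the directory name ``lxml-3.4.4`` and a wheel file name starting with ``lxml-3.4.4-``.
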
> (FIXME: maybe we should add that if you want your sdist on PyPI then
> you MUST include a proper _RECORD file and use the proper naming
> convention?)
>
> Integration tools like ``pip`` SHOULD take advantage of this
> convention by applying the following heuristic: when seeking a package
> <package>, if no appropriate wheel can be found, but an sdist named
> <package>-<version>.<ext> is found, then:
>
> 1) build the sdist
> 2) add the resulting wheels to the package search space
> 3) retry the original operation
>
> This handles a variety of simple and complex cases -- for example, if
> we need a package 'foo', and we find foo-1.0.zip which builds foo.whl
> and bar.whl, and foo.whl depends on bar.whl, then everything will work
> out. There remain other cases that are not handled, e.g. if we start
> out searching for bar.whl we will never discover foo-1.0.zip. We take
> the perspective that this is nonetheless sufficient for a transitional
> heuristic, and anyone who runs into this problem should just upload
> wheels already. If this turns out to be inadequate in practice, then
> it will be addressed by future extensions.
>
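
A toy model of the heuristic, including the unhandled case just mentioned. The index is reduced to two dicts keyed by package name and ``build_sdist`` is a stand-in callable; all of these names and shapes are assumptions made for the sketch.

```python
def find_wheel(package, wheels, sdists, build_sdist):
    """Find a wheel for `package`, falling back to building a same-named sdist.

    wheels:      dict mapping package name -> wheel file name (search space)
    sdists:      dict mapping package name -> sdist file name
    build_sdist: callable taking an sdist file name and returning a dict of
                 package name -> wheel file name for everything it built
    """
    if package in wheels:
        return wheels[package]
    if package not in sdists:
        raise LookupError("no wheel or sdist found for %r" % (package,))
    built = build_sdist(sdists[package])  # 1) build the sdist
    wheels.update(built)                  # 2) add the results to the search space
    if package not in wheels:             # 3) retry the original operation
        raise LookupError("building the sdist produced no wheel for %r" % (package,))
    return wheels[package]
```

In this model, asking for ``foo`` with only ``foo-1.0.zip`` available builds both ``foo`` and ``bar`` and leaves both in the search space, while asking for ``bar`` first raises ``LookupError`` because no sdist is named after it.
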
>
> Examples
> ========
>
> **Example 1:** While we assume that installation tools will have to
> continue supporting version 0 sdists for the indefinite future, it's a
> useful check to make sure that our new format can continue to support
> packages using distutils / setuptools as their build system. We assume
> that a future version of ``pip`` will take its existing knowledge of
> distutils internals and expose them as the appropriate hooks, and then
> existing distutils / setuptools packages can be ported forward by
> using the following ``_pypackage/_pypackage.cfg``::
>
>     [build]
>     requirements = """
>       pip >= whatever
>       wheel
>     """
>     # Applies monkeypatches, then does 'setup.py dist_info' and
>     # extracts the setup_requires
>     requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
>     # Applies monkeypatches, then does 'setup.py wheel'
>     build-wheels = "pip.pypackage_hooks:build_wheels"
>     # Applies monkeypatches, then does:
>     #    setup.py dist_info && setup.py build_ext -i
>     build-in-place = "pip.pypackage_hooks:build_in_place"
>
> This is also useful for any other installation tools that may want to
> support version 0 sdists without having to implement bug-for-bug
> compatibility with pip -- if no ``_pypackage/_pypackage.cfg`` is
> present, they can use this as a default.
>
> **Example 2:** For packages using numpy.distutils. This is identical
> to the distutils / setuptools example above, except that numpy is
> moved into the list of static build requirements. Right now, most
> projects using numpy.distutils don't bother trying to declare this
> dependency, and instead simply error out if numpy is not already
> installed. This is because currently the only way to declare a build
> dependency is via the ``setup_requires`` argument to the ``setup``
> function, and in this case the ``setup`` function is
> ``numpy.distutils.setup``, which... obviously doesn't work very well.
> Drop this ``_pypackage.cfg`` into an existing project like this and it
> will become robustly pip-installable with no further changes::
>
>     [build]
>     requirements = """
>       numpy
>       pip >= whatever
>       wheel
>     """
>     requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
>     build-wheels = "pip.pypackage_hooks:build_wheels"
>     build-in-place = "pip.pypackage_hooks:build_in_place"
>
> **Example 3:** `flit <https://github.com/takluyver/flit>`_ is a tool
> designed to make distributing simple packages simple, but it currently
> has no support for sdists, and for convenience includes its own
> installation code that's redundant with that in pip. These 4 lines of
> boilerplate make any flit-using source tree pip-installable, and let
> flit get out of the package installation business::
>
>     [build]
>     requirements = "flit"
>     build-wheels = "flit.pypackage_hooks:build_wheels"
>     build-in-place = "flit.pypackage_hooks:build_in_place"
>
>
> FAQ
> ===
>
> **Why is it version 1 instead of version 2?** Because the legacy sdist
> format is barely a format at all, and to `remind us to keep things
> simple
> <https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_second-system_effect>`_.
>
> **What about cross-compilation?** Standardizing an interface for
> cross-compilation seems premature given how complicated the
> configuration required can be, the lack of an existing de facto
> standard, and the authors of this PEP's inexperience with
> cross-compilation. This would be a great target for future extensions,
> though. In the meantime, there's no requirement that
> ``_pypackage/_pypackage.cfg`` contain the *only* entry points to a
> project's build system -- packages that want to support
> cross-compilation can still do so, they'll just need to include a
> README explaining how to do it.
>
> **PEP 426 says that the new sdist format will support automatically
> creating policy-compliant .deb/.rpm packages. What happened to that?**
> Step 1: enhance the wheel format as necessary so that a wheel can be
> automatically converted into a policy-compliant .deb/.rpm package (see
> PEP 491). Step 2: make it possible to automatically turn sdists into
> wheels (this PEP). Step 3: we're done.
>
> **What about automatically running tests?** Arguably this is another
> thing that should be pushed off to wheel metadata instead of sdist
> metadata: it's good practice to include tests inside your built
> distribution so that end-users can test their install (and see above
> re: our focus here being on stuff that end-users want to do, not
> dedicated package developers), there are lots of packages that have to
> be built before they can be tested anyway (e.g. because of binary
> extensions), and in any case it's good practice to test against an
> installed version in order to make sure your install code works
> properly. But even if we do want this in sdist, then it's hardly
> urgent (e.g. there is no ``pip test`` that people will miss), so we
> defer that for a future extension to avoid blocking the core
> functionality.
>
> --
> Nathaniel J. Smith -- http://vorpus.org