On 26 October 2015 at 06:04, Nathaniel Smith <njs@pobox.com> wrote:
Here's a second round of text towards making a build-system independent interface between pip and source trees/sdists. My idea this time is to take a divide-and-conquer approach: this text tries to summarize all the stuff that it seemed like we had mostly reached consensus on in the previous thread + call, with blank chunks marked "TBD" where there are specific points that still need To Be Determined. So my hope is that everyone will read what's here and agree that it's great as far as it goes, and then we can go through and fill in each missing piece one at a time.
I'll comment on what's here, but ignore the TBD items - I'd rather (as you suggest) leave discussion of those details till the basic idea is agreed.
Abstract
========
Distutils delenda est.
While this makes a nice tagline, I'd rather something less negative. Distutils does not "need" to be destroyed. It's perfectly adequate (although hardly user friendly) for a lot of cases - I'd be willing to suggest *most* users can work just fine with distutils. I'm not a fan of distutils, but I'd prefer it if we kept the rhetoric limited - as Nick pointed out this whole area is as much a political issue as a technical one.
Extended abstract
=================
While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) it's very difficult to use anything else, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.
Again, this is overstated. You very nearly lost me right here - people won't read the details of the proposal if they disagree with the abstract(s). Specifically:
* The features in (a) are only important to *some* parts of the community. The scientific community is the major one, and is a huge influence over the direction we want to go in, but again, not crucial to many people. And even where they might be useful (e.g., Windows users building pyyaml, lxml, pillow, ...) the description implies "working out what's there" rather than "allowing users to easily manage non-Python dependencies", which gives the wrong impression.
* The features in (b) are highly specialised. Very few people extend setuptools/distutils. And those who do, have often invested a lot of effort in doing so. Sure, they'd rather not have needed to, but now that they have, a replacement system simply means that work is lost. Arguably, fixing (b) is only useful for people (like the scientific community) who have needed to extend setuptools and have been unable to achieve their goals that way. That's an even smaller part of the community.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).
Agreed - this is a good approach. But it's at odds with your abstract, which says distutils must die. Here you're saying you want to allow people to keep using distutils but allow people with specialised needs to choose an alternative. Or are you offering an alternative to people who use distutils? The whole of the above is confusing on the face of it. The details below clarify a lot, as does knowing how the previous discussions have gone. But it would help a lot if the introduction to this PEP were clearer.
The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.
OK. Although I see a risk here that if I want to build package FOO, I now have to worry whether FOO's build system supports Windows, as well as worrying whether FOO itself supports Windows. There's still a role for some "gatekeeper" (not a good word IMO, maybe "coordinator") to provide a certain level of support or review of build systems, and a point of contact for users with build issues (the point of this proposal is to some extent that people don't need to *know* what build system a project uses, so suggesting everyone has to direct issues to the correct build system support forum isn't necessarily practical).
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.
In addition, we propose a wheel-inspired static metadata format for sdists, suitable for tools like PyPI and pip's resolver.
Terminology and goals
=====================
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
This is somewhat misleading, given that you go on to specify the format below, but maybe that's only an issue for someone like me who saw the previous debate over "source distribution" (as a bundled up source tree) vs "sdist" as a specified format. If I understand, you've now discarded the former sense of source distribution, and are sticking with the latter (specified format) definition.
Source distributions are also known as "sdists" for short.
Source trees
============
We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools.
A "version 1" (or greater) source tree is any directory which contains a file named ``pypackage.cfg``, which will -- in some manner whose details are TBD -- describe the package's build dependencies and how to invoke the build system. This mechanism:
- Will allow for both static and dynamic specification of build dependencies
- Will have some degree of isolation of different builds from each other, so that it will be possible for a single run of pip to install one package that build-depends on ``foo = 1.1`` and another package that build-depends on ``foo = 1.2``.
All good so far.
- Will leave the actual installation of the package in the hands of the build/installation tool (i.e. individual package build systems will not need to know about things like --user versus --global or make decisions about when and how to modify .pth files)
This seems completely backwards to me. It's pip's job to do the actual install. The build tool should *only* focus on generating standard conforming binary wheels - otherwise what's the point of the separation of concerns that wheels provide? Or maybe I'm confused by the term "build/installation tool" - by that did you actually mean pip, rather than the build system?
(TBDs omitted)
Source distributions
====================
[possibly this should get split off into a separate PEP, but I'll keep it together for now for ease of discussion]
A "version 1" (or greater) source distribution is a file meeting the following criteria:
- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant version number, and {EXT} is a compliant archive format.
The set of compliant archive formats is: zip, [TBD]
[QUESTION: should we continue to allow .tar.gz and friends? In practice by "allow" I mean something like "accept new-style sdists on PyPI in this format". I'm inclined not to -- zip is the most universally supported format around, it allows file-based random access (unlike tar-based things) which is useful for pulling out metadata without decompressing the whole thing, and standardizing on one format dodges distracting and pointless discussions about which format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to continue to support other archive formats when passed explicitly on the command line. Any objections?]
+1 on having a single archive format, and zip seems like the best choice.
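The random-access argument for zip can be illustrated with a minimal sketch using Python's standard ``zipfile`` module. The function name ``read_sdist_metadata`` and the sample archive contents are made up for illustration. The draft does not pin down exactly where the ``.sdist-info`` directory sits inside the archive, so the sketch simply looks for any member whose path ends in ``.sdist-info/METADATA``.

```python
import io
import zipfile


def read_sdist_metadata(archive):
    """Return the text of the first ``*.sdist-info/METADATA`` member found.

    ``archive`` is a path or a binary file object. Only the zip central
    directory and the one requested member are read; the other members
    are never decompressed.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(".sdist-info/METADATA"):
                return zf.read(name).decode("utf-8")
    raise LookupError("no .sdist-info/METADATA member in archive")


# Build a tiny in-memory archive to exercise the function.
# The layout and contents here are illustrative assumptions, not the spec.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo-1.0/setup.py", "# placeholder\n")
    zf.writestr(
        "foo-1.0/foo-1.0.sdist-info/METADATA",
        "Metadata-Version: 1.1\nName: foo\nVersion: 1.0\n",
    )
buf.seek(0)
metadata_text = read_sdist_metadata(buf)
```

A tar-based archive would need to be scanned (and, for ``.tar.gz``, decompressed) sequentially until the member is reached, which is the difference the question above is pointing at.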
Similar to wheels, the archive is Unicode, and the filenames inside the archive are encoded in UTF-8.
This isn't the job of the sdist format to specify. It should be implicit in the choice of archive format. Having said that, I'd go with
1. The sdist filename MUST support the full range of package names as specified in PEP 426 (https://www.python.org/dev/peps/pep-0426/#name) and versions as in PEP 440 (https://www.python.org/dev/peps/pep-0440/). That's actually far less than full Unicode.
2. The archive format MUST support arbitrary Unicode filenames. That means zip is OK, but tar.gz isn't unless you specify UTF-8 is used (the tar format doesn't allow for an encoding declaration - see https://docs.python.org/3.5/library/tarfile.html#tar-unicode for details on Unicode issues in the tar format).
Having said that I'd also go with "filenames in the archive SHOULD be limited to ASCII" - because we have had issues with pip where test files have Unicode filenames, and builds break because they get mangled on systems with weird encoding setups... IIRC, these are typically related to .tar.gz sdists, which (due to the lack of encoding support) result in files being unpacked with the wrong names. So maybe if we enforce zip format we don't need to add this limitation.
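As a rough illustration of the "SHOULD be limited to ASCII" suggestion, a tool could list the offending member names in a zip archive. This is only a sketch: the function name ``non_ascii_members`` and the sample file names are hypothetical, and it relies on ``str.isascii``, which needs Python 3.7 or later.

```python
import io
import zipfile


def non_ascii_members(archive):
    """Return the member names in a zip archive that are not pure ASCII."""
    with zipfile.ZipFile(archive) as zf:
        return [name for name in zf.namelist() if not name.isascii()]


# Illustrative in-memory archive with one non-ASCII test file name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo-1.0/setup.py", "# placeholder\n")
    zf.writestr("foo-1.0/tests/données.txt", "sample data\n")
buf.seek(0)
offenders = non_ascii_members(buf)
```

Whether such names should produce a warning or an error is a policy question the thread leaves open; the sketch only reports them.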
- When unpacked, it MUST contain a single directory tree named ``{PACKAGE}-{VERSION}``.
- This directory tree MUST be a valid version 1 (or greater) source tree as defined above.
- It MUST additionally contain a directory named ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the following contents:
- ``SDIST``: Mandatory. Same record-oriented format as a wheel's ``WHEEL`` file, but with different fields::
      SDist-Version: 1.0
      Generator: setuptools sdist 20.1
``SDist-Version`` is the version number of this specification. Software that processes sdists should warn if ``SDist-Version`` is greater than the version it supports, and must fail if ``SDist-Version`` has a greater major version than the version it supports.
``Generator`` is the name and optionally the version of the software that produced the archive.
- ``RECORD``: Mandatory. A list of all files contained in the sdist (except for the RECORD file itself and any signature files) together with their hashes, as specified in PEP 427.
- ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as specified in PEP 427.
- ``METADATA``: Mandatory. Metadata version 1.1 or greater format metadata, with an additional rule that fields may contain the special sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value of this field cannot be determined until build time. If a "multiple use field" is present with the value ``__SDIST_DYNAMIC__``, then this field MUST occur exactly once, e.g.::
      # Okay:
      Requires-Dist: lxml (> 3.3)
      Requires-Dist: requests

      # no Requires-Dist lines at all is okay
      # (meaning: this package's requirements are the empty set)

      # Okay, requirements will be determined at build time:
      Requires-Dist: __SDIST_DYNAMIC__

      # NOT okay:
      Requires-Dist: lxml (> 3.3)
      Requires-Dist: __SDIST_DYNAMIC__
(The use of a special token allows us to distinguish between multiple use fields whose value is statically the empty list versus one whose value is dynamic; it also allows us to distinguish between optional fields which are statically not present versus ones whose value is dynamic.)
When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.
A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.
This seems pretty good at first reading.
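The ``__SDIST_DYNAMIC__`` rules quoted above could be checked mechanically along the following lines. This is a sketch under stated assumptions, not a reference implementation: the function names are invented, the parser handles only simple ``Key: value`` lines (no continuation lines or message body), and multiple-use fields are compared without regard to order.

```python
DYNAMIC = "__SDIST_DYNAMIC__"


def parse_fields(text):
    """Parse simple 'Key: value' lines into a dict of key -> list of values."""
    fields = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and the example comments
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def sdist_metadata_errors(fields, package, version):
    """Return a list of rule violations for sdist METADATA fields."""
    errors = []
    # A dynamic multiple-use field must occur exactly once.
    for key, values in fields.items():
        if DYNAMIC in values and len(values) > 1:
            errors.append(f"{key}: {DYNAMIC} must be the only occurrence")
    # Name and Version must be static and match the sdist filename parts.
    for key, expected in (("Name", package), ("Version", version)):
        values = fields.get(key, [])
        if DYNAMIC in values:
            errors.append(f"{key}: must not be dynamic")
        elif values != [expected]:
            errors.append(f"{key}: expected {expected!r}, found {values!r}")
    return errors


def wheel_mismatches(sdist_fields, wheel_fields):
    """Return static sdist fields whose values differ in a built wheel."""
    mismatched = []
    for key in sorted(set(sdist_fields) | set(wheel_fields)):
        sdist_values = sdist_fields.get(key, [])
        if DYNAMIC in sdist_values:
            continue  # dynamic fields may take arbitrary values in the wheel
        if sorted(sdist_values) != sorted(wheel_fields.get(key, [])):
            mismatched.append(key)
    return mismatched


HEADER = "Name: foo\nVersion: 1.0\n"
OK_STATIC = HEADER + "Requires-Dist: lxml (> 3.3)\nRequires-Dist: requests\n"
OK_DYNAMIC = HEADER + "Requires-Dist: __SDIST_DYNAMIC__\n"
NOT_OKAY = HEADER + "Requires-Dist: lxml (> 3.3)\nRequires-Dist: __SDIST_DYNAMIC__\n"

static_errors = sdist_metadata_errors(parse_fields(OK_STATIC), "foo", "1.0")
mixed_errors = sdist_metadata_errors(parse_fields(NOT_OKAY), "foo", "1.0")
```

With the three examples from the draft, the first two produce no errors and the mixed static/dynamic ``Requires-Dist`` case is flagged. ``wheel_mismatches`` covers the separate rule that a built wheel must agree with every static field in the sdist.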
[TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]
This covers the main point of contention. It would be bad if build systems started using __SDIST_DYNAMIC__ just because "it's easier". Maybe add
* A valid sdist SHOULD NOT use the __SDIST_DYNAMIC__ mechanism any more than necessary (i.e., if the metadata is the same in all generated wheels, it does not need to use the __SDIST_DYNAMIC__ mechanism, and so should not do so).
This is intentionally a close analogue of a wheel's ``.dist-info`` directory; the intention is that as future metadata standards are defined, the specifications for the ``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.
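The ``SDist-Version`` handling described for the ``SDIST`` file (warn on a newer version, fail on a newer major version) mirrors the rule wheels use for ``Wheel-Version``. A minimal sketch of how a consuming tool might apply it follows; the function name, the exception class and the supported version ``(1, 0)`` are assumptions made for the example.

```python
import warnings

# Assumed for illustration: the spec version this hypothetical tool implements.
SUPPORTED_SDIST_VERSION = (1, 0)


class UnsupportedSdistVersion(Exception):
    """Raised when an sdist declares a newer major spec version."""


def check_sdist_version(value, supported=SUPPORTED_SDIST_VERSION):
    """Apply the warn/fail rule to an ``SDist-Version`` field value.

    Returns the parsed version as a tuple of integers.
    """
    found = tuple(int(part) for part in value.strip().split("."))
    if found[0] > supported[0]:
        # Newer major version: the tool must refuse to process the sdist.
        raise UnsupportedSdistVersion(
            f"SDist-Version {value} has a newer major version than supported"
        )
    if found > supported:
        # Same major version but newer overall: warn and carry on.
        warnings.warn(f"SDist-Version {value} is newer than supported")
    return found


parsed = check_sdist_version("1.0")
```

Comparing tuples of integers rather than strings avoids ordering ``1.10`` before ``1.9``.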
Evolutionary notes
==================
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.cfg that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.
A rough plan would be: create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and converts those hooks into setuptools calls. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)
But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:
1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pypackage.cfg``).
2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to either stop doing that, or else add ``__SDIST_DYNAMIC__`` annotations at appropriate places.
We'll also presumably need some API for packages to describe which parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for the packages that need it (a new argument to ``setup()`` or some setting in ``setup.cfg`` or something).
I'm confused here. And it's just now become clear *why* I'm confused.
The sdist format MUST be a generated format - i.e., we should insist (in principle at least) that it's only ever generated by tools. Otherwise it's way too easy for people to just zip up their source tree, hand craft something generic (that over-uses __SDIST_DYNAMIC__) and say "here's an sdist". Obviously, people always *can* manually create an sdist but we need to pin down the spec tightly, or we've not improved things. That's why I'm concerned about __SDIST_DYNAMIC__ and it's also what confuses me about the above transition plan.
For people using setuptools currently, the transition should be simply that they upgrade setuptools, and the "setup.py sdist" command in the new setuptools generates the new sdist format. By default, the setuptools sdist process assumes everything is static and requires the user to modify the setup.py to explicitly mark which metadata they want to be left to build time. That way, we get a relatively transparent transition, while avoiding overuse of dynamic metadata. If setup.py has to explicitly mark dynamic metadata, that also allows us to reject attempts to make name and version dynamic. Which is good.
Paul