Second draft of a plan for a new source tree / sdist format
Hi all,

Here's a second round of text towards making a build-system independent interface between pip and source trees/sdists. My idea this time is to take a divide-and-conquer approach: this text tries to summarize all the stuff that it seemed like we had mostly reached consensus on in the previous thread + call, with blank chunks marked "TBD" where there are specific points that still need To Be Determined. So my hope is that everyone will read what's here and agree that it's great as far as it goes, and then we can go through and fill in each missing piece one at a time.

------

PEP: ??
Title: A build-system independent format for source trees and source distributions
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs@pobox.com>
Status: Draft
Type: Standards-Track
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History: 1 Oct 2015, 25 Oct 2015
Discussions-To: <distutils-sig@python.org>

Abstract
========

Distutils delenda est.

Extended abstract
=================

While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) it's very difficult to use anything else, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.

We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.

In addition, we propose a wheel-inspired static metadata format for sdists, suitable for tools like PyPI and pip's resolver.

Terminology and goals
=====================

A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.

A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.

Source distributions are also known as "sdists" for short.

Source trees
============

We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools.

A "version 1" (or greater) source tree is any directory which contains a file named ``pypackage.cfg``, which will -- in some manner whose details are TBD -- describe the package's build dependencies and how to invoke the build system. This mechanism:

- Will allow for both static and dynamic specification of build dependencies

- Will have some degree of isolation of different builds from each other, so that it will be possible for a single run of pip to install one package that build-depends on ``foo = 1.1`` and another package that build-depends on ``foo = 1.2``.

- Will leave the actual installation of the package in the hands of the build/installation tool (i.e. individual package build systems will not need to know about things like --user versus --global or make decisions about when and how to modify .pth files)

[TBD: the exact set of operations to be supported and their detailed semantics]

[TBD: should builds be performed in a fully isolated environment, or should they get access to packages that are already installed in the target install environment? The former simplifies a number of things, but Robert was skeptical it would be possible.]

[TBD: the form of the communication channel between an installation tool like ``pip`` and the build system, over which these operations are requested]

[TBD: the syntactic details of the configuration file format itself. We can change the name too if we want, I just think it's useful to have a single name to refer to it for now, and this is the last and least interesting thing to figure out.]

Source distributions
====================

[possibly this should get split off into a separate PEP, but I'll keep it together for now for ease of discussion]

A "version 1" (or greater) source distribution is a file meeting the following criteria:

- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant version number, and {EXT} is a compliant archive format.

  The set of compliant archive formats is: zip, [TBD]

  [QUESTION: should we continue to allow .tar.gz and friends? In practice by "allow" I mean something like "accept new-style sdists on PyPI in this format". I'm inclined not to -- zip is the most universally supported format around, it allows file-based random access (unlike tar-based things) which is useful for pulling out metadata without decompressing the whole thing, and standardizing on one format dodges distracting and pointless discussions about which format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to continue to support other archive formats when passed explicitly on the command line. Any objections?]

  Similar to wheels, the archive is Unicode, and the filenames inside the archive are encoded in UTF-8.

- When unpacked, it MUST contain a single directory tree named ``{PACKAGE}-{VERSION}``.

- This directory tree MUST be a valid version 1 (or greater) source tree as defined above.

- It MUST additionally contain a directory named ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the following contents:

  - ``SDIST``: Mandatory. Same record-oriented format as a wheel's ``WHEEL`` file, but with different fields::

        SDist-Version: 1.0
        Generator: setuptools sdist 20.1

    ``SDist-Version`` is the version number of this specification. Software that processes sdists should warn if ``SDist-Version`` is greater than the version it supports, and must fail if ``SDist-Version`` has a greater major version than the version it supports.

    ``Generator`` is the name and optionally the version of the software that produced the archive.

  - ``RECORD``: Mandatory. A list of all files contained in the sdist (except for the RECORD file itself and any signature files) together with their hashes, as specified in PEP 427.

  - ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as specified in PEP 427.

  - ``METADATA``: Mandatory. Metadata version 1.1 or greater format metadata, with an additional rule that fields may contain the special sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value of this field cannot be determined until build time. If a "multiple use field" is present with the value ``__SDIST_DYNAMIC__``, then this field MUST occur exactly once, e.g.::

        # Okay:
        Requires-Dist: lxml (> 3.3)
        Requires-Dist: requests

        # no Requires-Dist lines at all is okay
        # (meaning: this package's requirements are the empty set)

        # Okay, requirements will be determined at build time:
        Requires-Dist: __SDIST_DYNAMIC__

        # NOT okay:
        Requires-Dist: lxml (> 3.3)
        Requires-Dist: __SDIST_DYNAMIC__

    (The use of a special token allows us to distinguish between multiple use fields whose value is statically the empty list versus one whose value is dynamic; it also allows us to distinguish between optional fields which are statically not present versus ones whose value is dynamic.)

    When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.

    A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.

    [TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]

This is intentionally a close analogue of a wheel's ``.dist-info`` directory; the intention is that as future metadata standards are defined, the specifications for the ``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.

Evolutionary notes
==================

A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.cfg that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.

A rough plan would be: Create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and convert them into setuptools calls. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)

But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:

1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pypackage.cfg``).

2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to either stop doing that, or else add ``__SDIST_DYNAMIC__`` annotations at appropriate places.

We'll also presumably need some API for packages to describe which parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for the packages that need it (a new argument to ``setup()`` or some setting in ``setup.cfg`` or something).

--
Nathaniel J. Smith -- http://vorpus.org
On 26 October 2015 at 06:04, Nathaniel Smith <njs@pobox.com> wrote:
Here's a second round of text towards making a build-system independent interface between pip and source trees/sdists. My idea this time is to take a divide-and-conquer approach: this text tries to summarize all the stuff that it seemed like we had mostly reached consensus on in the previous thread + call, with blank chunks marked "TBD" where there are specific points that still need To Be Determined. So my hope is that everyone will read what's here and agree that it's great as far as it goes, and then we can go through and fill in each missing piece one at a time.
I'll comment on what's here, but ignore the TBD items - I'd rather (as you suggest) leave discussion of those details till the basic idea is agreed.
Abstract
========
Distutils delenda est.
While this makes a nice tagline, I'd rather something less negative. Distutils does not "need" to be destroyed. It's perfectly adequate (although hardly user friendly) for a lot of cases - I'd be willing to suggest *most* users can work just fine with distutils.

I'm not a fan of distutils, but I'd prefer it if we kept the rhetoric limited - as Nick pointed out this whole area is as much a political issue as a technical one.
Extended abstract
=================
While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) it's very difficult to use anything else, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.
Again, this is overstated. You very nearly lost me right here - people won't read the details of the proposal if they disagree with the abstract(s). Specifically:

* The features in (a) are only important to *some* parts of the community. The scientific community is the major one, and is a huge influence over the direction we want to go in, but again, not crucial to many people. And even where they might be useful (e.g., Windows users building pyyaml, lxml, pillow, ...) the description implies "working out what's there" rather than "allowing users to easily manage non-Python dependencies", which gives the wrong impression.

* The features in (b) are highly specialised. Very few people extend setuptools/distutils. And those who do, have often invested a lot of effort in doing so. Sure, they'd rather not have needed to, but now that they have, a replacement system simply means that work is lost. Arguably, fixing (b) is only useful for people (like the scientific community) who have needed to extend setuptools and have been unable to achieve their goals that way. That's an even smaller part of the community.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).
Agreed - this is a good approach. But it's at odds with your abstract, which says distutils must die. Here you're saying you want to allow people to keep using distutils but allow people with specialised needs to choose an alternative. Or are you offering an alternative to people who use distutils?

The whole of the above is confusing on the face of it. The details below clarify a lot, as does knowing how the previous discussions have gone. But it would help a lot if the introduction to this PEP were clearer.
The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.
OK. Although I see a risk here that if I want to build package FOO, I now have to worry whether FOO's build system supports Windows, as well as worrying whether FOO itself supports Windows. There's still a role for some "gatekeeper" (not a good word IMO, maybe "coordinator") to provide a certain level of support or review of build systems, and a point of contact for users with build issues (the point of this proposal is to some extent that people don't need to *know* what build system a project uses, so suggesting everyone has to direct issues to the correct build system support forum isn't necessarily practical).
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.
In addition, we propose a wheel-inspired static metadata format for sdists, suitable for tools like PyPI and pip's resolver.
Terminology and goals
=====================
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
This is somewhat misleading, given that you go on to specify the format below, but maybe that's only an issue for someone like me who saw the previous debate over "source distribution" (as a bundled up source tree) vs "sdist" as a specified format. If I understand, you've now discarded the former sense of source distribution, and are sticking with the latter (specified format) definition.
Source distributions are also known as "sdists" for short.
Source trees
============
We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools.
A "version 1" (or greater) source tree is any directory which contains a file named ``pypackage.cfg``, which will -- in some manner whose details are TBD -- describe the package's build dependencies and how to invoke the build system. This mechanism:
- Will allow for both static and dynamic specification of build dependencies
- Will have some degree of isolation of different builds from each other, so that it will be possible for a single run of pip to install one package that build-depends on ``foo = 1.1`` and another package that build-depends on ``foo = 1.2``.
All good so far.
- Will leave the actual installation of the package in the hands of the build/installation tool (i.e. individual package build systems will not need to know about things like --user versus --global or make decisions about when and how to modify .pth files)
This seems completely backwards to me. It's pip's job to do the actual install. The build tool should *only* focus on generating standard conforming binary wheels - otherwise what's the point of the separation of concerns that wheels provide?

Or maybe I'm confused by the term "build/installation tool" - by that did you actually mean pip, rather than the build system?

(TBDs omitted)
Source distributions
====================
[possibly this should get split off into a separate PEP, but I'll keep it together for now for ease of discussion]
A "version 1" (or greater) source distribution is a file meeting the following criteria:
- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant version number, and {EXT} is a compliant archive format.
The set of compliant archive formats is: zip, [TBD]
[QUESTION: should we continue to allow .tar.gz and friends? In practice by "allow" I mean something like "accept new-style sdists on PyPI in this format". I'm inclined not to -- zip is the most universally supported format around, it allows file-based random access (unlike tar-based things) which is useful for pulling out metadata without decompressing the whole thing, and standardizing on one format dodges distracting and pointless discussions about which format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to continue to support other archive formats when passed explicitly on the command line. Any objections?]
+1 on having a single archive format, and zip seems like the best choice.
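As a rough illustration of the naming rule quoted above (not part of the proposal), here is a minimal sketch of how a tool might split an sdist filename into its parts. The helper name ``parse_sdist_filename`` and the ``COMPLIANT_EXTENSIONS`` tuple are hypothetical. The sketch assumes that ``zip`` is the only compliant format so far and that the version contains no ``-``, so the last hyphen separates package from version; it does not validate the version against PEP 440.

```python
# Hypothetical helper, for illustration only.
# Assumption: "zip" is the only compliant archive format so far.
COMPLIANT_EXTENSIONS = ("zip",)


def parse_sdist_filename(filename):
    """Split '{PACKAGE}-{VERSION}.{EXT}' into (package, version, ext)."""
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext not in COMPLIANT_EXTENSIONS:
        raise ValueError("unsupported archive format: %r" % filename)
    # Assumption: the version contains no "-", so the last hyphen
    # separates the package name from the version.
    package, dash, version = stem.rpartition("-")
    if not dash or not package or not version:
        raise ValueError("expected {PACKAGE}-{VERSION}.{EXT}: %r" % filename)
    return package, version, ext
```

With a single allowed extension, a name such as ``lxml-3.4.4.tar.gz`` is rejected outright instead of being guessed at.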
Similar to wheels, the archive is Unicode, and the filenames inside the archive are encoded in UTF-8.
This isn't the job of the sdist format to specify. It should be implicit in the choice of archive format. Having said that, I'd go with

1. The sdist filename MUST support the full range of package names as specified in PEP 426 (https://www.python.org/dev/peps/pep-0426/#name) and versions as in PEP 440 (https://www.python.org/dev/peps/pep-0440/). That's actually far less than full Unicode.

2. The archive format MUST support arbitrary Unicode filenames. That means zip is OK, but tar.gz isn't unless you specify UTF-8 is used (the tar format doesn't allow for an encoding declaration - see https://docs.python.org/3.5/library/tarfile.html#tar-unicode for details on Unicode issues in the tar format).

Having said that I'd also go with "filenames in the archive SHOULD be limited to ASCII" - because we have had issues with pip where test files have Unicode filenames, and builds break because they get mangled on systems with weird encoding setups... IIRC, these are typically related to .tar.gz sdists, which (due to the lack of encoding support) result in files being unpacked with the wrong names. So maybe if we enforce zip format we don't need to add this limitation.
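To make the "random access" argument for zip concrete, here is a small sketch using the standard ``zipfile`` module to pull a single metadata file out of an archive without unpacking anything else. The helper name ``read_sdist_info`` is hypothetical, and the sketch assumes that the ``.sdist-info`` directory sits inside the top-level ``{PACKAGE}-{VERSION}`` directory, which is one possible reading of the draft.

```python
import io
import zipfile


def read_sdist_info(archive, package, version, name):
    """Read one file from the .sdist-info directory of a zip sdist.

    Assumption: .sdist-info lives inside the top-level
    {PACKAGE}-{VERSION} directory of the archive.
    """
    root = "%s-%s" % (package, version)
    member = "%s/%s.sdist-info/%s" % (root, root, name)
    with zipfile.ZipFile(archive) as zf:
        # Only this member is located and decompressed.
        return zf.read(member).decode("utf-8")


# Build a tiny in-memory archive to exercise the helper.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("demo-1.0/demo-1.0.sdist-info/METADATA",
                "Metadata-Version: 1.1\nName: demo\nVersion: 1.0\n")
    zf.writestr("demo-1.0/demo.py", "print('hello')\n")

metadata_text = read_sdist_info(buf, "demo", "1.0", "METADATA")
```

A tar-based archive would have to be read sequentially (and decompressed up to the member) to do the same thing.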
- When unpacked, it MUST contain a single directory tree named ``{PACKAGE}-{VERSION}``.
- This directory tree MUST be a valid version 1 (or greater) source tree as defined above.
- It MUST additionally contain a directory named ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the following contents:
- ``SDIST``: Mandatory. Same record-oriented format as a wheel's ``WHEEL`` file, but with different fields::
    SDist-Version: 1.0
    Generator: setuptools sdist 20.1
``SDist-Version`` is the version number of this specification. Software that processes sdists should warn if ``SDist-Version`` is greater than the version it supports, and must fail if ``SDist-Version`` has a greater major version than the version it supports.
``Generator`` is the name and optionally the version of the software that produced the archive.
- ``RECORD``: Mandatory. A list of all files contained in the sdist (except for the RECORD file itself and any signature files) together with their hashes, as specified in PEP 427.
- ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as specified in PEP 427.
- ``METADATA``: Mandatory. Metadata version 1.1 or greater format metadata, with an additional rule that fields may contain the special sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value of this field cannot be determined until build time. If a "multiple use field" is present with the value ``__SDIST_DYNAMIC__``, then this field MUST occur exactly once, e.g.::
    # Okay:
    Requires-Dist: lxml (> 3.3)
    Requires-Dist: requests

    # no Requires-Dist lines at all is okay
    # (meaning: this package's requirements are the empty set)

    # Okay, requirements will be determined at build time:
    Requires-Dist: __SDIST_DYNAMIC__

    # NOT okay:
    Requires-Dist: lxml (> 3.3)
    Requires-Dist: __SDIST_DYNAMIC__
(The use of a special token allows us to distinguish between multiple use fields whose value is statically the empty list versus one whose value is dynamic; it also allows us to distinguish between optional fields which are statically not present versus ones whose value is dynamic.)
When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.
A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.
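The METADATA rules above could be checked mechanically. The following is only a sketch under simplifying assumptions: it parses plain ``Name: value`` header lines (ignoring the description body and continuation lines that real metadata files may contain), and the function names ``parse_header_fields`` and ``check_sdist_metadata`` are hypothetical.

```python
SDIST_DYNAMIC = "__SDIST_DYNAMIC__"


def parse_header_fields(text):
    """Map each field name to the list of its values, in file order.

    Simplification: only single-line 'Name: value' headers are handled.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            fields.setdefault(name.strip(), []).append(value.strip())
    return fields


def check_sdist_metadata(fields, package, version):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    # A dynamic field must occur exactly once, with the sentinel alone.
    for name, values in sorted(fields.items()):
        if SDIST_DYNAMIC in values and len(values) > 1:
            problems.append("%s: %s must be the only value"
                            % (name, SDIST_DYNAMIC))
    # Name and Version must be static and match the sdist filename.
    for name, expected in (("Name", package), ("Version", version)):
        if fields.get(name) != [expected]:
            problems.append("%s must be static and equal to %r"
                            % (name, expected))
    return problems
```

Note that an absent ``Requires-Dist`` simply yields no entry in the parsed dict, which matches the draft's reading of "statically the empty set".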
This seems pretty good at first reading.
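The ``SDist-Version`` handling quoted earlier (warn on a newer version, fail on a newer major version) is easy to get subtly wrong, so here is a minimal sketch of it. The names ``parse_sdist_file``, ``check_sdist_version`` and ``SUPPORTED_SDIST_VERSION`` are hypothetical, and the parser assumes simple one-line ``Key: value`` fields as in the example ``SDIST`` file.

```python
import warnings

# Assumption: this tool implements version 1.0 of the sdist spec.
SUPPORTED_SDIST_VERSION = (1, 0)


def parse_sdist_file(text):
    """Parse one-line 'Key: value' fields into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


def check_sdist_version(fields, supported=SUPPORTED_SDIST_VERSION):
    """Fail on a newer major version, warn on any other newer version."""
    raw = fields["SDist-Version"]
    version = tuple(int(part) for part in raw.split("."))
    if version[0] > supported[0]:
        raise ValueError("unsupported SDist-Version %s" % raw)
    if version > supported:
        warnings.warn("SDist-Version %s is newer than supported %s"
                      % (raw, ".".join(str(n) for n in supported)))
    return version
```

The major-version test comes first so that a ``2.0`` archive is rejected rather than merely warned about.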
[TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]
This covers the main point of contention. It would be bad if build systems started using __SDIST_DYNAMIC__ just because "it's easier". Maybe add

* A valid sdist SHOULD NOT use the __SDIST_DYNAMIC__ mechanism any more than necessary (i.e., if the metadata is the same in all generated wheels, it does not need to use the __SDIST_DYNAMIC__ mechanism, and so should not do so).
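Both the MUST rule (wheel metadata identical to the sdist's except for dynamic fields) and the SHOULD NOT suggested here compare sdist metadata with the metadata of built wheels. A sketch of such a comparison follows; it assumes metadata has already been parsed into dicts mapping field names to lists of values, and that the order of repeated fields is not significant. The function names are hypothetical. A field flagged by ``needlessly_dynamic`` is only a hint: identical values in the wheels built so far do not prove the value is the same for every possible build.

```python
SDIST_DYNAMIC = "__SDIST_DYNAMIC__"


def is_dynamic(values):
    return values == [SDIST_DYNAMIC]


def wheel_mismatches(sdist_fields, wheel_fields):
    """Names of static sdist fields whose values differ in the wheel."""
    names = sorted(set(sdist_fields) | set(wheel_fields))
    mismatched = []
    for name in names:
        expected = sdist_fields.get(name, [])
        if is_dynamic(expected):
            continue  # dynamic fields may take arbitrary values
        # Assumption: order of repeated fields does not matter.
        if sorted(expected) != sorted(wheel_fields.get(name, [])):
            mismatched.append(name)
    return mismatched


def needlessly_dynamic(sdist_fields, wheels):
    """Dynamic fields that came out identical in every given wheel."""
    flagged = []
    for name, values in sorted(sdist_fields.items()):
        if is_dynamic(values) and wheels:
            seen = {tuple(sorted(w.get(name, []))) for w in wheels}
            if len(seen) == 1:
                flagged.append(name)
    return flagged
```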
This is intentionally a close analogue of a wheel's ``.dist-info`` directory; the intention is that as future metadata standards are defined, the specifications for the ``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.
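Since ``RECORD`` is borrowed from the wheel format, a generator could reuse the PEP 427 line format: a CSV row of path, ``sha256=`` followed by the urlsafe base64 digest with padding removed, and the size in bytes. The sketch below illustrates that row format only; the helper names are hypothetical and the mapping of paths to bytes stands in for whatever the real tool would read from disk.

```python
import base64
import csv
import hashlib
import io


def record_entry(path, data):
    """Return one RECORD row (path, hash, size) in the PEP 427 style."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return (path, "sha256=" + encoded, str(len(data)))


def build_record(files):
    """Render RECORD text for a mapping of archive path -> bytes.

    Per the draft, the caller leaves out RECORD itself and any
    signature files.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in sorted(files):
        writer.writerow(record_entry(path, files[path]))
    return out.getvalue()
```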
Evolutionary notes
==================
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.cfg that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.
A rough plan would be: Create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and convert them into setuptools calls. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)
But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:
1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pypackage.cfg``).
2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to either stop doing that, or else add ``__SDIST_DYNAMIC__`` annotations at appropriate places.
We'll also presumably need some API for packages to describe which parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for the packages that need it (a new argument to ``setup()`` or some setting in ``setup.cfg`` or something).
I'm confused here. And it's just now become clear *why* I'm confused.

The sdist format MUST be a generated format - i.e., we should insist (in principle at least) that it's only ever generated by tools. Otherwise it's way too easy for people to just zip up their source tree, hand craft something generic (that over-uses __SDIST_DYNAMIC__) and say "here's an sdist". Obviously, people always *can* manually create an sdist but we need to pin down the spec tightly, or we've not improved things.

That's why I'm concerned about __SDIST_DYNAMIC__ and it's also what confuses me about the above transition plan.

For people using setuptools currently, the transition should be simply that they upgrade setuptools, and the "setup.py sdist" command in the new setuptools generates the new sdist format. By default, the setuptools sdist process assumes everything is static and requires the user to modify the setup.py to explicitly mark which metadata they want to be left to build time. That way, we get a relatively transparent transition, while avoiding overuse of dynamic metadata.

If setup.py has to explicitly mark dynamic metadata, that also allows us to reject attempts to make name and version dynamic. Which is good.

Paul
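The "static by default, dynamic only when explicitly marked" behaviour described in this reply could look roughly like the sketch below. Nothing here is an existing setuptools API: ``render_sdist_metadata`` and its ``dynamic`` argument are hypothetical stand-ins for whatever "new argument to ``setup()``" might eventually be chosen.

```python
SDIST_DYNAMIC = "__SDIST_DYNAMIC__"
ALWAYS_STATIC = ("Name", "Version")


def render_sdist_metadata(fields, dynamic=()):
    """Render METADATA header lines for an sdist.

    fields:  list of (name, value) pairs computed by the build tool.
    dynamic: field names the user explicitly marked as build-time only.
    Everything not listed in `dynamic` is written out statically.
    """
    forbidden = sorted(set(dynamic) & set(ALWAYS_STATIC))
    if forbidden:
        raise ValueError("fields must be static: %s" % ", ".join(forbidden))
    lines = []
    emitted = set()
    for name, value in fields:
        if name in dynamic:
            # The sentinel replaces all values and appears exactly once.
            if name not in emitted:
                lines.append("%s: %s" % (name, SDIST_DYNAMIC))
                emitted.add(name)
        else:
            lines.append("%s: %s" % (name, value))
    # A field marked dynamic but with no computed value still needs
    # its sentinel line.
    for name in sorted(dynamic):
        if name not in emitted:
            lines.append("%s: %s" % (name, SDIST_DYNAMIC))
    return "\n".join(lines) + "\n"
```

Because the tool writes the sentinel itself, a user cannot end up with a field that mixes static values and ``__SDIST_DYNAMIC__``.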
The drawback of .zip is file size, since it compresses each file individually rather than giving the compression algorithm a larger input; it's a great format otherwise. It is ubiquitous, including in Apple iOS packages, Java, and word processor file formats. And most Python packages are small.

We must do the hard work to support Unicode file names, and spaces and accent marks in home directory names (historically a problem on Windows), in our packaging system. It is the right thing to do. It is not the publisher's fault that your system has broken Unicode.

On Tue, Oct 27, 2015 at 6:43 AM Paul Moore <p.f.moore@gmail.com> wrote:
On 26 October 2015 at 06:04, Nathaniel Smith <njs@pobox.com> wrote:
Here's a second round of text towards making a build-system independent interface between pip and source trees/sdists. My idea this time is to take a divide-and-conquer approach: this text tries to summarize all the stuff that it seemed like we had mostly reached consensus on in the previous thread + call, with blank chunks marked "TBD" where there are specific points that still need To Be Determined. So my hope is that everyone will read what's here and agree that it's great as far as it goes, and then we can go through and fill in each missing piece one at a time.
I'll comment on what's here, but ignore the TBD items - I'd rather (as you suggest) leave discussion of those details till the basic idea is agreed.
Abstract
========
Distutils delenda est.
While this makes a nice tagline, I'd rather something less negative. Distutils does not "need" to be destroyed. It's perfectly adequate (although hardly user friendly) for a lot of cases - I'd be willing to suggest *most* users can work just fine with distutils.
I'm not a fan of distutils, but I'd prefer it if we kept the rhetoric limited - as Nick pointed out this whole area is as much a political issue as a technical one.
Extended abstract
=================
While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) it's very difficult to use anything else, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.
Again, this is overstated. You very nearly lost me right here - people won't read the details of the proposal if they disagree with the abstract(s). Specifically:
* The features in (a) are only important to *some* parts of the community. The scientific community is the major one, and is a huge influence over the direction we want to go in, but again, not crucial to many people. And even where they might be useful (e.g., Windows users building pyyaml, lxml, pillow, ...) the description implies "working out what's there" rather than "allowing users to easily manage non-Python dependencies", which gives the wrong impression.
* The features in (b) are highly specialised. Very few people extend setuptools/distutils. And those who do, have often invested a lot of effort in doing so. Sure, they'd rather not have needed to, but now that they have, a replacement system simply means that work is lost. Arguably, fixing (b) is only useful for people (like the scientific community) who have needed to extend setuptools and have been unable to achieve their goals that way. That's an even smaller part of the community.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).
Agreed - this is a good approach. But it's at odds with your abstract, which says distutils must die. Here you're saying you want to allow people to keep using distutils but allow people with specialised needs to choose an alternative. Or are you offering an alternative to people who use distutils?
The whole of the above is confusing on the face of it. The details below clarify a lot, as does knowing how the previous discussions have gone. But it would help a lot if the introduction to this PEP were clearer.
The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.
OK. Although I see a risk here that if I want to build package FOO, I now have to worry whether FOO's build system supports Windows, as well as worrying whether FOO itself supports Windows.
There's still a role for some "gatekeeper" (not a good word IMO, maybe "coordinator") to provide a certain level of support or review of build systems, and a point of contact for users with build issues (the point of this proposal is to some extent that people don't need to *know* what build system a project uses, so suggesting everyone has to direct issues to the correct build system support forum isn't necessarily practical).
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.
In addition, we propose a wheel-inspired static metadata format for sdists, suitable for tools like PyPI and pip's resolver.
Terminology and goals
=====================
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
This is somewhat misleading, given that you go on to specify the format below, but maybe that's only an issue for someone like me who saw the previous debate over "source distribution" (as a bundled up source tree) vs "sdist" as a specified format. If I understand, you've now discarded the former sense of source distribution, and are sticking with the latter (specified format) definition.
Source distributions are also known as "sdists" for short.
Source trees
============
We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools.
A "version 1" (or greater) source tree is any directory which contains a file named ``pypackage.cfg``, which will -- in some manner whose details are TBD -- describe the package's build dependencies and how to invoke the build system. This mechanism:
- Will allow for both static and dynamic specification of build dependencies
- Will have some degree of isolation of different builds from each other, so that it will be possible for a single run of pip to install one package that build-depends on ``foo = 1.1`` and another package that build-depends on ``foo = 1.2``.
All good so far.
- Will leave the actual installation of the package in the hands of the build/installation tool (i.e. individual package build systems will not need to know about things like --user versus --global or make decisions about when and how to modify .pth files)
This seems completely backwards to me. It's pip's job to do the actual install. The build tool should *only* focus on generating standard conforming binary wheels - otherwise what's the point of the separation of concerns that wheels provide?
Or maybe I'm confused by the term "build/installation tool" - by that did you actually mean pip, rather than the build system?
(TBDs omitted)
Source distributions
====================
[possibly this should get split off into a separate PEP, but I'll keep it together for now for ease of discussion]
A "version 1" (or greater) source distribution is a file meeting the following criteria:
- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant version number, and {EXT} is a compliant archive format.
The set of compliant archive formats is: zip, [TBD]
[QUESTION: should we continue to allow .tar.gz and friends? In practice by "allow" I mean something like "accept new-style sdists on PyPI in this format". I'm inclined not to -- zip is the most universally supported format around, it allows file-based random access (unlike tar-based things) which is useful for pulling out metadata without decompressing the whole thing, and standardizing on one format dodges distracting and pointless discussions about which format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to continue to support other archive formats when passed explicitly on the command line. Any objections?]
+1 on having a single archive format, and zip seems like the best choice.
Similar to wheels, the archive is Unicode, and the filenames inside the archive are encoded in UTF-8.
This isn't the job of the sdist format to specify. It should be implicit in the choice of archive format.
Having said that, I'd go with
1. The sdist filename MUST support the full range of package names as specified in PEP 426 (https://www.python.org/dev/peps/pep-0426/#name) and versions as in PEP 440 (https://www.python.org/dev/peps/pep-0440/). That's actually far less than full Unicode.
2. The archive format MUST support arbitrary Unicode filenames. That means zip is OK, but tar.gz isn't unless you specify UTF-8 is used (the tar format doesn't allow for an encoding declaration - see https://docs.python.org/3.5/library/tarfile.html#tar-unicode for details on Unicode issues in the tar format).
Having said that I'd also go with "filenames in the archive SHOULD be limited to ASCII" - because we have had issues with pip where test files have Unicode filenames, and builds break because they get mangled on systems with weird encoding setups... IIRC, these are typically related to .tar.gz sdists, which (due to the lack of encoding support) result in files being unpacked with the wrong names. So maybe if we enforce zip format we don't need to add this limitation.
- When unpacked, it MUST contain a single directory tree named ``{PACKAGE}-{VERSION}``.
- This directory tree MUST be a valid version 1 (or greater) source tree as defined above.
- It MUST additionally contain a directory named ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the following contents:
- ``SDIST``: Mandatory. Same record-oriented format as a wheel's ``WHEEL`` file, but with different fields::
    SDist-Version: 1.0
    Generator: setuptools sdist 20.1
``SDist-Version`` is the version number of this specification. Software that processes sdists should warn if ``SDist-Version`` is greater than the version it supports, and must fail if ``SDist-Version`` has a greater major version than the version it supports.
``Generator`` is the name and optionally the version of the software that produced the archive.
- ``RECORD``: Mandatory. A list of all files contained in the sdist (except for the RECORD file itself and any signature files) together with their hashes, as specified in PEP 427.
- ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as specified in PEP 427.
- ``METADATA``: Mandatory. Metadata version 1.1 or greater format metadata, with an additional rule that fields may contain the special sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value of this field cannot be determined until build time. If a "multiple use field" is present with the value ``__SDIST_DYNAMIC__``, then this field MUST occur exactly once, e.g.::
    # Okay:
    Requires-Dist: lxml (> 3.3)
    Requires-Dist: requests

    # no Requires-Dist lines at all is okay
    # (meaning: this package's requirements are the empty set)

    # Okay, requirements will be determined at build time:
    Requires-Dist: __SDIST_DYNAMIC__

    # NOT okay:
    Requires-Dist: lxml (> 3.3)
    Requires-Dist: __SDIST_DYNAMIC__
(The use of a special token allows us to distinguish between multiple use fields whose value is statically the empty list versus one whose value is dynamic; it also allows us to distinguish between optional fields which are statically not present versus ones whose value is dynamic.)
When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.
A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.
This seems pretty good at first reading.
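The ``METADATA`` rules quoted above are mechanical enough to check. The following is a sketch only: it assumes a ``.zip`` extension, assumes the version contains no hyphen (so the last hyphen in the filename separates package from version), and compares names as exact strings with no normalization. All function names are hypothetical.

```python
from email.parser import Parser

SENTINEL = "__SDIST_DYNAMIC__"


def split_sdist_filename(filename):
    """Split '{PACKAGE}-{VERSION}.zip' into (package, version).

    Assumption: the version has no hyphen, so the last hyphen is the
    separator between the two parts.
    """
    if not filename.endswith(".zip"):
        raise ValueError("not a zip sdist: %r" % filename)
    package, sep, version = filename[: -len(".zip")].rpartition("-")
    if not sep or not package or not version:
        raise ValueError("cannot parse sdist filename: %r" % filename)
    return package, version


def check_sdist_metadata(metadata_text, filename):
    """Return a list of problems found (empty if the checks pass)."""
    msg = Parser().parsestr(metadata_text, headersonly=True)
    package, version = split_sdist_filename(filename)
    problems = []
    # Name and Version must be static and must match the filename.
    for field, expected in (("Name", package), ("Version", version)):
        value = msg.get(field)
        if value is None or value == SENTINEL:
            problems.append("%s must be given statically" % field)
        elif value != expected:
            problems.append("%s %r does not match filename" % (field, value))
    # A field using the sentinel must occur exactly once.
    for field in sorted(set(msg.keys())):
        values = msg.get_all(field)
        if SENTINEL in values and len(values) > 1:
            problems.append("%s mixes %s with other values" % (field, SENTINEL))
    return problems
```

For example, metadata containing both ``Requires-Dist: lxml (> 3.3)`` and ``Requires-Dist: __SDIST_DYNAMIC__`` would be reported as a problem, matching the "NOT okay" case in the draft.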
[TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]
This covers the main point of contention. It would be bad if build systems started using __SDIST_DYNAMIC__ just because "it's easier".
Maybe add
* A valid sdist SHOULD NOT use the __SDIST_DYNAMIC__ mechanism any more than necessary (i.e., if the metadata is the same in all generated wheels, it does not need to use the __SDIST_DYNAMIC__ mechanism, and so should not do so).
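The rule quoted earlier, that a built wheel's metadata must be identical to the sdist's except for fields marked ``__SDIST_DYNAMIC__``, could be checked along these lines. This is a sketch under the assumption that both files use the same header-style format and that the order of repeated fields does not matter; the function names are illustrative.

```python
from email.parser import Parser

SENTINEL = "__SDIST_DYNAMIC__"


def _fields(metadata_text):
    """Parse METADATA text into {field name: sorted list of values}."""
    msg = Parser().parsestr(metadata_text, headersonly=True)
    return {name: sorted(msg.get_all(name)) for name in set(msg.keys())}


def wheel_metadata_mismatches(sdist_text, wheel_text):
    """Return names of fields where the wheel disagrees with the sdist."""
    sdist, wheel = _fields(sdist_text), _fields(wheel_text)
    mismatched = []
    for name in sorted(set(sdist) | set(wheel)):
        if sdist.get(name) == [SENTINEL]:
            continue  # dynamic in the sdist: any value is allowed
        if sdist.get(name) != wheel.get(name):
            mismatched.append(name)
    return mismatched
```

Note that this cannot detect *overuse* of the sentinel from a single wheel; spotting a field that is marked dynamic but never actually varies would need several builds to compare.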
This is intentionally a close analogue of a wheel's ``.dist-info`` directory; the intention is that as future metadata standards are defined, the specifications for the ``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.
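The ``SDist-Version`` handling described for the ``SDIST`` file (warn on a newer version, fail on a newer major version) might look like the sketch below. It assumes a purely numeric dotted version and a hypothetical tool that supports version 1.0.

```python
import warnings
from email.parser import Parser

SUPPORTED_SDIST_VERSION = (1, 0)  # assumed: what this hypothetical tool supports


def check_sdist_version(sdist_text, supported=SUPPORTED_SDIST_VERSION):
    """Return the parsed SDist-Version, warning or failing as required."""
    msg = Parser().parsestr(sdist_text, headersonly=True)
    raw = msg.get("SDist-Version")
    if raw is None:
        raise ValueError("SDIST file has no SDist-Version field")
    version = tuple(int(part) for part in raw.strip().split("."))
    if version[0] > supported[0]:
        # Newer major version: processing must fail.
        raise ValueError("unsupported SDist-Version %s" % raw)
    if version > supported:
        # Same major version but newer overall: warn and carry on.
        warnings.warn("SDist-Version %s is newer than supported" % raw)
    return version
```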
Evolutionary notes
==================
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.cfg that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.
A rough plan would be: Create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and converts those hooks into setuptools calls. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)
But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:
1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pypackage.cfg``).
2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to either stop doing that, or else add ``__SDIST_DYNAMIC__`` annotations at appropriate places.
We'll also presumably need some API for packages to describe which parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for the packages that need it (a new argument to ``setup()`` or some setting in ``setup.cfg`` or something).
I'm confused here. And it's just now become clear *why* I'm confused.
The sdist format MUST be a generated format - i.e., we should insist (in principle at least) that it's only ever generated by tools. Otherwise it's way too easy for people to just zip up their source tree, hand craft something generic (that over-uses __SDIST_DYNAMIC__) and say "here's an sdist". Obviously, people always *can* manually create an sdist but we need to pin down the spec tightly, or we've not improved things.
That's why I'm concerned about __SDIST_DYNAMIC__ and it's also what confuses me about the above transition plan.
For people using setuptools currently, the transition should be simply that they upgrade setuptools, and the "setup.py sdist" command in the new setuptools generates the new sdist format. By default, the setuptools sdist process assumes everything is static and requires the user to modify the setup.py to explicitly mark which metadata they want to be left to build time. That way, we get a relatively transparent transition, while avoiding overuse of dynamic metadata.
If setup.py has to explicitly mark dynamic metadata, that also allows us to reject attempts to make name and version dynamic. Which is good.
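To illustrate the "static by default, dynamic only when explicitly marked" behaviour, here is a sketch of how an sdist generator could render ``METADATA``. Nothing here is an existing setuptools API: ``render_metadata`` and its ``dynamic`` argument are invented for the example, standing in for whatever new ``setup()`` argument or ``setup.cfg`` setting ends up being chosen.

```python
SENTINEL = "__SDIST_DYNAMIC__"
NEVER_DYNAMIC = {"Name", "Version"}


def render_metadata(fields, dynamic=()):
    """Render METADATA text; `dynamic` names fields left to build time.

    `fields` maps a field name to a string or a list of strings.
    Everything is written statically unless explicitly listed in `dynamic`.
    """
    forbidden = NEVER_DYNAMIC & set(dynamic)
    if forbidden:
        raise ValueError("cannot be dynamic: %s" % ", ".join(sorted(forbidden)))
    lines = []
    for name, value in fields.items():
        if name in dynamic:
            continue  # the static value is dropped in favour of the sentinel
        values = value if isinstance(value, list) else [value]
        lines.extend("%s: %s" % (name, v) for v in values)
    # Each dynamic field is emitted exactly once, with the sentinel value.
    lines.extend("%s: %s" % (name, SENTINEL) for name in dynamic)
    return "\n".join(lines) + "\n"
```

Because the marking is explicit, an attempt such as ``dynamic=["Version"]`` can be rejected at sdist-generation time rather than discovered later by pip or PyPI.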
Paul
On Tue, Oct 27, 2015 at 1:12 PM, Daniel Holth <dholth@gmail.com> wrote:
The drawback of .zip is file size since it compresses each file individually rather than giving the compression algorithm a larger input, it's a great format otherwise. Ubiquitous including Apple iOS packages, Java, word processor file formats. And most Python packages are small.
I don't really buy the indexing advantages, especially with the current implementation of zipfile in Python (e.g. it loads the whole list of archive members at creation time). A common way to get fast metadata access from an archive is to archive the metadata and the data separately (e.g. a zipfile containing two zipfiles, one being the original sdist and the other containing the metadata).
David
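A minimal sketch of the "zipfile containing two zipfiles" layout, to show what is being suggested. The member names ``metadata.zip`` and ``sdist.zip`` and both function names are assumptions made for the example; the inner archives are stored uncompressed in the outer one since they are already compressed.

```python
import io
import zipfile


def wrap_sdist(sdist_bytes, metadata_files):
    """Return bytes of an outer zip holding metadata.zip and sdist.zip.

    `metadata_files` maps a file name to its text content.
    """
    meta_buf = io.BytesIO()
    with zipfile.ZipFile(meta_buf, "w", zipfile.ZIP_DEFLATED) as meta:
        for name, text in metadata_files.items():
            meta.writestr(name, text)
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w", zipfile.ZIP_STORED) as outer:
        outer.writestr("metadata.zip", meta_buf.getvalue())
        outer.writestr("sdist.zip", sdist_bytes)
    return outer_buf.getvalue()


def read_wrapped_metadata(outer_bytes, name):
    """Read one metadata file without opening the inner sdist archive."""
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        inner = io.BytesIO(outer.read("metadata.zip"))
    with zipfile.ZipFile(inner) as meta:
        return meta.read(name).decode("utf-8")
```

The outer index then has only two entries regardless of how many files the project contains, so the cost of reading it no longer grows with the size of the source tree.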
We must do the hard work to support Unicode file names, and spaces and accent marks in home directory names (historically a problem on Windows), in our packaging system. It is the right thing to do. It is not the publisher's fault that your system has broken Unicode.
On Tue, Oct 27, 2015 at 6:43 AM Paul Moore <p.f.moore@gmail.com> wrote:
On 26 October 2015 at 06:04, Nathaniel Smith <njs@pobox.com> wrote:
Here's a second round of text towards making a build-system independent interface between pip and source trees/sdists. My idea this time is to take a divide-and-conquer approach: this text tries to summarize all the stuff that it seemed like we had mostly reached consensus on in the previous thread + call, with blank chunks marked "TBD" where there are specific points that still need To Be Determined. So my hope is that everyone will read what's here and agree that it's great as far as it goes, and then we can go through and fill in each missing piece one at a time.
I'll comment on what's here, but ignore the TBD items - I'd rather (as you suggest) leave discussion of those details till the basic idea is agreed.
Abstract ========
Distutils delenda est.
While this makes a nice tagline, I'd rather something less negative. Distutils does not "need" to be destroyed. It's perfectly adequate (although hardly user friendly) for a lot of cases - I'd be willing to suggest *most* users can work just fine with distutils.
I'm not a fan of distutils, but I'd prefer it if we kept the rhetoric limited - as Nick pointed out this whole area is as much a political issue as a technical one.
Extended abstract =================
While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) it's very difficult to use anything else, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.
Again, this is overstated. You very nearly lost me right here - people won't read the details of the proposal if they disagree with the abstract(s). Specifically:
* The features in (a) are only important to *some* parts of the community. The scientific community is the major one, and is a huge influence over the direction we want to go in, but again, not crucial to many people. And even where they might be useful (e.g., Windows users building pyyaml, lxml, pillow, ...) the description implies "working out what's there" rather than "allowing users to easily manage non-Python dependencies", which gives the wrong impression.
* The features in (b) are highly specialised. Very few people extend setuptools/distutils. And those who do, have often invested a lot of effort in doing so. Sure, they'd rather not have needed to, but now that they have, a replacement system simply means that work is lost. Arguably, fixing (b) is only useful for people (like the scientific community) who have needed to extend setuptools and have been unable to achieve their goals that way. That's an even smaller part of the community.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).
Agreed - this is a good approach. But it's at odds with your abstract, which says distutils must die. Here you're saying you want to allow people to keep using distutils but allow people with specialised needs to choose an alternative. Or are you offering an alternative to people who use distutils?
The whole of the above is confusing on the face of it. The details below clarify a lot, as does knowing how the previous discussions have gone. But it would help a lot if the introduction to this PEP were clearer.
The goal of this PEP is get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.
OK. Although I see a risk here that if I want to build package FOO, I now have to worry whether FOO's build system supports Windows, as well as worrying whether FOO itself supports Windows.
There's still a role for some "gatekeeper" (not a good word IMO, maybe "coordinator") to provide a certain level of support or review of build systems, and a point of contact for users with build issues (the point of this proposal is to some extent that people don't need to *know* what build system a project uses, so suggesting everyone has to direct issues to the correct build system support forum isn't necessarily practical).
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.
In addition, we propose a wheel-inspired static metadata format for sdists, suitable for tools like PyPI and pip's resolver.
Terminology and goals =====================
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
This is somewhat misleading, given that you go on to specify the format below, but maybe that's only an issue for someone like me who saw the previous debate over "source distribution" (as a bundled up source tree) vs "sdist" as a specified format. If I understand, you've now discarded the former sense of source distribution, and are sticking with the latter (specified format) definition.
Source distributions are also known as "sdists" for short.
Source trees ============
We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools.
A "version 1" (or greater) source tree is any directory which contains a file named ``pypackage.cfg``, which will -- in some manner whose details are TBD -- describe the package's build dependencies and how to invoke the build system. This mechanism:
- Will allow for both static and dynamic specification of build dependencies
- Will have some degree of isolation of different builds from each other, so that it will be possible for a single run of pip to install one package that build-depends on ``foo = 1.1`` and another package that build-depends on ``foo = 1.2``.
All good so far.
- Will leave the actual installation of the package in the hands of the build/installation tool (i.e. individual package build systems will not need to know about things like --user versus --global or make decisions about when and how to modify .pth files)
This seems completely backwards to me. It's pip's job to do the actual install. The build tool should *only* focus on generating standard conforming binary wheels - otherwise what's the point of the separation of concerns that wheels provide?
Or maybe I'm confused by the term "build/installation tool" - by that did you actually mean pip, rather than the build system?
(TBDs omitted)
Source distributions ====================
[possibly this should get split off into a separate PEP, but I'll keep it together for now for ease of discussion]
A "version 1" (or greater) source distribution is a file meeting the following criteria:
- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant version number, and {EXT} is a compliant archive format.
The set of compliant archive formats is: zip, [TBD]
[QUESTION: should we continue to allow .tar.gz and friends? In practice by "allow" I mean something like "accept new-style sdists on PyPI in this format". I'm inclined not to -- zip is the most universally supported format around, it allows file-based random access (unlike tar-based things) which is useful for pulling out metadata without decompressing the whole thing, and standardizing on one format dodges distracting and pointless discussions about which format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to continue to support other archive formats when passed explicitly on the command line. Any objections?]
+1 on having a single archive format, and zip seems like the best choice.
Similar to wheels, the archive is Unicode, and the filenames inside the archive are encoded in UTF-8.
This isn't the job of the sdist format to specify. It should be implicit in the choice of archive format.
Having said that, I'd go with
1. The sdist filename MUST support the full range of package names as specified in PEP 426 (https://www.python.org/dev/peps/pep-0426/#name) and versions as in PEP 440 (https://www.python.org/dev/peps/pep-0440/). That's actually far less than full Unicode. 2. The archive format MUST support arbitrary Unicode filenames. That means zip is OK, but tar.gz isn't unless you specify UTF-8 is used (the tar format doesn't allow for an encoding declaration - see https://docs.python.org/3.5/library/tarfile.html#tar-unicode for details on Unicode issues in the tar format).
Having said that I'd also go with "filenames in the archive SHOULD be limited to ASCII" - because we have had issues with pip where test files have Unicode filenames, and builds break because they get mangled on systems with weird encoding setups... IIRC, these are typically related to .tar.gz sdists, which (due to the lack of encoding support) result in files being unpacked with the wrong names. So maybe if we enforce zip format we don't need to add this limitation.
- When unpacked, it MUST contain a single directory directory tree named ``{PACKAGE}-{VERSION}``.
- This directory tree MUST be a valid version 1 (or greater) source tree as defined above.
- It MUST additionally contain a directory named ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the following contents:
- ``SDIST``: Mandatory. Same record-oriented format as a wheel's ``WHEEL`` file, but with different fields::
SDist-Version: 1.0 Generator: setuptools sdist 20.1
``SDist-Version`` is the version number of this specification. Software that processes sdists should warn if ``SDist-Version`` is greater than the version it supports, and must fail if ``SDist-Version`` has a greater major version than the version it supports.
``Generator`` is the name and optionally the version of the software that produced the archive.
- ``RECORD``: Mandatory. A list of all files contained in the sdist (except for the RECORD file itself and any signature files) together with their hashes, as specified in PEP 427.
- ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as specified in PEP 427.
- ``METADATA``: Mandatory. Metadata version 1.1 or greater format metadata, with an additional rule that fields may contain the special sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value of this field cannot be determined until build time. If a "multiple use field" is present with the value ``__SDIST_DYNAMIC__``, then this field MUST occur exactly once, e.g.::
# Okay: Requires-Dist: lxml (> 3.3) Requires-Dist: requests
# no Requires-Dist lines at all is okay # (meaning: this package's requirements are the empty set)
# Okay, requirements will be determined at build time: Requires-Dist: __SDIST_DYNAMIC__
# NOT okay: Requires-Dist: lxml (> 3.3) Requires-Dist: __SDIST_DYNAMIC__
(The use of a special token allows us to distinguish between multiple use fields whose value is statically the empty list versus one whose value is dynamic; it also allows us to distinguish between optional fields which are statically not present versus ones whose value is dynamic.)
When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.
A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.
This seems pretty good at first reading.
[TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]
This covers the main point of contention. It would be bad if build systems started using __SDIST_DYNAMIC__ just because "it's easier".
Maybe add
* A valid sdist SHOULD NOT use the __SDIST_DYNAMIC__ mechanism any more than necessary (i.e., if the metadata is the same in all generated wheels, it does not need to use the __SDIST_DYNAMIC__ mechanism, and so should not do so).
This is intentionally a close analogue of a wheel's ``.dist-info`` directory; the intention is that as future metadata standards are defined, the specifications for the ``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.
Evolutionary notes
==================
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.cfg that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.
A rough plan would be: Create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and converts those hooks into setuptools calls. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)
But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:
1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pypackage.cfg``).
2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to either stop doing that, or else add ``__SDIST_DYNAMIC__`` annotations at appropriate places.
We'll also presumably need some API for packages to describe which parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for the packages that need it (a new argument to ``setup()`` or some setting in ``setup.cfg`` or something).
I'm confused here. And it's just now become clear *why* I'm confused.
The sdist format MUST be a generated format - i.e., we should insist (in principle at least) that it's only ever generated by tools. Otherwise it's way too easy for people to just zip up their source tree, hand craft something generic (that over-uses __SDIST_DYNAMIC__) and say "here's an sdist". Obviously, people always *can* manually create an sdist but we need to pin down the spec tightly, or we've not improved things.
That's why I'm concerned about __SDIST_DYNAMIC__ and it's also what confuses me about the above transition plan.
For people using setuptools currently, the transition should be simply that they upgrade setuptools, and the "setup.py sdist" command in the new setuptools generates the new sdist format. By default, the setuptools sdist process assumes everything is static and requires the user to modify the setup.py to explicitly mark which metadata they want to be left to build time. That way, we get a relatively transparent transition, while avoiding overuse of dynamic metadata.
If setup.py has to explicitly mark dynamic metadata, that also allows us to reject attempts to make name and version dynamic. Which is good.
Paul _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On Tue, Oct 27, 2015 at 7:00 AM, David Cournapeau <cournape@gmail.com> wrote:
On Tue, Oct 27, 2015 at 1:12 PM, Daniel Holth <dholth@gmail.com> wrote:
The drawback of .zip is file size, since it compresses each file individually rather than giving the compression algorithm a larger input; it's a great format otherwise. It is ubiquitous, used for Apple iOS packages, Java, and word processor file formats. And most Python packages are small.
I don't really buy the indexing advantages, especially w/ the current implementation of zipfile in python (e.g. loading the whole set of archives at creation time)
Can you elaborate about what you mean? AFAICT from a quick skim of the source code, zipfile does eagerly read in the table of contents for the zip file (i.e., it reads out the list of files and their metadata), but no actual files are decompressed until you ask for them individually, and when you do request a specific file then it can be accessed in O(1) time. This is really different from .tar.gz, where you have to decompress the entire archive just to get a list of files, and then you need to decompress the whole thing again each time you want to access a single file inside.

(Regarding the size thing, yeah, .tar.gz is smaller, and .tar.bz2 smaller than that, and .tar.xz smaller again, ... but this doesn't strike me as an argument for throwing up our hands and leaving the choice to individual projects, because it's not like they know what the optimal trade-off is either. IMO we should pick one, and zip is Good Enough.)

-n

-- Nathaniel J. Smith -- http://vorpus.org
Nathaniel, I'm not sure what the software is supposed to do with fine-grained dynamic metadata that would make very much sense to the end user. I think you could probably get away with a single flag, Dynamic: true / false. Iff true, pip runs the dist-info command after installing bootstrap dependencies. You could still complain if the name & version changed.

Of course in a VCS checkout or during development you probably always want the regenerate-metadata behavior.

On Wed, Oct 28, 2015 at 5:18 AM Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 27, 2015 at 7:00 AM, David Cournapeau <cournape@gmail.com> wrote:
On Tue, Oct 27, 2015 at 1:12 PM, Daniel Holth <dholth@gmail.com> wrote:
The drawback of .zip is file size, since it compresses each file individually rather than giving the compression algorithm a larger input; it's a great format otherwise. It is ubiquitous, used for Apple iOS packages, Java, and word processor file formats. And most Python packages are small.
I don't really buy the indexing advantages, especially w/ the current implementation of zipfile in python (e.g. loading the whole set of archives at creation time)
Can you elaborate about what you mean? AFAICT from a quick skim of the source code, zipfile does eagerly read in the table of contents for the zip file (i.e., it reads out the list of files and their metadata), but no actual files are decompressed until you ask for them individually, and when you do request a specific file then it can be accessed in O(1) time. This is really different from .tar.gz, where you have to decompress the entire archive just to get a list of files, and then you need to decompress the whole thing again each time you want to access a single file inside.
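For anyone who wants to see the access pattern described here, a minimal sketch using only the stdlib ``zipfile`` module follows. The archive is built in memory, and the member names (including where the ``.sdist-info`` directory sits) are made up for illustration, not taken from the draft.

```python
import io
import zipfile

# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("foo-1.0/foo-1.0.sdist-info/METADATA", "Name: foo\nVersion: 1.0\n")
    archive.writestr("foo-1.0/foo/__init__.py", "# package code\n" * 1000)

with zipfile.ZipFile(buffer) as archive:
    # namelist() is answered from the central directory read at open time;
    # no member is decompressed to produce it.
    names = archive.namelist()
    # read() seeks to this one member and decompresses only it.
    metadata = archive.read("foo-1.0/foo-1.0.sdist-info/METADATA").decode("utf-8")

print(names)
print(metadata)
```

Pulling the same file out of a ``.tar.gz`` would mean decompressing the stream from the start until that member is reached.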
(Regarding the size thing, yeah, .tar.gz is smaller, and .tar.bz2 smaller than that, and .tar.xz smaller again, ... but this doesn't strike me as an argument for throwing up our hands and leaving the choice to individual projects, because it's not like they know what the optimal trade-off is either. IMO we should pick one, and zip is Good Enough.)
-n
-- Nathaniel J. Smith -- http://vorpus.org
On 27 October 2015 at 13:12, Daniel Holth <dholth@gmail.com> wrote:
We must do the hard work to support Unicode file names, and spaces and accent marks in home directory names (historically a problem on Windows), in our packaging system. It is the right thing to do. It is not the publisher's fault that your system has broken Unicode.
In the examples I'm thinking of, the publisher used a format (.tar.gz) that didn't properly support Unicode, in the sense that it didn't include an encoding for the bytes it used to represent filenames. IMO, that is something we shouldn't allow, by rejecting file formats that don't support Unicode properly. Whose fault it is, is not important - it's just as easy to say that it's not the end user's fault that the publisher made an unwarranted assumption about encodings. What's important is that things work for everyone, and the interoperability standards don't leave room for people to make such assumptions.

Paul

PS Consider this a retraction of my suggestion that filenames in sdists should be pure ASCII. But still, sdists shouldn't contain files that can't be used on target systems - e.g., 2 files whose names differ only in case, files containing characters like :, ? or * that are invalid on Windows... Whether this needs to be noted in the standard, or whether it's just a case of directing users' bug reports back to the publisher, is an open question, though.
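Whichever way that open question goes, the two problems mentioned (names differing only in case, and characters Windows rejects) are easy to detect from the archive's member list. A rough sketch follows. The function name is hypothetical, and the character set beyond ``:``, ``?`` and ``*`` is an assumption about what Windows rejects, not something the thread specifies.

```python
# Characters rejected in Windows filenames; ":", "?" and "*" are the ones
# named in the message above, the rest of the set is an assumption here.
WINDOWS_INVALID = set('<>:"|?*')


def portability_problems(member_names):
    """Report archive member names that may not unpack on every system."""
    problems = []
    seen = {}
    for name in member_names:
        bad = sorted(WINDOWS_INVALID.intersection(name))
        if bad:
            problems.append(f"{name}: invalid on Windows: {''.join(bad)}")
        # casefold() gives a case-insensitive key for collision detection.
        folded = name.casefold()
        if folded in seen and seen[folded] != name:
            problems.append(f"{name}: differs only in case from {seen[folded]}")
        seen.setdefault(folded, name)
    return problems
```

A publishing tool or an index could run something like this over ``ZipFile.namelist()`` before accepting an upload.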
On Tue, Oct 27, 2015 at 3:43 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 26 October 2015 at 06:04, Nathaniel Smith <njs@pobox.com> wrote:
Here's a second round of text towards making a build-system independent interface between pip and source trees/sdists. My idea this time is to take a divide-and-conquer approach: this text tries to summarize all the stuff that it seemed like we had mostly reached consensus on in the previous thread + call, with blank chunks marked "TBD" where there are specific points that still need To Be Determined. So my hope is that everyone will read what's here and agree that it's great as far as it goes, and then we can go through and fill in each missing piece one at a time.
I'll comment on what's here, but ignore the TBD items - I'd rather (as you suggest) leave discussion of those details till the basic idea is agreed.
Abstract
========
Distutils delenda est.
While this makes a nice tagline, I'd rather something less negative. Distutils does not "need" to be destroyed. It's perfectly adequate (although hardly user friendly) for a lot of cases - I'd be willing to suggest *most* users can work just fine with distutils.
I'm not a fan of distutils, but I'd prefer it if we kept the rhetoric limited - as Nick pointed out this whole area is as much a political issue as a technical one.
Extended abstract
=================
While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) it's very difficult to use anything else, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.
Again, this is overstated. You very nearly lost me right here - people won't read the details of the proposal if they disagree with the abstract(s). Specifically:
* The features in (a) are only important to *some* parts of the community. The scientific community is the major one, and is a huge influence over the direction we want to go in, but again, not crucial to many people. And even where they might be useful (e.g., Windows users building pyyaml, lxml, pillow, ...) the description implies "working out what's there" rather than "allowing users to easily manage non-Python dependencies", which gives the wrong impression.
* The features in (b) are highly specialised. Very few people extend setuptools/distutils. And those who do, have often invested a lot of effort in doing so. Sure, they'd rather not have needed to, but now that they have, a replacement system simply means that work is lost. Arguably, fixing (b) is only useful for people (like the scientific community) who have needed to extend setuptools and have been unable to achieve their goals that way. That's an even smaller part of the community.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).
Agreed - this is a good approach. But it's at odds with your abstract, which says distutils must die. Here you're saying you want to allow people to keep using distutils but allow people with specialised needs to choose an alternative. Or are you offering an alternative to people who use distutils?
The whole of the above is confusing on the face of it. The details below clarify a lot, as does knowing how the previous discussions have gone. But it would help a lot if the introduction to this PEP were clearer.
Fair enough, I'll dial it back. :-) My personal prediction is that within a year of this support becoming widespread, we'll see build systems that are just better than distutils on all axes for all projects, not just the ones with weird specialised needs -- AFAICT the distutils architecture has remained basically unchanged since Python 2.0, and we've gained a bit more experience with Python packaging in the last 15 years :-). But yeah, sure, if you think it'll bother people then there's no point in that.
The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.
OK. Although I see a risk here that if I want to build package FOO, I now have to worry whether FOO's build system supports Windows, as well as worrying whether FOO itself supports Windows.
There's still a role for some "gatekeeper" (not a good word IMO, maybe "coordinator") to provide a certain level of support or review of build systems, and a point of contact for users with build issues (the point of this proposal is to some extent that people don't need to *know* what build system a project uses, so suggesting everyone has to direct issues to the correct build system support forum isn't necessarily practical).
I see what you mean, but I don't think there's much that can or should be done about it in the form of a PEP? I assume that what will happen is that if you can't build a package, you'll file a bug with the maintainers of that package, and then it's their job to figure out whether to patch around the issue locally, file their own bug upstream with whatever build system package they're using, switch to a new build system, or whatever. I think we can generally trust individual projects and the community at large to figure out what the trade-offs between different systems are, once the different systems start existing. Though it may well make sense for the PyPA packaging guide to add a set of best-practice guidelines for build system implementors.
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.
In addition, we propose a wheel-inspired static metadata format for sdists, suitable for tools like PyPI and pip's resolver.
Terminology and goals
=====================
A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.
A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.zip``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package.
This is somewhat misleading, given that you go on to specify the format below, but maybe that's only an issue for someone like me who saw the previous debate over "source distribution" (as a bundled up source tree) vs "sdist" as a specified format. If I understand, you've now discarded the former sense of source distribution, and are sticking with the latter (specified format) definition.
The "sdists" in this draft try to compromise between the various concepts that were proposed in the previous thread: you can generally treat them like bundled up source trees (they have a single directory that unpacks into something that's laid out similarly to a VCS checkout), but they also contain additional static metadata to make PyPI and pip happy (or at least, as much static metadata as they can).
Source distributions are also known as "sdists" for short.
Source trees
============
We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools.
A "version 1" (or greater) source tree is any directory which contains a file named ``pypackage.cfg``, which will -- in some manner whose details are TBD -- describe the package's build dependencies and how to invoke the build system. This mechanism:
- Will allow for both static and dynamic specification of build dependencies
- Will have some degree of isolation of different builds from each other, so that it will be possible for a single run of pip to install one package that build-depends on ``foo = 1.1`` and another package that build-depends on ``foo = 1.2``.
All good so far.
- Will leave the actual installation of the package in the hands of the build/installation tool (i.e. individual package build systems will not need to know about things like --user versus --global or make decisions about when and how to modify .pth files)
This seems completely backwards to me. It's pip's job to do the actual install. The build tool should *only* focus on generating standard conforming binary wheels - otherwise what's the point of the separation of concerns that wheels provide?
Or maybe I'm confused by the term "build/installation tool" - by that did you actually mean pip, rather than the build system?
Yeah, I was just unclear here -- the "build/installation tool" was supposed to be pip (because pip installs packages! ...and also builds them), as contrasted with the "individual package build systems" which don't know anything about installing. I'll reword. This bullet point is rather substantive, actually, since if adopted then it rules out the proposed semantics for the "develop" operation in Robert's PEP. (In current pip and in his proposal, "pip install -e" is unlike regular "pip install", in that "pip install -e" doesn't actually install anything, it just calls "setup.py develop", which does the actual installation. One consequence of this AFAICT is that if you try passing any of the standard installation target options to "pip install -e", like "--target" or whatever, then it blows up...)
(TBDs omitted)
Source distributions
====================
[possibly this should get split off into a separate PEP, but I'll keep it together for now for ease of discussion]
A "version 1" (or greater) source distribution is a file meeting the following criteria:
- It MUST have a name of the form: {PACKAGE}-{VERSION}.{EXT}, where {PACKAGE} is the package name, {VERSION} is a PEP 440-compliant version number, and {EXT} is a compliant archive format.
The set of compliant archive formats is: zip, [TBD]
[QUESTION: should we continue to allow .tar.gz and friends? In practice by "allow" I mean something like "accept new-style sdists on PyPI in this format". I'm inclined not to -- zip is the most universally supported format around, it allows file-based random access (unlike tar-based things) which is useful for pulling out metadata without decompressing the whole thing, and standardizing on one format dodges distracting and pointless discussions about which format to use, i.e. it's TOOWTDI-compliant. Of course pip is free to continue to support other archive formats when passed explicitly on the command line. Any objections?]
+1 on having a single archive format, and zip seems like the best choice.
Similar to wheels, the archive is Unicode, and the filenames inside the archive are encoded in UTF-8.
This isn't the job of the sdist format to specify. It should be implicit in the choice of archive format.
There's a silly typo in the quoted line -- it was supposed to read:

    Similar to wheels, the archive *filename* is Unicode, and the filenames inside the archive are encoded in UTF-8.

These two points were just lifted from PEP 427 without thinking about it too much -- see https://www.python.org/dev/peps/pep-0427/#id12

Now that I reread that section of PEP 427, the underscore replacement probably makes sense for sdists as well.
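To make the underscore replacement concrete, here is a sketch of what it might look like for the ``{PACKAGE}-{VERSION}.{EXT}`` naming rule. The substitution pattern is the one PEP 427 gives for wheel filename components. Applying it to sdists is only a possibility raised in this thread, and the function names are hypothetical.

```python
import re


def escape_component(value):
    # Same substitution PEP 427 gives for wheel filename components:
    # runs of characters other than word characters and "." become "_".
    return re.sub(r"[^\w\d.]+", "_", value, flags=re.UNICODE)


def sdist_filename(name, version, ext="zip"):
    return f"{escape_component(name)}-{escape_component(version)}.{ext}"


def split_sdist_filename(filename, ext="zip"):
    suffix = "." + ext
    if not filename.endswith(suffix):
        raise ValueError(f"not a .{ext} sdist: {filename}")
    stem = filename[: -len(suffix)]
    # After escaping, neither component contains "-", so exactly one
    # hyphen separates the package name from the version.
    name, sep, version = stem.partition("-")
    if not (name and sep and version) or "-" in version:
        raise ValueError(f"expected PACKAGE-VERSION.{ext}: {filename}")
    return name, version
```

The point of the escaping is that the filename can then be split unambiguously, even for package names that contain hyphens.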
Having said that, I'd go with
1. The sdist filename MUST support the full range of package names as specified in PEP 426 (https://www.python.org/dev/peps/pep-0426/#name) and versions as in PEP 440 (https://www.python.org/dev/peps/pep-0440/). That's actually far less than full Unicode.
2. The archive format MUST support arbitrary Unicode filenames. That means zip is OK, but tar.gz isn't unless you specify UTF-8 is used (the tar format doesn't allow for an encoding declaration - see https://docs.python.org/3.5/library/tarfile.html#tar-unicode for details on Unicode issues in the tar format).
Having said that I'd also go with "filenames in the archive SHOULD be limited to ASCII" - because we have had issues with pip where test files have Unicode filenames, and builds break because they get mangled on systems with weird encoding setups... IIRC, these are typically related to .tar.gz sdists, which (due to the lack of encoding support) result in files being unpacked with the wrong names. So maybe if we enforce zip format we don't need to add this limitation.
Especially if we go with zip as the one true archive format, then I think we should just use the same rules for all this stuff as wheels do. No need to re-invent the... well, you know.
- When unpacked, it MUST contain a single directory tree named ``{PACKAGE}-{VERSION}``.
- This directory tree MUST be a valid version 1 (or greater) source tree as defined above.
- It MUST additionally contain a directory named ``{PACKAGE}-{VERSION}.sdist-info`` (notice the ``s``), with the following contents:
- ``SDIST``: Mandatory. Same record-oriented format as a wheel's ``WHEEL`` file, but with different fields::
    SDist-Version: 1.0
    Generator: setuptools sdist 20.1
``SDist-Version`` is the version number of this specification. Software that processes sdists should warn if ``SDist-Version`` is greater than the version it supports, and must fail if ``SDist-Version`` has a greater major version than the version it supports.
``Generator`` is the name and optionally the version of the software that produced the archive.
- ``RECORD``: Mandatory. A list of all files contained in the sdist (except for the RECORD file itself and any signature files) together with their hashes, as specified in PEP 427.
- ``RECORD.jws``, ``RECORD.p7s``: Optional. Signature files as specified in PEP 427.
- ``METADATA``: Mandatory. Metadata version 1.1 or greater format metadata, with an additional rule that fields may contain the special sentinel value ``__SDIST_DYNAMIC__``, which indicates that the value of this field cannot be determined until build time. If a "multiple use field" is present with the value ``__SDIST_DYNAMIC__``, then this field MUST occur exactly once, e.g.::
    # Okay:
    Requires-Dist: lxml (> 3.3)
    Requires-Dist: requests

    # no Requires-Dist lines at all is okay
    # (meaning: this package's requirements are the empty set)

    # Okay, requirements will be determined at build time:
    Requires-Dist: __SDIST_DYNAMIC__

    # NOT okay:
    Requires-Dist: lxml (> 3.3)
    Requires-Dist: __SDIST_DYNAMIC__
(The use of a special token allows us to distinguish between multiple use fields whose value is statically the empty list versus one whose value is dynamic; it also allows us to distinguish between optional fields which are statically not present versus ones whose value is dynamic.)
When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.
A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.
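The wheel-consistency rule above amounts to a field-by-field comparison that skips anything the sdist marked dynamic. A rough sketch follows, assuming both METADATA files parse with the stdlib email parser. It also assumes that the order of repeated fields is irrelevant and that the description body is ignored; the draft does not settle either point. The function names are made up for illustration.

```python
from email.parser import Parser

SENTINEL = "__SDIST_DYNAMIC__"


def _fields(text):
    """Map each lower-cased field name to the sorted list of its values."""
    msg = Parser().parsestr(text)
    fields = {}
    for key, value in msg.items():
        fields.setdefault(key.lower(), []).append(value.strip())
    return {key: sorted(values) for key, values in fields.items()}


def metadata_mismatches(sdist_text, wheel_text):
    """List fields where a wheel contradicts the static sdist metadata."""
    sdist, wheel = _fields(sdist_text), _fields(wheel_text)
    mismatches = []
    for key in sorted(set(sdist) | set(wheel)):
        if sdist.get(key) == [SENTINEL]:
            continue  # dynamic in the sdist: any wheel value is allowed
        if sdist.get(key) != wheel.get(key):
            mismatches.append(key)
    return mismatches
```

A field that is absent from the sdist but present in the wheel counts as a mismatch here, which matches the reading that "statically not present" is itself a static value.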
This seems pretty good at first reading.
[TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]
This covers the main point of contention. It would be bad if build systems started using __SDIST_DYNAMIC__ just because "it's easier".
Maybe add
* A valid sdist SHOULD NOT use the __SDIST_DYNAMIC__ mechanism any more than necessary (i.e., if the metadata is the same in all generated wheels, it does not need to use the __SDIST_DYNAMIC__ mechanism, and so should not do so).
This is intentionally a close analogue of a wheel's ``.dist-info`` directory; the intention is that as future metadata standards are defined, the specifications for the ``.sdist-info`` and ``.dist-info`` directories will evolve in synchrony.
Evolutionary notes
==================
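Going back to the ``SDIST`` file described in this section: the ``SDist-Version`` rule (warn on a newer version, fail on a newer major version) can be sketched in a few lines. This assumes the version is always written as ``major.minor``, which the draft's example suggests but does not state. ``SUPPORTED`` and the function name are hypothetical.

```python
import warnings
from email.parser import Parser

SUPPORTED = (1, 0)  # highest SDist-Version this hypothetical tool understands


def check_sdist_version(sdist_text, supported=SUPPORTED):
    """Return the parsed (major, minor) version, warning or failing as needed."""
    value = Parser().parsestr(sdist_text)["SDist-Version"]
    if value is None:
        raise ValueError("SDIST file has no SDist-Version field")
    major, minor = (int(part) for part in value.strip().split("."))
    if major > supported[0]:
        # A newer major version must be rejected outright.
        raise ValueError(f"unsupported SDist-Version {value}")
    if (major, minor) > supported:
        # Same major version but newer minor: proceed with a warning.
        warnings.warn(f"SDist-Version {value} is newer than this tool supports")
    return major, minor
```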
A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static pypackage.cfg that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.
A rough plan would be: Create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and converts those hooks into setuptools calls. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format. This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)
But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:
1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pypackage.cfg``).
2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to either stop doing that, or else add ``__SDIST_DYNAMIC__`` annotations at appropriate places.
We'll also presumably need some API for packages to describe which parts of the METADATA file should be marked ``__SDIST_DYNAMIC__``, for the packages that need it (a new argument to ``setup()`` or some setting in ``setup.cfg`` or something).
I'm confused here. And it's just now become clear *why* I'm confused.
The sdist format MUST be a generated format - i.e., we should insist (in principle at least) that it's only ever generated by tools. Otherwise it's way too easy for people to just zip up their source tree, hand craft something generic (that over-uses __SDIST_DYNAMIC__) and say "here's an sdist". Obviously, people always *can* manually create an sdist but we need to pin down the spec tightly, or we've not improved things.
The mandatory RECORD file makes it pretty much impossible to generate an sdist manually.
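To give a sense of what producing ``RECORD`` involves, here is a sketch that follows the PEP 427 line format (path, ``sha256=`` plus the urlsafe base64 digest with padding stripped, and size). The function name and the in-memory mapping of paths to bytes are illustrative assumptions. A real generator would walk the archive contents.

```python
import base64
import csv
import hashlib
import io


def record_lines(files):
    """Build RECORD text from a mapping of archive path -> file bytes."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in sorted(files):
        # The draft excludes RECORD itself and any signature files.
        if path.endswith(("/RECORD", "/RECORD.jws", "/RECORD.p7s")):
            continue
        data = files[path]
        digest = hashlib.sha256(data).digest()
        # PEP 427 uses urlsafe base64 with the trailing "=" padding removed.
        encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        writer.writerow([path, f"sha256={encoded}", len(data)])
    return out.getvalue()
```

Doing this by hand for every file in a source tree is tedious enough that in practice a tool will always be involved.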
That's why I'm concerned about __SDIST_DYNAMIC__ and it's also what confuses me about the above transition plan.
For people using setuptools currently, the transition should be simply that they upgrade setuptools, and the "setup.py sdist" command in the new setuptools generates the new sdist format. By default, the setuptools sdist process assumes everything is static and requires the user to modify the setup.py to explicitly mark which metadata they want to be left to build time. That way, we get a relatively transparent transition, while avoiding overuse of dynamic metadata.
My assumption was that when a project flips the switch to move to the new format (not sure what that switch looks like, but presumably we will have one), then one of the things that happens is that "setup.py sdist" starts running the equivalent of "egg_info" and stuffing all the resulting metadata into {PACKAGE}-{VERSION}.sdist-info/ (along with generating a RECORD file etc.). So the default would be to assume all metadata is static. But right now that is not actually true for all projects (for both good and bad reasons), so this means that before they flip that switch they need to either adjust their setup.py to make it true, or else they need to use some new API that setuptools will add to let them specify which fields should be marked as dynamic. This API will be purely a setuptools-internal thing, though, nothing that the PEP itself needs to concern itself with.
If setup.py has to explicitly mark dynamic metadata, that also allows us to reject attempts to make name and version dynamic. Which is good.
Presumably PyPI will also reject packages with dynamic names or versions, so any build system that tries to get away with this will quickly realize the error of their ways.

-n

-- Nathaniel J. Smith -- http://vorpus.org
On 28 Oct 2015 10:08, "Nathaniel Smith" <njs@pobox.com> wrote:
Though it may well make sense for the PyPA packaging guide to add a set of best-practice guidelines for build system implementors.
What would be really nice is if the new specification came with a behavioural test suite (e.g. defined with a project like behave) that a build system developer or project owner could run to check all relevant pip commands are handled correctly for a given package. A spec can be ambiguous, but a behavioural test suite sets a minimum bar that build systems can readily check themselves against. Cheers, Nick.
On Mon, Oct 26, 2015 at 8:04 AM, Nathaniel Smith <njs@pobox.com> wrote:
[TBD: should builds be performed in a fully isolated environment, or should they get access to packages that are already installed in the target install environment? The former simplifies a number of things, but Robert was skeptical it would be possible.]
It would be wasteful not to allow builds access to packages that are already installed. But then, some people might want isolation. Who should be able to switch isolation on? Users? Publishers? Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro
On Mon, Oct 26, 2015 at 8:04 AM, Nathaniel Smith <njs@pobox.com> wrote:
When this sdist is built, the resulting wheel MUST have metadata which is identical to the metadata present in this file, except that any fields with value ``__SDIST_DYNAMIC__`` in the sdist may have arbitrary values in the wheel.
A valid sdist MUST NOT use the ``__SDIST_DYNAMIC__`` mechanism for the package name or version (i.e., these must be given statically), and these MUST match the {PACKAGE} and {VERSION} of the sdist as described above.
[TBD: do we want to forbid the use of dynamic metadata for any other fields? I assume PyPI will enforce some stricter rules at least, but I don't know if we want to make that part of the spec, or just part of PyPI's administrative rules.]
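The two MUST rules quoted above could be checked mechanically along these lines. This is only a sketch under stated assumptions: metadata is modelled as a flat dict (real metadata has multi-valued fields), the field names `Name` and `Version` are assumed, and `check_wheel_against_sdist` is a made-up helper, not part of pip or any existing tool.

```python
SDIST_DYNAMIC = "__SDIST_DYNAMIC__"  # marker value from the quoted draft


def check_wheel_against_sdist(sdist_meta, wheel_meta, package, version):
    """Return a list of rule violations (empty if everything is consistent).

    ``package`` and ``version`` stand for the {PACKAGE} and {VERSION}
    parts of the sdist name.
    """
    problems = []

    # Rule 2: name and version must be static and match the sdist name.
    for field, expected in (("Name", package), ("Version", version)):
        value = sdist_meta.get(field)
        if value == SDIST_DYNAMIC:
            problems.append(f"{field} must not be dynamic")
        elif value != expected:
            problems.append(f"{field} {value!r} does not match {expected!r}")

    # Rule 1: wheel metadata must be identical, except for dynamic fields.
    for field in sorted(set(sdist_meta) | set(wheel_meta)):
        sdist_value = sdist_meta.get(field)
        if sdist_value == SDIST_DYNAMIC:
            continue  # the wheel may carry an arbitrary value here
        if wheel_meta.get(field) != sdist_value:
            problems.append(f"{field} differs between sdist and wheel")

    return problems
```

Whether a check like this would live in pip, in PyPI, or in a test suite is left open here, as it is in the thread.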
Unless I misunderstood the core goal of this new sdist (to be able to know the dependencies statically), it doesn't make sense to allow mixing things. Is there a use case for dynamic requirements? In that situation users can just as well use the current sdist format. There are no advantages to using the new sdist format if your requirements are dynamic, right? Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro
- Will allow for both static and dynamic specification of build dependencies
I think you need to fill in the story on dynamic dependencies, or otherwise this PEP will be a mystery to most people. I *think* I understand your motivation for this, based on hearing your plan (in another thread) of putting C libs into wheels, and actually declaring them as dependencies... and because these libs can vary based on build settings, your dependencies are "dynamic"... but all that needs to be explained in the PEP. --Marcus
participants (7)
- Daniel Holth
- David Cournapeau
- Ionel Cristian Mărieș
- Marcus Smith
- Nathaniel Smith
- Nick Coghlan
- Paul Moore