Towards a simple and standard sdist format that isn't intertwined with distutils

Hi all,

We realized that actually, as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.

The main question is, does this approach seem sound?

-n

---

PEP: ??
Title: Standard interface for interacting with source trees and source distributions
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs@pobox.com>,
        Thomas Kluyver <takowl@gmail.com>
Status: Draft
Type: Standards-Track
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History:
Discussions-To: <distutils-sig@python.org>

Abstract
========

Distutils delenda est.

Extended abstract
=================

While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, and (c) you are forced to use them anyway, because they provide the standard interface for installing Python packages expected by both users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then the more the merrier. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.

We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.

Synopsis and rationale
======================

To limit the scope of our design, we adopt several principles.

First, we distinguish between a *source tree* (e.g., a VCS checkout) and a *source distribution* (e.g., an official snapshot release like ``lxml-3.4.4.zip``).

There isn't a whole lot that *source trees* can be assumed to have in common. About all you know is that they can -- via some more or less Rube-Goldbergian process -- produce one or more binary distributions. In particular, you *cannot* tell via simple static inspection:

- What version number will be attached to the resulting packages (e.g. it might be determined programmatically by consulting VCS metadata -- I have here a build of numpy version "1.11.0.dev0+4a9ad17")

- What build- or run-time dependencies are required (e.g. these may depend on arbitrarily complex configuration settings that are determined via a mix of manual settings and auto-probing)

- Or even how many distinct binary distributions will be produced (e.g. a source distribution may always produce wheel A, but only produce wheel B when built on Unix-like systems).
Therefore, when dealing with source trees, our goal is just to provide a standard UX for the core operations that are commonly performed on other people's packages; anything fancier and more developer-centric we leave at the discretion of individual package developers. So our source trees just provide some simple hooks to let a tool like ``pip``:

- query for build dependencies
- run a build, producing wheels as output
- set up the current source tree so that it can be placed on ``sys.path`` in "develop mode"

and that's it. We teach users that the standard way to install a package from a VCS checkout is now ``pip install .`` instead of ``python setup.py install``. (This is already a good idea anyway -- e.g., pip can do reliable uninstall / upgrades.)

Next, we note that pretty much all the operations that you might want to perform on a *source distribution* are also operations that you might want to perform on a source tree, and via the same UX. The only thing you do with source distributions that you don't do with source trees is, well, distribute them. There are all kinds of metadata you could imagine including in a source distribution, but each piece of metadata puts an increased burden on source distribution generation tools, and most operations will still have to work without this metadata. So we only include extra metadata in source distributions if it helps solve specific problems that are unique to distribution. If you want wheel-style metadata, get a wheel and look at it -- they're great and getting better.

Therefore, our source distributions are basically just source trees + a mechanism for signing.

Finally: we explicitly do *not* have any concept of "depending on a source distribution". As in other systems like Debian, dependencies are always phrased in terms of binary distributions (wheels), and when a user runs something like ``pip install <package>``, then the long-run plan is that <package> and all its transitive dependencies should be available as wheels in a package index. But this is not yet realistic, so as a transitional / backwards-compatibility measure, we provide a simple mechanism for ``pip install <package>`` to handle cases where <package> is provided only as a source distribution.

Source trees
============

We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code of ``distutils``, ``setuptools``, ``pip``, and other tools.

A version 1-or-greater format source tree can be identified by the presence of a file ``_pypackage/_pypackage.cfg``.

If both ``_pypackage/_pypackage.cfg`` and ``setup.py`` are present, then we have a version 1+ source tree, i.e., ``setup.py`` is ignored. This is necessary because we anticipate that version 1+ source trees may want to contain a ``setup.py`` file for backwards compatibility, e.g.::

    #!/usr/bin/env python
    import sys
    print("Don't call setup.py directly!")
    print("Use 'pip install .' instead!")
    print("(You might have to upgrade pip first.)")
    sys.exit(1)

In the current version of the specification, the one file ``_pypackage/_pypackage.cfg`` is where pretty much all the action is (though see below). The motivation for putting it into a subdirectory is that:

- the way of all standards is that cruft accumulates over time, so this way we pre-emptively have a place to put it, and
- real-world projects often accumulate build system cruft as well, so we might as well provide one obvious place to put it too.
Of course this then creates the possibility of collisions between standard files and user files, and trying to teach arbitrary users not to scatter files around willy-nilly never works, so we adopt the convention that names starting with an underscore are reserved for official use, and non-underscored names are available for idiosyncratic use by individual projects.

The alternative would be to simply place the main configuration file at the top-level, create the subdirectory only when specifically needed (most trees won't need it), and let users worry about finding their own place for their cruft. Not sure which is the best approach. Plus we can have a nice bikeshed about the names in general (FIXME).

_pypackage.cfg
--------------

The ``_pypackage.cfg`` file contains various settings. Another good bike-shed topic is which file format to use for storing these (FIXME), but for purposes of this draft I'll write examples using `toml <https://github.com/toml-lang/toml>`_, because you'll instantly be able to understand the semantics, it has similar expressivity to JSON while being more human-friendly (e.g., it supports comments and multi-line strings), it's better-specified than ConfigParser, and it's much simpler than YAML. Rust's package manager uses toml for similar purposes.

Here's an example ``_pypackage/_pypackage.cfg``::

    # Version of the "pypackage format" that this file uses.
    # Optional. If not present then 1 is assumed.
    # All version changes indicate incompatible changes; backwards
    # compatible changes are indicated by just having extra stuff in
    # the file.
    version = 1

    [build]
    # An inline requirements file. Optional.
    # (FIXME: I guess this means we need a spec for requirements files?)
    requirements = """
        mybuildtool >= 2.1
        special_windows_tool ; sys_platform == "win32"
    """

    # The path to an out-of-line requirements file. Optional.
    requirements-file = "build-requirements.txt"

    # A hook that will be called to query build requirements. Optional.
    requirements-dynamic = "mybuildtool:get_requirements"

    # A hook that will be called to build wheels. Required.
    build-wheels = "mybuildtool:do_build"

    # A hook that will be called to do an in-place build (see below).
    # Optional.
    build-in-place = "mybuildtool:do_inplace_build"

    # The "x" namespace is reserved for third-party extensions.
    # To use x.foo you should own the name "foo" on pypi.
    [x.mybuildtool]
    spam = ["spam", "spam", "spam"]

All paths are relative to the ``_pypackage/`` directory (so e.g. the build.requirements-file value above refers to a file named ``_pypackage/build-requirements.txt``).

A *hook* is a Python object that is looked up using the same rules as traditional setuptools entry_points: a dotted module name, followed by a colon, followed by a dotted name that is looked up within that module. *Running a hook* means: first, find or create a python interpreter which is executing in the current venv, whose working directory is set to the ``_pypackage/`` directory, and which has the ``_pypackage/`` directory on ``sys.path``. Then, inside this interpreter, look up the hook object, and call it, with arguments as specified below.
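(For illustration only, not part of the spec: given the lookup rule above, an installer could resolve and call a hook string roughly like this; the ``run_hook`` helper name is invented for the example.)::

    import importlib

    def run_hook(hook, *args):
        # "mybuildtool:do_build" -> module "mybuildtool", attribute path "do_build"
        module_name, _, attr_path = hook.partition(":")
        obj = importlib.import_module(module_name)
        # The attribute part may itself be dotted, e.g. "hooks.do_build".
        for name in attr_path.split("."):
            obj = getattr(obj, name)
        # Assumes we are already running inside the build venv, with the working
        # directory and sys.path pointing at _pypackage/, per the text above.
        return obj(*args)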
A build command like ``pip wheel <source tree>`` performs the following steps:

1) Validate the ``_pypackage.cfg`` version number.

2) Create an empty virtualenv / venv that matches the environment that the installer is targeting (e.g. if you want wheels for CPython 3.4 on 64-bit Windows, then you make a CPython 3.4 64-bit Windows venv).

3) If the build.requirements key is present, then in this venv run the equivalent of ``pip install -r <a file containing its value>``, using whatever index settings are currently in effect.

4) If the build.requirements-file key is present, then in this venv run the equivalent of ``pip install -r <the named file>``, using whatever index settings are currently in effect.

5) If the build.requirements-dynamic key is present, then in this venv run the hook with no arguments, capture its stdout, and pipe it into ``pip install -r -``, using whatever index settings are currently in effect. If the hook raises an exception, then abort the build with an error.

   Note: because these steps are performed in sequence, the build.requirements-dynamic hook is allowed to use packages that are listed in build.requirements or build.requirements-file.

6) In this venv, run the build.build-wheels hook. This should be a Python function which takes one argument.

   This argument is an arbitrary dictionary intended to contain user-specified configuration, specified via some install-tool-specific mechanism. The intention is that tools like ``pip`` should provide some way for users to specify key/value settings that will be passed in here, analogous to the legacy ``--install-option`` and ``--global-option`` arguments.

   To make it easier for packages to transition from version 0 to version 1 sdists, we suggest that ``pip`` and other tools that have such existing option-setting interfaces SHOULD map them to entries in this dictionary -- e.g.::

       pip --global-option=a --install-option=b --install-option=c

   could produce a dict like::

       {"--global-option": ["a"], "--install-option": ["b", "c"]}

   The hook's return value is a list of pathnames relative to the scratch directory. Each entry names a wheel file created by this build.

   Errors are signaled by raising an exception.

When performing an in-place build (e.g. for ``pip install -e .``), then the same steps are followed, except that instead of the build.build-wheels hook, we call the build.build-in-place hook, and instead of returning a list of wheel files, it returns the name of a directory that should be placed onto ``sys.path`` (usually this will be the source tree itself, but may not be, e.g. if a build system wants to enforce a rule where the source is always kept pristine then it could symlink the .py files into a build directory, place the extension modules and dist-info there, and return that). This directory must contain importable versions of the code in the source tree, along with appropriate .dist-info directories.

(FIXME: in-place builds are useful but intrinsically kinda broken -- e.g. extensions / source / metadata can all easily get out of sync -- so while I think this paragraph provides a reasonable hack that preserves current functionality, maybe we should defer specifying them until after we've thought through the issues more?)

When working with source trees, build tools like ``pip`` are encouraged to cache and re-use virtualenvs for performance.
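(For illustration only: putting the pieces together, a build system's hook module might look roughly like the sketch below. The ``mybuildtool`` name matches the example config above; the function bodies are stand-ins, not a real build system.)::

    # Hypothetical contents of a module importable as "mybuildtool"
    # (e.g. placed inside _pypackage/, which is on sys.path while hooks run).
    import subprocess

    def get_requirements():
        # requirements-dynamic hook: write any extra build requirements to
        # stdout, one per line; the installer pipes this into `pip install -r -`.
        print("cython >= 0.23")

    def do_build(config):
        # build-wheels hook: `config` is the dict of user-supplied key/value
        # settings described above. Do whatever work produces the wheels, then
        # return their paths relative to the scratch directory.
        subprocess.check_call(["make", "wheel"])  # stand-in for the real build
        return ["dist/mypkg-1.0-cp34-none-linux_x86_64.whl"]

    def do_inplace_build(config):
        # build-in-place hook: build extensions etc. alongside the source,
        # write the .dist-info metadata, and return the directory that should
        # go onto sys.path -- here, the project root one level up.
        subprocess.check_call(["make", "inplace"])  # stand-in for the real build
        return ".."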
Other contents of _pypackage/
-----------------------------

_RECORD, _RECORD.jws, _RECORD.p7s: see below.

_x/<pypi name>/: reserved for use by tools (e.g. _x/mybuildtool/build/, _x/pip/venv-cache/cp34-none-linux_x86_64/)

Source distributions
====================

A *source distribution* is a file in a well-known archive format such as zip or tar.gz, which contains a single directory, and this directory is a source tree (in the sense defined in the previous section).

The ``_pypackage/`` directory in a source distribution SHOULD also contain a _RECORD file, as defined in PEP 427, and MAY also contain _RECORD.jws and/or _RECORD.p7s signature files.

For official releases, source distributions SHOULD be named as ``<package>-<version>.<ext>``, and the directory they contain SHOULD be named ``<package>-<version>``, and building this source tree SHOULD produce a wheel named ``<package>-<version>-<compatibility tag>.whl`` (though it may produce other wheels as well).

(FIXME: maybe we should add that if you want your sdist on PyPI then you MUST include a proper _RECORD file and use the proper naming convention?)

Integration tools like ``pip`` SHOULD take advantage of this convention by applying the following heuristic: when seeking a package <package>, if no appropriate wheel can be found, but an sdist named <package>-<version>.<ext> is found, then:

1) build the sdist
2) add the resulting wheels to the package search space
3) retry the original operation

This handles a variety of simple and complex cases -- for example, if we need a package 'foo', and we find foo-1.0.zip which builds foo.whl and bar.whl, and foo.whl depends on bar.whl, then everything will work out. There remain other cases that are not handled, e.g. if we start out searching for bar.whl we will never discover foo-1.0.zip. We take the perspective that this is nonetheless sufficient for a transitional heuristic, and anyone who runs into this problem should just upload wheels already. If this turns out to be inadequate in practice, then it will be addressed by future extensions.
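(A non-normative sketch of that fallback in pseudo-Python; the index and build helpers named here are invented for the illustration.)::

    def resolve(requirement, index):
        # Invented helper names -- this just restates the heuristic above in code.
        wheel = index.find_wheel(requirement)
        if wheel is not None:
            return wheel
        sdist = index.find_sdist(requirement)     # e.g. foo-1.0.zip
        if sdist is None:
            raise LookupError("no wheel or sdist found for %s" % requirement)
        for built in build_sdist(sdist):          # 1) build the sdist
            index.add_local_wheel(built)          # 2) add wheels to the search space
        return resolve(requirement, index)        # 3) retry the original operation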
Examples
========

**Example 1:** While we assume that installation tools will have to continue supporting version 0 sdists for the indefinite future, it's a useful check to make sure that our new format can continue to support packages using distutils / setuptools as their build system. We assume that a future version of ``pip`` will take its existing knowledge of distutils internals and expose them as the appropriate hooks, and then existing distutils / setuptools packages can be ported forward by using the following ``_pypackage/_pypackage.cfg``::

    [build]
    requirements = """
        pip >= whatever
        wheel
    """
    # Applies monkeypatches, then does 'setup.py dist_info' and
    # extracts the setup_requires
    requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
    # Applies monkeypatches, then does 'setup.py wheel'
    build-wheels = "pip.pypackage_hooks:build_wheels"
    # Applies monkeypatches, then does:
    #   setup.py dist_info && setup.py build_ext -i
    build-in-place = "pip.pypackage_hooks:build_in_place"

This is also useful for any other installation tools that may want to support version 0 sdists without having to implement bug-for-bug compatibility with pip -- if no ``_pypackage/_pypackage.cfg`` is present, they can use this as a default.

**Example 2:** For packages using numpy.distutils. This is identical to the distutils / setuptools example above, except that numpy is moved into the list of static build requirements. Right now, most projects using numpy.distutils don't bother trying to declare this dependency, and instead simply error out if numpy is not already installed. This is because currently the only way to declare a build dependency is via the ``setup_requires`` argument to the ``setup`` function, and in this case the ``setup`` function is ``numpy.distutils.setup``, which... obviously doesn't work very well. Drop this ``_pypackage.cfg`` into an existing project like this and it will become robustly pip-installable with no further changes::

    [build]
    requirements = """
        numpy
        pip >= whatever
        wheel
    """
    requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
    build-wheels = "pip.pypackage_hooks:build_wheels"
    build-in-place = "pip.pypackage_hooks:build_in_place"

**Example 3:** `flit <https://github.com/takluyver/flit>`_ is a tool designed to make distributing simple packages simple, but it currently has no support for sdists, and for convenience includes its own installation code that's redundant with that in pip. These 4 lines of boilerplate make any flit-using source tree pip-installable, and let flit get out of the package installation business::

    [build]
    requirements = "flit"
    build-wheels = "flit.pypackage_hooks:build_wheels"
    build-in-place = "flit.pypackage_hooks:build_in_place"

FAQ
===

**Why is it version 1 instead of version 2?** Because the legacy sdist format is barely a format at all, and to `remind us to keep things simple <https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_second-system_effect>`_.

**What about cross-compilation?** Standardizing an interface for cross-compilation seems premature given how complicated the configuration required can be, the lack of an existing de facto standard, and the authors of this PEP's inexperience with cross-compilation. This would be a great target for future extensions, though. In the mean time, there's no requirement that ``_pypackage/_pypackage.cfg`` contain the *only* entry points to a project's build system -- packages that want to support cross-compilation can still do so, they'll just need to include a README explaining how to do it.

**PEP 426 says that the new sdist format will support automatically creating policy-compliant .deb/.rpm packages. What happened to that?** Step 1: enhance the wheel format as necessary so that a wheel can be automatically converted into a policy-compliant .deb/.rpm package (see PEP 491). Step 2: make it possible to automatically turn sdists into wheels (this PEP). Step 3: we're done.

**What about automatically running tests?** Arguably this is another thing that should be pushed off to wheel metadata instead of sdist metadata: it's good practice to include tests inside your built distribution so that end-users can test their install (and see above re: our focus here being on stuff that end-users want to do, not dedicated package developers), there are lots of packages that have to be built before they can be tested anyway (e.g. because of binary extensions), and in any case it's good practice to test against an installed version in order to make sure your install code works properly. But even if we do want this in sdist, then it's hardly urgent (e.g. there is no ``pip test`` that people will miss), so we defer that for a future extension to avoid blocking the core functionality.

--
Nathaniel J. Smith -- http://vorpus.org

Can you clarify the relationship to PEP426 metadata? There's no standard for metadata in here other than what's required to run a build hook. Does that imply you would have each build tool enforce their own convention for where metadata is found?

On Oct 1, 2015 10:45 PM, "Marcus Smith" <qwcode@gmail.com> wrote:
> Can you clarify the relationship to PEP426 metadata? There's no standard for metadata in here other than what's required to run a build hook.

Right -- the idea is that discretely installable binary packages (i.e. wheels) are a different sort of thing than a source tree / source distribution, and need different metadata. PEP 426 is then taken as a draft of the next metadata spec *for wheels*, and this is a draft "simplest thing that could possibly work" metadata spec for source trees / source distributions. The "synopsis and rationale" section provides more motivation for this approach.
> Does that imply you would have each build tool enforce their own convention for where metadata is found?
I think you mean: "does that imply that each build tool would have its own way of determining what metadata to attach to the wheels it generates?", and the answer to that is yes -- already right now distutils does it via kwargs passed to the setup() function, flit does it via entries in its flit.ini file (though for simplicity it might later move this to an extension section in _pypackage.cfg -- and note that this purely declarative style is only possible because flit is designed to make simple things easy and complicated things someone else's problem), and in general it might require arbitrary code to be executed. I expect that some conventions will probably get sorted out once we have experimental build systems competing with each other and figuring out what works and what doesn't, but distutils-sig has historically had poor luck with trying to design this kind of thing a priori. -n

One way to do sdist 2.0 would be to have the package-1.0.dist-info directory in there (most sdists contain setuptools metadata) and to have a flag static-metadata=1 in setup.cfg asserting that setup.py [if present] does not alter the list of dependencies.

In the old MEBS design the package could suggest a build system, but pip would invoke a list of build plugins to inspect the directory and return True if they were able to build the package. This would allow for ignoring the package's suggested build system. Instead of defining a command line interface for setup.py, MEBS would define a set of methods on the build plugin.

I thought Robert Collins had a working setup-requires implementation already? I have a worse but backwards compatible one too at https://bitbucket.org/dholth/setup-requires/src/tip/setup.py
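To illustrate the plugin idea, the interface might look something like this (purely hypothetical names, not the actual MEBS code):

    from abc import ABC, abstractmethod

    class BuildPlugin(ABC):
        """Hypothetical MEBS-style build plugin interface (illustrative only)."""

        @abstractmethod
        def can_build(self, source_dir):
            """Inspect the directory; return True if this plugin can build it."""

        @abstractmethod
        def build_wheels(self, source_dir, output_dir):
            """Build the package, returning the paths of the wheels produced."""

    # The installer would try each registered plugin in order, which also lets
    # it ignore the package's own suggested build system:
    def pick_plugin(plugins, source_dir):
        for plugin in plugins:
            if plugin.can_build(source_dir):
                return plugin
        raise RuntimeError("no build plugin claims %s" % source_dir)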
Can you clarify the relationship to PEP426 metadata? There's no standard for metadata in here other than what's required to run a build hook. Does that imply you would have each build tool enforce their own convention for where metadata is found?
On Thu, Oct 1, 2015 at 9:53 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
-n
---
PEP: ?? Title: Standard interface for interacting with source trees and source distributions Version: $Revision$ Last-Modified: $Date$ Author: Nathaniel J. Smith <njs@pobox.com> Thomas Kluyver <takowl@gmail.com> Status: Draft Type: Standards-Track Content-Type: text/x-rst Created: 30-Sep-2015 Post-History: Discussions-To: <distutils-sig@python.org>
Abstract ========
Distutils delenda est.
Extended abstract =================
While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like autoconfiguration and usable build-time dependency declaration, (b) extending them is quirky, complicated, and fragile, (c) you are forced to use them anyway, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``.
Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). We propose to solve (c).
The goal of this PEP is get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then the more the merrier. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels.
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.
Synopsis and rationale ======================
To limit the scope of our design, we adopt several principles.
First, we distinguish between a *source tree* (e.g., a VCS checkout) and a *source distribution* (e.g., an official snapshot release like ``lxml-3.4.4.zip``).
There isn't a whole lot that *source trees* can be assumed to have in common. About all you know is that they can -- via some more or less Rube-Goldbergian process -- produce one or more binary distributions. In particular, you *cannot* tell via simple static inspection: - What version number will be attached to the resulting packages (e.g. it might be determined programmatically by consulting VCS metadata -- I have here a build of numpy version "1.11.0.dev0+4a9ad17") - What build- or run-time dependencies are required (e.g. these may depend on arbitrarily complex configuration settings that are determined via a mix of manual settings and auto-probing) - Or even how many distinct binary distributions will be produced (e.g. a source distribution may always produce wheel A, but only produce wheel B when built on Unix-like systems).
Therefore, when dealing with source trees, our goal is just to provide a standard UX for the core operations that are commonly performed on other people's packages; anything fancier and more developer-centric we leave at the discretion of individual package developers. So our source trees just provide some simple hooks to let a tool like ``pip``:
- query for build dependencies - run a build, producing wheels as output - set up the current source tree so that it can be placed on ``sys.path`` in "develop mode"
and that's it. We teach users that the standard way to install a package from a VCS checkout is now ``pip install .`` instead of ``python setup.py install``. (This is already a good idea anyway -- e.g., pip can do reliable uninstall / upgrades.)
Next, we note that pretty much all the operations that you might want to perform on a *source distribution* are also operations that you might want to perform on a source tree, and via the same UX. The only thing you do with source distributions that you don't do with source trees is, well, distribute them. There's all kind of metadata you could imagine including in a source distribution, but each piece of metadata puts an increased burden on source distribution generation tools, and most operations will still have to work without this metadata. So we only include extra metadata in source distributions if it helps solve specific problems that are unique to distribution. If you want wheel-style metadata, get a wheel and look at it -- they're great and getting better.
Therefore, our source distributions are basically just source trees + a mechanism for signing.
Finally: we explicitly do *not* have any concept of "depending on a source distribution". As in other systems like Debian, dependencies are always phrased in terms of binary distributions (wheels), and when a user runs something like ``pip install <package>``, then the long-run plan is that <package> and all its transitive dependencies should be available as wheels in a package index. But this is not yet realistic, so as a transitional / backwards-compatibility measure, we provide a simple mechanism for ``pip install <package>`` to handle cases where <package> is provided only as a source distribution.
Source trees ============
We retroactively declare the legacy source tree format involving ``setup.py`` to be "version 0". We don't try to specify it further; its de facto specification is encoded in the source code of ``distutils``, ``setuptools``, ``pip``, and other tools.
A version 1-or-greater format source tree can be identified by the presence of a file ``_pypackage/_pypackage.cfg``.
If both ``_pypackage/_pypackage.cfg`` and ``setup.py`` are present, then we have a version 1+ source tree, i.e., ``setup.py`` is ignored. This is necessary because we anticipate that version 1+ source trees may want to contain a ``setup.py`` file for backwards compatibility, e.g.::
#!/usr/bin/env python import sys print("Don't call setup.py directly!") print("Use 'pip install .' instead!") print("(You might have to upgrade pip first.)") sys.exit(1)
In the current version of the specification, the one file ``_pypackage/_pypackage.cfg`` is where pretty much all the action is (though see below). The motivation for putting it into a subdirectory is that: - the way of all standards is that cruft accumulates over time, so this way we pre-emptively have a place to put it, - real-world projects often accumulate build system cruft as well, so we might as well provide one obvious place to put it too.
Of course this then creates the possibility of collisions between standard files and user files, and trying to teach arbitrary users not to scatter files around willy-nilly never works, so we adopt the convention that names starting with an underscore are reserved for official use, and non-underscored names are available for idiosyncratic use by individual projects.
The alternative would be to simply place the main configuration file at the top-level, create the subdirectory only when specifically needed (most trees won't need it), and let users worry about finding their own place for their cruft. Not sure which is the best approach. Plus we can have a nice bikeshed about the names in general (FIXME).
_pypackage.cfg --------------
The ``_pypackage.cfg`` file contains various settings. Another good bike-shed topic is which file format to use for storing these (FIXME), but for purposes of this draft I'll write examples using `toml <https://github.com/toml-lang/toml>`_, because you'll instantly be able to understand the semantics, it has similar expressivity to JSON while being more human-friendly (e.g., it supports comments and multi-line strings), it's better-specified than ConfigParser, and it's much simpler than YAML. Rust's package manager uses toml for similar purposes.
Here's an example ``_pypackage/_pypackage.cfg``::
# Version of the "pypackage format" that this file uses. # Optional. If not present then 1 is assumed. # All version changes indicate incompatible changes; backwards # compatible changes are indicated by just having extra stuff in # the file. version = 1
[build] # An inline requirements file. Optional. # (FIXME: I guess this means we need a spec for requirements files?) requirements = """ mybuildtool >= 2.1 special_windows_tool ; sys_platform == "win32" """ # The path to an out-of-line requirements file. Optional. requirements-file = "build-requirements.txt" # A hook that will be called to query build requirements. Optional. requirements-dynamic = "mybuildtool:get_requirements"
# A hook that will be called to build wheels. Required. build-wheels = "mybuildtool:do_build"
# A hook that will be called to do an in-place build (see below). # Optional. build-in-place = "mybuildtool:do_inplace_build"
# The "x" namespace is reserved for third-party extensions. # To use x.foo you should own the name "foo" on pypi. [x.mybuildtool] spam = ["spam", "spam", "spam"]
All paths are relative to the ``_pypackage/`` directory (so e.g. the build.requirements-file value above refers to a file named ``_pypackage/build-requirements.txt``).
A *hook* is a Python object that is looked up using the same rules as traditional setuptools entry_points: a dotted module name, followed by a colon, followed by a dotted name that is looked up within that module. *Running a hook* means: first, find or create a python interpreter which is executing in the current venv, whose working directory is set to the ``_pypackage/`` directory, and which has the ``_pypackage/`` directory on ``sys.path``. Then, inside this interpreter, look up the hook object, and call it, with arguments as specified below.
A build command like ``pip wheel <source tree>`` performs the following steps:
1) Validate the ``_pypackage.cfg`` version number.
2) Create an empty virtualenv / venv, that matches the environment that the installer is targeting (e.g. if you want wheels for CPython 3.4 on 64-bit windows, then you make a CPython 3.4 64-bit windows venv).
3) If the build.requirements key is present, then in this venv run the equivalent of ``pip install -r <a file containing its value>``, using whatever index settings are currently in effect.
4) If the build.requirements-file key is present, then in this venv run the equivalent of ``pip install -r <the named file>``, using whatever index settings are currently in effect.
5) If the build.requirements-dynamic key is present, then in this venv run the hook with no arguments, capture its stdout, and pipe it into ``pip install -r -``, using whatever index settings are currently in effect. If the hook raises an exception, then abort the build with an error.
Note: because these steps are performed in sequence, the build.requirements-dynamic hook is allowed to use packages that are listed in build.requirements or build.requirements-file.
6) In this venv, run the build.build-wheels hook. This should be a Python function which takes one argument.
This argument is an arbitrary dictionary intended to contain user-specified configuration, specified via some install-tool-specific mechanism. The intention is that tools like ``pip`` should provide some way for users to specify key/value settings that will be passed in here, analogous to the legacy ``--install-option`` and ``--global-option`` arguments.
To make it easier for packages to transition from version 0 to version 1 sdists, we suggest that ``pip`` and other tools that have such existing option-setting interfaces SHOULD map them to entries in this dictionary when -- e.g.::
pip --global-option=a --install-option=b --install-option=c
could produce a dict like::
{"--global-option": ["a"], "--install-option": ["b", "c"]}
The hook's return value is a list of pathnames relative to the scratch directory. Each entry names a wheel file created by this build.
Errors are signaled by raising an exception.
When performing an in-place build (e.g. for ``pip install -e .``), then the same steps are followed, except that instead of the build.build-wheels hook, we call the build.build-in-place hook, and instead of returning a list of wheel files, it returns the name of a directory that should be placed onto ``sys.path`` (usually this will be the source tree itself, but may not be, e.g. if a build system wants to enforce a rule where the source is always kept pristine then it could symlink the .py files into a build directory, place the extension modules and dist-info there, and return that). This directory must contain importable versions of the code in the source tree, along with appropriate .dist-info directories.
(FIXME: in-place builds are useful but intrinsically kinda broken -- e.g. extensions / source / metadata can all easily get out of sync -- so while I think this paragraph provides a reasonable hack that preserves current functionality, maybe we should defer specifying them to until after we've thought through the issues more?)
When working with source trees, build tools like ``pip`` are encouraged to cache and re-use virtualenvs for performance.
Other contents of _pypackage/
-----------------------------
- ``_RECORD``, ``_RECORD.jws``, ``_RECORD.p7s``: see below.
- ``_x/<pypi name>/``: reserved for use by tools (e.g. ``_x/mybuildtool/build/``, ``_x/pip/venv-cache/cp34-none-linux_x86_64/``)
Source distributions
====================
A *source distribution* is a file in a well-known archive format such as zip or tar.gz, which contains a single directory, and this directory is a source tree (in the sense defined in the previous section).
The ``_pypackage/`` directory in a source distribution SHOULD also contain a _RECORD file, as defined in PEP 427, and MAY also contain _RECORD.jws and/or _RECORD.p7s signature files.
For official releases, source distributions SHOULD be named as ``<package>-<version>.<ext>``, and the directory they contain SHOULD be named ``<package>-<version>``, and building this source tree SHOULD produce a wheel named ``<package>-<version>-<compatibility tag>.whl`` (though it may produce other wheels as well).
(FIXME: maybe we should add that if you want your sdist on PyPI then you MUST include a proper _RECORD file and use the proper naming convention?)
Integration tools like ``pip`` SHOULD take advantage of this convention by applying the following heuristic: when seeking a package <package>, if no appropriate wheel can be found, but an sdist named <package>-<version>.<ext> is found, then:
1) build the sdist
2) add the resulting wheels to the package search space
3) retry the original operation
This handles a variety of simple and complex cases -- for example, if we need a package 'foo', and we find foo-1.0.zip which builds foo.whl and bar.whl, and foo.whl depends on bar.whl, then everything will work out. There remain other cases that are not handled, e.g. if we start out searching for bar.whl we will never discover foo-1.0.zip. We take the perspective that this is nonetheless sufficient for a transitional heuristic, and anyone who runs into this problem should just upload wheels already. If this turns out to be inadequate in practice, then it will be addressed by future extensions.
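In rough, non-normative pseudocode (every name here is invented), the heuristic amounts to::

    def obtain_wheel(requirement, index):
        wheel = index.find_wheel(requirement)
        if wheel is not None:
            return wheel
        sdist = index.find_sdist(requirement)  # matched via <package>-<version>.<ext>
        if sdist is not None:
            for built in build_sdist(sdist):   # every wheel the build produced
                index.add_local_wheel(built)
            return obtain_wheel(requirement, index)  # retry the original operation
        raise LookupError("no wheel or sdist found for %r" % requirement)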
Examples
========
**Example 1:** While we assume that installation tools will have to continue supporting version 0 sdists for the indefinite future, it's a useful check to make sure that our new format can continue to support packages using distutils / setuptools as their build system. We assume that a future version of ``pip`` will take its existing knowledge of distutils internals and expose them as the appropriate hooks, and then existing distutils / setuptools packages can be ported forward by using the following ``_pypackage/_pypackage.cfg``::
    [build]
    requirements = """
        pip >= whatever
        wheel
        """
    # Applies monkeypatches, then does 'setup.py dist_info' and
    # extracts the setup_requires
    requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
    # Applies monkeypatches, then does 'setup.py wheel'
    build-wheels = "pip.pypackage_hooks:build_wheels"
    # Applies monkeypatches, then does:
    #     setup.py dist_info && setup.py build_ext -i
    build-in-place = "pip.pypackage_hooks:build_in_place"
This is also useful for any other installation tools that may want to support version 0 sdists without having to implement bug-for-bug compatibility with pip -- if no ``_pypackage/_pypackage.cfg`` is present, they can use this as a default.
**Example 2:** For packages using numpy.distutils. This is identical to the distutils / setuptools example above, except that numpy is moved into the list of static build requirements. Right now, most projects using numpy.distutils don't bother trying to declare this dependency, and instead simply error out if numpy is not already installed. This is because currently the only way to declare a build dependency is via the ``setup_requires`` argument to the ``setup`` function, and in this case the ``setup`` function is ``numpy.distutils.setup``, which... obviously doesn't work very well. Drop this ``_pypackage.cfg`` into an existing project like this and it will become robustly pip-installable with no further changes::
    [build]
    requirements = """
        numpy
        pip >= whatever
        wheel
        """
    requirements-dynamic = "pip.pypackage_hooks:setup_requirements"
    build-wheels = "pip.pypackage_hooks:build_wheels"
    build-in-place = "pip.pypackage_hooks:build_in_place"
**Example 3:** `flit <https://github.com/takluyver/flit>`_ is a tool designed to make distributing simple packages simple, but it currently has no support for sdists, and for convenience includes its own installation code that's redundant with that in pip. These 4 lines of boilerplate make any flit-using source tree pip-installable, and let flit get out of the package installation business::
    [build]
    requirements = "flit"
    build-wheels = "flit.pypackage_hooks:build_wheels"
    build-in-place = "flit.pypackage_hooks:build_in_place"
FAQ
===
**Why is it version 1 instead of version 2?** Because the legacy sdist format is barely a format at all, and to `remind us to keep things simple <https://en.wikipedia.org/wiki/The_Mythical_Man-Month#The_second-system_effec...>`_.
**What about cross-compilation?** Standardizing an interface for cross-compilation seems premature given how complicated the configuration required can be, the lack of an existing de facto standard, and this PEP's authors' inexperience with cross-compilation. This would be a great target for future extensions, though. In the meantime, there's no requirement that ``_pypackage/_pypackage.cfg`` contain the *only* entry points to a project's build system -- packages that want to support cross-compilation can still do so, they'll just need to include a README explaining how to do it.
**PEP 426 says that the new sdist format will support automatically creating policy-compliant .deb/.rpm packages. What happened to that?** Step 1: enhance the wheel format as necessary so that a wheel can be automatically converted into a policy-compliant .deb/.rpm package (see PEP 491). Step 2: make it possible to automatically turn sdists into wheels (this PEP). Step 3: we're done.
**What about automatically running tests?** Arguably this is another thing that should be pushed off to wheel metadata instead of sdist metadata: it's good practice to include tests inside your built distribution so that end-users can test their install (and see above re: our focus here being on stuff that end-users want to do, not dedicated package developers), there are lots of packages that have to be built before they can be tested anyway (e.g. because of binary extensions), and in any case it's good practice to test against an installed version in order to make sure your install code works properly. But even if we do want this in sdist, then it's hardly urgent (e.g. there is no ``pip test`` that people will miss), so we defer that for a future extension to avoid blocking the core functionality.
-- Nathaniel J. Smith -- http://vorpus.org

Thank you for your work on this. We have to kill distutils to make progress in packaging.

On Fri, Oct 2, 2015 at 10:12 AM Daniel Holth <dholth@gmail.com> wrote:
One way to do sdist 2.0 would be to have the package-1.0.dist-info directory in there (most sdists contain setuptools metadata) and to have a flag static-metadata=1 in setup.cfg asserting that setup.py [if present] does not alter the list of dependencies.
In the old MEBS design the package could suggest a build system, but pip would invoke a list of build plugins to inspect the directory and return True if they were able to build the package. This would allow for ignoring the package's suggested build system. Instead of defining a command line interface for setup.py MEBS would define a set of methods on the build plugin.
I thought Robert Collins had a working setup-requires implementation already? I have a worse but backwards compatible one too at https://bitbucket.org/dholth/setup-requires/src/tip/setup.py
On Fri, Oct 2, 2015 at 9:42 AM Marcus Smith <qwcode@gmail.com> wrote:
Can you clarify the relationship to PEP426 metadata? There's no standard for metadata in here other than what's required to run a build hook. Does that imply you would have each build tool enforce their own convention for where metadata is found?
On Thu, Oct 1, 2015 at 9:53 PM, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
-n

On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
I've just read over your proposal, but I've also just woken up so I might be a little slow still! After reading what you have, I don't think that this proposal is the right way to go about improving sdists.

The first thing that immediately stood out to me, is that it's recommending that downstream redistributors like Debian, Fedora, etc utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors. Debian for instance has policy that requires the use of building all of its packages from Source, not from anything else and Wheels are not a source package. While it can theoretically work for pure python packages, it quickly devolves into a mess when you factor in packages that have any C code whatsoever.

Overall, this feels more like a sidegrade than an upgrade. One major theme throughout the PEP is that we're going to push to rely heavily on wheels as the primary format of installation. While that works well for things like Debian, I don't think it's going to work as well for us. If we were only distributing pure python packages, then yes absolutely, however given that we are not, we have to worry about ABI issues. Given that there are so many different environments that a particular package might be installed into, all with different ABIs we have to assume that installing from source is still going to be a primary path for end users to install and that we are never going to have a world where we can assume a Wheel in a repository.

One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.

To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.

Overall, I don't think that this really solves most of the foundational problems with the current format. Largely it feels that what it achieves is shuffling around some logic (you need to create a hook that you reference from within a .cfg file instead of creating a setuptools extension or so) but without fixing most of the problems. The largest benefit I see to switching to this right now is that it would enable us to have build time dependencies that were controlled by pip rather than installed implicitly via the execution of the setup.py. That doesn't feel like a big enough benefit to me to do a mass shakeup of what we recommend and tell people to do.
Having people adjust and change and do something new requires effort, and we need something to justify that effort to other people, and I don't think that this PEP has something we can really use to justify that effort.

I *do* think that there is a core of some ideas here that are valuable, and in fact are similar to some ideas I've had. The main flaw I see here is that it doesn't really fix sdists, it takes a solution that would work for VCS checkouts and then reuses it for sdists. In my mind, the supported flow for package installation would be:

    VCS/Bare Directory -> Source Distribution -> Wheel

This would (eventually) be the only path that was supported for installation but you could "enter" the path at any stage. For example, if there is a Wheel already available, then you jump right on at the end and just install that; if there is an sdist available then pip first builds it into a wheel and then installs that, etc.

I think your PEP is something like what the VCS/Bare Directory to sdist tooling could look like, but I don't think it's what the sdist to wheel path should look like.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Fri, Oct 2, 2015 at 12:58 PM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
I've just read over your proposal, but I've also just woken up so I might be a little slow still! After reading what you have, I don't think that this proposal is the right way to go about improving sdists.
The first thing that immediately stood out to me, is that it's recommending that downstream redistributors like Debian, Fedora, etc utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors. Debian for instance has policy that requires the use of building all of its packages from Source, not from anything else and Wheels are not a source package. While it can theoretically work for pure python packages, it quickly devolves into a mess when you factor in packages that have any C code whatsoever.
Overall, this feels more like a sidegrade than an upgrade. One major theme throughout the PEP is that we're going to push to rely heavily on wheels as the primary format of installation. While that works well for things like Debian, I don't think it's going to work as well for us. If we were only distributing pure python packages, then yes absolutely, however given that we are not, we have to worry about ABI issues. Given that there are so many different environments that a particular package might be installed into, all with different ABIs we have to assume that installing from source is still going to be a primary path for end users to install and that we are never going to have a world where we can assume a Wheel in a repository.
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.
Overall, I don't think that this really solves most of the foundational problems with the current format. Largely it feels that what it achieves is shuffling around some logic (you need to create a hook that you reference from within a .cfg file instead of creating a setuptools extension or so) but without fixing most of the problems. The largest benefit I see to switching to this right now is that it would enable us to have build time dependencies that were controlled by pip rather than installed implicitly via the execution of the setup.py. That doesn't feel like a big enough benefit to me to do a mass shakeup of what we recommend and tell people to do. Having people adjust and change and do something new requires effort, and we need something to justify that effort to other people and I don't think that this PEP has something we can really use to justify that effort.
I *do* think that there is a core of some ideas here that are valuable, and in fact are similar to some ideas I've had. The main flaw I see here is that it doesn't really fix sdists, it takes a solution that would work for VCS checkouts and then reuses it for sdists. In my mind, the supported flow for package installation would be:
VCS/Bare Directory -> Source Distribution -> Wheel
This would (eventually) be the only path that was supported for installation but you could "enter" the path at any stage. For example, if there is a Wheel already available, then you jump right on at the end and just install that, if there is a sdist available then pip first builds it into a wheel and then installs that, etc.
I think your PEP is something like what the VCS/Bare Directory to sdist tooling could look like, but I don't think it's what the sdist to wheel path should look like.
A major feature of the proposal is to allow alternative build/packaging tools. If that proposal is not acceptable in its current form, how would you envision interoperability between pip and new systems? For example, is it realistic to encode which commands and options a setup.py would need to support to be pip-installable (without the setup.py using distutils)?

David

On October 2, 2015 at 8:38:45 AM, David Cournapeau (cournape@gmail.com) wrote:
A major feature of the proposal is to allow alternative build/packaging tools.
If that proposal is not acceptable in its current form, how would you envision interoperability between pip and new systems. For example, is it realistic to encode which commands and options a setup.py would need to support to be pip-installable (without the setup.py using distutils) ?
I think it depends on whether you mean the short term or the long term, and what exactly you envision as the scope of an alternative build/packaging tool.

In the short term we have two real options that I can think of off the top of my head. One is, as you mentioned, to define the interface to setup.py that any build tool is expected to follow. However I’m not sure that’s a great idea, because you end up having a bootstrapping problem since pip won’t know that it needs X available to execute a particular setup.py (a common problem with numpy.distutils, as I know you’re aware!). We could do a minimal extension and add another defacto-ish standard of allowing pip and setuptools to process additional setup_requires-like arguments from a setup.cfg to solve that problem, though. The flip side to this is that since it involves new capabilities in pip/setuptools/any other installer, you’ll have several years until setup.cfg-based setup_requires is something you can actually depend on.

Another short term option is to simply say that using something that isn’t distutils/setuptools isn’t supported, but that if you want to do something else, you should rely on setuptools' ability to be extended. This means you’d need a very minimal ``setup.py`` [1], but you could then do pretty much anything you wanted, since setuptools lets you override a good bit of its own logic; pbr uses this to have completely static metadata [2]. This option requires no changes in pip or setuptools, so you could start depending on it right away and it would just work; the downside of course is that you’re tied to extending setuptools and the APIs it provides rather than being able to more freely do what you want, but the ability to do that in setuptools is pretty extensive.

Longer term, I think the answer is sdist 2.0, which has proper metadata inside of it (name, version, dependencies, etc.) but which also includes a hook like this PEP has to specify the build system that should be used to build a wheel out of this source distribution.

[1] Example for pbr: https://github.com/testing-cabal/mock/blob/master/setup.py
[2] https://github.com/testing-cabal/mock/blob/master/setup.cfg

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
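For concreteness, the minimal ``setup.py`` pattern referenced in [1] above -- delegating everything to a setuptools extension such as pbr, which reads its metadata from ``setup.cfg`` -- amounts to roughly this sketch:

    # setup.py -- minimal shim; the setuptools extension (here pbr)
    # supplies everything else from setup.cfg.
    import setuptools

    setuptools.setup(
        setup_requires=['pbr'],
        pbr=True,
    )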

On 2 October 2015 at 12:58, Donald Stufft <donald@stufft.io> wrote:
The first thing that immediately stood out to me, is that it's recommending that downstream redistributors like Debian, Fedora, etc utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors.
I can't now find it in the draft text, but I don't think this is a problem. This proposal means there's a standardised way to turn a source tree into wheels. So downstream distributors can download an sdist - or even a tarball of a VCS tag, if they're being strict about it - build wheels from that using the config in this proposal, and then transform the wheels into their own package format.
Longer term, I think the answer is sdist 2.0 which has proper metadata inside of it (name, version, dependencies, etc) but which also includes a hook like this PEP has to specify the build system
I hadn't heard of this before - is it something that's being worked on? Thanks, Thomas

So downstream distributors can download an sdist - or even a tarball of a VCS tag, if they're being strict about it - build wheels from that using the config in this proposal, and then transform the wheels into their own package format.
This has wheel itself being the interoperability standard. The going idea is that there would be a metadata artifact ("pydist.json" in PEP426) that provides the standard across different formats.
Longer term, I think the answer is sdist 2.0 which has proper metadata inside of it (name, version, dependencies, etc) but which also includes a hook like this PEP has to specify the build system
I hadn't heard of this before - is it something that's being worked on?
I haven't seen any PEPs for it yet.

On Fri, 2 Oct 2015 at 05:08 Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
I've just read over your proposal, but I've also just woken up so I might be a little slow still! After reading what you have, I don't think that this proposal is the right way to go about improving sdists.
The first thing that immediately stood out to me, is that it's recommending that downstream redistributors like Debian, Fedora, etc utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors. Debian for instance has policy that requires the use of building all of its packages from Source, not from anything else and Wheels are not a source package. While it can theoretically work for pure python packages, it quickly devolves into a mess when you factor in packages that have any C code whatsoever.
So wouldn't they then download the sdist, build a wheel as an intermediate, and then generate the .deb file? I mean as long as people upload an sdist for those that want to build from source and a wheel for convenience -- which is probably what most people providing wheels do anyway -- then I don't see the problem.
Overall, this feels more like a sidegrade than an upgrade. One major theme throughout the PEP is that we're going to push to rely heavily on wheels as the primary format of installation. While that works well for things like Debian, I don't think it's going to work as well for us. If we were only distributing pure python packages, then yes absolutely, however given that we are not, we have to worry about ABI issues. Given that there are so many different environments that a particular package might be installed into, all with different ABIs we have to assume that installing from source is still going to be a primary path for end users to install and that we are never going to have a world where we can assume a Wheel in a repository.
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
Isn't that what the requirements and requirements-file fields in the _pypackage file provide? Only if you use that requirements-dynamic would it require executing arbitrary code to gather dependency information, or am I missing something?
To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.
Isn't that only if you use the dynamic fields?
Overall, I don't think that this really solves most of the foundational problems with the current format. Largely it feels that what it achieves is shuffling around some logic (you need to create a hook that you reference from within a .cfg file instead of creating a setuptools extension or so) but without fixing most of the problems. The largest benefit I see to switching to this right now is that it would enable us to have build time dependencies that were controlled by pip rather than installed implicitly via the execution of the setup.py. That doesn't feel like a big enough benefit to me to do a mass shakeup of what we recommend and tell people to do. Having people adjust and change and do something new requires effort, and we need something to justify that effort to other people and I don't think that this PEP has something we can really use to justify that effort.
From my naive perspective, this proposal seems to help push forward a decoupling from distutils/setuptools being the only way you can properly build Python projects (which is what I think we are all after), and will hopefully eventually free pip up to simply do orchestration.
I *do* think that there is a core of some ideas here that are valuable, and in fact are similar to some ideas I've had. The main flaw I see here is that it doesn't really fix sdists, it takes a solution that would work for VCS checkouts and then reuses it for sdists. In my mind, the supported flow for package installation would be:
VCS/Bare Directory -> Source Distribution -> Wheel
This would (eventually) be the only path that was supported for installation but you could "enter" the path at any stage. For example, if there is a Wheel already available, then you jump right on at the end and just install that, if there is a sdist available then pip first builds it into a wheel and then installs that, etc.
I think your PEP is something like what the VCS/Bare Directory to sdist tooling could look like, but I don't think it's what the sdist to wheel path should look like.
Is there another proposal I'm unaware of for the sdist -> wheel step that is build tool-agnostic? I'm all for going with the best solution, but there has to be an actual alternative to compare against, and I don't know of any others right now, and this proposal does seem to move things forward in a reasonable fashion.

The first thing that immediately stood out to me, is that it's recommending that downstream redistributors like Debian, Fedora, etc utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors. Debian for instance has policy that requires the use of building all of its packages from Source, not from anything else and Wheels are not a source package. While it can theoretically work for pure python packages, it quickly devolves into a mess when you factor in packages that have any C code whatsoever.
So wouldn't they then download the sdist, build a wheel as an intermediate, and then generate the .deb file?
The new goal, I think, was to have standardized metadata immediately available in an sdist, and to get away from the model where you had to run a build step before you had a metadata artifact. So here, you'd have to build a wheel (potentially one with binary extensions) just to know what the metadata is? That doesn't sound right.
I mean as long as people upload an sdist for those that want to build from source and a wheel for convenience -- which is probably what most people providing wheels do anyway -- then I don't see the problem.
Overall, this feels more like a sidegrade than an upgrade. One major theme throughout the PEP is that we're going to push to rely heavily on wheels as the primary format of installation. While that works well for things like Debian, I don't think it's going to work as well for us. If we were only distributing pure python packages, then yes absolutely, however given that we are not, we have to worry about ABI issues. Given that there are so many different environments that a particular package might be installed into, all with different ABIs we have to assume that installing from source is still going to be a primary path for end users to install and that we are never going to have a world where we can assume a Wheel in a repository.
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
Isn't that what the requirements and requirements-file fields in the _pypackage file provide? Only if you use that requirements-dynamic would it require executing arbitrary code to gather dependency information, or am I missing something?
Those are just requirements to run the build hook, not run-time dependencies.
Is there another proposal I'm unaware for the sdist -> wheel step that is build tool-agnostic?
PEP426 talks about it some https://www.python.org/dev/peps/pep-0426/#metabuild-system

On 2 October 2015 at 20:02, Marcus Smith <qwcode@gmail.com> wrote:
So wouldn't they then download the sdist, build a wheel as an intermediate, and then generate the .deb file?
the new goal I think was to have standardized metadata immediately available in an sdist, and get away from the model, that you had to run a build step, before you had a metadata artifact. so here, you'd have to build a wheel (potentially one with binary extensions) just to know what the metadata is? that doesn't sound right.
I'm uncomfortable with the fact that the proposed sdist format has more or less no metadata of its own (even the filename format is only a recommendation) so (for example) if someone does "pip install foo==1.0" I don't see how pip can find a suitable sdist, if no wheel is available.

I would rather see an sdist format that can be introspected *without* running code or a build tool. Installers and packaging tools like pip need to be able to do that - one of the biggest issues with pip's current sdist handling is that it can't make any meaningful decisions before building at least the egg-info.

Ultimately the question for me is at what point do we require packaging tools like pip (and ad-hoc distribution analysis scripts - I write a lot of those!) to run code from the package in order to continue? I'd like to be able to know, at a minimum, the package name and version, as those are needed to make decisions on whether there is a conflict with an already installed version.

Basically, why aren't you using a PEP 426-compatible metadata format in the sdist (there's no reason not to, you just have to mandate that tools used to build sdists generate that form of metadata file)? You could also say that source trees SHOULD store metadata in the _pypackage directory (in an appropriate defined format, maybe one more suited to human editing than JSON) and tools that work on source trees (build tools, things that create sdists) SHOULD use that data as the definitive source of metadata, rather than using their own configuration. I don't see a problem with allowing source trees to have some flexibility, but sdists are tool-generated and so could easily be required to contain static metadata in a standard format.
Is there another proposal I'm unaware for the sdist -> wheel step that is build tool-agnostic?
PEP426 talks about it some https://www.python.org/dev/peps/pep-0426/#metabuild-system
While the metabuild system is a good long-term goal, I'll take something that's being developed now over a great idea that no-one has time to work on... Wheel came about because Daniel just got on with it.

Having said that, I very strongly prefer an sdist proposal that's compatible with PEP 426 (at least to the extent that tools like wheel already support it). Throwing away all of the work already done on PEP 426 doesn't seem like a good plan.

Paul

On Fri, Oct 2, 2015 at 1:03 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 October 2015 at 20:02, Marcus Smith <qwcode@gmail.com> wrote:
So wouldn't they then download the sdist, build a wheel as an intermediate, and then generate the .deb file?
the new goal I think was to have standardized metadata immediately available in an sdist, and get away from the model, that you had to run a build step, before you had a metadata artifact. so here, you'd have to build a wheel (potentially one with binary extensions) just to know what the metadata is? that doesn't sound right.
I'm uncomfortable with the fact that the proposed sdist format has more or less no metadata of its own (even the filename format is only a recommendation) so (for example) if someone does "pip install foo==1.0" I don't see how pip can find a suitable sdist, if no wheel is available.
About the filename thing: The reason that the draft makes the inclusion of package/version info a SHOULD instead of a MUST is that regardless of what the spec says, all decent installation tools are going to support doing things like
curl https://github.com/numpy/numpy/archive/master.zip -o numpy-master.zip
pip install numpy-master.zip
So we can either handle that by saying that "numpy-master.zip" is an sdist, just not one that we would allow on PyPI (which is what the current draft does), or we could handle it by saying that numpy-master.zip is almost-but-not-quite an sdist, and handling it is a commonly supported extension to the standard. Doesn't really matter that much either way -- just a matter of terminology. Either way the sdists on PyPI are obviously going to be named <package>-<version>.<ext>. For sdists that do have a name/version: it's not really crucial to the proposal that name/version are only in the filename -- they could be repeated inside the file as well. Given that the version number in particular is something that usually would need to be computed at sdist-build-time (from a __version__.py or whatever -- it's very common that the source-of-truth for version numbers is not static), then leaving it out of the static metadata is nice because it makes sdist-building code much simpler -- 90% of the time you could just keep the static metadata file instead of having to rewrite it for each sdist. But that's just an engineering trade-off, it's not crucial to the concept.
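(A minimal illustration of the <package>-<version>.<ext> naming recommendation discussed above -- this helper is hypothetical, not part of the draft or of pip; it just shows that recommended names are trivially parseable while a snapshot like numpy-master.zip simply yields no name/version.)

import re

def parse_sdist_filename(filename):
    # Strip a known archive extension, then split on the last hyphen
    # that is followed by a digit; returns (name, version) or None.
    stem = filename
    for ext in (".zip", ".tar.gz", ".tar.bz2"):
        if stem.endswith(ext):
            stem = stem[:-len(ext)]
            break
    m = re.match(r"^(?P<name>.+)-(?P<version>\d[\w.+]*)$", stem)
    return (m.group("name"), m.group("version")) if m else None

parse_sdist_filename("foo-1.0.tar.gz")    # -> ('foo', '1.0')
parse_sdist_filename("numpy-master.zip")  # -> None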
I would rather see an sdist format that can be introspected *without* running code or a build tool. Installers and packaging tools like pip need to be able to do that - one of the biggest issues with pip's current sdist handling is that it can't make any meaningful decisions before building at least the egg-info.
Another way to look at this is to say that pip's current handling is proof that the build-to-get-metadata strategy is viable :-). It would indeed be nice if this weren't necessary, but the python packaging ecosystem has a long history of trying to make simplifying assumptions that turn out to bite us later... I think this is one of those. Note that for making installation decisions, name + version aren't enough: you also need full dependency information. And dependency information is definitely not fixed at sdist-creation-time.
Ultimately the question for me is at what point do we require packaging tools like pip (and ad-hoc distribution analysis scripts - I write a lot of those!) to run code from the package in order to continue? I'd like to be able to know, at a minimum, the package name and version, as those are needed to make decisions on whether there is a conflict with an already installed version.
Basically, why aren't you using a PEP 426-compatible metadata format in the sdist (there's no reason not to, you just have to mandate that tools used to build sdists generate that form of metadata file)? You could also say that source trees SHOULD store metadata in the _pypackage directory (in an appropriate defined format, maybe one more suited to human editing than JSON) and tools that work on source trees (build tools, things that create sdists) SHOULD use that data as the definitive source of metadata, rather than using their own configuration. I don't see a problem with allowing source trees to have some flexibility, but sdists are tool-generated and so could easily be required to contain static metadata in a standard format.
Is there another proposal I'm unaware of for the sdist -> wheel step that is build tool-agnostic?
PEP426 talks about it some https://www.python.org/dev/peps/pep-0426/#metabuild-system
While the metabuild system is a good long-term goal, I'll take something that's being developed now over a great idea that no-one has time to work on... Wheel came about because Daniel just got on with it.
Having said that, I very strongly prefer a sdist proposal that's compatible with PEP 426 (at least to the extent that tools like wheel already support it). Throwing away all of the work already done on PEP 426 doesn't seem like a good plan.
Nothing is being thrown away -- the proposal is just that sdists and wheels are different things, so we should think of PEP 426 as wheel metadata, rather than all metadata. -n -- Nathaniel J. Smith -- http://vorpus.org

On Fri, Oct 2, 2015 at 2:45 PM, Nathaniel Smith <njs@pobox.com> wrote:
I would rather see an sdist format that can be introspected *without* running code or a build tool.
indeed -- this has come up a lot on this list, for binary dists, too, of course. but the "build script as Turing complete" requirement [someone posted that in this thread...] is there, too... but maybe we can get a long way with convention, without changing the tools. A setup.py is typically a bunch of stuff that builds up the setup options, and then a call to setup(), passing in the objects created as various parameters. But if we tried to establish a convention that your setup.py would do:
setup_options_dict = a_bunch_of_stuff_that_builds_up_a_dict_of_options()
setup(**setup_options_dict)
then setup_options_dict could be introspected without actually creating a setup object. and in the easy cases, setup_options_dict could be completely declarative, and maybe even stored in another file. and in the complex cases, it could still have all the parts that could be pre-declared declarative, so setup.py would be:
setup_options_dict = load_options_dict("setup_options.txt")
[do assorted complex stuff to edit/add to setup_options_dict]
setup(**setup_options_dict)
would this move us toward an as-declarative-as-possible setup.py? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
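(To make that convention concrete, here is a rough sketch of such a setup.py, assuming a hypothetical setup_options.json next to it and plain setuptools; neither the filename nor the load step is anything standard, it's just an illustration of the "declarative dict plus optional imperative tweaks" idea.)

import json
import sys
from setuptools import setup

# the fully declarative part, kept in a separate static file
with open("setup_options.json") as f:
    setup_options_dict = json.load(f)

# the imperative part, only needed in the complex cases
if sys.platform == "win32":
    setup_options_dict.setdefault("install_requires", []).append("colorama")

setup(**setup_options_dict)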

On 2 October 2015 at 22:45, Nathaniel Smith <njs@pobox.com> wrote:
I'm uncomfortable with the fact that the proposed sdist format has more or less no metadata of its own (even the filename format is only a recommendation) so (for example) if someone does "pip install foo==1.0" I don't see how pip can find a suitable sdist, if no wheel is available.
About the filename thing:
The reason that the draft makes the inclusion of package/version info a SHOULD instead of a MUST is that regardless of what the spec says, all decent installation tools are going to support doing things like
curl https://github.com/numpy/numpy/archive/master.zip -o numpy-master.zip
pip install numpy-master.zip
So we can either handle that by saying that "numpy-master.zip" is an sdist, just not one that we would allow on PyPI (which is what the current draft does), or we could handle it by saying that numpy-master.zip is almost-but-not-quite an sdist, and handling it is a commonly supported extension to the standard. Doesn't really matter that much either way -- just a matter of terminology. Either way the sdists on PyPI are obviously going to be named <package>-<version>.<ext>.
OK, that's a good point, and I never felt it was crucial that the name/version be encoded in the filename. But having them in some form of static metadata should be mandatory. Your _pypackage.cfg doesn't contain the package name or version, so how would I get them without running code? That's my real point.
Given that the version number in particular is something that usually would need to be computed at sdist-build-time (from a __version__.py or whatever -- it's very common that the source-of-truth for version numbers is not static), then leaving it out of the static metadata is nice because it makes sdist-building-code much simpler -- 90% of the time you could just keep the static metadata file instead of having to rewrite it for each sdist. But that's just an engineering trade-off, it's not crucial to the concept.
I'm willing to allow for it being non-static in the source tree, but not in the sdist.
I would rather see an sdist format that can be introspected *without* running code or a build tool. Installers and packaging tools like pip need to be able to do that - one of the biggest issues with pip's current sdist handling is that it can't make any meaningful decisions before building at least the egg-info.
Another way to look at this is to say that pip's current handling is proof that the build-to-get-metadata strategy is viable :-).
Not if you look at the bug reports for pip that can be traced back to needing to run setup.py egg-info to get metadata, or other variations on not having static introspectable metadata in sdists. Paul

On Fri, Oct 2, 2015 at 3:26 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 October 2015 at 22:45, Nathaniel Smith <njs@pobox.com> wrote:
I'm uncomfortable with the fact that the proposed sdist format has more or less no metadata of its own (even the filename format is only a recommendation) so (for example) if someone does "pip install foo==1.0" I don't see how pip can find a suitable sdist, if no wheel is available.
About the filename thing:
The reason that the draft makes the inclusion of package/version info a SHOULD instead of a MUST is that regardless of what the spec says, all decent installation tools are going to support doing things like
curl https://github.com/numpy/numpy/archive/master.zip -o numpy-master.zip
pip install numpy-master.zip
So we can either handle that by saying that "numpy-master.zip" is an sdist, just not one that we would allow on PyPI (which is what the current draft does), or we could handle it by saying that numpy-master.zip is almost-but-not-quite an sdist, and handling it is a commonly supported extension to the standard. Doesn't really matter that much either way -- just a matter of terminology. Either way the sdists on PyPI are obviously going to be named <package>-<version>.<ext>.
OK, that's a good point, and I never felt it was crucial that the name/version be encoded in the filename. But having them in some form of static metadata should be mandatory. Your _pypackage.cfg doesn't contain the package name or version, so how would I get them without running code? That's my real point.
Well, first, it's just not possible for a devel snapshot like numpy-master.zip or a VCS checkout to contain static version metadata, since the actual version of the generated wheels *will* be determined by running arbitrary code (e.g. 'git rev-parse HEAD'). So we're only talking about tagged/released source trees. Then the argument would be, what are you going to do with that name/version information? If the answer is "decide to install it", then (a) if you want to support installation from VCS snapshots (and you do) then your tool already has to support running arbitrary code to get the version number, and (b) installing the package will also certainly require running arbitrary code even if you have a nice official-release sdist, so. OTOH the twine upload case that Donald mentioned is a good example of an operation that might actually want some metadata from release sdists specifically :-). I'm not opposed to adding it if there's a clear use case, I just don't think we should try to shove every piece of wheel metadata into the sdist without a clear understanding of how they make sense and solve a problem *for sdists*. [...]
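(For illustration only: this is roughly what "computing the version at build time" can look like inside a build hook. The base version string and the +<sha> suffix convention here are assumptions, not anything the draft specifies.)

import subprocess

def compute_version(base="1.0.dev0"):
    try:
        sha = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"]).decode().strip()
        return "{0}+{1}".format(base, sha)   # e.g. "1.0.dev0+abc1234"
    except (OSError, subprocess.CalledProcessError):
        return base   # not a VCS checkout, e.g. an unpacked release sdist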
I would rather see an sdist format that can be introspected *without* running code or a build tool. Installers and packaging tools like pip need to be able to do that - one of the biggest issues with pip's current sdist handling is that it can't make any meaningful decisions before building at least the egg-info.
Another way to look at this is to say that pip's current handling is proof that the build-to-get-metadata strategy is viable :-).
Not if you look at the bug reports for pip that can be traced back to needing to run setup.py egg-info to get metadata, or other variations on not having static introspectable metadata in sdists.
That sounds interesting! Can you elaborate? Links? I know that one unpleasant aspect of the current design is that the split between egg-info and actual building creates the possibility for time-of-definition-to-time-of-use bugs, where the final wheel hopefully matches what egg-info said it would, but in practice there could be skew. (Of course this is true in any system which represents this information in more than one place -- e.g. in sdist metadata and also in wheel metadata -- but right now it's particularly bad in cases where you can't actually get all the arguments you want to pass to setup() without running some code, but the code you need to run needs to be fetched via setup_requires=..., so you have to lie to setup() during the egg-info operation and hope that everything will work out in the end.) This is the motivation for the draft PEP to drop egg-info as a separate operation. But there are certainly people who know more about the internal details of what pip needs than I do, and I'd love to hear more. -n -- Nathaniel J. Smith -- http://vorpus.org

On October 2, 2015 at 2:41:20 PM, Brett Cannon (brett@python.org) wrote:
Is there another proposal I'm unaware of for the sdist -> wheel step that is build tool-agnostic? I'm all for going with the best solution but there has to be an actual alternative to compare against and I don't know of any others right now and this proposal does seem to move things forward in a reasonable fashion.
As Marcus said, it was touched on in PEP 426 but there isn’t a fully fleshed out PEP (or even a sort of fleshed out PEP) but the high level idea isn’t super hard, take PEP 426 metadata + some sdist specific metadata and stick it in a tarball or something and then you have sdist 2.0. Part of that sdist specific metadata would be describing in some way how to build a wheel out of this sdist (rather than just assuming distutils/setuptools). I think we need to push back against partial solutions to problems that we have without at least some semblance of a plan for how to solve the rest of the problems we have, or to explicitly declare them unsolvable. That doesn't mean that we need to turn everything into an overengineered mess where we try to solve every problem ever, but we need to be careful that we minimize churn where possible. An example just recently where we had this was we accepted PEP 438 2 years ago as a partial solution to try and "move things forward" and then it turned out that solution was, well, partial, and when we removed the partial solution for a full solution we had the folks who decided to rely on the partial solution upset. IOW we need to balance between avoiding churn and making a change "worth" it, and in my opinion this idea here doesn't solve enough problems to make it worth it. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 2 October 2015 at 20:45, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 2:41:20 PM, Brett Cannon (brett@python.org) wrote:
Is there another proposal I'm unaware of for the sdist -> wheel step that is build tool-agnostic? I'm all for going with the best solution but there has to be an actual alternative to compare against and I don't know of any others right now and this proposal does seem to move things forward in a reasonable fashion.
As Marcus said, it was touched on in PEP 426 but there isn’t a fully fleshed out PEP (or even a sort of fleshed out PEP) but the high level idea isn’t super hard, take PEP 426 metadata + some sdist specific metadata and stick it in a tarball or something and then you have sdist 2.0. Part of that sdist specific metadata would be describing in some way how to build a wheel out of this sdist (rather than just assuming distutils/setuptools).
I think we need to push back against partial solutions to problems that we have without at least some semblance of a plan for how to solve the rest of the problems we have, or to explicitly declare them unsolvable. That doesn't mean that we need to turn everything into an overengineered mess where we try to solve every problem ever, but we need to be careful that we minimize churn where possible. An example just recently where we had this was we accepted PEP 438 2 years ago as a partial solution to try and "move things forward" and then it turned out that solution was, well, partial, and when we removed the partial solution for a full solution we had the folks who decided to rely on the partial solution upset.
IOW we need to balance between avoiding churn and making a change "worth" it, and in my opinion this idea here doesn't solve enough problems to make it worth it.
That's a fair point. But I don't see any reason Nathaniel's proposal *couldn't* be that solution. I'd want to see the sdist format required to include static metadata, and the metadata format to be PEP 426, but neither of those seem incompatible with the ideas behind the proposal. Maybe I'm missing something massive, but I don't see a *huge* gap between this proposal and the basic ideas behind the metabuild concept - the sdist should define how to build it (in terms of required packages/tools for the build, and commands/hooks to call to do the specific build steps - make a wheel, do an in-place install). The biggest sticking point would be if Nathaniel is adamantly opposed to static metadata. Is there a good enumeration anywhere of the problems sdist 2.0 / the metabuild system needs to solve? I can think of: 1. Metadata needs to be discoverable without running code. 2. Packages should be able to specify the tools needed to build them. 3. Installers should only interact with sdists through well-defined entry points. 4. Build processes required are (a) create a wheel, (b) do a "develop" style in-place installation. 5. (Maybe) There needs to be better means of handling build errors. The proposal here seems to at least move towards better solutions for 2, 3, and 4. Paul

I'm speaking to the proposal as currently written. It's not completely off base for what I think a solution could be. I think part of the problem though is we don't have all the building blocks figured out and standardized yet. PEP426 has stalled (I plan to pick it up once Warehouse is deployed but someone else could do that) and we should probably get the environment markers sorted out because they are going to be even more important for a static sdist. I think that the current proposal conflates a VCS checkout with an sdist too much. As Paul said, sdists are generated and are not generally for human consumption or creation. We should strictly define what it looks like, but have pluggable build systems. I don't think we need anything more complex than the ability for an sdist to say that it gets built using X hook. Give that hook a standard API and then any tool can be a first class build tool. Sent from my iPhone
On Oct 2, 2015, at 4:14 PM, Paul Moore <p.f.moore@gmail.com> wrote:
That's a fair point. But I don't see any reason Nathaniel's proposal *couldn't* be that solution. I'd want to see the sdist format required to include static metadata, and the metadata format to be PEP 426, but neither of those seem incompatible with the ideas behind the proposal.
Maybe I'm missing something massive, but I don't see a *huge* gap between this proposal and the basic ideas behind the metabuild concept

The MEBS idea is inspired by heroku buildpacks where you just ask a list of tools whether they can build something. https://devcenter.heroku.com/articles/buildpacks . The idea would be that pip would use MEBS instead of its setup.py-focused builder. The first available MEBS plugin would notice setup.py and do what pip does now (force setuptools, build in a subprocess).
You should know about flit https://github.com/takluyver/flit and Bento http://cournape.github.io/Bento/ which have their own lightweight metadata formats, which are transformed into standard Python formats by the respective tools.
requires.txt is popular but I'm not a fan of it, it seems like it was invented by people who didn't want to have a proper setup.py for their project.
We have to come up with something simpler than setup.py if we want to get some of the people who don't understand how to write setup.py. Ideally any required new user-editable "which build system" metadata could be boiled down to a single line in setup.cfg. There would be 3 stages: VCS checkout (minimal metadata, a "generate machine readable metadata" step equivalent to "setup.py egg_info") -> new sdist (PEP 376 style static metadata that can be trusted) -> wheel.
(How pip builds a package from source: 1. download sdist; .egg-info directory is almost always present 2. run setup.py egg_info to get dependencies, because the static one is not reliable, because too many requirements lists have 'if' statements 3. compile)
For all the talk about static metadata, the build script in general needs to remain a Turing-complete script. Build systems everywhere are programs to build other programs.
I really like your idea about returning a list of built artifacts. Python packaging is strictly 1:1 source package -> output package but rpm, deb, can generate many packages from a single source package.
I don't think we have to worry that much about Debian & RHEL. They will get over it if setup.py is no longer there. Change brings work but stagnation brings death.
On Fri, Oct 2, 2015 at 2:41 PM Brett Cannon <brett@python.org> wrote:
On Fri, 2 Oct 2015 at 05:08 Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
I've just read over your proposal, but I've also just woken up so I might be a little slow still! After reading what you have, I don't think that this proposal is the right way to go about improving sdists.
The first thing that immediately stood out to me is that it's recommending that downstream redistributors like Debian, Fedora, etc. utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors. Debian for instance has policy that requires building all of its packages from Source, not from anything else, and Wheels are not a source package. While it can theoretically work for pure python packages, it quickly devolves into a mess when you factor in packages that have any C code whatsoever.
So wouldn't they then download the sdist, build a wheel as an intermediate, and then generate the .deb file? I mean as long as people upload an sdist for those that want to build from source and a wheel for convenience -- which is probably what most people providing wheels do anyway -- then I don't see the problem.
Overall, this feels more like a sidegrade than an upgrade. One major theme throughout the PEP is that we're going to push to rely heavily on wheels as the primary format of installation. While that works well for things like Debian, I don't think it's going to work as well for us. If we were only distributing pure python packages, then yes absolutely, however given that we are not, we have to worry about ABI issues. Given that there are so many different environments that a particular package might be installed into, all with different ABIs, we have to assume that installing from source is still going to be a primary path for end users to install and that we are never going to have a world where we can assume a Wheel in a repository.
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
Isn't that what the requirements and requirements-file fields in the _pypackage file provide? Only if you use that requirements-dynamic would it require executing arbitrary code to gather dependency information, or am I missing something?
To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.
Isn't that only if you use the dynamic fields?
Overall, I don't think that this really solves most of the foundational problems with the current format. Largely it feels that what it achieves is shuffling around some logic (you need to create a hook that you reference from within a .cfg file instead of creating a setuptools extension or so) but without fixing most of the problems. The largest benefit I see to switching to this right now is that it would enable us to have build time dependencies that were controlled by pip rather than installed implicitly via the execution of the setup.py. That doesn't feel like a big enough benefit to me to do a mass shakeup of what we recommend and tell people to do. Having people adjust and change and do something new requires effort, and we need something to justify that effort to other people and I don't think that this PEP has something we can really use to justify that effort.
From my naive perspective this proposal seems to help push forward a decoupling from distutils/setuptools as the only way you can properly build Python projects (which is what I think we are all after) and will hopefully eventually free pip up to simply do orchestration.
I *do* think that there is a core of some ideas here that are valuable, and in fact are similar to some ideas I've had. The main flaw I see here is that it doesn't really fix sdists, it takes a solution that would work for VCS checkouts and then reuses it for sdists. In my mind, the supported flow for package installation would be:
VCS/Bare Directory -> Source Distribution -> Wheel
This would (eventually) be the only path that was supported for installation but you could "enter" the path at any stage. For example, if there is a Wheel already available, then you jump right on at the end and just install that, if there is a sdist available then pip first builds it into a wheel and then installs that, etc.
I think your PEP is something like what the VCS/Bare Directory to sdist tooling could look like, but I don't think it's what the sdist to wheel path should look like.
Is there another proposal I'm unaware of for the sdist -> wheel step that is build tool-agnostic? I'm all for going with the best solution but there has to be an actual alternative to compare against and I don't know of any others right now and this proposal does seem to move things forward in a reasonable fashion.

We need to embrace partial solutions and the fine folks who propose them so the whole packaging ecosystem can have some progress. PEP 438 may not be a good analogue to adding a new sdist format since the latter only adds new things that you can do. A new sdist format will inconvenience a much more limited set of people, mainly the pip authors and the OS package maintainers.
Sorry but the section of the PEP that prefixes filenames with _ has distracted the discussion away from the general idea.
Instead of multiple hooks why not a single object exposed through an entry point that has several optional methods?
NO:
[build]
requirements = "flit"
build-wheels = "flit.pypackage_hooks:build_wheels"
build-in-place = "flit.pypackage_hooks:build_in_place"
YES:
[build]
build-system=flit
class ABuildHook:
    def build_wheels(self):
        ...
entry_points = {'new.sdist.build.hooks': ['flit= some_module:ABuildHook']}
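(A sketch of how an installer might consume the "YES" style above, using setuptools entry points; the group name comes straight from Daniel's example and is not actually registered anywhere, and pkg_resources is just one way to do the lookup.)

import pkg_resources

def get_build_hook(build_system):
    # e.g. build_system == "flit", read from a "[build] build-system=flit" line
    for ep in pkg_resources.iter_entry_points("new.sdist.build.hooks"):
        if ep.name == build_system:
            return ep.load()()   # instantiate the hook class, e.g. ABuildHook
    raise RuntimeError("no build hook registered for %r" % (build_system,))

# get_build_hook("flit").build_wheels()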

On Fri, Oct 2, 2015 at 1:24 PM, Daniel Holth <dholth@gmail.com> wrote:
Instead of multiple hooks why not a single object exposed through an entry point that has several optional methods?
NO
[build]
requirements = "flit"
build-wheels = "flit.pypackage_hooks:build_wheels"
build-in-place = "flit.pypackage_hooks:build_in_place"
YES
[build]
build-system=flit
class ABuildHook:
    def build_wheels(self):
        ...
entry_points = {'new.sdist.build.hooks': ['flit= some_module:ABuildHook']}
Mostly because this rules out the possibility of shipping the build hook inside the package being built, which seems like a useful option to leave open (e.g. some people will probably want to vendor their build system for more-or-less legitimate reasons). Notice that the _pypackage directory is added to sys.path before resolving hooks. -n -- Nathaniel J. Smith -- http://vorpus.org
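(Not from the draft text itself -- just a sketch of the resolution order Nathaniel describes: put the _pypackage directory on sys.path first, then import a "module:function"-style hook string, so a build system vendored inside _pypackage/ can be found.)

import importlib
import os
import sys

def resolve_hook(source_tree, hook_spec):
    # hook_spec looks like "flit.pypackage_hooks:build_wheels"
    sys.path.insert(0, os.path.join(source_tree, "_pypackage"))
    module_name, _, attr = hook_spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)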

On October 2, 2015 at 4:24:38 PM, Daniel Holth (dholth@gmail.com) wrote:
We need to embrace partial solutions and the fine folks who propose them so the whole packaging ecosystem can have some progress. PEP 438 may not be a good analogue to adding a new sdist format since the latter only adds new things that you can do. A new sdist format will inconvenience a much more limited set of people, mainly the pip authors and the OS package maintainers.
Packaging formats are a bit like HTTP, "move fast and break things" isn't super great because anytime you add a new format, you have to support that *forever* (or long enough to basically be forever). It's unlike a software package where you can, for example, deprecate and remove something and if someone is still using that feature they just continue to use an older version of your library. Packagers have no control over what version of the tools people are using and they are discouraged (or even disallowed) from going back in time and correcting their older packages. Being conservative in what we accept as a standard and implement in the main tools is a good thing; being liberal in experiments and what we try, and having people try out partial solutions and figure out what works and what doesn't without making it a standard or putting it in pip, is also a good thing. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Fri, Oct 2, 2015 at 2:36 PM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 4:24:38 PM, Daniel Holth (dholth@gmail.com) wrote:
We need to embrace partial solutions and the fine folks who propose them so the whole packaging ecosystem can have some progress. PEP 438 may not be a good analogue to adding a new sdist format since the latter only adds new things that you can do. A new sdist format will inconvenience a much more limited set of people, mainly the pip authors and the OS package maintainers.
Packaging formats are a bit like HTTP, "move fast and break things" isn't super great because anytime you add a new format, you have to support that *forever* (or long enough to basically be forever).
Right: this is why it's important for me to make the case that putting full PEP 426 metadata in sdists is not just temporarily inconvenient, but actually conceptually the wrong thing to do. -n -- Nathaniel J. Smith -- http://vorpus.org

On Oct 2, 2015 5:18 PM, "Nathaniel Smith" <njs@pobox.com> wrote:
On Fri, Oct 2, 2015 at 2:36 PM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 4:24:38 PM, Daniel Holth (dholth@gmail.com) wrote:
We need to embrace partial solutions and the fine folks who propose them so the whole packaging ecosystem can have some progress. PEP 438 may not be a good analogue to adding a new sdist format since the latter only adds new things that you can do. A new sdist format will inconvenience a much more limited set of people, mainly the pip authors and the OS package maintainers.
Packaging formats are a bit like HTTP, "move fast and break things" isn't super great because anytime you add a new format, you have to support that *forever* (or long enough to basically be forever).
Right: this is why it's important for me to make the case that putting full PEP 426 metadata in sdists is not just temporarily inconvenient, but actually conceptually the wrong thing to do.
pydist.jsonld would be a helpful metadata file to add to an sdist, as well as URIs to dependencies with rules/constraints in the reified edges, drawn from e.g.:
- setup.py
- requirements.txt
- requirements.lock/versions/freeze.txt
- requirements.peep.txt
- requirements-dev/test/docs.txt
- [versions.cfg]
-n
-- Nathaniel J. Smith -- http://vorpus.org

On Fri, Oct 2, 2015 at 4:58 AM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
We realized that actually as far as we could tell, it wouldn't be that hard at this point to clean up how sdists work so that it would be possible to migrate away from distutils. So we wrote up a little draft proposal.
The main question is, does this approach seem sound?
I've just read over your proposal, but I've also just woken up so I might be a little slow still! After reading what you have, I don't think that this proposal is the right way to go about improving sdists.
The first thing that immediately stood out to me is that it's recommending that downstream redistributors like Debian, Fedora, etc. utilize Wheels instead of the sdist to build their packages from. However, that is not really going to fly with most (all?) of the downstream redistributors. Debian for instance has policy that requires building all of its packages from Source, not from anything else, and Wheels are not a source package. While it can theoretically work for pure python packages, it quickly devolves into a mess when you factor in packages that have any C code whatsoever.
I think this was addressed downthread -- the idea would be that Debian would build from sdist, with a two step process: convert sdist to wheels, repack wheels into binary .deb.
Overall, this feels more like a sidegrade than an upgrade. One major theme throughout the PEP is that we're going to push to rely heavily on wheels as the primary format of installation. While that works well for things like Debian, I don't think it's going to work as well for us. If we were only distributing pure python packages, then yes absolutely, however given that we are not, we have to worry about ABI issues. Given that there are so many different environments that a particular package might be installed into, all with different ABIs, we have to assume that installing from source is still going to be a primary path for end users to install and that we are never going to have a world where we can assume a Wheel in a repository.
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
What are the "dependencies of a source distribution"? Do you mean the runtime dependencies of the wheels that will be built from a source distribution? If you need that metadata to be statically in the sdist, then you might as well give up now because it's simply impossible. As the very simplest example, every package that uses the numpy C API gets a runtime dependency on "numpy >= [whatever version happened to be installed on the *build* machine]". There are plenty of more complex examples too (e.g. ones that involve build/configure-time decisions about whether to rely on particular system libraries, or build/configure-time decisions about whether particular packages should even be built). For comparison, here's the Debian source package metadata: https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-debiansourc... Note that the only mandatory fields are format version / package name / package version / maintainer / checksums. The closest they come to making promises about the built packages are the Package-List and Binary fields which provide an optional hint about what binary packages will be built, and are allowed to contain lies (e.g. they explicitly don't guarantee that all the binary packages named will actually be produced on every architecture). The only kind of dependencies that a source package can declare are build-depends.
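(To make the numpy example concrete: a hypothetical build hook could only pin the runtime requirement like this, on the build machine, which is exactly why it can't be written down statically in the sdist.)

import numpy

def numpy_runtime_requirement():
    # numpy.__version__ is whatever happens to be installed on the machine
    # where the wheel is being built, e.g. "1.9.3"
    return "numpy >= {0}".format(numpy.__version__)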
To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.
Okay, what metadata do you need? We certainly could put name / version kind of stuff in there. We left it out because we weren't sure what was necessary and it's easy to add later, but anything that's needed by twine fits neatly into the existing text saying that we should "include extra metadata in source distributions if it helps solve specific problems that are unique to distribution" -- twine uploads definitely count.
Overall, I don't think that this really solves most of the foundational problems with the current format. Largely it feels that what it achieves is shuffling around some logic (you need to create a hook that you reference from within a .cfg file instead of creating a setuptools extension or so) but
numpy.distutils is the biggest distutils/setuptools extension around, and everyone involved in maintaining it wants to kill it with fire :-). That's a problem...
without fixing most of the problems. The largest benefit I see to switching to this right now is that it would enable us to have build time dependencies that were controlled by pip rather than installed implicitly via the execution of the setup.py.
Yes, this problem means that literally every numerical python package currently has a broken setup.py.
That doesn't feel like a big enough benefit to me to do a mass shakeup of what we recommend and tell people to do. Having people adjust and change and do something new requires effort, and we need something to justify that effort to other people and I don't think that this PEP has something we can really use to justify that effort.
The end-user adjustment is teaching people to switch to always using pip to install packages -- this seems like something we will certainly do sooner or later, so we might as well get started. And it's already actually the right thing to do -- if you use 'setup.py install' then you get a timebomb in your venv where later upgrades may leave you with a broken package :-(. (This is orthogonal to the actual PEP.) In the long run, the idea that every package has to contain code that knows how to implement installation in every possible configuration (--user? --single-version-externally-managed?) is clearly broken, and teaching people to use 'pip install' is obviously the only sensible alternative. -n -- Nathaniel J. Smith -- http://vorpus.org

On 2 October 2015 at 21:19, Nathaniel Smith <njs@pobox.com> wrote:
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
What are the "dependencies of a source distribution"? Do you mean the runtime dependencies of the wheels that will be built from a source distribution?
If you need that metadata to be statically in the sdist, then you might as well give up now because it's simply impossible.
As the very simplest example, every package that uses the numpy C API gets a runtime dependency on "numpy >= [whatever version happened to be installed on the *build* machine]". There are plenty of more complex examples too (e.g. ones that involve build/configure-time decisions about whether to rely on particular system libraries, or build/configure-time decisions about whether particular packages should even be built).
I'm really not at all clear what you're saying here. It's quite possible that those of us who don't understand the complexities of the scientific/numpy world are missing something important, but if so it would be useful if you could spell out the problems in detail.
From my point of view, it's not a source distribution or a binary distribution that depends on something (numpy or whatever) - it's the *project*. If project foo needs numpy to work, it depends on numpy. If it depends on features in numpy 1.9, it depends on numpy>=1.9. Optional dependencies are covered by extras, and environment specific dependencies are covered by environment markers.[1] That remains true for all wheels that are built from that project, for whatever platform using whatever tools. It should also be true for the source distribution, precisely *because* it's independent of the build process.
I can understand that a binary wheel may need a certain set of libraries installed - but that's about the platform tags that are part of the wheel definition, not about dependencies. Platform tags are an ongoing discussion, and a good example of a partial solution that needs to be extended, certainly, but they aren't really relevant in any way that I can see to how the build chain works.
You seem to be saying that wheels need a dependency on "the version of numpy they were built against". That sounds to me like a binary compatibility requirement that platform tags are intended to cover. It may well be a requirement that platform tags need significant enhancement (maybe even redesign) to cover, but it's not a dependency in the sense that pip and the packaging PEPs use the term. And if my understanding is correct, I'm against trying to fit that information into a dependency simply to work around the current limitations of the platform tag mechanism.
I'm all in favour of new initiatives to make progress in areas that are currently stalled (we definitely need people willing to contribute) but we really don't have the resources to throw away the progress we've already made. Even if some of the packaging PEPs are still works in progress, what is there represents an investment we need to build on, not bypass. Paul
[1] If extras and environment markers don't cover the needs of scientific modules, we need some input into their design from the scientific community. But again, let's not throw away the work that's already done.

I'm not sure if I understand what Nathaniel is getting at, but...
As the very simplest example, every package that uses the numpy C API
gets a runtime dependency on "numpy >= [whatever version happened to be installed on the *build* machine]".
From my point of view, it's not a source distribution or a binary distribution that depends on something (numpy or whatever) - it's the *project*. If project foo needs numpy to work, it depends on numpy. If it depends on features in numpy 1.9, it depends on numpy>=1.9.
Here is the gap (I'm a bit confused about what a "project" is -- so I'll use the term "package", meaning a python package). A given package might depend on numpy, as you say, and it may work with all numpy versions 1.6 to 1.9. Fine, so we specify that in install_requires. And this should be the dependency in the sdist, too. If the package is pure python, this is fine and done. But if the package has some extension code that uses the numpy C API (a very common occurrence), then when it is built, it will only work (reliably) with the version of numpy it was built with. So the project itself, and the sdist, depend on numpy >=1.6, but a built binary wheel depends on numpy == 1.7 (for instance). Which requires a binary (wheel) dependency that is somewhat different than the source dependency. In a way, this is a lot like a python package that may work fine on py2.7 to py3.5, but a binary wheel is for py3.4 specifically (yes?) Note that conda, being developed originally for scipy, has packages like: gsw-3.0.3-np19py27_0.tar.bz2 so this binary depends specifically on py2.7 and numpy1.9 but once you get beyond python + numpy, you could get a REALLY BIG matrix of possible version combinations! nevertheless, I think it would be really helpful to have the concept of a binary dependency that is distinct from the "package" dependency. Maybe down to "this wheel depends on this other particular wheel" - not only version, but also wheel specific. That would open the door to making wheels for binary non-python dependencies like libpng: then other wheels would depend on the libpng wheel, but different wheels built from the same package might have those dependencies handled differently, particularly on different platforms. I can understand that a binary wheel may need a certain set of
libraries installed - but that's about the platform tags that are part of the wheel definition, not about dependencies. Platform tags are an ongoing discussion, and a good example of a partial solution that needs to be extended, certainly, but they aren't really relevant in any way that I can see to how the build chain works.
I haven't followed the discussion well, but I really think platform tags are never going to be specific enough to handle these use-cases. And these really ARE dependencies, even if they aren't pure-python ones...
And if my understanding is correct, I'm against trying to fit that information into a dependency simply to work around the current limitations of the platform tag mechanism.
I can't imagine how you could extend platform tags to cover all of this -- but maybe I'm being unimaginative... BTW, numpy is nearly unique here -- I can't think of any other package that is a python package with a C API that is widely used -- most other cases are either a python package OR a regular old compiled lib. Which is probably why conda can get away with essentially special casing it. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Oct 2, 2015 at 1:42 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 October 2015 at 21:19, Nathaniel Smith <njs@pobox.com> wrote:
One of the problems with the current system, is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
What are the "dependencies of a source distribution"? Do you mean the runtime dependencies of the wheels that will be built from a source distribution?
If you need that metadata to be statically in the sdist, then you might as well give up now because it's simply impossible.
As the very simplest example, every package that uses the numpy C API gets a runtime dependency on "numpy >= [whatever version happened to be installed on the *build* machine]". There are plenty of more complex examples too (e.g. ones that involve build/configure-time decisions about whether to rely on particular system libraries, or build/configure-time decisions about whether particular packages should even be built).
I'm really not at all clear what you're saying here. It's quite possible that those of us who don't understand the complexities of the scientific/numpy world are missing something important, but if so it would be useful if you could spell out the problems in detail.
From my point of view, it's not a source distribution or a binary distribution that depends on something (numpy or whatever) - it's the *project*. If project foo needs numpy to work, it depends on numpy. If it depends on features in numpy 1.9, it depends on numpy>=1.9. Optional dependencies are covered by extras, and environment specific dependencies are covered by environment markers.[1] That remains true for all wheels that are built from that project, for whatever platform using whatever tools. It should also be true for the source distribution, precisely *because* it's independent of the build process.
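For concreteness, here is a minimal sketch of the kind of project-level declaration Paul is describing, using today's setuptools syntax. The package names are placeholders, not taken from any real project:

    # setup.py fragment -- the *project-level* dependency story: extras for
    # optional features, an environment marker for platform-specific needs.
    from setuptools import setup

    setup(
        name="foo",
        version="1.0",
        # Unconditional runtime dependency, true for the sdist and for every
        # wheel built from it.
        install_requires=["numpy>=1.9"],
        extras_require={
            # Optional feature, requested as "foo[plotting]".
            "plotting": ["matplotlib"],
            # Conditional dependency via an environment marker: Windows only.
            ':sys_platform == "win32"': ["colorama"],
        },
    )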
"Project" is a pretty messy concept. Obviously in simple cases there's a one-to-one mapping between project <-> wheel <-> importable package, but this breaks down quickly in edge cases. Consider a project that provides builds multiple wheels out of the same source tree. You obviously can't expect that all of these packages will have the same dependencies. This situation is not common today for Python packages, but the only reason for that is that distutils makes it really hard to do -- it's extremely common in other package ecosystems, and the advantages are obvious. E.g., maybe numpy.distutils should be split into a separately installable package from numpy -- there's no technical reason that this should mean we are now forced to move the code for it into its own VCS repository.
I can understand that a binary wheel may need a certain set of libraries installed - but that's about the platform tags that are part of the wheel definition, not about dependencies. Platform tags are an ongoing discussion, and a good example of a partial solution that needs to be extended, certainly, but they aren't really relevant in any way that I can see to how the build chain works.
(I assume that by "platform tags" you mean what PEP 426 calls "environment markers".)

Environment markers are really useful for extending the set of cases that can be handled by a single architecture-dependent wheel. And they're a good fit for that environment, given that wheels can't contain arbitrary code.

But they're certainly never going to be adequate to provide a single static description of every possible build configuration of every possible project. And installing an sdist already requires arbitrary code execution, so it doesn't make sense to try to build some elaborate system to avoid arbitrary code execution just for the dependency specification.

You're right that in a perfect future world numpy C API related dependencies would be handled by some separate ABI-tracking mechanism similar to how the CPython ABI is tracked, so here are some other examples of why environment markers are inadequate:

In the future it will almost certainly be possible to build numpy in two different configurations: one where it expects to find BLAS inside a wheel distributed for this purpose (e.g. this is necessary to provide high-quality windows wheels), and one where it expects to find BLAS installed on the system. This decision will *not* be tied to the platform, but be selectable at build time. E.g., on OS X there is a system-provided BLAS library, but it has some issues. So the default wheels on PyPI will probably act like windows and depend on a BLAS-package that we control, but there will also be individual users who prefer to build numpy in the configuration where it uses the system BLAS, so we definitely need to support both options on OS X. Now the problem: There will never be a single environment marker that you can stick into a wheel or sdist that says "we depend on the 'pyblas' package if the system is OS X (ok) and the user set this flag in this configuration file during the build process (wait wut)".

Similarly, I think someone was saying in a discussion recently that lxml supports being built either in a mode where it requires libxml be available on the system, or else it can be statically linked. Even if in the future we start having metadata that lets us describe dependencies on external system libraries, it's never going to be the case that we can put the *same* dependency metadata into wheels that are built using these two configurations.
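To make the BLAS example concrete, the build-time branching being described might look roughly like the following inside a build script. This is only a sketch: the configuration flag and the 'pyblas' name are hypothetical, and nothing here comes from the draft PEP:

    # Hypothetical fragment of a numpy-style build script. The runtime
    # dependencies of the *wheel* are decided here, at build time, from a
    # configuration setting -- which is exactly what a single static
    # environment marker in an sdist cannot express.
    import os

    wheel_requires = []
    if os.environ.get("NUMPY_USE_BLAS_WHEEL", "1") == "1":
        # Configuration A: link against a BLAS shipped as its own wheel.
        wheel_requires.append("pyblas")  # hypothetical package name
    else:
        # Configuration B: link against the system BLAS; no extra wheel is
        # required, but the binary only works where that library is present.
        pass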
You seem to be saying that wheels need a dependency on "the version of numpy they were built against". That sounds to me like a binary compatibility requirement that platform tags are intended to cover. It may well be a requirement that platform tags need significant enhancement (maybe even redesign) to cover, but it's not a dependency in the sense that pip and the packaging PEPs use the term. And if my understanding is correct, I'm against trying to fit that information into a dependency simply to work around the current limitations of the platform tag mechanism.
I'm all in favour of new initiatives to make progress in areas that are currently stalled (we definitely need people willing to contribute) but we really don't have the resources to throw away the progress we've already made. Even if some of the packaging PEPs are still works in progress, what is there represents an investment we need to build on, not bypass.
Paul
[1] If extras and environment markers don't cover the needs of scientific modules, we need some input into their design from the scientific community. But again, let's not throw away the work that's already done.
As far as sdists go, you can either cover 90% of the cases by building increasingly elaborate metadata formats, or you can cover 100% of the cases by keeping things simple... -n -- Nathaniel J. Smith -- http://vorpus.org

On 2 October 2015 at 23:15, Nathaniel Smith <njs@pobox.com> wrote:
"Project" is a pretty messy concept. Obviously in simple cases there's a one-to-one mapping between project <-> wheel <-> importable package, but this breaks down quickly in edge cases.
I mistakenly used "project" in an attempt to avoid confusion resulting from me using the word "distribution" as a more general term than the way you were using "source distribution" or "binary distribution". Clearly I failed and made things more confusing. I use the term "distribution" in the sense used here https://packaging.python.org/en/latest/glossary/#term-distribution-package. Note that this is in contrast to the terms "source distribution" and "binary distribution" or "built distribution" in the same page. Sorry for confusing things. I'll stick to the terminology as in the PUG glossary from now on.
Consider a project that builds multiple wheels out of the same source tree. You obviously can't expect that all of these packages will have the same dependencies.
Correct. But a distribution can and should (I believe) have the same dependencies for all of the source and built distributions derived from it.
This situation is not common today for Python packages, but the only reason for that is that distutils makes it really hard to do -- it's extremely common in other package ecosystems, and the advantages are obvious. E.g., maybe numpy.distutils should be split into a separately installable package from numpy -- there's no technical reason that this should mean we are now forced to move the code for it into its own VCS repository.
I'm lost here, I'm afraid. Could you rephrase this in terms of the definitions from the PUG glossary? It sounds to me like the VCS repository is the project, which contains multiple distributions. I don't see how that's particularly hard. Each distribution just has its own subdirectory (and setup.py) in the VCS repository...
(I assume that by "platform tags" you mean what PEP 426 calls "environment markers".)
Nope, I mean as defined in PEP 425. The platform tag is part of the compatibility tag. Maybe I meant the ABI tag, I don't really follow the distinctions.
Environment markers are really useful for extending the set of cases that can be handled by a single architecture-dependent wheel. And they're a good fit for that environment, given that wheels can't contain arbitrary code.
But they're certainly never going to be adequate to provide a single static description of every possible build configuration of every possible project. And installing an sdist already requires arbitrary code execution, so it doesn't make sense to try to build some elaborate system to avoid arbitrary code execution just for the dependency specification.
You're right that in a perfect future world numpy C API related dependencies would be handled by some separate ABI-tracking mechanism similar to how the CPython ABI is tracked, so here are some other examples of why environment markers are inadequate:
In the future it will almost certainly be possible to build numpy in two different configurations: one where it expects to find BLAS inside a wheel distributed for this purpose (e.g. this is necessary to provide high-quality windows wheels), and one where it expects to find BLAS installed on the system. This decision will *not* be tied to the platform, but be selectable at build time. E.g., on OS X there is a system-provided BLAS library, but it has some issues. So the default wheels on PyPI will probably act like windows and depend on a BLAS-package that we control, but there will also be individual users who prefer to build numpy in the configuration where it uses the system BLAS, so we definitely need to support both options on OS X. Now the problem: There will never be a single environment marker that you can stick into a wheel or sdist that says "we depend on the 'pyblas' package if the system is OS X (ok) and the user set this flag in this configuration file during the build process (wait wut)".
Similarly, I think someone was saying in a discussion recently that lxml supports being built either in a mode where it requires libxml be available on the system, or else it can be statically linked. Even if in the future we start having metadata that lets us describe dependencies on external system libraries, it's never going to be the case that we can put the *same* dependency metadata into wheels that are built using these two configurations.
This is precisely the very complex issue that's being discussed under the banner of extending compatibility tags in a way that gives a viable but practical way of distinguishing binary wheels. You can either see that as a discussion about "expanding compatibility tags" or "finding something better than compatibility tags". I don't have much of a stake in that discussion, as the current compatibility tags suit my needs fine, as a Windows user. The issues seem to be around Linux and possibly some of the complexities around binary dependencies for numerical libraries.

But the key point here is that I see the solution for this as being about distinguishing the "right" wheel for the target environment. It's not about anything that should reach back to sdists. Maybe a solution will involve a PEP 426 metadata enhancement that adds metadata that's only valid in binary distributions and not in source distributions, but that's fine by me. But it won't replace the existing dependency data, which *is* valid at the sdist level.

At least as far as I can see - I'm willing to be enlightened. But your argument seems to be that sdist-level dependency information should be omitted because more detailed ABI compatibility data *might* be needed at the wheel level for some packages. I don't agree with that - we still need the existing metadata, even if more might be required in specialist cases.
[1] If extras and environment markers don't cover the needs of scientific modules, we need some input into their design from the scientific community. But again, let's not throw away the work that's already done.
As far as sdists go, you can either cover 90% of the cases by building increasingly elaborate metadata formats, or you can cover 100% of the cases by keeping things simple...
But your argument seems to be that having metadata generated from package build code is "simpler". My strong opinion, based on what I've seen of the problems caused by having metadata in an "executable setup.py", is that static metadata is far simpler.

I don't believe that the cost of changing to a new system can be justified *without* getting the benefits of static metadata.

Paul

On Oct 2, 2015 5:53 PM, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 2 October 2015 at 23:15, Nathaniel Smith <njs@pobox.com> wrote:
"Project" is a pretty messy concept. Obviously in simple cases there's a one-to-one mapping between project <-> wheel <-> importable package, but this breaks down quickly in edge cases.
I mistakenly used "project" in an attempt to avoid confusion resulting from me using the word "distribution" as a more general term than the way you were using "source distribution" or "binary distribution". Clearly I failed and made things more confusing.
I use the term "distribution" in the sense used here: https://packaging.python.org/en/latest/glossary/#term-distribution-package
Note that this is in contrast to the terms "source distribution" and "binary distribution" or "built distribution" in the same page.
Sorry for confusing things. I'll stick to the terminology as in the PUG glossary from now on.
Consider a project that builds multiple wheels out of the same source tree. You obviously can't expect that all of these packages will have the same dependencies.
Correct. But a distribution can and should (I believe) have the same dependencies for all of the source and built distributions derived from it.
This situation is not common today for Python packages, but the only reason for that is that distutils makes it really hard to do -- it's extremely common in other package ecosystems, and the advantages are obvious. E.g., maybe numpy.distutils should be split into a separately installable package from numpy -- there's no technical reason that this should mean we are now forced to move the code for it into its own VCS repository.
I'm lost here, I'm afraid. Could you rephrase this in terms of the definitions from the PUG glossary? It sounds to me like the VCS repository is the project, which contains multiple distributions. I don't see how that's particularly hard. Each distribution just has its own subdirectory (and setup.py) in the VCS repository...
(I assume that by "platform tags" you mean what PEP 426 calls "environment markers".)
Nope, I mean as defined in PEP 425. The platform tag is part of the compatibility tag. Maybe I meant the ABI tag, I don't really follow the distinctions.
Environment markers are really useful for extending the set of cases that can be handled by a single architecture-dependent wheel. And they're a good fit for that environment, given that wheels can't contain arbitrary code.
But they're certainly never going to be adequate to provide a single static description of every possible build configuration of every possible project. And installing an sdist already requires arbitrary code execution, so it doesn't make sense to try to build some elaborate system to avoid arbitrary code execution just for the dependency specification.
You're right that in a perfect future world numpy C API related dependencies would be handled by some separate ABI-tracking mechanism similar to how the CPython ABI is tracked, so here are some other examples of why environment markers are inadequate:
In the future it will almost certainly be possible to build numpy in two different configurations: one where it expects to find BLAS inside a wheel distributed for this purpose (e.g. this is necessary to provide high-quality windows wheels), and one where it expects to find BLAS installed on the system. This decision will *not* be tied to the platform, but be selectable at build time. E.g., on OS X there is a system-provided BLAS library, but it has some issues. So the default wheels on PyPI will probably act like windows and depend on a BLAS-package that we control, but there will also be individual users who prefer to build numpy in the configuration where it uses the system BLAS, so we definitely need to support both options on OS X. Now the problem: There will never be a single environment marker that you can stick into a wheel or sdist that says "we depend on the 'pyblas' package if the system is OS X (ok) and the user set this flag in this configuration file during the build process (wait wut)".
Similarly, I think someone was saying in a discussion recently that lxml supports being built either in a mode where it requires libxml be available on the system, or else it can be statically linked. Even if in the future we start having metadata that lets us describe dependencies on external system libraries, it's never going to be the case that we can put the *same* dependency metadata into wheels that are built using these two configurations.
This is precisely the very complex issue that's being discussed under the banner of extending compatibility tags in a way that gives a viable but practical way of distinguishing binary wheels. You can either see that as a discussion about "expanding compatibility tags" or "finding something better than compatibility tags". I don't have much of a stake in that discussion, as the current compatibility tags suit my needs fine, as a Windows user. The issues seem to be around Linux and possibly some of the complexities around binary dependencies for numerical libraries.
But the key point here is that I see the solution for this as being about distinguishing the "right" wheel for the target environment. It's not about anything that should reach back to sdists. Maybe a solution will involve a PEP 426 metadata enhancement that adds metadata that's only valid in binary distributions and not in source distributions, but that's fine by me. But it won't replace the existing dependency data, which *is* valid at the sdist level.
this would be good to discuss here: PEP 426: Define a JSON-LD context as part of the proposal #31 https://github.com/pypa/interoperability-peps/issues/31
At least as far as I can see - I'm willing to be enlightened. But your argument seems to be that sdist-level dependency information should be omitted because more detailed ABI compatibility data *might* be needed at the wheel level for some packages. I don't agree with that - we still need the existing metadata, even if more might be required in specialist cases.
if sys.platform: extras_require =
[1] If extras and environment markers don't cover the needs of scientific modules, we need some input into their design from the scientific community. But again, let's not throw away the work that's already done.
As far as sdists go, you can either cover 90% of the cases by building increasingly elaborate metadata formats, or you can cover 100% of the cases by keeping things simple...
everything in one (composed) 'pydist.jsonld'
But your argument seems to be that having metadata generated from package build code is "simpler". My strong opinion, based on what I've seen of the problems caused by having metadata in an "executable setup.py", is that static metadata is far simpler.
static JSONLD is easily indexable with warehouse (Postgresql), elasticsearch, triplestores
I don't believe that the cost of changing to a new system can be justified *without* getting the benefits of static metadata.
external benefits of canonical package URIs + schema.org/Action
Paul

On Fri, Oct 2, 2015 at 3:52 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 October 2015 at 23:15, Nathaniel Smith <njs@pobox.com> wrote: [...]
This situation is not common today for Python packages, but the only reason for that is that distutils makes it really hard to do -- it's extremely common in other package ecosystems, and the advantages are obvious. E.g., maybe numpy.distutils should be split into a separately installable package from numpy -- there's no technical reason that this should mean we are now forced to move the code for it into its own VCS repository.
I'm lost here, I'm afraid. Could you rephrase this in terms of the definitions from the PUG glossary? It sounds to me like the VCS repository is the project, which contains multiple distributions. I don't see how that's particularly hard. Each distribution just has its own subdirectory (and setup.py) in the VCS repository...
The problem is that projects tend to release the whole project together, rather than releasing individual subdirectories, and that usually you can't just rip those subdirectories out of the parent project and expect to build them on their own, because there's shared infrastructure for build, configuration, static libraries with utility code that get built once and then linked into each distribution... Having a "VCS checkout = one wheel" rule blocks a lot of otherwise sensible project arrangements and forces awkward technical workarounds. But if you allow one VCS checkout to produce multiple wheels, then I can't see how you can avoid having one sdist produce multiple wheels.
(I assume that by "platform tags" you mean what PEP 426 calls "environment markers".)
Nope, I mean as defined in PEP 425. The platform tag is part of the compatibility tag. Maybe I meant the ABI tag, I don't really follow the distinctions.
Environment markers are really useful for extending the set of cases that can be handled by a single architecture-dependent wheel. And they're a good fit for that environment, given that wheels can't contain arbitrary code.
But they're certainly never going to be adequate to provide a single static description of every possible build configuration of every possible project. And installing an sdist already requires arbitrary code execution, so it doesn't make sense to try to build some elaborate system to avoid arbitrary code execution just for the dependency specification.
You're right that in a perfect future world numpy C API related dependencies would be handled by some separate ABI-tracking mechanism similar to how the CPython ABI is tracked, so here are some other examples of why environment markers are inadequate:
In the future it will almost certainly be possible to build numpy in two different configurations: one where it expects to find BLAS inside a wheel distributed for this purpose (e.g. this is necessary to provide high-quality windows wheels), and one where it expects to find BLAS installed on the system. This decision will *not* be tied to the platform, but be selectable at build time. E.g., on OS X there is a system-provided BLAS library, but it has some issues. So the default wheels on PyPI will probably act like windows and depend on a BLAS-package that we control, but there will also be individual users who prefer to build numpy in the configuration where it uses the system BLAS, so we definitely need to support both options on OS X. Now the problem: There will never be a single environment marker that you can stick into a wheel or sdist that says "we depend on the 'pyblas' package if the system is OS X (ok) and the user set this flag in this configuration file during the build process (wait wut)".
Similarly, I think someone was saying in a discussion recently that lxml supports being built either in a mode where it requires libxml be available on the system, or else it can be statically linked. Even if in the future we start having metadata that lets us describe dependencies on external system libraries, it's never going to be the case that we can put the *same* dependency metadata into wheels that are built using these two configurations.
This is precisely the very complex issue that's being discussed under the banner of extending compatibility tags in a way that gives a viable but practical way of distinguishing binary wheels. You can either see that as a discussion about "expanding compatibility tags" or "finding something better than compatibility tags". I don't have much of a stake in that discussion, as the current compatibility tags suit my needs fine, as a Windows user. The issues seem to be around Linux and possibly some of the complexities around binary dependencies for numerical libraries.
But the key point here is that I see the solution for this as being about distinguishing the "right" wheel for the target environment. It's not about anything that should reach back to sdists. Maybe a solution will involve a PEP 426 metadata enhancement that adds metadata that's only valid in binary distributions and not in source distributions, but that's fine by me. But it won't replace the existing dependency data, which *is* valid at the sdist level.
Okay. I think this is the key question for me: you want sdists to contain rich static metadata, presumably because you want to do something with that metadata -- you say that not having it causes problems.

The obvious thing that pip might want to use that metadata for is so that it can look at an sdist and know whether the wheel it builds from that sdist will be useful in solving its current dependency goals. But to answer this question, you actually do need to know what the compatibility tag will be. So: what problems does it solve for pip to get access to static information about some, *but not all*, of the eventual wheel's metadata?

(It's also the case that in the numpy example I gave above, it isn't just the compatibility tag that can vary between wheels built from the same sdist, it's really truly the actual runtime dependency metadata that varies. But even if we ignore that, I think my question still stands.)

-n -- Nathaniel J. Smith -- http://vorpus.org

On October 2, 2015 at 4:20:00 PM, Nathaniel Smith (njs@pobox.com) wrote:
On Fri, Oct 2, 2015 at 4:58 AM, Donald Stufft wrote:
One of the problems with the current system is that we have no mechanism by which to determine dependencies of a source distribution without downloading the file and executing some potentially untrusted code. This makes dependency resolution harder and much, much slower than if we could read that information statically from a source distribution. This PEP doesn't offer anything in the way of solving this problem.
What are the "dependencies of a source distribution"? Do you mean the runtime dependencies of the wheels that will be built from a source distribution?
If you need that metadata to be statically in the sdist, then you might as well give up now because it's simply impossible.
I don’t believe this is impossible.
As the very simplest example, every package that uses the numpy C API gets a runtime dependency on "numpy >= [whatever version happened to be installed on the *build* machine]”.
A quick, off-the-cuff idea here is to allow additional ABI declarations and stop trying to use the same system for API and ABI. A source distribution can't have ABI dependencies, only wheels can, and an installation has to be valid for both API and any relevant ABI requirements.
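To make the distinction concrete, here is a rough sketch of what keeping API and ABI requirements separate could look like. The field names are entirely hypothetical -- no current metadata spec defines anything like "abi_requires" -- so read this as an illustration of the idea, not a proposal:

    # Sketched as Python dicts purely for illustration; "abi_requires" is a
    # made-up field name.
    sdist_metadata = {
        "name": "mypkg",
        "version": "1.0",
        # API-level dependency: true for the sdist and every wheel built from it.
        "requires": ["numpy >= 1.6, < 2.0"],
    }

    wheel_metadata = dict(
        sdist_metadata,
        # ABI-level dependency: only meaningful for this particular built
        # wheel, pinned to whatever numpy it was compiled against.
        abi_requires=["numpy == 1.9.2"],
    )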
There are plenty of more complex examples too (e.g. ones that involve build/configure-time decisions about whether to rely on particular system libraries, or build/configure-time decisions about whether particular packages should even be built).
I don't think build/configure-time decisions are great ideas, as it's near impossible to actually depend on them. For example, take Pillow: Pillow will conditionally compile against libraries that enable it to muck around with PNGs. However, if I *need* Pillow with PNG support, I don't have any mechanism to declare that. If instead builds were *not* conditional, and Pillow split its PNG capabilities out into its own package called, say, Pillow-PNG -- which did not conditionally compile against anything, but unconditionally did -- then we could add in something like having Pillow declare a "weak" dependency on Pillow-PNG, where we attempt to get it by default if possible, but we will skip installing it if we can't locate/build it. If you combine this with Extras, you could then easily make it so that people can depend on particular conditional features by doing something like ``Pillow[PNG]`` in their dependency metadata.
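A hedged sketch of how the Extras half of this could be spelled with today's setuptools, assuming the hypothetical Pillow/Pillow-PNG split described above (which does not exist; Pillow really does compile PNG support conditionally):

    # Hypothetical setup.py fragment for a "Pillow" that has moved PNG support
    # into a separate "Pillow-PNG" distribution.
    from setuptools import setup

    setup(
        name="Pillow",
        version="3.0.0",
        extras_require={
            # "pip install Pillow[PNG]" would pull in the hypothetical
            # Pillow-PNG wheel; a plain "Pillow" install would not.
            # (The "weak"/on-by-default dependency described above has no
            # setuptools spelling today.)
            "PNG": ["Pillow-PNG"],
        },
    )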
For comparison, here's the Debian source package metadata: https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-debiansourc... Note that the only mandatory fields are format version / package name / package version / maintainer / checksums. The closest they come to making promises about the built packages are the Package-List and Binary fields which provide a optional hint about what binary packages will be built, and are allowed to contain lies (e.g. they explicitly don't guarantee that all the binary packages named will actually be produced on every architecture). The only kind of dependencies that a source package can declare are build-depends.
Debian doesn't really have "source packages" like we do, but inside of the debian/ directory is the control file which lists all of the dependency information (or explicitly lists a placeholder where something can't be statically declared).
To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.
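For context, the static file being referred to here is the PKG-INFO that distutils/setuptools write into an sdist. A tool can read it without running any packaging code, along these lines -- a sketch only, not twine's actual implementation:

    import tarfile
    from email.parser import Parser

    def read_sdist_pkg_info(path):
        # Return the PKG-INFO metadata from an sdist tarball without executing
        # setup.py. The result is an email.message.Message, so fields are
        # accessed like meta["Name"] and meta["Version"].
        with tarfile.open(path) as tf:
            member = next(m for m in tf.getmembers() if m.name.endswith("PKG-INFO"))
            text = tf.extractfile(member).read().decode("utf-8")
        return Parser().parsestr(text)

    # e.g. meta = read_sdist_pkg_info("lxml-3.4.4.tar.gz")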
Okay, what metadata do you need? We certainly could put name / version kind of stuff in there. We left it out because we weren't sure what was necessary and it's easy to add later, but anything that's needed by twine fits neatly into the existing text saying that we should "include extra metadata in source distributions if it helps solve specific problems that are unique to distribution" -- twine uploads definitely count.
Everything that isn't specific to a built wheel. Look at the previously accepted metadata specs as well as PEP 426. If you're not including a field that was included in one of those, there should be a rationale for why that field is no longer being included.
Overall, I don't think that this really solves most of the foundational problems with the current format. Largely it feels that what it achieves is shuffling around some logic (you need to create a hook that you reference from within a .cfg file instead of creating a setuptools extension or so) but
numpy.distutils is the biggest distutils/setuptools extension around, and everyone involved in maintaining it wants to kill it with fire :-). That's a problem…
Well, it's not really what I'd call a setuptools extension, because it doesn't use the extension points in setuptools to do its work. It expects you to just ``import numpy.distutils`` at the top of your ``setup.py`` and use that, which means that it breaks things like pip, because we don't have a way to know that we need to install numpy first.
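For illustration, the pattern being described looks roughly like this -- a minimal sketch of a typical numpy.distutils-based setup.py, not taken from any particular project:

    # setup.py -- typical numpy.distutils usage. The module-level import is
    # the problem: pip has to *run* this file to learn anything about the
    # package, and it can't even be run unless numpy is already installed.
    from numpy.distutils.core import setup, Extension

    setup(
        name="somepkg",
        version="1.0",
        ext_modules=[Extension("somepkg._native", sources=["somepkg/_native.c"])],
    )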
without fixing most of the problems. The largest benefit I see to switching to this right now is that it would enable us to have build time dependencies that were controlled by pip rather than installed implicitly via the execution of the setup.py.
Yes, this problem means that literally every numerical python package currently has a broken setup.py.
Because numpy.distutils wasn't written to plug into setuptools. If it had been, they wouldn't be.
That doesn't feel like a big enough benefit to me to do a mass shakeup of what we recommend and tell people to do. Having people adjust and change and do something new requires effort, and we need something to justify that effort to other people and I don't think that this PEP has something we can really use to justify that effort.
The end-user adjustment is teaching people to switch to always using pip to install packages -- this seems like something we will certainly do sooner or later, so we might as well get started.
And it's already actually the right thing to do -- if you use 'setup.py install' then you get a timebomb in your venv where later upgrades may leave you with a broken package :-(. (This is orthogonal to the actual PEP.) In the long run, the idea that every package has to contain code that knows how to implement installation in every possible configuration (--user? --single-version-externally-managed?) is clearly broken, and teaching people to use 'pip install' is obviously the only sensible alternative.
Sorry, ambiguous "end-user" in my original statement. I don't mean the end-end-users (e.g. the people executing ``pip install``), I mean the packagers. I'm going to re-read the original proposal and try to point out more actionable feedback shortly. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Fri, Oct 2, 2015 at 2:30 PM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 4:20:00 PM, Nathaniel Smith (njs@pobox.com) wrote: [...]
There are plenty of more complex examples too (e.g. ones that involve build/configure-time decisions about whether to rely on particular system libraries, or build/configure-time decisions about whether particular packages should even be built).
I don't think build/configure-time decisions are great ideas, as it's near impossible to actually depend on them. For example, take Pillow: Pillow will conditionally compile against libraries that enable it to muck around with PNGs. However, if I *need* Pillow with PNG support, I don't have any mechanism to declare that. If instead builds were *not* conditional, and Pillow split its PNG capabilities out into its own package called, say, Pillow-PNG -- which did not conditionally compile against anything, but unconditionally did -- then we could add in something like having Pillow declare a "weak" dependency on Pillow-PNG, where we attempt to get it by default if possible, but we will skip installing it if we can't locate/build it. If you combine this with Extras, you could then easily make it so that people can depend on particular conditional features by doing something like ``Pillow[PNG]`` in their dependency metadata.
While I agree with the sentiment here, I don't think we can simply unconditionally rule out build/configure-time decisions. I gave an example in the other subthread of a numpy wheel, which depending on build configuration might depend implicitly on the system BLAS, might have BLAS statically linked, or might depend explicitly on a "BLAS wheel". (And note that when configured to use a "BLAS wheel" then this would actually be a build-dependency, not just a runtime-dependency.) As far as downstream users are concerned, all of these numpy wheels export exactly the same API -- how numpy finds BLAS is just an internal detail. So in this case the problems your paragraph above is worrying about just don't arise. And numpy absolutely will need the option to be built in these different ways.
For comparison, here's the Debian source package metadata: https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-debiansourc... Note that the only mandatory fields are format version / package name / package version / maintainer / checksums. The closest they come to making promises about the built packages are the Package-List and Binary fields which provide a optional hint about what binary packages will be built, and are allowed to contain lies (e.g. they explicitly don't guarantee that all the binary packages named will actually be produced on every architecture). The only kind of dependencies that a source package can declare are build-depends.
Debian doesn't really have "source packages" like we do, but inside of the debian/ directory is the control file which lists all of the dependency information (or explicitly lists a placeholder where something can't be statically declared).
Someone who is more of an expert on debian packaging can correct me if I'm wrong, but I'm 99% sure that this is incorrect, and in an important way.

The actual interface between a build tool like dpkg-buildpackage and a source package is: (a) the .dsc file, with the required fields I mentioned, (b) the debian/rules file, which is an opaque executable that can be called to perform standard operations like "build" and "clean" -- basically the moral equivalent of the hooks in our sdist proposal.

The debian/control file does have a conventional format, but this is just convention -- most-or-all debian/control scripts use the same set of tools to work with this file, and expect it to be in the same place. But if, say, Debian decides that they need a new kind of placeholder to handle a situation that hasn't arisen before, then there's no need to change the definition of a source package: you just add support for the new placeholder to the tools that work with this file, and then packages that want to make use of the new placeholder just have to Build-Depend on the latest version of those tools.

This is the idea motivating the sdist PEP's design: you can't specify all of a source distribution's metadata statically, and then given that you'll be specifying at least part of the metadata dynamically, you want to do it in a way that you can change without having to do a PEP and update pip etc.
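As a loose analogy only -- the hook names below are invented for this illustration and are not the ones in the draft PEP -- the "opaque executable implementing standard operations" idea maps onto Python roughly like this:

    # build_hooks.py -- hypothetical module; the real hook names and calling
    # convention would be whatever the spec eventually fixes.

    def get_build_requirements(config_settings):
        # Computed at build time, so it can consult VCS metadata, config
        # files, auto-detection, etc. -- nothing here is parsed statically.
        return ["setuptools", "wheel"]

    def build_wheels(output_dir, config_settings):
        # Would produce one or more wheels in output_dir and return their
        # filenames; left unimplemented in this sketch.
        raise NotImplementedError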
To a similar tune, this PEP also doesn't make it possible to really get at any other metadata without executing software. This makes it practically impossible to safely inspect an unknown or untrusted package to determine what it is and to get information about it. Right now PyPI relies on the uploading tool to send that information alongside of the file it is uploading, but honestly what it should be doing is extracting that information from within the file. This is sort of possible right now since distutils and setuptools both create a static metadata file within the source distribution, but we don't rely on that within PyPI because that information may or may not be accurate and may or may not exist. However the twine uploading tool *does* rely on that, and this PEP would break the ability for twine to upload a package without executing arbitrary code.
Okay, what metadata do you need? We certainly could put name / version kind of stuff in there. We left it out because we weren't sure what was necessary and it's easy to add later, but anything that's needed by twine fits neatly into the existing text saying that we should "include extra metadata in source distributions if it helps solve specific problems that are unique to distribution" -- twine uploads definitely count.
Everything that isn't specific to a built wheel. Look at the previously accepted metadata specs as well as PEP 426. If you're not including a field that was included in one of those, there should be a rationale for why that field is no longer being included.
The default rationale is just "let's keep our options open" -- it's much easier to add than to subtract later.

In particular I hesitate a little bit to just drop in everything from PEP 426 and friends, because previous specs haven't really thought through the distinction between sdists and wheels -- e.g. if an sdist generates two wheels, they probably won't have the same name, description, trove classifiers, etc. They may not even have the same version (e.g. if two different tools with existing numbering schemes get merged into a single distribution -- esp. if one of them needs an epoch marker). So it may well make sense to have an "sdist description field", but it's not immediately obvious that it's identical to a wheel's description field. I mean, in practice it's probably no big deal -- a description is some text for human readers, whatever, it's useful and it'll be fine.

But given that we can trivially add more fields to the pypackage.cfg file later, and that current sdists don't have any of this metadata, I just don't want to risk blocking progress on one axis (enabling better build systems) while waiting to achieve maximal progress on another mostly-orthogonal axis (having nice metadata in sdists for tools like twine to take advantage of).

Bottom line: If after further discussion we reach the point where the only thing blocking this is the addition of name and description and trove classifier fields, then of course we'll just add those to the PEP :-).

-n -- Nathaniel J. Smith -- http://vorpus.org

On 3 October 2015 at 02:03, Nathaniel Smith <njs@pobox.com> wrote:
In particular I hesitate a little bit to just drop in everything from PEP 426 and friends, because previous specs haven't really thought through the distinction between sdists and wheels -- e.g. if an sdist generates two wheels, they probably won't have the same name, description, trove classifiers, etc. They may not even have the same version (e.g. if two different tools with existing numbering schemes get merged into a single distribution -- esp. if one of them needs an epoch marker). So it may well make sense to have an "sdist description field", but it's not immediately obvious that it's identical to a wheel's description field.
I can only assume you're using the term sdist differently from the way it is normally used (as in the PUG glossary), because for me the idea that a sdist generates multiple wheels makes no sense.

If I do "pip wheel <some_sdist>" I expect to get a single wheel that can be installed to give the same results as "pip install <some_sdist>" would. The wheel install will work on my build machine, and on any machine where the wheel's compatibility metadata (the compatibility tags, currently) says it's valid.

The above behaviour is key to pip's mechanism for auto-generating and caching wheels when doing an install, so I don't see how it could easily be discarded.

If what you're calling a "sdist" doesn't work like this, maybe you should invent a new term, so that people don't get confused? If it *does* work like this, I don't see what you mean by a sdist generating two wheels.

Paul

On Sat, 3 Oct 2015 at 04:51 Paul Moore <p.f.moore@gmail.com> wrote:
On 3 October 2015 at 02:03, Nathaniel Smith <njs@pobox.com> wrote:
In particular I hesitate a little bit to just drop in everything from PEP 426 and friends, because previous specs haven't really thought through the distinction between sdists and wheels -- e.g. if an sdist generates two wheels, they probably won't have the same name, description, trove classifiers, etc. They may not even have the same version (e.g. if two different tools with existing numbering schemes get merged into a single distribution -- esp. if one of them needs an epoch marker). So it may well make sense to have an "sdist description field", but it's not immediately obvious that it's identical to a wheel's description field.
I can only assume you're using the term sdist differently from the way it is normally used (as in the PUG glossary), because for me the idea that a sdist generates multiple wheels makes no sense.
If I do "pip wheel <some_sdist>" I expect to get a single wheel that can be installed to give the same results as "pip install <some_sdist>" would. The wheel install will work on my build machine, and on any machine where the wheel's compatibility metadata (the compatibility tags, currently) says it's valid.
The above behaviour is key to pip's mechanism for auto-generating and caching wheels when doing an install, so I don't see how it could easily be discarded.
If what you're calling a "sdist" doesn't work like this, maybe you should invent a new term, so that people don't get confused? If it *does* work like this, I don't see what you mean by a sdist generating two wheels.
I think sdist is getting a bit overloaded in this discussion. From my understanding of what Paul and Donald are after, the process should be:

VCS -> sdist/source wheel -> binary wheel.

Here, "VCS" is literally a git/hg clone of some source tree. A "sdist/source wheel" is a carefully constructed zip file that contains all the source code from the VCS necessary to build a binary wheel for a project as long as the proper dependencies exist (e.g., proper version of NumPy, BLAS in one of its various forms, etc.). The binary wheel is obviously the final artifact that can just be flat-out loaded by Python without any work.

So Paul doesn't see sdist/source wheels producing more than one binary wheel because in its own way an sdist/source wheel is a "compiled" artifact of select source code whose only purpose is to generate a binary wheel for a single project. So while a VCS clone might have multiple subprojects, each project should generate a single sdist/source wheel.

Now this isn't to say that an sdist/source wheel won't generate different *versions* of a binary wheel based on whether e.g. BLAS is system-linked, statically linked, or dynamically loaded from a Linux binary wheel. But the key point is that the sdist/source wheel still **only** makes one kind of project.

From this perspective I don't see Nathaniel's desire for installing from a VCS as something other than requiring a step to bundle up the source code into an sdist/source wheel that pip knows how to handle. But I think what Paul and Donald are saying is pip doesn't want to have anything to do with the VCS -> sdist/source wheel step, and that is entirely up to the project to manage through whatever tooling they choose. I also view the sdist/source wheel as almost a mini-VCS checkout since it is just a controlled view of the source code with a bunch of helpful, static metadata for pip to use to execute whatever build steps are needed to get to a binary wheel.

Or I'm totally wrong. =) But for me I actually like the idea of an sdist/source wheel being explained as "a blob of source in a structured way that can produce a binary wheel for the package" and a binary wheel as "a blob of bytes for a package that Python can import" and I'm totally fine having C extensions not just shuttle around a blind zip but a structured zip in the form of an sdist/source wheel.

On October 3, 2015 at 1:31:48 PM, Brett Cannon (brett@python.org) wrote:
On Sat, 3 Oct 2015 at 04:51 Paul Moore wrote:
On 3 October 2015 at 02:03, Nathaniel Smith wrote:
In particular I hesitate a little bit to just drop in everything from PEP 426 and friends, because previous specs haven't really thought through the distinction between sdists and wheels -- e.g. if an sdist generates two wheels, they probably won't have the same name, description, trove classifiers, etc. They may not even have the same version (e.g. if two different tools with existing numbering schemes get merged into a single distribution -- esp. if one of them needs an epoch marker). So it may well make sense to have an "sdist description field", but it's not immediately obvious that it's identical to a wheel's description field.
I can only assume you're using the term sdist differently from the way it is normally used (as in the PUG glossary), because for me the idea that a sdist generates multiple wheels makes no sense.
If I do "pip wheel <some_sdist>" I expect to get a single wheel that can be installed to give the same results as "pip install <some_sdist>" would. The wheel install will work on my build machine, and on any machine where the wheel's compatibility metadata (the compatibility tags, currently) says it's valid.
The above behaviour is key to pip's mechanism for auto-generating and caching wheels when doing an install, so I don't see how it could easily be discarded.
If what you're calling a "sdist" doesn't work like this, maybe you should invent a new term, so that people don't get confused? If it *does* work like this, I don't see what you mean by a sdist generating two wheels.
I think sdist is getting a bit overloaded in this discussion. From my understanding of what Paul and Donald are after, the process should be:
VCS -> sdist/source wheel -> binary wheel.
Here, "VCS" is literally a git/hg clone of some source tree. A "sdist/source wheel" is a carefully constructed zip file that contains all the source code from the VCS necessary to build a binary wheel for a project as long as the proper dependencies exist (e.g., proper version of NumPy, BLAS in one of its various forms, etc.). The binary wheel is obviously the final artifact that can just be flat-out loaded by Python without any work.
So Paul doesn't see sdist/source wheels producing more than one binary wheel because in its own way an sdist/source wheel is a "compiled" artifact of select source code whose only purpose is to generate a binary wheel for a single project. So while a VCS clone might have multiple subprojects, each project should generate a single sdist/source wheel.
Now this isn't to say that an sdist/source wheel won't generate different *versions* of a binary wheel based on whether e.g. BLAS is system-linked, statically linked, or dynamically loaded from a Linux binary wheel. But the key point is that the sdist/source wheel still **only** makes one kind of project.
From this perspective I don't see Nathaniel's desire for installing from a VCS as something other than requiring a step to bundle up the source code into an sdist/source wheel that pip knows how to handle. But I think what Paul and Donald are saying is pip doesn't want to have anything to do with the VCS -> sdist/source wheel step, and that is entirely up to the project to manage through whatever tooling they choose. I also view the sdist/source wheel as almost a mini-VCS checkout since it is just a controlled view of the source code with a bunch of helpful, static metadata for pip to use to execute whatever build steps are needed to get to a binary wheel.
Or I'm totally wrong. =) But for me I actually like the idea of an sdist/source wheel being explained as "a blob of source in a structured way that can produce a binary wheel for the package" and a binary wheel as "a blob of bytes for a package that Python can import" and I'm totally fine having C extensions not just shuttle around a blind zip but a structured zip in the form of an sdist/source wheel.
This is basically accurate, the only thing is that I think that we do need an answer for handling the VCS side of things, but I think that we should defer it until after this PEP. Right now I think the PEP is trying to tackle two different problems at once because on the surface they look similar, but IMO a VCS/arbitrary directory and a sdist are two entirely different things that are only similar on the surface.

So yes, the path would be:

VCS -> sdist (aka Source Wheel or w/e) -> Wheel -> Install
   \
    \-> "In Place" install (aka Development Install)

Right now, we have the Wheel -> Install part, and I'd like for this PEP to focus on the sdist -> Wheel part, and then a future PEP focus on the VCS part.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Oct 3, 2015 12:36 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 3, 2015 at 1:31:48 PM, Brett Cannon (brett@python.org) wrote:
On Sat, 3 Oct 2015 at 04:51 Paul Moore wrote:
On 3 October 2015 at 02:03, Nathaniel Smith wrote:
In particular I hesitate a little bit to just drop in everything from
PEP 426 and friends, because previous specs haven't really thought through the distinction between sdists and wheels -- e.g. if an sdist generates two wheels, they probably won't have the same name, description, trove classifiers, etc. They may not even have the same version (e.g. if two different tools with existing numbering schemes get merged into a single distribution -- esp. if one of them needs an epoch marker). So it may well make sense to have an "sdist description field", but it's not immediately obvious that it's identical to a wheel's description field.
I can only assume you're using the term sdist differently from the way it is normally used (as in the PUG glossary), because for me the idea that a sdist generates multiple wheels makes no sense.
If I do "pip wheel <some_sdist>" I expect to get a single wheel that can be installed to give the same results as "pip install <some_sdist>" would. The wheel install will work on my build machine, and on any machine where the wheel's compatibility metadata (the compatibility tags, currently) says it's valid.
The above behaviour is key to pip's mechanism for auto-generating and caching wheels when doing an install, so I don't see how it could easily be discarded.
If what you're calling a "sdist" doesn't work like this, maybe you should invent a new term, so that people don't get confused? If it *does* work like this, I don't see what you mean by a sdist generating two wheels.
I think sdist is getting a bit overloaded in this discussion. From my understanding of what Paul and Donald are after, the process should be:
VCS -> sdist/source wheel -> binary wheel.
Here, "VCS" is literally a git/hg clone of some source tree. A "sdist/source wheel" is a carefully constructed zip file that contains all the source code from the VCS necessary to build a binary wheel for a project as long as the proper dependencies exist (e.g., proper version of NumPy, BLAS in one of its various forms, etc.). The binary wheel is obviously the final artifact that can just be flat-out loaded by Python without any work.
So Paul doesn't see sdist/source wheels producing more than one binary wheel because in its own way an sdist/source wheel is a "compiled" artifact of select source code whose only purpose is to generate a binary wheel for a single project. So while a VCS clone might have multiple subprojects, each project should generate a single sdist/source wheel.
Now this isn't to say that an sdist/source wheel won't generate different *versions* of a binary wheel based on whether e.g. BLAS is system-linked, statically linked, or dynamically loaded from a Linux binary wheel. But the
key point is that the sdist/source wheel still **only** makes one kind of project.
From this perspective I don't see Nathaniel's desire for installing from a VCS as something other than requiring a step to bundle up the source code into an sdist/source wheel that pip knows how to handle. But I think what Paul and Donald are saying is pip doesn't want to have anything to do with the VCS -> sdist/source wheel step and that is entirely up to the project
to manage through whatever tooling they choose. I also view the sdist/source wheel as almost a mini-VCS checkout since it is just a controlled view of the source code with a bunch of helpful, static metadata
for pip to use to execute whatever build steps are needed to get to a binary wheel.
Or I'm totally wrong. =) But for me I actually like the idea of an sdist/source wheel being explained as "a blob of source in a structured way that can produce a binary wheel for the package" and a binary wheel as "a blob of bytes for a package that Python can import" and I'm totally fine having C extensions not just shuttle around a blind zip but a structured zip in the form of an sdist/source wheel.
This is basically accurate, the only thing is that I think that we do need an answer for handling the VCS side of things, but I think that we should defer it until after this PEP. Right now I think the PEP is trying to tackle two different problems at once because on the surface they look similar but IMO a VCS/arbitrary directory and a sdist are two entirely different things that
are only similar on the surface.
So Yes the path would be:
VCS -> sdist (aka Source Wheel or w/e) -> Wheel -> Install
   \
    \-> "In Place" install (aka Development Install)
Right now, we have the Wheel -> Install part, and I'd like for this PEP to focus on the sdist -> Wheel part, and then a future PEP focus on the VCS part.
thanks! is there a diagram of this somewhere?

Is this basically a FSM with URIs for each resource and transformation, where more specific attributes are added (e.g. to pydist.jsonld)? [PEP 426 JSONLD]

differences between VCS and sdist:
* MANIFEST.in
* setup.py
* egg-info metadata

PEP 426 JSONLD: https://github.com/pypa/interoperability-peps/issues/31
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372
DCFA
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
Distutils delenda est.
I think that you should drop (from this PEP) the handling of a VCS/arbitrary directories and focus solely on creating a format for source distributions. A source distribution can (and should) be fairly strict and well defined exactly where all of the files go, what files exist and don't exist, and things of that nature (more on this later). In addition, the metadata files should be optimized for machines to read and parse them first, for humans to read them second, and humans to write them not at all. Given the Python standard library, your metadata inside of the source distribution should (probably) be JSON. This is another reason why this should focus on the source distribution as well, because the file you put into VCS needs to be able to be written by humans.

Metadata 2.0 should probably get finished before or as part of a new sdist format happening. I fell off towards the end of that and it appears that it got a lot more complex since I last looked at it. It probably needs more work.

The filename should be strictly defined similarly to what Wheels have, probably something like {name}-{version}.{ext}, and like wheel it should mandate that any - characters inside of any of the fields should be escaped to a _ so that we can unambiguously parse it. It should not support arbitrary filenames because they are (almost never) actually sdists. In another email you mentioned something like the tarballs that github produces, but those are not source distributions, they are vcs exports and shouldn't be covered by this PEP.

I don't believe that Python should develop anything like the Debian ability to have a single source "package" create multiple binary packages. The metadata of the Wheel *must* strictly match the metadata of the sdist (except for things that are Wheel specific). This includes things like name, version, etc. Trying to go down this path I think will make things a lot more complicated since we have a segmented archive where people have to claim particular names, otherwise how do you prevent me from registering the name "foobar" on PyPI and saying it produces the "Django" wheel?

Since I think this should only deal with source distributions, then the primary thing we need is an operation that will take an unpacked source distribution that is currently sitting on the filesystem and turn it into a wheel located in a specific location. The layout for a source distribution should be specified, I think something like:

    .
    ├── meta
    │   ├── DESCRIPTION.rst
    │   ├── FORMAT-VERSION
    │   ├── LICENSE.txt
    │   └── METADATA.json
    └── src
        ├── my-cool-build-tool.cfg
        └── mypackage
            └── __init__.py

I don't particularly care about the exact names, but this layout gives us two top level directories (and only two), one is a place where all of the source distribution metadata goes, and one is a src directory where all of the files for the project should go, including any relevant configuration for the build tool in use by the project. Having two directories like this eliminates the need to worry about naming collisions between the metadata files and the project itself.

We should probably give this a new name instead of "sdist" and give it a dedicated extension. Perhaps we should call them "source wheels" and have the extension be something like .swhl or .src.whl. This means we don't need to worry about making the same artifact compatible with both the legacy toolchain and a toolchain that supports "source wheels".

We should also probably specify a particular container format to be used for a .whl/.src.whl. It probably makes sense to simply use zip since that is what wheels use and it supports different compression algorithms internally. We probably want to at least suggest limiting compression algorithms used to Deflate and None, if not mandate that one of those two are used.

We should include absolutely as much metadata as part of the static metadata inside the sdist as we can. I don't think there is any case to be made that things like name, version, summary, description, classifiers, license, keywords, contact information (author/maintainers), project URLs, etc are Wheel specific. I think there are other things which are arguably able to be specified in the sdist, but I'd need to fiddle with it to be sure. Basically any metadata that isn't included as static information will not be able to be displayed on PyPI.

The metadata should directly include the specifiers inside of it and shouldn't propagate the meme that pip's requirements.txt format is anything but a way to recreate a specific environment with pip. Build requirements cannot be dynamic.

We don't need a "build in place" hook, you don't build source distributions in place, you build wheels with them. Another PEP that handles a VCS/non sdist directory can add things like building in place.

I don't think there's ever going to be a world where pip depends on virtualenv or pyvenv. The PEP shouldn't prescribe how the tool installs the build deps or executes the build hook, though I think it should mandate that it is called with a compatible Python to the Wheel that is desired to be produced. Cross compiling can be handled later.

Possibly we might want to make the hooks call an executable instead of doing something that involves importing the hook and calling it. This would make it easier to prevent the build tool from monkeypatching the installation tool *and* make it easier for downstream redistributors to use it.

If you're interested, I'm happy to directly collaborate on this PEP if it's in a github repository somewhere or something. There's an interoperability repo you can use or your own or whatever. Or you can tell me to go pound sand too and I'll just comment on posts to the ML.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
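[Editorial illustration: the wheel-style filename escaping mentioned above might look roughly like the sketch below. This is an assumption, not anything agreed in the thread -- the exact escaping rule and the ".src.whl" extension are just the ideas floated in this email.]

    import re

    def escape_field(field):
        # Roughly the wheel escaping rule: collapse runs of characters other
        # than letters, digits and '.' into a single '_', so that '-' can
        # serve unambiguously as the separator between fields.
        return re.sub(r"[^A-Za-z0-9.]+", "_", field)

    def source_wheel_filename(name, version, ext="src.whl"):
        # e.g. source_wheel_filename("my-cool-package", "1.0")
        #   -> "my_cool_package-1.0.src.whl"
        return "{}-{}.{}".format(escape_field(name), escape_field(version), ext)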

If you're interested, I'm happy to directly collaborate on this PEP if it's in a github repository somewhere or something. There's an interoperability repo
btw, the repo he's talking about is here: https://github.com/pypa/interoperability-peps it has a convention about where to add pep ideas that have no number yet

Hi Donald, Thanks for taking the time to make such detailed comments! Thoughts below. On Fri, Oct 2, 2015 at 4:04 PM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 12:54:03 AM, Nathaniel Smith (njs@pobox.com) wrote:
Distutils delenda est.
I think that you should drop (from this PEP) the handling of a VCS/arbitrary directories and focus solely on creating a format for source distributions. A source distribution can (and should) be fairly strict and well defined exactly where all of the files go, what files exist and don't exist, and things of that nature (more on this later).
Hmm. Okay, I think this really helps clarify our key point of difference! For me, an important requirement is that there continue to be a single standard command that end-users can use to install a VCS checkout. This is a really important usability property -- everyone knows "setup.py install". Unfortunately we can't keep "setup.py install" given its entanglement with distutils and the desire to split building and installation, so the obvious answer is that this should become 'pip install <directory>', and from that everything else follows. Having a standard way to install from a VCS checkout is also useful for things like requirements files... and in fact it's required by our current standards. PEP 440 has this as an example of a valid dependency specification: pip @ git+https://github.com/pypa/pip.git@7921be1537eac1e97bc40179a57f0349c2aee67d So I'm extremely reluctant to give up on standardizing how to handle VCS checkouts. And if we're going to have a standard for that, then would sure be nice if we could share the work between this standard and the one for sdists, given how similar they are. [...]
I don't believe that Python should develop anything like the Debian ability to have a single source "package" create multiple binary packages. The metadata of the Wheel *must* strictly match the metadata of the sdist (except for things that are Wheel specific). This includes things like name, version, etc. Trying to go down this path I think will make things a lot more complicated since we have a segmented archive where people have to claim particular names, otherwise how do you prevent me from registering the name "foobar" on PyPI and saying it produces the "Django" wheel?
What prevents it in the current draft is that there's no way for foobar to say any such thing :-). If you ask for Django, then the only sdist it will look at is the one in the Django segment. This is an intentionally limited solution, based on the intuition that multiple wheels from a single sdist will tend to be a relatively rare case, when they do occur then there will generally be one "main" wheel that people will want to depend on, and that people should be uploading wheels anyway rather than relying on sdists. (Part of the intuition for the last part is that we also have a not-terribly-secret-conspiracy here for writing a PEP to get Linux wheels onto PyPI and at least achieve feature parity with Windows / OS X. Obviously there will always be weird platforms -- iOS and FreeBSD and Linux-without-glibc and ... -- but this should dramatically reduce the frequency with which people need sdist dependencies.) If that proves inadequate, then the obvious extension would be to add some metadata to the sdist similar to Debian's, where an sdist has a list of all the wheels that it (might) produce when built, PyPI would grow an API by which pip-or-whoever could query for all sdists that claim to be able to produce wheel X, and at the same time PyPI would start enforcing the rule that if you want to upload an sdist that claims to produce wheel X then you have to own the name X. (After all, you need to own that name anyway so you can upload the wheels.) Or alternatively people could just split up their packages, like would be required by your proposal anyway :-). So I sorta doubt it will be a problem in practice, but even if becomes one then it won't be hard to fix. (And to be clear, the multiple-wheels-from-one-sdist thing is not a primary goal of this proposal -- the main reason we put it in is that once you've given up on having static wheel metadata inside the sdist then supporting multiple-wheels-from-one-sdist is trivial, so you might as well do it, esp. since it's a case that does seem to come up with some regularity in real life and you don't want to make people fight with their tools when it's unnecessary.)
Since I think this should only deal with source distributions, then the primary thing we need is an operation that will take an unpacked source distribution that is currently sitting on the filesystem and turn it into a wheel located in a specific location.
The layout for a source distribution should be specified, I think something like:
    .
    ├── meta
    │   ├── DESCRIPTION.rst
    │   ├── FORMAT-VERSION
    │   ├── LICENSE.txt
    │   └── METADATA.json
    └── src
        ├── my-cool-build-tool.cfg
        └── mypackage
            └── __init__.py
I don't particularly care about the exact names, but this layout gives us two top level directories (and only two), one is a place where all of the source distribution metadata goes, and one is a src directory where all of the files for the project should go, including any relevant configuration for the build tool in use by the project. Having two directories like this eliminates the need to worry about naming collisions between the metadata files and the project itself.
We should probably give this a new name instead of "sdist" and give it a dedicated extension. Perhaps we should call them "source wheels" and have the extension be something like .swhl or .src.whl. This means we don't need to worry about making the same artifact compatible with both the legacy toolchain and a toolchain that supports "source wheels".
We should also probably specify a particular container format to be used for a .whl/.src.whl. It probably makes sense to simply use zip since that is what wheels use and it supports different compression algorithms internally. We probably want to at least suggest limiting compression algorithms used to Deflate and None, if not mandate that one of those two are used.
We should include absolutely as much metadata as part of the static metadata inside the sdist as we can. I don't think there is any case to be made that things like name, version, summary, description, classifiers, license, keywords, contact information (author/maintainers), project URLs, etc are Wheel specific. I think there are other things which are arguably able to be specified in the sdist, but I'd need to fiddle with it to be sure. Basically any metadata that isn't included as static information will not be able to be displayed on PyPI.
I feel like this idea of "source wheels" makes some sense if we want something that looks like a wheel, but without the ABI compatibility issues of wheels. I'm uncertain how well it can be made to work in practice, or how urgent it is once we have a 95% solution in place for linux wheels, but it's certainly an interesting idea. To me it feels rather different from a traditional sdist, and obviously there's still the problem of having a standard way to build from a VCS checkout. It might even make sense to have standard methods to go: VCS checkout -> sdist -> (wheels and/or source wheels) ?
The metadata should directly include the specifiers inside of it and shouldn't propagate the meme that pip's requirements.txt format is anything but a way to recreate a specific environment with pip.
Yeah, there's a big question mark next to the requirements.txt stuff in the draft PEP, because something more standard and structured would certainly be nice. But requirements.txt is wildly popular, and for a good reason -- it provides a simple terse syntax that does what people want. (By comparison, the PEP 426 JSON syntax for requirements with extras and environment specifiers is extremely cumbersome yet less featureful.) And semantically, what we want here is a way to say "to build *this* I need an environment that looks like *this*", which is pretty close to what requirements.txt is actually designed for. So I dunno -- instead of fighting the meme maybe we should embrace it :-). But obviously this is a tangent to the main questions. [...]
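[Editorial illustration: to make "simple terse syntax" concrete, the sort of build-requirements declaration being defended here is essentially one requirement specifier per line, optionally with an environment marker. This is only a sketch of the idea, not a format anyone in the thread has specified.]

    # hypothetical build-requirements fragment, requirements.txt style
    setuptools
    wheel
    cython >= 0.23
    numpy >= 1.9; python_version >= "3.5"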
I don't think there's ever going to be a world where pip depends on virtualenv or pyvenv.
Huh, really? Can you elaborate on why not? The standard doesn't have to require the use of clean build environments (I was thinking of the text in the standard as applying with the "as if rule" -- a valid sdist is one that can be built the way described, if you have some way that will work to build such sdists then your way is valid too). But using clean environments by default is really the only way that we're going to get a world where most packages have accurate build requirements. -n -- Nathaniel J. Smith -- http://vorpus.org

On October 2, 2015 at 10:27:36 PM, Nathaniel Smith (njs@pobox.com) wrote:
So I'm extremely reluctant to give up on standardizing how to handle VCS checkouts. And if we're going to have a standard for that, then would sure be nice if we could share the work between this standard and the one for sdists, given how similar they are.
Mentioned in another email, but I don't think we should give up on the VCS handling, I just think we should defer it to another PEP. I think this PEP is suffering from trying to use the same mechanism for VCS and sdists when they are different things and have different considerations. The Python packaging toolchain has a long history of suffering because it tried to reuse the same thing across multiple "phases" of the packaging life cycle. The problem with that is different phases have different needs, and when you're trying to use the same thing across multiple phases you suffer from something that is a sort of "Jack of All Trades, Master of None" kind of thing.
What prevents it in the current draft is that there's no way for foobar to say any such thing :-). If you ask for Django, then the only sdist it will look at is the one in the Django segment. This is an intentionally limited solution, based on the intuition that multiple wheels from a single sdist will tend to be a relatively rare case, when they do occur then there will generally be one "main" wheel that people will want to depend on, and that people should be uploading wheels anyway rather than relying on sdists.
There *must* be a 1:1 mapping in name/version between sdist and wheels. This assumption is baked into basically every layer of the toolchain.
I feel like this idea of "source wheels" makes some sense if we want something that looks like a wheel, but without the ABI compatibility issues of wheels. I'm uncertain how well it can be made to work in practice, or how urgent it is once we have a 95% solution in place for linux wheels, but it's certainly an interesting idea. To me it feels rather different from a traditional sdist, and obviously there's still the problem of having a standard way to build from a VCS checkout.
I feel like you have some sort of "a sdist is just a tarball of a VCS" mentality and I don't think that idea of a sdist is generally useful.
I don't think there's ever going to be a world where pip depends on virtualenv or pyvenv.
Huh, really? Can you elaborate on why not? The standard doesn't have to require the use of clean build environments (I was thinking of the text in the standard as applying with the "as if rule" -- a valid sdist is one that can be built the way described, if you have some way that will work to build such sdists then your way is valid too). But using clean environments by default is really the only way that we're going to get a world where most packages have accurate build requirements.
We'll most likely use a clean environment by explicitly emptying out the sys.path or something similar. We won't depend on virtual environments because it is a layering violation. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
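[Editorial illustration: a minimal sketch of the kind of isolation Donald alludes to. The "mybuildtool" module and its build_wheel hook are made up for this example; this is not pip's actual plan, just one way "emptying out sys.path" could look.]

    import subprocess
    import sys

    def build_in_clean_env(source_dir, build_deps_dir):
        # Run a hypothetical build hook in a child interpreter whose sys.path
        # is stripped back to the stdlib plus the directory where the declared
        # build dependencies were installed -- "emptying out sys.path" rather
        # than depending on virtualenv/pyvenv.
        code = (
            "import sys; "
            "sys.path = [p for p in sys.path if 'site-packages' not in p]; "
            "sys.path.insert(0, %r); "
            "import mybuildtool; mybuildtool.build_wheel(%r)"
        ) % (build_deps_dir, source_dir)
        subprocess.check_call([sys.executable, "-s", "-E", "-c", code])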

On Sat, Oct 3, 2015 at 10:50 AM, Donald Stufft <donald@stufft.io> wrote:
On October 2, 2015 at 10:27:36 PM, Nathaniel Smith (njs@pobox.com) wrote:
So I'm extremely reluctant to give up on standardizing how to handle VCS checkouts. And if we're going to have a standard for that, then would sure be nice if we could share the work between this standard and the one for sdists, given how similar they are.
Mentioned in another email, but I don't think we should give up on the VCS handling, I just think we should defer it to another PEP. I think this PEP is suffering from trying to use the same mechanism for VCS and sdists when they are different things and have different considerations. The Python packaging toolchain has a long history of suffering because it tried to reuse the same thing across multiple "phases" of the packaging life cycle. The problem with that is different phases have different needs, and when you're trying to use the same thing across multiple phases you suffer from something that is a sort of "Jack of All Trades, Master of None" kind of thing.
I guess to make progress in this conversation I need some more detailed explanations. I totally get that there's a long history of thought and conversations behind the various assertions here like "a sdist is fundamentally different from a VCS checkout", "there must be a 1-1 mapping between sdists and wheels", "pip needs sdists that have full wheel metadata in static form", and I'm barging in from the outside with no context, but I literally have no idea why the specific design features you're asking for are desirable or even viable. Right now if I were to try and write the PEP you're asking for, then the rationale section would just be "because Donald said so" over and over :-). I couldn't write the motivation section, because I don't know any problems that the PEP you're describing would fix for me as a package author (which doesn't mean they don't exist, but!). -n

Let me see if I can help clarify, so it's not just Donald who says so :-) It does feel as if we're trying to explain a lot of things that "everybody knows". Clearly not everybody knows, as you don't, but what we're trying to clarify here is the de facto realities of how sdists work, and how people expect them to work. Unfortunately, there's an awful lot of things in the packaging ecosystem that are defined by existing practice, and traditionally haven't been formally documented. I'm sure it feels as if we're just repeatedly saying "it has to be like that" - but in truth, it's more that what we're saying is the definition of a sdist, as established by existing practice. I wish we could point you to a formal definition of the requirements, but unfortunately they've never been written down. With luck, one of the outcomes here will be that someone will record what a sdist is - but we need to reflect current reality, and not end up reusing the term "sdist" to mean something different from what people currently use it for. On 4 October 2015 at 19:22, Nathaniel Smith <njs@pobox.com> wrote:
"a sdist is fundamentally different from a VCS checkout",
Specifically, a sdist is built by the packaging tools - at the moment, by "setup.py sdist", but in future by whatever tool(s) may replace distutils/setuptools. So a sdist has a defined format, and we can mandate certain things about it. In particular, we can require files to be present which are in tool-friendly formats, because the tools will build them. On the other hand, a VCS checkout is fundamentally built by a human, for use by humans. File formats need to be human-editable, we have to be prepared to work with constraints imposed by workflows and processes *other* than Python packaging tools. So we have much less ability to dictate the format. Your proposal mandates a single directory "owned" by the packaging ecosystem, which follows the git/hg/subversion model, so it's lightweight and low-risk. But you still can't realistically ask the user to maintain package data in (for example) a JSON file in that directory.
"there must be a 1-1 mapping between sdists and wheels",
The fundamental reason is one I know I've mentioned here before - pip implements "pip install <sdist>" by first building a wheel and then installing it. If a sdist generates two wheels, how will pip know which one to install? Also, users expect "pip wheel <sdist>" to produce the wheel corresponding to the sdist. You're proposing to change that expectation - the onus is on you to justify that change. You need to consider backward compatibility in the wider sense here too - right now, there *is* a one-to-one mapping between a sdist and a wheel. If you want to change that you need to justify it, it's not enough just to claim that no-one has come up with a persuasive argument to keep things as they are. Change is not a bad thing, and "because we've always done it that way" is not a good argument, but change needs to be justified even so.
"pip needs sdists that have full wheel metadata in static form"
I assume here you're now OK with the distinction between a sdist and a VCS checkout? If you still think we're saying that pip needs static metadata in *VCS checkouts* then please review the comments already made about the difference between a sdist and a VCS checkout. But basically, a sdist is a tool-generated archive that captures the state of the project and allows for *reproducible* builds of that project. If your understanding of what a sdist is differs from this, we need to stop and agree on terminology before going any further.

I will concede that https://packaging.python.org/en/latest/glossary/ doesn't mention the point that a sdist needs to provide reproducible builds. But that's certainly how sdists are used at present, and how people expect them to work. Certainly, if I lost the wheel I'd built from a sdist, I'd expect to just rebuild it from the sdist and get the same wheel.

Pip needs metadata to do dependency resolution. This includes project name, version, and dependency information. We could debate about whether *full* metadata is needed, but I'm not sure what the point is. Once you are recording the stuff that pip needs, why *not* record everything? There are other tools (and ad-hoc scripts) that would benefit from having the full metadata, so why would you make it harder for them to work? You claim that you want to keep your options open - but to me, it's more important to leave the *user's* options open. If we don't provide certain values, a user who needs that data has to propose a change to the format, wait for it to be implemented, and even then they can't rely on it until all projects move to the new format. Better to just require everything from the start, then users can get at whatever they need.

As far as why the metadata should be static, the current sdist format does actually include static metadata, in the PKG-INFO file. So again we have a case where it's up to you to justify the backward compatibility break. But it's a little less clear-cut here, because you are proposing a new sdist format, so you've already argued for a break with the old format. Also the old format is not typically introspected, it's just used to unpack and run setup.py. So you can reasonably argue that the current state of affairs is irrelevant.

However, we're talking here about whether the metadata should be statically available, or dynamically generated. The key point here is that dynamic metadata requires the tool (pip, my one-off script, whatever) to *run arbitrary code* in order to get the metadata. OK, with signing we can ensure that it's *trusted* code, but it still could do anything the project author wanted, and we can make no assumptions about what it does. That makes a tool's job much harder. A common bug report for pip is users finding that their installs fail, because setup.py requires numpy to be installed in order to run, and yet pip is running setup.py egg-info precisely to find out what the requirements are. We tell the user that the setup.py is written incorrectly, and they should install numpy and retry the install, but it's not a good user experience. And from a selfish point of view, users blame *pip* for the consequences of a project whose code to generate the metadata is buggy. Those bug reports are a drain on the time of the pip developers, as well as a frustrating experience for the users.

If you want to argue that a VCS checkout, or development directory, needs to generate metadata dynamically, I won't argue. That's fine.
But the sdist is a tool-generated snapshot of a *specific* release of a project (maybe "the release I made at 1:15 today for my test build", but still a specific build) and it should be perfectly possible to capture the dynamically generated metadata from the VCS checkout and store it in the sdist when it is built. If you feel that there is metadata that cannot be stored statically in the sdist, could you please give a specific example? But do remember that a sdist is intended as a *snapshot* of a VCS checkout that can be used to reproducibly build the project - so "the version number needs to include the time of the build" isn't a valid example. Paul
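[Editorial illustration: for concreteness, the "setup.py needs numpy just to run" failure mode described above usually looks something like the following hypothetical setup.py, not taken from any real project.]

    # setup.py -- pip runs "setup.py egg-info" to discover the requirements,
    # but this file cannot even be executed until numpy (one of those very
    # requirements) is already installed.
    from setuptools import setup, Extension
    import numpy  # ImportError here aborts metadata extraction entirely

    setup(
        name="example",
        version="1.0",
        install_requires=["numpy"],
        ext_modules=[Extension("example._core", ["example/_core.c"],
                               include_dirs=[numpy.get_include()])],
    )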

On October 4, 2015 at 2:22:51 PM, Nathaniel Smith (njs@pobox.com) wrote:
I guess to make progress in this conversation I need some more detailed explanations. I totally get that there's a long history of thought and conversations behind the various assertions here like "a sdist is fundamentally different from a VCS checkout", "there must be a 1-1 mapping between sdists and wheels", "pip needs sdists that have full wheel metadata in static form", and I'm barging in from the outside with no context, but I literally have no idea why the specific design features you're asking for are desirable or even viable. Right now if I were to try and write the PEP you're asking for, then the rationale section would just be "because Donald said so" over and over :-). I couldn't write the motivation section, because I don't know any problems that the PEP you're describing would fix for me as a package author (which doesn't mean they don't exist, but!).
I don't mind going into more details! I'll do the things you specifically mentioned and then if there are other things, feel free to bring them up too. I should also mention that these are my opinions from my experiences with the toolchain and ecosystem, others may agree or disagree with me. I have strong opinions, but that doesn't make them immutable laws of the universe, although "because Donald said so" sounds like a pretty good answer to me ;)

"a sdist is fundamentally different from a VCS checkout"

This one I have a hard time trying to explain. They are focused on different things. With an sdist you need to have a project name, a version, a list of files, things like that. The use cases and needs for each "phase" are different. For instance, in a VCS checkout you can derive the list of files or the version by asking the VCS, but a sdist doesn't have a VCS so it has to have that baked into it. A more C centric example is that you often times have something like autogen.sh in a C project's VCS, but you don't have the output of that checked into the VCS, however when you prepare a tarball for distribution you run autogen.sh and then include the output there.

There are other differences too, in a VCS we don't really need the ability to statically read any metadata except for build dependencies and how to invoke the build tool. Most everything else can be dynamically configured because you're not distributing that. However in a sdist, we need as much of the metadata to be static as possible. Something like PyPI needs to be able to inspect any of the files uploaded to it (sdist, wheels, etc) for certain information and anything that can't be statically and safely read from it might as well not even exist as far as PyPI is concerned.

We currently have the situation where we have a single file that is used for all phases of the process, dev (``setup.py develop`` & ``setup.py sdist``), building of a wheel (``setup.py bdist_wheel``) and even installation sometimes (``setup.py install``). Throughout this there are a lot of common problems where some author tried to optimize their ``setup.py`` for their development use cases and broke it for the other cases. An example of this is version handling, where it's not unusual for someone's first foray into attempting to deduplicate their version to involve importing their thing (which works fine on their machine) and passing it into the setup kwargs. This simple thing would generally work just fine if the output of ``setup.py sdist`` produced static metadata and ``setup.py`` was no longer being used.

This also becomes expressed in what interfaces you give to the toolchain at each "phase". It's important for something inside of a VCS checkout to be able to be written by human beings. This leads to wanting to use formats like INI (which is ugly) or something like TOML or YAML or some other nice, human friendly format. These formats are great for humans to write and for humans to read but are not particularly great as data interchange formats. Something like JSON, msgpack, etc. is far better for data interchange, for computers to talk to other computers, but not great for humans to write, edit, or even really read in many cases.
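[Editorial illustration: going back to the version-handling example a couple of paragraphs up, the "first foray" pattern usually looks something like this hypothetical setup.py -- fine on the author's machine, broken anywhere the package can't yet be imported.]

    from setuptools import setup
    import mypackage  # fails if dependencies or compiled extensions are missing

    setup(
        name="mypackage",
        version=mypackage.__version__,  # the "deduplicated" version
    )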
If we go back to distutils2, you can see this effect happening there, they had two similar keyword arguments in their setup.cfg statements, description and description-file, these both did the same things, but just pulled from different sources (inline or via a file) forcing every tool in the chain to have to support both of these options even though it could have easily made an sdist that was distinct from the VCS code and simplified code there. I see the blurring of lines between the various phases of a package as one of the fundamental flaws of distutils and setuptools.

"there must be a 1-1 mapping between sdists and wheels"

This has technical and social reasons. On the technical side, the 1-1 mapping between sdists and wheels (and all other bdists) is an assumption baked into all of the tools. From PyPI's enforcement mechanisms, to pip's caching, to things like devpi and the such, breaking this assumption will break a lot of code. This is all code and code is not immutable so we could of course change that, however we wouldn't be able to rely on the fact that we've fixed that assumption for many years (probably at least 5 at the earliest, 10+ is more likely).

The social side is a bit more interesting though. In Debian, end users almost *never* actually interact with source packages and in near 100% of the time they are interacting solely with built packages (in fact, unlike Python, you have to manually build a deb before you can even attempt to install something). There really aren't "source packages" in Debian, just sources that happen to produce a Debian package. In Python land, a source package is still a package and people have expectations around that, I think people would be very confused if a sdist "foo-1.0.tar.gz" could produce a wheel "bar-3.0.whl".

In addition, systems like Debian don't really try to protect against a malicious DD at all. Things like "prevent foo from claiming to be bar" are enforced via societal conventions and the fact that it is not an open repo and there are gatekeepers keeping everything in place. On the flip side, we let anyone upload to PyPI and rely on things like ACLs to secure things. This means that we need to know ahead of time what names a package is going to produce. The simplest mechanism for this is to enforce a 1:1 mapping between sdist and wheel because that is an immutable property and easy to understand. I could possibly envision something that allowed this, but it would require a project to explicitly declare up front what names it will produce, and require registering those names with PyPI before you could upload a sdist that could produce those named wheels. Ultimately, I don't think the very minor benefits are worth the additional complexity and pain of trying to adapt all of the tooling and human expectations to this.

"pip needs sdists that have full wheel metadata in static form"

I think I could come around to the idea that some metadata doesn't make sense for a sdist, and that it really needs to be a part of wheels but not a part of sdist. I think that the argument needs to be made in the other direction though, we should assume that all metadata will be included as part of the sdist and then make an argument for why each particular piece of metadata is Wheel specific and not specific to a particular version of a project. Things like name, version, description, classifiers, etc are easily able to be classified into specific to a particular (name, version) tuple.
Other things like "Python ABI" are easily able to be classified into specific to a particular wheel. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Sat, Oct 3, 2015 at 10:50 AM, Donald Stufft <donald@stufft.io> wrote:
I feel like you have some sort of "a sdist is just a tarball of a VCS" mentality and I don't think that idea of a sdist is generally useful.
Hmm, so, between thinking this over more on the plane today and reading your and Paul's great replies, I think I see where a lot of this disjunction might be arising. I'll try to reply to those in more detail later, but first let me try to lay this out and see if it makes things clearer. First, let's drop the word "sdist", it's confusing. I'm starting from the long history and conventions around how people make what I'll call "source releases" (and in a few paragraphs will contrast with "source wheels"). 'Everyone knows' that when you release a new version of some package (in pretty much any language), then one key step is to put together a file called <package>-<version>.<zip or .tar.gz>. And 'everyone knows' that if you see a file that follows this naming convention, and you download it, then what you'll find inside is: a single directory called <package>-<version>/, and inside this directory will be something that's almost like a VCS checkout -- it'll probably contain a README, source files in convenient places to be edited or grepped, etc. The differences from a VCS checkout (if any) will be little convenience stuff -- like ./autogen.sh will have been run already, or there will be an extra file containing a fixed version number instead of it being autodetected, or -DNDEBUG will be in the default CFLAGS, or Cython files will have been pre-translated to C -- but fundamentally it will be similar to a VCS checkout, and switching back and forth between them won't be too jarring. 95% of the time there will be a standard way to build the thing ('./configure && make && make install', or 'python setup.py install', or similar). And these kind of source releases have a rich ecosystem around them and serve a wide range of uses: they provide a low-tech archival record (while VCS's come and go), they end up in deb and rpm "original source" bundles, they get downloaded by users and built by hand (maybe with weird configury on top, like a hack to enable cross-compilation) or poked around in by hand, etc. etc. When sdists were originally designed, then "source releases" is what the designers were thinking about. Then, easy_install came along, and pulled off a clever hack where when you asked for some package by name, then it would try to automagically go out and track down any relevant source releases and build them all. And it works great, except when it doesn't. And creates massive headaches for everyone trying to work on python packaging afterwards, because source releases were not designed to be used this way. My hypothesis is that the requirements that were confusing me are based around the idea that an sdist should be something designed to slot into this particular use case: i.e., something that pip can automatically grab and work with while solving a dependency resolution problem. Therefore it really needs to have a static name, and static version number, and static dependencies, and must produce exactly one binary wheel that shares all that metadata. Let's call this a "source wheel" -- what we're really looking for here is a way to ship extension modules inside something that acts like an architecture-neutral "none-any" wheel. So: the email that started this thread was a proposal for how to standardize the format of "source releases", and Donald's counter was a proposal for how to standardize the format of "source wheels". Does that sound right? 
If so, then some follow-up thoughts:

1) If we design a source wheel format, then I am 100% in favor of the suggestion of giving it a unique extension like "swhl". I'm still a young whippersnapper compared to some, but I've been downloading files named <package>-<version>.<zip or tar.gz> for 20 years, and AFAICR every one of those files unpacked to make a single directory that was laid out like a VCS checkout. Obviously we can change what goes inside, but we should change the naming convention at the same time because otherwise we're just going to confuse people.

2) I think there's a strong case to be made that Python actually needs standards for *both* source releases and source wheels. There's certainly no logical contradiction -- they're conceptually different things. It sounds like we all agree that "pip" should continue to have a way to build and install an arbitrary VCS checkout, and extending that standard to cover building and installing a classic "source release" would be... almost difficult *not* to do. And I think that there will continue to be a clear need for source releases even in a world where source wheels exist, because of all those places where source releases get used that aren't automagic-easy_install/pip-builds. For example, most pure Python packages (the ones that already make "none-any" wheels) have no need at all for source wheels, but they still need source releases to serve as archival snapshots. And more complex packages that need build-time configuration (e.g. numpy) will continue to require source releases that can be configured to build wheels that have a variety of different properties (e.g., different dependency metadata), so they can't get by with source wheels alone -- but you can imagine that such projects might reasonably *in addition* provide a source wheel that locks down the same default configuration that gets used for their uploaded binary wheel builds, and is designed for pip to use when trying to resolve dependencies on platforms where a regular binary wheel is unavailable.

Pictorially, this world would look like:

    VCS checkout -> source release
         \               \
          ---------------------+--> in-place install
                                |
                                +--> wheels -> install
                                |
                                +--> source wheels -> wheels -> install

3) It sounds like we all agree that

- 'pip install <VCS checkout>' should work
- that there is some crucial metadata that VCS checkouts won't be able to provide without running arbitrary code (e.g. dependencies and version numbers)
- that what metadata they do provide (e.g., which arbitrary code to run) should be specified in a human-friendly configuration file

Given this, in the big picture it sounds like the only really essentially controversial things about the original proposal might be:

- that 'pip install <tarball of VCS checkout>' should work the same as 'pip install <VCS checkout>' (does anyone actually disagree?)
- the 1 paragraph describing the "transitional plan" allowing pip to automatically install from these source releases *as part of a dependency resolution plan* (as opposed to when a VCS-checkout-or-equivalent is explicitly given as an install target).
Which honestly I don't like either, it's just there as a backcompat measure so that these source releases don't create a regression versus existing sdists -- note that one of the goals of the design was that current sdists could be upgraded into this format by dropping in a single static file (or by an install tool "virtually" dropping in this file when it encounters an old-style sdist -- so you don't need to keep around code to handle both cases separately). Does that sound right? (Other features of the original proposal include stuff like the lack of trivial metadata like "name" and "description", and the support for generating multiple wheels from one directory. I am explicitly calling these "inessential".) -n -- Nathaniel J. Smith -- http://vorpus.org

On 5 October 2015 at 07:29, Nathaniel Smith <njs@pobox.com> wrote:
First, let's drop the word "sdist", it's confusing.
I'll read your full reply later, when I have the time, but please note we can't drop the term sdist - it's a well known concept in packaging. Having said that, I'm happy if you want to restate your proposal in terms of a new concept that's not a sdist (you'll need to come up with a suitable term, and make it clear that it's distinct from a sdist when you formalise the proposal, but that's OK for now). My immediate thought is that I'm against a proposal that throws out the sdist concept in favour of something new, as there's a lot of reworking that would need to be done to achieve that, and I don't know who's going to do that. So I'd need convincing that the proposal is practical. For example, PyPI would need to be able to host these new things, and distinguish between them and sdists. But I'll hold off on detailed comments until I've read your full email. Paul

OK, I've had a better read of your email now. Responses inline. On 5 October 2015 at 07:29, Nathaniel Smith <njs@pobox.com> wrote:
First, let's drop the word "sdist", it's confusing.
We can't (see below for details). We can deprecate the sdist concept, if that's what you want to propose. From what I gather, you're proposing deprecating it in favour of a "source wheel" concept. I don't have a huge issue with that other than that I don't see the necessity - the sdist concept pretty much covers what you want, except maybe that it's not clear enough to people outside the packaging community how it differs from a VCS checkout.
I'm starting from the long history and conventions around how people make what I'll call "source releases" (and in a few paragraphs will contrast with "source wheels"). 'Everyone knows' that when you release a new version of some package (in pretty much any language), then one key step is to put together a file called <package>-<version>.<zip or .tar.gz>. And 'everyone knows' that if you see a file that follows this naming convention, and you download it, then what you'll find inside is: a single directory called <package>-<version>/, and inside this directory will be something that's almost like a VCS checkout -- it'll probably contain a README, source files in convenient places to be edited or grepped, etc. The differences from a VCS checkout (if any) will be little convenience stuff -- like ./autogen.sh will have been run already, or there will be an extra file containing a fixed version number instead of it being autodetected, or -DNDEBUG will be in the default CFLAGS, or Cython files will have been pre-translated to C -- but fundamentally it will be similar to a VCS checkout, and switching back and forth between them won't be too jarring. 95% of the time there will be a standard way to build the thing ('./configure && make && make install', or 'python setup.py install', or similar).
Breaking at this point, because that's frankly *not* the reality in the Python packaging world (at least not nowadays - I'm not clear to what extent you're just talking about history and background here, although your reference to Cython makes me think you're talking in terms of current practice). It may look like that, but there are some fundamental differences.

First and foremost, nobody zips up and publishes their VCS checkout in the way you describe. (At least not if they are using the standard tools - distutils and setuptools). Instead, they create a "sdist" using the "python setup.py sdist" command. I'm sorry, but I'm going to carry on using the "sdist" term here, because I'm describing current practice and sdists *are* current practice. The difference between a sdist and what you call a "source release" is subtle, precisely because the current sdist format is a bit of a mess, but the key point is that all sdists are created by a standard process, and conform to a standard naming convention and layout. The packaging tools rely on being able to make that assumption, in all sorts of ways which we're doing our best to clarify as part of this thread, but which honestly have been a little bit implicit up to this point.

Further muddying the water is the fact that as you say, pip needs to be able to build from a VCS checkout (a directory on the user's local system) and we have code in pip that does that - mostly by assuming that you can treat a VCS checkout as an unpacked sdist, but there are hacks we need to do to make that work (we run setup.py egg-info to get the metadata we need, for example, which has implications as we only get that data at a later stage than we have it in the sdist case) and differences in functionality (develop mode).

At this point I'm not saying that things have to be this way, or even that "make a source release however you choose as long as it follows these conventions" isn't a viable way forward, but I do think we need to agree on our picture of how things are now, or we'll continue talking past each other.
And these kind of source releases have a rich ecosystem around them and serve a wide range of uses: they provide a low-tech archival record (while VCS's come and go), they end up in deb and rpm "original source" bundles, they get downloaded by users and built by hand (maybe with weird configury on top, like a hack to enable cross-compilation) or poked around in by hand, etc. etc. When sdists were originally designed, then "source releases" is what the designers were thinking about.
This, on the other hand, I suspect is not far from the truth. When sdists were designed, they were a convenience for bundling the stuff needed to do setup.py install later, possibly on a different machine. But that's a long time ago, and not really relevant now. For better or worse. Unless you are suggesting that we go all the way back to that original point? Which you may be, but that means discarding the work that's been done based on the sdist concept since then. Which leads nicely on to...
Then, easy_install came along, and pulled off a clever hack where when you asked for some package by name, then it would try to automagically go out and track down any relevant source releases and build them all. And it works great, except when it doesn't. And creates massive headaches for everyone trying to work on python packaging afterwards, because source releases were not designed to be used this way.
My hypothesis is that the requirements that were confusing me are based around the idea that an sdist should be something designed to slot into this particular use case: i.e., something that pip can automatically grab and work with while solving a dependency resolution problem. Therefore it really needs to have a static name, and static version number, and static dependencies, and must produce exactly one binary wheel that shares all that metadata.
Anyone who knows my history will know that I'm the last person to defend setuptools' hacks, but you hit the nail on the head above. When it works, it works great (meh, "sufficiently well" :-)) And pip *needs* to do static dependency resolution. We have enough bug reports and feature requests asking that we improve the dependency resolution process that nobody is going to be happy with anything that doesn't allow for at least as much static information as we currently have, and ultimately more.
Let's call this a "source wheel" -- what we're really looking for here is a way to ship extension modules inside something that acts like an architecture-neutral "none-any" wheel.
I don't understand this statement. What do extension modules matter here? We need to be able to ship sources in a form that can participate in dependency resolution (and any other potential discovery processes that may turn up in future) without having to run code to do so. The reasons for this are:

1. Running code is a performance overhead, and possibly even a security risk (even trusted code may behave in ways you didn't anticipate). We want to do as little as possible of that as we can, and in particular we want to discard invalid candidate files without running any of their code.

2. Running code introduces the possibility of that code failing. We don't want end users to have installs fail because code in distributions we're going to discard is buggy.

3. Repositories like PyPI need to present project metadata, for both human and tool consumption - they can only do this if it's available statically.

You seem to be thinking that binary wheels are sufficient for this for pure-Python code. Look at it the other way - we discard sdists from the dependency calculations whenever there's an equivalent binary wheel available. That's always the case for none-any wheels, but less often for architecture-dependent wheels. But in order to know that the wheel is equivalent, we need to match it with the sdist - so the sdist needs the metadata you're trying to argue against providing...
So: the email that started this thread was a proposal for how to standardize the format of "source releases", and Donald's counter was a proposal for how to standardize the format of "source wheels". Does that sound right?
Well, essentially yes, although I read it as your original email being a proposal for a new format to replace sdists, and Donald's and my counter is that there's already been a certain amount of thinking and design gone into how we move from the current ad-hoc sdist format to a better defined and specified "next version", so how does your proposal affect that? It seems that your answer is that you want to bypass that and offer an alternative. Is that fair? For the record, I don't like the term "source wheel" and would prefer to stick with "sdist" if appropriate, or choose a term that doesn't include the word "wheel" otherwise (as wheels seem to me to be strongly, and beneficially, linked in people's minds to the concept of a binary release format).
If so, then some follow-up thoughts:
1) If we design a source wheel format, then I am 100% in favor of the suggestion of giving it a unique extension like "swhl". I'm still a young whippersnapper compared to some, but I've been downloading files named <package>-<version>.<zip or tar.gz> for 20 years, and AFAICR every one of those files unpacked to make a single directory that was laid out like a VCS checkout. Obviously we can change what goes inside, but we should change the naming convention at the same time because otherwise we're just going to confuse people.
I have no problem with making it clear that "sdist version 2" or "source wheel" is not the same as a packed VCS checkout. I don't see the need for a new term, I'd be happy with "<package>-<version>.sdist" as the name. I'd also like to emphasize strongly that PyPI only hosts sdists, and *not* source releases - that source releases are typically only seen in Python in the form of a VCS checkout or development directory. (There's an implication there that we need to explore, that pip won't necessarily gain the ability to be pointed at a non-sdist format packed "source release" archive, and download it and process it. That's not a given, but I'd like to be sure we are happy with the potential re-introduction of confusion over the distinction between a sdist/source wheel and a source release that would result).
2) I think there's a strong case to be made that Python actually needs standards for *both* source releases and source wheels. There's certainly no logical contradiction -- they're conceptually different things. It sounds like we all agree that "pip" should continue to have a way to build and install an arbitrary VCS checkout, and extending that standard to cover building and installing a classic "source release" would be... almost difficult *not* to do.
As noted above, I'm happy for that discussion to occur. But I'm *not* sure the case is as strong as you think. Technically, it's certainly not too hard, but the social issues are what concern me. How will we explain to someone that they can't upload their file to PyPI because it's a "source release" not a "source wheel"? What is the implication on people's workflow? How would we explain why people might want to make "source releases" *at all*? Personally, I can only see a need in my personal experience for a VCS url that people can clone, a packaged source artifact that I can upload to PyPI for automatic consumption, and (binary) wheels. That second item is a "source wheel" - not a "source release".
And I think that there will continue to be a clear need for source releases even in a world where source wheels exist, because of all those places where source releases get used that aren't automagic-easy_install/pip-builds. For example, most pure Python packages (the ones that already make "none-any" wheels) have no need at all for source wheels, but they still need source releases to serve as archival snapshots. And more complex packages that need build-time configuration (e.g. numpy) will continue to require source releases that can be configured to build wheels that have a variety of different properties (e.g., different dependency metadata), so they can't get by with source wheels alone -- but you can imagine that such projects might reasonably *in addition* provide a source wheel that locks down the same default configuration that gets used for their uploaded binary wheel builds, and is designed for pip to use when trying to resolve dependencies on platforms where a regular binary wheel is unavailable.
Pictorially, this world would look like:
VCS checkout -> source release
     \               \
      ---------------------+--> in-place install
                            |
                            +--> wheels -> install
                            |
                            +--> source wheels -> wheels -> install
I don't see the need for source releases that you do. That's likely because I don't deal with the sorts of complex projects you do, though, so I'm not dismissing the issue. As I say, my objections are mostly non-technical. I do think you should consider how to document "what a source release is intended to achieve" in a way that explains it to people who don't need the complexity it adds - and with the explicit goal of making sure that you dissuade people who *don't* need source releases from thinking they do.
3) It sounds like we all agree that - 'pip install <VCS checkout>' should work
Yes.
- that there is some crucial metadata that VCS checkouts won't be able to provide without running arbitrary code (e.g. dependencies and version numbers)
I'm still resisting this one, although I can live with "Nathanial tells me so" :-)
- that what metadata they do provide (e.g., which arbitrary code to run) should be specified in a human-friendly configuration file
I don't agree to that one particularly, in the sense that I don't really care. I'd be happy with a system that said something like that for a VCS checkout, pip expects "setup.py egg-info" and "setup.py sdist" to work, and produce respectively a set of static metadata in a known location, and a properly formatted "source wheel"/sdist file. Non-distutils build tools can write a wrapper setup.py that works however they prefer. (That's roughly what we have now, BTW).
Given this, in the big picture it sounds like the only really essentially controversial things about the original proposal might be: - that 'pip install <tarball of VCS checkout>' should work the same as 'pip install <VCS checkout>' (does anyone actually disagree?)
Yes, to the extent that I want to ensure it's clearly documented how, and why, this differs from a sdist/source wheel. But that's not a technical issue.
- the 1 paragraph describing the "transitional plan" allowing pip to automatically install from these source releases *as part of a dependency resolution plan* (as opposed to when a VCS-checkout-or-equivalent is explicitly given as an install target). Which honestly I don't like either, it's just there as a backcompat measure so that these source releases don't create a regression versus existing sdists -- note that one of the goals of the design was that current sdists could be upgraded into this format by dropping in a single static file (or by an install tool "virtually" dropping in this file when it encounters an old-style sdist -- so you don't need to keep around code to handle both cases separately).
Given that these source releases won't be hosted on PyPI (as the proposal currently stands) there's no real need for this - all you need to say is that you can point pip at any old URL and take your chances :-)
Does that sound right?
Not really. For me the big controversy is whether we move forward from where we are with sdists, or we ignore the current sdist mechanism and start over. A key further question, which I don't think has been stated explicitly until I started this email, is what formats will be supported for hosting on PyPI. I am against hosting formats that don't support static metadata, such as your "source distribution", as I don't see how PyPI would be able to publish the metadata if it weren't static. And following on from that, we need to agree whether the key formats should be required to have a static version. I'm OK with a VCS checkout having a dynamically generated version, that's part of the "all bets are off" contract over such things (if you don't generate a version that reflects every change, you get to deal with the consequences) but I don't think that's a reasonable thing to allow in "published" formats.
(Other features of the original proposal include stuff like the lack of trivial metadata like "name" and "description", and the support for generating multiple wheels from one directory. I am explicitly calling these "inessential".)
Not sure what you mean by lack of name/description being "inessential". The double negative confuses me. Do you mean you're OK with requiring them? Fair enough. For multiple wheels, I'd tend to consider the opposite to be true - it's not that the capability is non-essential, but rather that in published formats (source wheel and later in the chain) it's essential that one source generates one target. Paul

The OP asks for a Python callable interface to pip instead of setup.py's command line interface. That could be accomplished now by figuring out all of the arguments that pip will send to setup.py (setup.py egg_info and setup.py bdist_wheel)?, and then by writing a setup.py emulator that implements those commands by invoking the discovered callables, but it's hard to know exactly what that command line interface is. It has been a packaging TODO to write the list of arguments pip can send to setup.py down... Suppose someone from the pip team wrote the generic setup.py. It would implement the command line interface, import a script called, say, build.py, and invoke Python callables to do egg_info and bdist_wheel. Then the flit author could implement a couple of functions instead of having to reverse-engineer a command line interface. This would be an improvement because build system authors do not know how to extend setuptools or reverse engineer the necessary setup.py command line interface. On Mon, Oct 5, 2015 at 7:28 AM Paul Moore <p.f.moore@gmail.com> wrote:
OK, I've had a better read of your email now. Responses inline.
On 5 October 2015 at 07:29, Nathaniel Smith <njs@pobox.com> wrote:
First, let's drop the word "sdist", it's confusing.
We can't (see below for details). We can deprecate the sdist concept, if that's what you want to propose. From what I gather, you're proposing deprecating it in favour of a "source wheel" concept. I don't have a huge issue with that other than that I don't see the necessity - the sdist concept pretty much covers what you want, except maybe that it's not clear enough to people outside the packaging community how it differs from a VCS checkout.
I'm starting from the long history and conventions around how people make what I'll call "source releases" (and in a few paragraphs will contrast with "source wheels"). 'Everyone knows' that when you release a new version of some package (in pretty much any language), then one key step is to put together a file called <package>-<version>.<zip or .tar.gz>. And 'everyone knows' that if you see a file that follows this naming convention, and you download it, then what you'll find inside is: a single directory called <package>-<version>/, and inside this directory will be something that's almost like a VCS checkout -- it'll probably contain a README, source files in convenient places to be edited or grepped, etc. The differences from a VCS checkout (if any) will be little convenience stuff -- like ./autogen.sh will have been run already, or there will be an extra file containing a fixed version number instead of it being autodetected, or -DNDEBUG will be in the default CFLAGS, or Cython files will have been pre-translated to C -- but fundamentally it will be similar to a VCS checkout, and switching back and forth between them won't be too jarring. 95% of the time there will be a standard way to build the thing ('./configure && make && make install', or 'python setup.py install', or similar).
Breaking at this point, because that's frankly *not* the reality in the Python packaging world (at least not nowadays - I'm not clear to what extent you're just talking about history and background here, although your reference to Cython makes me think you're talking in terms of current practice). It may look like that, but there are some fundamental differences.
First and foremost, nobody zips up and publishes their VCS checkout in the way you describe. (At least not if they are using the standard tools - distutils and setuptools). Instead, they create a "sdist" using the "python setup.py sdist" command. I'm sorry, but I'm going to carry on using the "sdist" term here, because I'm describing current practice and sdists *are* current practice.
The difference between a sdist and what you call a "source release" is subtle, precisely because the current sdist format is a bit of a mess, but the key point is that all sdists are created by a standard process, and conform to a standard naming convention and layout. The packaging tools rely on being able to make that assumption, in all sorts of ways which we're doing our best to clarify as part of this thread, but which honestly have been a little bit implicit up to this point.
Further muddying the water is the fact that as you say, pip needs to be able to build from a VCS checkout (a directory on the user's local system) and we have code in pip that does that - mostly by assuming that you can treat a VCS checkout as an unpacked sdist, but there are hacks we need to do to make that work (we run setup.py egg-info to get the metadata we need, for example, which has implications as we only get that data at a later stage than we have it in the sdist case) and differences in functionality (develop mode).
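To make that dance concrete, the kind of thing an installer has to do today looks roughly like the following (a rough sketch only, not pip's actual code; the helper name is made up and the exact options pip passes are glossed over)::

    import glob
    import os
    import subprocess
    import sys

    def metadata_from_checkout(checkout_dir):
        """Run 'setup.py egg_info' in a checkout and read back what it wrote."""
        # Nothing is known about the project until setup.py has actually run,
        # which is why this data only arrives at a late stage.
        subprocess.check_call([sys.executable, "setup.py", "egg_info"],
                              cwd=checkout_dir)
        # setuptools conventionally writes a <name>.egg-info/ directory holding
        # PKG-INFO (name, version, ...) and requires.txt (dependencies).
        egg_info_dirs = glob.glob(os.path.join(checkout_dir, "*.egg-info"))
        if not egg_info_dirs:
            raise RuntimeError("setup.py egg_info produced no metadata")
        with open(os.path.join(egg_info_dirs[0], "PKG-INFO")) as f:
            return f.read()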
At this point I'm not saying that things have to be this way, or even that "make a source release however you choose as long as it follows these conventions" isn't a viable way forward, but I do think we need to agree on our picture of how things are now, or we'll continue talking past each other.
And these kind of source releases have a rich ecosystem around them and serve a wide range of uses: they provide a low-tech archival record (while VCS's come and go), they end up in deb and rpm "original source" bundles, they get downloaded by users and built by hand (maybe with weird configury on top, like a hack to enable cross-compilation) or poked around in by hand, etc. etc. When sdists were originally designed, then "source releases" is what the designers were thinking about.
This, on the other hand, I suspect is not far from the truth. When sdists were designed, they were a convenience for bundling the stuff needed to do setup.py install later, possibly on a different machine.
But that's a long time ago, and not really relevant now. For better or worse. Unless you are suggesting that we go all the way back to that original point? Which you may be, but that means discarding the work that's been done based on the sdist concept since then. Which leads nicely on to...
Then, easy_install came along, and pulled off a clever hack where when you asked for some package by name, then it would try to automagically go out and track down any relevant source releases and build them all. And it works great, except when it doesn't. And creates massive headaches for everyone trying to work on python packaging afterwards, because source releases were not designed to be used this way.
My hypothesis is that the requirements that were confusing me are based around the idea that an sdist should be something designed to slot into this particular use case: i.e., something that pip can automatically grab and work with while solving a dependency resolution problem. Therefore it really needs to have a static name, and static version number, and static dependencies, and must produce exactly one binary wheel that shares all that metadata.
Anyone who knows my history will know that I'm the last person to defend setuptools' hacks, but you hit the nail on the head above. When it works, it works great (meh, "sufficiently well" :-))
And pip *needs* to do static dependency resolution. We have enough bug reports and feature requests asking that we improve the dependency resolution process that nobody is going to be happy with anything that doesn't allow for at least as much static information as we currently have, and ultimately more.
Let's call this a "source wheel" -- what we're really looking for here is a way to ship extension modules inside something that acts like an architecture-neutral "none-any" wheel.
I don't understand this statement. What do extension modules matter here? We need to be able to ship sources in a form that can participate in dependency resolution (and any other potential discovery processes that may turn up in future) without having to run code to do so. The reasons for this are:
1. Running code is a performance overhead, and possibly even a security risk (even trusted code may behave in ways you didn't anticipate). We want to do as little of that as we can, and in particular we want to discard invalid candidate files without running any of their code.
2. Running code introduces the possibility of that code failing. We don't want end users to have installs fail because code in distributions we're going to discard is buggy.
3. Repositories like PyPI need to present project metadata, for both human and tool consumption - they can only do this if it's available statically.
You seem to be thinking that binary wheels are sufficient for this for pure-Python code. Look at it the other way - we discard sdists from the dependency calculations whenever there's an equivalent binary wheel available. That's always the case for none-any wheels, but less often for architecture-dependent wheels. But in order to know that the wheel is equivalent, we need to match it with the sdist - so the sdist needs the metadata you're trying to argue against providing...
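As a toy illustration of that matching step (my own sketch, nothing like pip's real filename and version handling), a wheel can only stand in for an sdist if both carry the same static name and version::

    import re

    # Very rough filename parsing, for illustration only.
    WHEEL_RE = re.compile(r"^(?P<name>[^-]+)-(?P<version>[^-]+)-.+\.whl$")
    SDIST_RE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)\.(?:tar\.gz|zip)$")

    def wheel_matches_sdist(wheel_filename, sdist_filename):
        """True if the two filenames claim the same (name, version) release."""
        wheel = WHEEL_RE.match(wheel_filename)
        sdist = SDIST_RE.match(sdist_filename)
        if not (wheel and sdist):
            return False
        return ((wheel.group("name"), wheel.group("version"))
                == (sdist.group("name"), sdist.group("version")))

    # wheel_matches_sdist("requests-1.0-py2.py3-none-any.whl",
    #                     "requests-1.0.tar.gz")  -> True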
So: the email that started this thread was a proposal for how to standardize the format of "source releases", and Donald's counter was a proposal for how to standardize the format of "source wheels". Does that sound right?
Well, essentially yes, although I read it as your original email being a proposal for a new format to replace sdists, and Donald's and my counter is that there's already been a certain amount of thinking and design gone into how we move from the current ad-hoc sdist format to a better defined and specified "next version", so how does your proposal affect that?
It seems that your answer is that you want to bypass that and offer an alternative. Is that fair?
For the record, I don't like the term "source wheel" and would prefer to stick with "sdist" if appropriate, or choose a term that doesn't include the word "wheel" otherwise (as wheels seem to me to be strongly, and beneficially, linked in people's minds to the concept of a binary release format).
If so, then some follow-up thoughts:
1) If we design a source wheel format, then I am 100% in favor of the suggestion of giving it a unique extension like "swhl". I'm still a young whippersnapper compared to some, but I've been downloading files named <package>-<version>.<zip or tar.gz> for 20 years, and AFAICR every one of those files unpacked to make a single directory that was laid out like a VCS checkout. Obviously we can change what goes inside, but we should change the naming convention at the same time because otherwise we're just going to confuse people.
I have no problem with making it clear that "sdist version 2" or "source wheel" is not the same as a packed VCS checkout. I don't see the need for a new term, I'd be happy with "<package>-<version>.sdist" as the name. I'd also like to emphasize strongly that PyPI only hosts sdists, and *not* source releases - that source releases are typically only seen in Python in the form of a VCS checkout or development directory.
(There's an implication there that we need to explore, that pip won't necessarily gain the ability to be pointed at a non-sdist format packed "source release" archive, and download it and process it. That's not a given, but I'd like to be sure we are happy with the potential re-introduction of confusion over the distinction between a sdist/source wheel and a source release that would result).
2) I think there's a strong case to be made that Python actually needs standards for *both* source releases and source wheels. There's certainly no logical contradiction -- they're conceptually different things. It sounds like we all agree that "pip" should continue to have a way to build and install an arbitrary VCS checkout, and extending that standard to cover building and installing a classic "source release" would be... almost difficult *not* to do.
As noted above, I'm happy for that discussion to occur. But I'm *not* sure the case is as strong as you think. Technically, it's certainly not too hard, but the social issues are what concern me. How will we explain to someone that they can't upload their file to PyPI because it's a "source release" not a "source wheel"? What is the implication on people's workflow? How would we explain why people might want to make "source releases" *at all*? Personally, I can only see a need in my personal experience for a VCS url that people can clone, a packaged source artifact that I can upload to PyPI for automatic consumption, and (binary) wheels. That second item is a "source wheel" - not a "source release".
And I think that there will continue to be a clear need for source releases even in a world where source wheels exist, because of all those places where source releases get used that aren't automagic-easy_install/pip-builds. For example, most pure Python packages (the ones that already make "none-any" wheels) have no need at all for source wheels, but they still need source releases to serve as archival snapshots. And more complex packages that need build-time configuration (e.g. numpy) will continue to require source releases that can be configured to build wheels that have a variety of different properties (e.g., different dependency metadata), so they can't get by with source wheels alone -- but you can imagine that such projects might reasonably *in addition* provide a source wheel that locks down the same default configuration that gets used for their uploaded binary wheel builds, and is designed for pip to use when trying to resolve dependencies on platforms where a regular binary wheel is unavailable.
Pictorially, this world would look like:
VCS checkout -> source release
   \               \
    ----------------+--> in-place install
                    |
                    +--> wheels -> install
                    |
                    +--> source wheels -> wheels -> install
I don't see the need for source releases that you do. That's likely because I don't deal with the sorts of complex projects you do, though, so I'm not dismissing the issue. As I say, my objections are mostly non-technical. I do think you should consider how to document "what a source release is intended to achieve" in a way that explains it to people who don't need the complexity it adds - and with the explicit goal of making sure that you dissuade people who *don't* need source releases from thinking they do.
3) It sounds like we all agree that - 'pip install <VCS checkout>' should work
Yes.
- that there is some crucial metadata that VCS checkouts won't be able to provide without running arbitrary code (e.g. dependencies and version numbers)
I'm still resisting this one, although I can live with "Nathanial tells me so" :-)
- that what metadata they do provide (e.g., which arbitrary code to run) should be specified in a human-friendly configuration file
I don't agree to that one particularly, in the sense that I don't really care. I'd be happy with a system that said something like: for a VCS checkout, pip expects "setup.py egg-info" and "setup.py sdist" to work, and to produce, respectively, a set of static metadata in a known location and a properly formatted "source wheel"/sdist file. Non-distutils build tools can write a wrapper setup.py that works however they prefer. (That's roughly what we have now, BTW).
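A hypothetical wrapper of that sort might look something like this (a sketch only: "build_backend" and its two functions are invented names, and the exact options pip passes to setup.py have never been formally written down, which is part of the problem)::

    #!/usr/bin/env python
    # Hypothetical wrapper setup.py for a non-distutils build tool: it keeps the
    # command line interface pip expects, but delegates the real work to Python
    # callables provided by the build tool.
    import sys

    import build_backend  # invented name: assumed to provide egg_info(args) and sdist(args)

    def main(argv):
        command = argv[1] if len(argv) > 1 else ""
        if command == "egg_info":
            # Write static metadata (PKG-INFO etc.) where the installer expects it.
            build_backend.egg_info(argv[2:])
        elif command == "sdist":
            # Produce a properly formatted sdist / "source wheel" under ./dist.
            build_backend.sdist(argv[2:])
        else:
            sys.exit("unsupported setup.py command: %r" % (command,))

    if __name__ == "__main__":
        main(sys.argv)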
Given this, in the big picture it sounds like the only really essentially controversial things about the original proposal might be: - that 'pip install <tarball of VCS checkout>' should work the same as 'pip install <VCS checkout>' (does anyone actually disagree?)
Yes, to the extent that I want to ensure it's clearly documented how, and why, this differs from a sdist/source wheel. But that's not a technical issue.
- the 1 paragraph describing the "transitional plan" allowing pip to automatically install from these source releases *as part of a dependency resolution plan* (as opposed to when a VCS-checkout-or-equivalent is explicitly given as an install target). Which honestly I don't like either, it's just there as a backcompat measure so that these source releases don't create a regression versus existing sdists -- note that one of the goals of the design was that current sdists could be upgraded into this format by dropping in a single static file (or by an install tool "virtually" dropping in this file when it encounters an old-style sdist -- so you don't need to keep around code to handle both cases separately).
Given that these source releases won't be hosted on PyPI (as the proposal currently stands) there's no real need for this - all you need to say is that you can point pip at any old URL and take your chances :-)
Does that sound right?
Not really. For me the big controversy is whether we move forward from where we are with sdists, or we ignore the current sdist mechanism and start over.
A key further question, which I don't think has been stated explicitly until I started this email, is what formats will be supported for hosting on PyPI. I am against hosting formats that don't support static metadata, such as your "source distribution", as I don't see how PyPI would be able to publish the metadata if it weren't static.
And following on from that, we need to agree whether the key formats should be required to have a static version. I'm OK with a VCS checkout having a dynamically generated version, that's part of the "all bets are off" contract over such things (if you don't generate a version that reflects every change, you get to deal with the consequences) but I don't think that's a reasonable thing to allow in "published" formats.
(Other features of the original proposal include stuff like the lack of trivial metadata like "name" and "description", and the support for generating multiple wheels from one directory. I am explicitly calling these "inessential".)
Not sure what you mean by lack of name/description being "inessential". The double negative confuses me. Do you mean you're OK with requiring them? Fair enough.
For multiple wheels, I'd tend to consider the opposite to be true - it's not that the capability is non-essential, but rather that in published formats (source wheel and later in the chain) it's essential that one source generates one target.
Paul

On October 5, 2015 at 2:29:35 AM, Nathaniel Smith (njs@pobox.com) wrote:
Does that sound right?
Off the bat, I'm perfectly fine with `pip install archive-made-from-vcs.tar.gz` working the same as `pip install git+https://github.com/foo/bar.git`. The fact that one is a VCS and one isn't is immaterial to me; that's just a transport mechanism for an arbitrary collection of files and directories. These arbitrary collections of files and directories can only provide a very minimal amount of information about the thing that is contained inside of them, and indeed there may even be multiple things inside of this rooted within different sub directories of the top level arbitrary collection of files. I agree with Paul though, I don't see any way this arbitrary collection of files/directories can be distributed on PyPI. PyPI is not just a place to dump whatever random archives you want for a package, it's specifically a place for Python package files (sdists, wheels, and eggs right now for the most part).

It sounds like maybe you're just looking for a way to make it so that pip no longer makes these "arbitrary files/directories" installs work by treating them like unpacked sdists, and instead follows some supported path. If so, that is reasonable to me (and something I planned on getting around to). If that is what you're shooting for, I think it got confused by trying to mix in the sdist concept, as our sdists and wheels are not really for human consumption any more.

I don't think that it makes sense for pip to go directly from a VCS[1] to a Wheel in the grand scheme of things. Right now we kind of do it, but that's because we just treat them like an unpacked sdist [2]; long term, though, I don't think that is the correct way to do things. We (I?) want to minimize the different installation "paths" that can be taken, and the main variable then when you do a ``pip install`` is how far along that path we already are. My ideal path looks something like this:

VCS -> Source Wheel [3] -> Wheel -> Installed
   \-> Inplace Installation [4]

So in that regard, I think that the only things (in my ideal world) people should be uploading to PyPI are source wheels (mandatory? [5]) and binary wheels (optional?). I don't know if downstream would prefer to use a source wheel or an arbitrary collection of files that may or may not be in a tarball, but I'm focused primarily on the needs of our toolchain, while still trying to make sure we don't end up in a situation that hurts downstream as well.

I also don't think it makes sense for pip to ever really install these items, no matter what the transport mechanism is (VCS, tarball, unpacked on disk), without being explicitly pointed to by a URL or file path. There's obviously some backwards compatibility issues here, because we can't just stop fetching .tar.gz links or anything, but I think the expectation should be that these items are only ever directly installed, not installed as part of the dependency resolution process. In that vein, they don't really participate in the dependency resolution process either (we'll install their dependencies and what not, but we'll assume that since you're pointing us to an explicit archive, we aren't going to resolve that particular dependency to anything other than what you've explicitly given us).

If we're only installing these items when explicitly being given them by a direct URL or path, then a lot of the worries about non-static metadata no longer exist, because as a user you're explicitly opting into installing something that isn't a real package, but is a VCS install of something that could be a package.
Which we'll fetch, turn into a source wheel, and then turn that into a binary wheel (or do an in-place install). I also don't think it makes sense for these VCS installs to directly support outputting multiple different source wheels, but instead to rely on the fact that pip lets you say "install this artifact, but first CD into a particular directory". So you'd essentially just structure the filesystem of your VCS install to make independent little mini projects that could be independently packaged into their own VCS installs, but are all contained within some larger thing, and we need to install a specific sub directory. I'm happy to be convinced otherwise, but given that it's a sort of edge case for a project to need this, and we already have the subdirectory support, I think it's simpler to just leverage that.

Given my desire to reduce the number of installation paths we actually support, I think that trying to standardize this concept of a VCS install depends first on standardizing the concept of a source wheel (or sdist 2.0). This is because, in my mind, the main two things you can do with a VCS install, from pip's point of view, are to do an in-place installation or to create a source wheel.

I also agree with Paul that we should not be adding new formats to PyPI that do not support static metadata. What specific metadata makes sense for a particular format (like ABI doesn't make sense for a source wheel/sdist) is going to be format dependent, but there should never be a "run this bit of code to find out X" situation. The closest we should get is "that type of metadata doesn't make sense for X, you need to build X into a Y for it". PyPI needs static metadata [6], and static metadata makes other tools easier to operate as well.

[1] I'm going to just call these VCS installs, but I mean any install that is not a fully formed Python package and is instead just an arbitrary collection of files and directories where we can have a minimal amount of control over the structure.

[2] This goes back to my belief that one of the "original sins" of distutils and setuptools was blurring the lines between the different phases in the life cycle of a package.

[3] Unlike Paul, I think a Source Wheel is actually a decent name for this concept. It's similar to .rpm and .src.rpm in that world, and I think it makes it more obvious that this item isn't an installable item in its own right, that it exists in order to produce binary wheels. However, this concept is currently being handled by the sdist "format", however ambiguously defined that currently is. I also think it makes it a bit easier to get rid of the ambiguous "package" term, since we can just call them all "wheels", which is easier to say than "distribution".

[4] Even though I want to reduce the number of paths we take, I don't think we'll ever be able to reasonably get rid of the in-place installation path. There will probably need to be some back and forth about exactly which parts of the in-place install the installer should be responsible for and which parts the build tool is responsible for.

[5] Mandatory in the sense of, if you're going to upload any files to PyPI you must upload one of these, not mandatory in the absolute sense. In a future world, ideally the only thing people will need to upload is a source wheel and we'll have a build farm that will take that and automatically produce binary wheels from them.

[6] In reality, we don't have static metadata today, and we get by.
This is mostly because we force uploads to include static metadata alongside the upload and we present that instead. In the future I want to move us to a situation where you *just* upload the file, and PyPI inspects the file for all of the metadata it needs. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 5 October 2015 at 13:44, Donald Stufft <donald@stufft.io> wrote:
[3] Unlike Paul, I think a Source Wheel is actually a decent name for this concept. It's similar to .rpm and .src.rpm in that world, and I think it makes it more obvious that this item isn't an installable item in its own right, that it exists in order to produce binary wheels. However, this concept is currently being handled by the sdist "format", however ambiguously defined that currently is. I also think it makes it a bit easier to get rid of the ambiguous "package" term, since we can just call them all "wheels", which is easier to say than "distribution".
Being a Windows user, I hadn't caught the parallel between Wheel-Source wheel and RPM-Source RPM (is there also a similar deb-source deb pair?) But if the concept works for people with a Linux background, I'm OK with it. (My main concern is that end users commonly make requests to projects saying "please provide wheels". I don't want that nice simple concept to get confused, if we can avoid it). Paul

On Mon, 5 Oct 2015 14:11:56 +0100 Paul Moore <p.f.moore@gmail.com> wrote:
Being a Windows user, I hadn't caught the parallel between Wheel-Source wheel and RPM-Source RPM (is there also a similar deb-source deb pair?)
But if the concept works for people with a Linux background, I'm OK with it.
The "source RPM" concept is actually confusing since, IIUC, a "source RPM" is nothing like a normal RPM (i.e. you can't install it using the "rpm" tool). It should actually be called a "RPM source" (again, IIUC).
(My main concern is that end users commonly make requests to projects saying "please provide wheels". I don't want that nice simple concept to get confused, if we can avoid it).
+1 Regards Antoine.

On October 5, 2015 at 9:11:59 AM, Paul Moore (p.f.moore@gmail.com) wrote:
On 5 October 2015 at 13:44, Donald Stufft wrote:
[3] Unlike Paul, I think a Source Wheel is actually a decent name for this concept. It's similar to .rpm and .src.rpm in that world, and I think it makes it more obvious that this item isn't an installable item in its own right, that it exists in order to produce binary wheels. However, this concept is currently being handled by the sdist "format", however ambiguously defined that currently is. I also think it makes it a bit easier to get rid of the ambiguous "package" term, since we can just call them all "wheels", which is easier to say than "distribution".
Being a Windows user, I hadn't caught the parallel between Wheel-Source wheel and RPM-Source RPM (is there also a similar deb-source deb pair?)
But if the concept works for people with a Linux background, I'm OK with it.
(My main concern is that end users commonly make requests to projects saying "please provide wheels". I don't want that nice simple concept to get confused, if we can avoid it).
Paul
I'm not dead set on the source wheel name or anything either fwiw. I liked the parallels between that and .src.rpm and .rpm and I think the "sdist" name is kind of a mouthful to say. The debian analog to a source rpm is just a "source package", but that's really just a directory that you can run the debian build tools in, it's not really a package format like source RPMs are. Source RPMs have an "install" concept, but it's not really like pip's install, it just unpacks the file into place so that you can later build it. The main thing I wanted was something that wasn't a mouthful to say and that had a dedicated extension so we don't end up where the filename looks incredibly generic (requests-1.0.swhl vs requests-1.0.tar.gz). I liked the idea of tying it into Wheels to keep the "new guard" of packaging formats related to each other, but I'm also happy to use a different term. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Axle? On Mon, Oct 5, 2015 at 9:25 AM Donald Stufft <donald@stufft.io> wrote:

On 5 October 2015 at 14:25, Donald Stufft <donald@stufft.io> wrote:
The main thing I wanted was something that wasn't a mouthful to say and that had a dedicated extension so we don't end up where the filename looks incredibly generic (requests-1.0.swhl vs requests-1.0.tar.gz).
You find "Source Wheel" easier to say / type than "sdist"? (I type it more than say it, but typing sdist is easier, and when I say it I usually go for "ess-dist, or suh-dist"). And ".sdist" seems like a perfectly good suffix to me. And the bikeshed should be a light purple colour :-) Paul

On Oct 5, 2015 8:41 AM, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 5 October 2015 at 14:25, Donald Stufft <donald@stufft.io> wrote:
The main thing I wanted was something that wasn't a mouthful to say and that had a dedicated extension so we don't end up where the filename looks incredibly generic (requests-1.0.swhl vs requests-1.0.tar.gz).
You find "Source Wheel" easier to say / type than "sdist"? (I type it more than say it, but typing sdist is easier, and when I say it I usually go for "ess-dist, or suh-dist"). And ".sdist" seems like a perfectly good suffix to me.
How about something like .sdist.whl.zip? There are already many tools associated with e.g. the ZIP extension and MIME type, so one can open the file and browse the archive with a file explorer, without a platform-specific file type association install step.
And the bikeshed should be a light purple colour :-) Paul

On October 5, 2015 at 9:45:29 AM, Wes Turner (wes.turner@gmail.com) wrote:
How about something like .sdist.whl.zip ?
There are already many tools with e.g. ZIP MIME extensions so that one can open the file and browse the archive with a file explorer without a platform specific file type association install step
-1 on using a .zip extension. I don’t really want people opening these up to inspect them unless they know what they are doing. For most people, they should just think of them as blobs of stuff. +0 on .sdist.whl instead of .swhl or .src.whl. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On October 5, 2015 at 9:41:28 AM, Paul Moore (p.f.moore@gmail.com) wrote:
On 5 October 2015 at 14:25, Donald Stufft wrote:
The main thing I wanted was something that wasn't a mouthful to say and that had a dedicated extension so we don't end up where the filename looks incredibly generic (requests-1.0.swhl vs requests-1.0.tar.gz).
You find "Source Wheel" easier to say / type than "sdist"? (I type it more than say it, but typing sdist is easier, and when I say it I usually go for "ess-dist, or suh-dist"). And ".sdist" seems like a perfectly good suffix to me.
Yea I do, but I have a stutter and "d" (particularly trying to transition to it from the "s") and "st" are hard for me to get the sounds out anyways (ironic given my name contains both a D and a St!). I don't know how hard it is for someone else to say, probably slightly harder but not by much. This is why I also hate the "distribution" term, I have a hard time saying the whole word without getting stuck in the "stribut" part. Anyways, name is the least important part to me as long as we get a dedicated extension out of it :) ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

sdists work well most of the time. Some problems:

1 - hard to depend on something setup.py must import (see the example below)
2 - hard to use a non-distutils build system, due to the first problem and a poorly defined command line interface
3 - have to regenerate metadata to get dependencies

Assume we are not ever going to be able to remove support for pip install git+https:// or the current sdist. So you need new-sdist for 3, but could add 1 & 2 within the current framework by using an elaborate generic setup.py or by adding a measurable amount of complexity to pip. Perhaps the underlying code for 1 & 2 is needed to implement features desired in new-sdist.
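To illustrate problem 1 with the canonical example (this snippet is illustrative, not taken from any particular project): the only place the build requirement is recorded is a setup.py that cannot even run until that requirement is already installed::

    from setuptools import Extension, setup

    import numpy  # fails with ImportError when numpy isn't installed, yet running
                  # this file is the only way a tool can learn numpy is needed

    setup(
        name="example",
        version="1.0",
        ext_modules=[
            Extension("example._core", ["src/_core.c"],
                      include_dirs=[numpy.get_include()]),
        ],
        setup_requires=["numpy"],  # processed inside setup(), i.e. too late:
                                   # the import above has already failed
    )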

On October 5, 2015 at 12:05:53 PM, Daniel Holth (dholth@gmail.com) wrote:
sdists work well most of the time. Some problems
1 - hard to depend on something setup.py must import
2 - hard to use non-distutils build system due to the first problem and a poorly defined command line interface
3 - have to regenerate metadata to get dependencies
Assume we are not ever going to be able to remove support for pip install git+https:// or the current sdist.
So you need new-sdist for 3, but could add 1 & 2 within the current framework by using an elaborate generic setup.py or by adding a measurable amount of complexity to pip. Perhaps the underlying code for 1 & 2 is needed to implement features desired in new-sdist.
I'm not opposed to layering things on top of the current sdist format in the interim, to solve problems that currently exist until a better solution can be done. That's not what this PEP idea was, though; it was creating a whole new sdist. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Mon, 5 Oct 2015 08:44:04 -0400 Donald Stufft <donald@stufft.io> wrote:
I don't think that it makes sense for pip to go directly from a VCS[1] to a Wheel in the grand scheme of things. Right now we kind of do it, but that's because we just treat them like an unpacked sdist [2], long term though I don't think that is the correct way to do things. We (I?) want to minimize the different installation "paths" that can be taken and the main variable then when you do a ``pip install`` is how far long that path we already are. My ideal path looks something like this:
VCS -> Source Wheel [3] -> Wheel -> Installed
   \-> Inplace Installation [4]
A valid use case may be to do an in-place installation from a sdist (although you may question the sanity of doing development from a source tree which isn't VCS-backed :-)). In any case, sdists being apt for human consumption is an important feature IMHO. Regards Antoine.

On October 5, 2015 at 9:25:24 AM, Antoine Pitrou (solipsis@pitrou.net) wrote:
On Mon, 5 Oct 2015 08:44:04 -0400 Donald Stufft wrote:
I don't think that it makes sense for pip to go directly from a VCS[1] to a Wheel in the grand scheme of things. Right now we kind of do it, but that's because we just treat them like an unpacked sdist [2], long term though I don't think that is the correct way to do things. We (I?) want to minimize the different installation "paths" that can be taken and the main variable then when you do a ``pip install`` is how far long that path we already are. My ideal path looks something like this:
VCS -> Source Wheel [3] -> Wheel -> Installed
   \-> Inplace Installation [4]
A valid use case may be to do an in-place installation from a sdist (although you may question the sanity of doing development from a source tree which isn't VCS-backed :-)).
In any case, sdists being apt for human consumption is an important feature IMHO.
I don't think so. Remember in my footnote I mentioned that I wasn't using VCS to mean literally "something checked into a VCS", but rather the more traditional (outside of Python) "source release" concept. This could be a simple tarball that just has the proper, for human consumption, files in it or it could be a VCS checkout, or it could just be some files sitting on disk. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Mon, 5 Oct 2015 09:28:48 -0400 Donald Stufft <donald@stufft.io> wrote:
In any case, sdists being apt for human consumption is an important feature IMHO.
I don't think so. Remember in my footnote I mentioned that I wasn't using VCS to mean literally "something checked into a VCS", but rather the more traditional (outside of Python) "source release" concept. This could be a simple tarball that just has the proper, for human consumption, files in it or it could be a VCS checkout, or it could just be some files sitting on disk.
But why use two different formats for "source release" and "sdists"? Currently sdists fit the assumptions for a source release, so why introduce some complexity and have users deal with separate concepts (with all the confusion that will inevitably ensue)? Regards Antoine.

Because the status quo, which is a single format, sucks. I don't think changing it so we invoke a Python function instead of a script is going to make it not suck. The fundamental reason it sucks (in my mind) is that we have a single format trying to do way too many things, and we need to break that out into smaller chunks. You can see these needs are different, I think, by looking at how what Nathaniel wants differs from what Paul and I want. He wants something that will make the human side easier and will support different tools; we want something that pip can consume more reasonably. Trying to do too much with a singular format just means it sucks for all the use cases instead of being great for one use case. I also don't think it will be confusing. They'll associate the VCS thing (a source release) with something focused on development, for most everyone. Most people won't explicitly make one and nobody will be uploading it to PyPI. The end goal in my mind is someone produces a source wheel and uploads that to PyPI and PyPI takes it from there. Mucking around with manually producing binary wheels or producing source releases other than what's checked into vcs will be something that I suspect only advanced users will do. Sent from my iPhone
On Oct 5, 2015, at 9:39 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Mon, 5 Oct 2015 09:28:48 -0400 Donald Stufft <donald@stufft.io> wrote:
In any case, sdists being apt for human consumption is an important feature IMHO.
I don't think so. Remember in my footnote I mentioned that I wasn't using VCS to mean literally "something checked into a VCS", but rather the more traditional (outside of Python) "source release" concept. This could be a simple tarball that just has the proper, for human consumption, files in it or it could be a VCS checkout, or it could just be some files sitting on disk.
But why use two different formats for "source release" and "sdists"? Currently sdists fit the assumptions for a source release, why introduce some complexity and have the users deal with separate concepts (with all the confusion that will inevitably ensue)?
Regards
Antoine.

On Mon, 5 Oct 2015 09:51:05 -0400 Donald Stufft <donald@stufft.io> wrote:
You can see these needs are different I think by looking at how what Nathaniel wants differs from what me and Paul want. He wants something that will make the human side easier and will support different tools, we want something that pip can consume more reasonably. Trying to do too much with a singular format just means it sucks for all the uses cases instead of being great for one use case.
That doesn't seem to follow. You can have regular human-compatible content and a few machine-compatible files besides (perhaps in a dedicated subdirectory). I don't see how that "sucks".
I also don't think it will be confusing. They'll associate the VCS thing (a source release) as something focused on development for most everyone. Most people won't explicitly make one and nobody will be uploading it to PyPI.
Well, what is the point of standardizing the concept of source releases if nobody produces them? Regards Antoine.

On October 5, 2015 at 10:52:22 AM, Antoine Pitrou (solipsis@pitrou.net) wrote:
On Mon, 5 Oct 2015 09:51:05 -0400 Donald Stufft wrote:
You can see these needs are different I think by looking at how what Nathaniel wants differs
from what me and Paul want. He wants something that will make the human side easier and will support different tools, we want something that pip can consume more reasonably. Trying to do too much with a singular format just means it sucks for all the uses cases instead of being great for one use case.
That doesn't seem to follow. You can have regular human-compatible content and a few machine-compatible files besides (perhaps in a dedicated subdirectory). I don't see how that "sucks".
Machine compatible needs things to be as tightly specified as possible; human compatible needs it to be as loosely specified as possible. In addition, the original proposal had things like "version" being dynamically computed because it might change based on something in the directory (like if the VCS gets another commit). For a machine format, we want all metadata to be static; for something designed for humans to interact with and muck around with the code, we want something that is more dynamic. If you have two different files, one for machines and one for humans, what happens if they disagree? There should be a single source of truth in any one artifact for all of the information. We currently have the situation you describe, a PKG-INFO that is static metadata and a setup.py that is aimed towards humans... but nothing uses the PKG-INFO because the source of truth is the setup.py and the PKG-INFO might be lies. So we have a lot of effort and hacks going in to make something that is best designed for human interaction to be used for machines.
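As a toy illustration of that split (my own sketch, not anything from the thread): the static half of an sdist can be read without running any code, but nothing guarantees it agrees with what setup.py would report::

    import email
    import tarfile

    def pkg_info_version(sdist_path, pkg_info_member):
        """Read the Version field from the static PKG-INFO inside an sdist."""
        with tarfile.open(sdist_path) as archive:
            raw = archive.extractfile(pkg_info_member).read().decode("utf-8")
        return email.message_from_string(raw)["Version"]

    # e.g. pkg_info_version("example-1.0.tar.gz", "example-1.0/PKG-INFO") may or
    # may not agree with what "python setup.py --version" prints once setup.py's
    # dynamic version logic has run.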
I also don't think it will be confusing. They'll associate the VCS thing (a source release) as something focused on development for most everyone. Most people won't explicitly make one and nobody will be uploading it to PyPI.
Well, what is the point of standardizing the concept of source releases if nobody produces them?
Well, a VCS commit will be a form of one, just not one that people explicitly distribute. I don't think it makes sense to standardize this idea of a "source release" as anything other than what we need to bootstrap a VCS-based build anyway. So we need to support it for VCSs anyways, and divorcing the standard from VCSs makes it more robust and allows the idea of a .tar.gz based "VCS" build anyways. I also didn't say nobody would produce them. I said most people would have no need for explicit source releases (particularly in a tarball form) and wouldn't produce them. If I thought *nobody* would produce them, then I wouldn't be willing to have them standardized. I think that they might be useful for some projects that have more advanced needs. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Mon, Oct 5, 2015 at 6:51 AM, Donald Stufft <donald@stufft.io> wrote: [...]
I also don't think it will be confusing. They'll associate the VCS thing (a source release) as something focused on development for most everyone. Most people won't explicitly make one and nobody will be uploading it to PyPI. The end goal in my mind is someone produces a source wheel and uploads that to PyPI and PyPI takes it from there. Mucking around with manually producing binary wheels or producing source releases other than what's checked into vcs will be something that I suspect only advanced users will do.
Of course people will make source releases, and should be able to upload them to PyPI. The end goal is that *pip* will not use source releases, but PyPI is not just there for pip. If it was, it wouldn't even show package descriptions :-). There are projects on PyPI right now, today, that have no way to generate sdists and will never have any need for "source wheels" (because they don't use distutils and they build "none-any" wheels directly from their source). It should still be possible for them to upload source releases for all the other reasons that having source releases is useful: they form a permanent record of the whole project state (including potentially docs, tests, working notes, etc. that don't make it into the wheels), human users may well want to download those archives, Debian may prefer to use that as their orig.tar.gz, etc. etc. And on the other end of the complexity scale, there are projects like numpy where it's not clear to me whether they'll ever be able to support "source wheels", and even if they do they'll still need source releases to support user configuration at build time. -n -- Nathaniel J. Smith -- http://vorpus.org

On October 7, 2015 at 1:27:31 PM, Nathaniel Smith (njs@pobox.com) wrote:
On Mon, Oct 5, 2015 at 6:51 AM, Donald Stufft wrote: [...]
I also don't think it will be confusing. They'll associate the VCS thing (a source release) as something focused on development for most everyone. Most people won't explicitly make one and nobody will be uploading it to PyPI. The end goal in my mind is someone produces a source wheel and uploads that to PyPI and PyPI takes it from there. Mucking around with manually producing binary wheels or producing source releases other than what's checked into vcs will be something that I suspect only advanced users will do.
Of course people will make source releases, and should be able to upload them to PyPI. The end goal is that *pip* will not use source releases, but PyPI is not just there for pip. If it was, it wouldn't even show package descriptions :-).
There are projects on PyPI right now, today, that have no way to generate sdists and will never have any need for "source wheels" (because they don't use distutils and they build "none-any" wheels directly from their source). It should still be possible for them to upload source releases for all the other reasons that having source releases is useful: they form a permanent record of the whole project state (including potentially docs, tests, working notes, etc. that don't make it into the wheels), human users may well want to download those archives, Debian may prefer to use that as their orig.tar.gz, etc. etc.
And on the other end of the complexity scale, there are projects like numpy where it's not clear to me whether they'll ever be able to support "source wheels", and even if they do they'll still need source releases to support user configuration at build time.
We must have different ideas of what a source release vs source wheel would look like, because I'm having a hard time squaring what you've said here with what it looks like in my head. In my head, source releases (outside of the VCS use case) will be rare and only for very complex packages that are doing very complex things. Source wheels will be something that will be semi-mandatory for being a well-behaved citizen (for Debian and such to download), and binary wheels will be something that you'll want to have but that isn't required. I don't see any reason why source wheels wouldn't include docs, tests, and other misc files. I picture building a binary wheel directly as being something similar to using fpm to build binary .deb packages directly: totally possible, but unadvised. Having talked to folks who deal with Debian/Fedora packages, they won't accept a binary wheel as the input source, and (given how I explained it to them) they are excited about the concept of source wheels and moving away from dynamic metadata and towards static metadata. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Wed, 7 Oct 2015 18:51 Donald Stufft <donald@stufft.io> wrote:
On October 7, 2015 at 1:27:31 PM, Nathaniel Smith (njs@pobox.com) wrote:
On Mon, Oct 5, 2015 at 6:51 AM, Donald Stufft wrote: [...]
I also don't think it will be confusing. They'll associate the VCS thing (a source release) as something focused on development for most everyone. Most people won't explicitly make one and nobody will be uploading it to PyPI. The end goal in my mind is someone produces a source wheel and uploads that to PyPI and PyPI takes it from there. Mucking around with manually producing binary wheels or producing source releases other than what's checked into vcs will be something that I suspect only advanced users will do.
Of course people will make source releases, and should be able to upload them to PyPI. The end goal is that *pip* will not use source releases, but PyPI is not just there for pip. If it was, it wouldn't even show package descriptions :-).
There are projects on PyPI right now, today, that have no way to generate sdists and will never have any need for "source wheels" (because they don't use distutils and they build "none-any" wheels directly from their source). It should still be possible for them to upload source releases for all the other reasons that having source releases is useful: they form a permanent record of the whole project state (including potentially docs, tests, working notes, etc. that don't make it into the wheels), human users may well want to download those archives, Debian may prefer to use that as their orig.tar.gz, etc. etc.
And on the other end of the complexity scale, there are projects like numpy where it's not clear to me whether they'll ever be able to support "source wheels", and even if they do they'll still need source releases to support user configuration at build time.
We must have different ideas of what a source release vs source wheel would look like, because I'm having a hard time squaring what you've said here with what it looks like in my head. In my head, source releases (outside of the VCS use case) will be rare and only for very complex packages that are doing very complex things. Source wheels will be something that will be semi mandatory to being a well behaved citizen (for Debian and such to download) and binary wheels will be something that you'll want to have but aren't required. I don't see any reason why source wheels wouldn't include docs, tests, and other misc files. I picture building a binary wheel directly being something similar to using fpm to build binary .deb packages directly, totally possible but unadvised. Having talked to folks who deal with Debian/Fedora packages, they won't accept a binary wheel as the input source and (given how I explained it to them) they are excited about the concept of source wheels and moving away from dynamic metadata and towards static metadata.

Your idea of an sdist as something that has fully static build/runtime dependency metadata and a one-to-one correspondence with binary wheels is not a usable format when releasing the code for e.g. numpy 1.10. It's fine to say that pip/PyPI should work with the source in some other distribution format and numpy could produce that, but it means that the standard tarball release needs to be supported somehow separately. Numpy should be able to use PyPI in order to host the tarball even if pip ignores the file. If numpy released only source wheels then there would be more than one source wheel for each release corresponding to e.g. the different ways that numpy is linked. There still needs to be a way to release a single file representing the code for the release as a whole. -- Oscar

On October 7, 2015 at 2:31:03 PM, Oscar Benjamin (oscar.j.benjamin@gmail.com) wrote:
Your idea of an sdist as something that has fully static build/runtime dependency metadata and a one to one correspondence with binary wheels is not a usable format when releasing the code for e.g. numpy 1.10. It's fine to say that pip/PyPI should work with the source in some other distribution format and numpy could produce that but it means that the standard tarball release needs to be supported some how separately. Numpy should be able to use PyPI in order to host the tarball even if pip ignores the file.
If numpy released only source wheels then there would be more than one source wheel for each release corresponding to e.g. the different ways that numpy is linked. There still needs to be a way to release a single file representing the code for the release as a whole.
Can you expand on this please? I've never used numpy for anything serious and I'm trying to figure out why and what parts of what I'm thinking of wouldn't work for it. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Wed, 7 Oct 2015 19:42 Donald Stufft <donald@stufft.io> wrote: On October 7, 2015 at 2:31:03 PM, Oscar Benjamin (oscar.j.benjamin@gmail.com) wrote:
Your idea of an sdist as something that has fully static build/runtime dependency metadata and a one to one correspondence with binary wheels is not a usable format when releasing the code for e.g. numpy 1.10. It's fine to say that pip/PyPI should work with the source in some other distribution format and numpy could produce that but it means that the standard tarball release needs to be supported some how separately. Numpy should be able to use PyPI in order to host the tarball even if pip ignores the file.
If numpy released only source wheels then there would be more than one source wheel for each release corresponding to e.g. the different ways that numpy is linked. There still needs to be a way to release a single file representing the code for the release as a whole.
Can you expand on this please? I've never used numpy for anything serious and I'm trying to figure out why and what parts of what I'm thinking of wouldn't work for it.
Currently I can take the code from the numpy release and compile it in different incompatible ways. For example I could make a wheel that bundles a BLAS library. Or I could make a wheel that expects to use a system BLAS library that should be installed separately somehow, or I could build a wheel against pyopenblas and make a wheel that depends on pyopenblas. Or I could link a BLAS library statically into numpy. A numpy release supports being compiled and linked in many different ways and will continue to do so regardless of any decisions made by PyPA. What that means is that there is not a one to one correspondence between a numpy release and a binary wheel. If there must be a one to one correspondence between a source wheel and a binary wheel then it follows that there cannot be a one to one correspondence between the source release and a source wheel. Of course numpy could say that they will only upload one particular source wheel and binary wheel to PyPI but people need to be able to use the source release in many different ways. So only releasing a source wheel that maps one to one to a particular way of compiling numpy is not an acceptable way for numpy to release its code. -- Oscar

On 7 October 2015 at 20:36, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Currently I can take the code from the numpy release and compile it in different incompatible ways. For example I could make a wheel that bundles a BLAS library. Or I could make a wheel that expects to use a system BLAS library that should be installed separately somehow or I could build a wheel against pyopenblas and make a wheel that depends on pyopenblas. Or I could link a BLAS library statically into numpy.
A numpy release supports being compiled and linked in many different ways and will continue to do so regardless of any decisions made by PYPA. What that means is that there is not a one to one correspondence between a numpy release and a binary wheel. If there must be a one to one correspondence between a source wheel and a binary wheel then it follows that there cannot be a one to one correspondence between the source release and a source wheel.
Of course numpy could say that they will only upload one particular source wheel and binary wheel to PyPI but people need to be able to use the source release in many different ways. So only releasing a source wheel that maps one to one to a particular way of compiling numpy is not an acceptable way for numpy to release its code.
The disconnect here seems to be that I view all of those wheels as being numpy 1.9.X wheels (or whatever). They differ in terms of compatibility details, but they are all wheels for the same project/version. So there's no problem with them all being built from the same source wheel. I also have no problem with it being possible to configure the build differently from a single source wheel, to generate all those wheels. The configuration isn't metadata, it's "just" settings for the build. Of course, there *is* an unsolved issue here, which is how we manage compatibility for wheels at the level needed for numpy. But I thought the discussion on that was ongoing? I'm concerned that this proposal is actually about bypassing that discussion, and instead trying to treat incompatibly linked wheels as "different" in terms of project metadata, which I think is the wrong way of handling things. I note that Christoph Gohlke's numpy builds are tagged with a "+mkl" local version modifier - that's presumably intended to mark the fact that they are built with an incompatible runtime - but that's a misuse of local versions (and I've found it causes niggling issues with how pip recognises upgrades, etc). So, in summary: Your points above don't seem to me to in any way preclude having a single numpy source wheel, and a number of (mutually incompatible, but the same in terms of project and version) binary wheels. Paul

On Wed, Oct 7, 2015 at 1:28 PM, Paul Moore <p.f.moore@gmail.com> wrote:
The disconnect here seems to be that I view all of those wheels as being numpy 1.9.X wheels (or whatever). They differ in terms of compatibility details, but they are all wheels for the same project/version. So there's no problem with them all being built from the same source wheel. I also have no problem with it being possible to configure the build differently from a single source wheel, to generate all those wheels. The configuration isn't metadata, it's "just" settings for the build.
But the different builds for the different configurations end up with different metadata. If I'm understanding right, the whole point of "source wheels" is that they have all the static metadata that pip needs in order to make decisions, and this has to match the resulting wheels -- right? The way I'm imagining it is that there are multiple levels of metadata staticness:
- package name, author, description, ...: static in VCS checkouts, source releases, source wheels, wheels
- package version: static in source releases, source wheels, wheels
- package dependencies: static in source wheels, wheels
- environment tag: static in wheels
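The same levels, restated as a rough illustrative sketch (the stage and field names here are informal labels, not a proposed format or any tool's real API):

    # Which metadata is assumed static ("frozen") at each stage of the
    # pipeline being discussed. Illustrative only.
    STATIC_METADATA = {
        "vcs checkout": {"name", "author", "description"},
        "source release": {"name", "author", "description", "version"},
        "source wheel": {"name", "author", "description", "version",
                         "dependencies"},
        "binary wheel": {"name", "author", "description", "version",
                         "dependencies", "environment tag"},
    }

    def static_fields(stage):
        """Return the metadata fields assumed static at a given stage."""
        return sorted(STATIC_METADATA[stage])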
Of course, there *is* an unsolved issue here, which is how we manage compatibility for wheels at the level needed for numpy. But I thought the discussion on that was ongoing? I'm concerned that this proposal is actually about bypassing that discussion, and instead trying to treat incompatibly linked wheels as "different" in terms of project metadata, which I think is the wrong way of handling things. I note that Christoph Gohlke's numpy builds are tagged with a "+mkl" local version modifier - that's presumably intended to mark the fact that they are built with an incompatible runtime - but that's a misuse of local versions (and I've found it causes niggling issues with how pip recognises upgrades, etc).
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
So, in summary: Your points above don't seem to me to in any way preclude having a single numpy source wheel, and a number of (mutually incompatible, but the same in terms of project and version) binary wheels.
Maybe I have misunderstood: does it actually help pip at all to have static access to name and version, but not to anything else? I've been assuming not, but I don't think anyone's pointed to any examples yet of the problems that pip is encountering due to the lack of static metadata -- would this actually be enough to solve them? -n -- Nathaniel J. Smith -- http://vorpus.org

On 7 October 2015 at 22:28, Nathaniel Smith <njs@pobox.com> wrote:
Maybe I have misunderstood: does it actually help pip at all to have static access to name and version, but not to anything else? I've been assuming not, but I don't think anyone's pointed to any examples yet of the problems that pip is encountering due to the lack of static metadata -- would this actually be enough to solve them?
The principle I am working on is that *all* metadata in a source wheel should be statically available - that's not just for pip, but for all other consumers, including distro packagers. What's not set in stone is precisely what (subsets of) metadata are appropriate for source wheels as opposed to (binary) wheels. So I'd counter your question with the converse - what metadata specifically are you unwilling to include statically in source wheels? My feeling is that there isn't anything you'd be unwilling to include that I'd consider as "source wheel metadata". Possibly the nearest we'd have to an issue is over allowing the build process to *add* dependencies to a binary wheel (e.g. some builds depend on a currently-hypothetical MKL wheel, which provides needed DLLs). I don't in principle object to that, but I'd like to see a fleshed out proposal on how wheels containing just DLLs (as opposed to Python packages) would work in practice - until we have a mechanism for building/distributing such wheels, I think it's premature to worry about specifying dependencies. But whatever comes out of this, the Metadata 2.0 spec should ultimately be updated to note which metadata is mandated in source wheels, and which in binary wheels only. Paul

On 7 October 2015 at 22:41, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 October 2015 at 22:28, Nathaniel Smith <njs@pobox.com> wrote:
Maybe I have misunderstood: does it actually help pip at all to have static access to name and version, but not to anything else? I've been assuming not, but I don't think anyone's pointed to any examples yet of the problems that pip is encountering due to the lack of static metadata -- would this actually be enough to solve them?
The principle I am working on is that *all* metadata in a source wheel should be statically available - that's not just for pip, but for all other consumers, including distro packagers. What's not set in stone is precisely what (subsets of) metadata are appropriate for source wheels as opposed to (binary) wheels.
A concrete example would be whether or not the numpy source wheel depends on pyopenblas. Depending on how numpy is built the binary wheel may or may not depend on pyopenblas. It doesn't make any sense to say that the numpy source release depends on pyopenblas, so what should be the dependencies of the source wheel? One possibility which I think is what Nathaniel is getting at is that there is a source release and then that could be used to generate different possible source wheels, each of which would correspond to a particular configuration of numpy. Each source wheel would correspond to one binary wheel and have all static metadata, but there still needs to be a separate source release that is used to generate the different source wheels. The step that turns a source release into a source wheel would be analogous to the ./configure step in a typical makefile project. ./configure is used to specify the options corresponding to all the different ways of compiling and installing the project. After running ./configure the command "make" is unparametrised and performs the actual compilation: this step is analogous to converting a source wheel to a binary wheel. I think this satisfies all of the requirements for static metadata and one-to-one correspondence of source wheels and binary wheels. If numpy followed this then I imagine that there would be a single source wheel on PyPI corresponding to the one configuration that would be used consistently there. However numpy still needs to separately release the code in a form that is also usable in all of the many other contexts in which it is already used. IOW they will need to continue to issue source releases in more or less the same form as today. It makes sense for PyPI to host the source release archives on the project page even if pip will simply ignore them. -- Oscar
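To make the ./configure analogy above concrete, here is a rough sketch of the two-step split being described (the function names are invented purely for illustration and don't correspond to any real tool):

    # Hypothetical sketch of the pipeline described above; none of these
    # functions exist in any real tool.

    def configure(source_release_dir, blas="openblas"):
        """Analogue of ./configure: pick one configuration and freeze it,
        yielding a "source wheel" whose metadata is fully static."""
        requires = ["pyopenblas"] if blas == "openblas" else []
        return {"tree": source_release_dir,
                "config": {"blas": blas},
                "requires": requires}

    def build(source_wheel):
        """Analogue of make: unparametrised, builds exactly what configure
        chose, yielding a "binary wheel" with the same static dependencies."""
        return {"requires": source_wheel["requires"],
                "built_with": source_wheel["config"]}

    # One source release can yield several source wheels, each of which
    # maps one-to-one onto a binary wheel:
    release = "numpy-1.10/"
    wheel_openblas = build(configure(release, blas="openblas"))
    wheel_mkl = build(configure(release, blas="mkl"))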

On 8 October 2015 at 11:18, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 7 October 2015 at 22:41, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 October 2015 at 22:28, Nathaniel Smith <njs@pobox.com> wrote:
Maybe I have misunderstood: does it actually help pip at all to have static access to name and version, but not to anything else? I've been assuming not, but I don't think anyone's pointed to any examples yet of the problems that pip is encountering due to the lack of static metadata -- would this actually be enough to solve them?
The principle I am working on is that *all* metadata in a source wheel should be statically available - that's not just for pip, but for all other consumers, including distro packagers. What's not set in stone is precisely what (subsets of) metadata are appropriate for source wheels as opposed to (binary) wheels.
A concrete example would be whether or not the numpy source wheel depends on pyopenblas. Depending on how numpy is built the binary wheel may or may not depend on pyopenblas. It doesn't make any sense to say that the numpy source release depends on pyopenblas so what should be the dependencies of the source wheel?
Well, I said this previously but I don't have any objections to the idea that binary wheels have additional dependencies - so the source wheel doesn't depend on pyopenblas but the binary does. But as I understand it, this is currently theoretical - there isn't yet any pyopenblas to validate these speculations against. I say this not because I think the approach is invalid, but because I think there are probably a lot of untested questions that need answering. Let's expand the scenario a bit. The user (presumably) still just says "python -m pip install numpy". What happens then?
1. Assume there's a binary wheel that's compatible with the user's platform.
1a. If there are multiple compatible binary wheels, pip chooses the "most compatible" so we're safe to assume there's only one. [1]
2. Looking at the dependencies, say it depends on pyopenblas. So pip needs to install pyopenblas.
2a. If there's a compatible wheel for pyopenblas, pip installs that too.
2b. If there's no compatible pyopenblas wheel, pip falls back to a source wheel, builds it, and uses that. If the build fails, the whole numpy install fails.
3. If there's no compatible numpy binary wheel, pip gets the source wheel and builds it. There's no user interaction possible here [2], so the build uses whatever defaults the numpy build process identifies as "most appropriate" for the user's platform. This may be simply a lowest common denominator, or it may do some form of introspection of the user's system to get the best possible build. Either way, a wheel is generated that's known to work on the user's system, so there should be no additional dependencies injected at this point, and pip will use that wheel directly.
The only constraint here is that a binary numpy wheel built with the default options on a given machine from a numpy source wheel cannot have extra dependencies that aren't known to be already satisfied by the user's system, because by the time pip generates a wheel from the source wheel, it's finished doing dependency resolution so any new dependencies won't get checked. I don't see it as a problem for any hypothetical new build system to conform to this constraint - by default a built wheel must work on the system it's built on. All it means is that building binaries with additional dependencies must be done manually, supplying options describing your intent.
[1] Dependencies are *not* considered as part of the compatibility matching, so it's correct that this step happens before the dependency checks. Maybe you're assuming that if there are two wheels, one depending on pyopenblas and one not, then if the user doesn't have pyopenblas installed the wheel that doesn't depend on it will be used? But that's not how pip works.
[2] When pip runs an install, it does so non-interactively. Whatever command pip uses to build a wheel ("python setup.py bdist_wheel" at the moment) must run without user interaction and produce a wheel that is compatible with the user's environment.
So unless I'm mistaken about what you're saying, I don't see any issue here. Unless you're saying that you're not willing to work under some of the constraints I describe above - but in that case, you need pip's compatibility matching, dependency resolution, or automated wheel build processes to change. That's fine but to move the discussion forwards, we'd then need to understand (and agree with) whatever changes you need in pip.
At the moment, I'm not aware that anyone has asked for substantive changes to pip's behaviour in these areas as part of this proposal.
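Restating the scenario above as a rough sketch (this is not pip's actual code; "index" and "environment" are stand-ins for whatever objects a real installer would use, and version constraints, caching and error handling are all ignored):

    # Editorial restatement of steps 1-3 above; not pip's implementation.

    def install(project, index, environment):
        candidates = index.compatible_binary_wheels(project, environment)
        if candidates:
            # 1/1a: pick the single "most compatible" binary wheel.
            wheel = max(candidates, key=environment.compatibility_rank)
        else:
            # 3: fall back to the source wheel and build it locally,
            # non-interactively, with whatever defaults the project's
            # build process picks for this system.
            wheel = index.source_wheel(project).build(environment)

        # 2/2a/2b: resolve the chosen wheel's dependencies the same way;
        # a failed build of a required source wheel fails the whole install.
        for dependency in wheel.requires:
            install(dependency, index, environment)
        environment.add(wheel)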
One possibility which I think is what Nathaniel is getting at is that there is a source release and then that could be used to generate different possible source wheels each of which would correspond to a particular configuration of numpy. Each source wheel would correspond to one binary wheel and have all static metadata but there still needs to be a separate source release that is used to generate the different source wheels.
That's possible, but what would these multiple source wheels be called? They couldn't all be called "numpy" as how would the user say which one they wanted? Pip can't decide. They can't be called numpy and distinguished by versions, as then how would you decide whether "numpy with openblas" is "newer" or "older" than "numpy with MKL"? That's the issue with Christoph Gohlke's current means of versioning his MKL builds. So you're looking at multiple PyPI projects, one for each "flavour" of numpy. Or you're looking at changes to how PyPI and pip define a "project". Neither of those options sound particularly straightforward to me.
The step that turns a source release into a source wheel would be analogous to the ./configure step in a typical makefile project. ./configure is used to specify the options corresponding to all the different ways of compiling and installing the project. After running ./configure the command "make" is unparametrised and performs the actual compilation: this step is analogous to converting a source wheel to a binary wheel.
But the Python (PyPI/pip) model is different from the autoconf "typical makefile project" model. There's no configure step. If you're proposing that we add one, then that's a pretty major change in structure and would have some fairly wide-ranging impacts (on PyPI and pip, and also on 3rd party projects like bandersnatch and devpi). I don't think we're even close to understanding how we'd manage such a change.
I think this satisfies all of the requirements for static metadata and one-to-one correspondence of source wheels and binary wheels. If numpy followed this then I imagine that there would be a single source wheel on PyPI corresponding to the one configuration that would be used consistently there. However numpy still needs to separately release the code in a form that is also usable in all of the many other contexts that it is already used. IOW they will need to continue to issue source releases in more or less the same form as today. It makes sense for PyPI to host the source release archives on the project page even if pip will simply ignore them.
So you're talking about numpy only supporting one configuration via PyPI, and expecting any other configurations to be made available only via other channels? I guess you could do that, but I hope you won't. It feels to me like giving up before we've properly tried to understand the issues. Paul

On 8 October 2015 at 12:46, Paul Moore <p.f.moore@gmail.com> wrote:
On 8 October 2015 at 11:18, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
A concrete example would be whether or not the numpy source wheel depends on pyopenblas. Depending on how numpy is built the binary wheel may or may not depend on pyopenblas. It doesn't make any sense to say that the numpy source release depends on pyopenblas so what should be the dependencies of the source wheel?
Well, I said this previously but I don't have any objections to the idea that binary wheels have additional dependencies - so the source wheel doesn't depend on pyopenblas but the binary does.
Okay, I guess I'm confused by what you mean when you say that a source wheel (or sdist) should have a "one-to-one" correspondence with a binary wheel.
But as I understand it, this is currently theoretical - there isn't yet any pyopenblas validate these speculations against?
I don't think pyopenblas is ready yet but it is being developed with specifically this scenario in mind. <snip>
So unless I'm mistaken about what you're saying, I don't see any issue here. Unless you're saying that you're not willing to work under some of the constraints I describe above
As an aside : I'm not a contributor to numpy. I just use it a lot and teach people how to use it (which is where the packaging problems come in).
- but in that case, you need pip's compatibility matching, dependency resolution, or automated wheel build processes to change. That's fine but to move the discussion forwards, we'd then need to understand (and agree with) whatever changes you need in pip. At the moment, I'm not aware that anyone has asked for substantive changes to pip's behaviour in these areas as part of this proposal.
I don't think anyone is suggesting significant changes to pip's dependency resolution. Compatibility matching does need improvement IMO. Also the automated build process does need to be changed - specifically we need build-requires so that third party build tools can work. I didn't think improving the build process was controversial... <snip>
I think this satisfies all of the requirements for static metadata and one-to-one correspondence of source wheels and binary wheels. If numpy followed this then I imagine that there would be a single source wheel on PyPI corresponding to the one configuration that would be used consistently there. However numpy still needs to separately release the code in a form that is also usable in all of the many other contexts that it is already used. IOW they will need to continue to issue source releases in more or less the same form as today. It makes sense for PyPI to host the source release archives on the project page even if pip will simply ignore them.
So you're talking about numpy only supporting one configuration via PyPI, and expecting any other configurations to be made available only via other channels? I guess you could do that, but I hope you won't. It feels to me like giving up before we've properly tried to understand the issues.
Okay so again I'm not a numpy dev. Numpy already supports being used in lots of setups that are not via PyPI. Apart from Christoph's builds you have all kinds of people building on all kinds of OSes and linking with different BLAS libraries in different ways. Some people will compile numpy statically with CPython. If you follow the discussions about numpy development it's clear that the numpy devs don't know all of the ways that numpy is built and used. Clearly pip/PyPI cannot be used to statically link numpy with CPython or for all of the different (often non-redistributable) BLAS libraries, so numpy will support some setups that are not possible through pip. That's fine, I don't see the problem with that. At the moment an sdist is the same thing as a source release. If you propose to change it so that projects should upload source wheels and then make source wheels something tightly defined (e.g. a zip file containing exactly two directories set up to build for one particular configuration) then there needs to be a separate way to simply release the code in traditional format as is done now. -- Oscar

On 8 October 2015 at 13:39, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 8 October 2015 at 12:46, Paul Moore <p.f.moore@gmail.com> wrote:
On 8 October 2015 at 11:18, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
A concrete example would be whether or not the numpy source wheel depends on pyopenblas. Depending on how numpy is built the binary wheel may or may not depend on pyopenblas. It doesn't make any sense to say that the numpy source release depends on pyopenblas so what should be the dependencies of the source wheel?
Well, I said this previously but I don't have any objections to the idea that binary wheels have additional dependencies - so the source wheel doesn't depend on pyopenblas but the binary does.
Okay, I guess I'm confused by what you mean when you say that a source wheel (or sdist) should have a "one-to-one" correspondence with a binary wheel.
OK, let me try to clarify:
1. The identifying data is name and version. There should only ever be *one* source wheel with a given name/version.
2. It's possible to have multiple binary wheels derived from a source wheel, but they must all have the same name/version as the source wheel, and only one will ever be considered as the "right" one for a given system. (I.e., compatibility matching takes place and a single best match is selected.)
The key thing here is that binary wheels that differ only in compatibility tags are considered "the same" in this sense (think equivalence class, if you have a mathematical background). Things that may have muddied the water:
1. When I say "all metadata in a source wheel must be static". Technically, tools (pip) don't care about anything other than name and version. The rest is mostly for things like publishing on PyPI and distro packaging, which is not a "core" use.
2. The whole pyopenblas question, which as I understand it is part of the effort to work around the limitations of the compatibility tag mechanism by making binary library dependencies into package dependencies. That approach looks promising, but it moves some of the "what is the appropriate wheel for this environment" question away from compatibility tags and into dependency checking. Which is a non-trivial change to the model.
I'd like to see the "binary dependency as package dependency" approach explored properly, but it's completely orthogonal to the question of how source wheels should work. You'd have the same issues with the current sdist format. I'd suggest that the work around this area be separated out from the revamped source distribution discussion. It could probably even be prototyped without getting involved with numpy - how about creating a test project that linked either statically or dynamically to a dummy external library, and then sorting out the wheel issues with that (project testproj depends on project testlib if it's built to link dynamically, and doesn't if it's built statically)? You could do that without anyone needing to understand about BLAS, or numpy, and get much better informed feedback from the wider distutils-sig than you currently get (because you keep needing to explain numpy build processes every time anyone asks a question).
For the purposes of this discussion, however, I'd suggest that we assume that building a binary wheel from a source wheel *doesn't* add any new dependencies. Apart from the pyopenblas case, I don't know of any other case where that would happen. (If making that restriction makes the whole discussion trivial, then that's sort of my point - I don't think there *is* a big debate to be had outside of the pyopenblas question.) Paul
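A minimal sketch of what the suggested test project might look like ("testproj" and "testlib" are the hypothetical names from the paragraph above; the environment variable is invented purely for illustration):

    # setup.py for a hypothetical "testproj" used to prototype the
    # "binary dependency as package dependency" question without numpy.
    import os
    from setuptools import setup

    # Invented switch: link against the separately packaged "testlib"
    # dynamically (adding a run-time dependency) or statically (adding none).
    # A real build system would record this choice in the built wheel.
    link = os.environ.get("TESTPROJ_LINK", "dynamic")

    setup(
        name="testproj",
        version="0.1",
        packages=["testproj"],
        install_requires=["testlib"] if link == "dynamic" else [],
    )

Building it both ways and then checking how pip's caching, upgrades and dependency resolution cope would exercise the questions above without anyone needing to understand BLAS or the numpy build.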

On Thu, Oct 8, 2015 at 1:18 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I think this satisfies all of the requirements for static metadata and one-to-one correspondence of source wheels and binary wheels. If numpy followed this then I imagine that there would be a single source wheel on PyPI corresponding to the one configuration that would be used consistently there. However numpy still needs to separately release the code in a form that is also usable in all of the many other contexts that it is already used.
Can't that configuration just be the build defaults? There would be a single source but with some preset build configuration. People with different needs can just override those. Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

On 8 October 2015 at 13:05, Ionel Cristian Mărieș <contact@ionelmc.ro> wrote:
On Thu, Oct 8, 2015 at 1:18 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I think this satisfies all of the requirements for static metadata and one-to-one correspondence of source wheels and binary wheels. If numpy followed this then I imagine that there would be a single source wheel on PyPI corresponding to the one configuration that would be used consistently there. However numpy still needs to separately release the code in a form that is also usable in all of the many other contexts that it is already used.
Can't that configuration just be the build defaults? There would be a single source but with some preset build configuration. People with different needs can just override those.
Yeah, I guess so. Maybe I'm just not understanding what the "one-to-one" correspondence is supposed to mean. Earlier in the thread it was said to be important because of wheel caching etc. but if it's possible to configure different builds then it's not really one-to-one. -- Oscar

On October 8, 2015 at 8:48:16 AM, Oscar Benjamin (oscar.j.benjamin@gmail.com) wrote:
On 8 October 2015 at 13:05, Ionel Cristian Mărieș wrote:
On Thu, Oct 8, 2015 at 1:18 PM, Oscar Benjamin wrote:
I think this satisfies all of the requirements for static metadata and one-to-one correspondence of source wheels and binary wheels. If numpy followed this then I imagine that there would be a single source wheel on PyPI corresponding to the one configuration that would be used consistently there. However numpy still needs to separately release the code in a form that is also usable in all of the many other contexts that it is already used.
Can't that configuration just be the build defaults? There would be a single source but with some preset build configuration. People with different needs can just override those.
Yeah, I guess so. Maybe I'm just not understanding what the "one-to-one" correspondence is supposed to mean. Earlier in the thread it was said to be important because of wheel caching etc. but if it's possible to configure different builds then it's not really one-to-one.
One of the features in the original PEP was the ability to produce multiple different Wheels from the same source release much like how Debian does. e.g. numpy-1.0.newsdistthing could produce numpy-pyopenblas-12.6.whl and numpy-mkl-7.8.whl, etc etc where there would be a bunch of names/versions that would differ from the name/version of the original sdist thing that was being proposed. That won't work with our packaging toolchain; the idea that there's a singular name/version for one sdist (and then for the wheels it produces) is pretty heavily baked into the entire toolchain. That’s what I meant by wheels and sdists being 1:1. As far as static metadata goes, I think that one of my earlier messages tried to get across the idea that if there is a good reason for something to be dynamic then we can possibly do that, but that the current PEP went too far and made (well, kept) *everything* dynamic. My point was that we should assume static for all metadata and then make exceptions for the data that we can't assume that for, but each case should be properly documented with motivations for *why* that can't be static. That will give everyone else the ability to see the use case, and figure out if that’s a use case we want to support, if we like how the PEP is supporting it, or if there is possibly some other feature we could add instead that would still support that, while offering the static-ness that we desire. Or if not, at least we'll have it documented as to *why* it needs to be dynamic. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Thu, Oct 8, 2015 at 4:01 PM, Donald Stufft <donald@stufft.io> wrote:
One of the features in the original PEP was the ability to produce multiple different Wheels from the same source release much like how Debian does. e.g. numpy-1.0.newsdistthing could produce numpy-pyopenblas-12.6.whl and numpy-mkl-7.8.whl, etc etc where there would be a bunch of names/versions that would differ from the name/version of the original sdist thing that was being proposed.
Sorry if this sounds obtuse but isn't that useless overspecialization? They can just publish `numpy-mkl` and `numpy-thatblas` or whatever on PyPI, and that will even work better when it comes to dependencies. I mean, if you build something for `numpy-mkl` then it wouldn't work on a `numpy-otherblas` anyway right? Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

On 8 October 2015 at 14:34, Ionel Cristian Mărieș <contact@ionelmc.ro> wrote:
On Thu, Oct 8, 2015 at 4:01 PM, Donald Stufft <donald@stufft.io> wrote:
One of the features in the original PEP was the ability to produce multiple different Wheels from the same source release much like how Debian does. e.g. numpy-1.0.newsdistthing could produce numpy-pyopenblas-12.6.whl and numpy-mkl-7.8.whl, etc etc where there would be a bunch of names/versions that would differ from the name/version of the original sdist thing that was being proposed.
Sorry if this sounds obtuse but isn't that useless overspecialization? They can just publish `numpy-mlk` and `numpy-thatblas` or whatever on PyPI, and that will even work better when it comes to dependencies. I mean, if you build something for `numpy-mkl` then it wouldn't work on a `numpy-otherblas` anyway right?
It depends. If you're using numpy from pure Python code the difference between mkl and otherblas is probably irrelevant. So in most cases you'd want to be able to depend just on "numpy" but in some cases you'd need to be more specific. Perhaps you could solve that with "provides"... Really though it's probably best to keep the set of binaries on PyPI internally consistent and not try to represent everything. My point earlier was that regardless of what goes on PyPI as the official numpy wheel there will be many people using the numpy code in other ways. If pip is not the only consumer of a source release then it's not really reasonable to dictate (and redesign in a less human-friendly way) its layout purely for pip's benefit. -- Oscar

On Thu, Oct 8, 2015 at 4:51 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
It depends. If you're using numpy from pure Python code the difference between mkl and otherblas is probably irrelevant. So in most cases you'd want to be able to depend just on "numpy" but in some cases you'd need to be more specific. Perhaps you could solve that with "provides"...
Really though it's probably best to keep the set of binaries on PyPI internally consistent and not try to represent everything. My point earlier was that regardless of what goes on PyPI as the official numpy wheel there will be many people using the numpy code in other ways. If pip is not the only consumer of a source release then it's not really reasonable to dictate (and redesign in a less human-friendly way) its layout purely for pip's benefit.
Yes indeed. But then shouldn't we talk about proper dependency resolution, compatible releases, meta packages and stuff like that? Unless I completely misunderstood the discussion here (quite probable :-) then this whole multiple source distributions idea is more like a workaround. Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

On Oct 8, 2015 9:23 AM, "Ionel Cristian Mărieș" <contact@ionelmc.ro> wrote:
On Thu, Oct 8, 2015 at 4:51 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
It depends. If you're using numpy from pure Python code the difference between mkl and otherblas is probably irrelevant. So in most cases you'd want to be able to depend just on "numpy" but in some cases you'd need to be more specific. Perhaps you could solve that with "provides"...
Really though it's probably best to keep the set of binaries on PyPI internally consistent and not try to represent everything. My point earlier was that regardless of what goes on PyPI as the official numpy wheel there will be many people using the numpy code in other ways. If pip is not the only consumer of a source release then it's not really reasonable to dictate (and redesign in a less human-friendly way) its layout purely for pip's benefit.
Yes indeed. But then shouldn't we talk about proper dependency resolution, compatible releases, meta packages and stuff like that? Unless I completely misunderstood the discussion here (quite probable :-) then this whole multiple source distributions idea is more like a workaround.
so, because install_requires and extras_require are computed at [egg-info] time, an sdist['s metadata] is/can/maybe technically different on different platforms, no? because of things like if sys.platform in [...]: INSTALL_REQUIRES.extend([...])
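Spelled out, the pattern being referred to looks roughly like this in a legacy setup.py (the package names here are placeholders):

    # Legacy-style setup.py whose dependency metadata is only known at
    # egg-info/build time, because it branches on the installing platform.
    import sys
    from setuptools import setup

    INSTALL_REQUIRES = ["six"]
    if sys.platform.startswith("win"):
        # Placeholder extra dependency, only pulled in on Windows.
        INSTALL_REQUIRES.extend(["colorama"])

    setup(
        name="example-dynamic-deps",
        version="0.1",
        py_modules=["example"],
        install_requires=INSTALL_REQUIRES,
    )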
Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

On Oct 8, 2015 8:34 AM, "Ionel Cristian Mărieș" <contact@ionelmc.ro> wrote:
On Thu, Oct 8, 2015 at 4:01 PM, Donald Stufft <donald@stufft.io> wrote:
One of the features in the original PEP was the ability to produce multiple different Wheels from the same source release much like how Debian does. e.g. numpy-1.0.newsdistthing could produce numpy-pyopenblas-12.6.whl and numpy-mkl-7.8.whl, etc etc where there would be a bunch of names/versions that would differ from the name/version of the original sdist thing that was being proposed.
Sorry if this sounds obtuse but isn't that useless overspecialization? They can just publish `numpy-mlk` and `numpy-thatblas` or whatever on PyPI, and that will even work better when it comes to dependencies. I mean, if you build something for `numpy-mkl` then it wouldn't work on a `numpy-otherblas` anyway right?
from a reproducibility standpoint, when I run e.g. pip freeze, how (without e.g. numpy-mkl, numpy-blas3k) do I reinstall the same package?
* { } URI params, post-#frag=ment encoding?
* {+1} prefix-[suffix]
how do I install the same version of the package that [you had / this Jupyter notebook has (version_information, watermark)] on [your / their] machine?
Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

On Wed, Oct 7, 2015 at 2:41 PM, Paul Moore <p.f.moore@gmail.com> wrote: [...]
Possibly the nearest we'd have to an issue is over allowing the build process to *add* dependencies to a binary wheel (e.g. a some builds depend on currently-hypothetical MKL wheel, which provides needed DLLs). I don't in principle object to that, but I'd like to see a fleshed out proposal on how wheels containing just DLLs (as opposed to Python packages) would work in practice - until we have a mechanism for building/distributing such wheels, I think it's premature to worry about specifying dependencies.
Just to answer this: There's no formal proposal because it doesn't involve distutils-sig -- it's something that we already know how to do and are working on implementing :-). The only reason it doesn't already exist on PyPI is that Windows is the main target, and toolchain problems mean that we aren't yet able to build numpy or scipy wheels on Windows at all. (Long story...) The basic idea though is just, we make a "python package" with trivial structure:

    pyopenblas/
        __init__.py
        openblas.dll
        include/
            ...

Usage:

    # in downstream package setup.py
    Extension(...,
              include_dirs=... + pyopenblas.get_include(),
              linker_arguments=... + pyopenblas.get_linker_arguments(),
              ...)

    # in downstream package __init__.py
    import pyopenblas
    pyopenblas.enable()

Implementation: pyopenblas/__init__.py contains some code like:

    import os

    DIR = os.path.dirname(__file__)

    def get_include():
        return [os.path.join(DIR, "include")]

    def get_linker_arguments():
        return ["-L" + DIR, "-lpyopenblas"]

    def enable():
        # Platform specific code to let the runtime linker find libopenblas.so
        if WINDOWS:
            ...  # ctypes magic to preload the DLL
        elif LINUX:
            ...  # modify os.environ["LD_LIBRARY_PATH"]
        else:
            ...

-n -- Nathaniel J. Smith -- http://vorpus.org
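For completeness, a sketch of how a downstream package's setup.py might consume such a pyopenblas wheel with setuptools (the project names are hypothetical; extra_link_args is the standard setuptools/distutils Extension keyword for extra linker flags, used here in place of the pseudo-keyword in the sketch above):

    # Hypothetical downstream setup.py building against the pyopenblas sketch.
    # Note that importing pyopenblas at setup.py time is itself a build-time
    # dependency -- exactly the kind of thing a build-requires mechanism
    # would need to declare.
    import pyopenblas
    from setuptools import setup, Extension

    setup(
        name="downstream",
        version="0.1",
        packages=["downstream"],
        ext_modules=[Extension(
            "downstream._linalg",
            sources=["downstream/_linalg.c"],
            include_dirs=pyopenblas.get_include(),
            extra_link_args=pyopenblas.get_linker_arguments(),
        )],
        # Run-time dependency, so pyopenblas.enable() is importable.
        install_requires=["pyopenblas"],
    )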

On 11 October 2015 at 05:01, Nathaniel Smith <njs@pobox.com> wrote:
There's no formal proposal because it doesn't involve distutils-sig -- it's something that we already know how to do and are working on implementing :-). The only reason it doesn't already exist on PyPI is that Windows is the main target, and toolchain problems mean that we aren't yet able to build numpy or scipy wheels on Windows at all. (Long story...)
The basic idea though is just, we make a "python package" with trivial structure:
Cool, thanks for the clarification - I hadn't realised it was as straightforward as this. Paul

On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbying random emails coming from places of confusion (at least on my side). I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Oct 7, 2015 2:58 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbying random emails coming from places of confusion (at least on my side). I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D
This does sound like it would be a good idea -- couldn't hurt, anyway :-). I'll contact you offlist. If anyone else wants to join in, email me... -n

On Oct 8, 2015 8:14 AM, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 7, 2015 2:58 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbying random emails coming from places of confusion (at least on my side). I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D
This does sound like it would be a good idea -- couldn't hurt, anyway :-). I'll contact you offlist. If anyone else wants to join in, email me...
Looks like this is happening tomorrow (Fri Oct 9) at 11 am California / 2 pm New York / 7 pm London. Since there's been at least some interest, we'll do it as a Google hangout and send a link around in case anyone wants to listen in, and of course summarize back to the list. -n

On Oct 8, 2015 09:33, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 8, 2015 8:14 AM, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 7, 2015 2:58 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbying random emails coming from places of confusion (at least on my side). I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D
This does sound like it would be a good idea -- couldn't hurt, anyway :-). I'll contact you offlist. If anyone else wants to join in, email me...
Looks like this is happening tomorrow (Fri Oct 9) at 11 am California / 2 pm New York / 7 pm London. Since there's been at least some interest, we'll do it as a Google hangout and send a link around in case anyone wants to listen in, and of course summarize back to the list.
Correction: happening *Monday Oct 12* at 11 am California / 2 pm New York / 7 pm London. -n

How many people can join a hangout? we may be bumping up against that limit :-) -CHB On Thu, Oct 8, 2015 at 12:58 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Oct 8, 2015 09:33, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 8, 2015 8:14 AM, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 7, 2015 2:58 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbying random emails coming from places of confusion (at least on my side). I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D
This does sound like it would be a good idea -- couldn't hurt, anyway :-). I'll contact you offlist. If anyone else wants to join in, email me...
Looks like this is happening tomorrow (Fri Oct 9) at 11 am California / 2 pm New York / 7 pm London. Since there's been at least some interest, we'll do it as a Google hangout and send a link around in case anyone wants to listen in, and of course summarize back to the list.
Correction: happening *Monday Oct 12* at 11 am California / 2 pm New York / 7 pm London.
-n
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Oct 9, 2015 at 7:05 PM, Chris Barker <chris.barker@noaa.gov> wrote:
How many people can join a hangout? we may be bumping up against that limit :-)
AFAIK there's no limit on the number of people that can listen in. Also, Hangouts can record video (the "On Air" thing). Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

Listening in to an On Air session is unbounded, but direct participants are capped at 10. On Fri, 9 Oct 2015 at 09:17 Ionel Cristian Mărieș <contact@ionelmc.ro> wrote:
On Fri, Oct 9, 2015 at 7:05 PM, Chris Barker <chris.barker@noaa.gov> wrote:
How many people can join a hangout? we may be bumping up against that limit :-)
AFAIK there's no limit on the number of people that can listen in. Also, Hangouts can record video (the "On Air" thing).
Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro

I"ve been following this thread and gotten a bit lost. But I do care, because I'm a heavy numpy user, and also because I was involved for years in building pacakges for OS-X, and currently need distribute some of my own stuff that has semi-ugly C lib dependencies (Not as ugly as BLAS.,though :-) ) In the last couple years I gave up on PyPi and pip (and went with Anaconda and conda)-- that ecosystem simply doesn't currently support my use-cases. But it would be nice to go back, and it looks like there are some idea on the table that will make that possible. But could someone clarify a thing or two for me?? 1) what in the world is a "source wheel"? And how is it different than an sdist (other than maybe in a different file format. 2) Is it indeed "OK" with the current PEPs and tools for different binary wheels to have different dependencies? This would be the example of, for instance the Matplotlib binary wheel for Windows depends on a py_zlib, whereas the binary wheel for OS-X relies on the the system lib, and therefor does not have that dependency? (and has anyone worked out the linking issues so that that would all work with virtualenv and friends...) if (2) then it seems the issue is what to do with the same package on the same platform having potentially different dependencies -- i.e. numpy w/ mkl and numpy w/some_other_blas. In that case, I think that this could completely explode into m**n possible wheels if we try to accommodate it in a fully flexible manner -- so making it a name-thing (like was proposed here), makes sense to me -- numpy_mkl and numpy_openblas are, as far as pip is concerned, different packages. I think this is OK, as we probably only want some small subset of possible build up on PyPi anyway (only one?). But it does get a bit tricky if you want to put up a package that depends on "pure" numpy -- i.e. it doesn't care which BLAS the numpy it uses, but it DOES need numpy. This could this be Accomidated byt have an "or" option for dependencies: numpy_mkl>=1.9 or numpy_openblas >= 1.9 However, that would mean that the author of that wheel would need to know all of the wheels that might be available up front - less than ideal. Which seems to point more to having an optional "binary_build" component to the name. Not sure what syntax is available, but the idea would be that: "numpy|mkl" and "numpy|openblas" would both match "numpy" which, of course would require changes to the whole stack... and off the top of my head, I'm wondering whether having one "binary_build' flag would be enough, or if we'd find that we wanted n options, and n*m combinations, and this would all blow up. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 9 October 2015 at 18:04, Chris Barker <chris.barker@noaa.gov> wrote:
1) what in the world is a "source wheel"? And how is it different than an sdist (other than maybe in a different file format.
A "source wheel" is the proposed name for a to-be-defined replacement for sdists. For now, you can think of "source wheel" and "sdist" as the same.
2) Is it indeed "OK" with the current PEPs and tools for different binary wheels to have different dependencies? This would be the example of, for instance the Matplotlib binary wheel for Windows depends on a py_zlib, whereas the binary wheel for OS-X relies on the the system lib, and therefor does not have that dependency? (and has anyone worked out the linking issues so that that would all work with virtualenv and friends...)
It's not *currently* OK for different binary wheels to have different dependencies. At least I don't think it is. It's basically not something that as far as I'm aware anyone has ever considered an option up till now, and so it's quite likely that there are assumptions baked into the tools that would break if different builds of (a given version of) a package had different dependencies. One of the proposed approaches to binary dependencies (the one I've been referring to as "the pyopenblas approach") is based on the idea that different wheels could have different dependency metadata. I've tried to enumerate the questions that need to be looked at if we were to go down that route, but to my knowledge, no-one has yet either tested how well things would work if this happened, or audited the code for problematic assumptions. So it's a possibility for the future, but like many things it depends on someone doing the work to make it happen (and as far as I know it's mostly the numpy community who have the need, so I assume someone in that community would need to move this forward). Paul

On 10/09/2015 11:18 AM, Paul Moore wrote:
On 9 October 2015 at 18:04, Chris Barker <chris.barker@noaa.gov> wrote:
1) what in the world is a "source wheel"? And how is it different than an sdist (other than maybe in a different file format.
A "source wheel" is the proposed name for a to-be-defined replacement for sdists. For now, you can think of "source wheel" and "sdist" as the same.
2) Is it indeed "OK" with the current PEPs and tools for different binary wheels to have different dependencies? This would be the example of, for instance the Matplotlib binary wheel for Windows depends on a py_zlib, whereas the binary wheel for OS-X relies on the the system lib, and therefor does not have that dependency? (and has anyone worked out the linking issues so that that would all work with virtualenv and friends...)
It's not *currently* OK for different binary wheels to have different dependencies. At least I don't think it is. It's basically not something that as far as I'm aware anyone has ever considered an option up till now, and so it's quite likely that there are assumptions baked into the tools that would break if different builds of (a given version of) a package had different dependencies.
AFAIK this is actually just fine currently, it's just not considered ideal for a hopeful future static-metadata world. Today, in the all-metadata-is-dynamic world that we actually live in, the tools aren't able to make any assumptions at all, and just have to download whatever best-match wheel or sdist they find for a given package/version requirement, unpack it, and see what it says about its own metadata (including dependencies). So different binary wheels of the same package at the same version having different dependencies works just fine. It's not like any of the tooling would actually know the difference, since they will only deal with one of those wheels at a given time. But of course this dynamic-metadata world prevents the tools from doing all kinds of useful things (like actually building dependency graphs in advance so they can do proper dependency conflict resolution), which is why we want to move towards static metadata. And it's in that transition that issues like "are different wheels for the same project at the same version allowed to have different dependencies" become relevant. Carl

On Fri, 9 Oct 2015 19:01 Carl Meyer <carl@oddbird.net> wrote:
On 10/09/2015 11:18 AM, Paul Moore wrote:
On 9 October 2015 at 18:04, Chris Barker <chris.barker@noaa.gov> wrote:
1) what in the world is a "source wheel"? And how is it different from an sdist (other than maybe in a different file format)?
A "source wheel" is the proposed name for a to-be-defined replacement for sdists. For now, you can think of "source wheel" and "sdist" as the same.
2) Is it indeed "OK" with the current PEPs and tools for different binary wheels to have different dependencies? This would be the example where, for instance, the Matplotlib binary wheel for Windows depends on a py_zlib, whereas the binary wheel for OS-X relies on the system lib and therefore does not have that dependency? (and has anyone worked out the linking issues so that that would all work with virtualenv and friends...)
It's not *currently* OK for different binary wheels to have different dependencies. At least I don't think it is. It's basically not something that as far as I'm aware anyone has ever considered an option up till now, and so it's quite likely that there are assumptions baked into the tools that would break if different builds of (a given version of) a package had different dependencies.
AFAIK this is actually just fine currently, it's just not considered ideal for a hopeful future static-metadata world. Why would it need dynamic metadata for the windows matplotlib wheel to have different metadata from the OSX matplotlib wheel? The platform Windows/OSX is static and each wheel declares its own dependencies statically but differently. Am I missing something? -- Oscar

On 10/09/2015 12:28 PM, Oscar Benjamin wrote:
Why would it need dynamic metadata for the windows matplotlib wheel to have different metadata from the OSX matplotlib wheel? The platform Windows/OSX is static and each wheel declares its own dependencies statically but differently. Am I missing something?
I didn't say that required dynamic metadata (wheel metadata is already static). I just said that it works fine currently, and that it becomes an open question with the move towards static metadata in both source and binary releases, because we have to answer questions like "what information beyond just package/version makes up a complete node in a dependency graph." Carl

On Fri, 9 Oct 2015 19:35 Carl Meyer <carl@oddbird.net> wrote: On 10/09/2015 12:28 PM, Oscar Benjamin wrote:
Why would it need dynamic metadata for the windows matplotlib wheel to have different metadata from the OSX matplotlib wheel? The platform Windows/OSX is static and each wheel declares its own dependencies statically but differently. Am I missing something?
I didn't say that required dynamic metadata (wheel metadata is already static). I just said that it works fine currently, and that it becomes an open question with the move towards static metadata in both source and binary releases, because we have to answer questions like "what information beyond just package/version makes up a complete node in a dependency graph." Assuming it's tied to the operating system it doesn't matter surely. When pip runs on Windows it can ignore dependencies that apply to other platforms so I don't see how this case makes it more complex. -- Oscar

On 10/09/2015 12:44 PM, Oscar Benjamin wrote:
On Fri, 9 Oct 2015 19:35 Carl Meyer <carl@oddbird.net <mailto:carl@oddbird.net>> wrote:
On 10/09/2015 12:28 PM, Oscar Benjamin wrote: > Why would it need dynamic metadata for the windows matplotlib wheel to > have different metadata from the OSX matplotlib wheel? The platform > Windows/OSX is static and each wheel declares its own dependencies > statically but differently. Am I missing something?
I didn't say that required dynamic metadata (wheel metadata is already static). I just said that it works fine currently, and that it becomes an open question with the move towards static metadata in both source and binary releases, because we have to answer questions like "what information beyond just package/version makes up a complete node in a dependency graph."
Assuming it's tied to the operating system it doesn't matter surely. When pip runs on Windows it can ignore dependencies that apply to other platforms so I don't see how this case makes it more complex.
Sure. "Assuming it's tied to the operating system" is an assumption that can't be made here, though, if I understand the examples that have already been given repeatedly regarding numpy and BLAS-linking. If "OS" were actually sufficient to distinguish all cases, then the existing wheel platform tags would already be an adequate solution to this problem. Carl

On Fri, Oct 9, 2015 at 11:44 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Assuming it's tied to the operating system it doesn't matter surely. When pip runs on Windows it can ignore dependencies that apply to other platforms so I don't see how this case makes it more complex.
Does pip currently support platform-specific dependencies? That would solve part of the problem. But yes, there are dependencies that are a function of how the wheel was built, beyond what platform it is on. So those could be static in a binary wheel, but not static in the sdist (or source wheel, or...) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
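For context, wheel metadata can already express platform-conditional dependencies through environment markers. A minimal illustration using the third-party ``packaging`` library (which pip uses internally); the dependency name in the comment is made up::

    # A dependency line like
    #     Requires-Dist: pypiwin32; sys_platform == "win32"
    # only applies when the marker evaluates to True on the installing system.
    from packaging.markers import Marker

    marker = Marker('sys_platform == "win32"')
    print(marker.evaluate())  # True on Windows, False elsewhere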

On Fri, Oct 9, 2015 at 11:28 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Why would it need dynamic metadata for the windows matplotlib wheel to have different metadata from the OSX matplotlib wheel? The platform Windows/OSX is static and each wheel declares its own dependencies statically but differently. Am I missing something?
I think the metadata can be static for the binary wheels, yes. but the dependencies would be different for the two wheels. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 9 October 2015 at 18:04, Chris Barker <chris.barker@noaa.gov> wrote:
Which seems to point more to having an optional "binary_build" component to the name. Not sure what syntax is available, but the idea would be that:
"numpy|mkl"
and
"numpy|openblas"
would both match "numpy"
which, of course would require changes to the whole stack...
This sounds more like it's an extension of the wheel "compatibility tag" approach. That's another option I can imagine being worth looking at. Basically, wheels currently encode in their names the Python version, ABI and architecture(s) they work on. That's sufficient for simple uses, but for more complicated scenarios you need more (such as "this wheel is only valid if library FOO is present"). It's not impossible to extend the compatibility tag mechanism to include such things, but there are a number of issues that would need to be thrashed out (spot the common theme here? :-)) Specifically, it's horribly cumbersome to encode everything in the filename, so some more scalable mechanism is needed. Also, you need a notion of compatibility order (is wheel X "more compatible" with your system than wheel Y?). And of course, compatibility is about "does it work with what's there?" which doesn't allow for the possibility of downloading and installing a dependency (that's what dependency resolution does, compatibility checking is at a later stage in the process). So again this is a possible solution, but needs someone to work on the details. ... or thinking again, maybe you mean having multiple packages (numpy_mkl, numpy_openblas, ...) all of which satisfy a "numpy" requirement? That's definitely viable, the Metadata 2.0 spec allows for one package "providing" an implementation of another (https://www.python.org/dev/peps/pep-0426/#provides). But that part of Metadata 2.0 isn't implemented yet - it needs some people with time to work on it again, and it will probably be implemented alongside "sdist 2.0", which is what we've been calling "when we get round to reworking/rethinking the sdist format". Paul
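For reference, this is roughly what "encode in their names" means today: a wheel filename carries the name, version, and the PEP 427 python/abi/platform tags, which a tool can split apart. A rough sketch, not pip's actual code::

    def split_wheel_filename(filename):
        # PEP 427: {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
        stem = filename[:-len(".whl")]
        parts = stem.split("-")
        name, version = parts[0], parts[1]
        python_tag, abi_tag, platform_tag = parts[-3], parts[-2], parts[-1]
        return name, version, python_tag, abi_tag, platform_tag

    print(split_wheel_filename("numpy-1.9.2-cp34-cp34m-win_amd64.whl"))
    # ('numpy', '1.9.2', 'cp34', 'cp34m', 'win_amd64')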

On Fri, Oct 9, 2015 at 10:28 AM, Paul Moore <p.f.moore@gmail.com> wrote:
... or thinking again, maybe you mean having multiple packages (numpy_mkl, numpy_openblas, ...) all of which satisfy a "numpy" requirement?
Yes, that is EXACTLY what I meant. The idea here is that if you build a package that requires numpy with a particular BLAS, then you'd do, e.g.: numpy|mkl >= 1.9. But if you build one that only requires numpy, and doesn't care which BLAS it's using, then you simply do: numpy >= 1.9. So pip, when asked to install said package, would look and see if any package called numpy|* was installed (the right version). If so, it would move along. If not, then it would go look on PyPI for "numpy" -- and this is where it gets tricky -- which numpy| should it use? At this stage, it wouldn't matter, any one would do. But say it installed numpy|mkl. Then next the user goes to install a package that depends on numpy|openblas -- now pip goes and looks, and finds numpy|mkl, but not numpy|openblas -- so it needs to go install numpy|openblas, which overwrites numpy|mkl, which is still OK. Until the user goes to install something that DOES depend on numpy|mkl. Now we are stuck. But is this any different from two packages that depend on two different specific versions of the same package? But all this is making me think that the way this could be handled is by numpy NOT building a BLAS into the package at all, but rather having the package depend on another package that provides the BLAS, so: numpy on PyPI would depend on this theoretical py_openblas; any package that depends on only numpy would be easy; any package that depends on openblas would then depend on py_openblas. So the question is: if I compile my third party package against numpy and mkl, it would depend on numpy and py_mkl. If I went to install this on a system that had a numpy that depends on openblas, pip would install py_mkl (having already installed numpy and py_openblas). And I'd have a numpy extension calling into a different BLAS than numpy itself is calling into -- would that cause any problems? Python would be linked to two libs with the same names -- would that cause a conflict? I'm way out of my depth here! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
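A purely hypothetical sketch of the "several packages satisfying one numpy requirement" idea, phrased as PEP 426-style metadata; the package names and field shapes here are invented for illustration and are not part of any implemented spec::

    # Hypothetical metadata for a "numpy_mkl" build that also satisfies "numpy"
    # and pulls in an equally hypothetical "py_mkl" runtime package.
    numpy_mkl_metadata = {
        "metadata_version": "2.0",
        "name": "numpy_mkl",
        "version": "1.9.2",
        "provides": ["numpy (1.9.2)"],
        "run_requires": [{"requires": ["py_mkl"]}],
    }

    # A package that doesn't care which BLAS is used would still just declare
    # "numpy >= 1.9" and accept whichever provider is installed.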

On 10 October 2015 at 04:47, Chris Barker <chris.barker@noaa.gov> wrote:
So the question is: if I compile my third party package against numpy and mkl, it would depend on numpy and py_mkl.
If I went to install this on a system that had a numpy that depends on openblas, pip would install py_mkl (having already installed numpy and py_openblas). And I'd have a numpy extension calling into a different BLAS than numpy itself is calling into -- would that cause any problems? Python would be linked to two libs with the same names -- would that cause a conflict? I'm way out of my depth here!
Yes, you'd get weird binary ABI problems in that situation. There's a reason centrally built Linux distros and cross-platform distros like conda exist - we pick the common ABIs that all the components we build use, and enforce them in the build system. For "built by anyone" ABIs, the best you can hope for is to detect-and-report fundamental conflicts, as you can't solve the general case. The approach I proposed for metadata 2.0 was introducing an environmental constraints extension that causes things to fail at install time if the installer detects a binary incompatibility: https://www.python.org/dev/peps/pep-0459/#the-python-constraints-extension The equivalent in a pre-metadata 2.0 world would be yet-another-file in the dist-info directory. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
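To illustrate only the "detect-and-report" idea (this is not the PEP 459 schema; the file name and its contents are invented for the sketch)::

    import os

    def check_binary_abi(dist_info_dir, required_abi):
        """Hypothetical install-time check: compare an ABI declaration recorded
        when a package was installed against the ABI a new wheel needs."""
        path = os.path.join(dist_info_dir, "BINARY_ABI")
        if not os.path.exists(path):
            return  # nothing declared, nothing to check
        with open(path) as f:
            installed_abi = f.read().strip()
        if installed_abi != required_abi:
            raise RuntimeError("binary incompatibility: installed for %s, "
                               "new wheel needs %s" % (installed_abi, required_abi))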

On Fri, Oct 9, 2015 at 10:04 AM, Chris Barker <chris.barker@noaa.gov> wrote:
1) what in the world is a "source wheel"? And how is it different from an sdist (other than maybe in a different file format)?
It seemed like people had different ideas about what a platonic "sdist" even is and this was causing us to talk past each other. So I proposed the terms "source release" and "source wheel" as two somewhat abstract concepts to let us state positions like "I think an sdist should fill the role of a source wheel, and that there's no need for source releases", or "I think source wheels and source releases are different and important, so we need one spec for each". I'm not sure how much it helped :-) This is the email that suggested distinguishing between the terms: https://mail.python.org/pipermail/distutils-sig/2015-October/026981.html and the "source wheel" term in particular was stolen from Donald's earlier email: https://mail.python.org/pipermail/distutils-sig/2015-October/026964.html -n -- Nathaniel J. Smith -- http://vorpus.org

Toss me an invite too - I'm very interested in this, for all that I haven't kibbitzed on the thread yet :) On 9 October 2015 at 05:33, Nathaniel Smith <njs@pobox.com> wrote:
On Oct 8, 2015 8:14 AM, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 7, 2015 2:58 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbing random emails coming from places of confusion (at least on my side).
I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D
This does sound like it would be a good idea -- couldn't hurt, anyway :-). I'll contact you offlist. If anyone else wants to join in, email me...
Looks like this is happening tomorrow (Fri Oct 9) at 11 am California / 2 pm New York / 7 pm London. Since there's been at least some interest, we'll do it as a Google hangout and send a link around in case anyone wants to listen in, and of course summarize back to the list.
-n
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
-- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud

I'd also like to join. On 9 October 2015 06:29:58 MESZ, Robert Collins <robertc@robertcollins.net> wrote:
Toss me an invite too - I'm very interested in this, for all that I haven't kibbitzed on the thread yet :) On 9 October 2015 at 05:33, Nathaniel Smith <njs@pobox.com> wrote:
On Oct 8, 2015 8:14 AM, "Nathaniel Smith" <njs@pobox.com> wrote:
On Oct 7, 2015 2:58 PM, "Donald Stufft" <donald@stufft.io> wrote:
On October 7, 2015 at 5:28:54 PM, Nathaniel Smith (njs@pobox.com) wrote:
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
Are you available on IRC or for a video call or something? I feel like there's something foundational from both sides that we're each missing here and it'd be easier to just hash it out in real time rather than lobbing random emails coming from places of confusion (at least on my side).
I'm not sure if Paul (or anyone else!) would want to jump in on it too, though I feel like probably if it's me and you then the two "sides" will probably be reasonably well represented so if more folks don't want to join that's probably OK too, particularly since we wouldn't be making any actual decisions there :D
This does sound like it would be a good idea -- couldn't hurt, anyway :-). I'll contact you offlist. If anyone else wants to join in, email me...
Looks like this is happening tomorrow (Fri Oct 9) at 11 am California / 2 pm New York / 7 pm London. Since there's been at least some interest, we'll do it as a Google hangout and send a link around in case anyone wants to listen in, and of course summarize back to the list.
-n
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
Best regards, Ronny

But the different builds for the different configurations end up with different metadata. If I'm understanding right, the whole point of "source wheels" is that they have all the static metadata that pip needs in order to make decisions, and this has to match the resulting wheels -- right?
I think we're largely talking about variances in "external" non-python system dependencies (and their build settings). PEP 426 currently doesn't cover this in the core metadata, so as it stands, any 2.0 sdist couldn't exhaust these build variances in its core metadata. There has been some discussion on how to represent external dependencies. In brief, I think the going idea is that it would be through extensions (https://www.python.org/dev/peps/pep-0426/#metadata-extensions), not in the core python metadata, and the various groups (distro folks, science folks, etc.) would implement these themselves to fulfill their needs... Assuming they did implement such an extension, it would exist in the sdist, and for cases like numpy likely support some notion of "build options", and hence allow for a one-to-many mapping between sdist and binary wheels. Marcus
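A hypothetical sketch of what such an extension could look like inside PEP 426-style JSON metadata; the extension name and its keys are invented purely for illustration::

    # Invented "external requirements" extension riding alongside the core metadata.
    pydist = {
        "metadata_version": "2.0",
        "name": "numpy",
        "version": "1.9.2",
        "extensions": {
            "exampleorg.external_requires": {
                "libraries": ["openblas >= 0.2.14"],
                "build_options": {"blas": "openblas"},
            },
        },
    }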
The way I'm imagining it is that there are multiple levels of metadata staticness:
- package name, author, description, ...
  static in: VCS checkouts, source releases, source wheels, wheels
- package version
  static in: source releases, source wheels, wheels
- package dependencies
  static in: source wheels, wheels
- environment tag
  static in: wheels
Of course, there *is* an unsolved issue here, which is how we manage compatibility for wheels at the level needed for numpy. But I thought the discussion on that was ongoing? I'm concerned that this proposal is actually about bypassing that discussion, and instead trying to treat incompatibly linked wheels as "different" in terms of project metadata, which I think is the wrong way of handling things. I note that Christoph Gohlke's numpy builds are tagged with a "+mkl" local version modifier - that's presumably intended to mark the fact that they are built with an incompatible runtime - but that's a misuse of local versions (and I've found it causes niggling issues with how pip recognises upgrades, etc).
Yeah, that's not a good long term solution -- it needs to be moved into the metadata (probably by creating an MKL wheel and then making the numpy wheel depend on it). That's exactly the problem :-)
So, in summary: Your points above don't seem to me to in any way preclude having a single numpy source wheel, and a number of (mutually incompatible, but the same in terms of project and version) binary wheels.
Maybe I have misunderstood: does it actually help pip at all to have static access to name and version, but not to anything else? I've been assuming not, but I don't think anyone's pointed to any examples yet of the problems that pip is encountering due to the lack of static metadata -- would this actually be enough to solve them?
-n
-- Nathaniel J. Smith -- http://vorpus.org _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

On 7 October 2015 at 18:27, Nathaniel Smith <njs@pobox.com> wrote:
There are projects on PyPI right now, today, that have no way to generate sdists and will never have any need for "source wheels"
I think I'm as confused by what you're saying here as Donald is. Could you give a few examples of such projects? I'd like to go & take a look at them and try to understand what they are doing that is so incompatible with what Donald and I are thinking of as a "source wheel". Paul

On Wed, Oct 7, 2015 at 11:14 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 October 2015 at 18:27, Nathaniel Smith <njs@pobox.com> wrote:
There are projects on PyPI right now, today, that have no way to generate sdists and will never have any need for "source wheels"
I think I'm as confused by what you're saying here as Donald is. Could you give a few examples of such projects? I'd like to go & take a look at them and try to understand what they are doing that is so incompatible with what Donald and I are thinking of as a "source wheel".
An example would be flit itself: https://github.com/takluyver/flit https://pypi.python.org/pypi/flit It's not that you couldn't support a "source wheel" here, it's just that forcing them to go checkout -> source wheel -> wheel would be adding pointless hassle while accomplishing nothing useful. pip would never actually touch the source wheel, and for the remaining use cases for source distribution, a classic "source release" that's basically a tarball of a VCS checkout + static version number would be more familiar and useful. -n -- Nathaniel J. Smith -- http://vorpus.org

On 7 October 2015 at 22:53, Nathaniel Smith <njs@pobox.com> wrote:
I think I'm as confused by what you're saying here as Donald is. Could you give a few examples of such projects? I'd like to go & take a look at them and try to understand what they are doing that is so incompatible with what Donald and I are thinking of as a "source wheel".
An example would be flit itself: https://github.com/takluyver/flit https://pypi.python.org/pypi/flit
It's not that you couldn't support a "source wheel" here, it's just that forcing them to go checkout -> source wheel -> wheel would be adding pointless hassle while accomplishing nothing useful. pip would never actually touch the source wheel, and for the remaining use cases for source distribution, a classic "source release" that's basically a tarball of a VCS checkout + static version number would be more familiar and useful.
I'm not sure I follow. If you have a binary wheel of flit, "pip install flit" won't need a source wheel, certainly (that's just as true for flit as for something complex like numpy). But distro packages would still want a source wheel to build their packages. If you mean that flit itself wouldn't use a source wheel, then while that may well be true, it's hardly relevant - whether flit chooses to use a source wheel is its own choice. But I'd hope flit *could* use a source wheel, as otherwise I couldn't use it to build wheels for other projects which want to use it and distribute source wheels. Should any such exist - this is pretty hypothetical at this point, and so not likely to be very productive. I am inclined to think that we're basically in agreement, we're just confused over terminology, and/or worrying about hypothetical cases. Would it help if I said that the *only* distinction between "source release" and source wheel that I care about is that in a source wheel the metadata must be static? We can discuss what metadata precisely, and we can thrash out other differences that might make more use of the fact that conceptually a "source release" is for humans to work with whereas a source wheel is for tools to consume, but those are details. I'm not clear if you think I have some more complicated picture than that, but really I don't [1]. Paul [1] I'd like a source wheel to have a defined format, but even that's not a killer. A zipfile with 2 directories "metadata" containing machine readable static metadata, and "source" with the complete contents of a source release, would do me. Of course when you build, if the metadata the build produces doesn't match the static data, that's a bug in the project packaging and we'd want to guard against it (it's the main reason the static data in the current sdist format is useless, that we can't rely on it :-() We can thrash this sort of stuff out, though.
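A minimal sketch of the two-directory layout described in [1], with made-up file names and contents; nothing here is a defined format::

    import io
    import zipfile

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # machine-readable static metadata
        zf.writestr("metadata/METADATA", "Name: example\nVersion: 1.0\n")
        # complete contents of a source release
        zf.writestr("source/example/__init__.py", "")
        zf.writestr("source/README", "example project\n")

    with zipfile.ZipFile(buf) as zf:
        print(zf.namelist())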

On 2015-10-05 15:39:10 +0200 (+0200), Antoine Pitrou wrote: [...]
But why use two different formats for "source release" and "sdists"? Currently sdists fit the assumptions for a source release, why introduce some complexity and have the users deal with separate concepts (with all the confusion that will inevitably ensue)?
An sdist is an installable package which just happens to _look_ a lot like a source release tarball, but trying to pretend that downstream packagers will want to use it as such leads to a variety of pain points in the upstream/downstream relationship. For better or worse a lot of distros don't want generated files in upstream source code releases, since they need to confirm that they also ship the necessary tooling to regenerate any required files and that the generated files they ship match what their packaged tooling produces. While this similarity was probably seen as a "Good Thing [TM]" initially (hence standardizing on a .tar.gz extension), over time both the generated content of a typical sdist and the concern most distros have over shipping upstream-generated files has increased to the point where they really need to be viewed as separate and distinct release artifacts now. -- Jeremy Stanley

EWOW, huge thread. I've read nearly all of it but in order not to make it massively worse, I'm going to reply to all the points I think need raising in one mail :). Top level thoughts here, more point fashion with only rough editing below the fold.

I realise many things - like the issue between different wheels of the same package consuming different numpy ABIs - have been touched on, but AFAICT they are entirely orthogonal to the proposal, which was to solve 'be able to use arbitrary build systems and still install with pip'.

Of the actual problems with using arbitrary build systems, 99% of them seem to boil down to 'setup-requires isn't introspectable by pip' (https://github.com/pypa/pip/issues/1820). If it was, then alternative build systems could be depended on reasonably, and the mooted thunk from the setuptools CLI to an arbitrary build system would be viable. It is, in principle, a matter of one patch to teach pip *a* way to do this (and then any and all build systems that want to can utilise it). https://github.com/rbtcollins/pip/tree/declarative is a POC I did - my next steps on that were to discuss the right ecosystem stuff for it - e.g. should pip consume it via setuptools, or should pip support it as *the way* and other systems including setuptools can choose to use it?

A related but separate thing is being able to *exclusively* install things without setuptools present - I've filed https://github.com/pypa/pip/issues/3175 about that, but I think it's -much- lower priority than reliably enabling third party build tools.

-Rob

----

"solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels."

Actually pip still punts a *lot* here - we have bypasses to let things like C compiler flags be set during wheel build, and when that's done we don't cache the wheels (or even try to build wheels).

"While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: ... (c) you are forced to use them anyway, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``."

I don't understand the claim of (c) here - it's entirely possible to write a package that doesn't use setuptools and have it do the right thing - pip uses a subprocess to drive package installation, and the interface is documented. The interface might be fugly as, but it exists and works. It is missing setup-requires handling, but so is setup.py itself. The only thing we'd really need to do AFAICT is make our setuptools monkeypatching thunk handle setuptools not being installed (which would be a sensible thing to Just Do anyhow).

"- query for build dependencies
- run a build, producing wheels as output
- set up the current source tree so that it can be placed on ``sys.path`` in "develop mode""

So we have that already: setup.py egg_info, setup.py bdist_wheel, setup.py develop.

"A version 1-or-greater format source tree can be identified by the presence of a file ``_pypackage/_pypackage.cfg``."

I really don't like this. It's going to be with us forever, it's intrusive (it's visible), and so far isn't shown to be fixing anything.

"to scatter files around willy-nilly never works, so we adopt the convention that names starting with an underscore are reserved for official use, and non-underscored names are available for idiosyncratic use by individual projects."

I can see the motivation here, but is it really solving a problem we have?

On the specifics of the format: I don't want to kibbitz over strawman aspects at this point. Having the extension mechanism be both pip-specific and in Python means that we're going to face significant adoption issues: the former because pip is not by any means the only thing around - and some distros have until very recently been actively hostile to pip (which in turn means we need to wait a decade or two for them to age out and stop being used). The latter because we'll face all the headaches of running arbitrary untrusted code and dealing with two deps with different versions of the same hook and so on: I think it's an intrinsically unsafe design.

@dstufft "problem with numpy.distutils, as I know you're aware!). We could do a minimal extension and add another defacto-ish standard of allowing pip and setuptools to process additional setup_requires like arguments from a setup.cfg to solve that problem though. The flip side to this is that since it involves new capabilities in pip/setuptools/any other installer, you'll have several years until setup.cfg based setup_requires can actually be depended on."

Well. For *any* proposal that involves modifying pip, we have to assume that all existing things keep working, and that anyone wanting to utilise the new thing will have to either a) include a local compatibility thunk, or b) error when being used from a too-old toolchain. I don't think that should really be a factor in design since it's intrinsic to the quagmire.

"Longer term, I think the answer is sdist 2.0 which has proper metadata inside of it (name, version, dependencies, etc) but which also includes a hook like this PEP has to specify the build system that should be used to build a wheel out of this source distribution."

Any reason that can't just be setup.cfg?

@Daniel "I thought Robert Collins had a working setup-requires implementation already? I have a worse but backwards compatible one too at https://bitbucket.org/dholth/setup-requires/src/tip/setup.py"

https://github.com/rbtcollins/pip/tree/declarative - I'll be updating that probably early next year at this rate - after issue-988 anyhow. The issue with your approach is that pip doesn't handle having concurrent installs done well - and in fact it will end up locking its environment somehow.

@Paul "I can understand that a binary wheel may need a certain set of libraries installed - but that's about the platform tags that are part of the wheel definition, not about dependencies. Platform tags are an ongoing discussion, and a good example of a partial solution that"

That's where the draft PEP Tennessee and I started is aimed - at making those libraries be metadata, not platform tags.

@Chris "A given package might depend on numpy, as you say, and it may work with all numpy versions 1.6 to 1.9. Fine, so we specify that in install_requires. And this should be the dependency in the sdist, too. If the package is pure python, this is fine and done. But if the package has some extension code that uses the numpy C API (a very common occurrence), then when it is built, it will only work (reliably) with the version of numpy it was built with. So the project itself, and the sdist, depend on numpy >=1.6, but a built binary wheel depends on numpy == 1.7 (for instance). Which requires a binary (wheel) dependency that is somewhat different than the source dependency."

So yes, that is where bdist_wheel should be creating different metadata for that wheel. The issue that arises is that we need unique file names so that they can coexist on PyPI or in local archives - which is where wheel tags come in. I'd be in favour of not using semantic tags for this - rather hash the deps or something and just make a unique file name. Use actual metadata for metadata.

@Nathaniel "I know that one unpleasant aspect of the current design is that the split between egg-info and actual building creates the possibility for time-of-definition-to-time-of-use bugs, where the final wheel hopefully matches what egg-info said it would, but in practice there could be skew. (Of course this is true in any system which represents"

Actually see https://bugs.launchpad.net/pbr/+bug/1502692 for a bug where this 'skew' is desirable: for older environments we want tailored deps with no markers, for anything supporting markers we want them - so the wheel will have markers and egg_info won't.

@Nathaniel "(Part of the intuition for the last part is that we also have a not-terribly-secret-conspiracy here for writing a PEP to get Linux wheels onto PyPI and at least achieve feature parity with Windows / OS X. Obviously there will always be weird platforms -- iOS and FreeBSD and Linux-without-glibc and ... -- but this should dramatically reduce the frequency with which people need sdist dependencies.)"

I think a distinction between sdist and binary names for dependencies would be a terrible mistake. It will raise complexity for reasoning and describing things without solving any concrete problem that I can see.

@Nathaniel "I guess to make progress in this conversation I need some more detailed explanations. I totally get that there's a long history of thought and conversations behind the various assertions here like "a sdist is fundamentally different from a VCS checkout", "there must be a 1-1 mapping between sdists and wheels", "pip needs sdists that have full wheel metadata in static form", and I'm barging in from the outside with no context, but I literally have no idea why the specific design features you're asking for are desirable or even viable. Right now if I were to try and write the PEP you're asking for, then the rationale section would just be "because Donald said so" over and over :-). I couldn't write the motivation section, because I don't know any problems that the PEP you're describing would fix for me as a package author (which doesn't mean they don't exist, but!)."

VCS trees are (generally) by humans, for humans. They are the primary source of data and can do things like inferring versions from commit data. sdists are derived from the VCS tree and can include extra data (such as statically defined version data). Wheels are derived from a tree on disk and can (today) be built from either VCS trees or sdists. I'm not sure that forcing an sdist step is beneficial - the egg-info step we have today is basically that without the cost of compressing and decompressing potentially large trees for no reason.

@Jeremy "An sdist is an installable package which just happens to _look_ a lot like a source release tarball, but trying to pretend that downstream packagers will want to use it as such leads to a variety of pain points in the upstream/downstream relationship. For better or worse a lot of distros don't want generated files in upstream source code releases, since they need to confirm that they also ship the necessary tooling to regenerate any required files and that the generated files they ship match what their packaged tooling produces."

Well, pbr doesn't work if you just tar up or git export your VCS tree: it requires the chance to add metadata. And while distros have whinged about pbr in a number of contexts, that hasn't been one so far. Downstreams are pretty used to receiving tarballs with generated files in them - as long as they *have the option* to recreate those, so the source material isn't lost. [And for version data, 'grab from git' is a valid answer there.] OTOH perhaps ftpmaster just hasn't noticed and we're about to get a bug report ;)

On 12 October 2015 at 17:06, Robert Collins <robertc@robertcollins.net> wrote:
EWOW, huge thread.
I've read nearly all of it but in order not to make it massively worse, I'm going to reply to all the points I think need raising in one mail :).
And a bugfix :) - I didn't link to the docs for the build system interface we have today - https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface -Rob -- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud

On Mon, Oct 12, 2015 at 6:37 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 12 October 2015 at 17:06, Robert Collins <robertc@robertcollins.net> wrote:
EWOW, huge thread.
I've read nearly all of it but in order not to make it massively worse, I'm going to reply to all the points I think need raising in one mail :).
And a bugfix :) - I didn't link to the docs for the build system interface we have today - https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface
From that link: """ In order for pip to install a package from source, setup.py must implement the following commands: ... The install command should implement the complete process of installing the package to the target directory XXX. """ That just sounds so wrong. You want the build system to build, not install. And if "install" actually means "build to a tempdir so pip can copy it over to its final location", then how does that address something like installing docs to a different dir than the package itself? +1 for your main point of focusing more on enabling other build systems though. Ralf
-Rob
-- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

On 12 October 2015 at 19:23, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Oct 12, 2015 at 6:37 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 12 October 2015 at 17:06, Robert Collins <robertc@robertcollins.net> wrote:
EWOW, huge thread.
I've read nearly all of it but in order not to make it massively worse, I'm going to reply to all the points I think need raising in one mail :).
And a bugfix :) - I didn't link to the docs for the build system interface we have today -
https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface
From that link: """ In order for pip to install a package from source, setup.py must implement the following commands: ... The install command should implement the complete process of installing the package to the target directory XXX. """ That just sounds so wrong. You want the build system to build, not install.
Right so - with the automatic wheel cache we added, that is only used if building a wheel failed. So you can consider it to be legacy cruft - the preferred interface is build-a-wheel then install-that-wheel.
And if "install" actually means "build to a tempdir so pip can copy it over it to its final location", then how does that address something like installing docs to a different dir than the package itself?
Well, how do we install docs to a different dir with wheels? If we've got an answer for that, I think we're in a good position to figure something out for install (even if that is 'fix your package so it can build wheels'). -Rob -- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud

On 12 October 2015 at 05:37, Robert Collins <robertc@robertcollins.net> wrote:
And a bugfix :) - I didn't link to the docs for the build system interface we have today - https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface
I'm happy to see that the command line interface pip requires from setup.py is now documented. But the first thing it describes is the egg_info command, and the description of what that involves is basically 'whatever setuptools does'. Egg info is not any kind of standard, AFAIK - unlike dist info. One of my main goals in writing flit is to build wheels without involving setuptools at all, so I'm probably never going to implement that. I also don't want to have a setup.py in the VCS at all, because it's an invitation for people to run setup.py whatever, and then file bug reports that it doesn't do exactly what setuptools/distutils would do. This is what I like about Nathaniel's proposal. By looking for a new file, it clearly indicates that this is a new style Python source tree, without potential confusion over what setup.py does, and it lets us specify a simple build system interface based on standardised pieces like wheels rather than messing around with the details of setuptools. Thomas

On 12 October 2015 at 21:15, Thomas Kluyver <takowl@gmail.com> wrote:
On 12 October 2015 at 05:37, Robert Collins <robertc@robertcollins.net> wrote:
And a bugfix :) - I didn't link to the docs for the build system interface we have today -
https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface
I'm happy to see that the command line interface pip requires from setup.py is now documented. But the first thing it describes is the egg_info command, and the description of what that involves is basically 'whatever setuptools does'. Egg info is not any kind of standard, AFAIK - unlike dist info. One
https://www.python.org/dev/peps/pep-0314/ describes PKG-INFO, which can in principle include dependency data. I don't know why they're not being fully populated - I suspect it's another case of forward-looking-spec-not-quite-right-and-thus-never-used. Someone with more history on Python packaging will have to fill that in. One option would be to push on that.
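For reference, PKG-INFO is an RFC 822-style file, so the PEP 314 dependency fields can be read with the stdlib email parser; the metadata content below is made up for illustration::

    from email.parser import Parser
    from textwrap import dedent

    pkg_info = dedent("""\
        Metadata-Version: 1.1
        Name: example
        Version: 1.0
        Requires: zlib
        Requires: numpy (>=1.6)
        """)
    msg = Parser().parsestr(pkg_info)
    print(msg["Name"], msg["Version"], msg.get_all("Requires"))
    # example 1.0 ['zlib', 'numpy (>=1.6)']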
of my main goals in writing flit is to build wheels without involving setuptools at all, so I'm probably never going to implement that.
Well that's fair enough. But: pip uses pkg_resources to find out what's installed and what it depends on, and that pulls out of egg-info on the file system for non-wheel installs (dist-info for wheel installs). So, if that isn't well enough documented, it would be entirely reasonable to make a PEP to capture it so that you can emit it with confidence. There's already buy-in from everyone in the ecosystem to use PEPs to document interoperability-impacting changes across these tools, so I can't imagine it being a contentious proposal.
I also don't want to have a setup.py in the VCS at all, because it's an invitation for people to run setup.py whatever, and then file bug reports that it doesn't do exactly what setuptools/distutils would do.
Ok, so I understand that frustration. You're going to get those through pip too though - right now there's a giant escape clause in pip where arbitrary options are passed through to setup.py, because we don't have semantic modelling of all the things that people want to do. And until that's achieved - and the options aged out - folk are going to use those options and then be surprised when flit-using packages don't behave the same way.
This is what I like about Nathaniel's proposal. By looking for a new file, it clearly indicates that this is a new style Python source tree, without potential confusion over what setup.py does, and it lets us specify a simple build system interface based on standardised pieces like wheels rather than messing around with the details of setuptools.
The problem with 'simple' is that we've got a rich (not just old: fully featured) interface. Building something that isn't complected out from behind that is both valuable and hard. An 85% solution or 95% solution will retain the trap-doors that let folk tickle the system from the outside, and that's going to lead back to the same bug reports you don't want IMO :/. I'm not against a spec that avoids the need for each tree to have a setup.py - I just don't think it's going to be the end of the story in any deep fashion :).

Here's a draft one I'd be happy with (with no boilerplate or anything). For commands we can define a small number of environment variables like e.g. OUTPUT_PATH and PYTHON and have those supplied when invoking the command.

Python packaging config in $root/pypa.yaml

Defined keys:
----
version: # not needed yet, since it's a new file, defaults to 1.
setup-requires:
  - requirement
  - requirement[extra]
  - requirement:markers
  - requirement[extra]:markers
build-tool:
  # basic command that will spit a hunk of json back to the caller defining the
  # commands to use with the build tool.
----

Build tool output json (in yaml format for ease of reading):

version: # not needed yet, defaults to 1
egg-info:
  # command to run to generate an egg-info directory
  # only used if dist-info is not defined
dist-info:
  # command to run to generate a dist-info directory
wheel:
  # command to build a wheel
develop:
  # command to do an in-place installation
install:
  # command to do a direct installation
  # only used if wheeling failed
provided-by:
  # the requirement that provides the build system.
  # this is used to facilitate caching the build tool output
  # so that we don't need to probe every time (e.g. if the
  # same dependency version is present in two cases
  # we can presume the output - I threw this in as an
  # obvious tweak for large dependency chains but it is
  # probably not worth it for typical 10-20 package things)

-Rob
-- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud
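A rough sketch, under the draft's own assumptions and not an implementation, of how an installer might drive this: read pypa.yaml, run the declared build-tool command to get its JSON command table, then run the wheel command. PyYAML is assumed for parsing; the file contents and commands come from whatever the project and build tool supply::

    import json
    import subprocess

    import yaml  # third-party: PyYAML

    with open("pypa.yaml") as f:
        cfg = yaml.safe_load(f)

    setup_requires = cfg.get("setup-requires", [])
    # ... install setup_requires into an isolated build environment here ...

    # Ask the build tool how to drive it; per the draft it prints a JSON command table.
    raw = subprocess.check_output(cfg["build-tool"], shell=True)
    commands = json.loads(raw.decode("utf-8"))

    # Build a wheel with the command the tool declared ("install" would only be
    # used, per the draft, if wheel building failed).
    subprocess.check_call(commands["wheel"], shell=True)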

On 12 October 2015 at 11:01, Robert Collins <robertc@robertcollins.net> wrote:
Python packaging config in $root/pypa.yaml
Defined keys: ---- ... build-tool: # basic command that will spit a hunk of json back to the caller defining the # commands to use with the build tool. ----
Build tool output json (in yaml format for ease of reading):
I would be quite happy with something along the lines of this proposal, though I'd bikeshed about some of the details. I like the idea of the source tree having a single reference to the build tool, and the build tool describing itself to pip. I'd probably use references to Python functions/modules rather than specifying shell-style commands, though; it seems like there's less to go wrong that way. Thomas

On 13 October 2015 at 02:33, Thomas Kluyver <takowl@gmail.com> wrote:
On 12 October 2015 at 11:01, Robert Collins <robertc@robertcollins.net> wrote:
Python packaging config in $root/pypa.yaml
Defined keys: ---- ... build-tool: # basic command that will spit a hunk of json back to the caller defining the # commands to use with the build tool. ----
Build tool output json (in yaml format for ease of reading):
I would be quite happy with something along the lines of this proposal, though I'd bikeshed about some of the details. I like the idea of the source tree having a single reference to the build tool, and the build tool describing itself to pip. I'd probably use references to Python functions/modules rather than specifying shell-style commands, though; it seems like there's less to go wrong that way.
One of the fundamental things that emerged during the review of the design of my static setup-requires implementation in pip was that setuptools behaviour of not installing setup requirements into the target environment was deliberate design: it permits the use of different, mutually incompatible, versions of a given setup requirement by packages in the same dependency tree. E.g. imagine A and B both use setuptools-vcs, and setuptools-vcs does an incompatible 2.0 release. When A upgrades to that and B hasn't, if B install-requires A, pip installing B needs to install both those setuptools-vcs versions transiently, not permanently. (Even if one version is already installed, the build-time actions around the other of A|B need to have the other version installed). [My branch of pip doesn't do this - its one of the differences between proof of concept and production ready] So until we solve the problems related to unloading something loaded into Python and loading a different version in and all the related pain that can occur - I think using Python function calls is a non-starter. -Rob -- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud

On Oct 12, 2015 10:16 AM, "Robert Collins" <robertc@robertcollins.net> wrote:
On 13 October 2015 at 02:33, Thomas Kluyver <takowl@gmail.com> wrote:
On 12 October 2015 at 11:01, Robert Collins <robertc@robertcollins.net> wrote:
Python packaging config in $root/pypa.yaml
Defined keys: ---- ... build-tool: # basic command that will spit a hunk of json back to the caller defining the # commands to use with the build tool. ----
Build tool output json (in yaml format for ease of reading):
I would be quite happy with something along the lines of this proposal, though I'd bikeshed about some of the details. I like the idea of the
source
tree having a single reference to the build tool, and the build tool describing itself to pip. I'd probably use references to Python functions/modules rather than specifying shell-style commands, though; it seems like there's less to go wrong that way.
One of the fundamental things that emerged during the review of the design of my static setup-requires implementation in pip was that setuptools behaviour of not installing setup requirements into the target environment was deliberate design: it permits the use of different, mutually incompatible, versions of a given setup requirement by packages in the same dependency tree. E.g. imagine A and B both use setuptools-vcs, and setuptools-vcs does an incompatible 2.0 release. When A upgrades to that and B hasn't, if B install-requires A, pip installing B needs to install both those setuptools-vcs versions transiently, not permanently. (Even if one version is already installed, the build-time actions around the other of A|B need to have the other version installed). [My branch of pip doesn't do this - its one of the differences between proof of concept and production ready]
So until we solve the problems related to unloading something loaded into Python and loading a different version in and all the related pain that can occur - I think using Python function calls is a non-starter.
I don't see the contradiction here. If you look at the original draft PEP then it exactly specifies that builds get isolated environments, and build tools are supposed to spawn a child and then have that child do a function call using whatever mechanism they prefer. -n

On 13 October 2015 at 06:23, Nathaniel Smith <njs@pobox.com> wrote:
On Oct 12, 2015 10:16 AM, "Robert Collins" <robertc@robertcollins.net> wrote: ...
So until we solve the problems related to unloading something loaded into Python and loading a different version in and all the related pain that can occur - I think using Python function calls is a non-starter.
I don't see the contradiction here. If you look at the original draft PEP then it exactly specifies that builds get isolated environments, and build tools are supposed to spawn a child and then have that child do a function call using whatever mechanism they prefer.
Ok, so here's a worked example to let us debug the disconnect. Given:

A@1.0:
  setup-requires: S~=1.0
  install-requires: B
B@1.0:
  setup-requires: S~=2.0
S@1.0: no dependencies at all.
S@2.0: no dependencies at all.

and no binaries of A or B... then pip install A will do the following (key bits related to this proposal only, happy path only):

- download the A@1.0 sdist
- read pypa.yaml and read the setup-requires + build-tool keys
- download S@1.0 and prepare a built version of it
- place S@1.0 into PYTHONPATH and the built version's bin into PATH
- run the build-tool command to determine how to use it
- run the resulting wheel-build command to build a wheel for A@1.0
- read the wheel metadata in to determine A's install-requires
- download the B@1.0 sdist
- read pypa.yaml and read the setup-requires + build-tool keys
- download S@2.0 and prepare a built version of it
- place S@2.0 into PYTHONPATH and the built version's bin into PATH
- run the build-tool command to determine how to use it
- run the resulting wheel-build command to build a wheel for B@1.0
- read the wheel metadata in to determine B's install-requires
- install the B wheel into the target environment
- install the A wheel into the target environment

Note the two places where PYTHONPATH and PATH need to be something other than the environment of pip itself. pip may not be installed in the target environment (or may be a different version than the pip installed there). I don't understand how you propose to have S@1.0 and S@2.0 co-exist in pip's Python process. -Rob -- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud
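A sketch of the isolation step in the worked example above: each build's child process gets its own PYTHONPATH/PATH pointing at the right S, so S@1.0 and S@2.0 never have to coexist in pip's own process. The paths and the wheel command here are illustrative only::

    import os
    import subprocess

    def build_wheel(source_dir, setup_requires_prefix, wheel_cmd):
        env = dict(os.environ)
        # Expose only this build's setup requirements to the child process.
        env["PYTHONPATH"] = os.path.join(setup_requires_prefix, "site-packages")
        env["PATH"] = os.path.join(setup_requires_prefix, "bin") + os.pathsep + env["PATH"]
        subprocess.check_call(wheel_cmd, cwd=source_dir, env=env, shell=True)

    # Build A against S@1.0 and B against S@2.0, each with its own prefix:
    # build_wheel("A-1.0/", "/tmp/build-A", "python -m buildtool wheel")
    # build_wheel("B-1.0/", "/tmp/build-B", "python -m buildtool wheel")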

I could be wrong but if egg-info actually generated a dist-info directory it would probably still work. On Mon, Oct 12, 2015 at 8:50 AM Thomas Kluyver <takowl@gmail.com> wrote:
On 12 October 2015 at 05:37, Robert Collins <robertc@robertcollins.net> wrote:
And a bugfix :) - I didn't link to the docs for the build system interface we have today -
https://pip.pypa.io/en/latest/reference/pip_install/#build-system-interface
I'm happy to see that the command line interface pip requires from setup.py is now documented. But the first thing it describes is the egg_info command, and the description of what that involves is basically 'whatever setuptools does'. Egg info is not any kind of standard, AFAIK - unlike dist info. One of my main goals in writing flit is to build wheels without involving setuptools at all, so I'm probably never going to implement that.
I also don't want to have a setup.py in the VCS at all, because it's an invitation for people to run setup.py whatever, and then file bug reports that it doesn't do exactly what setuptools/distutils would do.
This is what I like about Nathaniel's proposal. By looking for a new file, it clearly indicates that this is a new style Python source tree, without potential confusion over what setup.py does, and it lets us specify a simple build system interface based on standardised pieces like wheels rather than messing around with the details of setuptools.
Thomas _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

On 13 October 2015 at 01:52, Daniel Holth <dholth@gmail.com> wrote:
I could be wrong but if egg-info actually generated a dist-info directory it would probably still work.
I'd worry about fallout - since pip doesn't need to change (much) either way, I'd be inclined to be specific and allow setuptools to merely define the bootstrap code and opt in to later changes without pressure. -Rob -- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud

On Oct 11, 2015 11:07 PM, "Robert Collins" <robertc@robertcollins.net> wrote:
EWOW, huge thread.
I've read nearly all of it but in order not to make it massively worse, I'm going to reply to all the points I think need raising in one mail :).
Top level thoughts here, more point fashion with only rough editing below the fold.
I realise many things - like the issue between different wheels of the same package consuming different numpy abis - have been touched on, but AFAICT they are entirely orthogonal to the proposal, which was to solve 'be able to use arbitrary build systems and still install with pip'.
Of the actual problems with using arbitrary build systems, 99% of them seem to boil down to 'setup-requires isn't introspectable by pip (https://github.com/pypa/pip/issues/1820 ). - If it was, then alternative build systems could be depended on reasonably; and the mooted thunk from setuptools CLI to arbitrary build system would be viable.
It is, in principle a matter of one patch to teach pip *a* way to do this (and then any and all build systems that want to can utilise it). https://github.com/rbtcollins/pip/tree/declarative is a POC I did - my next steps on that were to discuss the right ecosystem stuff for it - e.g. should pip consume it via setuptools, or should pip support it as *the way* and other systems including setuptools can choose to use it?
as a standard RDF graph representation, JSON-LD would be uniquely portable here. "PEP 426: Define a JSON-LD context as part of the proposal" https://github.com/pypa/interoperability-peps/issues/31
A related but separate thing is being able to install things without setuptools present *at all* - I've filed https://github.com/pypa/pip/issues/3175 about that, but I think it's -much- lower priority than reliably enabling third-party build tools.
peep may not need setuptools?

- SHA256
- --no-deps
- https://pypi.python.org/pypi/peep
- wheels
-Rob
----
" solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels. "
Actually pip still punts a *lot* here - we have bypasses to let things like C compiler flags be set during a wheel build, and when that's done we don't cache the wheels (or even try to build wheels).
" While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: ... (c) you are forced to use them anyway, because they provide the standard interface for installing python packages expected by both users and installation tools like ``pip``."
I don't understand the claim of (c) here - it's entirely possible to write a package that doesn't use setuptools and have it do the right thing - pip uses a subprocess to drive package installation, and the interface is documented. The interface might be ugly, but it exists and works. It is missing setup-requires handling, but so is setup.py itself. The only thing we'd really need to do AFAICT is make our setuptools monkeypatching thunk handle setuptools not being installed (which would be a sensible thing to Just Do anyhow).
" - query for build dependencies - run a build, producing wheels as output - set up the current source tree so that it can be placed on ``sys.path`` in "develop mode" "
So we have that already: setup.py egg_info, setup.py bdist_wheel, setup.py develop.
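For reference, roughly how an installer drives those three operations today (a sketch only: the exact flags pip passes are elided, and the source path is hypothetical)::

    import subprocess
    import sys

    src = "/path/to/source-tree"   # hypothetical checkout or unpacked sdist

    # 1. query metadata / dependencies
    subprocess.check_call([sys.executable, "setup.py", "egg_info"], cwd=src)

    # 2. build a wheel
    subprocess.check_call([sys.executable, "setup.py", "bdist_wheel"], cwd=src)

    # 3. set up a "develop mode" / editable install
    subprocess.check_call([sys.executable, "setup.py", "develop"], cwd=src)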
"A version 1-or-greater format source tree can be identified by the presence of a file ``_pypackage/_pypackage.cfg``. "
I really don't like this. It's going to be with us forever, it's intrusive (it's visible), and so far it isn't shown to fix anything.
"to scatter files around willy-nilly never works, so we adopt the convention that names starting with an underscore are reserved for official use, and non-underscored names are available for idiosyncratic use by individual projects."
I can see the motivation here, but is it really solving a problem we have?
On the specifics of the format: I don't want to kibbitz over strawman aspects at this point.
Having the extension mechanism be both pip-specific and in Python means that we're going to face significant adoption issues: the former because pip is not by any means the only thing around - and some distros have until very recently been actively hostile to pip (which in turn means we need to wait a decade or two for them to age out and stop being used); the latter because we'll face all the headaches of running arbitrary untrusted code and dealing with two deps requiring different versions of the same hook, and so on. I think it's an intrinsically unsafe design.
@dstufft "problem with numpy.distutils, as I know you’re aware!). We could do a minimal extension and add another defacto-ish standard of allowing pip and setuptools to process additional setup_requires like arguments from a setup.cfg to solve that problem though. The flip side to this is that since it involves new capabilities in pip/setuptools/any other installer is that it you’ll have several years until you can depend on setup.cfg based setup_requires from being able to be depended on. "
Well. For *any* proposal that involves modifying pip, we have to assume that all existing things keep working, and that anyone wanting to utilise the new thing will have to either a) include a local compatibility thunk, or b) error when being used from a too-old toolchain. I don't think that should really be a factor in the design, since it's intrinsic to the quagmire.
"Longer term, I think the answer is sdist 2.0 which has proper metadata inside of it (name, version, dependencies, etc) but which also includes a hook like this PEP has to specify the build system that should be used to build a wheel out of this source distribution."
a composed JSON-LD document indicating provenance (who, what, when) for each part of the build chain [VCS archive, egg-info, sdist, wheel, bdist] pydist.jsonld?
Any reason that can't just be setup.cfg?
@Daniel "I thought Robert Collins had a working setup-requires implementation already? I have a worse but backwards compatible one too at https://bitbucket.org/dholth/setup-requires/src/tip/setup.py" - https://github.com/rbtcollins/pip/tree/declarative - I'll be updating that probably early next year at this rate - after issue-988 anyhow. The issue with your approach is that pip doesn't handle having concurrent installs done well - and in fact it will end up locking its environment somehow.
@Paul " I can understand that a binary wheel may need a certain set of libraries installed - but that's about the platform tags that are part of the wheel definition, not about dependencies. Platform tags are an ongoing discussion, and a good example of a partial solution that" - thats where the draft PEP tennessee and I start is aimed - at making those libraries be metadata, not platform tags.
@Chris " A given package might depend on numpy, as you say, and it may work with all numpy versions 1.6 to 1.9. Fine, so we specify that in install_requires. And this shodl be the dependency in the sdist, too. If the package is pur python, this is fine and done.
But if the package has some extension code that uses the numpy C API (a very common occurrence), then when it is built, it will only work (reliably) with the version of numpy it was built with.
So the project itself, and the sdist, depend on numpy >= 1.6, but a built binary wheel depends on numpy == 1.7 (for instance).
Which requires a binary (wheel) dependency that is somewhat different than the source dependency." - so yes, that is where bdist_wheel should be creating different metadata for that wheel. The issue that arises is that we need unique file names so that they can coexist on PyPI or in local archives - which is where wheel tags come in. I'd be in favour of not using semantic tags for this - rather, hash the deps or something and just make a unique file name. Use actual metadata for metadata.
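As a hedged illustration of that wheel-vs-sdist split (mine, not from the thread): a project whose sdist declares a loose numpy requirement could, at build time, write a wheel requirement reflecting the numpy it was actually compiled against; the exact pinning policy is a project decision::

    import numpy  # the numpy present in the build environment

    # What the source tree / sdist declares:
    SDIST_REQUIRES = ["numpy >= 1.6, < 2.0"]

    # What the wheel built right now could declare instead, reflecting the
    # ABI it was compiled against (e.g. ">= 1.7.1" if built against 1.7.1):
    WHEEL_REQUIRES = ["numpy >= %s" % numpy.__version__]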
@Nathaniel "I know that one unpleasant aspect of the current design is
that the
split between egg-info and actual building creates the possibility for time-of-definition-to-time-of-use bugs, where the final wheel hopefully matches what egg-info said it would, but in practice there could be skew. (Of course this is true in any system which represents" - actually see https://bugs.launchpad.net/pbr/+bug/1502692 for a bug where this 'skew' is desirable: for older environments we want tailored deps with no markers, for anything supporting markers we want them - so the wheel will have markers and egg_info won't.
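A minimal illustration (mine, not taken from the bug report) of why that skew can be deliberate: the wheel keeps the environment marker, while egg-info generated for an older toolchain gets the dependency pre-evaluated for the interpreter doing the build::

    # What the wheel's metadata records, marker and all:
    WHEEL_REQUIRES = ['argparse; python_version < "2.7"']

    def egg_info_requires(python_version):
        """Older consumers don't understand markers, so egg-info lists the
        dependency only when it actually applies to this interpreter."""
        return ["argparse"] if python_version < (2, 7) else []

    print(egg_info_requires((2, 6)))   # ['argparse']
    print(egg_info_requires((3, 4)))   # []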
@Nathaniel " (Part of the intuition for the last part is that we also have a not-terribly-secret-conspiracy here for writing a PEP to get Linux wheels onto PyPI and at least achieve feature parity with Windows / OS X. Obviously there will always be weird platforms -- iOS and FreeBSD and Linux-without-glibc and ... -- but this should dramatically reduce the frequency with which people need sdist dependencies.)" - I think a distinction between sdist and binary names for dependencies would be a terrible mistake. It will raise complexity for reasoning and describing things without solving any concrete problem that I can see.
@Nathaniel "I guess to make progress in this conversation I need some more detailed explanations. I totally get that there's a long history of thought and conversations behind the various assertions here like "a sdist is fundamentally different from a VCS checkout", "there must be a 1-1 mapping between sdists and wheels", "pip needs sdists that have full wheel metadata in static form", and I'm barging in from the outside with no context, but I literally have no idea why the specific design features you're asking for are desirable or even viable. Right now if I were to try and write the PEP you're asking for, then the rationale section would just be "because Donald said so" over and over :-). I couldn't write the motivation section, because I don't know any problems that the PEP you're describing would fix for me as a package author (which doesn't mean they don't exist, but!)." -- VCS trees are (generally) by-humans for humans. They are the primary source of data and can do thinks like inferring versions from commit data. sdists are derived from the VCS tree and can include extra data (such as statically defined version data). Wheels are derived from a tree on disk and can (today) be built from either VCS trees or sdists. I'm not sure that forcing an sdist step is beneficial - the egg-info step we have today is basically that without the cost of compressing and decompressing potentially large trees for no reason.
@Jeremy "An sdist is an installable package which just happens to _look_ a lot like a source release tarball, but trying to pretend that downstream packagers will want to use it as such leads to a variety of pain points in the upstream/downstream relationship. For better or worse a lot of distros don't want generated files in upstream source code releases, since they need to confirm that they also ship the necessary tooling to regenerate any required files and that the generated files they ship match what their packaged tooling produces." - Well, pbr doesn't work if you just tar up or git export your VCS tree: it requires the chance to add metadata. And while distros have whinged about pbr in a number of contexts, that hasn't been one so far. Downstreams are pretty used to receiving tarballs with generated files in them - as long as they *have the option* to recreate those, so the source material isn't lost. [And for version data, 'grab from git' is a valid answer there']. OTOH perhaps ftpmaster just hasn't noticed and we're about to get a bug report ;)
another interesting use case for [not-] pip: https://github.com/mitsuhiko/pipsi
participants (19)
- Antoine Pitrou
- Brett Cannon
- Carl Meyer
- Chris Barker
- Daniel Holth
- David Cournapeau
- Donald Stufft
- Ionel Cristian Mărieș
- Jeremy Stanley
- Marcus Smith
- Nathaniel Smith
- Nick Coghlan
- Oscar Benjamin
- Paul Moore
- Ralf Gommers
- Robert Collins
- Ronny Pfannschmidt
- Thomas Kluyver
- Wes Turner