A possible refactor/streamlining of PEP 517
Hi all, I just attempted an experimental refactor/streamlining of PEP 517, to match what I think it should look like :-). I haven't submitted it as a PR to the PEPs repository yet since I don't know if others will agree with the changes, but I've pasted the full text below, or you can see the text online at: https://github.com/njsmith/peps/blob/517-refactor-streamline/pep-0517.txt and the diff at: https://github.com/python/peps/compare/master...njsmith:517-refactor-streaml...

Briefly, the changes are:

- Rearrange text into (hopefully) better signposted sections with better organization
- Clarify a number of details that have come up in discussion (e.g., be more explicit that the hooks are run with the process working directory set to the source tree, and why)
- Drop prepare_wheel_metadata and prepare_wheel_build_files (for now); add detailed rationale for why we might want to add them back later.
- Add an "extensions" hook namespace to allow prototyping of future extensions.
- Rename get_build_*_requires -> get_requires_for_build_* to make the naming parallelism more obvious
- Add the option to declare an operation unsupported by returning NotImplemented
- Instead of defining a default value for get_requires_for_build_*, make it mandatory for get_requires_for_build_* and build_* to appear together; this seems simpler now that we have multiple high-level operations defined in the same PEP, and also simplifies the definition of the NotImplemented semantics.
- Update title to better match the scope we ended up with
- Add a TODO to decide how to handle backends that don't want to have multiple hooks called from the same process, including some discussion of the options.

---------------------

PEP: 517
Title: Supporting non-setup.py-based build backends in pyproject.toml
Version: $Revision$
Last-Modified: $Date$
Author: Nathaniel J. Smith <njs@pobox.com>, Thomas Kluyver <thomas@kluyver.me.uk>
BDFL-Delegate: Nick Coghlan <ncoghlan@gmail.com>
Discussions-To: <distutils-sig@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History: 1 Oct 2015, 25 Oct 2015, 1 July 2017

==========
Abstract
==========

While ``distutils`` / ``setuptools`` have taken us a long way, they suffer from three serious problems: (a) they're missing important features like usable build-time dependency declaration, autoconfiguration, and even basic ergonomic niceties like `DRY <https://en.wikipedia.org/wiki/Don%27t_repeat_yourself>`_-compliant version number management, (b) extending them is difficult, so while there do exist various solutions to the above problems, they're often quirky, fragile, and expensive to maintain, and yet (c) it's very difficult to use anything else, because distutils/setuptools provide the standard interface for installing packages expected by both users and installation tools like ``pip``.

Previous efforts (e.g. distutils2 or setuptools itself) have attempted to solve problems (a) and/or (b). This proposal aims to solve (c).

The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren't many such systems right now, but to give a sense of what we're thinking about see `flit <https://github.com/takluyver/flit>`_ or `bento <https://cournape.github.io/Bento/>`_. Fortunately, wheels have now solved many of the hard problems here -- e.g. it's no longer necessary that a build system also know about every possible installation configuration -- so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels and sdists.
We therefore propose a new, relatively minimal interface for installation tools like ``pip`` to interact with package source trees and source distributions.

==========================
Reversion to Draft Status
==========================

While this PEP was provisionally accepted for implementation in `pip` and other tools, some additional concerns were subsequently raised around adequately supporting out of tree builds. It has been reverted to Draft status while those concerns are being resolved.

=======================
Terminology and goals
=======================

A *source tree* is something like a VCS checkout. We need a standard interface for installing from this format, to support usages like ``pip install some-directory/``.

A *source distribution* is a static snapshot representing a particular release of some source code, like ``lxml-3.4.4.tar.gz``. Source distributions serve many purposes: they form an archival record of releases, they provide a stupid-simple de facto standard for tools that want to ingest and process large corpora of code, possibly written in many languages (e.g. code search), they act as the input to downstream packaging systems like Debian/Fedora/Conda/..., and so forth. In the Python ecosystem they additionally have a particularly important role to play, because packaging tools like ``pip`` are able to use source distributions to fulfill binary dependencies, e.g. if there is a distribution ``foo.whl`` which declares a dependency on ``bar``, then we need to support the case where ``pip install bar`` or ``pip install foo`` automatically locates the sdist for ``bar``, downloads it, builds it, and installs the resulting package. Source distributions are also known as *sdists* for short.

A *build frontend* is a tool that users might run that takes arbitrary source trees or source distributions and builds wheels from them. The actual building is done by each source tree's *build backend*. In a command like ``pip wheel some-directory/``, pip is acting as a build frontend.

An *integration frontend* is a tool that users might run that takes a set of package requirements (e.g. a requirements.txt file) and attempts to update a working environment to satisfy those requirements. This may require locating, building, and installing a combination of wheels and sdists. In a command like ``pip install lxml==2.4.0``, pip is acting as an integration frontend.

==============
Source trees
==============

There is an existing, legacy source tree format involving ``setup.py``. We don't try to specify it further; its de facto specification is encoded in the source code and documentation of ``distutils``, ``setuptools``, ``pip``, and other tools. We'll refer to it as the ``setup.py``\-style.

Here we define a new style of source tree based around the ``pyproject.toml`` file defined in PEP 518, extending the ``[build-system]`` table in that file with one additional key, ``build-backend``. Here's an example of how it would look::

    [build-system]
    # Defined by PEP 518:
    requires = ["flit"]
    # Defined by this PEP:
    build-backend = "flit.api:main"

``build-backend`` is a string naming a Python object that will be used to perform the build (see below for details). This is formatted following the same ``module:object`` syntax as a ``setuptools`` entry point. For instance, if the string is ``"flit.api:main"`` as in the example above, this object would be looked up by executing the equivalent of::

    import flit.api
    backend = flit.api.main

It's also legal to leave out the ``:object`` part, e.g. ::

    build-backend = "flit.api"

which acts like::

    import flit.api
    backend = flit.api

Formally, the string should satisfy this grammar::

    identifier  = (letter | '_') (letter | '_' | digit)*
    module_path = identifier ('.' identifier)*
    object_path = identifier ('.' identifier)*
    entry_point = module_path (':' object_path)?
And we import ``module_path`` and then look up ``module_path.object_path`` (or just ``module_path`` if ``object_path`` is missing).

If the ``pyproject.toml`` file is absent, or the ``build-backend`` key is missing, the source tree is not using this specification, and tools should fall back to running ``setup.py``. Where the ``build-backend`` key exists, it takes precedence over ``setup.py``, and source trees need not include ``setup.py`` at all. Projects may still wish to include a ``setup.py`` for compatibility with tools that do not use this spec.

=========================
Build backend interface
=========================

The build backend object is expected to have callable attributes called "hooks", which the build frontend can use to perform various actions. The two high-level actions defined by this spec are creation of an sdist (analogous to the legacy ``setup.py sdist`` command) and building of a wheel (analogous to the legacy ``setup.py bdist_wheel`` command). We additionally define a namespace for tool-specific hooks, which may be useful for prototyping future extensions to this specification.

General rules for all hooks
---------------------------

Finding the source tree
~~~~~~~~~~~~~~~~~~~~~~~

All hooks are run with the process working directory set to the root of the source tree (i.e., the directory containing ``pyproject.toml``). To find the source tree, hooks should call ``os.getcwd()`` or equivalent.

Rationale: the process working directory has to be set to something, and if we were to leave it up to the build frontend to pick, then package developers would accidentally write code that assumes a particular answer here (example: ``long_desc = open("README.rst").read()``), and this code would break when used with other build frontends. So it's important that we standardize a value for all build frontends to use consistently. And the source tree root is the obvious thing to specify, especially because it's compatible with popular and long-standing conventions like calling ``open("README.rst").read()``. Then, given that we've decided to standardize on working directory = source directory, it makes sense to say that this is the *only* way that this information is passed, because providing a second redundant way (example: as an explicit argument to hooks) would only increase the possibility of error without any benefit.

Lifecycle
~~~~~~~~~

XX TODO: do we want to require frontends to use a new process for every hook call, or do we want to require backends to support multiple calls from the same process? Apparently scons and setuptools both can get cranky if you try to invoke them twice from the same process, so *someone* will be spawning extra processes here; the question is where to put that responsibility. The basic trade-off is that making it the backend's responsibility has better best-case performance if both the frontend and backend are able to re-use a single host process; but, if common frontends end up using new processes for each hook call for other reasons, then in practice either backends will end up spawning unnecessary extra processes, or else will end up with poorly tested paths when multiple hooks are run in the same process.

Given that ``get_requires_for_build_*`` → ``build_*`` in general requires changing the Python environment, it doesn't necessarily make sense to run these in the same process anyway. However, there's an important special case where it does: when ``get_requires_for_build_*`` returns ``[]``. And this is probably the overwhelmingly most common case.

Does it even matter? Windows is notoriously slow at spawning subprocesses. As a quick test, I tried measuring the time to spawn CPython 3.6 + import a package on a Windows 10 VM running on my laptop. ``python3.6 -c "import flit"`` was about 300 ms per call; ``python3.6 -c "import setuptools"`` was about 600 ms per call.
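The kind of measurement described above is easy to reproduce; a rough sketch (absolute numbers will of course vary by machine and platform, and ``time_spawn_and_import`` is just an illustrative helper):

```python
import subprocess
import sys
import time

def time_spawn_and_import(module, repeats=5):
    # Time `python -c "import <module>"` end to end, including
    # interpreter startup, averaged over a few runs.
    start = time.perf_counter()
    for _ in range(repeats):
        subprocess.check_call([sys.executable, "-c", "import " + module])
    return (time.perf_counter() - start) / repeats
```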
We could also potentially get fancy and have a flag to let the frontend and backend negotiate this (e.g. ``process_reuse_safe`` as an opt-in flag). This could also be added later as an extension, as long as we initially default to requiring separate processes for each hook.

Calling conventions
~~~~~~~~~~~~~~~~~~~

Hooks MAY be called with positional or keyword arguments, so backends implementing them MUST be careful to make sure that their signatures -- including argument names -- exactly match those specified here.

Output
~~~~~~

Hooks MAY print arbitrary informational text on stdout and stderr. They MUST NOT read from stdin, and the build frontend MAY close stdin before invoking the hooks.

The build frontend may capture stdout and/or stderr from the backend. If the backend detects that an output stream is not a terminal/console (e.g. ``not sys.stdout.isatty()``), it SHOULD ensure that any output it writes to that stream is UTF-8 encoded. The build frontend MUST NOT fail if captured output is not valid UTF-8, but it MAY not preserve all the information in that case (e.g. it may decode using the *replace* error handler in Python). If the output stream is a terminal, the build backend is responsible for presenting its output accurately, as for any program running in a terminal.

If a hook raises any exception, or causes the process to terminate, then this indicates that the operation has failed.

User-specified configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All hooks take a standard ``config_settings`` argument. This argument is an arbitrary dictionary provided as an "escape hatch" for users to pass ad-hoc configuration into individual package builds. Build backends MAY assign any semantics they like to this dictionary. Build frontends SHOULD provide some mechanism for users to specify arbitrary string-key/string-value pairs to be placed in this dictionary. For example, they might support some syntax like ``--package-config CC=gcc``. Build frontends MAY also provide arbitrary other mechanisms for users to place entries in this dictionary. For example, ``pip`` might choose to map a mix of modern and legacy command line arguments like::

    pip install \
        --package-config CC=gcc \
        --global-option="--some-global-option" \
        --build-option="--build-option1" \
        --build-option="--build-option2"

into a ``config_settings`` dictionary like::

    {
        "CC": "gcc",
        "--global-option": ["--some-global-option"],
        "--build-option": ["--build-option1", "--build-option2"],
    }

Of course, it's up to users to make sure that they pass options which make sense for the particular build backend and package that they are building.

Hook execution environment
~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the responsibilities of a build frontend is to set up the Python environment in which the build backend will run. We do not require that any particular "virtual environment" mechanism be used; a build frontend might use virtualenv, or venv, or no special mechanism at all. But whatever mechanism is used MUST meet the following criteria:

- All requirements specified by the project's build-requirements must be available for import from Python. In particular, the distributions specified in the ``pyproject.toml`` key ``build-system.requires`` must be made available to all hooks. Some hooks have additional requirements documented below.

- This must remain true even for new Python subprocesses spawned by the build environment, e.g. code like::

      import sys, subprocess
      subprocess.check_call([sys.executable, ...])

  must spawn a Python process which has access to all the project's build-requirements. For example, this is necessary to support build backends that want to run legacy ``setup.py`` scripts in a subprocess.

- All command-line scripts provided by the build-required packages must be present in the build environment's PATH.
For example, if a project declares a build-requirement on `flit <https://flit.readthedocs.org/en/latest/>`__, then the following must work as a mechanism for running the flit command-line tool::

    import subprocess
    subprocess.check_call(["flit", ...])

A build backend MUST be prepared to function in any environment which meets the above criteria. In particular, it MUST NOT assume that it has access to any packages except those that are present in the stdlib, or that are explicitly declared as build-requirements.

Building an sdist
-----------------

Building an sdist involves three phases:

1. The frontend calls the backend's ``get_requires_for_build_sdist`` hook to query for any extra requirements that are needed for the sdist build.

2. The frontend obtains those requirements. For example, it might download them from PyPI and install them into some kind of virtual environment.

3. The frontend calls the backend's ``build_sdist`` hook to create the sdist.

If either hook is missing, or returns the built-in constant ``NotImplemented`` (note that this is the object ``NotImplemented``, *not* the string ``"NotImplemented"``), then this indicates that this backend does not support building an sdist from this source tree. For example, some build backends might only support building sdists from a VCS checkout, and not from an unpacked sdist. If this occurs then the frontend should respond in whatever way it feels is appropriate. For example, it might display an error to the user.

get_requires_for_build_sdist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    def get_requires_for_build_sdist(config_settings):
        ...

Computes any additional requirements needed for ``build_sdist``.

Returns: a list of strings containing PEP 508 dependency specifications, or ``NotImplemented``.

Execution environment: everything specified by the ``build-system.requires`` key in ``pyproject.toml``.

Example::

    def get_requires_for_build_sdist(config_settings):
        return ["cython"]

Or if there are no additional requirements beyond those specified in ``pyproject.toml``::

    def get_requires_for_build_sdist(config_settings):
        return []

build_sdist
~~~~~~~~~~~

::

    def build_sdist(sdist_directory, config_settings):
        ...

Builds a ``.tar.gz`` source distribution and places it in the specified ``sdist_directory``.

Returns: the basename (not the full path) of the new ``.tar.gz`` file, as a unicode string, or ``NotImplemented``.

Execution environment: everything specified by the ``build-system.requires`` key in ``pyproject.toml`` and by the return value of ``get_requires_for_build_sdist``.

Notes: A ``.tar.gz`` source distribution (sdist) is named like ``{name}-{version}.tar.gz`` (for example: ``foo-1.0.tar.gz``), and contains a single top-level directory called ``{name}-{version}`` (for example: ``foo-1.0``), which contains the source files of the package. This directory must also contain the ``pyproject.toml`` from the build directory, and a PKG-INFO file containing metadata in the format described in `PEP 345 <https://www.python.org/dev/peps/pep-0345/>`_. Although historically zip files have also been used as sdists, this hook should produce a gzipped tarball. This is already the more common format for sdists, and having a consistent format makes for simpler tooling, so build backends MUST generate ``.tar.gz`` sdists.

The generated tarball should use the modern POSIX.1-2001 pax tar format, which specifies UTF-8 based file names. This is not yet the default for the tarfile module shipped with Python 3.6, so backends using the tarfile module need to explicitly pass ``format=tarfile.PAX_FORMAT``.

Building a wheel
----------------

The interface for building a wheel is exactly analogous to that for building an sdist: the same three phases, the same interpretation of ``NotImplemented``, etc., except of course that at the end it produces a wheel instead of an sdist.
get_requires_for_build_wheel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

::

    def get_requires_for_build_wheel(config_settings):
        ...

Computes any additional requirements needed for ``build_wheel``.

Returns: a list of strings containing PEP 508 dependency specifications, or ``NotImplemented``.

Execution environment: everything specified by the ``build-system.requires`` key in ``pyproject.toml``.

Example::

    def get_requires_for_build_wheel(config_settings):
        return ["wheel >= 0.25", "setuptools"]

build_wheel
~~~~~~~~~~~

::

    def build_wheel(wheel_directory, config_settings):
        ...

Builds a ``.whl`` binary distribution, and places it in the specified ``wheel_directory``.

Returns: the basename (not the full path) of the new ``.whl``, as a unicode string, or ``NotImplemented``.

Execution environment: everything specified by the ``build-system.requires`` key in ``pyproject.toml`` and by the return value of ``get_requires_for_build_wheel``.

Note: If you unpack an sdist named ``{name}-{version}.tar.gz``, and then build a wheel from it, then the resulting wheel MUST be named ``{name}-{version}-{compat-info}.whl``.

Extensions
----------

Particular frontends and backends MAY coordinate to define additional hooks beyond those described here, but they MUST NOT claim top-level attributes on the build backend object to do so; these attributes are reserved for future PEPs. Backends MAY provide an ``extensions`` dict, and the semantics of the object at ``BACKEND.extensions["XX"]`` can be defined by the project that owns the name ``XX`` on PyPI. For example, the pip project could choose to define extension hooks like::

    BACKEND.extensions["pip"].get_wheel_metadata

or::

    BACKEND.extensions["pip"]["prepare_build_files"]

=====================================================
Recommendations for build frontends (non-normative)
=====================================================

A build frontend MAY use any mechanism for setting up a build environment that meets the above criteria.
For example, simply installing all build-requirements into the global environment would be sufficient to build any compliant package -- but this would be sub-optimal for a number of reasons. This section contains non-normative advice to frontend implementors.

A build frontend SHOULD, by default, create an isolated environment for each build, containing only the standard library and any explicitly requested build-dependencies. This has two benefits:

- It allows for a single installation run to build multiple packages that have contradictory build-requirements. E.g. if package1 build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2, then these cannot both be installed simultaneously into the global environment -- which is a problem when the user requests ``pip install package1 package2``. Or if the user already has pbr==1.8.1 installed in their global environment, and a package build-requires pbr==1.7.2, then downgrading the user's version would be rather rude.

- It acts as a kind of public health measure to maximize the number of packages that actually do declare accurate build-dependencies. We can write all the strongly worded admonitions to package authors we want, but if build frontends don't enforce isolation by default, then we'll inevitably end up with lots of packages on PyPI that build fine on the original author's machine and nowhere else, which is a headache that no-one needs.

However, there will also be situations where build-requirements are problematic in various ways.
For example, a package author might accidentally leave off some crucial requirement despite our best efforts; or, a package might declare a build-requirement on ``foo >= 1.0`` which worked great when 1.0 was the latest version, but now 1.1 is out and it has a showstopper bug; or, the user might decide to build a package against numpy==1.7 -- overriding the package's preferred numpy==1.8 -- to guarantee that the resulting build will be compatible at the C ABI level with an older version of numpy (even if this means the resulting build is unsupported upstream). Therefore, build frontends SHOULD provide some mechanism for users to override the above defaults. For example, a build frontend could have a ``--build-with-system-site-packages`` option that causes the ``--system-site-packages`` option to be passed to virtualenv-or-equivalent when creating build environments, or a ``--build-requirements-override=my-requirements.txt`` option that overrides the project's normal build-requirements.

The general principle here is that we want to enforce hygiene on package *authors*, while still allowing *end-users* to open up the hood and apply duct tape when necessary.

===================================
Comparison to competing proposals
===================================

The primary difference between this and competing proposals (in particular, PEP 516) is that our build backend is defined via a Python hook-based interface rather than a command-line based interface. We do *not* expect that this will, by itself, intrinsically reduce the complexity of calling into the backend, because build frontends will in any case want to run hooks inside a child process -- this is important to isolate the build frontend itself from the backend code and to better control the build backend's execution environment.
So under both proposals, there will need to be some code in ``pip`` to spawn a subprocess and talk to some kind of command-line/IPC interface, and there will need to be some code in the subprocess that knows how to parse these command line arguments and call the actual build backend implementation. So this diagram applies to all proposals equally::

    +-----------+           +---------------+           +----------------+
    | frontend  | -spawn->  | child cmdline | -Python-> | backend        |
    |   (pip)   |           |   interface   |           | implementation |
    +-----------+           +---------------+           +----------------+

The key difference between the two approaches is how these interface boundaries map onto project structure::

    .-= This PEP =-.

    +-----------+           +---------------+     |     +----------------+
    | frontend  | -spawn->  | child cmdline | -Python-> | backend        |
    |   (pip)   |           |   interface   |     |     | implementation |
    +-----------+           +---------------+     |     +----------------+
                                                  |
        |_________________________________________|
          Owned by pip, updated in lockstep       |
                                                  |
                                                  |  PEP-defined interface boundary
                                                  |  Changes here require distutils-sig

    .-= Alternative =-.

    +-----------+     |     +---------------+           +----------------+
    | frontend  | -spawn->  | child cmdline | -Python-> | backend        |
    |   (pip)   |     |     |   interface   |           | implementation |
    +-----------+     |     +---------------+           +----------------+
                      |
                      |     |___________________________________________|
                      |       Owned by build backend, updated in lockstep
                      |
      PEP-defined interface boundary
      Changes here require distutils-sig

By moving the PEP-defined interface boundary into Python code, we gain three key advantages.

**First**, because there will likely be only a small number of build frontends (``pip``, and... maybe a few others?), while there will likely be a long tail of custom build backends (since these are chosen separately by each package to match their particular build requirements), the actual diagrams probably look more like::

    .-= This PEP =-.
    +-----------+           +---------------+           +----------------+
    | frontend  | -spawn->  | child cmdline | -Python+> | backend        |
    |   (pip)   |           |   interface   |        |  | implementation |
    +-----------+           +---------------+        |  +----------------+
                                                     |
                                                     |  +----------------+
                                                     +> | backend        |
                                                     |  | implementation |
                                                     |  +----------------+
                                                     :
                                                     :

    .-= Alternative =-.

    +-----------+           +---------------+           +----------------+
    | frontend  | -spawn+>  | child cmdline | -Python-> | backend        |
    |   (pip)   |        |  |   interface   |           | implementation |
    +-----------+        |  +---------------+           +----------------+
                         |
                         |  +---------------+           +----------------+
                         +> | child cmdline | -Python-> | backend        |
                         |  |   interface   |           | implementation |
                         |  +---------------+           +----------------+
                         :
                         :

That is, this PEP leads to less total code in the overall ecosystem. And in particular, it reduces the barrier to entry of making a new build system. For example, this is a complete, working build backend::

    # mypackage_custom_build_backend.py
    import os.path
    import pathlib
    import tarfile

    def get_requires_for_build_wheel(config_settings):
        return ["wheel"]

    def build_wheel(wheel_directory, config_settings):
        from wheel.archive import archive_wheelfile
        filename = "mypackage-0.1-py2.py3-none-any"
        path = os.path.join(wheel_directory, filename)
        archive_wheelfile(path, "src/")
        return filename

    def _exclude_hidden_and_special_files(archive_entry):
        """Tarfile filter to exclude hidden and special files from the archive"""
        if archive_entry.isfile() or archive_entry.isdir():
            if not os.path.basename(archive_entry.name).startswith("."):
                return archive_entry
        return None

    def get_requires_for_build_sdist(config_settings):
        return []

    def build_sdist(sdist_dir, config_settings):
        sdist_subdir = "mypackage-0.1"
        sdist_path = pathlib.Path(sdist_dir) / (sdist_subdir + ".tar.gz")
        with tarfile.open(str(sdist_path), "w:gz",
                          format=tarfile.PAX_FORMAT) as sdist:
            # Tar up the whole directory, minus hidden and special files
            sdist.add(os.getcwd(), arcname=sdist_subdir,
                      filter=_exclude_hidden_and_special_files)
        return sdist_subdir + ".tar.gz"

Of course, this is a
*terrible* build backend: it requires the user to have manually set up the wheel metadata in ``src/mypackage-0.1.dist-info/``; when the version number changes it must be manually updated in multiple places... but it works, and more features could be added incrementally. Much experience suggests that large successful projects often originate as quick hacks (e.g., Linux -- "just a hobby, won't be big and professional"; `IPython/Jupyter <https://en.wikipedia.org/wiki/IPython#Grants_and_awards>`_ -- `a grad student's $PYTHONSTARTUP file <http://blog.fperez.org/2012/01/ipython-notebook-historical.html>`_), so if our goal is to encourage the growth of a vibrant ecosystem of good build tools, it's important to minimize the barrier to entry.

**Second**, because Python provides a simpler yet richer structure for describing interfaces, we remove unnecessary complexity from the specification -- and specifications are the worst place for complexity, because changing specifications requires painful consensus-building across many stakeholders. In the command-line interface approach, we have to come up with ad hoc ways to map multiple different kinds of inputs into a single linear command line (e.g. how do we avoid collisions between user-specified configuration arguments and PEP-defined arguments? how do we specify optional arguments? when working with a Python interface these questions have simple, obvious answers). When spawning and managing subprocesses, there are many fiddly details that must be gotten right, subtle cross-platform differences, and some of the most obvious approaches -- e.g., using stdout to return data for the ``get_requires_for_build_*`` operation -- can create unexpected pitfalls (e.g., what happens when computing the build requirements requires spawning some child processes, and these children occasionally print an error message to stdout?
obviously a careful build backend author can avoid this problem, but the most obvious way of defining a Python interface removes this possibility entirely, because the hook return value is clearly demarcated). In general, the need to isolate build backends into their own process means that we can't remove IPC complexity entirely -- but by placing both sides of the IPC channel under the control of a single project, we make it much cheaper to fix bugs in the IPC interface than if fixing bugs requires coordinated agreement and coordinated changes across the ecosystem.

**Third**, and most crucially, the Python hook approach gives us much more powerful options for evolving this specification in the future. For concreteness, imagine that next year we add a new ``build_wheel2`` hook, which replaces the current ``build_wheel`` hook with something that adds new features (for example, the ability to build multiple wheels from the same source tree). In order to manage the transition, we want it to be possible for build frontends to transparently use ``build_wheel2`` when available and fall back onto ``build_wheel`` otherwise; and we want it to be possible for build backends to define both methods, for compatibility with both old and new build frontends.

Furthermore, our mechanism should also fulfill two more goals: (a) If new versions of e.g. ``pip`` and ``flit`` are both updated to support the new interface, then this should be sufficient for it to be used; in particular, it should *not* be necessary for every project that *uses* ``flit`` to update its individual ``pyproject.toml`` file. (b) We do not want to have to spawn extra processes just to perform this negotiation, because process spawns can easily become a bottleneck when deploying large multi-package stacks on some platforms (Windows).

In the interface described here, all of these goals are easy to achieve.
Because ``pip`` controls the code that runs inside the child process, it can easily write it to do something like::

    command, backend, args = parse_command_line_args(...)
    if command == "build_wheel":
        if hasattr(backend, "build_wheel2"):
            backend.build_wheel2(...)
        elif hasattr(backend, "build_wheel"):
            backend.build_wheel(...)
        else:
            # error handling

In the alternative where the public interface boundary is placed at the subprocess call, this is not possible -- either we need to spawn an extra process just to query what interfaces are supported (as was included in an earlier draft of PEP 516, an alternative to this), or else we give up on autonegotiation entirely (as in the current version of that PEP), meaning that any changes in the interface will require N individual packages to update their ``pyproject.toml`` files before any change can go live, and that any changes will necessarily be restricted to new releases.

====================
Evolutionary notes
====================

A goal here is to make it as simple as possible to convert old-style sdists to new-style sdists. (E.g., this is one motivation for supporting dynamic build requirements.) The ideal would be that there would be a single static ``pyproject.toml`` that could be dropped into any "version 0" VCS checkout to convert it to the new shiny. This is probably not 100% possible, but we can get close, and it's important to keep track of how close we are... hence this section.

A rough plan would be: Create a build system package (``setuptools_pypackage`` or whatever) that knows how to speak whatever hook language we come up with, and convert them into calls to ``setup.py``. This will probably require some sort of hooking or monkeypatching to setuptools to provide a way to extract the ``setup_requires=`` argument when needed, and to provide a new version of the sdist command that generates the new-style format.
This all seems doable and sufficient for a large proportion of packages (though obviously we'll want to prototype such a system before we finalize anything here). (Alternatively, these changes could be made to setuptools itself rather than going into a separate package.)

But there remain two obstacles that mean we probably won't be able to automatically upgrade packages to the new format:

1) There currently exist packages which insist on particular packages being available in their environment before setup.py is executed. This means that if we decide to execute build scripts in an isolated virtualenv-like environment, then projects will need to check whether they do this, and if so then when upgrading to the new system they will have to start explicitly declaring these dependencies (either via ``setup_requires=`` or via static declaration in ``pyproject.toml``).

2) There currently exist packages which do not declare consistent metadata (e.g. ``egg_info`` and ``bdist_wheel`` might get different ``install_requires=``). When upgrading to the new system, projects will have to evaluate whether this applies to them, and if so they will need to stop doing that.

================================
Rejected and deferred features
================================

A number of potential extra features were discussed beyond the above. For the most part the decision was made that it was better to defer trying to implement these until we had more experience with the basic interface, and to provide a minimal extension interface (the ``extensions`` dictionary) that will allow us to prototype these features before standardizing them. Specifically:

* Editable installs: This PEP originally specified another hook, ``install_editable``, to do an editable install (as with ``pip install -e``). It was removed due to the complexity of the topic, but may be specified in a later PEP.
  Briefly, the questions to be answered include: what reasonable ways exist of implementing an 'editable install'? Should the backend or the frontend pick how to make an editable install? And if the frontend does, what does it need from the backend to do so?

* Getting wheel metadata from a source tree without building a wheel: it's believed that when pip adds a backtracking constraint solver for package dependencies, it may be useful to add a hook to query a source tree to get metadata about the wheel that it *would* generate, if it were asked to build a wheel. Specifically, the kind of situation where it's anticipated that this might come up is:

  1. Package A depends on B and C==1.0
  2. B is only available as an sdist
  3. We fetch the sdist for the latest version of B, build it into a wheel, and then discover that it depends on C==1.5, which means that it isn't compatible with this version of A.
  4. We fetch the sdist for the latest-but-one version of B, build it into a wheel, and then discover that it depends on C==1.4, which means that it isn't compatible with this version of A.
  5. We fetch the sdist for the latest-but-two version of B...

  The idea would be that we could reduce (but not eliminate) the cost of steps 3, 4, 5, ... if there were a way to query a build backend to find out the requirements without actually building a wheel, which is a potentially expensive operation.

  Of course, these repeated fetches are expensive no matter what we do, so the ideal solution would be to provide wheels for B, so that none of this needs to be done at all. And for many packages (for example, pure Python packages), building a wheel is nearly as cheap as fetching the metadata. And building a wheel also has the advantage of giving us something we can store in the wheel cache for next time. But perhaps this is still a good idea for packages that are particularly slow to build (for example, complex packages like scipy or qt).
  It was eventually decided to defer this for now, since it adds non-trivial complexity for build backends (the metadata fetching phase and the wheel building phase run at different times, yet have to produce consistent results), and until pip's backtracking resolver is actually implemented, we're only guessing at the value of this optimization and the exact semantics it will require.

* A specialized hook for copying a source tree into a new source tree: in certain cases, like when installing directly from a local VCS checkout, pip prefers to copy the source tree to a temporary directory before building it. This provides some protection against build systems that can give incorrect results when repeatedly building in the same tree. Historically, pip has accomplished this copy using a simple ``shutil.copytree``, but this causes various problems, like copying large git checkouts or intermediate artifacts from previous in-place builds. In the future, therefore, pip might move to a multi-step process like:

  1. Create an sdist from the VCS checkout.
  2. Unpack this sdist into a temporary directory.
  3. Build a wheel from the unpacked sdist.
  4. Install the wheel.

  Even better, this provides some guarantee that going from VCS checkout → sdist → wheel will produce identical results to going directly from VCS checkout → wheel. However, this runs into a potential problem: what if this particular combination of source tree + build backend can't actually build an sdist? (For example, `flit <http://flit.readthedocs.org/>`__ may have this limitation for certain trees unpacked from sdists.) Therefore, we considered adding an optional hook like ``prepare_temporary_tree_for_build_wheel`` that would copy the required source files into a specified temporary directory.
  But:

  * Such a hook would add non-trivial complexity to this spec: it requires us to promote the idea of an "out of tree build" to a first class concept, and specify which kinds of trees are required to support which operations, etc.

  * A major motivation for doing the build-sdist-unpack-sdist dance in the first place is that we don't trust the backend code to produce the same result when building from a VCS checkout as when building from an sdist, but if we don't trust the backend then it seems odd to add a special hook that puts the backend in charge of doing the dance.

  * If sdist creation is unsupported, then pip can fall back on a ``shutil.copytree`` strategy in just a few lines of code.

  * And in fact, for the one known case where this might be a problem (unpacked sdist using flit), ``shutil.copytree`` is essentially optimal.

    * Though in fact for flit, this is still a pointless expense -- doing an in-place build is perfectly safe and even more efficient.

    * Plus projects using flit always have wheels, so this will essentially never even come up in the first place.

  * And pip hasn't even implemented the sdist optimization for legacy ``setup.py``\-based projects yet, so we have no operational experience to refer to and it might turn out there are some unknown-unknowns that we'll want to take into account before standardizing an optimization for it here. And since this would be an optional hook anyway, it's just as easy to add later once the exact parameters are better understood.

* There was some discussion of extending these hooks to allow a single source tree to produce multiple wheels. But this is a complex enough topic that it clearly requires its own PEP.

* We also discussed making the wheel and sdist hooks build unpacked directories containing the same contents as their respective archives. In some cases this could avoid the need to pack and unpack an archive, but this seems like premature optimisation.
  It's advantageous for tools to work with archives as the canonical interchange formats (especially for wheels, where the archive format is already standardised). Close control of archive creation is important for reproducible builds. And it's not clear that tasks requiring an unpacked distribution will be more common than those requiring an archive.

===========
Copyright
===========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

-n

--
Nathaniel J. Smith -- https://vorpus.org
On 1 July 2017 at 11:53, Nathaniel Smith <njs@pobox.com> wrote:
I just attempted an experimental refactor/streamlining of PEP 517, to match what I think it should look like :-). I haven't submitted it as a PR to the PEPs repository yet since I don't know if others will agree with the changes, but I've pasted the full text below, or you can see the text online at:
I'm a little confused - is this a formalisation of the proposals you've already made on this thread, or is it something different? So far the discussions we've had have been on points of dispute with the existing PEP 517, which have been relatively easy to follow. I don't really have the time to go through this proposal looking for the points of similarity and the differences with PEP 517.

I feel like we're pretty close to finalising PEP 517, and it's not really a good time to introduce a whole new competing PEP for consideration. I appreciate that you have some fairly fundamental disagreements with the approach of the PEP, but I honestly don't think going back to square one on a new PEP is the right approach. You're asking for a significant delay in acceptance of either PEP, while we start the review process over again.

At the moment, I'm strongly inclined to vote -1 to this new PEP simply because we were so close to consensus on PEP 517, and I don't want to see that progress lost (and potentially the whole thing shelved because people get burned out on the debate).

Please consider rephrasing your proposal as a set of points of difference with PEP 517 - preferably just the points that haven't already been discussed (e.g., we've already had the debate on "Add the option to declare an operation unsupported by returning NotImplemented" - what's different in your new proposal?) IMO that would be more productive.

Thanks,
Paul
On 1 July 2017 at 20:53, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
I just attempted an experimental refactor/streamlining of PEP 517, to match what I think it should look like :-). I haven't submitted it as a PR to the PEPs repository yet since I don't know if others will agree with the changes, but I've pasted the full text below, or you can see the text online at:
https://github.com/njsmith/peps/blob/517-refactor-streamline/pep-0517.txt
and the diff at:
https://github.com/python/peps/compare/master...njsmith:517-refactor-streaml...
Briefly, the changes are:
- Rearrange text into (hopefully) better signposted sections with better organization
This is definitely needed, so thanks for that.
- Clarify a number of details that have come up in discussion (e.g., be more explicit that the hooks are run with the process working directory set to the source tree, and why)
This is again very helpful (we also need to be clearer on which sets of dependencies must be installed prior to running the various hooks)
- Drop prepare_wheel_metadata and prepare_wheel_build_files (for now); add detailed rationale for why we might want to add them back later.
I want prepare_wheel_metadata there as a straightforward way for backends to expose the equivalent of "setup.py egg_info". Losing that would be a significant regression relative to the status quo, since pip relies on it during the dependency resolution process (and a backtracking resolver will be even more dependent on being able to do that efficiently).

For the build preparation hook, Thomas already rejected promising that build_sdist will always work in flit when given only an unpacked sdist, while Donald and Paul rejected relying on full tree copies as a fallback for build_sdist failing.

Since I'm not interested in rehashing that debate, the build preparation hook also stays.

However, I'd be fine with renaming these hooks using the "prepare_X_for_Y" scheme you suggest below:

- prepare_metadata_for_build_wheel
- prepare_input_for_build_wheel

That makes it clearer where they fit into the overall wheel building process, as well as which particular step they implement within that process.
- Add an "extensions" hook namespace to allow prototyping of future extensions.
No, on grounds of YAGNI. You can't reject the two hooks we've actually identified as needed (aka the matrix multiplication operators of this situation), and then turn around and argue in favour of an overly general scheme that supports arbitrary extension hooks (which would be somewhat akin to having taken the PEP 225 approach to resolving the syntactic ambiguity between elementwise multiplication and matrix multiplication).

We're not fumbling around based on a complete absence of information here - we're formalising a backend build interface based on more than a decade of experience with the successes and challenges of the existing system based on distutils, setuptools and PyPI.

Besides, if frontends want to invoke arbitrary interfaces that are specific to particular backends, we already have ways for them to do that: Python's regular import system, and plugin management approaches like pkg_resources entry points.
- Rename get_build_*_requires -> get_requires_for_build_* to make the naming parallelism more obvious
This sounds like a good change to me.
- Add the option to declare an operation unsupported by returning NotImplemented
No. Supported operations for a backend must be identifiable without installing any dependencies, and without running any backend code.

To tighten that requirement up even further: if the backend's capabilities can't be accurately determined using "inspect.getattr_static", then the backend is not compliant with the spec. The build frontend/backend API is not a space where we want people to try to be clever - we want them to be completely dull and boring and use the most mundane code they can possibly come up with.

However, it's fine for operations to fail with an exception if there are external dependencies that haven't been satisfied (e.g. requiring a C/C++/FORTRAN/Rust/Go/etc compiler for extension modules when there isn't one available, or requiring some other non-Python dependency that needs to be set up prior to requesting the sdist or wheel build). Backends are also free to raise NotImplementedError when it seems appropriate to do so (e.g. when the project doesn't support the current CPU architecture) - frontends will handle that the same way they will any other backend exception.
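The getattr_static requirement described above is cheap for a frontend to apply. A minimal sketch (the helper name ``backend_supports`` and the fake backend module are illustrative, not part of any spec):

```python
import inspect
import types

def backend_supports(backend, hook_name):
    # Look the hook up without triggering __getattr__/__getattribute__
    # tricks and without executing any backend code, per the
    # getattr_static requirement.
    try:
        hook = inspect.getattr_static(backend, hook_name)
    except AttributeError:
        return False
    return callable(hook)

# Hypothetical backend module that only defines a build_wheel hook.
backend = types.ModuleType("fake_backend")
backend.build_wheel = lambda wheel_directory, config_settings=None: "pkg.whl"

print(backend_supports(backend, "build_wheel"))   # True
print(backend_supports(backend, "build_sdist"))   # False
```

Note that ``inspect.getattr_static`` deliberately bypasses dynamic attribute machinery, which is exactly why a compliant backend has to expose its hooks as plain module-level functions.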
- Instead of defining a default value for get_requires_for_build_*, make it mandatory for get_requires_for_build_* and build_* to appear together; this seems simpler now that we have multiple high-level operations defined in the same PEP, and also simplifies the definition of the NotImplemented semantics.
I don't see the value in requiring four lines of boilerplate in every backend definition that doesn't support dynamic dependencies:

    def get_requires_for_build_sdist(*args, **kwargs):
        return ()

    def get_requires_for_build_wheel(*args, **kwargs):
        return ()

That's just pointless noise when we can instead say "Leave those hooks out of the backend implementation entirely if all build dependencies will be declared in pyproject.toml (and if you really want your backend work to support dynamic build dependencies, please make sure you have a really good rationale for doing so that isn't already covered by pyproject.toml and the environment marker system defined in PEP 508)"
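Under this "leave the hooks out" rule, the frontend side stays simple too: a missing get_requires_for_build_* hook just means "no dynamic requirements beyond pyproject.toml". A sketch, with a hypothetical helper name and contrived backend modules:

```python
import types

def get_dynamic_requires(backend, hook_name, config_settings=None):
    # A missing hook means: no dynamic build requirements beyond those
    # declared statically in pyproject.toml's build-system.requires.
    hook = getattr(backend, hook_name, None)
    if hook is None:
        return []
    return list(hook(config_settings))

# Backend that declares everything statically: defines no requires hooks.
static_backend = types.ModuleType("static_backend")

# Backend with a genuinely dynamic requirement (contrived example).
dynamic_backend = types.ModuleType("dynamic_backend")
dynamic_backend.get_requires_for_build_wheel = (
    lambda config_settings=None: ["cython"]
)

print(get_dynamic_requires(static_backend, "get_requires_for_build_wheel"))   # []
print(get_dynamic_requires(dynamic_backend, "get_requires_for_build_wheel"))  # ['cython']
```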
- Update title to better match the scope we ended up with
I'd prefer to retain the current title, as it describes the desired *outcome* of the change, whereas the proposed revision only describes one of the technical prerequisites of achieving that outcome (i.e. eliminating the historical dependency on setup.py as the build system's executable interface).
- Add a TODO to decide how to handle backends that don't want to have multiple hooks called from the same process, including some discussion of the options.
I don't think that's a TODO: I'm happy with the option of restricting frontends to "one hook invocation per subprocess call".

It only becomes an open question in this revised draft by virtue of making the get_requires_* hooks mandatory, and I have a different resolution for that: keep those hooks optional, so that only backends that genuinely support dynamic build time dependencies will define them (others should either just get users to list any additional static build dependencies in pyproject.toml, or else list any always-needed static dependencies in the backend's own install_requires).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
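The "one hook invocation per subprocess call" restriction discussed above is also cheap for a frontend to honour. A rough sketch, where the runner snippet and the JSON-over-stdout result channel are purely illustrative (as Nathaniel notes elsewhere in the thread, stdout is fragile if backend code prints anything, so a real implementation would more likely use a pipe or a result file):

```python
import json
import subprocess
import sys

# Tiny runner executed in a fresh interpreter for each hook call, so a
# backend never sees two hook invocations in the same process.
_RUNNER = """\
import importlib, json, sys
backend = importlib.import_module(sys.argv[1])
hook = getattr(backend, sys.argv[2])
print(json.dumps(hook(*json.loads(sys.argv[3]))))
"""

def call_hook(backend_name, hook_name, args):
    # One subprocess per hook invocation: no state leaks between calls.
    proc = subprocess.run(
        [sys.executable, "-c", _RUNNER,
         backend_name, hook_name, json.dumps(args)],
        check=True, capture_output=True, text=True,
    )
    return json.loads(proc.stdout)
```

A real frontend would additionally set the child's working directory to the source tree before invoking the hook, per the draft's requirement.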
-1 on _for_ ; why should a common prefix plus extra typing be any clearer than a common suffix? Or rearrange the words without _for_.

For the --config options, there's a double dash on both the key and the value, which I found confusing. I suppose the theory, not thoroughly explained, is that the value is sent to the command line of the build backend. I think "--build-option=toves=slithy" would be clearer than --build-option="--build-option1".

A backwards compatible setup.py shim will have to convert prepare_wheel_metadata to .egg-info for pip, which is pretty easy.

I didn't anticipate get_requires_* to require running the whole (designed for a single invocation per process) build backend. Suppose anything's possible.

On Mon, Jul 3, 2017 at 10:04 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
On Jul 3, 2017, at 10:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 1 July 2017 at 20:53, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,
I just attempted an experimental refactor/streamlining of PEP 517, to match what I think it should look like :-). I haven't submitted it as a PR to the PEPs repository yet since I don't know if others will agree with the changes, but I've pasted the full text below, or you can see the text online at:
https://github.com/njsmith/peps/blob/517-refactor-streamline/pep-0517.txt
and the diff at:
https://github.com/python/peps/compare/master...njsmith:517-refactor-streaml...
Briefly, the changes are:
- Rearrange text into (hopefully) better signposted sections with better organization
This is definitely needed, so thanks for that.
- Clarify a number of details that have come up in discussion (e.g., be more explicit that the hooks are run with the process working directory set to the source tree, and why)
This is again very helpful (we also need to be clearer on which sets of dependencies must be installed prior to running the various hooks)
- Drop prepare_wheel_metadata and prepare_wheel_build_files (for now); add detailed rationale for why we might want to add them back later.
I want prepare_wheel_metadata there as a straightforward way for backends to expose the equivalent of "setup.py egg_info". Losing that would be a significant regression relative to the status quo, since pip relies on it during the dependency resolution process (and a backtracking resolver will be even more dependent on being able to do that efficiently).
To speak to a point Nathaniel made about being able to cache built wheels, we're also planning on caching the result of ``setup.py egg-info`` in the GSoC student's project. The exact semantics of this remain to be nailed down (for instance, it's silly to cache it after we've built the wheel, since we can just use that; but if we called it and then later decided not to use it, then caching is beneficial). I don't see any reason why we wouldn't also cache the output of this new hook.

Adding it on later is a bit tricky, because then we can't pass the existing metadata into the build_wheel hook, so backends can't ensure that they generate the same output. The frontend could still of course validate that after the fact (which pip likely will), but that removes the chance for the project to ensure they work in that case rather than having that case be a failure. OTOH I don't know how likely it is that a project actually does something different based on the metadata directory that was passed in, other than fail if it would do something different.

At the end of the day I think we can add this either now or later; I have a slight preference for now, and I think either way we'll end up adding it, but I don't really care if we add it now or later.
For the build preparation hook, Thomas already rejected promising that build_sdist will always work in flit when given only an unpacked sdist, while Donald and Paul rejected relying on full tree copies as a fallback for build_sdist failing.
Since I'm not interested in rehashing that debate, the build preparation hook also stays.
From my end, I'm happy either with this hook or with the hook Nathaniel has spec'd. I think the primary difference is that if someone takes a .tar.gz, unpacks it, and then runs ``pip install that/dir``, with Nathaniel's hook we would fail if the build_sdist hook didn't build a sdist (e.g. in the case of how I understand flit's use case). IOW we have a few possibilities of when we might call this API:

1) We have a sdist already and pip downloads it, unpacks it, and builds a wheel from it.
   - Works on both proposals for setuptools and flit (since for both proposals we would just unpack to a temporary location and jump straight to wheel building).

2) We have a VCS directory or "original development source" or whatever you want to call the thing you have before a sdist that typically gets turned into a sdist.
   - Works on both proposals for setuptools and flit (since both can go from a VCS to a sdist).
   - Thomas might have said he'd be unhappy if this case goes through a real sdist... I forget the specifics of that discussion now.
   - If build_sdist failed in Nathaniel's proposal, pip would probably fail and surface that error to the user (for cases where e.g. you need git installed or something).

3) We have a directory on disk that represents an unpacked sdist (with or without modifications).
   - Works on the original proposal IF the build tool implements the prepare hook or the build_sdist hook can function as the identity function.
   - Fails on Nathaniel's proposal IF the build tool doesn't implement the build_sdist hook as the identity function (e.g. flit).

So we basically only care about this for the (3) case, and even then only for projects that can't or won't detect that they're in an unpacked sdist that they've already created and simply archive up the entire directory without any additional logic (or maybe they add additional logic to munge the version, for instance, if they detect there were file modifications).
From my POV, I don't really care if (3) fails or not for individual build tools. I don't think (3) is a very popular path (and in both cases, we're not covering (3) 100% of the time; we just provide a different mechanism to make it work), and from pip's POV it's easy to surface an error from flit that says something like "Cannot build an sdist from this, try from a VCS clone". We can't make this work in 100% of situations either way, so whatever. So I don't really care if we add it now or we use the semantics Nathaniel proposed.
However, I'd be fine with renaming these hooks using the "prepare_X_for_Y" scheme you suggest below:
- prepare_metadata_for_build_wheel - prepare_input_for_build_wheel
That makes it clearer where they fit into the overall wheel building process, as well as which particular step they implement within that process.
Paint the bike shed whatever color you want.
- Add an "extensions" hook namespace to allow prototyping of future extensions.
No, on grounds of YAGNI. You can't reject the two hooks we've actually identified as needed (aka the matrix multiplication operators of this situation), and then turn around and argue in favour of an overly general scheme that supports arbitrary extension hooks (which would be somewhat akin to having taken the PEP 225 approach to resolving the syntactic ambiguity between elementwise multiplication and matrix multiplication).
We're not fumbling around based on a complete absence of information here - we're formalising a backend build interface based on more than a decade of experience with the successes and challenges of the existing system based on distutils, setuptools and PyPI.
Besides, if frontends want to invoke arbitrary interfaces that are specific to particular backends, we already have ways for them to do that: Python's regular import system, and plugin management approaches like pkg_resources entry points.
I don’t care if we have extensions or not, in either form of the proposal.
- Rename get_build_*_requires -> get_requires_for_build_* to make the naming parallelism more obvious
This sounds like a good change to me.
- Add the option to declare an operation unsupported by returning NotImplemented
No. Supported operations for a backend must be identifiable without installing any dependencies, and without running any backend code.
To tighten that requirement up even further: if the backend's capabilities can't be accurately determined using "inspect.getattr_static", then the backend is not compliant with the spec. The build frontend/backend API is not a space where we want people to try to be clever - we want them to be completely dull and boring and use the most mundane code they can possibly come up with.
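As an illustration of that requirement, a frontend-side capability check could use inspect.getattr_static so that no backend code runs during inspection. This is only a sketch; the helper name `backend_supports` is invented here, not part of any spec:

```python
import inspect

def backend_supports(backend, hook_name):
    """Report whether the backend defines a hook, without triggering
    __getattr__ tricks or descriptors -- i.e. the "completely dull
    and boring" static check described above."""
    try:
        inspect.getattr_static(backend, hook_name)
    except AttributeError:
        return False
    return True
```

A backend whose hooks can't be found this way would, under this rule, simply be treated as not providing them.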
However, it's fine for operations to fail with an exception if there are external dependencies that haven't been satisfied (e.g. requiring a C/C++/FORTRAN/Rust/Go/etc compiler for extension modules when there isn't one available, or requiring some other non-Python dependency that needs to be set up prior to requesting the sdist or wheel build). Backends are also free to raise NotImplementedError when it seems appropriate to do so (e.g. when the project doesn't support the current CPU architecture) - frontends will handle it the same way they handle any other backend exception.
Agreed, I don’t like the NotImplemented thing.
- Instead of defining a default value for get_requires_for_build_*, make it mandatory for get_require_for_build_* and build_* to appear together; this seems simpler now that we have multiple high-level operations defined in the same PEP, and also simplifies the definition of the NotImplemented semantics.
I don't see the value in requiring four lines of boilerplate in every backend definition that doesn't support dynamic dependencies:
def get_requires_for_build_sdist(*args, **kwargs):
    return ()

def get_requires_for_build_wheel(*args, **kwargs):
    return ()
That's just pointless noise when we can instead say "Leave those hooks out of the backend implementation entirely if all build dependencies will be declared in pyproject.toml (and if you really want your backend work to support dynamic build dependencies, please make sure you have a really good rationale for doing so that isn't already covered by pyproject.toml and the environment marker system defined in PEP 508)”
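A frontend following this rule would simply treat a missing hook as "no dynamic dependencies". A possible sketch (the helper name is mine, not from the PEP):

```python
def get_dynamic_requires(backend, target):
    """Frontend-side sketch: if the backend omits the optional
    get_requires_for_build_* hook, assume all build dependencies
    were declared statically in pyproject.toml."""
    hook = getattr(backend, "get_requires_for_build_" + target, None)
    if hook is None:
        return []
    return list(hook())
```

This keeps the boilerplate out of backends while leaving the dynamic-dependency escape hatch available to the few that need it.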
Agreed.
- Update title to better match the scope we ended up with
I'd prefer to retain the current title, as it describes the desired *outcome* of the change, whereas the proposed revision only describes one of the technical prerequisites of achieving that outcome (i.e. eliminating the historical dependency on setup.py as the build system's executable interface).
- Add a TODO to decide how to handle backends that don't want to have multiple hooks called from the same process, including some discussion of the options.
I don't think that's a TODO: I'm happy with the option of restricting frontends to "one hook invocation per subprocess call".
It only becomes an open question in this revised draft by virtue of making the get_requires_* hooks mandatory, and I have a different resolution for that: keep those hooks optional, so that only backends that genuinely support dynamic build time dependencies will define them (others should either just get users to list any additional static build dependencies in pyproject.toml, or else list any always-needed static dependencies in the backend's own install_requires).
Agreed. — Donald Stufft
On Mon, Jul 3, 2017 at 5:06 PM, Donald Stufft <donald@stufft.io> wrote:
Adding it on later is a bit tricky because we can’t pass the existing metadata into the build_wheel hook then, so they can ensure that they generate the same output. The frontend could still of course validate that after the fact (which pip likely will) but that removes the chance for the project to ensure they work in that case rather than having that case be a failure. OTOH I don’t know how likely it is that a project actually does something different based on the metadata directory that was passed in other than fail if it would do something different.
No, I don't think there'd be any problem with adding it later. In the current PEP 517 draft, it's already spec'ed as: if a backend has no prepare_wheel_metadata hook, then build_wheel never gets a metadata directory; and even if a backend does have a prepare_wheel_metadata hook, then build_wheel might or might not get a metadata directory. So if we start out without a prepare_wheel_metadata hook and add it later, then we'd have:

- old frontend + old backend: obviously fine
- old frontend + new backend: acts like a new frontend that happened not to call prepare_wheel_metadata, so that's fine
- new frontend + old backend: there's no prepare_wheel_metadata hook, so it won't try to pass metadata to build_wheel, so that's fine
- new frontend + new backend: obviously fine

So adding it now versus later is pretty much equivalent in this respect.
From my end, I’m happy either with the hook or with the hook Nathaniel has spec’d. I think the primary difference is that if someone takes a .tar.gz, unpacks it, and then runs ``pip install that/dir``, then with Nathaniel’s hook we would fail if the build_sdist hook didn’t build a sdist (e.g. in the case of how I understand flit’s use case).
I think the most important *practical* difference between current-PEP-517 and my-proposed-PEP-517 isn't about whether the backend can control what happens when it doesn't support sdist building, but about whether the backend can control what happens when it *does* support sdist building.

In my-proposed-PEP-517, there is no prepare_build_files hook, but the practical effect of this is pretty minimal, because the only identified use case for prepare_build_files is when building from an unpacked flit sdist, and in that case, flit's prepare_build_files is essentially equivalent to a copytree anyway.

OTOH, in current-PEP-517, there is no way for a backend to signal whether build_sdist will work or not, so if pip sees a prepare_build_files hook then it *has* to call it, *even if build_sdist would have worked*. This means that current-PEP-517 supports an interesting practical use case that my-proposed-PEP-517 doesn't: if you're a project like scipy who thinks that in-place builds are a good idea and has a somewhat adversarial relationship with the pip devs, you can put something like this in your backend:

    if hasattr(os, "symlink"):
        def prepare_build_files(tmp_dir):
            os.symlink(os.getcwd(), tmp_dir)

and now pip will do in-place builds. (Except on Windows, of course; Windows users get screwed as usual, but from scipy's perspective that's pip's fault.)

In my proposal, this doesn't happen because pip will always try build_sdist before falling back (and also because there is no prepare_build_files hook, but I'm not opposed to adding that eventually, I just think the current design is hastily spec'ed and not-fully-thought-through, as evidenced by the fact that the best use case I can come up with for it is exactly the one that its advocates are trying to prevent...).

There's a bit of a complicated linkage here.
There are two differences between my proposal and current-PEP-517: mine has NotImplemented but not prepare_build_files, and current-PEP-517 has prepare_build_files but not NotImplemented. In this case, the thing that matters for preventing in-place-build shenanigans is NotImplemented support.

So there's a third possible design, which is to have both NotImplemented *and* prepare_build_files together. But this doesn't seem very attractive, because once you have the NotImplemented thing then I don't know of any practical use cases for prepare_build_files at all.

-n

-- Nathaniel J. Smith -- https://vorpus.org
On 4 July 2017 at 12:46, Nathaniel Smith <njs@pobox.com> wrote:
    if hasattr(os, "symlink"):
        def prepare_build_files(tmp_dir):
            os.symlink(os.getcwd(), tmp_dir)
and now pip will do in-place builds. (Except on Windows, of course; Windows users get screwed as usual, but from scipy's perspective that's pip's fault.)
scipy shouldn't be relying on pip as their frontend for local development builds (except via pip -e), and if frontends see backends actively subverting their build policies, they'll be well within their rights to blacklist those backends (or, more likely, push them into a chroot or container, rather than just using a subprocess with normal user-level access to the host filesystem).

However, Donald's persuaded me that the cases where:

1. build_sdist will be called (i.e. not starting with a known sdist); and
2. build_sdist will throw an exception

are going to be sufficiently rare that we can drop the input preparation hook, and instead just let the build fail in such cases (the conclusion of the previous round of discussions suggested we weren't going to be OK with that, so I'm happy to change my view based on the updated info).

Going down that path also still leaves frontends with the option of looking for a pre-existing PKG-INFO file when handed an arbitrary directory and taking that into account when deciding which build strategy they want to use (in-tree or out-of-tree), and which copying strategy they use in the out-of-tree build case.

Cheers, Nick.

P.S. `build_sdist` should still indicate failure by throwing an exception, not by returning a non-string result. However, frontends should also fail the build when a backend doesn't adhere to the specification (so returning NotImplemented would technically be an alternative way to trigger a build failure, just not a recommended one)

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Jul 4, 2017, at 01:06 AM, Donald Stufft wrote:

> 2) We have a VCS directory or “original development source” or whatever you want to call the thing you have before a sdist that typically gets into a sdist.
> - Works on both proposals for setuptools and flit (since both can go from a VCS to a sdist).
> - Thomas might have said he’d be unhappy if this case goes through a real sdist… I forget the specifics of that discussion now.

Practical objection: besides it being a VCS checkout, you need the VCS tools available (e.g. git on $PATH). It's not hard to imagine cases where this doesn't hold, e.g. installing from a directory bind-mounted into a docker container. Between this and your case 3 (local directory not a VCS checkout), failures - while not common - won't be particularly rare.

Principle objection: you don't want an sdist! You want the necessary files copied efficiently to a clean location. You're using something complex as a proxy for something simple.

Prediction objection: If we end up with pip asking for an sdist when it's trying to build a wheel, I don't want to be endlessly explaining to people why it's broken. Nor do I want people to upload badly made sdists because flit doesn't have the necessary information to put extra files in there. Given the dominance of pip, I think my best option is to find a way for build_sdist to produce an sdist which pip accepts but PyPI rejects if you try to upload it. I assume we all agree that's not optimal?

So can we please leave the hook in place?

(We could still avoid all of this if there was a way to trust the backend to build a wheel directly from the source directory, by the way.)

Thomas
On 4 July 2017 at 08:22, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Practical objection: besides it being a VCS checkout, you need the VCS tools available (e.g. git on $PATH). It's not hard to imagine cases where this doesn't hold, e.g. installing from a directory bind-mounted into a docker container. Between this and your case 3 (local directory not a VCS checkout), failures - while not common - won't be particularly rare.
Also, the use case of an unpacked sdist isn't all that uncommon. I've certainly downloaded a sdist (pip download foo is easier than "find the project homepage, look up the VCS URL, git clone it") and edited it before building in the past. Paul
On 4 July 2017 at 17:22, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Tue, Jul 4, 2017, at 01:06 AM, Donald Stufft wrote:
2) We have a VCS directory or “original development source” or whatever you want to call the thing you have before a sdist that typically gets into a sdist.
- Works on both proposals for setuptools and flit (since both can go from a VCS to a sdist).
- Thomas might have said he’d be unhappy if this case goes through a real sdist… I forget the specifics of that discussion now.
Practical objection: besides it being a VCS checkout, you need the VCS tools available (e.g. git on $PATH). It's not hard to imagine cases where this doesn't hold, e.g. installing from a directory bind-mounted into a docker container. Between this and your case 3 (local directory not a VCS checkout), failures - while not common - won't be particularly rare.
*sigh* I knew there was a reason I didn't want to rerun this particular argument :)
So can we please leave the hook in place?
+1, but we should explicitly note in the rationale section of the PEP that it's to cover both of the following cases:

* build from an already unpacked and potentially edited sdist
* cleanly support explicitly out-of-tree builds even when the dependencies for working with the VCS aren't available

Both Donald & I managed to forget that rationale between the first round of the argument and this reiteration of it, so I assume it isn't a particularly obvious point in general.

Including the hook then leaves it up to frontends to decide whether they want to always use an out-of-tree build strategy or not. If pip makes that choice (as we expect it to), and some folks don't want that behaviour, we'll strongly encourage them to define a new local-development-focused frontend that uses an incremental build strategy by default, rather than subverting the build preparation hook.

(Alternatively: pip has added an "--upgrade-strategy" option to choose between eager and only-if-needed upgrades, and is likely to add a "--scheme" option to explicitly choose between working with the global, user, and venv installation sets, so it may be possible to make the case for adding a "--build-strategy" option that defaulted to the current "out-of-tree" model, but also allowed people to explicitly opt in to an "incremental" variant that executed an in-place build in the current venv)

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, Jul 4, 2017, at 09:45 AM, Nick Coghlan wrote:
+1, but we should explicitly note in the rationale section of the PEP that it's to cover both of the following cases:
* build from an already unpacked and potentially edited sdist
* cleanly support explicitly out-of-tree builds even when the dependencies for working with the VCS aren't available
I'm happy to add a note about these requirements. Are you planning to merge some of Nathaniel's rewrite? If so, I'll hold off from making any changes until that is done. Thomas
On 4 July 2017 at 18:58, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Tue, Jul 4, 2017, at 09:45 AM, Nick Coghlan wrote:
+1, but we should explicitly note in the rationale section of the PEP that it's to cover both of the following cases:
* build from an already unpacked and potentially edited sdist
* cleanly support explicitly out-of-tree builds even when the dependencies for working with the VCS aren't available
I'm happy to add a note about these requirements. Are you planning to merge some of Nathaniel's rewrite? If so, I'll hold off from making any changes until that is done.
My proposed way forward is that I'll put together a PR tomorrow (my time, so Wed July 4 in UTC+10) that restructures things along the lines of Nathaniel's proposal, but doesn't make any functional changes to the PEP (except to rename some of the hooks to more clearly group them into related families).

To be completely explicit, that grouping will be:

Mandatory backend hooks:
- build_sdist
- build_wheel

Optional backend hooks:
- get_requires_for_build_sdist
- get_requires_for_build_wheel
- prepare_metadata_for_build_wheel
- prepare_input_for_build_wheel

The basis for the revised naming convention is:

- all optional hooks are called X_for_Y, where Y must be a mandatory hook name
- get_requires_* reports dynamic dependencies that can't be captured in pyproject.toml via environment markers
- prepare_metadata generates wheel metadata without building binary extensions
- prepare_input allows for out-of-tree builds even if the requirements for building an sdist aren't met

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 4, 2017, at 3:22 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Tue, Jul 4, 2017, at 01:06 AM, Donald Stufft wrote:
2) We have a VCS directory or “original development source” or whatever you want to call the thing you have before a sdist that typically gets into a sdist.
- Works on both proposals for setuptools and flit (since both can go from a VCS to a sdist).
- Thomas might have said he’d be unhappy if this case goes through a real sdist… I forget the specifics of that discussion now.
Practical objection: besides it being a VCS checkout, you need the VCS tools available (e.g. git on $PATH). It's not hard to imagine cases where this doesn't hold, e.g. installing from a directory bind-mounted into a docker container. Between this and your case 3 (local directory not a VCS checkout), failures - while not common - won't be particularly rare.
It occurs to me that your case here is actually a reason *not* to implement this hook. The goal of the hook is that the wheel built from the tree created by copying this file is the same as the wheel built from a sdist created from that same VCS directory. However if you require the VCS tools in order to decide what files to include in the sdist, then you also need those tools to decide what files to copy into the temporary directory. Otherwise you’ll get different outputs. Which means that besides the unpacked sdist case, this actually breaks the from-VCS case unless the prepare build files hook has all the same requirements as the build_sdist hook, in which case we lose the purpose of the hook to begin with.

Maybe the problem boils down to the fact we’re trying to treat VCS directories and unpacked sdists the same, and maybe we should just add a file which *only* gets added to a sdist (e.g. not a development install or a wheel), similar to the .dist-info/WHEEL file, that just acts as a marker for “Hey, I’m in an sdist” - and if we hit that, then pip just does the copytree implementation. That absolves build backends from needing to worry about the unpacked sdist case, but makes it still function in 100% of the cases (assuming the build backend produces a correct sdist).

I think that actually works in *more* cases without introducing weirdness where you get different results if the tools aren’t available via prepare build files. The main downsides I see are:

(A) Docker bind mounts etc. will require git etc. to install when the sdist generation requires them, but I don’t think you can get around this as I said above.

(B) It removes a place that a hypothetical build backend could use to mark that an unpacked sdist has been modified (e.g. it could store a RECORD with hashes of all the files, and if someone modifies it, when it builds the sdist it could add a local version to the version to indicate it’s been modified).
However to my knowledge, nothing does (B) and (A) is unavoidable IMO. — Donald Stufft
On Tue, Jul 4, 2017, at 06:24 PM, Donald Stufft wrote:

> It occurs to me that your case here is actually a reason *not* to implement this hook. The goal of the hook is that the wheel built from the tree created by copying this file is the same as the wheel built from a sdist created from that same VCS directory. However if you require the VCS tools in order to decide what files to include in the sdist, then you also need those tools to decide what files to copy into the temporary directory. Otherwise you’ll get different outputs.

The function of the VCS is to identify extra files that are needed for an sdist that don't affect building a wheel (like docs and tests). It's easy for flit to identify everything necessary for building a wheel, but those things are not sufficient for a good sdist.
Maybe the problem boils down to the fact we’re trying to treat VCS directories and unpacked sdists the same and maybe we should just add a file which *only* gets added to a sdist (e.g. not a development install or a wheel) similar to the .dist-info/WHEEL file that just acts as a marker for “Hey, I’m in an sdist”
"Hey, I'm in something which was once an sdist, but may no longer be clean. Or someone has copied the marker file into a directory that was never an sdist to make something work because for some reason there's a difference in behaviour for directories that pip thinks come from sdists."
and if we hit that, then pip just does the copytree implementation. That absolves build backends from needing to worry about the unpacked sdist case, but makes it still function in 100% of the cases (assuming the build backend produces a correct sdist).
It still doesn't deal with the cases where you're not coming from an sdist but you can't get the VCS info for whatever reason. We had reached a compromise that we all seemed to be OK with - albeit not very keen on. I find it very frustrating that we seem to be rehashing the same arguments that got us there in the first place. Thomas
On Jul 4, 2017, at 1:35 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Tue, Jul 4, 2017, at 06:24 PM, Donald Stufft wrote:
It occurs to me that your case here is actually a reason *not* to implement this hook. The goal of the hook is that the wheel built from the tree created by copying this file is the same as the wheel built from a sdist created from that same VCS directory. However if you require the VCS tools in order to decide what files to include in the sdist, then you also need those tools to decide what files to copy into the temporary directory. Otherwise you’ll get different outputs.
The function of the VCS is to identify extra files that are needed for an sdist that don't affect building a wheel (like docs and tests). It's easy for flit to identify everything necessary for building a wheel, but those things are not sufficient for a good sdist.
For *flit* that may be the case, but this isn’t a hook solely for flit. It’s not hard to imagine that some backends will require git etc. to correctly copy those files across. I don’t think we should look at adding special hooks for each possible set of requirements that a backend project might have, but that’s exactly what this hook is. If your backend has the same constraints as flit, then yeah, maybe it will work fine, but otherwise it won’t. Importantly, the most widely used build tool right now could not use any of its VCS plugins with this hook, which I think says something.
Maybe the problem boils down to the fact we’re trying to treat VCS directories and unpacked Sdists the same and maybe we should just add a file which *only* gets added to a sdist (e.g. not a development install or a wheel) similar to the .dist-info/WHEEL file that just acts as a marker for “Hey, I’m in an sdist”
"Hey, I'm in something which was once an sdist, but may no longer be clean. Or someone has copied the marker file into a directory that was never an sdist to make something work because for some reason there's a difference in behaviour for directories that pip thinks come from sdists.”
I don’t care about starting an arms race with people who are going to purposely circumvent the operations of pip to try and force a square peg into a round hole. I think approximately zero people will do this by accident, and a tiny fraction will do it on purpose, and when they file bug reports I’ll just close them as invalid.
and if we hit that, then pip just does the copytree implementation. That absolves build backends from needing to worry about the unpacked sdist case, but makes it still function in 100% of the cases (assuming the build backend produces a correct sdist).
It still doesn't deal with the cases where you're not coming from an sdist but you can't get the VCS info for whatever reason.
I don’t think those cases are meaningfully different from any other case where a prerequisite for building isn’t installed that the frontend can’t automatically install. You error out and tell people to install the prerequisite.
We had reached a compromise that we all seemed to be OK with - albeit not very keen on. I find it very frustrating that we seem to be rehashing the same arguments that got us there in the first place.
I don’t think it’s (entirely) rehashing. The discussion has made me realize that the purported cases covered by the hook aren’t actually going to be covered except in a narrow set of circumstances, which suggests that it’s not actually a good hook. — Donald Stufft
On Tue, Jul 4, 2017 at 11:03 AM, Donald Stufft <donald@stufft.io> wrote:
I don’t think it’s (entirely) rehashing. The discussion has made me realize that the purported cases covered by the hook aren’t actually going to be covered except in a narrow set of circumstances, which suggests that it’s not actually a good hook.
The thought pattern that led to my counter-proposal is basically: This prepare_build_files hook feels half-baked – the pip devs would just as soon leave it out, and if you look at Thomas's rationale for why it's important, it's to let him circumvent pip in the cases pip doesn't want him to circumvent, while he also points out that it adds overhead for no reason. I really honestly don't mind if we add something like this, but what we have now does *not* feel like an obvious one-way-to-do-it design that's ready for standardization. It feels like the sort of thing that we'll look back on in a few years and be like "why is this the way it is?" and the answer will be "well, you kinda had to be there".

Okay, so is there any way to get the half-baked part out of the PEP 517 critical path? The idea came up of adding an extension namespace, and it seems to me that this is obviously a good idea totally independent of the specifics of which hooks make the cut for PEP 517.

Nick has characterized it as being like the PEP 225 approach to matrix multiplication (PEP 225 proposed to add 6 new infix operators to Python with semantics to be specified later). But the inspiration is really more like - you know how people pop up on python-ideas saying "hey X should be added to the stdlib", and the response is "well, put it on PyPI first to work it out, and then we can talk"? You can't do that for infix operators, which is part of why it took *14 years* to sort out matrix multiplication.

If we ever want to extend the build backend interface again, having some way to get that "try it first, standardize second" workflow is obviously helpful. And all it requires is reserving the 'extensions' attribute on backends, which has *zero* cost in terms of spec complexity or added code.
Really that's enough on its own; when pip gets around to switching from copytree->sdist, they can hash out some prototype with Thomas, ship the new feature without breaking the world, and standardize whenever makes sense.

Alternatively, there's the NotImplemented proposal, which is also sufficient to fix the immediate issue on its own. TBH I'm pretty confused about the reaction it's gotten. The reason I like it is that we all seem to agree that "this backend has determined dynamically that it can't build an sdist" is an important case that we want to support. So a straightforward way of representing that in our spec seems like a nice future-proof thing. It's simple, and it makes sense totally independently of the details of pip's build pipeline. No-one's going to look back later and be like "why did you add a straightforward way for the frontend and backend to communicate about this thing that can happen?".

But... for some reason everyone seems to think that it's very important that PEP 517 *handle* this case, but also that it *pretend that this case doesn't exist*. I don't get it. Is the issue the slightly-weird use of a sentinel value (NotImplemented) instead of raising an error (like declaring that NotImplementedError is the standard way to indicate this issue)? I chose a sentinel because it avoids the chance of a bug in the hook causing an internal exception to leak out and be misinterpreted by the frontend (which I guess is why dunder methods use NotImplemented for exactly the analogous situation), but it doesn't really matter that much.

Anyway, both of these proposals seem *obvious* in a way that prepare_build_files just isn't, and either one is sufficient to make PEP 517 workable.

(Re: prepare_build_metadata: I basically agree with Donald that this is probably something we will add eventually, and it doesn't matter a lot whether we do it now or later.
I went ahead and dropped it from my version of the proposal because I was trying to simplify to just the core proposal and it's very easy to add later, but really it's the issues around prepare_build_files where there's disagreement.) -n -- Nathaniel J. Smith -- https://vorpus.org
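For reference, the sentinel flow Nathaniel describes might be sketched like this. The helper names beyond build_sdist are illustrative only, and the git check is a made-up example of a dynamic capability test:

```python
import shutil

def vcs_tools_available():
    # Hypothetical check; a real backend might need git on $PATH
    # to enumerate the files that belong in an sdist.
    return shutil.which("git") is not None

def build_sdist(sdist_directory):
    """Backend-side sketch: return the NotImplemented sentinel, rather
    than raising, when sdist building can't work here -- mirroring how
    binary dunder methods signal "not my case"."""
    if not vcs_tools_available():
        return NotImplemented
    return "example-1.0.tar.gz"  # placeholder basename

def frontend_build_sdist_or_fallback(backend, tmp_dir):
    """Frontend-side sketch: fall back only when the sentinel comes
    back; any real exception still propagates as a build error, so
    backend bugs aren't silently swallowed."""
    result = backend.build_sdist(tmp_dir)
    if result is NotImplemented:
        return "fallback"
    return result
```

The `result is NotImplemented` identity check is what keeps an accidental exception (or a bogus return value) from being confused with a deliberate "I can't do this" signal.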
On 5 July 2017 at 03:24, Donald Stufft <donald@stufft.io> wrote:
It occurs to me that your case here is actually a reason *not* to implement this hook. The goal of the hook is that the wheel built from the tree created by copying this file is the same as the wheel built from a sdist created from that same VCS directory. However if you require the VCS tools in order to decide what files to include in the sdist, then you also need those tools to decide what files to copy into the temporary directory. Otherwise you’ll get different outputs.
I was already aware of this concern, and while it's a quality-of-implementation issue for backends to take into account, frontends should proceed on the assumption that backends *will* be consistent in that regard.

Even with the optional hook defined, the simplest approach for backends to take is:

1. Don't implement prepare_input_for_build_wheel
2. Make sure their sdist building requirements are a strict subset of their wheel building requirements
3. Handle out-of-tree builds solely via build_sdist
4. Ensure that out-of-tree builds and in-place builds give the same result (modulo any build reproducibility limitations)

This is the approach we'll need to take for setuptools, for example.

The extra hook enables the path that Thomas wants to take with flit:

1. Implement prepare_input_for_build_wheel (which may be as simple as "copy everything which isn't in a hidden directory", or as complex as running a build dependency graph in reverse to get the full list of input artifacts actually needed by the wheel build process)
2. Implement get_build_sdist_requires to specify the additional dependencies to build an sdist that aren't needed to build a wheel (this also provides a convenient place to run checks with shutil.which() and/or the subprocess module to look for required non-Python dependencies and complain if they're missing)
3. Ensure that in-place builds, "build_sdist -> unpack sdist -> build_wheel" and "prepare_input_for_build_wheel -> build_wheel" give the same result (modulo any build reproducibility limitations)

While it would definitely be useful to have a "check build consistency" tool that built wheel files via all defined paths and then used diffoscope to compare them, having such a tool available wouldn't be a prerequisite for PEP acceptance (just as having auditwheel available wasn't a requirement for accepting the manylinux1 specification).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
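As a rough sketch of the four-step "simplest approach" above, a minimal backend module might look like the following. Hook names follow the draft spec under discussion, but the signatures, return values, and bodies are illustrative placeholders only:

```python
# Skeletal backend following the simplest approach: out-of-tree builds
# go through build_sdist, and no preparation hooks are defined.

def build_sdist(sdist_directory, config_settings=None):
    # Step 3: collect *every* file an out-of-tree wheel build will
    # need, so build_sdist can stand in for a preparation hook.
    ...
    return "mypkg-1.0.tar.gz"  # placeholder archive name

def build_wheel(wheel_directory, config_settings=None,
                metadata_directory=None):
    # Step 4: must produce the same wheel in-place and from an
    # unpacked sdist (modulo reproducibility limitations).
    ...
    return "mypkg-1.0-py3-none-any.whl"  # placeholder wheel name

# Note what is deliberately absent: no prepare_input_for_build_wheel
# (step 1), and no get_requires_for_build_* hooks -- all build
# dependencies are assumed to be declared in pyproject.toml, and the
# sdist requirements are a subset of the wheel requirements (step 2).
```

The two mandatory hooks plus static dependency declarations are enough for a backend like setuptools that treats sdist building as the universal preparation step.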
On Jul 4, 2017, at 11:53 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
While it would definitely be useful to have a "check build consistency" tool that built wheel files via all defined paths and then used diffoscope to compare them, having such a tool available wouldn't be a prerequisite for PEP acceptance (just as having auditwheel available wasn't a requirement for accepting the manylinux1 specification).
I’ve had a niggling feeling about this hook from the beginning, but I couldn’t quite put my finger on it until Nathaniel’s email made me realize it. I feel like this hook is really *only* useful for flit, and for other projects it is largely either going to be completely redundant or be an attractive nuisance that ends up only causing issues. It’s a pretty narrow use case where this hook is both able to do something AND doesn’t have the exact same requirements as build_sdist. When I felt this was a more generic hook, I was OK with it, but I don’t think it’s a good idea now that I’ve thought on it more and it feels entirely ungeneric. — Donald Stufft
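The "check build consistency" tool mentioned upthread could start as something much simpler than diffoscope: comparing the file listings of two wheels built via different paths. This is a minimal sketch of the idea, not an existing tool; a real checker would also compare file contents byte for byte.

```python
import zipfile


def wheel_contents(path):
    """Return the set of file names inside a wheel (a zip archive)."""
    with zipfile.ZipFile(path) as zf:
        return set(zf.namelist())


def check_consistency(wheel_a, wheel_b):
    """Report files present in one wheel but not the other, e.g. a
    wheel built in-place vs. one built via an intermediate sdist."""
    a, b = wheel_contents(wheel_a), wheel_contents(wheel_b)
    return {"only_in_a": a - b, "only_in_b": b - a}
```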
On 5 July 2017 at 15:49, Donald Stufft <donald@stufft.io> wrote:
I’ve had a niggling feeling about this hook from the beginning, but I couldn’t quite put my finger on it until Nathaniel’s email made me realize it. I feel like this hook is really *only* useful for flit, and for other projects it is largely either going to be completely redundant or be an attractive nuisance that ends up only causing issues. It’s a pretty narrow use case where this hook is both able to do something AND doesn’t have the exact same requirements as build_sdist.
When I felt this was a more generic hook, I was OK with it, but I don’t think it’s a good idea now that I’ve thought on it more and it feels entirely ungeneric.
I don't think Thomas's plans for it are unusual, as it's normal for a build system to only be aware of the input files that are actually referenced by the build recipe, and also normal for published source archives to include additional files that *aren't* used by the build process for the binary artifacts.

If you'd prefer some external validation for the concept, I see the "prepare_input_for_build_wheel" hook as fairly analogous to the "%prep" phase in the process of building an RPM: https://fedoraproject.org/wiki/How_to_create_an_RPM_package#.25prep_section

The current difference is that we expect backends to be able to cope with frontends *not* calling that implicitly when building from an unpacked sdist.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 5, 2017, at 2:02 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 5 July 2017 at 15:49, Donald Stufft <donald@stufft.io> wrote:
I’ve had a niggling feeling about this hook from the beginning, but I couldn’t quite put my finger on it until Nathaniel’s email made me realize it. I feel like this hook is really *only* useful for flit, and for other projects it is largely either going to be completely redundant or be an attractive nuisance that ends up only causing issues. It’s a pretty narrow use case where this hook is both able to do something AND doesn’t have the exact same requirements as build_sdist.
When I felt this was a more generic hook, I was OK with it, but I don’t think it’s a good idea now that I’ve thought on it more and it feels entirely ungeneric.
I don't think Thomas's plans for it are unusual, as it's normal for a build system to only be aware of the input files that are actually referenced by the build recipe, and also normal for published source archives to include additional files that *aren't* used by the build process for the binary artifacts.
Except in this case the build system and the thing that builds the binary artifacts are one and the same, so it needs to know both of those things. This hook is only useful if you have two different mechanisms for declaring those types of files *AND* the requirements for the additional files impose some other non-Python-level prerequisites on the system that could maybe be avoided.

Quite literally, the only case I can think of that fits into this is flit's "I will use git to figure out additional files, but you will have to configure in a static file the name of the Python package (as in import package) that you're distributing and I'll just glom down that entire package directory". I know setuptools/distutils doesn't work this way, nor do any of the plugins that I am aware of. As best I can tell, neither numpy.distutils nor enscons does either.

Looking at https://github.com/takluyver/flit/blob/master/flit/sdist.py (added in https://github.com/takluyver/flit/pull/106) it appears that flit *also* doesn't work this way: when it builds the sdist today it looks at the VCS tracking and includes any file mentioned in the VCS directory. So unless flit changes its logic so it only uses "is this file in the VCS" for non-installed files, it also can't use this hook without the VCS tools installed and guarantee that the output is the same. So not only is this a hook that I think is only going to be used by flit, but flit itself can't even use it right now without changing how it generates sdists to no longer reference the VCS for which installable files get added.

The more I dig into this, the more I think Nathaniel is correct: we're trying to add a hook, without any real-world experience guiding its inclusion, that doesn't actually solve the problem it's trying to solve and whose primary use case is creating foot guns.
If you'd prefer some external validation for the concept, I see the "prepare_input_for_build_wheel" hook as fairly analogous to the "%prep" phase in the process of building an RPM: https://fedoraproject.org/wiki/How_to_create_an_RPM_package#.25prep_section
The current difference is that we expect backends to be able to cope with frontends *not* calling that implicitly when building from an unpacked sdist.
I don’t think this is external validation at all; the only similarity I see between the two hooks is that they both reference the concept of “preparing” in their names. The primary use case of RPM’s %prep hook is running tar to unpack a tarball and running patch on the resulting unpacked tree (in fact, this is so common that there is a %autosetup which basically does exactly that). It’s really an entirely different hook from what we’re discussing here. — Donald Stufft
On Jul 5, 2017, at 11:19 AM, Donald Stufft <donald@stufft.io> wrote:
The more I dig into this, the more I think Nathaniel is correct: we’re trying to add a hook, without any real-world experience guiding its inclusion, that doesn’t actually solve the problem it’s trying to solve and whose primary use case is creating foot guns.
Thinking about this more, I think the right thing to do here is remove this from the standard for now, and wait until we have real-world experience with pip using the sdist hook for this purpose. We don’t know what the landscape is going to look like once this PEP has been out in the real world for a bit, and we’re just trying to guess. I suspect one of three scenarios will play out:

1) We’ll decide that the build_sdist hook is good enough, and we’ll just leave it as is and not feel the need to do anything further.

2) We’ll decide that there are cases where the build_sdist hook doesn’t solve the problem adequately, and we want to add an additional hook with different constraints.

3) Pip will decide that the landscape in a post-PEP-517 world has changed significantly enough to revisit our decision about how we build projects, in a way that removes the need for a special hook in general.

Without real-world experience, we don’t really know which of (1, 2, 3) will be the optimal solution to this problem. Removing the hook *does* implicitly choose (1) for now, but (1) can easily be changed to (2) or (3) later on, whereas (2) and (3) cannot be changed to (1). So doing the simpler thing first gives us flexibility to adjust our solution once we have a chance to see how the ecosystem adjusts to this brave new world. — Donald Stufft
On 5 July 2017 at 16:19, Donald Stufft <donald@stufft.io> wrote:
Quite literally, the only case I can think of that fits into this is flit’s “I will use git to figure out additional files, but you will have to configure in a static file the name of the Python package (as in import package) that you’re distributing and I’ll just glom down that entire package directory”. I know setuptools/distutils doesn’t work this way nor do any of the plugins that I am aware of. Best I can tell numpy.distutils nor enscons.
I have to say I still have deep reservations about flit's approach of assuming/requiring that you're using a VCS (git) to maintain your project. I know that in practical terms most people will be, but it still seems like a strong assumption to make. One of the consequences is that flit doesn't handle scenarios like "I unpacked a sdist" or "I downloaded the project archive from github and unpacked that" well. And the result of *that* is that we're putting mechanisms into the PEP to manage that approach.

Having said that, I think flit is a great project, and I don't think that my personal dislike of one specific design choice is particularly relevant here. Also, I expect flit to be an important backend, simply because it makes it dirt-simple to package up a pure Python project. So I do think that the flit use case is important for PEP 517.

One thought - at the moment, all of the debate seems to be over the PEP side of things. That's not surprising, as distutils-sig is for debating standards, not tool design. But are there any changes that might make sense for flit that could improve things? For example, add some fallback mechanisms to flit that mean that it *can* always build a sdist, even if it has to make guesses in the absence of a VCS (if there's no VCS, include everything; or if there's no VCS, only include the minimum needed to build the wheel - both seem reasonable choices, and either seems better than "refuse to build a sdist"). Paul
On Wed, Jul 5, 2017, at 05:08 PM, Paul Moore wrote:
is that flit doesn't handle scenarios like "I unpacked a sdist" or "I downloaded the project archive from github and unpacked that" well.
Flit handles these fine for everything *apart* from making an sdist. It can make a wheel, install the package, or symlink it as a development install. Hence why I'm so frustrated by the insistence that we must make an sdist when we have no need for an sdist. It's not a compromise I'm entirely happy with, but all the other options that we came up with had bigger problems, IMO.
On 5 July 2017 at 17:14, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Wed, Jul 5, 2017, at 05:08 PM, Paul Moore wrote:
is that flit doesn't handle scenarios like "I unpacked a sdist" or "I downloaded the project archive from github and unpacked that" well.
Flit handles these fine for everything *apart* from making an sdist. It can make a wheel, install the package, or symlink it as a development install. Hence why I'm so frustrated by the insistence that we must make an sdist when we have no need for an sdist.
It's not a compromise I'm entirely happy with, but all the other options that we came up with had bigger problems, IMO.
Apologies, I should have been clearer - I did indeed mean "in order to produce a sdist". I personally consider producing sdists and producing wheels to be the two fundamental responsibilities of a build system, so I dispute your statement that "we have no need for a sdist" (it's not just pip that needs to do it, tox for example also relies on building sdists). But this is off topic, and you've made your choice for flit, so I'll say no more. Paul
On Wed, Jul 5, 2017 at 9:14 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Wed, Jul 5, 2017, at 05:08 PM, Paul Moore wrote:
is that flit doesn't handle scenarios like "I unpacked a sdist" or "I downloaded the project archive from github and unpacked that" well.
Flit handles these fine for everything *apart* from making an sdist. It can make a wheel, install the package, or symlink it as a development install. Hence why I'm so frustrated by the insistence that we must make an sdist when we have no need for an sdist.
It's not a compromise I'm entirely happy with, but all the other options that we came up with had bigger problems, IMO.
What do you think of the compromise in the draft that I posted at the beginning of this thread? The idea there is that flit would be responsible for providing the operations "build an sdist (or say that it can't)" and "build a wheel", and then if pip tries to build an sdist and flit tells it that it can't, it's pip's problem to figure out how it wants to handle that. Of course you'll still probably want to argue with the pip devs about how they handle this, and whether they can support in-place builds, etc., but at least those arguments stop being blockers for PEP 517. -n -- Nathaniel J. Smith -- https://vorpus.org
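The compromise Nathaniel describes puts the fallback decision on the frontend's side. A hedged sketch of what that could look like, with an invented frontend helper (the hook signatures here are illustrative, not the final PEP 517 interface):

```python
def build_wheel_via_frontend(backend, sdist_dir, wheel_dir):
    """Try the sdist -> wheel path first; if the backend signals that
    it can't build an sdist (by returning NotImplemented), fall back
    to whatever policy the frontend chooses -- here, building the
    wheel directly from the source tree."""
    sdist = backend.build_sdist(sdist_dir)
    if sdist is NotImplemented:
        # The backend said "I can't"; it's now the frontend's problem
        # to decide how to proceed.
        return ("direct", backend.build_wheel(wheel_dir))
    # Normal path: unpack the sdist and build the wheel from it
    # (the unpack step is elided in this sketch).
    return ("via-sdist", backend.build_wheel(wheel_dir))
```

The point of the NotImplemented return value, as opposed to an exception, is that it is an unambiguous "this operation is unsupported" signal rather than an error.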
On 6 July 2017 at 07:45, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Jul 5, 2017 at 9:14 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Wed, Jul 5, 2017, at 05:08 PM, Paul Moore wrote:
is that flit doesn't handle scenarios like "I unpacked a sdist" or "I downloaded the project archive from github and unpacked that" well.
Flit handles these fine for everything *apart* from making an sdist. It can make a wheel, install the package, or symlink it as a development install. Hence why I'm so frustrated by the insistence that we must make an sdist when we have no need for an sdist.
It's not a compromise I'm entirely happy with, but all the other options that we came up with had bigger problems, IMO.
What do you think of the compromise in the draft that I posted at the beginning of this thread? The idea there is that flit would be responsible for providing the operations "build an sdist (or say that it can't)" and "build a wheel", and then if pip tries to build an sdist and flit tells it that it can't, it's pip's problem to figure out how it wants to handle that. Of course you'll still probably want to argue with the pip devs about how they handle this, and whether they can support in-place builds, etc., but at least those arguments stop being blockers for PEP 517.
Along those lines, I realised there's a variant of your "return NotImplemented" proposal that I actually like: we can actively encourage backends like flit that have additional requirements for building sdists to check for those external dependencies in `get_requires_for_build_sdist` and raise an exception if they're missing. That is, the "get_requires_*" hooks would have a dual responsibility:

- fail outright if external dependencies are missing
- otherwise report any Python level dependencies that the backend wants the frontend to install

That would give us the following situation:

- as long as the build environment has the relevant VCS tools available, sdist-based builds for flit will "just work"
- plenty of build environments are already going to have VCS tools routinely available anyway, and if they don't, adding them often won't be that big a deal
- for the "unpacked sdist that the frontend doesn't know is an unpacked sdist" case, a backend like flit can use either PKG-INFO or else its own custom sdist marker file to detect unpacked sdist directories and just tar them back up to create a fresh sdist for the frontend to unpack
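The dual-responsibility pattern above could be sketched like this (an illustrative example, not flit's actual implementation; the git requirement stands in for whatever external tool a backend needs):

```python
import shutil


def get_requires_for_build_sdist(config_settings=None):
    """Dual responsibility: fail outright when an external (non-Python)
    dependency is missing, otherwise report any Python-level
    requirements for the frontend to install."""
    if shutil.which("git") is None:
        raise RuntimeError(
            "cannot build an sdist: git is required to enumerate "
            "the files to include"
        )
    return []  # no extra Python-level sdist requirements in this sketch
```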
From a frontend evolution perspective, we'd then be anticipating one or the other of the following outcomes:
- pip gains a --build-strategy option to choose between 1) sdist-based out-of-tree builds (the default); 2) copytree-based out-of-tree builds; 3) in-place incremental builds
- we eventually decide to revise the backend interface to add back a non-sdist based build tree preparation hook

That frontend-centric `--build-strategy sdist|copytree|in-place` idea is starting to sound to me like it may be a better way to address both Thomas's concern about still being able to build a wheel when the requirements for creating an sdist aren't met ("--build-strategy copytree" or "--build-strategy in-place") and Nathaniel's concern about making it easy to share object files between successive builds ("--build-strategy in-place").

Recommending a frontend option like that also has a significant added benefit over the optional backend hook: it better abides by the principle of "In the face of ambiguity, refuse the temptation to guess" (with the ambiguity in this case being "What behaviour will people want if a frontend uses sdist-based out-of-tree wheel builds by default, but building the sdist fails in a situation where building the wheel directly would succeed?")

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
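The three strategies reduce to a simple dispatch on the frontend side. A hedged sketch, assuming the hypothetical `--build-strategy` option (which is not an actual pip flag):

```python
import shutil
import tempfile


def prepare_build_tree(source_dir, strategy="sdist"):
    """Prepare the directory a wheel build will run in, according to
    the hypothetical --build-strategy choice."""
    if strategy == "in-place":
        # Incremental builds happen directly in the source tree.
        return source_dir
    build_dir = tempfile.mkdtemp(prefix="pep517-build-")
    if strategy == "copytree":
        # Out-of-tree build from a plain copy of the source tree.
        shutil.copytree(source_dir, build_dir, dirs_exist_ok=True)
        return build_dir
    if strategy == "sdist":
        # Default: ask the backend for an sdist and unpack it into
        # build_dir (the backend call is elided in this sketch).
        return build_dir
    raise ValueError("unknown build strategy: %s" % strategy)
```

Keeping this choice in the frontend means the backend interface stays small, at the cost of the frontend knowing nothing about which files the build actually needs.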
Thank-you all for the discussion and the attempts to accommodate flit, but I'll bow out now. It's become clear that the way flit approaches packaging is fundamentally incompatible with the priorities other people have for the ecosystem. Namely, I see sdists as archival artifacts to be made approximately once per release, but the general trend is to make them a key part of the build pipeline. Making a guerilla tool with no concern for integration was fun. It became frustrating as people began to use it and expected it to play well with other tools, so I jumped on PEP 517 as a way to bring it into the fold. That didn't work out, and a tool that doesn't play well with pip can only be an attractive nuisance at best, even if it technically complies with the relevant specs. Flit is therefore deprecated, and I recommend anyone using it migrate back to setup.py packaging. Best wishes, Thomas
On Thu, Jul 6, 2017 at 8:57 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Thank-you all for the discussion and the attempts to accommodate flit, but I'll bow out now. It's become clear that the way flit approaches packaging is fundamentally incompatible with the priorities other people have for the ecosystem. Namely, I see sdists as archival artifacts to be made approximately once per release, but the general trend is to make them a key part of the build pipeline.
For the record: your view makes perfect sense to me, and is conceptually cleaner than the one that PEP 517 in its current form prefers.

Making a guerilla tool with no concern for integration was fun. It became frustrating as people began to use it and expected it to play well with other tools, so I jumped on PEP 517 as a way to bring it into the fold. That didn't work out, and a tool that doesn't play well with pip can only be an attractive nuisance at best, even if it technically complies with the relevant specs.
Flit is therefore deprecated, and I recommend anyone using it migrate back to setup.py packaging.
I hope you'll reconsider that deprecation - flit is one of only two (AFAIK) active attempts at making a saner build tool (enscons being the other one), and does have real value I think. Either way, thanks for all the effort you put in! Ralf
On 6 July 2017 at 11:26, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I hope you'll reconsider that deprecation - flit is one of only two (AFAIK) active attempts at making a saner build tool (enscons being the other one), and does have real value I think.
Agreed. In spite of the fact that I've been part of the pushback you've had over flit's approach, I nevertheless feel that flit is a major step forward in providing a user-friendly project packaging tool for Python. Even if you don't wish to continue developing flit, I hope that someone takes up the reins and continues to develop the ideas introduced with flit, of having a straightforward, "do the simplest thing needed" approach to packaging the majority of projects.
Either way, thanks for all the effort you put in!
Agreed, and my apologies for any contribution my feedback may have made to the pressure you felt that led you to this decision. Paul
On Thu, Jul 6, 2017, at 11:55 AM, Paul Moore wrote:
On 6 July 2017 at 11:26, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I hope you'll reconsider that deprecation - flit is one of only two (AFAIK) active attempts at making a saner build tool (enscons being the other one), and does have real value I think.
Agreed. In spite of the fact that I've been part of the pushback you've had over flit's approach, I nevertheless feel that flit is a major step forward in providing a user-friendly project packaging tool for Python.
Thanks both, and Matthias. I'd reconsider it if I could see a reliable way to support pip installing from a local directory. But at present, it seems unavoidable that pip will require building an sdist, and I can't see a sufficiently reliable way for flit to do that. I compromised on requiring a VCS to build an sdist for release, but I consider that an unacceptable restriction for installing from source. Flit could cheat and build a partial sdist for pip to unpack and build a wheel from, but that becomes a problem if other tools use the hook to generate an sdist for release. So I see no good options for flit to be a good backend, and trying to argue for the spec to be something I can work with is exhausting. Thomas
On Jul 6, 2017, at 6:55 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 6 July 2017 at 11:26, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I hope you'll reconsider that deprecation - flit is one of only two (AFAIK) active attempts at making a saner build tool (enscons being the other one), and does have real value I think.
Agreed. In spite of the fact that I've been part of the pushback you've had over flit's approach, I nevertheless feel that flit is a major step forward in providing a user-friendly project packaging tool for Python. Even if you don't wish to continue developing flit, I hope that someone takes up the reins and continues to develop the ideas introduced with flit, of having a straightforward, "do the simplest thing needed" approach to packaging the majority of projects.
I agree completely.
Either way, thanks for all the effort you put in!
Agreed, and my apologies for any contribution my feedback may have made to the pressure you felt that led you to this decision.
Likewise. — Donald Stufft
On Jul 6, 2017, at 6:26 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Jul 6, 2017 at 8:57 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote: Thank-you all for the discussion and the attempts to accommodate flit, but I'll bow out now. It's become clear that the way flit approaches packaging is fundamentally incompatible with the priorities other people have for the ecosystem. Namely, I see sdists as archival artifacts to be made approximately once per release, but the general trend is to make them a key part of the build pipeline.
For the record: your view makes perfect sense to me, and is conceptually cleaner than the one that PEP 517 in its current form prefers.
The fundamental problem here is that sdists *are* a key part of the build pipeline and are always going to be, unless pip stops supporting sdists altogether. I think it is a complete non-starter to suggest removing installation-from-sdist support from pip (particularly since doing so would immediately lose support for every platform but Windows, macOS, and many common Linux distributions - and not even all of those!).

Given that (IMO) we can't change the fact that sdists are a key part of the pipeline of going from a directory on some developer's machine to installed on some user's machine, the question then becomes: do we want to try and push things towards only having *one* primary flow through the state machine of Python's packaging, or do we want to support transitions that allow you to "skip" steps?

My opinion is (obviously, at this point, I think) that the fewer ways we have to go from some directory on some developer's machine to installed on some user's machine, the more robust the entire ecosystem becomes, and the less likely we are to end up with projects getting weird packaging-related bugs that depend on some specific mechanism of installation. It's a systematic solution to a problem that crops up over and over and over again, on project after project, where ensuring it otherwise can be a lot of work [1].

[1] For reference, trying to systematically solve the problem in one project involved about a week's worth of effort for me, trying different things to get the packaging into a state where it could reliably be tested end to end in a way that is unlikely (but not impossible!) to vary depending on whether someone installs in develop mode, from an sdist, from a wheel, or from a local directory, etc. Some of that pain was due to distutils/setuptools, some of it was due to Python itself, and some of it was just inherent in the fact that when you have a combinatorial number of installation paths, variation is inevitable.
My view is if fixing a project’s packaging bug took me a ~week, someone less steeped in lore is going to have a *really* rough time of it. — Donald Stufft
On 6 July 2017 at 15:54, Donald Stufft <donald@stufft.io> wrote:
The fundamental problem here is that sdists *are* a key part of the build pipeline and are always going to be unless pip stops supporting sdists all together. I think it is a complete non-starter to suggest removing installation from sdist support from pip (particularly since it would immediately lose support for every platform but Windows, MacOS and many common Linux’s (but not all of them!).
I wonder how true this is. Certainly the route "acquire sdist -> unpack -> build wheel -> install" is a fundamental route, as is "acquire wheel -> install". But as Nick pointed out, the awkward cases are all in the *other* area, which is "get a random source tree -> ??? -> install". That's where all the debate about isolated builds, incremental compiles, etc. occurs. We've been focusing on the sdist as a means of copying trees, and maybe that makes it feel like sdists are more fundamental than they need to be. The fundamental operation is really "copy this arbitrary source tree safely".

I'm not sure I have a solution here, but as a starting point maybe we need to conceptually separate source trees into "publishing trees" (ones that the backend is capable of building a sdist from) and "build trees" (ones that the backend only supports building wheels from). Whether unpacking a sdist gives a publishing tree or a build tree is backend-defined (setuptools says yes, flit says no), but frontends need to deal with the distinction cleanly. Isolated and incremental build questions are answered differently depending on whether you have a publishing tree or a build tree. And some of those questions prompt the need for copying trees (or at least creating equivalent build trees from whatever you have). For a publishing tree, "make sdist and unpack" works, but that's not possible for a build tree.

There's also the fact that tox uses sdists to populate its environments. But bluntly, that's tox's problem, not distutils-sig's. How tox handles flit-based projects is a different question, and we don't really have the relevant experts present here to answer it. The same is true of any *other* potential consumers of PEP 517 backends, such as hypothetical "unified sdist builders". I'm inclined to say that we shouldn't even try to consider these, but should limit PEP 517 to the pip (or equivalent) <-> backend interface. Future PEPs can expand the interface as needed.
I don't know if any of this helps. If not, that's fine (it at least helped me to clarify my thinking about source trees and sdists). But I'm posting it in case it prompts any new insights. Paul
On Jul 6, 2017, at 11:36 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 6 July 2017 at 15:54, Donald Stufft <donald@stufft.io> wrote:
The fundamental problem here is that sdists *are* a key part of the build pipeline and are always going to be unless pip stops supporting sdists all together. I think it is a complete non-starter to suggest removing installation from sdist support from pip (particularly since it would immediately lose support for every platform but Windows, MacOS and many common Linux’s (but not all of them!).
I wonder how true this is. Certainly the route "acquire sdist -> unpack -> build wheel -> install" is a fundamental route, as is "acquire wheel -> install". But as Nick pointed out, the awkward cases are all in the *other* area, which is "get a random source tree -> ??? -> install". That's where all the debate about isolated builds, incremental compiles, etc, occur. We've been focusing on the sdist as a means of copying trees, and maybe that makes it feel like sdists are more fundamental than maybe they need to be. The fundamental operation is really "copy this arbitrary source tree safely”.
By saying that they are a key part, I mean we can't ever (reasonably) stop supporting that route, so our options for those "other" areas are either to try to push them onto that same route, or to just say that multiple routes are a thing and we need to deal with that fact and support multiple routes.

My rationale for reusing sdists for "copy this arbitrary source tree safely" is that if you have a separate hook for "make this sdist, which we will eventually build a wheel from" and "copy this arbitrary source tree, which we will eventually build a wheel from", then you are *going* to end up with variations where VCS -> Sdist -> Wheel -> Installed produces a different result than VCS -> Wheel -> Installed. The best we can hope for in that hypothetical is that the variations are minor enough that they don't generally cause problems.

The most common problems in this area are going to come from disparity between the list of files that get added to a sdist and the list of files that get installed. We can see this today with setuptools/distutils, where MANIFEST.in controls what gets added to the sdist, but packages, py_modules, package_data, etc. control what gets installed. This is *NOT*, however, a problem unique to setuptools/distutils; for instance, it appears the same issue can occur with enscons if the list of files you pass into env.Whl includes files that you accidentally left out of env.SDist. When I looked at flit, it also suffered the same problem if you forgot to commit a file to the VCS repository (which meant it wouldn't get added to the sdist) but that file would still be included in any wheels created from that directory.
That’s not the only case though. Even if you get a build backend that is absolutely perfect about ensuring that a wheel created from a VCS directory will be close enough to a wheel created from a sdist that was created from that VCS directory, you still have the fact that you can have extra files sitting in those directories that aren’t getting included. This would show up in cases like a ``-e .`` install (which, to be fair, this PEP doesn’t touch, but it’s still something to keep in mind), or even just the common case where you try to run your thing in a virtual environment but ``.`` is first on sys.path, so you end up importing the copy that is sitting in your VCS checkout (and thus has that extra file).

Can each backend strive to implement this correctly and solve the problem that way? Yes, absolutely. However, in reality good intentions don’t work, and these issues are going to crop up in each backend and have to get resolved in each backend. Maybe the number of backends will be so small that this isn’t that big of a deal. However, using sdists here is a pragmatic, systematic solution that completely sidesteps the entire class of problems for most cases (sans -e ., unfortunately).
I'm not sure I have a solution here, but as a starting point maybe we need to conceptually separate source trees into "publishing trees" (ones that the backend is capable of building a sdist from) and "build trees" (ones that the backend only supports building wheels from). Whether unpacking a sdist gives a publishing tree or a build tree is backend-defined (setuptools says yes, flit says no) but frontends need to deal with the distinction cleanly.
Isolated and incremental build questions are answered differently depending on whether you have a publishing or a build tree. And some of those questions prompt the need for copying trees (or at least creating equivalent build trees from whatever you have). For a publishing tree, "make sdist and unpack" works, but that's not possible for a build tree.
This is similar to something I said above, I think: I would be happy adding an official marker to the inside of an sdist (similar to the .dist-info/WHEEL file) that could be used to generically determine whether something is an unpacked sdist. In that case, if we ran build_sdist inside an unpacked sdist and it returned NotImplemented, we could fall back to just copying the tree (or building in place, if that's what a frontend wanted to do).
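The fallback described above can be sketched from the frontend side. This is only an illustration: ``build_tree`` and its signature are invented here, and the real PEP 517 hooks take slightly different arguments (the sketch just assumes a backend object whose ``build_sdist`` either produces a tarball in ``work_dir`` or returns ``NotImplemented``):

```python
import os
import shutil
import tarfile

def build_tree(backend, source_dir, work_dir):
    """Produce a directory suitable for building a wheel from,
    preferring the sdist route, with a copy-the-tree fallback."""
    sdist = backend.build_sdist(source_dir, work_dir)
    if sdist is NotImplemented:
        # Expected failure: the backend can't make an sdist here (e.g.
        # no VCS metadata in an unpacked sdist), so just copy the tree.
        dest = os.path.join(work_dir, "build-tree")
        shutil.copytree(source_dir, dest)
        return dest
    # Otherwise unpack the sdist and build from that.
    dest = os.path.join(work_dir, "unpacked")
    with tarfile.open(os.path.join(work_dir, sdist)) as tf:
        tf.extractall(dest)
    return dest
```

The key point is that ``NotImplemented`` is an in-band "unsupported here" marker, distinct from an exception, which would signal a genuine error.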
There's also the fact that tox uses sdists to populate its environments. But bluntly, that's tox's problem, not distutils-sig's. How tox handles flit-based projects is a different question, that we don't really have the relevant experts present here to answer. The same is true of any *other* potential consumers of PEP 517 backends such as hypothetical "unified sdist builders". I'm inclined to say that we shouldn't even try to consider these, but should limit PEP 517 to the pip (or equivalent) <-> backend interface. Future PEPs can expand the interface as needed.
I mean, I think they are our problems too. The ecosystem is made better by the fact tox exists and considering our impact there is important. That doesn’t mean that we should bend over backwards to contort the PEP to fit tox but I also don’t think we should dismiss it out of hand as someone else’s problem.
I don't know if any of this helps. If not, that's fine (it at least helped me to clarify my thinking about source trees and sdists). But I'm posting it in case it prompts any new insights.
Paul
— Donald Stufft
On Thu, Jul 6, 2017 at 10:57 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Thank-you all for the discussion and the attempts to accommodate flit, but I'll bow out now. It's become clear that the way flit approaches packaging is fundamentally incompatible with the priorities other people have for the ecosystem. Namely, I see sdists as archival artifacts to be made approximately once per release, but the general trend is to make them a key part of the build pipeline.
Thanks Thomas for all the effort you've put into flit, the reviving of this PEP, and the draft implementation you made. I'm sad to see you move away from this discussion, but I can understand that it can be exhausting, with the large number of exchanges and having twice been almost done.
Making a guerilla tool with no concern for integration was fun. It became frustrating as people began to use it and expected it to play well with other tools, so I jumped on PEP 517 as a way to bring it into the fold. That didn't work out, and a tool that doesn't play well with pip can only be an attractive nuisance at best, even if it technically complies with the relevant specs.
Flit is therefore deprecated, and I recommend anyone using it migrate back to setup.py packaging.
This makes me sad as well. Honestly, flit is the only tool I can remember how to use without having to look up information online, and it was really pleasant to use. As long as it still works as-is I might continue publishing packages with it, even if it's wheel-only. It would have been nice to have some integration with pip, though, without too much complexity for you. I join Ralf in both his comments, and also hope the PEP will get back to a state where you'll consider un-deprecating flit and reintegrating your work. Thanks, -- Matthias
Best wishes, Thomas
On 6 July 2017 at 18:57, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Thank-you all for the discussion and the attempts to accommodate flit, but I'll bow out now. It's become clear that the way flit approaches packaging is fundamentally incompatible with the priorities other people have for the ecosystem.
While I can completely understand how the current debate over whether or not the prepare_input_for_build_wheel hook is necessary would make you feel that way, I hope I can convince you that we're really just quibbling over a genuinely trivial arcane technical detail that I'd never let get in the way of flit being a full-fledged participant in the Python packaging ecosystem.

So I'll be completely clear: if *you* want that hook to be part of the API, that's sufficient reason for me to approve keeping it in the specification.

However, I'll also make one last attempt at explaining why we suspect it may actually be redundant, especially if some additional configuration settings were added to pip.

(I'll also note that I have an entirely selfish reason for ensuring that you're happy to continue active development on flit, which is that I've found it to be a delight to use for my own pure Python projects)
Namely, I see sdists as archival artifacts to be made approximately once per release, but the general trend is to make them a key part of the build pipeline.
Sort of.
From my point of view, the single most critical build path is:
1. Publisher uploads an sdist to PyPI
2. Consumers and redistributors do their own from-source builds targeting the binary format of their choice

That's the path that enables the "publish my stuff once, let redistributors deal with getting that source code into arbitrary binary formats" dream: http://www.curiousefficiency.org/posts/2016/09/python-packaging-ecosystem.ht...

However, it's closely followed by the beginner-friendly:

1. Publisher uploads universal wheels or wheels for popular platforms to PyPI
2. Consumers use the appropriate wheel rather than doing their own builds

Your relative priorities for those two paths are presumably the other way around, and that's fine.

The only impact the current debate (over whether or not to include a dedicated out-of-tree input preparation hook) has on either of those paths relates to how much implicit testing the sdist-building hook gets when developers do their own local builds (if you're not using something like tox for your local testing, it's otherwise fairly easy to inadvertently publish sdists that don't actually include all the files they need to successfully build a wheel file).

That is, the current point of contention is specifically about how we want tools to behave when we're starting with a source directory that:

1. Doesn't include VCS metadata (e.g. it's been exported as a tarball rather than cloned)
2. The build frontend doesn't want to use as the basis for an in-place build
3. The build frontend doesn't want to blindly copy into a separate build directory

So just by way of those preconditions, we're already well outside the most common package installation workflows. Now, I'm personally entirely OK with backend developers saying that dealing with that scenario is entirely a frontend problem, since the problem is in large part *created* by the frontend's refusal to either do an in-place build, or just blindly copy the entire directory.
It's a reasonable stance to take, and frontends and their users are in the best position to know how they want to handle it for their particular use case. That perspective is embodied in the hypothetical proposal to add a "--build-strategy" option to pip that would allow folks building wheels to choose between:

- creating and unpacking an sdist and building a wheel from that
- copying the directory tree and building a wheel from that
- building a wheel directly from the original directory

(Perhaps with a variant that tries to create and unpack the sdist first, and only if that fails falls back to copying the entire tree)
From a backend point of view, developers would then only need to worry about two cases:
- given a directory, either create an sdist, or fail and report why it didn't work
- given a directory, either create a wheel, or fail and report why it didn't work

The rest of the UX becomes a frontend developer concern.

However, if you don't find that perspective compelling, and would strongly prefer to include a way to let frontends skip building an sdist just to do an out-of-tree wheel build, then I *don't* think that's a barrier to accepting the PEP. It would just be up to frontends how they wanted to prioritise that hook:

- always use it in preference to build_sdist if present
- use it as a fallback if build_sdist fails
- never use it (for frontends that specifically want the sdist)

The benefit I see to that approach is that it means that backend developers can clearly communicate the difference between what's required to make a source archive that's good enough to later build the wheel archive, and what's required to actually capture all the files that the publisher wants to publish.
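The hypothetical "--build-strategy" dispatch above can be sketched as frontend-side logic. The strategy names follow the discussion, not any implemented pip flag, and the backend object and its hook signatures are simplified for illustration:

```python
import os
import shutil
import tarfile

def build_wheel_with_strategy(backend, source_dir, work_dir, strategy):
    """Build a wheel using one of the proposed strategies."""
    if strategy == "direct":
        # Build straight from the original directory.
        return backend.build_wheel(source_dir, work_dir)
    if strategy == "copy":
        # Blindly copy the tree, then build from the copy.
        tree = os.path.join(work_dir, "copy")
        shutil.copytree(source_dir, tree)
        return backend.build_wheel(tree, work_dir)
    if strategy == "sdist":
        # Create an sdist, unpack it, and build from the unpacked tree.
        sdist = backend.build_sdist(source_dir, work_dir)
        tree = os.path.join(work_dir, "unpacked")
        with tarfile.open(os.path.join(work_dir, sdist)) as tf:
            tf.extractall(tree)
        return backend.build_wheel(tree, work_dir)
    if strategy == "sdist-or-copy":
        # The suggested variant: try the sdist route, fall back to a copy.
        try:
            return build_wheel_with_strategy(backend, source_dir,
                                             work_dir, "sdist")
        except Exception:
            return build_wheel_with_strategy(backend, source_dir,
                                             work_dir, "copy")
    raise ValueError("unknown build strategy: %s" % strategy)
```

Under this split, the backend only ever sees the two cases listed above (make an sdist or fail; make a wheel or fail), and all the strategy UX lives in the frontend.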
Making a guerilla tool with no concern for integration was fun. It became frustrating as people began to use it and expected it to play well with other tools, so I jumped on PEP 517 as a way to bring it into the fold. That didn't work out, and a tool that doesn't play well with pip can only be an attractive nuisance at best, even if it technically complies with the relevant specs.
Flit is therefore deprecated, and I recommend anyone using it migrate back to setup.py packaging.
I stated this above, but it bears repeating: as the one currently offering to do the work for PEP 517, you *do* have the right to say that your willingness to make that contribution is contingent on the "prepare_input_for_build_wheel" hook being included in the design. While I see merit in the arguments raised against it, I'm personally OK with it, and ultimately, that's part of why we have the BDFL-Delegate system: so we don't lose potentially major contributions over details that ultimately don't matter all that much. The idea is for folks to end up mad at *me* for any such decisions that they don't like, rather than at the folks offering to volunteer their time to help improve the ecosystem for everyone (in this case, you). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 6, 2017, at 10:38 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
if you're not using something like tox for your local testing, it's otherwise fairly easy to inadvertently publish sdists that don't actually include all the files they need to successfully build a wheel file
Even if you *are* using tox, it is super easy to do this: because of the way Python's import semantics work, it's incredibly easy to run your tests against the version that is sitting in your local directory instead of the version that tox just installed into the virtual environment. If you use something like a top-level ``tests/`` directory alongside your ``foobar/`` directory, this becomes entirely unavoidable. To do this correctly requires moving your ``foobar/`` directory down a level into a ``src/`` directory, leaving the ``tests/`` directory at the top level, and then using something like setup()'s package_dir to deal with that change. This of course then breaks other things like coverage.py, where you then need to spend a bit of effort configuring coverage.py to understand that the code you're running is going to be inside of a virtual environment in tox, and not in your local directory.

There's a lot of history to unpack in these PRs, and it's not really required reading, but if you feel like diving into this more, you can see me trying to do everything I could to avoid the above mess on the cryptography project, and eventually giving up and just dealing with the src/ directory, at:

* https://github.com/pyca/cryptography/pull/1468
* https://github.com/pyca/cryptography/pull/1469
* https://github.com/pyca/cryptography/pull/1470

It is *really* hard to test that your package works when installed; it requires ensuring that a fairly arcane set of circumstances never changes, circumstances which are completely non-obvious in how they affect any of this. I suspect that the vast bulk of projects using tox are *not* actually testing against the installed sdist but are instead testing the local copy sitting in ``.``.

— Donald Stufft
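The src/ layout Donald describes can be sketched as a minimal setup.py. This is illustrative only: "foobar" is a placeholder project name, and the layout shown in the comments is the assumed directory structure, not anything mandated by the thread:

```python
# Assumed layout:
#   setup.py
#   tests/              <- stays at the top level, never importable as a package
#   src/
#       foobar/         <- the actual package; "." on sys.path no longer
#           __init__.py    shadows the installed copy
from setuptools import setup, find_packages

setup(
    name="foobar",
    version="1.0",
    package_dir={"": "src"},        # root package namespace lives under src/
    packages=find_packages("src"),  # discover foobar/ inside src/
)
```

With this layout, running tests from the project root imports the copy tox installed into the virtualenv rather than the working tree, which is exactly the property Donald's cryptography PRs were chasing.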
Thanks Nick for the detailed reply. I have read it carefully, and you've probably convinced me to get back on board. Some more responses inline: On Thu, Jul 6, 2017, at 03:38 PM, Nick Coghlan wrote:
While I can completely understand how the current debate over whether or not the prepare_input_for_build_wheel hook is necessary or not would make you feel that way, I hope I can convince you that we're really just quibbling over a genuinely trivial arcane technical detail that I'd never let get in the way of flit being a full-fledged participant in the Python packaging ecosystem.
To be clear, I don't particularly care for the hook. I can see that it's something of a kludge between two competing approaches. What is important to me is that if a user installs from source the obvious way (pip install . ), failure to build an sdist does not result in a failure to install. The extra hook was one approach to that, but it's also OK by me if it tries to make an sdist and falls back to either copytree or an inplace build.
That is, the current point of contention is specifically about how we want tools to behave when we're starting with a source directory that:
1. Doesn't include VCS metadata (e.g. it's been exported as a tarball rather than cloned) 2. The build frontend doesn't want to use as the basis for an in-place build 3. The build frontend doesn't want to blindly copy into a separate build directory
So just by way of those preconditions, we're already well outside the most common package installation workflows.
One of my concerns in this debate is that this is presented as a very rare corner case that we don't have to worry about too much. I agree that it's not the most common case, but I think it's common enough that we should care about making it easy, given that:

- Condition 1 also covers directories with VCS metadata where the VCS tools are not on $PATH. Another case occurred to me recently: Windows users who have installed git but not added it to the default PATH.
- Conditions 2 and 3 seem likely to be the default for a source install with pip.

As an order of magnitude, I'd estimate this is ~10% of installs from a source directory - which is to say, moderately common.
That perspective is embodied in the hypothetical proposal to add a "--build-strategy" option to pip that would allow folks building wheels to choose between:
- creating and unpacking an sdist and building a wheel from that - copying the directory tree and building a wheel from that - building a wheel directly from the original directory
(Perhaps with a variant that tries to create and unpack the sdist first, and only if that fails falls back to copying the entire tree)
This could be useful flexibility for advanced users. But I worry that pip will use the 'sdist' build strategy by default, and expect users to handle cases where that fails. I think this would be a mistake. From a user perspective, it would mean:

- "pip install ." is the recommended way to install from source, but in some situations it doesn't work.
- Adding the mystic incantation "--build-strategy direct" makes it work, and from a user perspective makes absolutely no difference to the result.

Of course, I also have a vested interest in things not working this way: I would get a steady trickle of people asking "why does flit require a VCS to install from source?" From my perspective, it doesn't require that, but I would be unable to 'fix' it.

Donald:
I think it is a complete non-starter to suggest removing installation from sdist support from pip
I'm certainly not suggesting that (hopefully this was already clear, but just in case ;-)
the question then becomes do we want to try and push things towards only having *one* primary flow through the state machine of Python’s packaging, or do we want to support transitions that allow you to “skip” steps.
My idealised view of the state machine is something like this:

wheel <-- source tree <--> sdist

I agree that there's a problem with losing important data when you go [source tree --> sdist --> source tree] - in fact this is one of the pain points I was trying to avoid with flit. But I don't like the idea of solving that by saying that all wheels must have passed through an sdist; it feels like a redundant there-and-back-again journey.

So how else could we tackle the systematic problem? It's definitely a good idea to ensure that [stree --> sdist --> stree --> wheel] doesn't miss out anything that [stree --> wheel] includes, but I'd focus on doing this in developer tools, e.g.:

1. Tools such as flit could check it when you're building a release
2. Tools running on CI services could build both and compare them
3. Bots could scan PyPI for projects with both a .whl and a .tar.gz, build a wheel from the tarball, compare them, and notify the maintainer if there's a problem.

In the short term, I reckon that 2 is the most promising - we can make a convenient pip-installable tool and promote it as good practice for testing that your builds work. But in any case, I see a range of options for tackling this while leaving open the direct [stree --> wheel] pathway.
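The CI-style check in option 2 is mostly a file-list comparison, since wheels are zip archives. A minimal sketch, assuming the two wheels (one built directly, one via the sdist route) have already been produced; the function names here are invented for illustration:

```python
import zipfile

def wheel_contents(path):
    """Return the set of file names inside a wheel (a zip archive)."""
    with zipfile.ZipFile(path) as zf:
        return set(zf.namelist())

def compare_wheels(direct_whl, via_sdist_whl):
    """Return (only_in_direct, only_in_sdist_route) file-name sets.

    Both sets being empty is the invariant the thread wants:
    [stree -> wheel] and [stree -> sdist -> stree -> wheel]
    should ship the same files.
    """
    direct = wheel_contents(direct_whl)
    via = wheel_contents(via_sdist_whl)
    return direct - via, via - direct
```

A fuller tool would also diff file contents and metadata, but even this name-level check catches the "forgot to list a file for the sdist" class of bug.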
When I looked at flit it also suffered the same problem if you forgot to commit a file to the VCS repository (which meant it wouldn’t get added to the sdist)
You have to explicitly ignore a file to hit this. If you have untracked but non-ignored files in your repo, flit will refuse to build an sdist at all. I recognise that this is quite strict and still doesn't entirely prevent the issue, and I may refine it in the future, but I hope it makes such problems hard to hit accidentally. Thomas
On Jul 6, 2017, at 12:35 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Thanks Nick for the detailed reply. I have read it carefully, and you've probably convinced me to get back on board. Some more responses inline:
On Thu, Jul 6, 2017, at 03:38 PM, Nick Coghlan wrote:
While I can completely understand how the current debate over whether or not the prepare_input_for_build_wheel hook is necessary or not would make you feel that way, I hope I can convince you that we're really just quibbling over a genuinely trivial arcane technical detail that I'd never let get in the way of flit being a full-fledged participant in the Python packaging ecosystem.
To be clear, I don't particularly care for the hook. I can see that it's something of a kludge between two competing approaches.
What is important to me is that if a user installs from source the obvious way (pip install . ), failure to build an sdist does not result in a failure to install. The extra hook was one approach to that, but it's also OK by me if it tries to make an sdist and falls back to either copytree or an inplace build.
I *think* if we had some way to signal expected failure vs unexpected failure this would be reasonable to me. I wouldn't want it to be triggered by just any failure, but if we used Nathaniel's NotImplemented idea or something similar to distinguish "hey, I can't build an sdist here for expected reasons" from "hey, I tried to build the sdist, but something went wrong", I think that would be workable. I think it's most likely that in pip we'd implement the fallback as a copytree (at least to start; with more experience with other build backends that could possibly be relaxed to an in-place build).
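From the backend side, the expected-vs-unexpected distinction might look like the sketch below. The hook name matches the draft PEP, but the internals (checking for a ``.git`` directory, the placeholder archive name) are invented here purely to illustrate the signaling convention:

```python
import os
import tarfile

def build_sdist(sdist_directory, config_settings=None):
    """Build an sdist into sdist_directory, or return NotImplemented.

    Per the draft PEP, hooks run with the process working directory
    set to the source tree, so "." is the tree being packaged.
    """
    if not os.path.isdir(".git"):
        # Expected failure: without VCS metadata we can't reliably
        # enumerate the files to ship, so signal "unsupported" and let
        # the frontend fall back (e.g. to copying the tree).
        return NotImplemented
    # Unexpected failures (disk full, permissions, ...) simply raise,
    # which the frontend treats as a real error rather than a fallback.
    name = "pkg-1.0.tar.gz"  # placeholder name/version for the sketch
    with tarfile.open(os.path.join(sdist_directory, name), "w:gz") as tf:
        tf.add(".", arcname="pkg-1.0")
    return name
```

A flit-like backend would check for untracked files here as well; the point is only that the "expected" branch uses the return value while the "unexpected" branch uses exceptions.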
That is, the current point of contention is specifically about how we want tools to behave when we're starting with a source directory that:
1. Doesn't include VCS metadata (e.g. it's been exported as a tarball rather than cloned) 2. The build frontend doesn't want to use as the basis for an in-place build 3. The build frontend doesn't want to blindly copy into a separate build directory
So just by way of those preconditions, we're already well outside the most common package installation workflows.
One of my concerns in this debate is that this is presented as a very rare corner case that we don't have to worry about too much. I agree that it's not the most common case, but I think it's common enough that we should care about making it easy, given that:
- Condition 1 also covers directories with VCS metadata where the VCS tools are not on $PATH. Another case occurred to me recently: Windows users who have installed git but not added it to the default PATH. - Conditions 2 and 3 seem likely to be the default for a source install with pip.
As an order of magnitude, I'd estimate this is ~10% of installs from a source directory - which is to say, moderately common.
Unfortunately, metrics are hard in OSS software. I'd love for pip to have metrics so we could bring real numbers to the discussion, to try to figure out which cases are more common than others and by how much. I do know that pip downloaded 12 million sdists from PyPI yesterday (and 28 million wheels), but how that compares to the number of people doing ``pip install .`` for varying states of a tree in ``.``, we really don't know beyond guessing.
That perspective is embodied in the hypothetical proposal to add a "--build-strategy" option to pip that would allow folks building wheels to choose between:
- creating and unpacking an sdist and building a wheel from that - copying the directory tree and building a wheel from that - building a wheel directly from the original directory
(Perhaps with a variant that tries to create and unpack the sdist first, and only if that fails falls back to copying the entire tree)
This could be useful flexibility for advanced users. But I worry that pip will use the 'sdist' build strategy by default, and expect users to handle cases where that fails. I think this would be a mistake. From a user perspective, it would mean:
- "pip install ." is the recommended way to install from source, but in some situations it doesn't work. - Adding the mystic incantation "--build-strategy direct" makes it work, and from a user perspective makes absolutely no difference to the result.
Of course, I also have a vested interest in things not working this way: I would get a steady trickle of people asking "why does flit require a VCS to install from source?" From my perspective, it doesn't require that, but I would be unable to 'fix' it.
From my perspective, I would prefer not to add a --build-strategy flag [1] to pip and would rather have some generic solution that just generally works OR raises a clear error. I agree: I suspect that for most people this flag would just end up being some "make it work" turd they cargo-cult around (which likely made one scenario work, but broke another). Maybe it's useful as something for advanced users, but that's more of a pip discussion than a discussion for this PEP.
Donald:
I think it is a complete non-starter to suggest removing installation from sdist support from pip
I'm certainly not suggesting that (hopefully this was already clear, but just in case ;-)
Oh no, I didn't think you were advocating for that. Rather, I was trying to explain why I arrive at the "go via sdist" route: I start at "how can we eliminate additional routes a package takes from VCS to installed?", and since I don't think we can get rid of sdists, my mind immediately goes to "well, can we make everything go through an sdist?".
the question then becomes do we want to try and push things towards only having *one* primary flow through the state machine of Python’s packaging, or do we want to support transitions that allow you to “skip” steps.
My idealised view of the state machine is something like this:
wheel <-- source tree <--> sdist
I agree that there's a problem with losing important data when you go [source tree --> sdist --> source tree] - in fact this is one of the pain points I was trying to avoid with flit. But I don't like the idea of solving that by saying that all wheels must have passed through an sdist; it feels like a redundant there-and-back-again journey.
So how else could we tackle the systematic problem? It's definitely a good idea to ensure that [stree --> sdist --> stree --> wheel] doesn't miss out anything that [stree --> wheel] includes, but I'd focus on doing this in developer tools, e.g.:
1. Tools such as flit could check it when you're building a release
2. Tools running on CI services could build both and compare them
3. Bots could scan PyPI for projects with both a .whl and a .tar.gz, build a wheel from the tarball, compare them, and notify the maintainer if there's a problem.
In the short term, I reckon that 2 is the most promising - we can make a convenient pip-installable tool and promote it as good practice for testing that your builds work. But in any case, I see a range of options for tackling this while leaving open the direct [stree --> wheel] pathway.
Yeah, I absolutely don't think going through an sdist is the *only* way to tackle the problem. It's attractive to me because in my mind it is entirely automatic, so it doesn't require a hypothetical developer to learn another tool and set up infrastructure etc. to handle it. The common stumbling block I see people (new and experienced alike) hit is when ``pip install .`` and ``pip install foo-1.0.tar.gz`` result in something different. Focusing on the developer side provides tooling that helps them detect when they've done something that might trigger that, but doesn't actively prevent it.

A similar-ish scenario: I hope in the future to be able to start validating the rendering of long_description on PyPI on upload, rejecting invalid syntax, because while readme_renderer exists and people can use it (and it lets them detect problems earlier on), forcing all uploads to PyPI to have their long_description checked completely sidesteps that class of problems from recurring.

If things don't go the way I would prefer and we decide that we're just going to deal with the problems that "many paths" creates (because as a collective, we liked the tradeoffs better), then I think that (2) is likely to be a good "second best" solution in my mind.
When I looked at flit it also suffered the same problem if you forgot to commit a file to the VCS repository (which meant it wouldn’t get added to the sdist)
You have to explicitly ignore a file to hit this. If you have untracked but non-ignored files in your repo, flit will refuse to build an sdist at all. I recognise that this is quite strict and still doesn't entirely prevent the issue, and I may refine it in the future, but I hope it makes such problems hard to hit accidentally.
Ah yes, I think I saw that chunk of code but it didn't fully register what its effect was going to be. So I'll still assert that this isn't a problem specific to distutils/setuptools, but flit itself does make it harder to hit than I originally thought.

[1] I know we have --upgrade-strategy, but that is intended to go away after the transition period of switching our default upgrade behavior is over.

— Donald Stufft
On Thu, Jul 6, 2017, at 06:19 PM, Donald Stufft wrote:
I *think* if we had some way to signal expected failure vs unexpected failure this would be reasonable to me. I wouldn't just want it to flat out be any failure, but if we used Nathaniel's NotImplemented idea or something similar to indicate "hey, I can't build an sdist here for expected reasons" compared to "hey, I tried to build the sdist, but something went wrong", I think that would be workable.

I'd prefer that it catches any failure, prints a warning and carries on to the fallback, but I can see where you're coming from, and I can live with this if I can trigger the fallback when the VCS is not available.

A similar-ish scenario is I hope to in the future be able to start validating the rendering of long_description on PyPI on upload, and rejecting for invalid syntax, because while readme_renderer exists and people can use it (and it lets them detect problems earlier on) forcing all uploads to PyPI to essentially have their long_description checked completely sidesteps that class of problems from recurring.

I'm with you on this, and flit actually already checks this before upload (though it's a warning, rather than an error). But the difference in this case is that it's never an inconvenience for a downstream user - it only affects the person uploading to PyPI, who can presumably fix it. Enforcing things when installing from source affects users who can't fix any issues it highlights.

Thomas
On 6 July 2017 at 17:35, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Of course, I also have a vested interest in things not working this way: I would get a steady trickle of people asking "why does flit require a VCS to install from source?" From my perspective, it doesn't require that, but I would be unable to 'fix' it.
That's a good point - and provides a good contrast to my perspective as a pip developer that *pip* gets issues raised that aren't really pip's problem. I think it's in everyone's best interests to ensure that the user's experience is as unambiguous as possible in saying where any given problem lies.

One thought occurs to me in that context - in my view, we should be clearly presenting to the user that it's *pip's* role to do the install, and flit's responsibility is to build wheels. I know that flit includes an install command, but I view that as a temporary workaround for the fact that PEP 517 isn't implemented yet. I'd be interested to know if you agree with that. See below as to my view on how the responsibility for "needing a VCS to install from source" follows from that.

But essentially, we're promoting "pip install <whatever>" as the canonical install command, and "pip wheel <whatever>" as the canonical "build a wheel" command - backend-specific commands should be for specialised use only, as I see it.
My idealised view of the state machine is something like this:
wheel <-- source tree <--> sdist
Personally I wouldn't have a major problem with this, although I don't think Donald would agree, as there are questions he's raised around potential inconsistencies between sdists and wheels built directly from the source tree that are unanswered in this model.

My biggest concern, though, is that if we take this view, then it's critical that we have a reliable and efficient means of *copying* source trees. Specifically:

1. By reliable, I mean that wheels built from the original and the copy must be identical. And that if the original supports building an sdist, then by implication wheels created via the source tree -> sdist -> wheel route must be identical to both.
2. By efficient, I mean that copying the directory isn't sufficient, because we already know that has unacceptable overheads in the presence of VCS data and things like .tox directories.

The question of build isolation definitely requires a means to copy a source tree, but I don't want to get tied up with that debate here - I simply think that *not* being able to copy a source tree is going to be a problem at some point, and we should design the interface to avoid that problem. All the business over the "prepare files for sdist" hooks, and the "create sdist and unpack it" approaches, is basically trying to address the question of how we duplicate an arbitrary source tree.

With this arrangement, it's clearly pip's responsibility to do an install from whatever source the user provides. The only requirement on backends like flit is that we have a way to copy source trees, and I don't think you have an issue with that. The copy is only required to be sufficient to build a wheel, not a sdist. (At least for now, as we don't currently promote a canonical command to build sdists.)

Tox may have more stringent requirements - currently it requires the ability to build a sdist to install from, and I'm inclined to think that this is a deliberate design choice rather than merely a convenience.
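The "efficient copy" half of the concern can be sketched with stdlib tools: copy the tree while skipping the directories that make naive copies slow. The ignore list is illustrative, not a spec, and note this only addresses efficiency; the "reliable" requirement (identical resulting wheels) still depends on the backend:

```python
import shutil

# Directories that bloat naive copies: VCS metadata and tool caches.
# This list is an assumption for the sketch, not a standardised set.
IGNORED = {".git", ".hg", ".svn", ".tox", "__pycache__", ".eggs"}

def copy_source_tree(src, dest):
    """Copy src to dest, skipping VCS metadata and cache directories."""
    shutil.copytree(
        src, dest,
        ignore=lambda dirpath, names: [n for n in names if n in IGNORED],
    )
    return dest
```

A frontend using this as its "copy the tree" strategy avoids the overhead Paul mentions, at the cost of baking in assumptions about which directories are safe to omit.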
I'm guessing that no-one has particularly explored the question of how tox would interact with flit-based projects yet? Would it be acceptable to say that tox only works on a full checkout with VCS tools present (i.e, what flit needs to build a sdist) for flit-based projects? I don't really know.
I agree that there's a problem with losing important data when you go [source tree --> sdist --> source tree] - in fact this is one of the pain points I was trying to avoid with flit. But I don't like the idea of solving that by saying that all wheels must have passed through an sdist; it feels like a redundant there-and-back-again journey.
So how else could we tackle the systematic problem? It's definitely a good idea to ensure that [stree --> sdist --> stree --> wheel] doesn't miss out anything that [stree --> wheel] includes, but I'd focus on doing this in developer tools, e.g.:
1. Tools such as flit could check it when you're building a release
2. Tools running on CI services could build both and compare them
3. Bots could scan PyPI for projects with both a .whl and a .tar.gz, build a wheel from the tarball, compare them, and notify the maintainer if there's a problem.
Ideally, I'd say that the best way of addressing this is not to duplicate or discard information. But flit can make its own choices here.

There's some overlap with the PEP, in the sense that we need the defined interface not to be actively hostile to frontends or other tools that want to maintain the invariant that it doesn't matter what route is taken to produce the wheel, the result will be the same. That's why I'm now focusing on ensuring we have some means of enabling source tree copying.

Personally, I'm not a fan of after-the-fact checking like you describe above. My specific concerns are (in reverse order of your points):

3. Reporting problems on PyPI is basically too late. There's already a broken release published.
2. Tools on CI are OK, but we can't guarantee that projects would run them - there's an education and publicity issue around making people aware of the need.
1. Having backends check is not bad, but I'm concerned about mandating a particular release process.

But I don't mind deferring the question of how we validate (after all, we don't currently have any such tools) as long as it's understood that backends shouldn't lose data needed to build *wheels* (I think we can live with needing a specific setup - what I referred to as a "publishing tree" previously - to build sdists).

Paul
On Thu, Jul 6, 2017, at 07:19 PM, Paul Moore wrote:
That's a good point - and provides a good contrast to my perspective as a pip developer that *pip* gets issues raised that aren't really pip's problem. I think it's in everyone's best interests to ensure that the user's experience is as unambiguous as possible in saying where any given problem lies.
Working on Jupyter, I've also seen plenty of misattributed issues from people who think that we make the whole ecosystem they're using in the notebook. I'm all for making it as unambiguous as possible, but I also believe that in many cases it's impossible to be totally clear which part has gone wrong, especially for users unfamiliar with the stack. So my priority is to minimise user-facing failures, because we're all likely to get bug reports.
One thought occurs to me in that context - in my view, we should be clearly presenting to the user that it's *pip's* role to do the install, and flit's responsibility is to build wheels. I know that flit includes an install command, but I view that as a temporary workaround for the fact that PEP 517 isn't implemented yet. I'd be interested to know if you agree with that.
Yes-ish. There are three parts:

- flit installfrom (e.g. to install from a GitHub repo): entirely a stopgap measure, I'm very happy to pass this responsibility over to pip.
- flit install (local install): I'll probably recommend pip once that works, but may leave it in place a bit longer. It already works by building a wheel and asking pip to install it.
- flit install --symlink (development install): stays around at least until there's a standardised approach for this that I'm happy with.
But essentially, we're promoting "pip install <whatever>" as the canonical install command, and "pip wheel <whatever>" as the canonical "build a wheel" command - backend specific commands should be for specialised use only, as I see it.
Depending on exactly what you mean by 'specialised'. I don't see flit as 'a PEP 517 backend', but rather 'a tool which provides a PEP 517 backend'. I will continue to recommend that developers invoke flit directly to build and publish packages, but it should be transparent to typical downstream users.
Personally I wouldn't have a major problem with this, although I don't think Donald would agree, as there's questions that he's raised around potential inconsistencies between sdists and wheels built direct from the source tree that are unanswered in this model. My biggest concern, though, is that if we take this view, then it's critical that we have a reliable and efficient means of *copying* source trees.
So we have two alternative proposals for this bit:

1. Try to make an sdist, fall back to copytree if not supported
   + Provides some measure of built-in checking for the sdist problem
   + Reuses existing sdist machinery
   - Fallback may be slow

2. Separate hook for efficient copy
   + Single mechanism is more predictable than primary+fallback
   + Reliably efficient
   - Requires extra backend code

I'm willing to implement the necessary for either (but preferably not both!). I think 2 is perhaps a bit more user friendly - it's not going to be inexplicably slow because you've hit the fallback case.
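A rough sketch of how a frontend might implement option 1, assuming a hypothetical build_sdist hook that can return NotImplemented when sdist creation is unsupported (hook names and semantics here are illustrative assumptions, not anything settled in the PEP):

```python
import os
import shutil
import tarfile
import tempfile

def unpack_sdist(sdist_path, dest):
    # Minimal unpacking helper: an sdist is a .tar.gz containing a
    # single top-level directory.
    with tarfile.open(sdist_path) as tf:
        tf.extractall(dest)
    (name,) = [d for d in os.listdir(dest)
               if os.path.isdir(os.path.join(dest, d))]
    return os.path.join(dest, name)

def get_build_tree(backend, source_dir):
    # Option 1 sketch: prefer building and unpacking an sdist, fall back
    # to copying the whole tree.  `build_sdist` and the NotImplemented
    # convention are assumptions, not the final PEP 517 spelling.
    target = tempfile.mkdtemp(prefix="pep517-build-")
    sdist_hook = getattr(backend, "build_sdist", None)
    if sdist_hook is not None:
        sdist = sdist_hook(target)
        if sdist is not NotImplemented:
            # Unpack so the build sees exactly what a from-sdist
            # build would see.
            return unpack_sdist(sdist, target)
    # Fallback: copy the tree.  Correct but potentially slow in the
    # presence of VCS data, .tox directories, and similar.
    shutil.rmtree(target)
    shutil.copytree(source_dir, target)
    return target
```

The "fallback may be slow" concern above is exactly the last branch: a naive copytree drags along anything that happens to be sitting in the working directory.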
Tox may have more stringent requirements - currently it requires the ability to build a sdist to install from, and I'm inclined to think that this is a deliberate design choice rather than merely a convenience. I'm guessing that no-one has particularly explored the question of how tox would interact with flit-based projects yet?
I don't think so.
Would it be acceptable to say that tox only works on a full checkout with VCS tools present (i.e, what flit needs to build a sdist) for flit-based projects? I don't really know.
I'd be willing to explore whether we can do better than that, but I see tox as a developer tool, so I wouldn't consider it a show-stopper if it required the presence of a VCS. Thomas
It might be more natural to pass a build directory for intermediate build artefacts along with the wheel output directory to the build wheel hook. This would remove pip from the awkward position of managing a copy step in the middle of a build, and would be more like out-of-tree builds in other build systems.

For example, in automake you do out-of-tree builds by making a new build directory and running the configure script from that directory instead of the source directory. With a fresh directory, old builds don't get in the way.
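Concretely, Daniel's suggestion would mean a hook along these lines (the signature, defaulting behaviour, and stub wheel contents are illustrative assumptions, not the PEP's actual wording):

```python
import os
import zipfile

def build_wheel(wheel_directory, build_directory=None, config_settings=None):
    # Sketch: intermediates go to build_directory, the finished wheel to
    # wheel_directory.  A fresh build_directory gives an out-of-tree
    # build; reusing one gives incremental rebuilds.
    if build_directory is None:
        # No directory supplied: fall back to an in-place ./build tree.
        build_directory = os.path.join(os.getcwd(), "build")
    os.makedirs(build_directory, exist_ok=True)
    # A real backend would compile into build_directory here, then
    # assemble the wheel from the results; we just emit a stub archive.
    wheel_name = "example-1.0-py3-none-any.whl"
    with zipfile.ZipFile(os.path.join(wheel_directory, wheel_name), "w") as whl:
        whl.writestr("example/__init__.py", "")
    return wheel_name
```

Note that nothing here forces the frontend to copy the source tree: the backend decides what, if anything, to stage into the build directory.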
On Thu, Jul 6, 2017, at 10:40 PM, Daniel Holth wrote:
It might be more natural to pass a build directory for intermediate build artefacts along with the wheel output directory to the build wheel hook. This would remove pip from an awkward position of managing a copy step in the middle of a build and would be more like out of tree builds in other build systems. For example in automake you do out of tree builds by making a new build directory and running the configure script from that directory instead of the source directory. With a fresh directory old builds don't get in the way.

I would also be happy with this. Though if you're trusting the backend to do a tidy build, do you need to pass in a directory for intermediates at all? The backend could just create a temporary directory itself.

I think Paul & Donald have been pretty adamantly against trusting backends to build tidily, though. And this certainly doesn't do anything like Donald wants to ensure that sdists don't omit key files.
On 6 July 2017 at 22:54, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
I think Paul & Donald have been pretty adamantly against trusting backends to build tidily, though
On reflection, I'm less concerned about this than I was. If you wanted to propose a stripped down version of PEP 517 which assumed it was the backend's responsibility to ensure reproducible isolated builds, I'd be willing to listen. But the proposal would need to include some pretty strong requirements on precisely what we're asking of backends - if build isolation is pip's problem to solve, then I'm happy for us (the pip devs) to take that responsibility, and agree hooks that we need to do so, but if we're assuming backends handle it for us, I think we need to document clearly what we're assuming (because frankly, the pip devs are the ones with the experience of the potential issues).

My concern here is the same one as always: people raising issues where they get a failed install, and it's because there was some sort of out of date artifact in the source directory. No-one *expects* this to happen, but we (the pip devs) are a bit paranoid about the possibility because we've had to deal with so many "this shouldn't have happened" issues around builds in the past. Admittedly, those issues are typically with the arcane depths of setuptools, and with projects that are being overly clever in setup.py, but it has left us with a bad "if someone can mess things up, someone will" attitude.

I'm reasonably comfortable that a backend like flit, which is limited to pure-python projects, should be OK to ensure clean builds. There's not *much* you can do to mess up zipping up some Python files. But once C compilers and the like get added to the mix, it gets much harder to ensure that a stray object file in the wrong place doesn't mess things up. Add some of the really nasty stuff that the scientific people seem to need, and I've no idea how anyone could guarantee clean builds without creating a brand new tree. And yet the scientific people are also the ones who want inplace incremental builds to mitigate their really long build times. So there's pressure there to *not* copy for the sake of faster builds.

So I'm not as against trusting backends as I was. But I am concerned that we make sure that the backend authors clearly understand the problem they are responsible for (and this is a case where flit is *not* a good example of a backend, as it avoids the nasty cases by design).

Paul
On Thu, Jul 6, 2017, at 11:51 PM, Paul Moore wrote:
On reflection, I'm less concerned about this than I was. If you wanted to propose a stripped down version of PEP 517 which assumed it was the backend's responsibility to ensure reproducible isolated builds, I'd be willing to listen. But the proposal would need to include some pretty strong requirements on precisely what we're asking of backends - if build isolation is pip's problem to solve, then I'm happy for us (the pip devs) to take that responsibility, and agree hooks that we need to do so, but if we're assuming backends handle it for us, I think we need to document clearly what we're assuming (because frankly, the pip devs are the ones with the experience of the potential issues).
How does this sound to you:

"""
If build_directory is not None, it is a unicode string containing the path to a directory where intermediate build artifacts may be stored. This may be empty, or it may contain artifacts from a previous build to be used as a cache. The backend is responsible for determining whether any cached artifacts are outdated. When a build_directory is provided, the backend should not create or modify any files in the source directory (the working directory where the hook is called). If the backend cannot reliably avoid modifying the directory it builds from, it should copy any files it needs to build_directory and perform the build there.

If build_directory is None, the backend may do an 'in place' build which modifies the source directory. The semantics of this are not specified here.

In either case, the backend may also store intermediates in other cache locations or temporary directories, which it is responsible for managing. The presence or absence of any caches should not make a material difference to the final result of the build.
"""
this is a case where flit is *not* a good example of a backend, as it avoids the nasty cases by design).
Agreed. Enscons is a better example, though, and Daniel seems confident that out-of-tree builds are feasible. Thomas
On 7 July 2017 at 08:59, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Thu, Jul 6, 2017, at 11:51 PM, Paul Moore wrote:
On reflection, I'm less concerned about this than I was. If you wanted to propose a stripped down version of PEP 517 which assumed it was the backend's responsibility to ensure reproducible isolated builds, I'd be willing to listen. But the proposal would need to include some pretty strong requirements on precisely what we're asking of backends - if build isolation is pip's problem to solve, then I'm happy for us (the pip devs) to take that responsibility, and agree hooks that we need to do so, but if we're assuming backends handle it for us, I think we need to document clearly what we're assuming (because frankly, the pip devs are the ones with the experience of the potential issues).
How does this sound to you:
""" If build_directory is not None, it is a unicode string containing the path to a directory where intermediate build artifacts may be stored. This may be empty, or it may contain artifacts from a previous build to be used as a cache. The backend is responsible for determining whether any cached artifacts are outdated. When a build_directory is provided, the backend should not create or modify any files in the source directory (the working directory where the hook is called). If the backend cannot reliably avoid modifying the directory it builds from, it should copy any files it needs to build_directory and perform the build there.
If build_directory is None, the backend may do an 'in place' build which modifies the source directory. The semantics of this are not specified here.
In either case, the backend may also store intermediates in other cache locations or temporary directories, which it is responsible for managing. The presence or absence of any caches should not make a material difference to the final result of the build. """
That sounds pretty good. One reservation I have (I told you I was paranoid :-)) is that it's not *just* "If the backend cannot reliably avoid modifying the directory it builds from" that's the issue. There's also the possibility that the user may have some "junk" files in the build directory that affect the build, which shouldn't be there.

An untested example, which is not at all likely in practice but illustrates my point, would be if a developer was testing their code and dumped a temporary zipfile.py in the project directory to see how the code handled corrupted/overridden stdlib modules. After finishing the test, they forget to clean up and then do "pip wheel .". With an in-place build, flit picks up the zipfile.py instead of the stdlib one, and fails with a potentially very confusing error message.

As I say, that's a very contrived scenario, but it does illustrate how "unexpected junk in the source tree that wouldn't be there in the make sdist and unpack route" could affect build results. But apart from this very minor niggle, I like your wording.
this is a case where flit is *not* a good example of a backend, as it avoids the nasty cases by design).
Agreed. Enscons is a better example, though, and Daniel seems confident that out-of-tree builds are feasible.
Agreed. I don't really know anything about enscons, but I'm happy to accept Daniel's view. Paul
On Fri, Jul 7, 2017 at 12:59 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Thu, Jul 6, 2017, at 11:51 PM, Paul Moore wrote:
On reflection, I'm less concerned about this than I was. If you wanted to propose a stripped down version of PEP 517 which assumed it was the backend's responsibility to ensure reproducible isolated builds, I'd be willing to listen. But the proposal would need to include some pretty strong requirements on precisely what we're asking of backends - if build isolation is pip's problem to solve, then I'm happy for us (the pip devs) to take that responsibility, and agree hooks that we need to do so, but if we're assuming backends handle it for us, I think we need to document clearly what we're assuming (because frankly, the pip devs are the ones with the experience of the potential issues).
How does this sound to you:
""" If build_directory is not None, it is a unicode string containing the path to a directory where intermediate build artifacts may be stored. This may be empty, or it may contain artifacts from a previous build to be used as a cache. The backend is responsible for determining whether any cached artifacts are outdated. When a build_directory is provided, the backend should not create or modify any files in the source directory (the working directory where the hook is called). If the backend cannot reliably avoid modifying the directory it builds from, it should copy any files it needs to build_directory and perform the build there.
I think this is a really interesting idea, but it makes me very nervous that we're starting design work on novel features when we still haven't finalized a basic build_wheel hook. PEP 517 was written in 2015...

This proposal creates substantial complications for build systems that default to doing in-place builds, which is almost all existing systems from 'make' onwards. Legacy build systems often can't do out-of-tree builds at all (e.g. consider the case where you have a vendored C library whose build system you want to invoke as part of your build). Is this a problem? The benefits are potentially large, but are they worth it if they increase the barrier to entry for new build systems? I'm not sure how to tell.

And in the mean time, pip is still unconditionally calling copytree before even looking at the source tree... Is it absolutely necessary to get this into the first PEP?

-n

-- Nathaniel J. Smith -- https://vorpus.org
On 7 July 2017 at 11:30, Nathaniel Smith <njs@pobox.com> wrote:
Is it absolutely necessary to get this into the first PEP?
As far as I'm concerned, it's no more than a restating (and simplification?) of all the discussions around building out of tree via creating an sdist and unpacking it, or having the various prepare files hooks. It's always been there and always been a requirement.
This proposal creates substantial complications for build systems that default to doing in-place builds
Well, we're focused on build systems that will get a PEP 517 interface. So far the only concrete examples we have are flit (which has no problem with this), enscons (which has no problem with it) and probably setuptools, in some form (which needs out-of-tree builds, based on our experience with it in pip, although I concede that you would argue that point).

Also, out of tree builds are something that pip is planning on implementing, and for us it's therefore necessary to have hooks that let us do that. The current version of the PEP provides those hooks, and we're planning on using them to do out of tree builds. This new suggestion is basically backend authors saying "you don't need to do the out of place builds, we're willing to take responsibility for that". If you're saying that you're not happy with that, then all that will end up happening is we revert to the previous approach, and pip implements out of tree builds based on that (and I guess you have the same argument all over again on the PR for the pip change...)

Paul
Also, please note that the proposal doesn't *prohibit* in-place builds - quite the opposite: it allows backends to decide, when asked, how to implement both in-place and out-of-place builds (where the current draft allows backends to decide how to do in-place builds and how to copy trees, and leaves the frontend to decide how to implement out-of-place builds, typically via something like a tree copy and subsequent in-place build).
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
On Fri, Jul 7, 2017 at 9:23 AM Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
And it works in bdist_wheel by adding half a dozen lines. It copies the -b argument to the 'build' subcommand, so effectively you can put the ./build/ directory wherever you want. This will probably also solve other problems people sometimes have with 'unclean' bdist_wheel builds. As a bonus it no longer works in Python 2.6. +1 on Thomas' added wording.
On Fri, Jul 7, 2017 at 6:23 AM, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
In theory, yes, but in practice, there are lots of setup.py files out there that mutate the source directory. For example, every project using Cython. I don't think a distutils/setuptools backend could comply with Thomas's wording except by doing a copytree or sdist or something just-in-case. -n -- Nathaniel J. Smith -- https://vorpus.org
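Such a just-in-case shim could satisfy Thomas's wording by copying the tree into build_directory before running the legacy build there; a minimal sketch, where the helper name and the ignore list are guesses at what a real shim would choose:

```python
import os
import shutil

def prepare_build_copy(source_dir, build_directory):
    # Copy the source tree into build_directory so that a build system
    # which mutates its source (e.g. a setup.py running Cython) never
    # touches the real source directory.
    work = os.path.join(build_directory, "srctree")
    if os.path.exists(work):
        shutil.rmtree(work)  # drop any stale copy from an earlier build
    shutil.copytree(
        source_dir, work,
        ignore=shutil.ignore_patterns(".git", ".hg", ".tox", "__pycache__"))
    return work  # run the in-place build from here, not from source_dir
```

This is effectively the copytree pip already does, just moved behind the backend boundary, which is why the performance concern doesn't disappear - it only changes owners.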
On 8 July 2017 at 13:36, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 7, 2017 at 6:23 AM, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
In theory, yes, but in practice, there are lots of setup.py files out there that mutate the source directory. For example, every project using Cython.
I don't think a distutils/setuptools backend could comply with Thomas's wording except by doing a copytree or sdist or something just-in-case.
Which is fine - the point of making it a parameter instead of a separate hook is that the folks in the best position to know whether or not a particular build system has native out-of-tree build support are the developers of that particular PEP 517 backend.

Since we know Scons supports this natively (by way of the "variant_dir" setting), I went and checked some of the other build systems that folks may end up wanting to develop backends for.

Native build system support for out-of-tree builds, so developers of these shims should be able to delegate handling "build_directory":

- Scons: set variant_dir appropriately (http://scons.org/doc/production/HTML/scons-user.html#chap-separate)
- meson: *only* supports out-of-tree builds (http://mesonbuild.com/Using-multiple-build-directories.html)
- waf: configure the output directory appropriately (https://waf.io/book/#_custom_build_outputs)
- premake: set targetdir appropriately (https://github.com/premake/premake-core/wiki/targetdir)
- CMake: simplest way seems to be to save the initial working directory as the source directory, cd to the build directory, then run "cmake <source_directory>"
- yotta: based on CMake, so same approach applies
- autotools/make: similar approach to CMake, but running "<source_directory>/configure && make <target>" from the build directory
- maven: require that projects using the shim support a configurable build dir (https://stackoverflow.com/questions/13173063/out-of-tree-build-with-maven-is...)
- ant: also uses pom.xml files, so same approach as maven applies
- cargo: set CARGO_TARGET_DIR (http://doc.crates.io/environment-variables.html#environment-variables-cargo-...)
Explicitly willing to add native out-of-tree wheel creation based on PEP 517's requirements:

- flit

Mostly assume in-place builds, no generic support for out-of-tree builds that I can find, so developers of these shims will need to work out how to handle "build_directory" (probably by copying the relevant input files into the specified build directory and then doing an in-place build, similar to the way Scons handles "variant_dir" by default):

- setuptools/distutils
- gyp/ninja (including node-gyp)
- gradle

So I think the folks saying "We don't think a separate hook for build directory preparation is the right abstraction" have a strong point, as that approach doesn't appear to map cleanly to the design of *any* popular build system for precompiled languages. By contrast, "build_directory" as an optional setting for the binary build command *does* map cleanly as a concept in many cases, and for the ones where it doesn't, I readily found questions from other folks also wanting to know how to do it, and PEP 517 interface shims copying the files they care about then doing an in-place build is a perfectly acceptable technical solution.

And as an added bonus, if a backend starts out by blindly doing "shutil.copytree(source_directory)" for out-of-tree builds, then the potential performance implications of that approach will become more clearly a discussion between the projects using the backend and the developers of the backend, rather than being something that frontends like pip will need to worry about handling in the general case.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
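For the tools with native support, a shim's handling of "build_directory" can be a thin pass-through. A sketch for two entries from the survey above - the CMake and cargo invocations follow the approaches listed there, but these are untested illustrations, not recipes:

```python
import os
import subprocess

def run_native_build(tool, source_dir, build_directory):
    # Dispatch build_directory to the tool's native out-of-tree mechanism.
    if tool == "cmake":
        # Run from the build directory, pointing back at the source tree.
        subprocess.check_call(["cmake", source_dir], cwd=build_directory)
        subprocess.check_call(["cmake", "--build", "."], cwd=build_directory)
    elif tool == "cargo":
        # Cargo reads its target directory from an environment variable.
        env = dict(os.environ, CARGO_TARGET_DIR=build_directory)
        subprocess.check_call(["cargo", "build"], cwd=source_dir, env=env)
    else:
        # Tools without native support would need the copy-then-build
        # strategy described above for setuptools/gyp/gradle shims.
        raise NotImplementedError(tool)
```

The point of the parameter design is visible here: the frontend only supplies build_directory, and each shim knows its own tool well enough to route it correctly.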
Big +1 for passing the option to the backend rather than making the frontend figure out how to do each backend’s job for them. And add MSBuild to the list, which trivially supports separate source, intermediate and output directories, as well as incremental rebuilds. (And recently got full x-plat support, so could potentially be useful on all platforms.)

Cheers,
Steve

Top-posted from my Windows phone

From: Nick Coghlan
Sent: Saturday, July 8, 2017 7:59
To: Nathaniel Smith
Cc: distutils-sig
Subject: Re: [Distutils] A possible refactor/streamlining of PEP 517

On 8 July 2017 at 13:36, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 7, 2017 at 6:23 AM, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
In theory, yes, but in practice, there are lots of setup.py files out there that mutate the source directory. For example, every project using Cython.
I don't think a distutils/setuptools backend could comply with Thomas's wording except by doing a copytree or sdist or something just-in-case.
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
On 8 July 2017 at 15:58, Nick Coghlan <ncoghlan@gmail.com> wrote:
Mostly assume in-place builds, no generic support for out-of-tree builds that I can find, so developers of these shims will need to work out how to handle "build_directory" (probably by copying the relevant input files into the specified build directory and then doing an in-place build, similar to the way Scons handles "variant_dir" by default):
- setuptools/distutils
Following up based on Daniel's clarification thread: it turns out setuptools/distutils falls into the same category of "opt-in native out-of-tree build support" as autotools, CMake, maven, Scons, etc.

That means that a setuptools/distutils PEP 517 backend chosen specifically by the publisher (as opposed to implicitly by the front end as a fallback) would be able to rely on "setup.py build -b <build_directory>" to do the out-of-tree builds rather than having to copy the input files over. Alternatively, it could work the same way Scons does, which is to copy the input files by default, and have an off-switch that allows the publisher to say "we've ensured that out-of-tree builds work properly, so skip the copy step".

So I think we have pretty solid evidence that the reason the procedural "build directory preparation" hook wasn't sitting well with people was because that isn't the way build systems typically model the concept, while a "build directory" setting is very common (even if that "setting" is "the current working directory when configuring or running the build").

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
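That opt-in model can be sketched as a tiny command-line builder. The function name and the off-switch flag are hypothetical; "-b" is the real distutils option for the base build directory:

```python
def setup_py_build_args(build_directory=None, out_of_tree_ok=False):
    """Hypothetical: the command line a setuptools/distutils shim might
    construct. "-b" is distutils' base build directory option; the
    out_of_tree_ok flag models the publisher-controlled off-switch
    described above (only rely on -b once the publisher has declared
    out-of-tree builds safe for their project)."""
    args = ["setup.py", "build"]
    if build_directory is not None and out_of_tree_ok:
        args += ["-b", build_directory]
    return args
```

When out_of_tree_ok is false, the shim would instead fall back to copying the inputs and building in place, as with the Scons default.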
On Mon, Jul 10, 2017, at 07:01 AM, Nick Coghlan wrote:
So I think we have pretty solid evidence that the reason the procedural "build directory preparation" hook wasn't sitting well with people was because that isn't the way build systems typically model the concept, while a "build directory" setting is very common (even if that "setting" is "the current working directory when configuring or running the build").
Hooray! :-)

Do we want to also provide a build_directory for the build_sdist hook? In principle, I don't think making an sdist should involve a build step, but I know that some projects do perform steps like cython code gen or JS minification before making the sdist. I think this was a workaround to ease installation before wheel support was widespread, and I'd be inclined to discourage it now, so my preference would be no build_directory parameter for build_sdist. Backends which insist on generating intermediates at that point can make a temp dir themselves.

Then I guess that the choice between building a wheel directly and attempting to build an sdist first (with direct fallback) is one for frontends, and doesn't need to be specified.

Thomas
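A backend following that preference might look roughly like this. This is only a sketch of the "make a temp dir themselves" idea, using the build_sdist(sdist_directory, config_settings) hook shape from the PEP drafts; the project name and file contents are purely illustrative:

```python
import os
import tarfile
import tempfile


def build_sdist(sdist_directory, config_settings=None):
    """Sketch of an sdist hook with no build_directory parameter: any
    intermediate generation happens in a temp dir the backend creates
    (and cleans up) itself, and only the final archive lands in the
    frontend-supplied sdist_directory."""
    name = "example-1.0"  # hypothetical project name/version
    with tempfile.TemporaryDirectory() as work:
        # ... generate metadata / derived files into `work` here ...
        with open(os.path.join(work, "PKG-INFO"), "w") as f:
            f.write("Metadata-Version: 1.2\nName: example\nVersion: 1.0\n")
        archive = name + ".tar.gz"
        with tarfile.open(os.path.join(sdist_directory, archive), "w:gz") as tf:
            tf.add(work, arcname=name)
    return archive
```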
One nice thing about providing a “put your work in this directory” setting for all tasks is that only the front end has to know how and where to create it, and how and when to clean it up later. Users may want to configure this across all projects, regardless of the backend in use.

Permitting this directory to be the source tree implicitly requires backends to support “in place” builds (i.e. you should put output files in a matching structure under that directory in case it really is the source tree). In this case, front ends need to be responsible for (not) running rmtree and backends should not blindly delete everything (or else they’ll get bug reports from very upset users).

Cheers,
Steve

Top-posted from my Windows phone at EuroPython
On Mon, Jul 10, 2017, 04:54 Steve Dower <steve.dower@python.org> wrote:
One nice thing about providing a “put your work in this directory” setting for all tasks is that only the front end has to know how and where to create it, and how and when to clean it up later. Users may want to configure this across all projects, regardless of the backend in use.
Permitting this directory to be the source tree implicitly requires backends to support “in place” builds (i.e. you should put output files in a matching structure under that directory in case it really is the source tree). In this case, front ends need to be responsible for (not) running rmtree and backends should not blindly delete everything (or else they’ll get bug reports from very upset users).
Not sure about this particular idea. It might have to be special-cased and implemented in the backend driver - usually build in a directory and then copy the files into the source tree.
On Mon, Jul 10, 2017 at 7:13 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Do we want to also provide a build_directory for the build_sdist hook? In principle, I don't think making an sdist should involve a build step, but I know that some projects do perform steps like cython code gen or JS minification before making the sdist. I think this was a workaround to ease installation before wheel support was widespread, and I'd be inclined to discourage it now, so my preference would be no build_directory parameter for build_sdist. Backends which insist on generating intermediates at that point can make a temp dir themselves.
No preference on yes/no for build_directory for build_sdist hook, but invoking Cython on .pyx files to generate C code rather than checking in generated C code is good practice. I don't think we want to go back to checking in generated code, nor do we want to store generated code in tmpdirs (because that loses the advantage of not having to regenerate when .pyx hashes are identical). Ralf
On 10 July 2017 at 18:56, Ralf Gommers <ralf.gommers@gmail.com> wrote:
No preference on yes/no for build_directory for build_sdist hook, but invoking Cython on .pyx files to generate C code rather than checking in generated C code is good practice. I don't think we want to go back to checking in generated code, nor do we want to store generated code in tmpdirs (because that loses the advantage of not having to regenerate when .pyx hashes are identical).
If the frontend offers a way to specify a particular build directory, then that kind of caching wouldn't be a problem when regenerating files as part of build_wheel - all the build files would be preserved between runs, not only those that made it into the final wheel file.

That's a slightly different question than the one Thomas was asking, which is whether or not we want to support out-of-tree builds for sdist creation, and I don't believe we do, since we don't really *want* folks to be adding generated files to their sdist that they aren't keeping under source control - we'd prefer that such activities were postponed to "build_wheel" now that we have separate source and precompiled distribution formats.

If publishers still prefer to go down that path, then they can either use mechanisms like .gitignore to cope with the consequences of doing it in-place, or else use ccache-style caching mechanisms that aren't dependent on exactly where you do the build, and instead depend on the hash of the input artifact (e.g. I'd be astonished if cython couldn't implement something like ccache natively for pyx file compilation).

Either way, it won't impact the experience of folks using pre-compiled wheel files, and for at least some use cases for sdists (e.g. Linux distributions), it's typically preferable to regenerate that kind of file anyway.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
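The ccache-style idea sketches out simply: key the generated C file on a hash of the .pyx source, so the cache works no matter which build directory is in use. Everything here is hypothetical - compile_fn stands in for the real Cython invocation:

```python
import hashlib
import os


def cached_cythonize(pyx_path, cache_dir, compile_fn):
    """Location-independent cache sketch: regenerate the C file only
    when the .pyx content hash changes. compile_fn(pyx_path, c_path)
    stands in for the actual Cython call."""
    with open(pyx_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    c_path = os.path.join(cache_dir, digest + ".c")
    if not os.path.exists(c_path):
        compile_fn(pyx_path, c_path)  # cache miss: do the real work
    return c_path
```

Because the key is the source hash rather than a path, the cache survives building from a fresh checkout or a different build directory, which is exactly the property Ralf's regeneration concern calls for.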
On 2017-07-10 20:33:16 +1000 (+1000), Nick Coghlan wrote: [...]
we don't really *want* folks to be adding generated files to their sdist that they aren't keeping under source control - we'd prefer that such activities were postponed to "build_wheel" now that we have separate source and precompiled distribution formats. [...]
This is a mildly naive view. The software I'm familiar with is actually attempting to reflect metadata _from_ the source revision control _into_ the sdist because while it's "tracked" there it's not tracked as normal files (version information from tags, change logs from the commit history, contributor lists from commit authorship). The metadata in question is lost by just blindly tarring up tracked files, so needs some mechanism to export and inject as untracked files (from the source revision control perspective) for inclusion in the sdist. -- Jeremy Stanley
Sdists contain generated PKG-INFO and .egg-info. I'd prefer to let the build backend manage any sdist build directory. It doesn't provide the same benefit to pip as the configurable wheel build directory.
On 10 July 2017 at 22:08, Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2017-07-10 20:33:16 +1000 (+1000), Nick Coghlan wrote: [...]
we don't really *want* folks to be adding generated files to their sdist that they aren't keeping under source control - we'd prefer that such activities were postponed to "build_wheel" now that we have separate source and precompiled distribution formats. [...]
This is a mildly naive view. The software I'm familiar with is actually attempting to reflect metadata _from_ the source revision control _into_ the sdist because while it's "tracked" there it's not tracked as normal files (version information from tags, change logs from the commit history, contributor lists from commit authorship). The metadata in question is lost by just blindly tarring up tracked files, so needs some mechanism to export and inject as untracked files (from the source revision control perspective) for inclusion in the sdist.
Right, we know there will be *some* generated files (as Daniel notes, there will typically at least be the metadata files in a setuptools/distutils generated sdist), as well as potentially modifications to some files based on the version history.

That isn't the question though - the question is whether we want to actively support folks moving "compilation" like activities (minification, pyx->C conversion, etc) to the sdist generation stage by adding the optional "build_directory" option to "build_sdist" as well.

And that's the part where we decided the answer is "No", we only want to support the following configurations:

build_sdist:
- working directory -> target directory

build_wheel:
- working directory -> target directory
- working directory -> build directory -> target directory

In all cases the frontend provides a target directory that is distinct from the current working directory, so backends have a place to put both generated intermediate artifacts *and* the final assembled archive file. The difference is that in the build_wheel case, the frontend can explicitly say "don't put intermediate artifacts in the working directory *or* the target directory, put them in the build directory".

Backends are obviously still free to create their own temporary and caching directories that the frontend doesn't know anything about, but that's up to the backend to worry about.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 10 July 2017 at 14:58, Nick Coghlan <ncoghlan@gmail.com> wrote:
That isn't the question though - the question is whether we want to actively support folks moving "compilation" like activities (minification, pyx->C conversion, etc) to the sdist generation stage by adding the optional "build_directory" option to "build_sdist" as well.
And that's the part where we decided the answer is "No", we only want to support the following configurations:
I'm not sure I follow this comment (or if I do, I don't agree with it :-)). I would expect projects that use Cython to have the step to build the C files from the Cython sources as part of the "build sdist" step, so that the sdist contains standard C sources, and end users can build from sdist with a C compiler installed, but without needing to install Cython. That's the standard approach these days, and I'd hope it will be supported under PEP 517.

Paul

PS Completely off-topic, but since when has gmail's web interface stopped allowing you to highlight a section of a mail and hit "Reply" to get just that section quoted? It seems to have changed very recently, or is it some setting change I might have accidentally made? It's intensely annoying, as it makes it much more time consuming to avoid top-posting :-( (and please, no suggestions that I use an alternative client, my usage pattern makes that impractical).
A huge benefit to using non-distutils build systems is making it easy to generate files at any step. I just don't think it's worth it to force a particular build directory at the generate-sdist phase. Foolish consistency.
On 11 July 2017 at 00:20, Paul Moore <p.f.moore@gmail.com> wrote:
I'm not sure I follow this comment (or if I do, I don't agree with it :-)). I would expect projects that use Cython to have the step to build the C files from the Cython sources as part of the "build sdist" step, so that the sdist contains standard C sources, and end users can build from sdist with a C compiler installed, but without needing to install Cython. That's the standard approach these days, and I'd hope it will be supported under PEP 517.
It's supported - what's not being explicitly supported is generating such Cython files in a directory *other than* either the original source directory, or the target directory where the final sdist will also be created. (However, I'd also hope to see that practice start declining in popularity as PEP 517 makes it straightforward to ensure that cython is installed automatically when needed to build from source.)

Wheel builds are different, since they're expected to generate intermediate artifacts (e.g. object files, debugging symbols, Cython output files not included in the sdist) that *aren't* going to be included in the final wheel file, but may still be useful to keep around to speed up subsequent builds.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 10, 2017 8:59 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

That isn't the question though - the question is whether we want to actively support folks moving "compilation" like activities (minification, pyx->C conversion, etc) to the sdist generation stage by adding the optional "build_directory" option to "build_sdist" as well. And that's the part where we decided the answer is "No".

When preparing a redistributable archive, we don't want people to first generate difficult or inconvenient artifacts? I always thought that was a major feature of an archive, to reduce the content down to common denominators for verification, reproducibility, and build simplicity, at the expense of not being fully representative of the original build capabilities and likely an irreversible step.

I know there's been a lot of discussion here, and I probably missed it, but early on we talked about things like Cython and cffi being part of sdist generation (in setuptools at least). Are these things now expected to be deferred to the wheel building stage, thus adding deps to the final build server, or are we just saying there is less utility in supporting an explicit "artifacts directory" during sdist generation?
If the latter (and I think this is the case), that does seem reasonable, I just wanted to confirm that such "compilations" are still expected, or at least permissible, in downstream redistributables (sdists). -- C Anthony
On 11 July 2017 at 00:58, C Anthony Risinger <c@anthonyrisinger.com> wrote:
When preparing a redistributable archive, we don't want people to first generate difficult or inconvenient artifacts? I always thought that was a major feature of an archive, to reduce the content down to common denominators for verification, reproducibility, and build simplicity, at the expense of not being fully representative of the original build capabilities and likely an irreversible step.
My apologies folks, this is an entirely irrelevant tangent brought on by my attempting to explain my own preference that source archives (including sdists) actually *be* source archives, containing solely the original software in *its preferred form for modification*.

However, that's a free-software- and commercial-redistributor-centric point of view, where we aren't particularly keen on anyone publishing open source software without also properly declaring all the things that redistributors and end users will need in order to actually modify that software.

It's neither enforced nor required by PEP 517, it's just part of my *own* rationale for wanting to see "build directory" kept purely as a wheel building concept, rather than something we define for sdists as well. If some projects *do* decide to put a lot of generated platform independent artifacts in their sdists, those of us that care always have the option of bypassing them and going straight to raw VCS clones and tarballs, just as we already ignore wheel files and retrieve sdists from PyPI instead.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 10, 2017, at 04:13 PM, Nick Coghlan wrote:
My apologies folks, this is an entirely irrelevant tangent brought on by my attempting to explain my own preference that source archives (including sdists) actually *be* source archives, containing solely the original software in *its preferred form for modification*.
I also agree with this view - ideally, I think there should be no generated files in an sdist besides the metadata needed for tools consuming it. Many packages have included generated files to simplify building from an sdist, but I see this as a symptom of poor infrastructure which we're gradually fixing. In particular, before we had wheels, installing a package nearly always meant building it, so there was a strong pressure to do part of the build process before distributing the source.

I don't think it's practical to forbid generating files to put in the sdist, but for pep517 I'd say it's appropriate to gently discourage it by not providing a build directory to that hook. I'm not going to argue hard for this if other people think the sdist hook needs that parameter, though.

Thomas
On 10 July 2017 at 21:28, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
I don't think it's practical to forbid generating files to put in the sdist, but for pep517 I'd say it's appropriate to gently discourage it by not providing a build directory to that hook. I'm not going to argue hard for this if other people think the sdist hook needs that parameter, though.
I also don't think it's a huge issue either way (and it's definitely a tangential issue as far as PEP 517 is concerned) but anecdotally I have encountered packages that have taken a substantial amount of time (30 minutes+) to run Cython on the sources[1]. Having pregenerated C files in the sdist made the difference between "usable" and "not worth bothering" in that case. Paul [1] It may be that was a Cython bug - but that's not actually relevant, as I didn't know enough to confirm it was a bug, nor did I know how to avoid triggering it.
On 11 July 2017 at 06:56, Paul Moore <p.f.moore@gmail.com> wrote:
On 10 July 2017 at 21:28, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
I don't think it's practical to forbid generating files to put in the sdist, but for pep517 I'd say it's appropriate to gently discourage it by not providing a build directory to that hook. I'm not going to argue hard for this if other people think the sdist hook needs that parameter, though.
I also don't think it's a huge issue either way (and it's definitely a tangential issue as far as PEP 517 is concerned) but anecdotally I have encountered packages that have taken a substantial amount of time (30 minutes+) to run Cython on the sources[1]. Having pregenerated C files in the sdist made the difference between "usable" and "not worth bothering" in that case.
Right, I realised through this discussion that my previous mental mapping of "sdist->SRPM, wheel->RPM" isn't actually *quite* right, since the objectives of the Python level packaging ecosystem and a redistributor-centric system like RPM aren't exactly the same. In particular, I now realise that if a redistributor wants to absolutely *ensure* that we have the source in the preferred form for modification, then we technically need to be starting with the VCS commit or release tarball rather than the sdist, and it's just an artifact of history that those two starting points have traditionally been roughly equivalent. This is due to the fact that it makes sense for publishers to optimise their sdists and wheels to provide the best possible experience for folks using the *Python* level tooling, whereby:

- sdists are nominally architecture, platform and Python version independent source archives for common compile toolchains
- wheels are pre-built archives for common Python versions, platforms and architectures

So while I still suggest we omit "build_directory" from the build_sdist signature (at least for the initial iteration of the API design), I'm now only -0 on the idea, rather than my original -1. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
To summarise the current state of this discussion as I understand it, the hooks we are currently thinking of defining are:

def get_build_wheel_requires(config_settings)
def prepare_wheel_metadata(metadata_directory, config_settings)
def build_wheel(wheel_directory, config_settings, build_directory=None, metadata_directory=None)
def get_build_sdist_requires(config_settings)
def build_sdist(sdist_directory, config_settings[, build_directory=None])  # Last param under discussion

I know we were also discussing a different naming scheme, but I've forgotten what names were proposed, and don't have time to find the relevant email again. Could someone produce a copy of the list above with the proposed names substituted? Thanks, Thomas
On Tue, Jul 11, 2017 at 9:58 AM Thomas Kluyver <thomas@kluyver.me.uk> wrote:
To summarise the current state of this discussion as I understand it, the hooks we are currently thinking of defining are:
def get_build_wheel_requires(config_settings)
def prepare_wheel_metadata(metadata_directory, config_settings)
def build_wheel(wheel_directory, config_settings, build_directory=None, metadata_directory=None)
def get_build_sdist_requires(config_settings)
def build_sdist(sdist_directory, config_settings[, build_directory=None])  # Last param under discussion
I know we were also discussing a different naming scheme, but I've forgotten what names were proposed, and don't have time to find the relevant email again. Could someone produce a copy of the list above with the proposed names substituted?
Pro_duced.

Mandatory backend hooks:

- build_sdist(sdist_directory, config_settings={})
- build_wheel(wheel_directory, config_settings={}, build_directory=None, metadata_directory=None)

Optional backend hooks:

- get_requires_for_build_sdist(config_settings={})
- get_requires_for_build_wheel(config_settings={})
- prepare_metadata_for_build_wheel(metadata_directory, config_settings={})
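As a concrete illustration of these signatures, here is a minimal sketch of a backend module exposing the mandatory and optional hooks. The package name "mypkg", the version, and the archive contents are placeholder assumptions for illustration, not anything the proposed API specifies:

```python
# Hypothetical minimal PEP 517 backend matching the hook names above.
# The archive bodies are stubbed; "mypkg"/"1.0" are placeholders.
import io
import os
import tarfile
import zipfile

NAME, VERSION = "mypkg", "1.0"

def build_sdist(sdist_directory, config_settings=None):
    """Write a (stub) sdist into sdist_directory, return its file name."""
    fname = f"{NAME}-{VERSION}.tar.gz"
    with tarfile.open(os.path.join(sdist_directory, fname), "w:gz") as tar:
        data = b"[build-system]\n"
        info = tarfile.TarInfo(f"{NAME}-{VERSION}/pyproject.toml")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return fname

def build_wheel(wheel_directory, config_settings=None,
                build_directory=None, metadata_directory=None):
    """Write a (stub) wheel; build_directory=None means an in-place build."""
    fname = f"{NAME}-{VERSION}-py3-none-any.whl"
    with zipfile.ZipFile(os.path.join(wheel_directory, fname), "w") as whl:
        whl.writestr(f"{NAME}-{VERSION}.dist-info/WHEEL",
                     "Wheel-Version: 1.0\nRoot-Is-Purelib: true\n")
    return fname

# Optional hooks: no extra build requirements for this trivial backend.
def get_requires_for_build_sdist(config_settings=None):
    return []

def get_requires_for_build_wheel(config_settings=None):
    return []
```

Both build hooks return the basename of the archive they created, which is how the frontend locates the result.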
On Tue, Jul 11, 2017, at 04:48 PM, Daniel Holth wrote:
Pro_duced.
Mandatory backend hooks:
- build_sdist(sdist_directory, config_settings={})
- build_wheel(wheel_directory, config_settings={}, build_directory=None, metadata_directory=None)

Optional backend hooks:

- get_requires_for_build_sdist(config_settings={})
- get_requires_for_build_wheel(config_settings={})
- prepare_metadata_for_build_wheel(metadata_directory, config_settings={})
Thanks Daniel! I'm happy with all of those names. I'm at Europython at the moment, so I may not have much time to work on it for the next few days, but if no-one else gets there first I'll have a go at updating the PEP with the latest changes. Thomas
On Mon, Jul 10, 2017, 10:58 C Anthony Risinger <c@anthonyrisinger.com> wrote:
On Jul 10, 2017 8:59 AM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
On 10 July 2017 at 22:08, Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2017-07-10 20:33:16 +1000 (+1000), Nick Coghlan wrote: [...]
we don't really *want* folks to be adding generated files to their sdist that they aren't keeping under source control - we'd prefer that such activities were postponed to "build_wheel" now that we have separate source and precompiled distribution formats. [...]
This is a mildly naive view. The software I'm familiar with is actually attempting to reflect metadata _from_ the source revision control _into_ the sdist because while it's "tracked" there it's not tracked as normal files (version information from tags, change logs from the commit history, contributor lists from commit authorship). The metadata in question is lost by just blindly tarring up tracked files, so needs some mechanism to export and inject as untracked files (from the source revision control perspective) for inclusion in the sdist.
Right, we know there will be *some* generated files (as Daniel notes, there will typically at least be the metadata files in a setuptools/distutils generated sdist), as well as potentially modifications to some files based on the version history.
That isn't the question though - the question is whether we want to actively support folks moving "compilation" like activities (minification, pyx->C conversion, etc) to the sdist generation stage by adding the optional "build_directory" option to "build_sdist" as well.
And that's the part where we decided the answer is "No", we only want to support the following configurations:
build_sdist: working directory -> target directory
build_wheel: working directory -> target directory working directory -> build directory -> target directory
In all cases the frontend provides a target directory that is distinct from the current working directory, so backends have a place to put both generated intermediate artifacts *and* the final assembled archive file.
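The two configurations above could be driven from the frontend side roughly like the following sketch. The `frontend_build_wheel` name, the `backend` object, and the chdir behaviour are illustrative assumptions based on this discussion, not a settled API:

```python
import os
import tempfile

def frontend_build_wheel(backend, source_tree, target_dir, out_of_tree=False):
    """Invoke the backend's build_wheel hook from the source tree.

    out_of_tree=True passes a separate build_directory so intermediate
    artifacts land outside the working directory; otherwise the backend
    performs an in-place build (build_directory=None).
    """
    old_cwd = os.getcwd()
    os.chdir(source_tree)  # hooks run with cwd set to the source tree
    try:
        build_dir = tempfile.mkdtemp() if out_of_tree else None
        return backend.build_wheel(target_dir, config_settings={},
                                   build_directory=build_dir)
    finally:
        os.chdir(old_cwd)
```

Note that in both configurations the target directory is distinct from the working directory, matching the constraint described above.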
When preparing a redistributable archive, we don't want people to first generate difficult or inconvenient artifacts? I always thought that was a major feature of an archive, to reduce the content down to common denominators for verification, reproducibility, and build simplicity, at the expense of not being fully representative of the original build capabilities and likely an irreversible step.
I know there's been a lot of discussion here, and I probably missed it, but early on we talked about things like Cython and cffi being part of sdist generation (in setuptools at least).
Are these things now expected to be deferred to the wheel building stage, thus adding deps to the final build server, or are we just saying there is less utility in supporting an explicit "artifacts directory" during sdist generation?
Just that there is less utility in supporting a separate artifacts directory during sdist generation, correct.

If the latter (and I think this is the case), that does seem reasonable, I just wanted to confirm that such "compilations" are still expected, or at least permissible, in downstream redistributables (sdists). Personally, if you want to do it, I want to help make it easier to do. --
C Anthony _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On 10 July 2017 at 17:13, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Do we want to also provide a build_directory for the build_sdist hook? In principle, I don't think making an sdist should involve a build step, but I know that some projects do perform steps like cython code gen or JS minification before making the sdist. I think this was a workaround to ease installation before wheel support was widespread, and I'd be inclined to discourage it now, so my preference would be no build_directory parameter for build_sdist. Backends which insist on generating intermediates at that point can make a temp dir themselves.
Agreed, I think the two kinds of artifact generation we want to support directly in PEP 517 are: - artifacts checked in to source control (ala autotools and CPython's Argument Clinic) - artifacts created while building the wheel file While we can't *prevent* backends doing artifact generation in build_sdist, we don't really want to encourage it either - as you say, we'd prefer that folks migrate such processes to the sdist->wheel step.
Then I guess that the choice between building a wheel directly and attempting to build an sdist first (with direct fallback) is one for frontends, and doesn't need to be specified.
Yep, and frontends may even decide to do something like delegating out-of-tree build support to backends by default, while offering an option to force the use of an intermediate sdist, similar to the way that tox uses an sdist by default, but provides a setting in tox.ini to turn that off. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 7 July 2017 at 23:23, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
Sort of. That's short for "--bdist-dir" and tells distutils/setuptools not to use the "dist/" subdirectory for either build trees or the build artifacts. It doesn't say anything about where intermediate artifacts generated by compilers etc should end up. For a hypothetical PEP 517 setuptools/distutils backend, I'd suggest an implementation that:

- passed "--bdist-dir=<wheel_directory>" to "bdist_wheel"
- copied the input files into the specified directory when "build_directory" was set and then did a normal in-place build in that directory

Whereas for enscons, you could presumably just set "variant_dir" appropriately in the SConscript call (which, as it turns out, works by copying the input files to the designated build directory: http://scons.org/doc/2.1.0/HTML/scons-user/x3398.html). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Jul 7, 2017 at 9:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 July 2017 at 23:23, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
Sort of. That's short for "--bdist-dir" and tells distutils/setuptools not to use the "dist/" subdirectory for either build trees or the build artifacts. It doesn't say anything about where intermediate artifacts generated by compilers etc should end up.
No, Daniel is correct. "setup.py *build* -b" is short for --build-base which is where all build artifacts go, even those from compilers. It seems you are confusing it with "setup.py *bdist* -b" which is indeed short for "--bdist-dir". -- Jeremy Kloth
Unrelated to pep 517: remind me, when invoking setup.py build -b dir bdist, is the -b argument passed to the build command supposed to affect the build command run as a subcommand of bdist? Asking for a friend 😸 On Sat, Jul 8, 2017, 11:17 Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
On Fri, Jul 7, 2017 at 9:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 July 2017 at 23:23, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
Sort of. That's short for "--bdist-dir" and tells distutils/setuptools not to use the "dist/" subdirectory for either build trees or the build artifacts. It doesn't say anything about where intermediate artifacts generated by compilers etc should end up.
No, Daniel is correct. "setup.py *build* -b" is short for --build-base which is where all build artifacts go, even those from compilers. It seems you are confusing it with "setup.py *bdist* -b" which is indeed short for "--bdist-dir".
-- Jeremy Kloth
On Sat, Jul 8, 2017 at 9:09 PM, Daniel Holth <dholth@gmail.com> wrote:
Unrelated to pep 517, remind me whether when invoking setup.py build -b dir bdist, if the -b argument passed to the build command is supposed to affect the build command run as a subcommand of bdist? Asking for a friend 😸
"setup.py build -b {dirname} bdist" will first run the build command (and its sub commands build_*) then invoke the bdist command. When bdist requests build to be executed, distutils will see that it is already run and simply continue on. Even if a reinit is requested (for build), the supplied options will be applied again when the build command is finalized prior to running. -- Jeremy Kloth
Great, thanks! It looks like out of tree builds are supported fine even in distutils. Properly written extensions will also find the build directory as a property of the build command. There should be no barrier to using the single build wheel function for the pep. On Sun, Jul 9, 2017, 10:33 Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
On Sat, Jul 8, 2017 at 9:09 PM, Daniel Holth <dholth@gmail.com> wrote:
Unrelated to pep 517, remind me whether when invoking setup.py build -b dir bdist, if the -b argument passed to the build command is supposed to affect the build command run as a subcommand of bdist? Asking for a friend 😸
"setup.py build -b {dirname} bdist" will first run the build command (and its sub commands build_*) then invoke the bdist command. When bdist requests build to be executed, distutils will see that it is already run and simply continue on. Even if a reinit is requested (for build), the supplied options will be applied again when the build command is finalized prior to running.
-- Jeremy Kloth
On 9 July 2017 at 01:17, Jeremy Kloth <jeremy.kloth@gmail.com> wrote:
On Fri, Jul 7, 2017 at 9:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 7 July 2017 at 23:23, Daniel Holth <dholth@gmail.com> wrote:
FYI distutils supports out of tree builds too. It is the -b argument to 'setup.py build'.
Sort of. That's short for "--bdist-dir" and tells distutils/setuptools not to use the "dist/" subdirectory for either build trees or the build artifacts. It doesn't say anything about where intermediate artifacts generated by compilers etc should end up.
No, Daniel is correct. "setup.py *build* -b" is short for --build-base which is where all build artifacts go, even those from compilers. It seems you are confusing it with "setup.py *bdist* -b" which is indeed short for "--bdist-dir".
Aye, I assumed he was referring to the latter, as I wasn't able to find any documentation for "setup.py build" anywhere, and this is the first time I've ever heard of "build" being available as a separate setup.py command. Even given that clarification, though, Nathaniel's point about folks doing local state manipulation directly in setup.py still stands. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Jul 7, 2017 at 4:05 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 July 2017 at 11:30, Nathaniel Smith <njs@pobox.com> wrote:
Is it absolutely necessary to get this into the first PEP?
As far as I'm concerned, it's no more than a restating (and simplification?) of all the discussions around building out of tree via creating an sdist and unpacking it, or having the various prepare files hooks. It's always been there and always been a requirement.
...? The text I was replying to was about adding a new argument to build_wheel that backends would have to check for and do special things based on. I mean, I really do see the attraction of requiring all build systems to support "build directories", that potentially enables some neat things like caching the directories across builds, but it's clearly a novel technical proposal, and one that's actually attempting to go beyond the rest of the world's build system state-of-the-art at a time when we're still trying to catch up with the capabilities of like, automake circa 2000.
Also, out of tree builds are something that pip is planning on implementing, and for us it's therefore necessary to have hooks that let us do that. The current version of the PEP provides those hooks, and we're planning on using them to do out of tree builds. This new suggestion is basically backend authors saying "you don't need to do the out of place builds, we're willing to take responsibility for that". If you're saying that you're not happy with that, then all that will end up happening is we revert to the previous approach, and pip implements out of tree builds based on that (and I guess you have the same argument all over again on the PR for the pip change...)
I am resigned to pip doing whatever it wants :-). At this point I just want to get something clean and functional landed that we can build on with a minimum of warts that we end up regretting later. If there's no way to land a workable build_wheel without this extra stuff, then OK; that's why I asked. But if it can be split into a future PEP then I think we should.
Also, please note that the proposal doesn't *prohibit* in-place builds, quite the opposite, it allows backends to decide when asked how to implement both in-place and out of place builds (where the current tree allows backends to decide how to do in place builds and how to copy trees, and leaves the frontend to decide how to implement out of place builds, typically via something like a tree copy and subsequent in-place build).
It mandates that all backends implement both in-place and out-of-place builds, including those backends which can do in-place builds correctly (e.g. projects that produce compiled artifacts but which accurately tell when they need to rebuild). Which is a little surprising, because what we actually care about is correct builds, and out-of-place builds are just one possible strategy for that. It's certainly one approach, but I don't feel confident that we can quickly and accurately determine that it's *the* best approach that we want to commit to forever. -n -- Nathaniel J. Smith -- https://vorpus.org
On 7 July 2017 at 20:30, Nathaniel Smith <njs@pobox.com> wrote:
I think this is a really interesting idea, but it makes me very nervous that we're starting design work on novel features when we still haven't finalized a basic build_wheel hook.
This isn't a novel feature, it's a revised approach to how we split the responsibility for supporting out of tree builds between frontends and backends, since folks aren't happy with the current "prepare_input/build_artifact" split.
PEP 517 was written in 2015...
And PEP 426 was written in 2012. Standards development timelines can get looong when the status quo at least kinda sorta works, and nobody has commercial deadlines forcing them to push to standardize new interfaces before a genuine consensus has developed :)
This proposal creates substantial complications for build systems that default to doing in-place builds, which is almost all existing systems from 'make' onwards. Legacy build systems often can't do out-of-tree builds at all (e.g. consider the case where you have a vendored C library whose build system you want to invoke as part of your build). Is this a problem? The benefits are potentially large, but are they worth it if they increase the barrier to entry for new build systems? I'm not sure how to tell. And in the mean time, pip is still unconditionally calling copytree before even looking at the source tree...
Is it absolutely necessary to get this into the first PEP?
If we don't explicitly include out-of-tree build support as a design concept from the start, then we increase the risk of folks creating in-tree only build backends that then break down later when out-of-tree support is introduced as an expectation. If anyone was negatively impacted by that, they could fairly accuse us of perpetrating a bait-and-switch in the standards definition process. And since distilling that support down to its core essence has been the single most controversial aspect of the PEP to date, getting to an approach that at least pip, flit, enscons, and a hypothetical PEP 517 adapter for setuptools can all live with is a fairly important point to resolve.

The latest round of discussions have been enlightening, as they have allowed us to articulate that from pip's point of view, the key requirement is to be able to tell a backend not to include anything that wouldn't be included when building via an sdist.

From flit's point of view, Thomas wants frontends to be able to express a preference between two different failure modes:

1. The frontend wants the wheel build to *guarantee* it exactly matches going via the sdist path
2. The frontend wants a working wheel build more than it cares about matching the sdist path

The "call build_sdist, then call build_wheel on the unpacked archive" approach gives us the first option, which is sufficient for pip's needs, but *only* including that path doesn't provide the latter capability. Up until now, we've attempted to provide the latter feature through various incarnations of the "prepare_wheel_input" hook, which have been consistently confusing as to when and how frontends should call them, and how backends should deal with them. Daniel's latest suggestion was merely to ask "What if, rather than making that a separate hook, we made it a parameter to the existing build_wheel hook?".
That way, it's entirely up to the *backend* to decide how best to align the results of an out-of-tree build with what you'd get from going through the sdist path (with one of the main options being to move the input files and then do an in-place build in the new directory, and the other main option being to use any native out-of-tree build capabilities in the underlying build system).
From pip's point of view, it can still try the build_sdist hook implicitly *first* for source installations, and only fall back to setting "build_directory" on the build_wheel hook if the sdist creation fails.
So everybody gets what they're looking for from the API, and the only extra concept we have to explain in PEP 517 is the difference between in-place ("build_directory=None") and out-of-tree ("build_directory" set to a filesystem path) wheel builds. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
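The fallback flow described above might be sketched like this. The `source_install` name and the use of NotImplemented as the "sdist creation unsupported" signal are assumptions borrowed from the draft at the top of the thread, not settled API:

```python
import tempfile

def source_install(backend, target_dir):
    """Sketch: try build_sdist first; only set build_directory on the
    build_wheel hook if sdist creation is unsupported (signalled here,
    per the earlier draft, by returning NotImplemented)."""
    sdist_dir = tempfile.mkdtemp()
    result = backend.build_sdist(sdist_dir, config_settings={})
    if result is not NotImplemented:
        # Unpack the sdist and run an in-place build_wheel inside it
        # (unpacking elided in this sketch).
        return backend.build_wheel(target_dir, config_settings={})
    # Fallback: out-of-tree build straight from the source tree.
    return backend.build_wheel(target_dir, config_settings={},
                               build_directory=tempfile.mkdtemp())
```

Either way, the backend alone decides how an out-of-tree build aligns with the sdist path, which is the division of responsibility Nick outlines.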
On Fri, Jul 7, 2017 at 8:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The latest round of discussions have been enlightening, as they have allowed us to articulate that from pip's point of view, the key requirement is to be able to tell a backend not to include anything that wouldn't be included when building via an sdist.
From flit's point of view, Thomas wants frontends to be able to express a preference between two different failure modes:
1. The frontend wants the wheel build to *guarantee* it exactly matches going via the sdist path
2. The frontend wants a working wheel build more than it cares about matching the sdist path
Both of these are handled neatly by my draft posted at the beginning of this thread. OTOH this whole 11th hour discussion of forcing every build system to have in-tree and out-of-tree build support is solving some other problem. I'm not entirely sure what that problem is -- I don't think anyone has articulated it. Your "key requirement" is technically vacuous -- by definition, *any* correct build backend will only include the things that would be included when building via an sdist.

Some possible problems that I've seen mentioned in the thread include:

- pip doesn't trust build systems to properly support incremental builds, so it wants to force them to throw away the build artifacts after every build
- pip wants to enforce builds going via sdists
- Thomas / Ralf / I are frustrated at the idea of not supporting incremental builds

But the in-tree/out-of-tree build proposal doesn't address any of these problems. Part of the support for it seems to be that it *sounds* like it might somehow provide a compromise between the folks arguing over what pip should do by default, because it provides a way of doing incremental builds that kind of looks like pip's current hack for doing non-incremental builds. But I think this is illusory. We still have to argue over what pip will actually do by default; all this does is replace 1 way of doing incremental builds with 2 ways, and if pip doesn't want to do incremental builds it'll ignore both.

Seriously, what problem is this solving? How can we even have a discussion about whether it's the best solution when we don't know that? In my previous emails I was trying to avoid getting into the nitty-gritty of actually critiquing the proposal, because I think it's a fundamental mistake to even try to hash out this kind of complex design at the last minute; for every issue I think of now there's probably another one that none of us have noticed yet.
But since everyone else seems so gung-ho to ship this thing without even having that minimal discussion, here are some concerns and questions:

If we require every project to support both in-tree and out-of-tree builds, then projects that don't really support out-of-tree builds now need to implement the copytree hack themselves, and they might get it wrong. For example, if you don't correctly clear out the old build tree before copying, your new build could potentially be corrupted by artifacts from your old build. (I'm having flashbacks to the bug reports we get on numpy from people who used setup.py install to upgrade, and because it doesn't uninstall the old version they end up with some combination of old and new versions overlaid on top of each other.) This is a whole new potential failure mode that this proposal is introducing. Is that acceptable?

In fact, I'm guessing that pip will not actually cache build directories and re-use them, and will only ever supply empty directories as the out-of-tree build dir. This means that lots of projects are likely to be released without ever testing their build-with-a-non-empty-out-of-tree-build-dir path, and thus it won't work. What are the chances that this turns into another feature that exists in theory but in practice it rusts over and can't be used, because trying to do so would break too often?

What happens if a project switches from scons to cmake or vice-versa, and they get passed a build tree that contains foreign build artifacts? Are they prepared to detect this and do something sensible? Traditionally reusing a build context (either an in-place build tree or an out-of-place build tree) is something that only developers do, and they do it manually, so this isn't a problem -- you just send an occasional email to the list saying "hey I just flipped the switch on the build systems, make sure to do a 'git clean' before your next build", or people just reflexively do a 'git clean' any time they get weird results.
But now we're proposing to make this a first-class feature to be used by automatic unsupervised build pipelines; are they going to end up producing garbage?

Are frontends allowed to move the out-of-tree build directory to another parent directory? Another filesystem? Another machine? Another operating system? What do they have to preserve if they try? Timestamps? Inode numbers? (Using inode numbers as part of a change detection algorithm is a totally reasonable thing for a build system to do.)

What if a source tree has previously been used for in-place builds, is this allowed to make future out-of-place builds break? Off the top of my head I know openssl's build system has an in-place XOR out-of-place restriction [1]. It looks like CMake is documented to have one as well [2]. There may be no automatic way to *ever* do an out-of-place build in a tree that has previously had an in-place build; you might have to throw out that tree and start over. Is a build system like this compliant? (Notice from [1] that it sounds like openssl handles this situation by printing a message to the console saying "lol this build is probably broken idek" and then happily produces a broken build. Notice from [2] that cmake might randomly decide to do an in-place build even if you requested an out-of-place build.)

Speaking of which, why do we force backends to support both in-place and out-of-place? Out-of-place is strictly more powerful (if it works at all). Why not make that the only mode of operation?

Or: another idea that came up was just passing a flag to the build backend saying whether the source tree was temporary or not, which has the advantage that it's clearly defined (pip certainly knows whether it's going to delete the source tree after it finishes the build), and potentially side-steps a lot of these problems with managing the out-of-tree build cache. Would that be better or worse?
I don't know how to answer that given that I don't know what problem we're trying to solve here.

Is pip planning to enforce that the source directory is left unmodified? If not, then do we expect that projects will actually skip modifying the source tree in practice? It's *very* easy to accidentally break this rule without realizing it. Will this happen often enough to make this non-viable for whatever we're trying to do here? (I guess it has something to do with getting trustworthy builds from untrustworthy build systems, so this seems relevant.)

Does anyone know how widely supported out-of-place builds are *in real life*? I know that all the major build frameworks have the infrastructure, but I'm one of those weirdos who habitually does out-of-place builds whenever I build software by hand, and back when I used to build a lot of software by hand then it was *very* common for this mode of operation to be broken due to whatever weird hack was used inside a specific project's build system. It's been ~10 years since I did that often though; maybe things have gotten better?

------

I'm really not trying to be an asshole here :-(. Being the asshole is extremely unpleasant, and I just had to throw myself in front of the prepare_build_files train before everyone suddenly realized that whoops, maybe it wasn't as great an idea as they thought. But like... everyone does understand, right, that whatever we put in here, we're stuck with forever? This isn't like some new project where you can release 0.0.1 and then spend a few years noodling around with the API before you release 1.0 and start promising backcompat. This is version 0.0.1 and 1.0 at the same time, and also we can't write any tests until after we release it, and we may never be able to release a 2.0. (Look at WSGI -- I mean, it's tremendously popular and influential, obviously they did some things right, but the spec is full of awful stuff that everyone hates and all its imitators dropped, but fixing it is impossible.)
I just don't understand how everyone has the confidence that this proposal is a mature solid thing that will stand the test of time. Maybe it will! But how can you possibly know that when we haven't even scratched the surface of all its implications? Shouldn't the prepare_build_files thing be a clue that your judgement might not be 100% reliable on these things? And the alternative is just like... go ahead and ship something that only supports in-place builds directly (e.g. my draft at the top of this thread), add out-of-place builds later as an optional extension if it's useful, and if the frontend really wants an out-of-place build it can fall back on shutil.copytree (plus as a bonus it *knows* that the backend can't make use of the resulting temporary build directory, so it can throw it away instead of caching it, and there's zero chance of cross-build pollution). If anything this seems like the end result would be *superior* to this proposal, and I've seen zero evidence that out-of-place builds are something we need to solve now. Can we please just not? I actually have a list of fiddly details that need to be discussed about the core part of the proposal, but I don't see how we'll ever get to the point of nailing down these kinds of details when all the oxygen is going into this kind of proposal whose implications are too complicated for us to even understand.
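[Editorial note: as a rough illustration of the fallback described above, a frontend could try the sdist route and fall back to a full copy. This is only a sketch under the conventions of the draft at the top of the thread (hooks signalling "unsupported" by returning NotImplemented); `fetch_build_tree` and the `backend` object are hypothetical names, not part of any spec.]

```python
import os
import shutil

def fetch_build_tree(backend, source_dir, work_dir):
    """Prefer the sdist route; if the backend reports sdist building as
    unsupported (by returning NotImplemented, per the draft's
    convention), copy the whole tree and build in-place inside the
    copy, leaving the original source tree untouched."""
    sdist = backend.build_sdist(work_dir)
    if sdist is NotImplemented:
        build_tree = os.path.join(work_dir, "srccopy")
        shutil.copytree(source_dir, build_tree)  # naive full copy
        return ("copytree", build_tree)
    return ("sdist", sdist)
```

Because the frontend performed the copy itself, it also knows the copied tree is disposable and need not be cached.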
PEP 517 was written in 2015...
And PEP 426 was written in 2012. Standards development timelines can get looong when the status quo at least kinda sorta works, and nobody has commercial deadlines forcing them to push to standardize new interfaces before a genuine consensus has developed :)
I humbly suggest that this isn't an immutable fact of nature, but rather there are strategies that we can adopt intentionally to reduce the chance of repeating PEP 426's fate. If we want to. -n [1] https://mta.openssl.org/pipermail/openssl-dev/2016-June/007364.html [2] "Note: Before performing an out-of-source build, ensure that all CMake generated in-source build information is removed" -- https://cmake.org/Wiki/CMake_FAQ#What_is_an_.22out-of-source.22_build.3F -- Nathaniel J. Smith -- https://vorpus.org
On Fri, Jul 14, 2017, at 11:18 AM, Nathaniel Smith wrote:
OTOH this whole 11th hour discussion of forcing every build system to have in-tree and out-of-tree build support is solving some other problem.
It is not my intention to force build systems to support either of these.

- Where build systems only support out-of-tree builds, it should be trivial to create a temporary directory when build_directory is not passed.
- Where build systems only support in-tree builds, they should copy the necessary files to build_directory and run an in-tree build there. This is more complex, but it appears to be non-negotiable that there is some way of building without affecting the source directory, so whatever the interface is, we need some way to do this.
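[Editorial note: the first adaptation described above could look roughly like this inside a backend. This is only a sketch; `_native_out_of_tree_build` is a hypothetical stand-in for the backend's real build machinery, and the hook is assumed to run with the working directory set to the source tree.]

```python
import os
import tempfile

def _native_out_of_tree_build(source_dir, build_dir, wheel_dir):
    # Placeholder for the backend's real build machinery, which in this
    # hypothetical only knows how to build outside the source tree.
    return os.path.join(wheel_dir, "pkg-1.0-py3-none-any.whl")

def build_wheel(wheel_directory, config_settings=None, build_directory=None):
    """When no build_directory is passed, a backend whose native
    tooling only does out-of-tree builds can simply synthesise a
    private temporary build directory and proceed."""
    if build_directory is None:
        build_directory = tempfile.mkdtemp(prefix="pep517-build-")
    return _native_out_of_tree_build(
        os.getcwd(), build_directory, wheel_directory)
```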
On Fri, Jul 14, 2017 at 3:32 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
it appears to be non-negotiable that there is some way of building without affecting the source directory, so whatever the interface is, we need some way to do this.
But this is confusing the means with the ends. Obviously no one cares about that *per se*; they want it because it accomplishes something they actually care about. Maybe because they don't trust build tools to do incremental builds. Maybe because they want to exercise the sdist path to reduce chances for error. Maybe because pip has always done it that way. What is that thing? What are the advantages of this design, as compared to the fallback of doing an unconditional copytree (like pip does now and may well continue doing for years to come regardless of what we say here), or the slightly fancier fallback that my draft supports of attempting to build an sdist and, if that fails, doing a copytree instead? Is this the simplest way to accomplish the real goal? All the goals that I can think of seem to allow for solutions that have fewer complications and unknowns... ...and if pip's goal is to go via sdist whenever possible while always being careful never to modify the source tree, then why did we end up with a design where sdist generation is the one case that *is* encouraged to modify the source tree? This doesn't make any sense. I'm sorry for dragging this out -- I know you just want to get something finished and stop arguing about this. So do I :-). And your efforts to keep pushing this stone up the hill are much appreciated. I just think we should... find a shorter hill and declare victory over that. -n -- Nathaniel J. Smith -- https://vorpus.org
On 14 July 2017 at 20:59, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 14, 2017 at 3:32 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
it appears to be non-negotiable that there is some way of building without affecting the source directory, so whatever the interface is, we need some way to do this.
But this is confusing the means with the ends. Obviously no one cares about that *per se*; they want it because it accomplishes something they actually care about.
Maybe because they don't trust build tools to do incremental builds. Maybe because they want to exercise the sdist path to reduce chances for error. Maybe because pip has always done it that way. What is that thing? What are the advantages of this design, as compared to the fallback of doing unconditional copytree (like pip does now and may well continue doing for years to come regardless of what we say here), or the slightly fancier fallback that my draft supports of attempting to build an sdist and if that fails doing a copytree instead? Is this the simplest way to accomplish the real goal? All the goals that I can think of seem to allow for solutions that have fewer complications and unknowns...
...and if pip's goal is to go via sdist whenever possible while always being careful never to modify the source tree, then why did we end up with a design where sdist generation is the one case that *is* encouraged to modify the source tree? This doesn't make any sense.
You're confusing two different aspects of the design here:

- the pip developers' desire for publisher flows to closely match end user flows, to reduce the risk of novice publishers shipping broken archives
- the general build tool convention that out-of-tree builds shouldn't modify the source tree

pip's design requirement is met by pip defaulting to doing build_sdist and then doing build_wheel based on that sdist, so that's the simplest possible design we could approve and ship. The question then is what the fallback should be when build_sdist fails, but only a wheel archive or installed package is needed. Candidates:

- fail anyway (unnecessarily poor UX if the wheel build would have worked)
- build in-place (reasonable option)
- build out-of-tree (reasonable option)
- shutil.copytree then build in-place (status quo for legacy sdists, but problematic due to VCS directories & other large sets of data)

The design that PEP 517 has settled on is to say that since both in-place and out-of-tree builds are reasonable things for a frontend to request, the *API* will allow frontends to request either an in-place build ("build_directory is None" or "os.path.samefile(build_directory, os.getcwd())") or an out-of-tree build ("os.path.samefile(build_directory, os.getcwd())"). We're not inventing the concept of an out-of-tree build, we're just choosing to make it a requirement for PEP 517 backends to support them. This is low risk, since by pushing out-of-tree build support to the backend, we also make it straightforward for the *backend* to tell publishers how to define a filtered shutil.copytree operation if the backend doesn't natively support out-of-tree builds. That way we don't need to standardise on a universal protocol for defining which files are needed for the wheel building process - that can be a backend dependent operation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
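[Editorial note: the default flow described above (build an sdist, then build the wheel from the unpacked sdist) might be sketched as below. This is a hypothetical frontend helper, not pip's actual code; `backend` stands in for a loaded backend module, and the hooks are assumed to run with the working directory set to the source tree.]

```python
import os
import tarfile
import tempfile

def build_wheel_via_sdist(backend, sdist_dir, wheel_dir):
    """Build an sdist first, unpack it, then build the wheel from the
    unpacked tree, so the publisher flow exercises the same path that
    end users installing from the sdist will hit."""
    sdist_name = backend.build_sdist(sdist_dir)
    unpack_dir = tempfile.mkdtemp(prefix="unpacked-")
    with tarfile.open(os.path.join(sdist_dir, sdist_name)) as tf:
        tf.extractall(unpack_dir)
    old_cwd = os.getcwd()
    os.chdir(unpack_dir)  # hooks run with cwd = (unpacked) source tree
    try:
        return backend.build_wheel(wheel_dir)
    finally:
        os.chdir(old_cwd)
```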
On 14 July 2017 at 21:23, Nick Coghlan <ncoghlan@gmail.com> wrote:
The design that PEP 517 has settled on is to say that since both in-place and out-of-tree builds are reasonable things for a frontend to request, the *API* will allow frontends to request either an in-place build ("build_directory is None" or "os.path.samefile(build_directory, os.getcwd())") or an out-of-tree build ("os.path.samefile(build_directory, os.getcwd())").
Oops, forgot to add the "not" after copying-and-pasting the samefile call: out-of-tree builds are the inverse of in-place builds, and hence indicated by "build_directory is not None and not os.path.samefile(build_directory, os.getcwd())". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
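[Editorial note: with the correction applied, the classification can be written as a one-liner. A sketch, assuming the hook is invoked with the process working directory set to the source tree; the function name is illustrative only.]

```python
import os

def requested_out_of_tree(build_directory):
    """An out-of-tree build is requested when a build_directory is
    given and it is not the source tree itself (the cwd); None or the
    cwd itself both mean an in-place build."""
    return (build_directory is not None
            and not os.path.samefile(build_directory, os.getcwd()))
```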
On Fri, Jul 14, 2017, at 11:59 AM, Nathaniel Smith wrote:
What are the advantages of this design, as compared to the fallback of doing unconditional copytree (like pip does now and may well continue doing for years to come regardless of what we say here),
I don't think pip currently does copytree. Naively copying everything may be very slow, especially on Windows, because it includes things like the .git subdirectory. If you exclude the .git subdirectory, you break tools like setuptools_scm which rely on the VCS for version numbers.
or the slightly fancier fallback that my draft supports of attempting to build an sdist and if that fails doing a copytree instead?
Having a fallback which may be pathologically slow seems like bad UX (it's hard to see what's going wrong), and I don't want whatever is specified to hinge on generating sdists.
...and if pip's goal is to go via sdist whenever possible while always being careful never to modify the source tree, then why did we end up with a design where sdist generation is the one case that *is* encouraged to modify the source tree? This doesn't make any sense.
Sdist generation is *not* encouraged to modify the source tree. It is encouraged to avoid generating significant artifacts, because the build steps should be done after extracting the sdist. But if the backend wants to generate intermediates during sdist creation, it is free to make a temporary directory, or use an external cache mechanism.
I'm sorry for dragging this out -- I know you just want to get something finished and stop arguing about this. So do I :-). And your efforts to keep pushing this stone up the hill are much appreciated. I just think we should... find a shorter hill and declare victory over that.
I appreciate the enthusiasm, but I feel like we're arguing about whether that hill over there is smaller when we're almost at the top of this one. ;-)
On Jul 14, 2017, at 9:46 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Fri, Jul 14, 2017, at 11:59 AM, Nathaniel Smith wrote:
What are the advantages of this design, as compared to the fallback of doing unconditional copytree (like pip does now and may well continue doing for years to come regardless of what we say here),
I don't think pip currently does copytree. Naively copying everything may be very slow, especially on Windows, because it includes things like the .git subdirectory. If you exclude the .git subdirectory, you break tools like setuptools_scm which rely on the VCS for version numbers.
Still processing this new direction, but just FYI we *do* do copy tree today. It is naive and it is slow in many cases. — Donald Stufft
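[Editorial note: the trade-off Thomas raises shows up directly in a filtered copy. A sketch only; which patterns to ignore is exactly the unresolved question, and skipping `.git` is what breaks VCS-based version discovery such as setuptools_scm.]

```python
import shutil

def copy_tree_without_vcs(src, dst):
    """Copy a source tree while skipping VCS metadata. Much cheaper
    than a naive full copy, but any tool that derives the version from
    the VCS will then fail inside the copied tree."""
    shutil.copytree(
        src, dst,
        ignore=shutil.ignore_patterns(".git", ".hg", ".svn"))
```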
On 14 July 2017 at 11:32, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Fri, Jul 14, 2017, at 11:18 AM, Nathaniel Smith wrote:
OTOH this whole 11th hour discussion of forcing every build system to have in-tree and out-of-tree build support is solving some other problem.
It is not my intention to force build systems to support either of these.
- Where build systems only support out-of-tree builds, it should be trivial to create a temporary directory when build_directory is not passed. - Where build systems only support in-tree builds, they should copy the necessary files to build_directory and run an in-tree build there. This is more complex, but it appears to be non-negotiable that there is some way of building without affecting the source directory, so whatever the interface is, we need some way to do this.
It's "non-negotiable" in the sense that we (the pip developers) have in the past had to deal with issues being reported that turned out to be because a developer had unexpected files in their build directory that resulted in the build being incorrect. That's not a case of "not trusting build systems to do incremental builds", it's direct evidence that developers sometimes make mistakes and think it's pip's fault (until we explain their mistake to them). The plan for pip, to address this ongoing issue, is to provide out of tree builds, so that we can ensure that the developer is protected against such mistakes. The UI for out of tree builds is not finally decided yet (for example, whether it's the default, or if a flag is needed). The reason for this is that current tools *make it impossible* for us to implement this (with one proviso: it appears that no one ever actually tried the "build an sdist, then build from the sdist" route - I thought we had, and had hit issues caused by certain tools, but I was apparently wrong). So rather than back ourselves into a new corner where pip can't provide out of tree builds to solve this issue, we need some means of doing this in PEP 517. The option to do it via sdists, or via "copy tree" hooks, became messy and was a problem for flit, so we took a second look at it and agreed that we'd got hung up on not being willing to trust backends to do out of tree builds. The new proposal was to add a means for frontends to request an out of tree build, and trust backends to provide that. I'm happy with the current proposal. We shouldn't have a problem trusting backends, and now we have a means of asking for what we want, so everything is fine. If the issue here is that frontends shouldn't be allowed to even *want* to do out of tree builds, I'm not sure what I can further say to convince you. We have actual bugs which we can solve with out of tree builds, and we can't think of another way to solve them.
(Unfortunately, I can't find the issue in pip where we have discussed implementing out of tree builds, so I can't give you a pointer to actual bug reports - and anyway, most if not all of them will have been worked around by the developer cleaning up his or her source tree). I hope this clarifies a little - I wasn't too keen on seeing the requirement described as "non-negotiable" (or of it being linked to incremental builds), when it's actually nothing other than an identified need that addresses a real world problem frontends have to deal with (of not-always-pristine build directories). Paul
On 14 July 2017 at 20:18, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 7, 2017 at 8:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: Some possible problems that I've seen mentioned in the thread include:
- pip doesn't trust build systems to properly support incremental builds, so it wants to force it to throw away the build artifacts after every build
It's less that, and more pip wanting to ensure that the default publisher experience is reasonably close to the default end user experience, in order to increase the likelihood that publishers ship working sdists and wheel files, even if they haven't learned about the full suite of available pre-release testing tools yet (or have chosen not to use them).
- pip wants to enforce builds going via sdists
This isn't a requirement, just an option we want to enable (and can't avoid enabling).
- Thomas / Ralf / I are frustrated at the idea of not supporting incremental builds
This is why we're allowing frontends to request in-place builds. And that's been the main discovery of the last round of discussions - we'd been talking past each other, and what we actually wanted was for the backend interface to accurately model the in-place/out-of-tree distinction that has been common to most build systems at least since autotools (hence my catalog of them earlier in the thread). This means we're no longer attempting to invent anything new, we're just copying the extensive prior art around out-of-tree builds, and will deal with the consequences.

Regarding your questions:

1. What if backends get their out-of-tree support wrong?

That's fine: it can be reported as a bug and fixed, and frontends and redistributors can use pip constraint files to enforce minimum versions even if publishers don't update their pyproject.toml files (plus frontends will grab the latest version of build backends by default).

2. What happens if re-using build directories isn't tested properly?

I'd be genuinely surprised if any frontend ever did this by default (excluding in-place builds). Instead, it would be application integrators doing it as part of something like BitBucket Pipelines or OpenShift ImageStreams, and if it didn't work properly, they'd stop caching that particular directory.

3. What happens if an upgrade breaks incremental builds?

The same thing that happens if an upgrade straight-up breaks: either it gets reported as a bug and fixed, or else end users figure out a workaround and move on.

4. Do we enforce not modifying the source directory?

No. We don't even enforce backends not doing "rm -rf ~". If folks want anything like that enforced, then they still need to use ephemeral build servers.

5. What happens if you try to switch from in-place to out-of-place in one tree and it breaks?

Don't do that, then (this is a publisher-only issue that won't impact end users).

6. Is native out-of-tree build support common in practice?

Yes, as they're a requirement for inclusion in Debian.

7. Why include in-place support in the API then?

Because some folks (including you) would like to include incremental build support, and because if we don't support it explicitly, backends will still have to deal with the "wheel_directory == build_directory" case.

8. But what if we make a design mistake and can't fix it?

Like all other recent PyPA specifications, PEP 517 will go through a Provisional acceptance period where it is accepted for implementation in the default toolset, but explicitly subject to revision based on that real world feedback. But even if something does still get through that second period of review, the whole reason we opted for the Python backend API over the CLI-based one is that it's much easier to evolve (since we can do getattr checks and so forth, as well as straight-up declarations of API version support).

9. Why not wait and then add a new backend capability requirement later?

Waiting to add the requirement won't provide us with any more data than we already have, but may give backend implementors the impression they don't need to care about out-of-tree build support. This is also our first, last, and only chance to make out-of-tree build support a *mandatory* backend requirement that frontends can rely on - if we add it later, it will necessarily only be optional. By contrast, if we include the out-of-tree build feature in the first version, and it's straightforward for frontends to work with and backends to implement, then cool, while if it proves problematic, then we can review the design decision based on that additional information.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Fri, Jul 14, 2017 at 4:00 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 14 July 2017 at 20:18, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 7, 2017 at 8:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: Some possible problems that I've seen mentioned in the thread include:
- pip doesn't trust build systems to properly support incremental builds, so it wants to force it to throw away the build artifacts after every build
It's less that, and more pip wanting to ensure that the default publisher experience is reasonably close to the default end user experience, in order to increase the likelihood that publishers ship working sdists and wheel files, even if they haven't learned about the full suite of available pre-release testing tools yet (or have chosen not to use them).
This is exactly the opposite of what Paul says in his message posted at about the same time as yours... AFAICT his argument is that build artifact leakage is exactly what pip is worried about, and if anything he's annoyed at me because he thinks I'm downplaying the problem :-). (My impression is that Donald's position might be closer to what you wrote, though.)
- pip wants to enforce builds going via sdists
This isn't a requirement, just an option we want to enable (and can't avoid enabling).
Donald sure seemed to be insisting on it as a requirement earlier, at least. And the *entire cause* of this whole discussion was conflict between this requirement and flit not wanting to support building sdists from unpacked sdists, so characterizing it as some automatic thing that doesn't even need to be talked about seems really weird.
- Thomas / Ralf / I are frustrated at the idea of not supporting incremental builds
This is why we're allowing frontends to request in-place builds.
Wait, what? That's not what any of us asked for at all. Ralf and I want incremental builds, which is orthogonal to the in-place versus out-of-place distinction, and Thomas wants pip not to call sdist due to some combination of inefficiency and lack-of-support-in-some-cases, which is also orthogonal to the in-place versus out-of-place distinction. (In retrospect I probably shouldn't have lumped Thomas / Ralf / I together there, those really are different motivations, though they lead to similar places.) You're trying to shut me down by making these pronouncements about what PEP 517 is going to be, and yet you don't seem to even understand what requirements people are talking about. I'm sorry if that's rude, and I'm not angry at you personally or anything, but I am frustrated and I don't understand why I seem to have to keep throwing myself in front of these ideas that seem so poorly motivated.
And that's been the main discovery of the last round of discussions - we'd been talking past each other, and what we actually wanted was for the backend interface to accurately model the in-place/out-of-tree distinction that has been common to most build systems at least since autotools (hence my catalog of them earlier in the thread).
This statement seems just... completely wrong to me. I can't see any way that the in-place/out-of-place distinction matches anything we were talking about earlier. Have we been reading the same threads? There was the whole thing about pip copying trees, of course, but that's a completely different thing. The motivation there is that they want to make sure that two builds don't accidentally influence each other. If they don't trust the build system, then this requires copying to a new tree (either via copytree or sdist). If they do trust the build system, then there are other options -- you can do an in-place build, or have a flag saying "please 'make clean' before this build", or something. So out-of-place builds are either insufficient or unnecessary for what pip wants. Copying the source tree and out-of-place builds are completely different things with different use cases; they happen to both involve external directories, but that's mostly a false similarity I think. And as for flit, it doesn't even have a distinction between in-place and out-of-place builds...
This means we're no longer attempting to invent anything new, we're just copying the extensive prior art around out-of-tree builds, and will deal with the consequences.
Forcing all build systems to support both in-place builds and out-of-place builds adds substantial additional complexity, introduces multiple new failure modes, and AFAICT is not very well suited to the problems people have been worrying about.
Regarding your questions:
1. What if backends get their out-of-tree support wrong?
That's fine, it can be reported as a bug, fixed, and frontends and redistributors can use pip constraint files to enforce minimum versions even if publishers don't update their pyproject.toml files (plus frontends will grab the latest version of build backends by default).
2. What happens if re-using build directories isn't tested properly?
I'd be genuinely surprised if any frontend ever did this by default (excluding in-place builds). Instead, it would be application integrators doing it as part of something like BitBucket Pipelines or OpenShift ImageStreams, and if it didn't work properly, they'd stop caching that particular directory.
3. What happens if an upgrade breaks incremental builds?
The same thing that happens if an upgrade straight-up breaks: either it gets reported as a bug and fixed, or else end users figure out a workaround and move on.
4. Do we enforce not modifying the source directory?
No. We don't even enforce backends not doing "rm -rf ~". If folks want anything like that enforced, then they still need to use ephemeral build servers.
These first 4 answers might be reasonable if it wasn't for the fact that the *entire alleged motivation* for this feature is to reduce the chance of accidental breakage. You can't justify it like that and then hand-wave away all the new potential sources of accidental breakage that it introduces...
5. What happens if you try to switch from in-place to out-of-place in one tree and it breaks?
Don't do that, then (this is a publisher-only issue that won't impact end users).
This will certainly affect developers. The symptom is that you do the equivalent of 'python setup.py build' in a source tree, and then later do 'pip install .' in a source tree, and get screwy undefined behavior because you're using a mainstream build system like cmake. At the very least the PEP needs some language like "Frontends MUST NOT assume that a given source tree can switch between in-place and out-of-place builds; once they've issued one type of build against a given source tree, they MUST stick with that type. One consequence of this is that when given an arbitrary pre-existing source tree whose history is unknown, frontends MUST NOT assume that they can perform either in-place or out-of-place builds using that tree. In practice, this means that frontends MUST explicitly query the user for which type of build to use, because anything else risks either producing corrupt builds or breaking the user's source tree with common build systems." Or, well, that sounds pretty unusable, so maybe instead we would want to go with the alternative: "Backends MUST support free switching between in-place and out-of-place builds in the same directory" -- but that's a whole 'nother likely source of weird problems, and breaks the idea that this is something that existing build systems all support already. But IMO the one thing the PEP can't do is just stay silent on this and pretend that everything will work out even if frontends and backends are using incompatible interpretations. That'd be, like, malpractice.
6. Is native out-of-tree build support common in practice?
Yes, as they're a requirement for inclusion in Debian.
Huh, interesting! Do you have a cite for this? I tried to find more information, and all I found was this advice for upstreams, which explicitly discusses what they need from build systems that don't support out-of-tree builds. It would be odd to include that if such build systems were forbidden: https://wiki.debian.org/UpstreamGuide#Cleaning_the_Tree In any case, there's a huge difference between Debian wanting to take a single static snapshot and build it several times from scratch, and what's specified in the PEP right now, which apparently allows for incrementally re-using the same build tree while also incrementally evolving the source being built. Most of the underspecified edge cases that I brought up are ones that simply don't arise in Debian's use case, but do arise in the PEP. This is why I'm unconvinced by the argument that we're just adopting some old well-known idea -- when you look at the details, this proposal as written is much more ambitious than any deployed system I'm aware of.
7. Why include in-place support in the API then?
Because some folks (including you) would like to include incremental build support, and because if we don't support it explicitly backends will still have to deal with the "wheel_directory == build_directory" case.
Out-of-tree builds as currently specified are required to handle incremental out-of-tree builds... maybe that's a mistake, but that's what the text says. I don't understand the second half of your sentence.
8. But what if we make a design mistake and can't fix it?
Like all other recent PyPA specifications, PEP 517 will go through a Provisional acceptance period where it is accepted for implementation in the default toolset, but explicitly subject to revision based on that real world feedback.
But even if something does still get through that second period of review, the whole reason we opted for the Python backend API over the CLI based one is that it's much easier to evolve (since we can do getattr checks and so forth, as well as straight up declarations of API version support)
It's *dramatically* easier to add than to remove, though.
9. Why not wait and then add a new backend capability requirement later?
Waiting to add the requirement won't provide us with any more data than we already have, but may give backend implementors the impression they don't need to care about out-of-tree build support. This is also our first, last, and only chance to make out-of-tree build support a *mandatory* backend requirement that frontends can rely on - if we add it later, it will necessarily only be optional.
AFAICT making it optional is better for everyone, so I'm not sure why that's seen as a bad thing. Analysis:

- If you're an eager backend dev who loves this stuff, you'll implement it anyway, so it doesn't matter whether it's optional
- If you're a just-trying-to-hack-something-together backend dev, maybe constrained by some ugly legacy build system, then you'd much rather it be optional, because then you don't have to deal with hacking up some fake "out-of-tree build" using copytree, and maybe getting it wrong
- If you're pip, then AFAICT you're more worried about lazy backend devs screwing things up than anything else, in which case you don't want them hacking up their own fake "out-of-tree build" using copytree; you want to take responsibility for that so you know that you get it right
- If you're an automatic building pipeline, then knowing which backends support "real" out-of-tree builds is *super valuable information*, because it means you don't have to waste disk/bandwidth caching the "build tree" for backends that are just going to throw it all away anyway

-n -- Nathaniel J. Smith -- https://vorpus.org
On Jul 14, 2017, at 2:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 14, 2017 at 4:00 AM, Nick Coghlan <ncoghlan@gmail.com <mailto:ncoghlan@gmail.com>> wrote:
On 14 July 2017 at 20:18, Nathaniel Smith <njs@pobox.com> wrote:
On Fri, Jul 7, 2017 at 8:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote: Some possible problems that I've seen mentioned in the thread include:
- pip doesn't trust build systems to properly support incremental builds, so it wants to force it to throw away the build artifacts after every build
It's less that, and more pip wanting to ensure that the default publisher experience is reasonably close to the default end user experience, in order to increase the likelihood that publishers ship working sdists and wheel files, even if they haven't learned about the full suite of available pre-release testing tools yet (or have chosen not to use them).
This is exactly the opposite of what Paul says in his message posted at about the same time as yours... AFAICT his argument is that build artifact leakage is exactly what pip is worried about, and if anything he's annoyed at me because he thinks I'm downplaying the problem :-). (My impression is that Donald's position might be closer to what you wrote, though.)
- pip wants to enforce builds going via sdists
This isn't a requirement, just an option we want to enable (and can't avoid enabling).
Donald sure seemed to be insisting on it as a requirement earlier, at least. And the *entire cause* of this whole discussion was conflict between this requirement and flit not wanting to support building sdists from unpacked sdists, so characterizing it as some automatic thing that doesn't even need to be talked about seems really weird.
Just to point out here that like many things, it's complicated (tm). We have a number of issues that regularly come in on the pip side which really boil down to the build tool having done something wrong and unexpected. Over time we've built up a number of strategies to work around those cases, and as we look at a new system we're thinking of those cases while we evaluate it, asking "Is this going to make those cases better, worse, or not affect them?" and trying to push the design towards things that make them better rather than worse. There are some differences, I suspect, in what specific problems Paul and I are each thinking of, as well as in what specific solutions we might like or even find acceptable. For better or worse pip doesn't have a BDFL, and for direction we generally follow rough consensus and working code. So "pip" doesn't really have an official position on what we want from this, but Paul has a position and so do I. To try and break down my thinking some more:

* I think it is a requirement that any PEP that proposes a mechanism for removing ``setup.py`` supports sdist and wheel building in its interface. Otherwise it sends a message that either one is an optional part of Python packaging, which is not a message we should be sending.

* I would *like* to remove a fairly common error case where we have a difference in the files that get included in a sdist and the files that get installed, such that ``pip install .`` causes different files to be installed than ``python setup.py sdist && pip install dist/*.tar.gz``.

* I would *like* to remove another common error case where two subsequent builds end up affecting each other (at least by default) because of detritus left behind from the first build process.
One example I can think of: if you have a library that can optionally link against some other library, you might ``pip install .``, notice it didn't link against that library (because you didn't have it installed), then install it and run ``pip install .`` again. Nothing would have changed in the directory to trigger a rebuild, so you'd get the behavior where it wouldn't rebuild. I have opinions about how the above things get solved, but I think only the first one is a hard and fast requirement where, if the PEP doesn't include it at all, I simply cannot support the PEP. The other two are things I personally would very much prefer to solve, because they address a common stumbling block for both new and experienced users alike. — Donald Stufft
On 14 July 2017 at 21:23, Donald Stufft <donald@stufft.io> wrote:
* I think it is a requirement that any PEP that proposes a mechanism for removing ``setup.py`` supports sdist and wheel building in its interface. Otherwise it sends a message that either one is an optional part of Python packaging, which is not a message we should be sending.
I agree - although I accept that under certain conditions, certain backends might fail to produce a sdist (for example flit can't produce a sdist if the user doesn't have VCS tools installed). I'd rather they didn't, but that's a quality of implementation issue with the backend, and not something I think we should mandate.
* I would *like* to remove a fairly common error case where we have a difference in the files that get included in a sdist and the files that get installed such that ``pip install .`` causes different files to be installed than ``python setup.py sdist && pip install dist/*.tar.gz``.
I agree - for me, this is an important improvement that we can make to pip, and if the PEP makes it impossible for us to do this I'd be very reluctant to accept it. We could fall back to building via sdist, but if sdist creation can fail, that's a problem - so I'd have to rethink my willingness to accept that build_sdist could fail if we didn't get support for this.
* I would *like* to remove another common error case where two subsequent builds end up affecting each other (at least by default) because of detritus left behind from the first build process. One example I can think of is if you have a library that can optionally link against some other library, you might ``pip install .``, notice it didn’t link against that (because you didn’t have it installed) and then install it and ``pip install .`` again. Nothing would have changed in the directory to trigger a rebuild, so you’d get the behavior where it wouldn’t rebuild.
I have no real opinion on this one. Or at least, I'm not willing to get into fights about incremental build support at this time, so I'll abstain here. Paul
I proposed the build directory parameter because the copytree hook made no sense to me. It is not a perfect substitute, but perhaps a configurable build directory is nice on its own, without having to satisfy all the older arguments in favor of copytree. I think true in-place builds are the oddball (for example, 2to3 or any build where sources have the same name as outputs needs a build directory to put the translated .py files in, otherwise it would overwrite the source). What people think of as in-place builds in distutils are usually just builds into the default build directory.

IIRC the main reason we are worried sdist generation would fail is when you are trying to do new development from an unpacked sdist. I suggest not worrying about making this use case a perfect experience. It's good enough for quick patching, but if you need to do a new source release then you probably have time to meet the requirements of your build backend. Surely pluggable build systems will compete on reliability: rather than repeatedly making MANIFEST.in errors, you will be able to just switch build-backend.
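To make the proposal concrete, here is a minimal sketch of what a hook accepting the build directory parameter under discussion might look like. This is an illustration of the idea, not the PEP's final API; the signature and wheel name are hypothetical.

```python
import os

def build_wheel(wheel_directory, config_settings=None, build_directory=None):
    # Hypothetical sketch of the proposed ``build_directory`` parameter;
    # NOT the final PEP 517 hook signature.
    if build_directory is None:
        # "In-place" default: cache intermediates inside the source tree,
        # much like distutils' default ``build/`` directory.
        build_directory = os.path.join(os.getcwd(), "build")
    os.makedirs(build_directory, exist_ok=True)
    # A 2to3-style step would write its translated .py files into
    # build_directory rather than overwriting the sources in the tree.
    os.makedirs(wheel_directory, exist_ok=True)
    wheel_name = "example-0.0-py3-none-any.whl"
    # ... assemble the wheel from build_directory into wheel_directory ...
    open(os.path.join(wheel_directory, wheel_name), "wb").close()
    return wheel_name
```

A frontend that wants "clean" semantics can simply pass a fresh, empty `build_directory` each time; one that wants incremental behaviour can reuse the same directory.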
On Sat, Jul 15, 2017 at 9:31 AM, Daniel Holth <dholth@gmail.com> wrote:
I proposed the build directory parameter because the copytree hook made no sense to me. It is not a perfect substitute, but perhaps a configurable build directory is nice on its own, without having to satisfy all the older arguments in favor of copytree. I think true in-place builds are the oddball (for example, 2to3 or any build where sources have the same name as outputs needs a build directory to put the translated .py files in, otherwise it would overwrite the source). What people think of as in-place builds in distutils are usually just builds into the default build directory.
That's not the interesting part, it doesn't matter if a build is done in build/lib*/etc inside the repo or outside, what matters is that the final build artifacts are placed back in the source tree. So a C extension will have .c files in the tree, and after an inplace build it will have .c and .so (but no .o !). Ralf
On Jul 14, 2017 5:30 PM, "Ralf Gommers" <ralf.gommers@gmail.com> wrote: On Sat, Jul 15, 2017 at 9:31 AM, Daniel Holth <dholth@gmail.com> wrote:
I proposed the build directory parameter because the copytree hook made no sense to me. It is not a perfect substitute, but perhaps a configurable build directory is nice on its own, without having to satisfy all the older arguments in favor of copytree. I think true in-place builds are the oddball (for example, 2to3 or any build where sources have the same name as outputs needs a build directory to put the translated .py files in, otherwise it would overwrite the source). What people think of as in-place builds in distutils are usually just builds into the default build directory.
That's not the interesting part, it doesn't matter if a build is done in build/lib*/etc inside the repo or outside, what matters is that the final build artifacts are placed back in the source tree. So a C extension will have .c files in the tree, and after an inplace build it will have .c and .so (but no .o !). Ugggh well this might be one of our problems right here.... I had totally forgotten about that sense of "in place build". In case this has been a source of confusion: There's one thing called an "in place build", like `setup.py build -i`, where the idea is to convert the working source tree itself into something you can put on your python path. It's related to doing "development" or "editable" installs like `setup.py develop` or `pip install -e`. As far as I know, absolutely everyone is happy to defer all discussion of this to a future PEP. And then there's another thing called an "in place build", as opposed to an "out of place" build. In this case the distinction is just that an in-place build stores intermediate build artifacts somewhere inside the source tree, so that future builds using that source tree can take advantage of them, while an out-of-place build stores them in some other designated directory. This is the distinction I've been thinking of in all these emails. -n
On 15 July 2017 at 09:16, Nathaniel Smith <njs@pobox.com> wrote:
There's one thing called an "in place build", like `setup.py build -i`, where the idea is to convert the working source tree itself into something you can put on your python path. It's related to doing "development" or "editable" installs like `setup.py develop` or `pip install -e`. As far as I know, absolutely everyone is happy to defer all discussion of this to a future PEP.
Or more likely, that's something that should be completely managed by the backend, and frontends like pip don't offer any sort of interface to it. But as you say, a discussion for another time.
And then there's another thing called an "in place build", as opposed to an "out of place" build. In this case the distinction is just that an in-place build stores intermediate build artifacts somewhere inside the source tree, so that future builds using that source tree can take advantage of them, while an out-of-place build stores them in some other designated directory.
... where the backend can still manage to take advantage of them if it so chooses (and is able to - but so far no-one has a concrete example of where that's not possible). Paul
On 15 July 2017 at 18:45, Paul Moore <p.f.moore@gmail.com> wrote:
On 15 July 2017 at 09:16, Nathaniel Smith <njs@pobox.com> wrote:
And then there's another thing called an "in place build", as opposed to an "out of place" build. In this case the distinction is just that an in-place build stores intermediate build artifacts somewhere inside the source tree, so that future builds using that source tree can take advantage of them, while an out-of-place build stores them in some other designated directory.
... where the backend can still manage to take advantage of them if it so chooses (and is able to - but so far no-one has a concrete example of where that's not possible).
The path I took for adding out-of-tree build support to the example backend in PEP 517 was to have it call its own sdist building machinery and then unpack that archive into the build directory: https://github.com/python/peps/pull/310/files (There are also some spec clarifications in that PR that arose from updating the example to implement the latest version of the API.)

The difference between this and the previous "prepare_build_directory" hook is that it puts the decision on how to handle directory preparation entirely in the hands of the backend:

- for our naive example backend, build_sdist essentially can't fail, so there's no problem in using it for build preparation
- for a backend like flit, where build_sdist *can* fail, but out-of-tree builds can be supported a different way, then flit can just handle it
- for a backend like enscons, the request for an out-of-tree build can be delegated to the underlying build system

Regardless of which of those is the case, the frontend is relying on the backend to have implemented out-of-tree build support properly. Given this arrangement, I'd actually encourage pip to start out by only using the "sdist -> wheel -> install" path for legacy source trees, and (at least initially) rely entirely on out-of-tree build support in the backend for PEP 517 source trees. Only if pip was regularly getting bug reports that turned out to be due to misbehaving out-of-tree build support in backends would it switch to first attempting the "sdist -> wheel -> install" path even for PEP 517 trees. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
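The "build an sdist and unpack it" strategy Nick describes can be sketched in a few lines. This is a toy illustration, not the actual example backend from the PR: the `build_sdist` stand-in naively archives the whole working directory, and the helper name `prepare_out_of_tree` is hypothetical.

```python
import os
import tarfile
import tempfile

def build_sdist(sdist_directory, config_settings=None):
    # Naive stand-in for the example backend's sdist hook: archive the
    # entire current working directory. Real backends are more careful
    # about which files they include.
    name = "example-0.0.tar.gz"
    with tarfile.open(os.path.join(sdist_directory, name), "w:gz") as tf:
        for entry in sorted(os.listdir(".")):
            tf.add(entry)
    return name

def prepare_out_of_tree(build_directory):
    # Reuse the backend's own sdist machinery to populate an empty
    # out-of-tree build directory: build the sdist into a scratch
    # directory, then unpack it into the build directory.
    with tempfile.TemporaryDirectory() as tmp:
        sdist_name = build_sdist(tmp)
        with tarfile.open(os.path.join(tmp, sdist_name)) as tf:
            tf.extractall(build_directory)
```

The appeal of this approach is that it guarantees, by construction, that an out-of-tree build sees exactly the files an sdist would contain.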
On 15 July 2017 at 07:31, Daniel Holth <dholth@gmail.com> wrote:
I proposed the build directory parameter because the copytree hook made no sense to me. It is not a perfect substitute, but perhaps a configurable build directory is nice on its own, without having to satisfy all the older arguments in favor of copytree. I think true in-place builds are the oddball (for example, 2to3 or any build where sources have the same name as outputs needs a build directory to put the translated .py files in, otherwise it would overwrite the source). What people think of as in-place builds in distutils are usually just builds into the default build directory.
IIRC the main reason we are worried sdist generation would fail is when you are trying to do new development from an unpacked sdist. I suggest not worrying about making this use case a perfect experience. It's good enough for quick patching, but if you need to do a new source release then you probably have time to meet the requirements of your build backend.
This reminds me of something I should have mentioned on list, but made the mistake of assuming would be equally clear to everyone else: the last time that Thomas elaborated on his concerns about VCS metadata potentially being unavailable, I realised there's a *very* common build use case where we can safely assume that metadata will be missing.

Specifically: source builds in systems like RPM, deb, and conda. Those start with a release tarball, *not* a VCS checkout, and while they'll likely start with the sdist rather than a raw tarball for most Python projects, that *won't* be the case for projects that include generated artifacts like Cython output files in their sdists. So to handle that kind of scenario under PEP 517, we're going to *need* wheel builds to work even when sdist builds would fail due to a lack of VCS metadata.

That doesn't specifically say anything about in-tree vs out-of-tree build support (aside from the fact that Debian's packaging policies call for out-of-tree builds), but it does mean that having the backend API *only* support the "source tree -> sdist -> wheel" build path is genuinely insufficient for our full array of use cases. However, backends that *don't* have any special requirements for creating sdists will still be free to implement their out-of-tree build support that way, which is what I'm proposing we do for the example backend in the PEP: https://github.com/python/peps/pull/310/files Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
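The asymmetry Nick points out — a backend whose `build_wheel` works from an unpacked tarball even though its `build_sdist` cannot — might look like the following flit-style sketch. The exception class and the `.git` check are hypothetical illustrations, not flit's actual implementation (the thread also discussed signalling unsupported operations by returning `NotImplemented` instead).

```python
import os

class SdistUnsupportedError(Exception):
    # Hypothetical error a backend might raise when it cannot build an
    # sdist in the current environment.
    pass

def build_sdist(sdist_directory, config_settings=None):
    # A flit-like backend enumerates files via VCS metadata, so it
    # cannot build an sdist from a plain release tarball.
    if not os.path.isdir(".git"):
        raise SdistUnsupportedError("no VCS metadata available")
    ...  # normal sdist construction elided

def build_wheel(wheel_directory, config_settings=None):
    # Building a wheel only needs the files already on disk, so it can
    # still succeed from an unpacked tarball where build_sdist fails.
    return "example-0.0-py3-none-any.whl"
```

This is exactly why a frontend that *always* routed builds through "source tree -> sdist -> wheel" would break RPM/deb/conda-style source builds.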
On 14 July 2017 at 19:24, Nathaniel Smith <njs@pobox.com> wrote:
- pip doesn't trust build systems to properly support incremental builds, so it wants to force them to throw away the build artifacts after every build
It's less that, and more pip wanting to ensure that the default publisher experience is reasonably close to the default end user experience, in order to increase the likelihood that publishers ship working sdists and wheel files, even if they haven't learned about the full suite of available pre-release testing tools yet (or have chosen not to use them).
This is exactly the opposite of what Paul says in his message posted at about the same time as yours... AFAICT his argument is that build artifact leakage is exactly what pip is worried about, and if anything he's annoyed at me because he thinks I'm downplaying the problem :-). (My impression is that Donald's position might be closer to what you wrote, though.)
No, you've misunderstood my point completely. What I was trying to say is *exactly* the same as what Nick said - we're just expressing it in different terms. I'm not sure how we can resolve this - from my perspective, everyone else[1] is in agreement with the latest revision of the PEP. You have some concerns, but I'm failing to understand what they are, except that they seem to be based on a confusion about what "the rest of us" want (I'm not trying to frame this as an "us against you" argument, just trying to point out that you seem to be seeing conflicts in our position that the rest of us don't). I'd really like to understand your specific concerns, but we seem to be spending too much time talking at cross purposes at the moment, and not making much progress.

[1] Donald's still thinking about it, but I'm fairly sure the proposal aligns with what he's after, at least in broad terms.

So, maybe I can address some points I *think* you're making, and we can try to understand each other that way.

1. Incremental builds. My understanding is that the current proposal says *nothing* about incremental builds. There's nothing stopping backends supporting them. On the other hand, pip at the moment doesn't have a way for the user to request an incremental build, or request a complete rebuild. As far as I know, there's nothing in pip's documented behaviour that even suggests whether builds are incremental (that's not to say we'll break what support there is arbitrarily, just that it's not formalised). Solving the question of incremental builds was, as far as I recall, agreed by everyone as something we'd defer to a future PEP.

2. The need for out of place builds. Pip needs them - to solve a certain class of issue that we've seen in real bug reports. You don't seem convinced that we do need them, but I'm not sure what else I can say here.
You also seem concerned that if we get out of place builds, that means that pip will use them to break functionality that you need (specifically, incremental builds, as far as I can see). All I can say to that is that you seem very unwilling to trust the pip developers to take care to support our users' use cases. We're not going to arbitrarily break users' workflows - and if you don't believe that I can't convince you otherwise.

3. You are concerned that having to support in-place and out-of-place builds is a significant burden for backends. Thomas has said it's OK for flit (it's not a good fit, but he can support it). Daniel has confirmed enscons can handle it. Nick has surveyed a number of other build systems (that don't currently have backends, but could) and sees no major problems. Can you give any concrete example of a case where a realistic backend would find the requirement a burden? (Please don't claim incremental builds as the reason - the PEP explicitly says that backends can cache data from previous builds in the build_dir, which allows incremental builds if the backend wants them).

The key point about *all* of the proposals we've had (build via sdist, have a "prepare a build directory" hook, and the build directory parameter) is to ensure that the wheels that are built when the frontend requests them are the same as those that would be obtained if the user were to build a sdist and then install from that. That's not a case of "not trusting the backend", or "protecting against a bug in the frontend or backend" - it's about making sure the *developer* didn't make a mistake when modifying his project. That class of mistake *does* happen - we've seen it reported by our end users.

To address some of your specific comments (apologies if I misunderstand anything, it's just because I don't follow your basic position at all, so things that seem obvious to you are confusing me badly):
You're trying to shut me down by making these pronouncements about what PEP 517 is going to be, and yet you don't seem to even understand what requirements people are talking about. I'm sorry if that's rude, and I'm not angry at you personally or anything, but I am frustrated and I don't understand why I seem to have to keep throwing myself in front of these ideas that seem so poorly motivated.
From my POV, you keep flagging incremental builds as the big issue. It's not a matter of not understanding your requirements, it's more that we all (and I *thought* that included you) agreed that incremental builds were out of scope for this PEP.
If you want to expand the scope of the PEP to include incremental builds you'll need to make that clear - but I think most people are too burned out to be sympathetic to that.
And that's been the main discovery of the last round of discussions - we'd been talking past each other, and what we actually wanted was for the backend interface to accurately model the in-place/out-of-tree discussion that has been common to most build systems at least since autotools (hence my catalog of them earlier in the thread).
This statement seems just... completely wrong to me. I can't see any way that the in-place/out-of-place distinction matches anything we were talking about earlier. Have we been reading the same threads? There was the whole thing about pip copying trees, of course, but that's a completely different thing. The motivation there is that they want to make sure that two builds don't accidentally influence each other.
No it's not. It's to make sure that *things that are present in the source directory but aren't specified as being part of the project* (so they won't appear in the sdist) don't influence the build.
If they don't trust the build system, then this requires copying to a new tree (either via copytree or sdist). If they do trust the build system, then there are other options -- you can do an in-place build, or have a flag saying "please 'make clean' before this build", or something. So out-of-place builds are either insufficient or unnecessary for what pip wants. Copying the source tree and out-of-place builds are completely different things with different use cases; they happen to both involve external directories, but that's mostly a false similarity I think.
This makes no sense to me, because you seem to have misunderstood the motivation behind out of place builds. (It's not about trusting the build system or wanting to do a clean non-incremental build). Agreed that copying the whole source tree and doing an out of place build are different. I don't think there's anyone suggesting "copy the whole source tree" satisfies the requirements we have?
And as for flit, it doesn't even have a distinction between in-place and out-of-place builds...
And yet Thomas has stated that he's OK with implementing this requirement of the PEP for flit - so you can't use flit as an example for your arguments.
Forcing all build systems to support both in-place builds and out-of-place builds adds substantial additional complexity, introduces multiple new failure modes, and AFAICT is not very well suited to the problems people have been worrying about.
Please provide a concrete example of a build backend for which this is true. We can't make any progress based on speculative assumptions about what backends will find hard.
1. What if backends get their out-of-tree support wrong?
2. What happens if re-using build directories isn't tested properly?
3. What happens if an upgrade breaks incremental builds?
4. Do we enforce not modifying the source directory?
These first 4 answers might be reasonable if it wasn't for the fact that the *entire alleged motivation* for this feature is to reduce the chance of accidental breakage. You can't justify it like that and then hand-wave away all the new potential sources of accidental breakage that it introduces...
As I've said, you seem to have misunderstood what the motivation for the feature is. It's preserving the equivalence of build_wheel and building via sdist (in the face of possible developer errors in what they have stored in the source directory), not about protecting against implementation bugs in pip or backends, nor is it in any way related to incremental builds.
Or, well, that sounds pretty unusable, so maybe instead we would want to go with the alternative: "Backends MUST support free switching between in-place and out-of-place builds in the same directory" -- but that's a whole 'nother likely source of weird problems, and breaks the idea that this is something that existing build systems all support already.
All of the "weird problems" you're suggesting might exist seem to me to be cases where building a wheel directly would produce a different result from producing a sdist and then producing a wheel from that sdist. (Because that's essentially what switching from in-place to out-of-place *means*). Are you explicitly requiring that build systems should be allowed to do that? That it's OK for a build system to produce a sdist and a wheel that *won't work the same* when used for installations? I'm unable to believe that's what you're suggesting.
7. Why include in-place support in the API then?
Because some folks (including you) would like to include incremental build support, and because if we don't support it explicitly backends will still have to deal with the "wheel_directory == build_directory" case.
Out-of-tree builds as currently specified are required to handle incremental out-of-tree builds... maybe that's a mistake, but that's what the text says.
Wait. Are you now proposing that the current out-of-place build support is OK for you and you'd be happy to drop support for *in-place* builds???
I don't understand the second half of your sentence.
9. Why not wait and then add a new backend capability requirement later?
Waiting to add the requirement won't provide us with any more data than we already have, but may give backend implementors the impression they don't need to care about out-of-tree build support. This is also our first, last, and only chance to make out-of-tree build support a *mandatory* backend requirement that frontends can rely on - if we add it later, it will necessarily only be optional.
AFAICT making it optional is better for everyone, so not sure why that's seen as a bad thing.
Analysis:

- If you're an eager backend dev who loves this stuff, you'll implement it anyway, so it doesn't matter whether it's optional
- If you're a just-trying-to-hack-something-together backend dev, maybe constrained by some ugly legacy build system, then you'd much rather it be optional, because then you don't have to deal with hacking up some fake "out-of-tree build" using copytree, and maybe getting it wrong
- If you're pip, then AFAICT you're more worried about lazy backend devs screwing things up than anything else, in which case you don't want them hacking up their own fake "out-of-tree build" using copytree; you want to take responsibility for that so you know that you get it right
If you're pip, you *can't implement this* because you need the backend to tell you what files are needed. Full tree copies are a significant performance problem, and don't actually deliver what we need anyway. And we can't handle it via "create a sdist" because we can't guarantee that the build_sdist hook won't fail in cases where build_wheel would have worked. The fact that pip can't do this without backend assistance is *precisely* the reason we need one of the solutions we've debated here. And the build_directory parameter (out of place builds) is the solution that the backend developers (Thomas and Daniel) have accepted as the best option. I hope I haven't misunderstood any of your points too badly. But if I have, just ignore my comments and focus on the initial part of my email. That's the key point anyway. Paul
Hi Paul, We seem to have some really fundamental miscommunication here; probably we should figure out what that is instead of continuing to talk past each other. As a wild guess... can you define what an "out-of-place build" means to you?

For me, the distinction between an in-place and out-of-place build is... well, first some background to make sure my terminology is clear: build systems typically work by taking a source tree as input and executing a series of rules to generate intermediate artifacts and eventually the final artifacts. Commonly, as an optimization, they have some system for caching these intermediate artifacts, so that future builds can go faster (called "incremental builds"). However, this optimization is often heuristic-based and therefore introduces a risk: if the system re-uses a cached artifact that it should have rebuilt, then this can generate a broken build.

There are two popular strategies for storing this cache, and this is what "in-place" versus "out-of-place" refers to. "In-place builds" have a single cache that's stored inside the source tree -- often, but not always, intermingled with the source files. So a classic 'make'-based build where you end up with .o files next to all your .c files is an in-place build, and so is 'python setup.py build' putting a bunch of .o files inside the build/ directory. "Out-of-place builds" instead place the cached artifacts into a designated separate directory. The advantage of this is that you can potentially work around limitations of the caching strategy by having multiple caches and switching between them.

[In traditional build systems the build tree concept is also often intermingled with the idea of a "build configuration", like debug versus optimized builds, and this changes the workflow in various ways -- but we don't have those, and it's a whole extra set of complexity, so let's ignore that.]
Corollaries:

- If you're starting with a pristine source tree, then "in-place" and "out-of-place" builds will produce exactly the same results, because they're running exactly the same rules. (This is why I'm confused about why you seem to be claiming that out-of-place builds will help developers avoid bugs that happen with in-place builds... they're exactly the same thing!)

- If you've done an out-of-place build in a given tree, you can return to a pristine source tree by deleting the out-of-place directory and making a new one, without having to deal with the build backend. If you've done an in-place build in a given tree, then you need something like a "make clean" rule. But if you have that, then these are identical, which is why I said that it sounded like pip would be just as happy with a way to do a clean build. (I'm not saying that the spec necessarily needs a way to request a clean build -- this is just trying to understand what the space of options actually is.)

- If you're starting with a pristine source tree, and your goal is to end up with a wheel *while keeping the original tree pristine*, then some options include: (a) doing a copytree + in-place build on the copy, like pip does now, (b) making an sdist and then doing an in-place build.

- If you're not starting with a pristine source tree -- like say the user went and did an in-place build here before invoking pip -- then you have very few good options. Copytree + in-place build will hopefully work, but there's a chance you'll pick up detritus from the previous build. Out-of-tree builds might or might not work -- I've given several examples of extant build systems that explicitly disclaim any reliability in this case. Sdist + build the sdist is probably the most reliable option here, honestly, when it's an option.

Does that make sense? Does it... help explain any of the ways we're talking past each other? -n On Fri, Jul 14, 2017 at 1:58 PM, Paul Moore <p.f.moore@gmail.com> wrote:
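The definition above can be reduced to a toy model: the *only* difference between an in-place and an out-of-place build is where the cache of intermediate artifacts lives. Everything in this sketch (the function name, the ".c to .o" rule, the mtime-free staleness check) is illustrative, not any real backend's behaviour.

```python
import os

def toy_build(source_tree, build_directory=None):
    # Toy model of in-place vs out-of-place builds: same rules either
    # way, only the cache location differs.
    cache = build_directory or os.path.join(source_tree, "build")  # in-place default
    os.makedirs(cache, exist_ok=True)
    for name in os.listdir(source_tree):
        if name.endswith(".c"):
            obj = os.path.join(cache, name[:-2] + ".o")
            if not os.path.exists(obj):  # naive incremental-build heuristic
                with open(obj, "w") as f:
                    f.write("object code for " + name)
    return cache
```

The corollaries fall out directly: with `build_directory` set, the source tree stays pristine and deleting that directory resets the build without any backend cooperation; without it, you need the backend's equivalent of "make clean".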
On 14 July 2017 at 19:24, Nathaniel Smith <njs@pobox.com> wrote:
- pip doesn't trust build systems to properly support incremental builds, so it wants to force them to throw away the build artifacts after every build
It's less that, and more pip wanting to ensure that the default publisher experience is reasonably close to the default end user experience, in order to increase the likelihood that publishers ship working sdists and wheel files, even if they haven't learned about the full suite of available pre-release testing tools yet (or have chosen not to use them).
This is exactly the opposite of what Paul says in his message posted at about the same time as yours... AFAICT his argument is that build artifact leakage is exactly what pip is worried about, and if anything he's annoyed at me because he thinks I'm downplaying the problem :-). (My impression is that Donald's position might be closer to what you wrote, though.)
No, you've misunderstood my point completely. What I was trying to say is *exactly* the same as what Nick said - we're just expressing it in different terms.
I'm not sure how we can resolve this - from my perspective, everyone else[1] is in agreement with the latest revision of the PEP. You have some concerns, but I'm failing to understand what they are, except that they seem to be based on a confusion about what "the rest of us" want (I'm not trying to frame this as an "us against you" argument, just trying to point out that you seem to be seeing conflicts in our position that the rest of us don't). I'd really like to understand your specific concerns, but we seem to be spending too much time talking at cross purposes at the moment, and not making much progress.
[1] Donald's still thinking about it, but I'm fairly sure the proposal aligns with what he's after, at least in broad terms.
So, maybe I can address some points I *think* you're making, and we can try to understand each other that way.
1. Incremental builds. My understanding is that the current proposal says *nothing* about incremental builds. There's nothing stopping backends supporting them. On the other hand, pip at the moment doesn't have a way for the user to request an incremental build, or request a complete rebuild. As far as I know, there's nothing in pip's documented behaviour that even suggests whether builds are incremental (that's not to say we'll break what support there is arbitrarily, just that it's not formalised). Solving the question of incremental builds was, as far as I recall, agreed by everyone as something we'd defer to a future PEP.
2. The need for out of place builds. Pip needs them - to solve a certain class of issue that we've seen in real bug reports. You don't seem convinced that we do need them, but I'm not sure what else I can say here. You also seem concerned that if we get out of place builds, that means that pip will use them to break functionality that you need (specifically, incremental builds, as far as I can see). All I can say to that is that you seem very unwilling to trust the pip developers to take care to support our users' use cases. We're not going to arbitrarily break users' workflows - and if you don't believe that I can't convince you otherwise.
3. You are concerned that having to support in-place and out-of-place builds is a significant burden for backends. Thomas has said it's OK for flit (it's not a good fit, but he can support it). Daniel has confirmed enscons can handle it. Nick has surveyed a number of other build systems (that don't currently have backends, but could) and sees no major problems. Can you give any concrete example of a case where a realistic backend would find the requirement a burden? (Please don't claim incremental builds as the reason - the PEP explicitly says that backends can cache data from previous builds in the build_dir, which allows incremental builds if the backend wants them).
The key point about *all* of the proposals we've had (build via sdist, have a "prepare a build directory" hook, and the build directory parameter) is to ensure that the wheels that are built when the frontend requests them are the same as those that would be obtained if the user were to build a sdist and then install from that. That's not a case of "not trusting the backend", or "protecting against a bug in the frontend or backend" - it's about making sure the *developer* didn't make a mistake when modifying his project. That class of mistake *does* happen - we've seen it reported by our end users.
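The equivalence property Paul describes can be illustrated with a toy (this is not pip or any real backend; the DECLARED set and both build functions are invented for illustration): a naive in-place build zips up whatever happens to be in the tree, while a build restricted to the project's declared file list leaves developer clutter out -- which is exactly the discrepancy between build_wheel and sdist-then-wheel that worries pip.

```python
import os, shutil, tempfile, zipfile

# What the hypothetical backend knows belongs to the project.
DECLARED = {"pkg/__init__.py"}

def build_wheel_inplace(src, wheel_path):
    # Naive in-place build: zip up everything found in the tree.
    with zipfile.ZipFile(wheel_path, "w") as wf:
        for root, _dirs, files in os.walk(src):
            for f in files:
                full = os.path.join(root, f)
                wf.write(full, os.path.relpath(full, src))

def build_wheel_outofplace(src, build_dir, wheel_path):
    # "As if via sdist": copy only the declared files into build_dir
    # and build there, so undeclared files cannot leak into the wheel.
    for rel in DECLARED:
        dest = os.path.join(build_dir, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(os.path.join(src, rel), dest)
    build_wheel_inplace(build_dir, wheel_path)

def wheel_contents(path):
    with zipfile.ZipFile(path) as wf:
        return sorted(wf.namelist())

# A cluttered working directory: one declared file, one scratch file.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, "pkg"))
open(os.path.join(src, "pkg", "__init__.py"), "w").close()
open(os.path.join(src, "scratch_test.py"), "w").close()  # developer clutter

build_dir = tempfile.mkdtemp()
w1 = os.path.join(tempfile.mkdtemp(), "inplace.whl")
w2 = os.path.join(tempfile.mkdtemp(), "outofplace.whl")
build_wheel_inplace(src, w1)
build_wheel_outofplace(src, build_dir, w2)
print(wheel_contents(w1))  # clutter leaks into the wheel
print(wheel_contents(w2))  # only the declared file
```

The two wheels differ precisely because the source tree wasn't pristine -- the class of developer mistake the out-of-place requirement is meant to catch.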
To address some of your specific comments (apologies if I misunderstand anything, it's just because I don't follow your basic position at all, so things that seem obvious to you are confusing me badly).
You're trying to shut me down by making these pronouncements about what PEP 517 is going to be, and yet you don't seem to even understand what requirements people are talking about. I'm sorry if that's rude, and I'm not angry at you personally or anything, but I am frustrated and I don't understand why I seem to have to keep throwing myself in front of these ideas that seem so poorly motivated.
From my POV, you keep flagging incremental builds as the big issue. It's not a matter of not understanding your requirements, it's more that we all (and I *thought* that included you) agreed that incremental builds were out of scope for this PEP.
If you want to expand the scope of the PEP to include incremental builds you'll need to make that clear - but I think most people are too burned out to be sympathetic to that.
And that's been the main discovery of the last round of discussions - we'd been talking past each other, and what we actually wanted was for the backend interface to accurately model the in-place/out-of-tree discussion that has been common to most build systems at least since autotools (hence my catalog of them earlier in the thread).
This statement seems just... completely wrong to me. I can't see any way that the in-place/out-of-place distinction matches anything we were talking about earlier. Have we been reading the same threads? There was the whole thing about pip copying trees, of course, but that's a completely different thing. The motivation there is that they want to make sure that two builds don't accidentally influence each other.
No it's not. It's to make sure that *things that are present in the source directory but aren't specified as being part of the project* (so they won't appear in the sdist) don't influence the build.
If they don't trust the build system, then this requires copying to a new tree (either via copytree or sdist). If they do trust the build system, then there are other options -- you can do an in-place build, or have a flag saying "please 'make clean' before this build", or something. So out-of-place builds are either insufficient or unnecessary for what pip wants. Copying the source tree and out-of-place builds are completely different things with different use cases; they happen to both involve external directories, but that's mostly a false similarity I think.
This makes no sense to me, because you seem to have misunderstood the motivation behind out of place builds. (It's not about trusting the build system or wanting to do a clean non-incremental build).
Agreed that copying the whole source tree and doing an out of place build are different. I don't think there's anyone suggesting "copy the whole source tree" satisfies the requirements we have?
And as for flit, it doesn't even have a distinction between in-place and out-of-place builds...
And yet Thomas has stated that he's OK with implementing this requirement of the PEP for flit - so you can't use flit as an example for your arguments.
Forcing all build systems to support both in-place builds and out-of-place builds adds substantial additional complexity, introduces multiple new failure modes, and AFAICT is not very well suited to the problems people have been worrying about.
Please provide a concrete example of a build backend for which this is true. We can't make any progress based on speculative assumptions about what backends will find hard.
1. What if backends get their out-of-tree support wrong?
2. What happens if re-using build directories isn't tested properly?
3. What happens if an upgrade breaks incremental builds?
4. Do we enforce not modifying the source directory?
These first 4 answers might be reasonable if it wasn't for the fact that the *entire alleged motivation* for this feature is to reduce the chance of accidental breakage. You can't justify it like that and then hand-wave away all the new potential sources of accidental breakage that it introduces...
As I've said, you seem to have misunderstood the motivation for the feature. It's preserving the equivalence of build_wheel and building via sdist (in the face of possible developer errors in what they have stored in the source directory), not about protecting against implementation bugs in pip or backends, nor is it in any way related to incremental builds.
Or, well, that sounds pretty unusable, so maybe instead we would want to go with the alternative: "Backends MUST support free switching between in-place and out-of-place builds in the same directory" -- but that's a whole 'nother likely source of weird problems, and breaks the idea that this is something that existing build systems all support already.
All of the "weird problems" you're suggesting might exist seem to me to be cases where building a wheel directly would produce a different result from producing a sdist and then producing a wheel from that sdist. (Because that's essentially what switching from in-place to out-of-place *means*).
Are you explicitly requiring that build systems should be allowed to do that? That it's OK for a build system to produce a sdist and a wheel that *won't work the same* when used for installations? I'm unable to believe that's what you're suggesting.
7. Why include in-place support in the API then?
Because some folks (including you) would like to include incremental build support, and because if we don't support it explicitly backends will still have to deal with the "wheel_directory == build_directory" case.
Out-of-tree builds as currently specified are required to handle incremental out-of-tree builds... maybe that's a mistake, but that's what the text says.
Wait. Are you now proposing that the current out-of-place build support is OK for you and you'd be happy to drop support for *in-place* builds???
I don't understand the second half of your sentence.
9. Why not wait and then add a new backend capability requirement later?
Waiting to add the requirement won't provide us with any more data than we already have, but may give backend implementors the impression they don't need to care about out-of-tree build support. This is also our first, last, and only chance to make out-of-tree build support a *mandatory* backend requirement that frontends can rely on - if we add it later, it will necessarily only be optional.
AFAICT making it optional is better for everyone, so not sure why that's seen as a bad thing.
Analysis:

- If you're an eager backend dev who loves this stuff, you'll implement it anyway, so it doesn't matter whether it's optional
- If you're a just-trying-to-hack-something-together backend dev maybe constrained by some ugly legacy build system, then you'd much rather it be optional, because then you don't have to deal with hacking up some fake "out-of-tree build" using copytree, and maybe getting it wrong
- If you're pip, then AFAICT you're more worried about lazy backend devs screwing things up than anything else, in which case you don't want them hacking up their own fake "out-of-tree build" using copytree; you want to take responsibility for that so you know that you get it right
If you're pip, you *can't implement this* because you need the backend to tell you what files are needed. Full tree copies are a significant performance problem, and don't actually deliver what we need anyway. And we can't handle it via "create a sdist" because we can't guarantee that the build_sdist hook won't fail in cases where build_wheel would have worked.
The fact that pip can't do this without backend assistance is *precisely* the reason we need one of the solutions we've debated here. And the build_directory parameter (out of place builds) is the solution that the backend developers (Thomas and Daniel) have accepted as the best option.
I hope I haven't misunderstood any of your points too badly. But if I have, just ignore my comments and focus on the initial part of my email. That's the key point anyway.
Paul
-- Nathaniel J. Smith -- https://vorpus.org
On 15 July 2017 at 19:42, Nathaniel Smith <njs@pobox.com> wrote:
Does that make sense? Does it... help explain any of the ways we're talking past each other?
I don't think you're talking past each other, I think you're explaining why Paul's preferred build strategy is for pip to always try creating and unpacking an sdist first, and only fall back to requesting an out-of-tree build from the backend if building the sdist fails (for example, due to VCS metadata or command line tools being missing). (Which does make it clear that my hope that pip might be able to avoid that outcome for the PEP 517 case isn't going to be realised.)

Requesting an out-of-tree wheel build is then just a way for a frontend to say to the backend "Hey, please build the wheel *as if* you'd exported an sdist and then built that, even if you can't actually export an sdist right now".

I think the disconnect may be happening because you seem to think we're looking for an unattainable ideal and are going to be disappointed when we don't achieve it: a build system that is guaranteed to never fail. That's simply not possible, and hence not what we're aiming for. Instead, we have the significantly more modest goal of defining a build system where if a build works for a publisher, it will *probably* also work for their end users, *even if* the publisher only tests the "pip install ." case prior to pushing their sdist and wheel archives to PyPI, and even if the end user has unrelated junk in their source directory when doing the build.

It's OK if there are still ways for people (both publishers and end users) to break their builds, as long as those breakages aren't arising from the *default* workflows, and are instead due to the specifics of how a project is using their chosen backend build system.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
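The fallback strategy Nick describes could be sketched like this (all helper and hook names here are hypothetical and not pip's real internals; the stub backend exists only to exercise both paths): prefer sdist -> unpack -> build_wheel, and only when the sdist step fails ask the backend for an out-of-tree build "as if via sdist".

```python
class SdistUnavailable(Exception):
    """Raised by a backend that cannot produce an sdist right now
    (e.g. VCS metadata or command line tools are missing)."""

def build_for_install(backend, src_tree, build_dir, wheel_dir):
    try:
        sdist_tree = backend.build_sdist_and_unpack(src_tree)
    except SdistUnavailable:
        # Fall back: request an out-of-tree wheel build from the
        # original source directory.
        return backend.build_wheel(src_tree, wheel_dir, build_directory=build_dir)
    # Preferred path: build in-place from the freshly unpacked sdist.
    return backend.build_wheel(sdist_tree, wheel_dir, build_directory=None)

class StubBackend:
    def __init__(self, has_vcs):
        self.has_vcs = has_vcs
        self.calls = []  # record (source, build_directory) for inspection
    def build_sdist_and_unpack(self, src):
        if not self.has_vcs:
            raise SdistUnavailable
        return src + "-from-sdist"
    def build_wheel(self, src, wheel_dir, build_directory):
        self.calls.append((src, build_directory))
        return "demo.whl"

vcs = StubBackend(has_vcs=True)
build_for_install(vcs, "/src", "/build", "/wheels")
no_vcs = StubBackend(has_vcs=False)
build_for_install(no_vcs, "/src", "/build", "/wheels")
print(vcs.calls)     # built in-place from the unpacked sdist
print(no_vcs.calls)  # built out-of-tree from the source directory
```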
On 15 July 2017 at 10:42, Nathaniel Smith <njs@pobox.com> wrote:
Hi Paul,
We seem to have some really fundamental miscommunication here; probably we should figure out what that is instead of continuing to talk past each other.
Agreed. Thanks for summarising your understanding. Let's see if I can clarify what I'm saying.
As a wild guess... can you define what an "out-of-place build" means to you?
I'm going to do this without reference to your explanation, as trying to put things in the context of what you say is where I'm getting messed up. I'll comment on your explanations below.

Again, I'll start with some background. My concern is where we're trying to deal with a user doing "pip install ." on their development directory. This is not a core use case for pip, and honestly I don't feel it's the right workflow (people wanting to work on code and install it as they go along should be using editable installs, IMO), but it is something we see people doing, and I don't want to break that workflow arbitrarily.

Given that this is the case we're talking about, my experience is that working directories contain all sorts of clutter - small test files I knocked up, experimental changes I discarded, etc. That may simply reflect the way I work, but comments I've seen indicate that I'm not *completely* alone. So for me, the point here is about making sure that "pip install ." in a cluttered working directory results in "what the developer wants".

For me the key property I'm looking for is that the developer gets consistent results for the build commands (i.e., build_wheel and build_sdist->build a wheel give the same wheel). This is important for a number of reasons - to avoid publishing errors where the developer builds a wheel and a sdist and deploys them, to ensure that tox (which uses sdists) gives the same results as manually running the tests, etc. In one of your posts you characterised the sorts of discrepancies I'm trying to avoid as "weird errors" and that's precisely the point - we get confused users raising issues and I want to avoid that happening.
So, with that in mind, the distinction between an "in place" and an "out of place" build is that an in-place build simply trusts that the developer's directory will deliver consistent results, whereas an out-of-place build does the build in a separate location that we've asked the backend to ensure doesn't contain unexpected files. It has nothing to do with repeated builds (but see below).
For me, the distinction between an in-place and out-of-place build is, ... well, first some background to make sure my terminology is clear: build systems typically work by taking a source tree as input and executing a series of rules to generate intermediate artifacts and eventually the final artifacts. Commonly, as an optimization, they have some system for caching these intermediate artifacts, so that future builds can go faster (called "incremental builds"). However, this optimization is often heuristic-based and therefore introduces a risk: if the system re-uses a cached artifact that it should have rebuilt, then this can generate a broken build.
There are two popular strategies for storing this cache, and this is what "in-place" versus "out-of-place" refers to.
"In-place builds" have a single cache that's stored inside the source tree -- often, but not always, intermingled with the source files. So a classic 'make'-based build where you end up with .o files next to all your .c files is an in-place build, and so is 'python setup.py build' putting a bunch of .o files inside the build/ directory.
"Out-of-place builds" instead place the cached artifacts into a designated separate directory. The advantage of this is that you can potentially work around limitations of the caching strategy by having multiple caches and switching between them.
[In traditional build systems the build tree concept is also often intermingled with the idea of a "build configuration", like debug versus optimized builds, and this changes the workflow in various ways -- but we don't have those and it's a whole extra set of complexity, so let's ignore that.]
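The cache-location distinction described above can be illustrated with a toy (no real build system involved; the "compile" rule and all names are invented): the same incremental rule, with its cached intermediate artifacts stored either inside the source tree or in a designated separate directory.

```python
import os, tempfile

def compile_source(src_file, cache_dir):
    """Fake build rule: cache an 'object file' for src_file in cache_dir,
    reusing a previously cached one if present (incremental build)."""
    os.makedirs(cache_dir, exist_ok=True)
    obj = os.path.join(cache_dir, os.path.basename(src_file) + ".o")
    if not os.path.exists(obj):
        with open(obj, "w") as f:
            f.write("compiled")
    return obj

tree = tempfile.mkdtemp()
src = os.path.join(tree, "main.c")
open(src, "w").close()

# In-place: the cache lives inside the source tree itself.
compile_source(src, tree)

# Out-of-place: the cache lives in a separate directory; the source tree
# stays untouched, and deleting the directory gives a clean slate.
build_dir = tempfile.mkdtemp()
compile_source(src, build_dir)

print(sorted(os.listdir(tree)))       # source tree now also holds main.c.o
print(sorted(os.listdir(build_dir)))  # artifacts isolated here
```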
For me, all of the above comes under the heading of "incremental builds", and I'm considering that out of scope. Specifically, pip's current behaviour offers no (documented) means of choosing between incremental or clean builds, and users who want that level of control should be building with the backend tools (setuptools) directly, and only using pip for the install step once a wheel has been built. If and when we discuss a UI in pip for requesting incremental or clean builds, we'd look at the implications on the backend hooks at that point - but I'm not sure that'll ever be something we want to do, as it seems like that should probably always be a use case that we'd want users to be working directly with the backend for (but that's just my opinion).
Corollaries:
- if you're starting with a pristine source tree, then "in-place" and "out-of-place" builds will produce exactly the same results, because they're running exactly the same rules. (This is why I'm confused about why you seem to be claiming that out-of-place builds will help developers avoid bugs that happen with in-place builds... they're exactly the same thing!)
Agreed, but I'm concerned about build trees that *aren't* "pristine", insofar as they are working directories for development. All of your corollaries depend on the idea that you have a "pristine" build tree, and that's where our confusion lies, I suspect.
- if you've done an out-of-place build in a given tree, you can return to a pristine source tree by deleting the out-of-place directory and making a new one, without having to deal with the build backend. if you've done an in-place build in a given tree, then you need something like a "make clean" rule. But if you have that, then these are identical, which is why I said that it sounded like pip would be just as happy with a way to do a clean build. (I'm not saying that the spec necessarily needs a way to request a clean build -- this is just trying to understand what the space of options actually is.)
Again, agreed but irrelevant, as "pristine" is not the case that concerns me.
- if you're starting with a pristine source tree, and your goal is to end up with a wheel *while keeping the original tree pristine*, then some options include: (a) doing a copytree + in-place build on the copy, like pip does now, (b) making an sdist and then doing an in-place build
Same again
- if you're not starting with a pristine source tree -- like say the user went and did an in-place build here before invoking pip -- then you have very few good options. Copytree + in-place build will hopefully work, but there's a chance you'll pick up detritus from the previous build. Out-of-tree-builds might or might not work -- I've given several examples of extant build systems that explicitly disclaim any reliability in this case. Sdist + build the sdist is probably the most reliable option here, honestly, when it's an option.
Your idea of a "not pristine" tree differs from mine - having done an in-place build is the most innocuous example of a non-pristine tree as far as I'm concerned, and the easiest to deal with (make clean).

* Copytree is certain *not* to work, because it copies all the things that make the tree not pristine.
* Build sdist and unpack is pip's current planned approach, but Thomas had issues with guaranteeing that building a sdist was always possible. We do *not* want to have cases where pip can't build a wheel even though build_wheel would have worked, which means build sdist and unpack is a problem.
* Ask the backend to make a "clean" directory would work (the backend should know what it needs) - that was the prepare_directory hooks. But that got too complex.
* Tell the backend we want a build that's isolated from the source directory and trust it to do the right thing is where we've currently ended up.

Based on the current discussion, however, I now have concerns that either

a) Backend developers might not understand what build_directory is requesting, or
b) The PEP doesn't define the semantics of build_directory in a way that delivers the results I'm suggesting here

Having had this discussion, and re-read the current draft of the PEP, I do in fact think that (b) is the case. That worries me, because I don't think it's just me that has made that mistake. Nick has just posted a message saying
Requesting an out-of-tree wheel build is then just a way for a frontend to say to the backend "Hey, please build the wheel *as if* you'd exported an sdist and then built that, even if you can't actually export an sdist right now".
which is exactly what I'd expected. But the PEP doesn't say that. Specifically, in the PEP:
When a build_directory is provided, the backend should not create or modify any files in the source directory (the working directory where the hook is called). If the backend cannot reliably avoid modifying the directory it builds from, it should copy any files it needs to build_directory and perform the build there.
The statement "it should copy any files it needs" is correct (but more subtle than it looks - it doesn't emphasise that the backend must not copy files it *doesn't* need, i.e., the developer clutter I'm concerned about). But the statement about "If the backend cannot reliably avoid modifying the directory it builds from" is misleading - the reason has *nothing* to do with whether backends can modify the source directory, and everything to do with whether backends can reasonably guarantee that there's nothing that would cause inconsistencies.

One particularly frustrating aspect of this discussion is that the worst offender for "wheel and sdist are inconsistent" is the way that setuptools requires developers to specify build and sdist contents separately (setup.py vs MANIFEST.in). That duplication is an obvious source of potential inconsistencies, and precisely why we get most of the reports we see. Ideally, new backends would not design in such inconsistency[1], which means it's easy to see such inconsistencies as "that should never happen" or "I don't understand the problem". But we will have to deal with the possibility of such backends, and the setuptools model isn't *that* unusual (setuptools didn't invent the file MANIFEST.in, it just reused the name for its own purpose).

[1] I don't know enough about flit to be sure, but if the developer forgets to check in a new source file, would it be possible for that source file to be in the wheel but not in the sdist?
Does that make sense? Does it... help explain any of the ways we're talking past each other?
It does, a lot. Thanks. Paul
On 15 July 2017 at 20:54, Paul Moore <p.f.moore@gmail.com> wrote:
Based on the current discussion, however, I now have concerns that either
a) Backend developers might not understand what build_directory is requesting, or b) The PEP doesn't define the semantics of build_directory in a way that delivers the results I'm suggesting here
Having had this discussion, and re-read the current draft of the PEP, I do in fact think that (b) is the case. That worries me, because I don't think it's just me that had made that mistake. Nick has just posted a message saying
Requesting an out-of-tree wheel build is then just a way for a frontend to say to the backend "Hey, please build the wheel *as if* you'd exported an sdist and then built that, even if you can't actually export an sdist right now".
which is exactly what I'd expected. But the PEP doesn't say that. Specifically, in the PEP:
When a build_directory is provided, the backend should not create or modify any files in the source directory (the working directory where the hook is called). If the backend cannot reliably avoid modifying the directory it builds from, it should copy any files it needs to build_directory and perform the build there.
The statement "it should copy any files it needs" is correct (but more subtle than it looks - it doesn't emphasise that the backend must not copy files it *doesn't* need, i.e., the developer clutter I'm concerned about). But the statement about "If the backend cannot reliably avoid modifying the directory it builds from" is misleading - the reason has *nothing* to do with whether backends can modify the source directory, and everything to do with whether backends can reasonably guarantee that there's nothing that would cause inconsistencies.
This is a fair concern, so I've updated that section in the PR where I'm working on bringing the example into line with the current specification: https://github.com/python/peps/pull/310/commits/49968595aa97c4ba8d621204a357... That commit also includes a fix to get the example to correctly handle repeated use of a common build directory by removing any previously extracted files. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
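The fix Nick mentions - handling repeated use of a common build directory by removing previously extracted files - can be sketched as a small frontend-side helper (hypothetical name; this is not the PEP example's actual code):

```python
import os, shutil, tempfile

def reset_build_directory(build_dir):
    # Remove anything left over from a previous build so repeated use of
    # a common build directory always starts from a clean slate; stale
    # files would otherwise leak into the new build.
    if os.path.isdir(build_dir):
        shutil.rmtree(build_dir)
    os.makedirs(build_dir)

bd = tempfile.mkdtemp()
open(os.path.join(bd, "stale.o"), "w").close()  # leftover from an earlier build
reset_build_directory(bd)
print(os.listdir(bd))  # the stale artifact is gone
```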
On Sat, Jul 15, 2017, at 12:54 PM, Paul Moore wrote:
Given that this is the case we're talking about, my experience is that working directories contain all sorts of clutter - small test files I knocked up, experimental changes I discarded, etc. That may simply reflect the way I work, but comments I've seen indicate that I'm not *completely* alone. So for me, the point here is about making sure that "pip install ." in a cluttered working directory results in "what the developer wants".
If those 'clutter' files are within the package directory (i.e. next to __init__.py), and we have to do a direct wheel build because VCS information is not available (not a VCS checkout, or VCS not on PATH), then I don't know how flit can avoid installing them, under any of the proposed isolation mechanisms. We can't extract a 'pristine' source tree from a non-pristine one without a list of what belongs in the pristine one - which flit gets from the VCS. distutils/setuptools mitigates this to some extent by having you explicitly specify package_data, but between globbing and the automatic inclusion of .py files, it's imprecise. And this is exactly the sort of developer annoyance I want to get away from. So I don't see a good way to avoid picking up existing clutter when you install from source. I think the main purpose of isolating the build is to avoid generating new clutter in the source directory.
[1] I don't know enough about flit to be sure, but if the developer forgets to check in a new source file, would it be possible for that source file be in the wheel but not in the sdist?
For now, flit's protection against this is that it will only build an sdist if all files in the cwd have either been added to the VCS or ignored. So it's possible to exclude a .py file from the sdist by vcs-ignoring it, but you can't get there just by forgetting to check in a file. Thomas
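The rule Thomas describes can be approximated by inspecting `git status --porcelain --ignored` output (this sketch only parses such output rather than invoking git, and it simplifies flit's actual check): an untracked, non-ignored file (`??`) means someone may have forgotten to check something in, so refuse to build the sdist; an explicitly ignored file (`!!`) is fine.

```python
def tree_ok_for_sdist(porcelain_output):
    """Return True if every file is either tracked by the VCS or ignored,
    per the (simplified) rule flit uses before building an sdist."""
    for line in porcelain_output.splitlines():
        if line.startswith("??"):  # untracked and not ignored: refuse
            return False
    return True

clean = "!! build/\n"                   # ignored clutter only: fine
dirty = "?? newmodule.py\n!! build/\n"  # forgotten file: refuse
print(tree_ok_for_sdist(clean), tree_ok_for_sdist(dirty))
```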
On 15 July 2017 at 23:06, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
So I don't see a good way to avoid picking up existing clutter when you install from source. I think the main purpose of isolating the build is to avoid generating new clutter in the source directory.
Picking up clutter in general isn't the problem, so much as picking up clutter that a build via sdist *wouldn't* pick up.

For flit, there are two main cases of interest:

- publisher development environments
- building from release tarballs rather than VCS clones

In the first case, the typical scenario will have both VCS metadata *and* VCS tools available, so pip will successfully use the sdist->wheel path, and clutter will be avoided in the end result. In the second case, building the sdist will fail, *but* the source directory is unlikely to have clutter in it, so a naive tree copy on flit's side of things will still count as "good enough".

If flit's build_wheel *also* refuses to run in a "dirty" VCS tree (i.e. uncommitted files that a naive copy would pick up, but building an sdist wouldn't) when an out of tree build is requested, then the odds of folks generating accidentally inconsistent build artifacts drop even further (keeping in mind that the goal here isn't to drop the chance of such discrepancies to zero - it's just to make that chance substantially lower by default than it is today with routine discrepancies between MANIFEST.in and setup.py).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Jul 15, 2017, at 03:28 PM, Nick Coghlan wrote:
For flit, there are two main cases of interest:
- publisher development environments - building from release tarballs rather than VCS clones
In the first case, the typical scenario will have both VCS metadata *and* VCS tools available, so pip will successfully use the sdist->wheel path, and clutter will be avoided in the end result.
I agree that this will typically be true, so long as we remember that there is a significant minority of cases in which it will not. This includes Windows developer environments where git is used but not on PATH, and installing from a directory bind-mounted in a docker container.
If flit's build_wheel *also* refuses to run in a "dirty" VCS tree
I'm reluctant for build_wheel to behave differently depending on whether VCS metadata is available - more code, more confusion, more chances for things to go wrong. I can see that it would reduce the likelihood of these kinds of problem, but I suspect that it would increase the likelihood of other kinds of problem which we haven't even discovered yet. Thomas
On 15 July 2017 at 14:06, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
If those 'clutter' files are within the package directory (i.e. next to __init__.py), and we have to do a direct wheel build because VCS information is not available (not a VCS checkout, or VCS not on PATH), then I don't know how flit can avoid installing them, under any of the proposed isolation mechanisms. We can't extract a 'pristine' source tree from a non-pristine one without a list of what belongs in the pristine one - which flit gets from the VCS.
Agreed. And I'm perfectly OK with a solution that reduces the odds of issues rather than (futilely) trying to eliminate them totally, so I'm happy to live with this. Paul
On 16 July 2017 at 00:15, Paul Moore <p.f.moore@gmail.com> wrote:
On 15 July 2017 at 14:06, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
If those 'clutter' files are within the package directory (i.e. next to __init__.py), and we have to do a direct wheel build because VCS information is not available (not a VCS checkout, or VCS not on PATH), then I don't know how flit can avoid installing them, under any of the proposed isolation mechanisms. We can't extract a 'pristine' source tree from a non-pristine one without a list of what belongs in the pristine one - which flit gets from the VCS.
Agreed. And I'm perfectly OK with a solution that reduces the odds of issues rather than (futilely) trying to eliminate them totally, so I'm happy to live with this.
Right, and the norms around what's reasonable and what's problematic will be resolved in the place where it makes sense to do so: between backend developers and their users. Frontend developers would only need to get involved *if* they see a pattern where *their* users are reporting problems, and those problems end up being consistently traced back to how a particular backend is handling requests for out-of-tree builds. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Jul 15, 2017 at 3:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 15 July 2017 at 10:42, Nathaniel Smith <njs@pobox.com> wrote:
Hi Paul,
We seem to have some really fundamental miscommunication here; probably we should figure out what that is instead of continuing to talk past each other.
Agreed. Thanks for summarising your understanding. Let's see if I can clarify what I'm saying.
Thanks for your patience!
As a wild guess... can you define what an "out-of-place build" means to you?
I'm going to do this without reference to your explanation, as trying to put things in the context of what you say is where I'm getting messed up. I'll comment on your explanations below.
Again, I'll start with some background. My concern is where we're trying to deal with a user doing "pip install ." on their development directory. This is not a core use case for pip, and honestly I don't feel it's the right workflow (people wanting to work on code and install it as they go along should be using editable installs, IMO), but it is something we see people doing, and I don't want to break that workflow arbitrarily. Given that this is the case we're talking about, my experience is that working directories contain all sorts of clutter - small test files I knocked up, experimental changes I discarded, etc. That may simply reflect the way I work, but comments I've seen indicate that I'm not *completely* alone. So for me, the point here is about making sure that "pip install ." in a cluttered working directory results in "what the developer wants".
Oh yeah, I end up with all kinds of junk in my working directories too.
For me the key property I'm looking for is that the developer gets consistent results for the build commands (i.e., building a wheel directly via build_wheel, and building a wheel from the output of build_sdist, give the same wheel). This is important for a number of reasons - to avoid publishing errors where the developer builds a wheel and a sdist and deploys them, to ensure that tox (which uses sdists) gives the same results as manually running the tests, etc. In one of your posts you characterised the sorts of discrepancies I'm trying to avoid as "weird errors" and that's precisely the point - we get confused users raising issues and I want to avoid that happening.
Right.
So, with that in mind, the distinction between an "in place" and an "out of place" build is that an in-place build simply trusts that the developer's directory will deliver consistent results, whereas an out-of-place build does the build in a separate location that we've asked the backend to ensure doesn't contain unexpected files. It has nothing to do with repeated builds (but see below).
...so this is where we diverge. As far as I understand it -- and I'm pretty sure this matches all the major build systems like automake, cmake, etc. -- the *only* difference between an in-place and out-of-place build is where they cache the intermediate artifacts. So if a build system is, say, scanning the source tree looking for stuff to build... well, the source tree is the same either way, so if they're going to pick up random junk in one case, they'll do it in the other case as well. In fact I think they'd consider it a bug if they didn't. Or contrariwise, if a build system is smart enough to recognize that some files are junk and some are not, then it doesn't matter where it's putting the intermediate files, it'll generate good results either way.

Or for a more specific example: setuptools has the unfortunate distinction between sdist mode that uses MANIFEST.in, and bdist mode that uses setup.py, and skew between these is a frequent source of problems. But IIUC the only way to exercise the MANIFEST.in path is by actually generating an sdist. You can do in-place builds (the default), or out-of-place builds ('build -b some/dir'), but if one is broken then the other is broken as well.
For me, the distinction between an in-place and out-of-place build is, ... well, first some background to make sure my terminology is clear: build systems typically work by taking a source tree as input and executing a series of rules to generate intermediate artifacts and eventually the final artifacts. Commonly, as an optimization, they have some system for caching these intermediate artifacts, so that future builds can go faster (called "incremental builds"). However, this optimization is often heuristic-based and therefore introduces a risk: if the system re-uses a cached artifact that it should have rebuilt, then this can generate a broken build.
There are two popular strategies for storing this cache, and this is what "in-place" versus "out-of-place" refers to.
"In-place builds" have a single cache that's stored inside the source tree -- often, but not always, intermingled with the source files. So a classic 'make'-based build where you end up with .o files next to all your .c files is an in-place build, and so is 'python setup.py build' putting a bunch of .o files inside the build/ directory.
"Out-of-place builds" instead place the cached artifacts into a designated separate directory. The advantage of this is that you can potentially work around limitations of the caching strategy by having multiple caches and switching between them.
[In traditional build systems the build tree concept is also often intermingled with the idea of a "build configuration", like debug versus optimized builds, and this changes the workflow in various ways -- but we don't have those and it's a whole extra set of complexity so let's ignore that.]
For me, all of the above comes under the heading of "incremental builds", and I'm considering that out of scope. Specifically, pip's current behaviour offers no (documented) means of choosing between incremental or clean builds, and users who want that level of control should be building with the backend tools (setuptools) directly, and only using pip for the install step once a wheel has been built.
If and when we discuss a UI in pip for requesting incremental or clean builds, we'd look at the implications on the backend hooks at that point - but I'm not sure that'll ever be something we want to do, as it seems like that should probably always be a use case that we'd want users to be working directly with the backend for (but that's just my opinion).
Well, so this is part of what's been making me so confused :-). AFAICT the only reason to care about in-place versus out-of-place builds in PEP 517 is if we want to provide explicit control over incremental builds, which seems a weird place to be spending our complexity budget to me too.
Corollaries:
- if you're starting with a pristine source tree, then "in-place" and "out-of-place" builds will produce exactly the same results, because they're running exactly the same rules. (This is why I'm confused about why you seem to be claiming that out-of-place builds will help developers avoid bugs that happen with in-place builds... they're exactly the same thing!)
Agreed, but I'm concerned about build trees that *aren't* "pristine", insofar as they are working directories for development. All of your corollaries depend on the idea that you have a "pristine" build tree, and that's where our confusion lies, I suspect.
Ah, right -- when I said "pristine" here I was implicitly thinking "pristine with respect to intermediate build artifacts", since as far as I know the other kind of pristine is orthogonal to this whole discussion.
- if you've done an out-of-place build in a given tree, you can return to a pristine source tree by deleting the out-of-place directory and making a new one, without having to deal with the build backend. If you've done an in-place build in a given tree, then you need something like a "make clean" rule. But if you have that, then these are identical, which is why I said that it sounded like pip would be just as happy with a way to do a clean build. (I'm not saying that the spec necessarily needs a way to request a clean build -- this is just trying to understand what the space of options actually is.)
Again, agreed but irrelevant, as "pristine" is not the case that concerns me.
- if you're starting with a pristine source tree, and your goal is to end up with a wheel *while keeping the original tree pristine*, then some options include: (a) doing a copytree + in-place build on the copy, like pip does now, (b) making an sdist and then doing an in-place build
Same again
- if you're not starting with a pristine source tree -- like say the user went and did an in-place build here before invoking pip -- then you have very few good options. Copytree + in-place build will hopefully work, but there's a chance you'll pick up detritus from the previous build. Out-of-tree-builds might or might not work -- I've given several examples of extant build systems that explicitly disclaim any reliability in this case. Sdist + build the sdist is probably the most reliable option here, honestly, when it's an option.
Your idea of a "not pristine" tree differs from mine - having done an in-place build is the most innocuous example of a non-pristine tree as far as I'm concerned, and the easiest to deal with (make clean).
* Copytree is certain *not* to work, because it copies all the things that make the tree not pristine.
* Build sdist and unpack is pip's current planned approach, but Thomas had issues with guaranteeing that building an sdist is always possible. We do *not* want to have cases where pip can't build a wheel even though build_wheel would have worked, which means build sdist and unpack is a problem.
Right -- the idea in my simplified proposal is to expose just enough to give frontends the flexibility to support both of these, plus fallback from the latter to the former when necessary, so it's at least no worse than what we have now.
* Ask the backend to make a "clean" directory would work (the backend should know what it needs) - that was the prepare_directory hooks. But that got too complex.
* Tell the backend we want a build that's isolated from the source directory, and trust it to do the right thing, is where we've currently ended up.
Based on the current discussion, however, I now have concerns that either
a) Backend developers might not understand what build_directory is requesting, or b) The PEP doesn't define the semantics of build_directory in a way that delivers the results I'm suggesting here
Having had this discussion, and re-read the current draft of the PEP, I do in fact think that (b) is the case. That worries me, because I don't think it's just me that had made that mistake. Nick has just posted a message saying
Requesting an out-of-tree wheel build is then just a way for a frontend to say to the backend "Hey, please build the wheel *as if* you'd exported an sdist and then built that, even if you can't actually export an sdist right now".
So my understanding is that that's what the build_wheel operation is -- like, backends should *always* be generating the same wheel as they would if they'd built an sdist, and if they don't, it's a bug. Of course bugs do happen, and distutils's fundamental architecture has some mistakes in it, and it makes sense that pip wants to try and be robust against them. But that requires some specific strategy for avoiding specific bugs, and I'm not sure what you have in mind here. To me having two different ways to run the build just seems like it gives twice as many chances for bugs (or maybe more once you take interactions into account). Similarly we could have a literal boolean flag that is documented to mean "please build the wheel *as if* you'd exported an sdist and then built that", but what specifically would you expect backends to do differently if this was set? Are there any circumstances where you wouldn't want this to be set? -n -- Nathaniel J. Smith -- https://vorpus.org
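The property Nathaniel describes - build_wheel always producing the same result as sdist -> unpack -> build_wheel - can be made true by construction. Here is a toy backend sketch (entirely hypothetical: the project name, file layout, and naive "skip hidden files" selection are illustrative, not anything the PEP requires):

```python
# Hypothetical backend sketch: build_wheel guarantees consistency with
# build_sdist by literally routing through it, so the wheel can never
# contain files the sdist would have left out.
import os
import tarfile
import tempfile
import zipfile

def build_sdist(sdist_directory, config_settings=None):
    # Naive selection: archive every non-hidden file in the current
    # directory (hooks run with the source tree as working directory).
    name = "demo-1.0.tar.gz"
    with tarfile.open(os.path.join(sdist_directory, name), "w:gz") as tar:
        for fn in sorted(os.listdir(".")):
            if not fn.startswith(".") and os.path.isfile(fn):
                tar.add(fn, arcname="demo-1.0/" + fn)
    return name

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    # sdist -> unpack -> build the wheel from the unpacked tree.
    with tempfile.TemporaryDirectory() as tmp:
        sdist = build_sdist(tmp)
        with tarfile.open(os.path.join(tmp, sdist)) as tar:
            tar.extractall(tmp)
        unpacked = os.path.join(tmp, "demo-1.0")
        wheel_name = "demo-1.0-py3-none-any.whl"
        with zipfile.ZipFile(os.path.join(wheel_directory, wheel_name), "w") as whl:
            for fn in sorted(os.listdir(unpacked)):
                whl.write(os.path.join(unpacked, fn), arcname=fn)
    return wheel_name
```

With this structure, any clutter the sdist step filters out is automatically filtered out of the wheel as well, which is exactly the MANIFEST.in-style skew that distutils/setuptools cannot rule out.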
On 16 July 2017 at 01:07, Nathaniel Smith <njs@pobox.com> wrote:
...so this is where we diverge. As far as I understand it -- and I'm pretty sure this matches all the major build systems like automake, cmake, etc. -- the *only* difference between an in-place and out-of-place build is where they cache the intermediate artifacts.
Out-of-tree builds also change the location of the final output artifacts (executables, shared libraries, etc), not just the intermediate ones - that's why you can run them even when the source code is on a read-only file system, and why several build systems call them "variant" directories these days rather than "build" directories.
So if a build system is, say, scanning the source tree looking for stuff to build... well, the source tree is the same either way, so if they're going to pick up random junk in one case, they'll do it in the other case as well. In fact I think they'd consider it a bug if they didn't. Or contrariwise, if a build system is smart enough to recognize that some files are junk and some are not, then it doesn't matter where it's putting the intermediate files, it'll generate good results either way.
Full-fledged build systems aren't a problem for the clutter case, as they're not copying arbitrary files around, they're tracing their build dependency graph backwards from the desired outputs. So the single biggest long term contribution that PEP 517 makes to solving that problem is to enable newcomers to Python to start using backends that aren't as inherently error prone in this regard as setuptools and distutils are.

However, the clutter problem still exists for any "grab files from disk and shove them into an archive" backends, like the naive "copy everything except hidden and special files into the sdist" and "copy the entire src directory into the wheel" example backend that we define in the PEP. For those, the best we can hope for in the general case is to say "at least try to keep the clutter in your sdists and wheel archives *consistent*". Anything beyond that will be up to the collective decision making of Python's userbase, as folks decide which backends they actually want to use for their projects.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 15, 2017, at 6:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
One particularly frustrating aspect of this discussion is that the worst offender for "wheel and sdist are inconsistent" is the way that setuptools requires developers to specify build and sdist contents separately (setup.py vs MANIFEST.in). That duplication is an obvious source of potential inconsistencies, and precisely why we get most of the reports we see. Ideally, new backends would not design in such inconsistency[1], which means it's easy to see such inconsistencies as "that should never happen" or "I don't understand the problem". But we will have to deal with the possibility of such backends, and the setuptools model isn't *that* unusual (setuptools didn't invent the file MANIFEST.in, it just reused the name for its own purpose).
[1] I don't know enough about flit to be sure, but if the developer forgets to check in a new source file, would it be possible for that source file be in the wheel but not in the sdist?
I think all of the build tools that we’ve looked at so far have this problem to some degree. It appears that flit is the least likely of the bunch to get affected by it, because it tries really hard to yell at you when you have files that aren’t in source control, but like Thomas has indicated that can obviously fail when the VCS is not available for some reason. We’re all well aware of how distutils/setuptools has issues in this arena, and enscons has it too, given that you have two separate lists that get built: the list of files to add to the sdist, and the list of files that get installed. Which is really the fundamental error case here: whenever you have two different lists of files, one for the sdist and one for the install, you risk having areas where those two lists diverge, which can give inconsistent results based on exactly how those two lists differ.

One thing I’d maybe push back on is the idea that a hook can’t fail - that I think is obviously not attainable. All of these hooks can fail for any number of reasons; the real question is whether it’s a fatal error to the entire build process or not. If the wheel building hook fails, that is obviously a fatal error and a frontend has to halt execution at that point, because there’s nothing left for us to do (this is actually a distinct change from today, because today if wheel building fails we fall back to trying to do a direct install). The place that we seem to be getting held up on is trying to make it so that building an sdist is a non-fatal error and that execution can continue in the case that the sdist failed (or would have failed, depending on the order of operations). The primary driver for sdist errors that wouldn’t necessarily also translate to a wheel failure seems to be the lack of some external tool that can’t be installed via pip as a build requirement.
Thinking through all of the tooling that currently exists, as well as any ideas in my head that I can think of for other tooling, the main tools that fit into that category are VCS tools (which I think is why they regularly get used as part of the example of a case where that can fail). I wonder if maybe it would be more useful to simply recommend that instead of shelling out to random VCS binaries, these projects depend on (or bundle) libraries to directly interact with a repository. For instance, if your project supports git, then you can use dulwich or pygit2, and then the invariant of “building inside of a docker container without `git` installed” still remains functional. This is obviously not 100%, since I’m sure there are going to be some tools people want to use that simply aren’t going to be able to be installed as a Python package; however, I don’t personally feel like having a fatal error because you haven’t satisfied some constraint the package has on the build system is unreasonable. That might trigger feature requests to tools to relax their constraints, but assuming that those constraints exist for good reason, then it seems easy enough to close those issues with a link to some FAQ about why they exist.

All of that being said, I don’t personally have a problem with the interface as it currently exists on https://www.python.org/dev/peps/pep-0517/ (assuming that’s the most up to date draft?). The inclusion of a build directory is fine with me, though the fact that Nathaniel is concerned is somewhat concerning to me, given he has far more experience with random build tools than I do. I have some *other* comments about other parts of the PEP, but I’m going to hold off on addressing them until we get the interface, which is the meat of the spec, nailed down and decided.
One thing that I’ve thought about as I was reading this spec is that one of the important things to do with it is to somewhat divorce our thinking from what specifically pip or tox or whatever will or won’t do with it as the *only* path, and instead make sure it’s flexible enough to implement all of the paths that we’re still going to support. While I had been a proponent of making VCS -> sdist -> wheel -> install be the only path, it appears I am in a minority about that (since a lot of the effort has been in trying to decide how best to support *not* going through sdist). If we’re going to support other ways, then I think being flexible is the right way to do it (as different tools will likely impose different constraints on how they process build directories).

One benefit of that is we can evolve the actual tooling faster than we can evolve specs (or at least, that seems to be the case!), and any spec we create we’re stuck with for a decade+ once it’s been implemented, but tooling itself lives for far fewer years. That means that tooling can initially start out being fairly strict or hardline, and then wait and see how the ecosystem reacts to that. We’re all making guesses about how likely one failure mode or another is going to happen with a new crop of tools designed in this decade, and I don’t think we can really say for sure which cases are going to be more or less common.

This is all a long winded way of saying that on the implementation side, it may make sense for pip to be strict VCS -> sdist -> wheel -> install at first, and see what issues that causes for people, and if barely anyone has any problems, well maybe great, we’re done. If there seems to be a number of folks running into issues that could/would be solved using whatever mechanism exists for going VCS -> wheel -> install, then we can start adding an option (and eventually migrating to on by default, then removing that option) to support doing that [2].
As long as the backend API is there, we can make decisions more “on the fly”. All of that is a long winded way of saying I don’t particularly care if the VCS -> wheel -> install path is spelled out as *always* doing in-place builds, or if we add a build directory to specify between out of place or in place. Having a robust mechanism in place for doing that means we can adjust how things *typically* work without going back to the PEP process and throwing everything away.

Hopefully that all makes sense and is a useful sort of dumping of thoughts.

[1] One note, I noticed there are still instances of prepare_wheel_metadata in the text.
[2] For an example, we’ve recently done this with --upgrade in order to better support projects like NumPy. The way pip works isn’t set in stone, and as we get more experience with new things we can adjust it.

— Donald Stufft
On 16 July 2017 at 04:33, Donald Stufft <donald@stufft.io> wrote:
All of that is a long winded way of saying I don’t particularly care if the VCS -> wheel -> install path is spelled out *always* doing in-place builds or if we add a build directory to specify between out of place or in place. Having a robust mechanism in place for doing that means we can adjust how things *typically* work without going back to the PEP process and throwing everything away.
+1

The thing I like about the latest draft of the API is that it lets frontends choose freely between three build strategies:

1. build_sdist failing is a fatal error for the overall build
2. ask build_wheel to do its best to emulate a "via sdist" build with what's available
3. ask build_wheel for a wheel without worrying too much about matching the sdist

The former is the approach that best encourages keeping sdist and wheel file lists in sync, but may fail in cases where one of the other strategies would have worked. Having this option available allows a frontend to prioritise archive consistency at the expense of a reduction in comprehensiveness (i.e. anything it can build will have consistent sdists and wheel archives, but there will be some cases where it will fail to build a wheel where a more permissive frontend would have succeeded).

The second pushes the problem onto backends to figure out, and they're frankly in the best position to do so, since they have the most information about what their requirements are, and the greatest ability to *change* their requirements based on the circumstances of use (e.g. by switching from VCS command line tools to Python APIs, and then dynamically requesting the appropriate library based on the available VCS metadata). Some backends won't want to do that, and we're OK with that - it's a "do your best with the information and tools you have available" API, not a "you must get it exactly right every time" API.

The final case is then mainly useful in situations where the frontend *knows* it is already working with an unpacked sdist, either because it downloaded it from PyPI, or because it just created it with build_sdist. In those cases, the priority for the backend is to produce a working wheel - whether or not that wheel matches building via the sdist is a lesser concern, since the frontend has indicated that it has either already handled it or else genuinely doesn't care.
The exact norms around what's acceptable behaviour for out-of-tree wheel builds (and just how hard backends should try to match the build_sdist -> in-place build_wheel path in that case) is then something that will evolve over time, and I'm OK with that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
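For concreteness, the three strategies Nick lists might be sketched on the frontend side as follows (hypothetical code: the helper name, the strategy labels, and the `build_directory` keyword are illustrative assumptions layered on the hook names from the thread, not the PEP's actual signatures):

```python
# Hypothetical frontend dispatch over the three build strategies,
# showing which hooks get called and what counts as a fatal error.
import os
import tarfile
import tempfile

def build_wheel_with_strategy(backend, strategy, wheel_dir):
    if strategy == "via-sdist":
        # 1. build_sdist failing is a fatal error for the overall build:
        #    any exception it raises propagates straight to the caller.
        with tempfile.TemporaryDirectory() as tmp:
            sdist = backend.build_sdist(tmp)
            with tarfile.open(os.path.join(tmp, sdist)) as tar:
                tar.extractall(tmp)  # then build in-place from the unpack
            return backend.build_wheel(wheel_dir)
    elif strategy == "out-of-tree":
        # 2. Delegate to the backend: "build *as if* via sdist, even if
        #    you can't actually export an sdist right now".
        with tempfile.TemporaryDirectory() as tmp:
            return backend.build_wheel(wheel_dir, build_directory=tmp)
    else:
        # 3. Plain in-place build: sdist consistency is explicitly not
        #    a concern (e.g. the tree is already an unpacked sdist).
        return backend.build_wheel(wheel_dir)
```

The point of the sketch is that the choice lives entirely in the frontend: the backend exposes the same two hooks throughout, and only the out-of-tree case hands it an extra directory to work in.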
On Sat, Jul 15, 2017 at 8:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 July 2017 at 04:33, Donald Stufft <donald@stufft.io> wrote:
All of that is a long winded way of saying I don’t particularly care if the VCS -> wheel -> install path is spelled out *always* doing in-place builds or if we add a build directory to specify between out of place or in place. Having a robust mechanism in place for doing that means we can adjust how things *typically* work without going back to the PEP process and throwing everything away.
+1
The thing I like about the latest draft of the API is that it lets frontends choose freely between three build strategies:
1. build_sdist failing is a fatal error for the overall build
2. ask build_wheel to do its best to emulate a "via sdist" build with what's available
3. ask build_wheel for a wheel without worrying too much about matching the sdist
I guess here you're identifying (2) with "out-of-place builds" and (3) with "in-place builds"? But... that is not what the in-place/out-of-place distinction means in normal usage, it's not the distinction that any of those build systems you were surveying implement, and it's not the distinction specified in the current PEP text.

If what we want is a distinction between "please give me a correct wheel" and "please give me a wheel but I don't care if it's broken", then wouldn't it make more sense to have a simple flag saying *that*? And in what case would a frontend ever set this give_me_a_correct_wheel flag to False? I know this sounds like a rhetorical question, but it's not; if a distinction between these is meaningful then there should be some concrete guidance we can give for frontends to choose between them, and for backends to decide which flag setting should behave in which way.

Alternatively, would it help to add some language to the PEP saying that build_wheel MUST always produce the same output as sdist->unpack->build_wheel? (I.e., basically making the give_me_a_correct_wheel=True flag the only option, and then if it turns out that sloppy builds have some compelling use case we can add that later.) In particular, I think we could then say that since for distutils/setuptools, MANIFEST.in affects the sdist, this language means that their build_wheel hook MUST also be sensitive to MANIFEST.in, and therefore it would need to be implemented internally as sdist->unpack->bdist_wheel. (Or if someone's ambitious we could even optimize that internally by skipping the pack/unpack step, which should make Donald happy :-).) And OTOH other backends that don't do this odd MANIFEST.in thing wouldn't have to worry about this.

-n -- Nathaniel J. Smith -- https://vorpus.org
On 16 July 2017 at 14:56, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jul 15, 2017 at 8:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 July 2017 at 04:33, Donald Stufft <donald@stufft.io> wrote:
All of that is a long winded way of saying I don’t particularly care if the VCS -> wheel -> install path is spelled out *always* doing in-place builds or if we add a build directory to specify between out of place or in place. Having a robust mechanism in place for doing that means we can adjust how things *typically* work without going back to the PEP process and throwing everything away.
+1
The thing I like about the latest draft of the API is that it lets frontends choose freely between three build strategies:
1. build_sdist failing is a fatal error for the overall build
2. ask build_wheel to do its best to emulate a "via sdist" build with what's available
3. ask build_wheel for a wheel without worrying too much about matching the sdist
I guess here you're identifying (2) with "out-of-place builds" and (3) with "in-place builds"?
But... that is not what the in-place/out-of-place distinction means in normal usage, it's not the distinction that any of those build systems you were surveying implement, and it's not the distinction specified in the current PEP text.
If what we want is a distinction between "please give me a correct wheel" and "please give me a wheel but I don't care if it's broken", then wouldn't it make more sense to have a simple flag saying *that*?
No, because pip *also* wants the ability to request that the backend put the intermediate build artifacts in a particular place, *and* having that ability will likely prove beneficial given directory based caching schemes in build automation pipelines (with BitBucket Pipelines and OpenShift Image Streams being the two I'm personally familiar with, but it's a logical enough approach to speeding up build pipelines that I'm sure there are others). It just turns out that we can piggy back off that in-place/out-of-tree distinction to *also* indicate how much the frontend cares about consistency with sdist builds (which the PEP previously *didn't* say, but explicit text along those lines was added as part of https://github.com/python/peps/pull/310/files based on this latest discussion).
And in what case would a frontend ever set this give_me_a_correct_wheel flag to False?
When the frontend either genuinely doesn't care (hopefully rare, but not inconceivable), or else when it's building from an unpacked sdist and hence can be confident that the artifacts will be consistent with each other regardless of how the backend handles the situation (expected to be very common, since it's the path that will be followed for sdists published to PyPI, and when handed an arbitrary PEP 517 source tree to build, pip will likely try "build_sdist -> in-place build_wheel" first and only fall back to "out-of-tree build_wheel" if the initial build_sdist call fails).

This is the main reason piggy backing off the in-place/out-of-tree distinction works so well for this purpose: if the frontend just unpacked an sdist into a build directory, then the most obvious thing for it to do is to do an in-place build in that directory. It's only when the frontend is handed an arbitrary directory to build that it doesn't know for sure is an unpacked sdist that the right thing to do becomes markedly less clear, which is why we're offering three options:

1. build_sdist -> unpack sdist -> in-place build_wheel (same as a PyPI download)
2. out-of-tree build_wheel (delegate the decision to the backend)
3. in-place build_wheel (explicitly decide not to worry about it)

We *think* 1 & 2 are going to be the most sensible options when given an arbitrary directory, but allowing for 3 is a natural consequence of supporting building from an unpacked sdist.
In particular, I think we could then say that since for distutils/setuptools, MANIFEST.in affects the sdist, then this language means that their build_wheel hook MUST also be sensitive to MANIFEST.in, and therefore it would need to be implemented internally as sdist->unpack->bdist_wheel. (Or if someone's ambitious we could even optimize that internally by skipping the pack/unpack step, which should make Donald happy :-).) And OTOH other backends that don't do this odd MANIFEST.in thing wouldn't have to worry about this.
We don't want to get into the business of micromanaging how backends work (that's how we got into the current mess with distutils & setuptools); we just want to make it possible for frontend developers and backend developers to collaborate effectively over time.

That turned out to be the core problem with the previous "prepare_input_for_build_wheel" hook: while it technically addressed Paul & Donald's concerns as frontend developers, it placed too many arbitrary constraints on how backend implementations worked, and didn't align well with the way full-fledged build systems handle requests for out-of-tree builds.

By contrast, `please do an out-of-tree build using this directory` handles the same scenario (build_sdist failed, but build_wheel can still be made to work) in a more elegant fashion by allowing the frontend to state *what* it wants (i.e. something that gets as close as is practical to the "build_sdist -> unpack sdist -> in-place build_wheel" path given the limitations of the current environment), while delegating the precise details of *how* that is done to the backend.

Some backends will implement those requests literally as "build_sdist -> unpack sdist -> in-place build_wheel" (which is what the example in the PEP does). Some won't offer any more artifact consistency guarantees than they do for the in-place build_wheel case (this is the current plan for flit, and presumably for enscons as well). Some will be able to take their sdist manifest data into account without actually preparing a full sdist archive (this might make sense for a setuptools/distutils backend).

All three of those options are fine from an ecosystem-level perspective, and are ultimately a matter to be resolved between backend developers and their users.
So we're now at a point where:

- key frontend developers agree the current spec allows them to request what they need/want from backends
- key backend developers agree the current spec can be readily implemented
- there are still some open questions around exactly when it's reasonable for hooks to fail, but we're only going to answer those through real-world experience, not further hypothetical speculation
- we have the ability to evolve the API in the future if some aspects turn out to be particularly problematic

That means I'm going to *explicitly* ask that you accept that the PEP is going to be accepted, and it's going to be accepted with the API in its current form, even if you personally don't agree with our reasoning for all of the technical details.

If your level of concern around the build_directory parameter specifically is high enough that you don't want to be listed as a co-author on PEP 517 anymore, then that's entirely reasonable (we can add a separate Acknowledgments section to recognise your significant input to the process without implying your endorsement of the final result), but as long as the accepted API ends up being supported in at least pip, flit, and enscons, it honestly doesn't really matter all that much in practice what the rest of us think of the design (we're here as design advisors, rather than being the ones that will necessarily need to cope with the bug reports arising from any interoperability challenges).

However, something you can definitely still influence is how the PEP is *worded*, and how it explains its expectations to frontend and backend developers - requests for clarification, rather than requests for change.
In particular, if you can figure out what the PEP would have to say that it doesn't currently say for the design outcome to seem logical to you, then I'd expect that to be a very helpful PR (keeping in mind that https://github.com/python/peps/pull/311/files is currently still open for review, and PR #310 was only merged recently).

Cheers, Nick.

P.S. We're also going to have a subsequent update to the specifications section of the Python Packaging User Guide, which will likely initially just be a link to the PEP in a new subsection, but will eventually involve being part of the expansion of that section into a Python packaging interoperability reference guide: https://github.com/pypa/python-packaging-user-guide/issues/319

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Jul 15, 2017 at 11:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 July 2017 at 14:56, Nathaniel Smith <njs@pobox.com> wrote:
But... that is not what the in-place/out-of-place distinction means in normal usage, it's not the distinction that any of those build systems you were surveying implement, and it's not the distinction specified in the current PEP text.
If what we want is a distinction between "please give me a correct wheel" and "please give me a wheel but I don't care if it's broken", then wouldn't it make more sense to have a simple flag saying *that*?
No, because pip *also* wants the ability to request that the backend put the intermediate build artifacts in a particular place,
Say what? Where did they say this? I'm 99% sure this is just not true.
*and* having that ability will likely prove beneficial given directory based caching schemes in build automation pipelines (with BitBucket Pipelines and OpenShift Image Streams being the two I'm personally familiar with, but it's a logical enough approach to speeding up build pipelines that I'm sure there are others).
Yeah, this is a really neat idea! I'm genuinely enthusiastic about it. Which... is why I think this is a great target for a future PEP, that can adequately address the complications that the current PEP is skimming over. As I argued in a previous email, I think these pipelines would actually *prefer* that out-of-place build be an optional feature, but it's basically impossible to even have the discussion about what they want properly as part of the core PEP 517 discussion.
It just turns out that we can piggyback off that in-place/out-of-tree distinction to *also* indicate how much the frontend cares about consistency with sdist builds (which the PEP previously *didn't* say, but explicit text along those lines was added as part of https://github.com/python/peps/pull/310/files based on this latest discussion).
Okay, but then this is... bad. You're taking two unrelated distinctions (in-place/out-of-place and sloppy/precise) and smashing them together. In particular, one of the types of build that Donald has said that he considers "sloppy" and worries about avoiding is any kind of incremental build. So if we take Donald's concern and your new PEP text literally, then it rules out incremental out-of-place builds. But incremental out-of-place builds are exactly what you need for the build pipeline case that you're citing as a motivation for this feature. Your PEP is trying to do too many things at once, and that means it's going to do them poorly.
And in what case would a frontend ever set this give_me_a_correct_wheel flag to False?
When the frontend either genuinely doesn't care (hopefully rare, but not inconceivable), or else when it's building from an unpacked sdist and hence can be confident that the artifacts will be consistent with each other regardless of how the backend handles the situation (expected to be very common, since it's the path that will be followed for sdists published to PyPI, and when handed an arbitrary PEP 517 source tree to build, pip will likely try "build_sdist -> in-place build_wheel" first and only fall back to "out-of-tree build_wheel" if the initial build_sdist call fails).
Ah, the unpacked sdist case is a good point, I neglected that in my discussion of a possible setuptools build_wheel hook. But it's fine -- even if we say that the setuptools build_wheel hook has to produce an "sdist-consistent" wheel in all cases, then it can detect whether it's building from an unpacked sdist (e.g. by keying off the presence of PKG-INFO), and in that case it knows MANIFEST.in has already been taken into account and it can go straight to bdist_wheel. And in fact, this is what we *want* it to key off of, *not* the in-place/out-of-place thing.

Consider Debian: they want to do out-of-place builds of unpacked sdists. They're working with pristine unpacked sdists, so they don't want setuptools to go pack up a new sdist and then unpack it again for every out-of-place build. (In fact, this is probably a less-tested path than building directly from the unpacked sdist.) So if the point of the out-of-place build feature is that it's supposed to tell setuptools that it needs to go via sdist... then in this case it will do the wrong thing. Out-of-place and sdist-consistency are orthogonal concepts.
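A tiny sketch of the PKG-INFO check described here (purely illustrative, not setuptools' actual code; the strategy names are made up):

```python
import os


def building_from_unpacked_sdist(source_tree):
    # distutils/setuptools sdists ship a top-level PKG-INFO file;
    # a plain VCS checkout normally does not have one.
    return os.path.isfile(os.path.join(source_tree, "PKG-INFO"))


def build_wheel_strategy(source_tree):
    if building_from_unpacked_sdist(source_tree):
        # MANIFEST.in was already applied when the sdist was made,
        # so going straight to bdist_wheel is safe.
        return "direct-bdist_wheel"
    # Otherwise, route through the sdist logic to stay sdist-consistent.
    return "via-sdist"
```

Note the decision keys off the nature of the source tree, not off whether the frontend asked for an in-place or out-of-place build, which is the orthogonality point being made above.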
This is the main reason piggybacking off the in-place/out-of-tree distinction works so well for this purpose: if the frontend just unpacked an sdist into a build directory, then the most obvious thing for it to do is an in-place build in that directory.
It's only when the frontend is handed an arbitrary directory to build that it doesn't know for sure is an unpacked sdist that the right thing to do becomes markedly less clear, which is why we're offering three options:
1. build_sdist -> unpack sdist -> in-place build_wheel (same as a PyPI download)
2. out-of-tree build_wheel (delegate the decision to the backend)
3. in-place build_wheel (explicitly decide not to worry about it)
We *think* 1 & 2 are going to be the most sensible options when given an arbitrary directory, but allowing for 3 is a natural consequence of supporting building from an unpacked sdist.
In particular, I think we could then say that since for distutils/setuptools, MANIFEST.in affects the sdist, then this language means that their build_wheel hook MUST also be sensitive to MANIFEST.in, and therefore it would need to be implemented internally as sdist->unpack->bdist_wheel. (Or if someone's ambitious we could even optimize that internally by skipping the pack/unpack step, which should make Donald happy :-).) And OTOH other backends that don't do this odd MANIFEST.in thing wouldn't have to worry about this.
We don't want to get into the business of micromanaging how backends work (that's how we got into the current mess with distutils & setuptools), we just want to make it possible for frontend developers and backend developers to collaborate effectively over time.
So, uh... you're saying that you don't like this proposal, because it has too much micro-management:

- backends are required to produce sdist-consistent wheels

and instead prefer this proposal:

- backends must support one configuration where they place intermediate artifacts in one specified place, and are not required to produce sdist-consistent wheels
- backends must also support another configuration where they place intermediate artifacts in another place that's specified in a different way, and in this case they are required to produce sdist-consistent wheels

I mean... the second proposal encompasses the first proposal, and then adds extra requirements, and those extra requirements are about fiddly internal things like where intermediate artifacts should be cached... surely the first proposal is less micromanage-y?
That turned out to be the core problem with the previous "prepare_input_for_build_wheel" hook: while it technically addressed Paul & Donald's concerns as frontend developers, it placed too many arbitrary constraints on how backend implementations worked and didn't align well with the way full-fledged build systems handle requests for out of tree builds.
By contrast, `please do an out-of-tree build using this directory` handles the same scenario (build_sdist failed, but build_wheel can still be made to work) in a more elegant fashion by allowing the front end to state *what* it wants (i.e. something that gets as close as is practical to the "build_sdist -> unpack sdist -> in-place build_wheel" path given the limitations of the current environment),
We totally agree the frontend should state "*what* it wants". This is exactly my problem with the PEP -- it doesn't do this! Instead, it has frontends specify something they don't care about (in-place/out-of-place) and then overloads that to mean this unrelated thing. My draft at the start of this thread was exactly designed by trying to find the intersection of (1) satisfying everyone's requirements, (2) restricting myself to operations whose semantics I could define in a generic, future-proof way.
while delegating the precise details of *how* that is done to the backend.
Some backends will implement those requests literally as "build_sdist -> unpack sdist -> in-place build_wheel" (which is what the example in the PEP does).
Some won't offer any more artifact consistency guarantees than they do for the in-place build_wheel case (this is the current plan for flit and presumably for enscons as well)
Some will be able to take their sdist manifest data into account without actually preparing a full sdist archive (this might make sense for a setuptools/distutils backend).
All 3 of those options are fine from an ecosystem level perspective, and are ultimately a matter to be resolved between backend developers and their users.
So we're now at a point where:
- key frontend developers agree the current spec allows them to request what they need/want from backends
- key backend developers agree the current spec can be readily implemented
- there are still some open questions around exactly when it's reasonable for hooks to fail, but we're only going to answer those through real-world experience, not further hypothetical speculation
- we have the ability to evolve the API in the future if some aspects turn out to be particularly problematic
That means I'm going to *explicitly* ask that you accept that the PEP is going to be accepted, and it's going to be accepted with the API in its current form, even if you personally don't agree with our reasoning for all of the technical details. If your level of concern around the build_directory parameter specifically is high enough that you don't want to be listed as a co-author on PEP 517 anymore, then that's entirely reasonable (we can add a separate Acknowledgments section to recognise your significant input to the process without implying your endorsement of the final result), but as long as the accepted API ends up being supported in at least pip, flit, and enscons, it honestly doesn't really matter all that much in practice what the rest of us think of the design (we're here as design advisors, rather than being the ones that will necessarily need to cope with the bug reports arising from any interoperability challenges).
Clearly neither of us have convinced each other here. If you want to take over authorship of PEP 517, that's fine -- it's basically just acknowledging the current reality. But in that case I think you should step down as BDFL-delegate; even Guido doesn't accept his own PEPs. Here's another possible way forward: you mold PEP 517 into what you think is best, I take my text from the beginning of this thread and base a new PEP off it, and then Donald gets stuck as BDFL-delegate and has to pick. (Sorry Donald.) What do you think? It would at least have the advantage that everyone else gets a chance to catch up and look things over -- I think at the moment it's basically only you and me who actually have strong opinions, and everyone else is barely following. -n -- Nathaniel J. Smith -- https://vorpus.org
Just throwing in a vote, since I am following along – Nathaniel is totally correct and I have no idea what Nick is talking about :)

In/out-of-place builds can’t guarantee that any “build wheel” operation will be consistent with the “build sdist” operation, except for dealing with a very narrow set of bugs. If you want wheels and sdists to be the same, require it of backends through the specification – not the API. About the only option available here in the protocol would be a “strict” flag, which might allow backends to fail immediately if they can’t guarantee it (e.g. the example backend), but there is literally no API design that can implicitly force the correct behavior.

“Build wheel and make a mess in my repo” vs. “build wheel without leaving a mess” (in-tree vs. out-of-tree) is a useful option, but not a solution to backends producing incorrect output. (In case it’s not clear, I’m using Nathaniel’s definitions, since I don’t understand any of the other definitions people have used.)

Cheers, Steve

Top-posted from my Windows phone
I agree that sdist consistency is not enforceable. Very little is. What if we deleted every unenforceable part of the PEP? No explanations of what backends should do. Every parameter is a hint. If you put the output file where requested then you are a good back end. Would that work better?
That totally works for me. It also avoids the argument we seem to be having around tricking backends into creating the correct output by controlling their input. It’s the backend’s job to get the output correct – pip can output as many messages as it likes to blame someone else, if the aim is to avoid bugs being reported in the wrong place, but at some point we need to trust the backend to get it right even in a dirty source directory.

Top-posted from my Windows phone at EuroPython
Nathaniel Smith wrote:
I think at the moment it's basically only you and me who actually have strong opinions, and everyone else is barely following.
I've been a somewhat confused bystander in this thread for a while, and for what it's worth, I think Nathaniel is exactly right on these points:

1. In-place vs. out-of-place and sdist/wheel consistency are orthogonal issues.
2. Sdist/wheel consistency should be an implied goal that goes without saying in all cases. Even if a backend can't always guarantee it, it should make a best effort.

-- Greg
On 16 July 2017 at 18:24, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jul 15, 2017 at 11:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 July 2017 at 14:56, Nathaniel Smith <njs@pobox.com> wrote:
But... that is not what the in-place/out-of-place distinction means in normal usage, it's not the distinction that any of those build systems you were surveying implement, and it's not the distinction specified in the current PEP text.
If what we want is a distinction between "please give me a correct wheel" and "please give me a wheel but I don't care if it's broken", then wouldn't it make more sense to have a simple flag saying *that*?
No, because pip *also* wants the ability to request that the backend put the intermediate build artifacts in a particular place,
Say what? Where did they say this? I'm 99% sure this is just not true.
pip currently works by moving the input files out to a different directory with shutil.copytree. For PEP 517, their default build strategy is going to be to take an sdist and unpack it. Both of those build strategies are particular ways of achieving an out-of-tree build *without* relying directly on out-of-tree build support in the backend.

However, if build_sdist fails, they *can't* use their preferred build strategy, and need a way to ask the backend to do the best it reasonably can to emulate a build via sdist, even though an sdist build isn't technically feasible. Originally, the proposed mechanism for that was the various incarnations of the "prepare_input_for_build_wheel" hook, which had the key downside of not aligning well with the way full-fledged build systems actually work.

By contrast, Daniel & Thomas's suggestion of a build_directory parameter to build_wheel nicely models a frontend saying "Well, I *would* have built an sdist and used that, but it didn't work, so instead I'm asking you to do the best you can" in a way that *also* aligns nicely with the way out-of-tree build support in full-fledged build systems actually works (i.e. by specifying target directories/build directories/variant directories appropriately).
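As a hedged sketch of the backend side of that contract (the helpers are stubs I've made up, not real backend code), a backend honoring build_directory by routing through its own sdist path might look like:

```python
def build_sdist(sdist_directory, config_settings=None):
    # Stub: a real backend would write pkg-1.0.tar.gz into sdist_directory.
    return "pkg-1.0.tar.gz"


def _compile_wheel(source_tree, wheel_directory):
    # Stub: a real backend would compile source_tree and drop a wheel into
    # wheel_directory; here we just report which tree was built from.
    return "pkg-1.0-py3-none-any.whl", source_tree


def build_wheel(wheel_directory, config_settings=None, build_directory=None):
    if build_directory is None:
        # In-place request: build straight from the current source tree.
        return _compile_wheel("source-tree", wheel_directory)
    # Out-of-tree request: emulate "build_sdist -> unpack -> build_wheel"
    # so the result is as close as practical to an sdist-based build.
    sdist = build_sdist(build_directory)
    unpacked_tree = sdist[: -len(".tar.gz")]  # actual unpacking elided
    return _compile_wheel(unpacked_tree, wheel_directory)
```

The in-place and out-of-tree branches differ only in which source tree gets compiled, which is what lets the frontend use the same hook for both requests.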
*and* having that ability will likely prove beneficial given directory based caching schemes in build automation pipelines (with BitBucket Pipelines and OpenShift Image Streams being the two I'm personally familiar with, but it's a logical enough approach to speeding up build pipelines that I'm sure there are others).
Yeah, this is a really neat idea! I'm genuinely enthusiastic about it. Which... is why I think this is a great target for a future PEP, that can adequately address the complications that the current PEP is skimming over. As I argued in a previous email, I think these pipelines would actually *prefer* that out-of-place build be an optional feature, but it's basically impossible to even have the discussion about what they want properly as part of the core PEP 517 discussion.
Even with build_directory defined, backends remain free to ignore it and put their intermediate artifacts somewhere else. All the API design does is provide them with the *option* of using the frontend provided directory. So that just becomes a quality of implementation issue that folks will work out iteratively with backend developers.
It just turns out that we can piggy back off that in-place/out-of-tree distinction to *also* indicate how much the frontend cares about consistency with sdist builds (which the PEP previously *didn't* say, but explicit text along those lines was added as part of https://github.com/python/peps/pull/310/files based on this latest discussion).
Okay, but then this is... bad. You're taking two unrelated distinctions (in-place/out-of-place and sloppy/precise) and smashing them together. In particular, one of the types of build that Donald has said that he considers "sloppy" and worries about avoiding is any kind of incremental build. So if we take Donald's concern and your new PEP text literally, then it rules out incremental out-of-place builds. But incremental out-of-place builds are exactly what you need for the build pipeline case that you're citing as a motivation for this feature.
Your PEP is trying to do too many things at once, and that means it's going to do them poorly.
Not really, because sloppy/precise isn't actually a distinction we really care about, it's one that arises from the fact that in-place builds with setuptools/distutils *are* sloppy about their inputs, and hence starting with a "dirty" source directory may corrupt the sdists and wheels produced from those directories. Full-fledged build systems that work backwards through their dependency graph to their desired inputs rather than relying on file globs and collecting entire directory trees from local disk shouldn't be anywhere near as vulnerable to the problem (although even for those the general recommendation is that robust CI and release processes should always start with a clean checkout to ensure that everything is properly version controlled).

This means that the key benefit that Daniel & Thomas's suggestion of the build_directory parameter provides is a clear indicator of *where design responsibility lies* for the behaviour of a given build.

That is, from pip's point of view, the "normal" build path will be "build_sdist -> unpack sdist -> in-place build_wheel". That's the sequence that pip itself is directly responsible for.

However, we know that build_sdist may sometimes fail in cases where build_wheel would have worked, so we *also* define a fallback that delegates *all* design responsibility for the build to the backend: out-of-tree build support.

The reason that solves the problem from an ecosystem level perspective is because it divides responsibility appropriately:

1. backend developers know the most about why their build_sdist implementation might fail (triggering an out-of-tree build request)
2. backend developers know the most about ways starting with a "dirty" source directory might corrupt their output
3. backend developers know the most about how to detect if they're starting with an unpacked sdist or not

The reason we *don't* define the delegation as being to an *in-place* build is because we *know* we're not happy with that for the setuptools/distutils status quo - if we were, this part of the design discussion would never have arisen, and we'd never have had either the build_directory parameter *or* the various incarnations of the "build directory preparation" hook.
And in what case would a frontend ever set this give_me_a_correct_wheel flag to False?
When the frontend either genuinely doesn't care (hopefully rare, but not inconceivable), or else when it's building from an unpacked sdist and hence can be confident that the artifacts will be consistent with each other regardless of how the backend handles the situation (expected to be very common, since it's the path that will be followed for sdists published to PyPI, and when handed an arbitrary PEP 517 source tree to build, pip will likely try "build_sdist -> in-place build_wheel" first and only fall back to "out-of-tree build_wheel" if the initial build_sdist call fails).
Ah, the unpacked sdist case is a good point, I neglected that in my discussion of a possible setuptools build_wheel hook. But it's fine -- even if we say that the setuptools build_wheel hook has to produce an "sdist-consistent" wheel in all cases, then it can detect whether it's building from an unpacked sdist (e.g. by keying off the presence of PKG-INFO), and in that case it knows MANIFEST.in has already been taken into account and it can go straight to bdist_wheel.
And in fact, this is what we *want* it to key off of, *not* the in-place/out-of-place thing. Consider Debian: they want to do out-of-place builds of unpacked sdists. They're working with pristine unpacked sdists, so they don't want setuptools to pack up a new sdist and then unpack it again for every out-of-place build. (In fact, this is probably a less-tested path than building directly from the unpacked sdist.) So if the point of the out-of-place build feature is that it's supposed to tell setuptools that it needs to go via sdist... then in this case it will do the wrong thing. Out-of-place and sdist-consistency are orthogonal concepts.
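The PKG-INFO heuristic mentioned above can be made concrete with a short sketch (illustrative only; the helper name is made up, and real backends may use other markers):

```python
import os


def building_from_unpacked_sdist(source_dir):
    """Return True if source_dir looks like an unpacked sdist.

    Sdists generated by setuptools contain a PKG-INFO file at the top
    level, so its presence suggests that MANIFEST.in has already been
    taken into account and the backend can go straight to building the
    wheel, with no need to round-trip through a fresh sdist.
    """
    return os.path.isfile(os.path.join(source_dir, "PKG-INFO"))
```

Note that this check is independent of whether the build is in-place or out-of-place, which is the orthogonality being argued for here.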
That's a frontend design question, not an API design question - while I agree it would be logical for frontends to have a way for users to say "this directory is an unpacked sdist, you can trust it and ask the backend for an in-place build", that's neither here nor there when it comes to designing the interface between frontends and backends.
We don't want to get into the business of micromanaging how backends work (that's how we got into the current mess with distutils & setuptools), we just want to make it possible for frontend developers and backend developers to collaborate effectively over time.
So, uh... you're saying that you don't like this proposal, because it has too much micro-management:
- backends are required to produce sdist-consistent wheels
Yes, this qualifies as excessive micromanagement of backends, because we *know* backends can't reliably satisfy it, and we also know we don't actually need it (because this isn't a guarantee that setuptools/distutils provides, and that's the incumbent build system).
and instead prefer this proposal:
- backends must support one configuration where they place intermediate artifacts in one specified place, and are not required to produce sdist-consistent wheels - backends must also support another configuration where they place intermediate artifacts in another place that's specified in a different way, and in this case they are required to produce sdist-consistent wheels
Producing sdist-consistent wheels is always *desired*. PEP 517 acknowledges that it isn't always *possible*, and hence places the following constraints on *front*ends in order to free up design space for backends:

- if a frontend wants to *ensure* consistency, it has to create the sdist first, and build from that
- if build_sdist fails, then the frontend should fall back to an out-of-tree build using the build_directory parameter

For backends:

- for in-place builds, sdist-consistency is 100% the frontend's problem
- for out-of-tree builds, sdist-consistency is the backend's problem, but mainly for cases where build_sdist would have failed
I mean... the second proposal encompasses the first proposal, and then adds extra requirements, and those extra requirements are about fiddly internal things like where intermediate artifacts should be cached... surely the first proposal is less micromanage-y?
No, as the first proposal not only overstates our requirements (as we're technically fine with backends producing inconsistent artifacts when handed a "dirty" source directory), it mainly looks less exacting because you haven't defined what you actually mean by "sdist-consistent wheels". The current PEP instead breaks out some simpler *mechanical* recommendations that provide some substance to what "sdist-consistent wheels" actually means, and whether it's a concern the backend can assume the frontend has already taken care of.
By contrast, `please do an out-of-tree build using this directory` handles the same scenario (build_sdist failed, but build_wheel can still be made to work) in a more elegant fashion by allowing the front end to state *what* it wants (i.e. something that gets as close as is practical to the "build_sdist -> unpack sdist -> in-place build_wheel" path given the limitations of the current environment),
We totally agree the frontend should state "*what* it wants". This is exactly my problem with the PEP -- it doesn't do this! Instead, it has frontends specify something they don't care about (in-place/out-of-place) and then overloads that to mean this unrelated thing.
Sure, I can see that concern - it arises from the fact that the way a frontend tells a backend "I was trying to do 'build sdist -> unpack sdist -> build wheel', but the 'build sdist' step failed" is by passing in the directory where it would have unpacked the sdist as the out-of-tree build directory.

However, the apparent conflation arises from asking the question: "If we're going to support requesting out-of-tree builds in the backend API anyway, do we need *another* flag to say 'I wanted to build via sdist, but that didn't work'?"

And I don't think we do - there isn't anything extra that a well-behaved backend should do for an out-of-tree build *just* because building an sdist failed, while there *are* things that a backend should do for an out-of-tree build that it wouldn't do for an in-place build.

Thus the distinction that actually models a meaningful real world build management concept is in-place/out-of-tree (as evidenced by the fact that it maps cleanly to most of the full-fledged build systems we/I surveyed), and that also turns out to be sufficient to handle our Python-specific problem of providing a fallback API for frontends to use when build_sdist fails.
My draft at the start of this thread was exactly designed by trying to find the intersection of (1) satisfying everyone's requirements, (2) restricting myself to operations whose semantics I could define in a generic, future-proof way.
It's a build system API - (2) is an utterly unattainable goal. The most we can hope for is an API that makes it relatively straightforward for end users, publishers, frontend developers and backend developers to figure out how to resolve cases where an end user's build fails (environmental issue, project build config issue, frontend bug, backend bug).
That means I'm going to *explicitly* ask that you accept that the PEP is going to be accepted, and it's going to be accepted with the API in its current form, even if you personally don't agree with our reasoning for all of the technical details. If your level of concern around the build_directory parameter specifically is high enough that you don't want to be listed as a co-author on PEP 517 anymore, then that's entirely reasonable (we can add a separate Acknowledgments section to recognise your significant input to the process without implying your endorsement of the final result), but as long as the accepted API ends up being supported in at least pip, flit, and enscons, it honestly doesn't really matter all that much in practice what the rest of us think of the design (we're here as design advisors, rather than being the ones that will necessarily need to cope with the bug reports arising from any interoperability challenges).
Clearly neither of us have convinced each other here. If you want to take over authorship of PEP 517, that's fine -- it's basically just acknowledging the current reality. But in that case I think you should step down as BDFL-delegate; even Guido doesn't accept his own PEPs.
This isn't my API design - it's one that Daniel suggested & Thomas accepted into the PEP. However, I like it as BDFL-Delegate, and think it resolves all the previous concerns, so now I'm attempting to arbitrate a dispute between the listed PEP authors (where Thomas likes the design and is willing to implement it for both pip & flit, while you still have concerns about it and definitely haven't indicated that you're willing for it to be accepted with your name attached to it).
Here's another possible way forward: you mold PEP 517 into what you think is best, I take my text from the beginning of this thread and base a new PEP off it, and then Donald gets stuck as BDFL-delegate and has to pick. (Sorry Donald.) What do you think? It would at least have the advantage that everyone else gets a chance to catch up and look things over -- I think at the moment it's basically only you and me who actually have strong opinions, and everyone else is barely following.
It isn't my design (I just approve of it and have been providing updates to help make the PEP self-consistent in the interests of being able to formally accept it), so I don't think this proposal makes sense. However, it may make sense for you and Thomas to take the discussion offline as PEP co-authors, and figure out a resolution that satisfies you both, as I can't reasonably accept a PEP where one of the listed co-authors has made it quite clear that they don't want it to be accepted in its current form.

Whether that's a matter of Thomas becoming sole nominal author (with your significant contributions being recognised via an Acknowledgements section), or your deciding that you're willing to concede the point and reserve the right to tell us all "I told you so!" in a couple of years' time, or something else entirely, I don't mind (although I'd hope for a resolution that doesn't involve changing the proposed API yet again), but I do want to be clear: as BDFL-Delegate, I am 100% fine with the current technical proposal in PEP 517, as I think it addresses all of the design requirements that have been raised, can be readily implemented in both frontends and backends, and aligns well with the conventions of full-fledged build management systems.

The clarity of the prose has definitely suffered from the long design process, but I think the real solution to that is going to be fleshing out the specifications section in PyPUG as the main source of reference information, rather than relying on the PEPs themselves for that (since that particular problem is far from being unique to PEP 517).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
[I don't have time to properly read & respond now, but I wanted to suggest something]

Would it be better if we said that build_directory must always be specified, i.e. passing None is invalid?

A) Nathaniel has quite rightly expressed concern over requiring two different code paths. While I think the extra work is less than his wording implies, it is an extra case for things to go wrong.
B) The current spec does another 'semantics not specified' for in-place builds. This should make us uncomfortable - we're asking backends to implement something without telling them what. When we did something similar for editable installs, we ended up pulling it out again.
C) We thought we were adding this to satisfy people who want incremental builds, but it now looks like we were conflating two different things, and this is not actually a good answer for those use cases.
D) The frontend can specify build_directory=source_dir/build to build in a subdirectory of the source tree.
E) When we do work out the need and the semantics for in-place builds, we can write another PEP adding an optional hook for that.

Thomas

On Mon, Jul 17, 2017, at 04:44 AM, Nick Coghlan wrote:
On 16 July 2017 at 18:24, Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Jul 15, 2017 at 11:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 16 July 2017 at 14:56, Nathaniel Smith <njs@pobox.com> wrote:
But... that is not what the in-place/out-of-place distinction means in normal usage, it's not the distinction that any of those build systems you were surveying implement, and it's not the distinction specified in the current PEP text.
If what we want is a distinction between "please give me a correct wheel" and "please give me a wheel but I don't care if it's broken", then wouldn't it make more sense to have a simple flag saying *that*?
No, because pip *also* wants the ability to request that the backend put the intermediate build artifacts in a particular place,
Say what? Where did they say this? I'm 99% sure this is just not true.
pip currently works by moving the input files out to a different directory with shutil.copytree.
For PEP 517, their default build strategy is going to be to take an sdist and unpack it.
Both of those build strategies are particular ways of achieving an out-of-tree build *without* relying directly on out-of-tree build support in the backend.
However, if build_sdist fails, they *can't* use their preferred build strategy, and need a way to ask the backend to do the best it reasonably can to emulate a build via sdist, even though an sdist build isn't technically feasible.
Originally, the proposed mechanism for that was the various incarnations of the "prepare_input_for_build_wheel" hook, which had the key downside of not aligning well with the way full-fledged build systems actually work.
By contrast, Daniel & Thomas's suggestion of a build_directory parameter to build wheel nicely models a front end saying "Well, I *would* have built an sdist and used that, but it didn't work, so instead I'm asking you to do the best you can" in a way that *also* aligns nicely with the way out-of-tree build support in full-fledged build systems actually works (i.e. by specifying target directories/build directories/variant directories appropriately)
*and* having that ability will likely prove beneficial given directory based caching schemes in build automation pipelines (with BitBucket Pipelines and OpenShift Image Streams being the two I'm personally familiar with, but it's a logical enough approach to speeding up build pipelines that I'm sure there are others).
Yeah, this is a really neat idea! I'm genuinely enthusiastic about it. Which... is why I think this is a great target for a future PEP, that can adequately address the complications that the current PEP is skimming over. As I argued in a previous email, I think these pipelines would actually *prefer* that out-of-place build be an optional feature, but it's basically impossible to even have the discussion about what they want properly as part of the core PEP 517 discussion.
Even with build_directory defined, backends remain free to ignore it and put their intermediate artifacts somewhere else. All the API design does is provide them with the *option* of using the frontend provided directory.
So that just becomes a quality of implementation issue that folks will work out iteratively with backend developers.
It just turns out that we can piggy back off that in-place/out-of-tree distinction to *also* indicate how much the frontend cares about consistency with sdist builds (which the PEP previously *didn't* say, but explicit text along those lines was added as part of https://github.com/python/peps/pull/310/files based on this latest discussion).
Okay, but then this is... bad. You're taking two unrelated distinctions (in-place/out-of-place and sloppy/precise) and smashing them together. In particular, one of the types of build that Donald has said that he considers "sloppy" and worries about avoiding is any kind of incremental build. So if we take Donald's concern and your new PEP text literally, then it rules out incremental out-of-place builds. But incremental out-of-place builds are exactly what you need for the build pipeline case that you're citing as a motivation for this feature.
Your PEP is trying to do too many things at once, and that means it's going to do them poorly.
Not really, because sloppy/precise isn't actually a distinction we really care about, it's one that arises from the fact that in-place builds with setuptools/distutils *are* sloppy about their inputs, and hence starting with a "dirty" source directory may corrupt the sdists and wheels produced from those directories. Full-fledged build systems that work backwards through their dependency graph to their desired inputs rather than relying on file globs and collecting entire directory trees from local disk shouldn't be anywhere near as vulnerable to the problem (although even for those the general recommendation is that robust CI and release processes should always start with a clean checkout to ensure that everything is properly version controlled).
This means that the key benefit that Daniel & Thomas's suggestion of the build_directory parameter provides is a clear indicator of *where design responsibility lies* for the behaviour of a given build.
That is, from pip's point of view, the "normal" build path will be "build_sdist -> unpack sdist -> in-place build_wheel". That's the sequence that pip itself is directly responsible for.
However, we know that build_sdist may sometimes fail in cases where build_wheel would have worked, so we *also* define a fallback that delegates *all* design responsibility for the build to the backend: out-of-tree build support.
The reason that solves the problem from an ecosystem level perspective is because it divides responsibility appropriately:
1. backend developers know the most about why their build_sdist implementation might fail (triggering an out-of-tree build request) 2. backend developers know the most about ways starting with a "dirty" source directory might corrupt their output 3. backend developers know the most about how to detect if they're starting with an unpacked sdist or not
The reason we *don't* define the delegation as being to an *in-place* build is because we *know* we're not happy with that for the setuptools/distutils status quo - if we were, this part of the design discussion would never have arisen, and we'd never have had either the build_directory parameter *or* the various incarnations of the "build directory preparation" hook.
And in what case would a frontend ever set this give_me_a_correct_wheel flag to False?
When the frontend either genuinely doesn't care (hopefully rare, but not inconceivable), or else when its building from an unpacked sdist and hence can be confident that the artifacts will be consistent with each other regardless of how the backend handles the situation (expected to be very common, since it's the path that will be followed for sdists published to PyPI, and when handed an arbitrary PEP 517 source tree to build, pip will likely try "build_sdist -> in-place build_wheel" first and only fall back to "out-of-tree build_wheel" if the initial build_sdist call fails).
Ah, the unpacked sdist case is a good point, I neglected that in my discussion of a possible setuptools build_wheel hook. But it's fine -- even if we say that the setuptools build_wheel hook has to produce an "sdist-consistent" wheel in all cases, then it can detect whether it's building from an unpacked sdist (e.g. by keying off the presence of PKG-INFO), and in that case it knows MANIFEST.in has already been taken into account and it can go straight to bdist_wheel.
And in fact, this is what we *want* it to key off of, *not* the in-place/out-of-place thing. Consider Debian: they want to do out-of-place builds of unpacked sdists. They're working with pristine unpacked sdists, so they don't want to setuptools to go pack up a new sdist and then unpack it again for every out-of-place build. (In fact, this is probably a less-tested path than building directly from the unpacked sdist.) So if the point of the out-of-place build feature is that it's supposed to tell setuptools that it needs to go via sdist... then in this case it will do the wrong thing. Out-of-place and sdist-consistency are orthogonal concepts.
That's a frontend design question, not an API design question - while I agree it would be logical for frontends to have a way for users to say "this directory is an unpacked sdist, you can trust it and ask the backend for an in-place build", that's neither here nor there when it comes to designing the interface between frontends and backends.
We don't want to get into the business of micromanaging how backends work (that's how we got into the current mess with distutils & setuptools), we just want to make it possible for frontend developers and backend developers to collaborate effectively over time.
So, uh... you're saying that you don't like this proposal, because it has too much micro-management:
- backends are required to produce sdist-consistent wheels
Yes, this qualifies as excessive micromanagement of backends, because we *know* backends can't reliably satisfy it, and we also know we don't actually need it (because this isn't a guarantee that setuptools/distutils provides, and that's the incumbent build system).
and instead prefer this proposal:
- backends must support one configuration where they place intermediate artifacts in one specified place, and are not required to produce sdist-consistent wheels - backends must also support another configuration where they place intermediate artifacts in another place that's specified in a different way, and in this case they are required to produce sdist-consistent wheels
Producing sdist-consistent wheels is always *desired*.
PEP 517 acknowledges that it isn't always *possible*, and hence places the following constraints on *front*ends in order to free up design space for backends:
- if a frontend wants to *ensure* consistency, it has to create the sdist first, and build from that - if build_sdist fails, then the frontend should fall back to an out-of-tree build using the build_directory parameter
For backends:
- for in-place builds, sdist-consistency is 100% the frontend's problem - for out-of-tree builds, sdist-consistency it the backend's problem, but mainly for cases where build_sdist would have failed
I mean... the second proposal encompasses the first proposal, and then adds extra requirements, and those extra requirements are about fiddly internal things like where intermediate artifacts should be cached... surely the first proposal is less micromanage-y?
No, as the first proposal not only overstates our requirements (as we're technically fine with backends producing inconsistent artifacts when handed a "dirty" source directory), it mainly looks less exacting because you haven't defined what you actually mean by "sdist-consistent wheels".
The current PEP instead breaks out some simpler *mechanical* recommendations that provide some substance to what "sdist-consistent wheels" actually means, and whether it's a concern the backend can assume the frontend has already taken care of.
By contrast, `please do an out-of-tree build using this directory` handles the same scenario (build_sdist failed, but build_wheel can still be made to work) in a more elegant fashion by allowing the front end to state *what* it wants (i.e. something that gets as close as is practical to the "build_sdist -> unpack sdist -> in-place build_wheel" path given the limitations of the current environment),
We totally agree the frontend should state "*what* it wants". This is exactly my problem with the PEP -- it doesn't do this! Instead, it has frontends specify something they don't care about (in-place/out-of-place) and then overloads that to mean this unrelated thing.
Sure, I can see that concern - it arises from the fact that the way a frontend tells a backend "I was trying to do 'build sdist -> unpack sdist -> build wheel', but the 'build sdist' step failed" is by passing in the directory where it would have unpacked the sdist as the out-of-tree build directory.
However, the apparent conflation arises from asking the question: "If we're going to support requesting out-of-tree builds in the backend API anyway, do we need *another* flag to say 'I wanted to build via sdist, but that didn't work'".
And I don't think we do - there isn't anything extra that a well-behaved backend should do for an out-of-tree build *just* because building an sdist failed, while there *are* things that a backend should do for an out-of-tree build that it wouldn't do for an in-place build.
Thus the distinction that actually models a meaningful real world build management concept is in-place/out-of-tree (as evidenced by the fact that it maps cleanly to most of the full-fledged build systems we/I surveyed), and that also turns out to be sufficient to handle our Python-specific problem of providing a fallback API for frontends to use when build_sdist fails.
My draft at the start of this thread was exactly designed by trying to find the intersection of (1) satisfying everyone's requirements, (2) restricting myself to operations whose semantics I could define in a generic, future-proof way.
It's a build system API - (2) is an utterly unattainable goal. The most we can hope for is an API that makes it relatively straightforward for end users, publishers, frontend developers and backend developers to figure out how to resolve cases where an end user's build fails (environmental issue, project build config issue, frontend bug, backend bug).
That means I'm going to *explicitly* ask that you accept that the PEP is going to be accepted, and it's going to be accepted with the API in its current form, even if you personally don't agree with our reasoning for all of the technical details. If your level of concern around the build_directory parameter specifically is high enough that you don't want to be listed as a co-author on PEP 517 anymore, then that's entirely reasonable (we can add a separate Acknowledgments section to recognise your significant input to the process without implying your endorsement of the final result), but as long as the accepted API ends up being supported in at least pip, flit, and enscons, it honestly doesn't really matter all that much in practice what the rest of us think of the design (we're here as design advisors, rather than being the ones that will necessarily need to cope with the bug reports arising from any interoperability challenges).
Clearly neither of us have convinced each other here. If you want to take over authorship of PEP 517, that's fine -- it's basically just acknowledging the current reality. But in that case I think you should step down as BDFL-delegate; even Guido doesn't accept his own PEPs.
This isn't my API design - it's one that Daniel suggested & Thomas accepted into the PEP.
However, I like it as BDFL-Delegate, and think it resolves all the previous concerns, so now I'm attempting to arbitrate a dispute between the listed PEP authors (where Thomas likes the design and is willing to implement it for both pip & flit, while you still have concerns about it and definitely haven't indicated that you're willing for it to be accepted with your name attached to it).
Here's another possible way forward: you mold PEP 517 into what you think is best, I take my text from the beginning of this thread and base a new PEP off it, and then Donald gets stuck as BDFL-delegate and has to pick. (Sorry Donald.) What do you think? It would at least have the advantage that everyone else gets a chance to catch up and look things over -- I think at the moment it's basically only you and me who actually have strong opinions, and everyone else is barely following.
It isn't my design (I just approve of it and have been providing updates to help make the PEP self-consistent in the interests of being able to formally accept it), so I don't think this proposal makes sense. However, it may make sense for you and Thomas to take the discussion offline as PEP co-authors, and figure out a resolution that satisfies you both, as I can't reasonably accept a PEP where one of the listed co-authors has made it quite clear that they don't want it to be accepted in its current form.
Whether that's a matter of Thomas becoming sole nominal author (with your significant contributions being recognised via an Acknowledgements section), or your deciding that you're willing to concede the point and reserve the right to tell us all "I told you so!" in a couple of years' time, or something else entirely, I don't mind (although I'd hope for a resolution that doesn't involve changing the proposed API yet again), but I do want to be clear: as BDFL-Delegate, I am 100% fine with the current technical proposal in PEP 517, as I think it addresses all of the design requirements that have been raised, can be readily implemented in both frontends and backends, and aligns well with the conventions of full-fledged build management systems.
The clarity of the prose has definitely suffered from the long design process, but I think the real solution to that is going to be fleshing out the specifications section in PyPUG as the main source of reference information, rather than relying on the PEPs themselves for that (since that particular problem is far from being unique to PEP 517).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
On 17 July 2017 at 15:41, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
[I don't have time to properly read & respond now, but I wanted to suggest something]
Would it be better if we said that build_directory must always be specified, i.e. passing None is invalid?
A) Nathaniel has quite rightly expressed concern over requiring two different code paths. While I think the extra work is less than his wording implies, it is an extra case for things to go wrong.
I don't think that makes sense, as we want a clear "you don't need to worry about sdist consistency" code path for the normal case where the frontend has already ensured that the source directory is an unpacked sdist (hence ensuring that the built wheel will be consistent with that sdist as long as the backend restricts itself to using the contents of that directory as inputs). This core requirement seems to have gotten lost for a lot of folks due to the recent focus on how best to allow frontends like pip to handle the "Here's an arbitrary directory, please build it" case, when we expect the typical end user cases to be ones where the frontend already knows it has a pristine unpacked sdist and simply needs the backend to turn that directory into an installable wheel file (either because it downloaded the sdist from PyPI, or because it just created it via build_sdist).

While it would be lovely to be able to go with the "let's just pretend that problem doesn't exist for now" approach for the arbitrary directory case, I think you're right that it's a common enough scenario that we can't ignore it entirely, even in the first iteration of the API specification - to enable and encourage adoption, we need to give frontends and backends a reasonable way to exchange information about what is going on, and preferably do so in a way that aligns nicely with design concepts that already exist as part of full-fledged build systems.

That last point is why I specifically *don't* want an idiosyncratic Python-specific flag as part of the API, and would prefer instead to address the need as a conventional expectation around the behaviour of out-of-tree builds based on exemplars like flit and enscons, similar to the way the semantics of operator overloading are handled in Python itself (i.e. technically you can do whatever you want with operator overloading, but if your overloading choices are inconsistent with other conventional uses of those symbols and hence cause problems for people and make your API confusing or unreliable, they may decide not to use your library because of it).

In-place/out-of-tree is an inherently meaningful concept for arbitrary build backends, and something that they can't automatically choose for themselves (since they need the target directory as a user provided configuration setting). By contrast, whether the starting directory for build_wheel is an unpacked sdist or not is something a backend should be able to work out for itself (e.g. by checking for PKG-INFO).

That said, I *might* accept an explicit "unpacked_sdist=True/False" flag if someone can provide a compelling justification for why it isn't redundant with "build_directory=None" and/or backends checking directly for PKG-INFO (or other files they add to the sdist) in the source directory, and an explanation of what backends should do for "unpacked_sdist=False, build_directory=None" that they wouldn't do for the "unpacked_sdist=True, build_directory=None" case. Unlike build_directory, frontends certainly couldn't pass such a flag straight on to a general purpose build backend (or not pass it, in the "build_directory=None" case). And if the proposal is instead for an unpacked_sdist flag as a *replacement* for the currently proposed build_directory parameter... well, no.
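The PKG-INFO check mentioned above can be sketched as follows (a minimal illustration; the helper name is hypothetical, and a real backend might also check for other files it adds to its sdists):

```python
import os

def looks_like_unpacked_sdist(source_dir):
    # Hypothetical helper: an unpacked sdist carries a PKG-INFO file at
    # its root (written when the sdist was created), whereas a plain VCS
    # checkout normally does not.  A backend can use this to work out
    # for itself whether it is starting from an unpacked sdist, without
    # needing an explicit unpacked_sdist flag in the hook API.
    return os.path.isfile(os.path.join(source_dir, "PKG-INFO"))
```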
Out-of-tree builds are useful for a lot more than just pip indicating whether or not it failed to build an sdist, and if the claim is "there's no evidence that out-of-tree builds are sufficiently useful to include them in the specification", then I'll point you to the laments of everyone trying to manage cross-compiling for multiple platform ABIs without them, both inside and outside the Python community, as well as the efforts to design sensible ecosystem independent intermediate artifact caching strategies for automated build pipelines. Those reasons aren't necessarily part of why the parameter was initially proposed, but they *are* a big part of why I was an instant convert to the idea once the suggestion was made (and why I've actively opposed the notion that the concept hasn't demonstrated its utility well enough to be incorporated).
B) The current spec does another 'semantics not specified' for in place builds. This should make us uncomfortable - we're asking backends to implement something without telling them what. When we did something similar for editable installs, we ended up pulling it out again.
Technically the "semantics unspecified" only applies when the source directory is something other than an unpacked sdist, since such a build may differ from building via the sdist in unspecified, backend dependent ways. (This isn't new, it's acknowledging that the handling of dirty source directories is an inherently backend dependent behaviour, and that the main problem that pip currently has is with the way that setuptools/distutils/setup.py in particular deal with it, *not* necessarily with the way future backends will handle it in general).

The semantics specification we add for the out-of-tree case is that it should be as close to "build sdist -> unpack sdist -> in-place build wheel" as practical in the given environment, while still leaving the specifics of how that is implemented, and how close it gets to actually matching the build-via-sdist behaviour, up to the backend.

For example, you're free to decide as flit's developer that since frontends are free to try the "build_sdist" path for themselves, you *won't* implicitly implement that for out-of-tree builds in flit, and will instead handle out-of-tree builds exactly the same way as you handle in-place builds. Whether or not that is sufficient in practice will then be an implementation discussion between you and your users (and *maybe* frontend developers if they end up fielding a lot of flit-specific bug reports), *not* an API specification discussion at the PyPA/distutils-sig level.

For enscons, Daniel may decide to handle out-of-tree builds by passing "build_directory" down to the Scons backend as the target "variant directory", and otherwise not do anything special. Again, an entirely acceptable way of handling the proposed API specification.
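The flit-style choice described above (handling an out-of-tree build exactly like an in-place one) might look something like this minimal sketch. The hook signature follows the draft under discussion (with an optional build_directory parameter), not necessarily the final PEP text, and the project name is made up:

```python
import os
import zipfile

def build_wheel(wheel_directory, config_settings=None, build_directory=None):
    # A pure-Python backend produces no intermediate build artifacts, so
    # it can legitimately accept build_directory and simply ignore it:
    # the out-of-tree build is handled exactly like the in-place build.
    name = "demo-0.1-py3-none-any.whl"
    with zipfile.ZipFile(os.path.join(wheel_directory, name), "w") as whl:
        whl.writestr("demo/__init__.py", "__version__ = '0.1'\n")
        whl.writestr("demo-0.1.dist-info/WHEEL", "Wheel-Version: 1.0\n")
    # Hooks return the basename of the wheel they created.
    return name
```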
I'm entirely open to the idea that this part of the PEP still needs clarification - what's there attempts to explain why I'm willing to accept it as BDFL-Delegate, and why the pip developers are willing to accept it as sufficient for their needs, but it's also important that other frontend developers are able to understand how to use it, and backend developers are able to understand the anticipated expectations around how they implement it.
C) We thought we were adding this to satisfy people who want incremental builds, but it now looks like we were conflating two different things, and this is not actually a good answer for those use cases.
It does give folks a backend independent way to request incremental builds, but the main use case in practice is as the baseline essential build step of going from an unpacked sdist directory to a wheel file.
D) The frontend can specify build_directory=source_dir/build to build in a subdirectory of the source tree.
While this is true, I don't think it needs to be called out explicitly in the PEP.
E) When we do work out the need and the semantics for in place builds, we can write another PEP adding an optional hook for that.
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist". The unspecified part is how that behaviour may change in the case where the source directory *isn't* an unpacked sdist and hence may contain other files.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 17, 2017 at 7:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 17 July 2017 at 15:41, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
E) When we do work out the need and the semantics for in place builds, we can write another PEP adding an optional hook for that.
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist".
Eh no, in-place has nothing to do with building a wheel. Several people have already pointed this out, you're mixing unrelated concepts and that's likely due to you using a definition for in-place/out-of-place that's nonstandard. It would be helpful if you either defined your terminology or (better) just dropped in-place/out-of-place and replaced it with for example "an empty tmpdir" vs. "a default directory which may contain build artifacts from previous builds" vs. <fill in however you like>.

Note that distutils behavior adds to the confusion here: `build_ext --inplace` is actually an out-of-place build where the final extension modules are copied back into the source tree (but not any intermediate artifacts).

Cheers, Ralf
On 17 July 2017 at 18:29, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jul 17, 2017 at 7:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist".
Eh no, in-place has nothing to do with building a wheel. Several people have already pointed this out, you're mixing unrelated concepts and that's likely due to you using a definition for in-place/out-of-place that's nonstandard.
I'm using in-place specifically to mean any given PEP 517 backend's equivalent of an unqualified "./setup.py build_wheel". For an autotools backend, that might ultimately mean something like "./configure && make python_wheel". It *doesn't* necessarily mean the equivalent of "./configure && make", because it wouldn't make sense to assume that a project's *default* build target for a full-fledged build system will be to make Python wheel files (fortunately, frontends won't need to care, since hiding those kinds of details will be up to backends).

I'm using out-of-tree to mean (as a baseline) what Daniel suggested: any given backend's equivalent of "./setup.py build -b <build_directory> build_wheel" (e.g. variant directories in Scons). One additional config setting is needed: the build/target directory.

This approach means that backends can implement build directory support without caring in the slightest about how Python frontends intend to use it, and without worrying overly much about the different kinds of source directory (VCS clone, unpacked VCS release tarball, unpacked sdist) except insofar as they'll need to be able to detect which of those they've been asked to build from if it matters to their build process (e.g. generating Cython files in the non-sdist cases).

The non-standard semantic convention being proposed specifically as part of PEP 517 is then solely that for frontends like pip, if build_sdist fails, they should fall back to just asking the backend for an out-of-tree build, rather than doing anything more exotic (or Python-specific). This *won't* give them the general assurance of sdist consistency that actually building via the sdist will, but that's fine - the assumption is that a frontend that cares about that assurance will only be using this interface if the sdist build already failed, so full assurance clearly isn't possible in the current environment.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 17, 2017 at 8:53 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Mon, Jul 17, 2017 at 7:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist".
On 17 July 2017 at 18:29, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Eh no, in-place has nothing to do with building a wheel. Several people have already pointed this out, you're mixing unrelated concepts and that's likely due to you using a definition for in-place/out-of-place that's nonstandard.
I'm using in-place specifically to mean any given PEP 517 backend's equivalent of an unqualified "./setup.py build_wheel".
Thanks. Very much nonstandard and possibly circular, but at least you've defined it:) I suggest you pick more precise wording, because this leaves little room for the more common use of in-place. Which you can define in several flavors as well, but all of them definitely have the property that if you put the source directory on sys.path you can import and use the package. build_wheel does not have that property.
For an autotools backend, that might ultimately mean something like "./configure && make python_wheel". It *doesn't* necessarily mean the equivalent of "./configure && make", because it wouldn't make sense to assume that a project's *default* build target for a full-fledged build system will be to make Python wheel files (fortunately, frontends won't need to care, since hiding those kinds of details will be up to backends).
I'm using out-of-tree to mean (as a baseline) what Daniel suggested: any given backend's equivalent of "./setup.py build -b <build_directory> build_wheel" (e.g. variant directories in Scons).
Leave off build_wheel (which is some metadata generation + zipping up the right files on top of building), then out-of-tree build is a clear concept.
One additional config setting needed: the build/target directory
This approach means that backends can implement build directory support without caring in the slightest about how Python frontends intend to use it, and without worrying overly much about the different kinds of source directory (VCS clone, unpacked VCS release tarball, unpacked sdist) except insofar as they'll need to be able to detect which of those they've been asked to build from if it matters to their build process (e.g. generating Cython files in the non-sdist cases).
This seems useful and clear.
The non-standard semantic convention being proposed specifically as part of PEP 517 is then solely that for frontends like pip, if build_sdist fails, they should fall back to just asking the backend for an out-of-tree build,
Say "asking the backend to build a wheel in a clean tmpdir" or something like that. Not clear who decides the path to the build dir, by the way - is it the frontend, the backend, or frontend-if-it-specifies-one-otherwise-up-to-backend?
rather than doing anything more exotic (or Python-specific).
Building a wheel is inherently Python-specific.
This *won't* give them the general assurance of sdist consistency that actually building via the sdist will, but that's fine - the assumption is that a frontend that cares about that assurance will only be using this interface if the sdist build already failed, so full assurance clearly isn't possible in the current environment.
That strategy makes sense, seems like there's consensus on it.

Cheers, Ralf
On 17 July 2017 at 20:00, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jul 17, 2017 at 8:53 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 17 July 2017 at 18:29, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jul 17, 2017 at 7:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist".
Eh no, in-place has nothing to do with building a wheel. Several people have already pointed this out, you're mixing unrelated concepts and that's likely due to you using a definition for in-place/out-of-place that's nonstandard.
I'm using in-place specifically to mean any given PEP 517 backend's equivalent of an unqualified "./setup.py build_wheel".
Thanks. Very much nonstandard and possibly circular, but at least you've defined it:) I suggest you pick more precise wording, because this leaves little room for the more common use of in-place. Which you can define in several flavors as well, but all of them definitely have the property that if you put the source directory on sys.path you can import and use the package. build_wheel does not have that property.
Ah, thanks for clarifying. That's using "in-place" to refer to the Python-specific notion of an editable install ('setup.py develop', 'pip install -e', etc). Not a usage I've personally encountered, but I'm also a former embedded systems developer that now works for an operating system company, so I'm not necessarily the most up to speed on common terminology in environments more specifically focused on Python itself, rather than the full C/C++(/Rust/Go)/Python stack :)

The in-place/out-of-tree sense currently used in the PEP (and my posts to the list about this point) is the common meaning for compiled languages, and hence the one common to most full-fledged build systems. However, it will definitely make sense to clarify that point, as it's quite reasonable for folks to read a phrase with a Python specific meaning in a PEP, even if key parts of that PEP are primarily about effectively interfacing with build systems originally designed to handle precompiled languages :)

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 17, 2017 at 10:15 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 17 July 2017 at 20:00, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jul 17, 2017 at 8:53 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 17 July 2017 at 18:29, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jul 17, 2017 at 7:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist".
Eh no, in-place has nothing to do with building a wheel. Several people have already pointed this out, you're mixing unrelated concepts and that's likely due to you using a definition for in-place/out-of-place that's nonstandard.
I'm using in-place specifically to mean any given PEP 517 backend's equivalent of an unqualified "./setup.py build_wheel".
Thanks. Very much nonstandard and possibly circular, but at least you've defined it:) I suggest you pick more precise wording, because this leaves little room for the more common use of in-place. Which you can define in several flavors as well, but all of them definitely have the property that if you put the source directory on sys.path you can import and use the package. build_wheel does not have that property.
Ah, thanks for clarifying. That's using "in-place" to refer to the Python-specific notion of an editable install ('setup.py develop', 'pip install -e', etc).
Not really Python-specific, here's two of the first results of a Google search: https://cmake.org/Wiki/CMake_FAQ#Out-of-source_build_trees https://stackoverflow.com/questions/4018869/what-is-the-in-place-out-of-plac... It's basically: build artifacts go right next to the source files. For Python it then follows that you can import from the source dir, but that's just a consequence and not part of the definition of in-place at all.
Not a usage I've personally encountered, but I'm also a former embedded systems developer that now works for an operating system company, so I'm not necessarily the most up to speed on common terminology in environments more specifically focused on Python itself, rather than the full C/C++(/Rust/Go)/Python stack :)
The in-place/out-of-tree sense currently used in the PEP (and my posts to the list about this point) is the common meaning for compiled languages, and hence the one common to most full-fledged build systems.
Well, you keep on saying "build_wheel". A wheel is a packaging artifact rather than a build artifact, and is Python-specific. So not common for compiled languages. My mental picture is:
1. build steps (in/out-place) produce .o, .so, etc. files
2. building a wheel is a two-step process: first there's a build step (see point 1), then a packaging step producing a .whl archive.
I suspect most people will see it like that. Hence it is super confusing to see you describing a *build* concept like in-place with reference to a *packaging* command like build_wheel.

Cheers, Ralf

However, it will definitely make sense to clarify that point, as it's quite reasonable for folks to read a phrase with a Python specific meaning in a PEP, even if key parts of that PEP are primarily about effectively interfacing with build systems originally designed to handle precompiled languages :)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 17 July 2017 at 18:53, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 17 July 2017 at 18:29, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Jul 17, 2017 at 7:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The minimal specification for in-place builds is "Whatever you would do to build a wheel file from an unpacked sdist".
Eh no, in-place has nothing to do with building a wheel. Several people have already pointed this out, you're mixing unrelated concepts and that's likely due to you using a definition for in-place/out-of-place that's nonstandard.
I'm using in-place specifically to mean any given PEP 517 backend's equivalent of an unqualified "./setup.py build_wheel".
Slight correction, since I left out `--dist-dir`, which is relevant to both in-place and out-of-tree builds: ./setup.py build_wheel --dist-dir <wheel_directory>
I'm using out-of-tree to mean (as a baseline) what Daniel suggested: any given backend's equivalent of "./setup.py build -b <build_directory> build_wheel" (e.g. variant directories in Scons).
And for out-of-tree builds: ./setup.py build -b <build_directory> build_wheel --dist-dir <wheel_directory>

And to elaborate a bit further on my perspective in the latest round of API discussions: as far as I can tell, Nathaniel is suggesting either that:

1. We shouldn't allow frontends to explicitly request out-of-tree builds (which I disagree with, since out-of-tree builds are useful for a whole host of reasons, hence why most build systems natively support them, even setuptools); or
2. Even with out-of-tree builds, we should allow frontends to *separately* indicate "I tried to build the sdist first, but that failed, so I'm resorting to building the wheel directly"

And I can completely see the point of folks saying "But you're conflating two different questions" when I declare the latter notion redundant given out-of-tree build support, since that's exactly what I'm doing:

1. Daniel raised the idea of a build directory parameter to request out-of-tree builds as a replacement for prepare_input_for_build_wheel
2. Thomas liked that idea and added it to the PEP
3. I/we realised that if the pip devs were willing to rely on out-of-tree builds as a fallback when build_sdist failed, then we could *avoid* adding a Python-specific sdist-consistency concept directly to the backend API definition (beyond some aspirational text in the definition of build_wheel) until we had clear evidence that we actually needed it
4. I personally switched from requesting any further API changes to assisting with clarifications aimed at addressing the concerns of folks trying to follow what just happened and get us to a point where the PEP could be accepted

The end result of all of that is that my position now is this:

- Out-of-tree builds are an inherently useful thing for the backend API to model, independently of how pip and other frontends specifically end up using them. Now that they've been added, I'll be highly reluctant to accept any version of PEP 517 that attempts to take them out again.
- I *think* we'll find that out-of-tree builds using a pip-managed intermediate directory are sufficient in practice to meet pip's needs when it comes to handling cases where build_sdist fails
- *If* I turn out to be wrong about that, *then* we can look at adding back the "prepare_input_for_build_wheel" hook (or something along those lines) as a fallback specifically for the case where a frontend would really prefer to be building via sdist, but sdist creation isn't possible for some reason, informed by specific knowledge of cases where falling back to out-of-tree builds proved to be inadequate

By doing things this way, we're not risking lumbering ourselves permanently with a Python-specific build flag or optional hook that may turn out to only be relevant to the way setuptools/distutils currently work. Instead, in the ideal case where we don't need to revise the backend API, we'll *only* have the concept of in-place/out-of-tree builds, which is a general one, not specific to any particular build system or any particular use case.

And that now seems to be the core point of contention: my understanding is that Nathaniel would prefer to add the ill-defined and Python-specific notion of sdist-consistency to the backend API *now*, and *postpone* adding the general concept of out-of-tree builds. I don't think that's a good idea, since our only well-defined notion of sdist consistency involves actually building the sdist first, and then building the wheel from that.

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Jul 17, 2017, at 6:01 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
And I can completely see the point of folks saying "But you're conflating two different questions" when I declare the latter notion redundant given out-of-tree build support, since that's exactly what I'm doing:
FWIW I consider the idea of a try-extra-hard flag (either explicit in the API or implicit in what kind of build you’ve selected) a bad idea. While I ultimately “lost” the fight to narrow down the “variants” of installs to VCS -> sdist -> wheel -> install, I think adding a new variant of VCS -> funky try-harder but not as hard as sdist -> wheel -> install is only going to make things more confusing for end users as there is yet another subtle variant in their installs.

I also don’t think it really even makes sense. If a backend had everything it needed to create a sdist, then it could have just successfully created the sdist and be done with it and we wouldn’t have called this hook. Since, as I understand it, the idea is to call this when sdists fail, then we’re assuming that for some reason the backend couldn’t create the sdist. Why do we assume that the backend is going to be capable of ensuring consistency with a sdist when it couldn’t create the sdist to begin with?

In addition, why do we want a YOLO option that differs from the “try hard” version? Unless trying hard is some sort of super slow process (which it shouldn’t be, unless you’re creating a sdist as part of it, but if you can create a sdist why did build_sdist fail?) then it seems like you should just try hard always.

I don’t care about in-place vs out-of-place builds. Add them, don’t add them, whatever; we can make it work either way. It seems like possibly a useful feature, although Nathaniel seems to have reservations that we haven’t fully thought through the implications of adding out-of-place building, which seems like a legitimate concern. However, I am against having a “try hard” vs “YOLO” flag, explicit or implicit.

— Donald Stufft
On 17 July 2017 at 11:41, Donald Stufft <donald@stufft.io> wrote:
FWIW I consider the idea of a try-extra-hard flag (either explicit in the API or implicit in what kind of build you’re selected) a bad idea. While I ultimately “lost” the fight to narrow down the “variants” of installs to VCS -> sdist -> wheel -> install, I think adding a new variant of VCS -> funky try-harder but not as hard as sdist -> wheel -> install is only going to make things more confusing for end users as there is yet another subtle variant in their installs.
I also don’t think it really even makes sense. If a backend had everything it needed to create a sdist, then it could have just successfully created the sdist and be done with it and we wouldn’t have called this hook. Since, aiui, the idea is to call this when sdists fail, then we’re assuming that for some reason the backend couldn’t create the sdist. Why do we assume that the backend is going to be capable of ensuring consistency with a sdist when it couldn’t create the sdist to begin with? In addition, why do we want a YOLO option that differs from the “try hard” version? Unless trying hard is some sort of super slow process (which it shouldn’t be, unless you’re creating a sdist as part of it, but if you can create a sdist why did build_sdist fail?) then it seems like you should just try hard always.
I don’t care about in-place vs out-of-place builds. Add them, don’t add them whatever we can make it work either way. It seems like possibly a useful feature although Nathanial seems to have reservations that we haven’t fully thought through the implications of adding out of place building which seems like a legitimate concern. However, I am against having a “try hard” vs “YOLO” flag, explicit or implicit.
If we have a consensus here that "build a sdist and build a wheel from it" is an acceptable/viable main route for pip to generate wheels (with "just ask the backend" as fallback) then I'm OK with not bothering with an "ask the backend to build a wheel out of tree" option. My recollection of the history was that there was some resistance in the past to pip going down the "build via sdist" route, but if that's now considered OK in this forum, then I'm fine with assuming that either I was mistaken or things have changed.

I'm still concerned that the "wheel doesn't match sdist" problem might come up, but if we're OK with pointing users at the backend for resolution of such issues, then I'd be happy with pip simply including a message "Unable to create sdist - asking backend to create wheel directly" as a warning to the user as to what's happened.

I'll also note that in both of the following cases, none of this matters:

1. Install from wheel (no backend involved).
2. Install from sdist (or build wheel from sdist) - pip just asks the backend to build in-place where we unpack the sdist.

So for 99.9% of normal use (install from a requirement, i.e. from something published on PyPI) this whole debate is irrelevant.

Paul
On Mon, Jul 17, 2017, at 01:07 PM, Paul Moore wrote:
If we have a consensus here that "build a sdist and build a wheel from it" is an acceptable/viable main route for pip to generate wheels (with "just ask the backend" as fallback) then I'm OK with not bothering with an "ask the backend to build a wheel out of tree" option. My recollection of the history was that there was some resistance in the past to pip going down the "build via sdist" route, but if that's now considered OK in this forum, then I'm fine with assuming that either I was mistaken or things have changed.
I think I was one of the people arguing against going via an sdist. The important point for me is that an sdist is not a requirement for installing from source - it's ok by me if it tries building an sdist first and then falls back to building a wheel directly. Since flit generates no intermediate artifacts, though, it's not an informative data point for where they should be placed. Thomas
On Mon, Jul 17, 2017 at 11:30 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Mon, Jul 17, 2017, at 01:07 PM, Paul Moore wrote:
If we have a consensus here that "build a sdist and build a wheel from it" is an acceptable/viable main route for pip to generate wheels (with "just ask the backend" as fallback) then I'm OK with not bothering with an "ask the backend to build a wheel out of tree" option. My recollection of the history was that there was some resistance in the past to pip going down the "build via sdist" route, but if that's now considered OK in this forum, then I'm fine with assuming that either I was mistaken or things have changed.
I think I was one of the people arguing against going via an sdist. The important point for me is that an sdist is not a requirement for installing from source - it's ok by me if it tries building an sdist first and then falls back to building a wheel directly.
Same here, I had a preference for not going via sdist but am OK with the current status of the PEP. Cheers, Ralf
On 17 July 2017 at 21:30, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Mon, Jul 17, 2017, at 01:07 PM, Paul Moore wrote:
If we have a consensus here that "build a sdist and build a wheel from it" is an acceptable/viable main route for pip to generate wheels (with "just ask the backend" as fallback) then I'm OK with not bothering with an "ask the backend to build a wheel out of tree" option. My recollection of the history was that there was some resistance in the past to pip going down the "build via sdist" route, but if that's now considered OK in this forum, then I'm fine with assuming that either I was mistaken or things have changed.
I think I was one of the people arguing against going via an sdist. The important point for me is that an sdist is not a requirement for installing from source - it's ok by me if it tries building an sdist first and then falls back to building a wheel directly.
Since flit generates no intermediate artifacts, though, it's not an informative data point for where they should be placed.
Exactly, and this was the key point that Daniel's out-of-tree build suggestion enabled: it *kept* the concept of an out-of-tree build that was previously being kind-of-sort-of enabled by the "prepare_input_for_build_wheel" hook, but changed it in such a way that pure Python backends like flit could effectively just ignore it if they weren't going to attempt to emulate a limited version of a via-sdist build (since they don't otherwise include any intermediate build artifacts), while adapter backends like enscons could just pass the config option down to the underlying build system.

At the same time, it allowed the decision on whether pip would fall back to out-of-tree builds or in-place ones when building the sdist failed to be deferred until PEP 517 support was actually being added to pip.

While I do think it's still worthwhile to include the aspirational guidance around backends trying to keep their out-of-tree builds as close to the "build sdist -> unpack sdist -> build wheel" approach as they can (as well as implementing the example backend that way), we can also attempt to make it more explicit that frontends can only *ensure* sdist-consistency by actually making an sdist first - while backends are encouraged to make the two paths as consistent as they can, they're not *required* to do so (which is the distinction that allows both in-place and out-of-tree wheel builds to work in cases where building the sdist will fail due to missing VCS metadata or tools).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 17 July 2017 at 13:04, Nick Coghlan <ncoghlan@gmail.com> wrote:
While I do think it's still worthwhile to include the aspirational guidance around backends trying to keep their out-of-tree builds as close to the "build sdist -> unpack sdist -> build wheel" approach as they can (as well as implementing the example backend that way), we can also attempt to make it more explicit that frontends can only *ensure* sdist-consistency by actually making an sdist first - while backends are encouraged to make the two paths as consistent as they can, they're not *required* to do so (which is the distinction that allows both in-place and out-of-tree wheel builds to work in cases where building the sdist will fail due to missing VCS metadata or tools).
And yet I'm coming round to Donald's view that if we can't build a sdist, then we should just switch straight to "build_wheel", and dump both the responsibility for integrity of the result *and* any user issues that might result onto the backend. So I don't know that in practice, pip will bother with the "out-of-tree" option (explicitly specifying a build_directory). I may be wrong on that, though - we won't know for sure until we try to implement support for the PEP. That's going to be an education issue, as we currently tend to get users reporting these types of problem as pip issues (and with setuptools also being a PyPA project, I don't think currently it's necessarily an obvious distinction when we describe the behaviour as a setuptools limitation rather than a pip one). But again, if the community here is happy with that, I'm not going to argue. Paul
On 17 July 2017 at 22:19, Paul Moore <p.f.moore@gmail.com> wrote:
On 17 July 2017 at 13:04, Nick Coghlan <ncoghlan@gmail.com> wrote:
While I do think it's still worthwhile to include the aspirational guidance around backends trying to keep their out-of-tree builds as close to the "build sdist -> unpack sdist -> build wheel" approach as they can (as well as implementing the example backend that way), we can also attempt to make it more explicit that frontends can only *ensure* sdist-consistency by actually making an sdist first - while backends are encouraged to make the two paths as consistent as they can, they're not *required* to do so (which is the distinction that allows both in-place and out-of-tree wheel builds to work in cases where building the sdist will fail due to missing VCS metadata or tools).
And yet I'm coming round to Donald's view that if we can't build a sdist, then we should just switch straight to "build_wheel", and dump both the responsibility for integrity of the result *and* any user issues that might result onto the backend. So I don't know that in practice, pip will bother with the "out-of-tree" option (explicitly specifying a build_directory). I may be wrong on that, though - we won't know for sure until we try to implement support for the PEP.
Right, and that's why I think it's worth keeping the aspirational wording for out-of-tree builds in the initially accepted version of the PEP. If we later decide it's unnecessary even as an aspirational statement, then we can drop it without any problems (since the worst case scenario is that a trailblazing backend that isn't flit or enscons gets to simplify their build_wheel implementation a bit by changing how they handle out-of-tree build requests).
That's going to be an education issue, as we currently tend to get users reporting these types of problem as pip issues (and with setuptools also being a PyPA project, I don't think currently it's necessarily an obvious distinction when we describe the behaviour as a setuptools limitation rather than a pip one). But again, if the community here is happy with that, I'm not going to argue.
I think that's inevitable for any build frontend, and PEP 517 should at least give you improved options for reporting *which* part of the overall build process couldn't be executed (and perhaps even lead to changes in the way pip builds setuptools/distutils based projects by default). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Here's my own summary.

pip can do build_sdist -> unpack sdist -> build wheel if it wants to, serving as some kind of linter if you happen to run 'pip install .' during development.

build_directory provides a way to ask for a clean build, the lack of which sometimes causes problems in bdist_wheel. It is one way to try harder to generate a correct wheel, not to try harder to generate a correct sdist.

An important class of "can't build an sdist" problems happens in an unpacked sdist, when the build system expects VCS metadata to generate the sdist manifest. Perhaps you are patching a dependency and don't need to do new source releases; in this case it would be really annoying if 'pip install .' refused to work. You can probably get a source distribution when you really need one, i.e. during ordinary development of your own package from a VCS.
On Mon, Jul 17, 2017 at 6:36 AM, Daniel Holth <dholth@gmail.com> wrote:
Here's my own summary.
pip can do build_sdist -> unpack sdist -> build wheel if it wants to, serving as some kind of linter if you happen to run 'pip install .' during development.
build_directory provides a way to ask for a clean build, the lack of which causes problems in bdist_wheel sometimes. It is one way to try harder to generate a correct wheel, not try harder to generate a correct sdist.
I would say -- build_directory provides a way to ask that the source tree be left clean for next time. This is unfortunately not quite the same. From pip's point of view, the advantages of build_sdist are twofold: (1) it validates the sdist building path, (2) it really does give a clean build. This is discussed more here: https://mail.python.org/pipermail/distutils-sig/2017-July/031020.html (I know you've read it, but putting the link in for future archive readers etc.) -n -- Nathaniel J. Smith -- https://vorpus.org
Yes, build_directory does not do the same thing as sdist -> unpack -> build. It would be more likely to be useful if you are not going through the sdist step for any reason on a tree that has been built before.

The assumption being that the back end might use src/*.py as its source, but build/** as the source for the wheel, so that any extra files that wind up in build/ are automatically included in the wheel. bdist_wheel does this. SCons does not work this way: it keeps track of all the files that should be in build/, adds only those files to the wheel, and ignores all extra files in build/.

Perhaps you have a system that uses 'hg manifest' to create the sdist and src/*.py to build, but you forgot to add src/newfile.py to version control. Then sdist -> unpack -> build would tell you something. Can we imagine a situation where a built file shows up in src/ and the wheel but not in the sdist?

Seems like you will need to show the backend's stdout on an error whatever the hook's return value is. Let's say you are not using 'pip install .' for development; then you get to validate your release artifacts some other way.

On Mon, Jul 17, 2017 at 9:54 AM Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Jul 17, 2017 at 6:36 AM, Daniel Holth <dholth@gmail.com> wrote:
Here's my own summary.
pip can do build_sdist -> unpack sdist -> build wheel if it wants to, serving as some kind of linter if you happen to run 'pip install .' during development.
build_directory provides a way to ask for a clean build, the lack of which causes problems in bdist_wheel sometimes. It is one way to try harder to generate a correct wheel, not try harder to generate a correct sdist.
I would say -- build_directory provides a way to ask that the source tree be left clean for next time. This is unfortunately not quite the same. From pip's point of view, the advantages of build_sdist are twofold: (1) it validates the sdist building path, (2) it really does give a clean build.
This is discussed more here: https://mail.python.org/pipermail/distutils-sig/2017-July/031020.html
(I know you've read it, but putting the link in for future archive readers etc.)
-n
-- Nathaniel J. Smith -- https://vorpus.org
On 18 July 2017 at 01:25, Daniel Holth <dholth@gmail.com> wrote:
Yes, build_directory does not do the same thing as sdist -> unpack -> build. It would be more likely to be useful if you are not going through the sdist step for any reason on a tree that has been built before.
Ralf's questions about describing "build_directory=None" as in-place and "build_directory=<some_dir>" as out-of-tree also made me realise something, which is that the real distinction is between whether it's the frontend or the backend that is specifying the build directory.

That is, "build_directory=<some_dir>" is the frontend saying "definitely do an out-of-tree build, and use this directory for the generated artifacts". Whether that's a clean directory or one containing artifacts from a previous build is thus up to the frontend.

By contrast, "build_directory=None" is telling the backend "use your default build directory, regardless of whether that's in-place or out-of-tree", rather than specifically saying "do an in-place build".

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
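That frontend-vs-backend split can be sketched in a few lines of Python. The `DummyBackend` class and the helper function below are purely illustrative (the real PEP hooks take additional arguments such as config_settings); the point is only who chooses the directory:

```python
import tempfile

class DummyBackend:
    """Stand-in backend: reports which kind of build it was asked for."""
    def build_wheel(self, wheel_directory, build_directory=None):
        return "out-of-tree" if build_directory else "backend-default"

def frontend_build_wheel(backend, wheel_directory, out_of_tree):
    if out_of_tree:
        # The frontend picks (and owns) the build directory; it may be a
        # clean temp dir or one holding artifacts cached from a prior build.
        build_dir = tempfile.mkdtemp(prefix="build-")
        return backend.build_wheel(wheel_directory, build_directory=build_dir)
    # build_directory=None means "use your default", which the backend is
    # free to interpret as either in-place or out-of-tree.
    return backend.build_wheel(wheel_directory, build_directory=None)
```

Either way, the backend never has to guess: a non-None build_directory is an explicit frontend request, while None delegates the choice entirely.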
One point of issue with the PEP as it currently stands: I would greatly prefer it if we had a sigil to differentiate "can't build the sdist, please fall back" from "an error has occurred trying to build the sdist". This would allow flit to return something that says it can't build a sdist in the current tree, but won't end up swallowing legit errors from trying to build a sdist. Sent from my iPhone
On Jul 17, 2017, at 7:30 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
The important point for me is that an sdist is not a requirement for installing from source - it's ok by me if it tries building an sdist first and then falls back to building a wheel directly.
On 17 July 2017 at 22:48, Donald Stufft <donald@stufft.io> wrote:
One point of issue with the PEP as it currently stands: I would greatly prefer it if we had a sigil to differentiate "can't build the sdist, please fall back" from "an error has occurred trying to build the sdist". This would allow flit to return something that says it can't build a sdist in the current tree, but won't end up swallowing legit errors from trying to build a sdist.
If Thomas is OK with it, I'd be fine with using "raise NotImplementedError" for that purpose. In flit's case, it would presumably be used to indicate that either the VCS metadata or the required VCS tools can't be found. As an added bonus: the frontend could display the exception message as part of executing the fallback. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 17, 2017 at 5:56 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 17 July 2017 at 22:48, Donald Stufft <donald@stufft.io> wrote:
One point of issue with the PEP as it currently stands: I would greatly prefer it if we had a sigil to differentiate "can't build the sdist, please fall back" from "an error has occurred trying to build the sdist". This would allow flit to return something that says it can't build a sdist in the current tree, but won't end up swallowing legit errors from trying to build a sdist.
If Thomas is OK with it, I'd be fine with using "raise NotImplementedError" for that purpose.
In flit's case, it would presumably be used to indicate that either the VCS metadata or or the required VCS tools can't be found.
As an added bonus: the frontend could display the exception message as part of executing the fallback.
I can live with this, but I wrote up a rationale for why it's somewhat worse than the alternative: https://mail.python.org/pipermail/distutils-sig/2017-July/030901.html So my request is that if you're going to insist on this can you provide some similar rationale for why? -n -- Nathaniel J. Smith -- https://vorpus.org
On 17 July 2017 at 23:06, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Jul 17, 2017 at 5:56 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
As an added bonus: the frontend could display the exception message as part of executing the fallback.
I can live with this, but I wrote up a rationale for why it's somewhat worse than the alternative: https://mail.python.org/pipermail/distutils-sig/2017-July/030901.html
So my request is that if you're going to insist on this can you provide some similar rationale for why?
Sure: because of the way magic return values fail if the *frontend* doesn't check for them. That is, if we make the protocol "return NotImplemented", then the likely symptom of a missing check in a frontend is going to be something like this (on 3.6+):
    >>> open(NotImplemented)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: expected str, bytes or os.PathLike object, not NotImplementedType
Or, on Python 2.7, the even more cryptic:
    >>> open(NotImplemented)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: coercing to Unicode: need string or buffer, NotImplementedType found
And on earlier versions of Python 3.x:
    >>> open(NotImplemented)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: invalid file: NotImplemented
And any associated exception traceback won't point to the offending unchecked call to backend.build_sdist, it will point to the first attempted *use* of that return value as a filename.

By contrast, if we make the error signal "raise NotImplementedError", then a missing check in the frontend will not only reliably give a traceback that points to the failed build_sdist call (since NotImplementedError will be treated the same as any other exception by default), but it will also include any specific message that the backend developers chose to provide.

If backend implementors want to ensure that it's categorically impossible for NotImplementedError to escape unexpectedly, they can write their build_sdist like this:

    def build_sdist(sdist_directory, config_settings=None):
        try:
            problem = check_sdist_preconditions(sdist_directory, config_settings)
        except NotImplementedError as exc:
            raise RuntimeError("Unexpected NotImplementedError when checking sdist preconditions") from exc
        if problem:
            raise NotImplementedError(problem)
        try:
            return _make_sdist(sdist_directory, config_settings)
        except NotImplementedError as exc:
            raise RuntimeError("Unexpected NotImplementedError when making sdist") from exc

However, Python APIs letting NotImplementedError escape is rare enough that even I'd consider writing a build_sdist implementation that way as being overly paranoid :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 17, 2017, at 02:56 PM, Nick Coghlan wrote:
If Thomas is OK with it, I'd be fine with using "raise NotImplementedError" for that purpose.
I will implement this if it's what we decide, but I agree with Nathaniel that a sentinel value is probably more robust, because return values can't automatically bubble up from internal code like errors do.
Fwiw I also agree return values are more robust, though the fact that an exception has a message is also a nice property, since it allows backends to provide a reason they don't support sdist in this tree (which some tools may want to surface to users, particularly if they *need* a sdist, like a hypothetical 'twine sdist' command). That could be implemented by returning a tuple or something, though. Sent from my iPhone
On Jul 17, 2017, at 9:12 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Mon, Jul 17, 2017, at 02:56 PM, Nick Coghlan wrote: If Thomas is OK with it, I'd be fine with using "raise NotImplementedError" for that purpose.
I will implement this if it's what we decide, but I agree with Nathaniel that a sentinel value is probably more robust, because return values can't automatically bubble up from internal code like errors do.
On Mon, Jul 17, 2017 at 6:25 AM, Donald Stufft <donald@stufft.io> wrote:
Fwiw I also agree return values are more robust, though the fact that an exception has a message is also a nice property, since it allows backends to provide a reason they don't support sdist in this tree (which some tools may want to surface to users, particularly if they *need* a sdist, like a hypothetical 'twine sdist' command). That could be implemented by returning a tuple or something, though.
What's your opinion on "show the backend's stderr output" as an error message reporting system? It's admittedly a bit duct-tape-and-baling-wire, but tools like 'twine sdist' need to capture and display stderr regardless, and either way it's just unstructured text we're dumping at the user, so I'm having trouble pointing to any specific value provided by having a second error channel. -n -- Nathaniel J. Smith -- https://vorpus.org
On Jul 15, 2017, at 11:50 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
The exact norms around what's acceptable behaviour for out-of-tree wheel builds (and just how hard backends should try to match the build_sdist -> in-place build_wheel path in that case) is then something that will evolve over time, and I'm OK with that.
My expectation is that build backends are going to put the same amount of effort into trying to match the “via sdist” case in both the out-of-place and the in-place builds. I may be wrong, but that’s my general expectation, because I don’t think it really makes sense for a build backend to go through a whole lot of extra effort in the out-of-place build case and then just not do that in the in-place build case. So I think that option is largely satisfying the “don’t crap up the current directory” desire and a desire to put the build artifacts in a certain location for subsequent caching. I don’t think it’s going to be generally useful for trying to match a sdist.

I don’t however think this is a bad thing; honestly, trying to do something different in matching in-place vs out-of-place builds in terms of what files get installed sounds like a recipe for adding *another* variant of the way something gets installed, which makes the possible problem worse, not better. I think that if you want a guarantee of parity with the via-sdist case, then you have to go via sdist or you’re just introducing another case, but we should document that build tools should generally strive to be as consistent as they can in both the via-sdist and the direct-to-wheel case.

— Donald Stufft
On 16 July 2017 at 04:33, Donald Stufft <donald@stufft.io> wrote:
[1] One note, I noticed there’s still instances of prepare_wheel_metadata in the text.
Good catch: https://github.com/python/peps/pull/311/files aims to deal with that through a combination of updating them to a new name, and through change the API evolution example to focus on build_sdist rather than prepare_metadata_for_build_wheel. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, Jul 15, 2017, at 08:33 PM, Donald Stufft wrote:

I wonder if maybe it would be more useful to simply recommend that instead of shelling out to random vcs binaries that these projects depend on (or bundle) libraries to directly interact with a repository. For instance, if your project supports git, then you can use dulwich or pygit2 and then the invariant of “building inside of a docker container without `git` installed” still remains functional.

I did consider this kind of approach. It might be feasible for git using dulwich (pygit2 expects libgit2 on your system, so you can't just require it as a Python package). But it's ironically not workable with mercurial, even though it's pure Python, because hg uses Python 2, while flit requires Python 3. And I don't see this working reliably for svn, or bzr, or other less common VCSs. So at least for flit, I think we will continue to rely on external, non-pip installable dependencies for this. This isn't a problem so long as building an sdist isn't necessary to get a project installed. Thomas
I wonder if maybe it would be more useful to simply recommend that instead of shelling out to random vcs binaries that these projects depend on (or bundle) libraries to directly interact with a repository. For instance, if your project supports git, then you can use dulwich or pygit2 and then the invariant of “building inside of a docker container without `git` installed” still remains functional. I did consider this kind of approach. It might be feasible for git using dulwich (pygit2 expects libgit2 on your system, so you can't just require it as a Python package). But it's ironically not workable with mercurial, even though it's pure Python, because hg uses Python 2, while flit requires Python 3. And I don't see this working reliably for svn, or bzr, or other less common VCSs. So at least for flit, I think we will continue to rely on external, non-pip installable dependencies for this. This isn't a problem so long as building an sdist isn't necessary to get a project installed. Thomas
On 7 July 2017 at 07:54, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On Thu, Jul 6, 2017, at 10:40 PM, Daniel Holth wrote:
It might be more natural to pass a build directory for intermediate build artefacts along with the wheel output directory to the build wheel hook. This would remove pip from an awkward position of managing a copy step in the middle of a build and would be more like out of tree builds in other build systems. For example in automake you do out of tree builds by making a new build directory and running the configure script from that directory instead of the source directory. With a fresh directory old builds don't get in the way.
I would also be happy with this. Though if you're trusting the backend to do a tidy build, do you need to pass in a directory for intermediates at all? The backend could just create a temporary directory itself.
I rather like this idea, as it has the potential to interact nicely with the directory caching features that are starting to be baked in to some CI and build pipelines (e.g. OpenShift s2i image builds, BitBucket Pipelines). For those kinds of caches to work properly, the build frontend needs to be able to manage where intermediate artifacts end up, rather than having it be an arbitrary directory chosen by the backend. (Note: I'm not suggesting that *pip* would do this - just that given this feature in the backend API definition, it would make it easier for hypothetical future frontends to play nice with directory based caching models for intermediate artifacts.)

As such, specifying a non-None "build_directory" in the call to `build_wheel` would serve two purposes:

1. Inform the backend that it *shouldn't* attempt to mutate the current directory
2. Let the frontend control where in the filesystem the backend puts its intermediate artifacts

If the frontend *doesn't* specify a separate build directory, we'd recommend that backends treat that as a request for an in-place build, but backends would technically still be free to do an out-of-place build if they preferred to do so.
I think Paul & Donald have been pretty adamantly against trusting backends to build tidily, though. And this certainly doesn't do anything like Donald wants to ensure that sdists don't omit key files.
I think with the out-of-tree build directory being passed as a parameter to build_wheel it would be possible to make the case for the logic in pip being:

- try build_sdist first
- if that succeeds, unpack the sdist, and do an in-place build from that directory
- if it fails, ask the backend to do an out-of-tree build directly

The reason I think that would be viable is because it would lead to the following outcome:

- on a publisher's machine, we'd expect "build_sdist" to work, so *publishers* will normally be using the "via sdist" path, and hence they'll have a natural incentive to make the "source tree -> sdist -> wheel" path reliable
- on an end user's machine, we'd expect that even if they were missing the pieces needed to build the sdist, at least the out-of-tree wheel build should work

As a primarily publisher focused tool, flit could also keep its own native "flit install" command that skipped calling the build_sdist hook and instead just checked directly for files that exist but would be omitted from the sdist.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
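The pip logic sketched above can be expressed roughly as follows. Everything here is illustrative, not the PEP's final API: the dummy backends stand in for real ones, the hook signatures are simplified, and a real frontend would also unpack the sdist and run build_wheel in-place from the unpacked tree (elided below):

```python
import tempfile

class SdistCapableBackend:
    """Dummy backend (illustrative only) whose build_sdist succeeds."""
    def build_sdist(self, sdist_directory):
        return "pkg-1.0.tar.gz"

    def build_wheel(self, wheel_directory, build_directory=None):
        return "pkg-1.0-py3-none-any.whl"

class NoSdistBackend(SdistCapableBackend):
    """Dummy backend that can't build an sdist (e.g. no VCS metadata)."""
    def build_sdist(self, sdist_directory):
        raise NotImplementedError("no VCS metadata found")

def build_wheel_for_install(backend, wheel_directory):
    """Try the via-sdist path first; fall back to a direct out-of-tree
    wheel build if the backend can't produce an sdist."""
    try:
        backend.build_sdist(tempfile.mkdtemp(prefix="sdist-"))
    except NotImplementedError as exc:
        # Warn the user, then ask the backend for a wheel directly
        print("Unable to create sdist (%s) - building wheel directly" % exc)
        wheel = backend.build_wheel(
            wheel_directory, build_directory=tempfile.mkdtemp(prefix="build-"))
        return "direct", wheel
    # A real frontend would unpack the sdist here and call build_wheel
    # in-place from the unpacked tree; elided in this sketch.
    wheel = backend.build_wheel(wheel_directory, build_directory=None)
    return "via-sdist", wheel
```

On a publisher's machine the first branch runs and exercises the sdist path; on an end user's machine missing VCS tooling drops through to the direct build.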
On Jul 5, 2017, at 12:08 PM, Paul Moore <p.f.moore@gmail.com> wrote:
I have to say I still have deep reservations about flit's approach of assuming/requiring that you're using VCS (git) to maintain your project. I know that in practical terms most people will be, but it still seems like a strong assumption to make. One of the consequences is that flit doesn't handle scenarios like "I unpacked a sdist" or "I downloaded the project archive from github and unpacked that" well. And the result of *that* is that we're putting in mechanisms in the PEP to manage that approach.
I think it’s possible for us to generically handle the “I unpacked a sdist” case in pip by just using the current shutil.copytree approach. The bad cases for that (large .git repositories, .tox, etc.) generally only happen in actual development environments and not inside of a sdist itself. The only thing we would need is some mechanism for a front end to determine if something is an unpacked sdist or not.

Of course the backend could also implement it a similar way, by just tarring up the entire directory if it detects it’s in an unpacked sdist (which means it would need to add a mechanism to detect if it’s in an unpacked sdist, rather than us needing to add one to the PEP itself). I don’t have a strong preference for how we do that, and we could even do both things TBH, and reuse Nathaniel’s idea of using the NotImplemented singleton and make the logic (from pip’s POV):

* From a VCS/random directory that ISN’T an unpacked sdist, call build_sdist, and if that fails or says it’s not implemented, then fail the build and surface an error to the user to indicate why.
* From an unpacked sdist (as determined by some hypothetical mechanism we would add to the PEP), call build_sdist, and if that fails surface an error to the user to indicate why, but if it returns NotImplemented then fall back to just using shutil.copytree.

That would allow a tool like flit to completely ignore the unpacked sdist case (other than to return NotImplemented) and have things still work fine, BUT it also allows another tool to do something extra in that case (such as verifying all of the file hashes against a record of known hashes and munging the version if any differ to indicate the version has changed).
That doesn’t solve the “I downloaded an archive from GitHub” or “I mounted my VCS checkout into a docker container without my VCS installed” cases, but those can only be solved by the backend tool itself, and IMO it’s perfectly acceptable for the backend to fail with an appropriate error message if it needs something that the current directory or environment doesn’t provide. — Donald Stufft
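Donald's two-branch logic can be sketched like this. This is a hypothetical illustration: `is_unpacked_sdist` stands in for the detection mechanism the PEP would need to define, the backend class is a dummy, and the signalling follows Nathaniel's NotImplemented-singleton suggestion:

```python
import os
import shutil
import tempfile

class OptOutBackend:
    """Dummy backend that declines sdist creation by returning the
    NotImplemented singleton (Nathaniel's suggested signalling)."""
    def build_sdist(self, sdist_directory):
        return NotImplemented

def get_sdist_tree(backend, source_dir, is_unpacked_sdist):
    """Produce a source tree with sdist contents, per the two branches
    described above. `is_unpacked_sdist` stands in for the hypothetical
    detection mechanism the PEP would need to add."""
    result = backend.build_sdist(tempfile.mkdtemp(prefix="sdist-"))
    if result is not NotImplemented:
        return result  # the backend produced a real sdist
    if is_unpacked_sdist:
        # Unpacked sdists carry no .git/.tox etc., so a plain copy is safe
        dest = os.path.join(tempfile.mkdtemp(prefix="copy-"), "tree")
        shutil.copytree(source_dir, dest)
        return dest
    # From a VCS checkout or random directory, failure is fatal
    raise RuntimeError("backend cannot build an sdist from this directory")
```

A backend like flit could thus return NotImplemented unconditionally for the unpacked-sdist case and still have `pip install .` work, while a more ambitious backend could do its own verification first.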
On 2017-07-05 12:40:08 -0400 (-0400), Donald Stufft wrote: [...]
That doesn’t solve the “I downloaded a archive from GitHub” or “I mounted my VCS checkout into a docker container without my VCS installed”, but those can only be solved by the backend tool itself and IMO it’s perfectly acceptable for the backend to fail with an appropriate error message if it needs something that the current directory or environment doesn’t provide.
That's exactly the approach PBR has taken since the beginning, and it's worked just fine. Users do still wrongly assume from time to time that they should be able to treat a "GitHub tarball" or similar contextless pile of files like an sdist even if it hasn't been correctly passed through the sdist build process first, but a combination of sane fallback behaviors, environment variable overrides and clear error messages keeps the invalid bug reports to a manageable minimum. -- Jeremy Stanley
I already responded to several of the overall points elsewhere in the thread, but a few specific points: On Mon, Jul 3, 2017 at 7:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I want prepare_wheel_metadata there as a straightforward way for backends to expose the equivalent of "setup.py egg_info". Losing that would be a significant regression relative to the status quo, since pip relies on it during the dependency resolution process
AFAIK this is a new argument that hasn't been made previously in any of the discussions here. Is it a real problem that we've all missed? This is important, because if pip really needs prepare_wheel_metadata right now, then that's an argument that it needs to be a *mandatory* hook, which is not the case in any version of PEP 517.
To tighten that requirement up even further: if the backend's capabilities can't be accurately determined using "inspect.getattr_static", then the backend is not compliant with the spec. The build frontend/backend API is not a space where we want people to try to be clever - we want them to be completely dull and boring and use the most mundane code they can possibly come up with.
This does sound like a nice idea, but I mean, look at what people do to distutils... I don't think there's much value in putting in Strongly Worded Prohibitions that people will just ignore.

Also I don't think getattr_static is what you mean -- e.g. flit might well write something like:

    # flit/backend.py
    if os.path.exists(".git"):
        def build_sdist(config_settings):
            ...

and that's totally getattr_static compliant.
- Add a TODO to decide how to handle backends that don't want to have multiple hooks called from the same process, including some discussion of the options.
I don't think that's a TODO: I'm happy with the option of restricting frontends to "one hook invocation per subprocess call".
It only becomes an open question in this revised draft by virtue of making the get_requires_* hooks mandatory, and I have a different resolution for that: keep those hooks optional, so that only backends that genuinely support dynamic build time dependencies will define them (others should either just get users to list any additional static build dependencies in pyproject.toml, or else list any always-needed static dependencies in the backend's own install_requires).
If you read the text, I pretty much came to the same conclusion :-). I wanted to flag it as TODO though b/c AFAICT there hasn't been any discussion of the trade-offs on the list, so the text I wrote is raising new topics.

That said, if you have prepare_build_metadata and prepare_build_files, then those *also* raise the issue of whether you really want to have to spawn a new process. They ~double the number of process spawns, which is significant overhead on Windows.

I suspect the right answer in any case is to default to running each hook in a new process, and either now or later add a flag the backend can provide to say that it's happy to run multiple hooks in the same process. (Adding this later is totally backwards compatible if the default is separate processes.)

-n -- Nathaniel J. Smith -- https://vorpus.org
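[Editorial note: the "one hook invocation per subprocess" default discussed above can be sketched roughly as follows. The helper name and the wire format are illustrative only -- the actual frontend/backend protocol is whatever the PEP and tools like pip end up defining.]

```python
import json
import subprocess
import sys

# Stand-in backend code executed in the child process. A real frontend
# would import the backend named in pyproject.toml instead of inlining it.
RUNNER = """
import json, sys
hook = sys.argv[1]

def build_sdist(config_settings=None):  # stand-in backend hook
    return "pkg-1.0.tar.gz"

print(json.dumps({"return_value": globals()[hook](json.loads(sys.argv[2]))}))
"""

def call_hook_in_subprocess(hook_name, config_settings=None):
    """Spawn a fresh interpreter for each hook call, so a backend that
    keeps global state between hooks can't be confused by process reuse."""
    out = subprocess.run(
        [sys.executable, "-c", RUNNER, hook_name, json.dumps(config_settings)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)["return_value"]

print(call_hook_in_subprocess("build_sdist"))  # pkg-1.0.tar.gz
```

This also shows why the per-hook process spawn is the expensive part: each call pays full interpreter startup, which is the Windows overhead concern raised above.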
On 5 July 2017 at 10:45, Nathaniel Smith <njs@pobox.com> wrote:
I already responded to several of the overall points elsewhere in the thread, but a few specific points:
On Mon, Jul 3, 2017 at 7:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I want prepare_wheel_metadata there as a straightforward way for backends to expose the equivalent of "setup.py egg_info". Losing that would be a significant regression relative to the status quo, since pip relies on it during the dependency resolution process
AFAIK this is a new argument that hasn't been made previously in any of the discussions here. Is it a real problem that we've all missed? This is important, because if pip really needs prepare_wheel_metadata right now, then that's an argument that it needs to be a *mandatory* hook, which is not the case in any version of PEP 517.
The hook is optional in order to better accommodate backends like flit, which don't support binary extensions, so wheel creation is inherently fast. By contrast, a backend like setuptools or enscons definitely *should* implement this hook, since actually building the wheel may end up compiling arbitrary amounts of C/C++/Go/Rust/FORTRAN/etc, and hence take ages.
To tighten that requirement up even further: if the backend's capabilities can't be accurately determined using "inspect.getattr_static", then the backend is not compliant with the spec. The build frontend/backend API is not a space where we want people to try to be clever - we want them to be completely dull and boring and use the most mundane code they can possibly come up with.
This does sound like a nice idea, but I mean, look at what people do to distutils... I don't think there's much value in putting in Strongly Worded Prohibitions that people will just ignore.
Also I don't think getattr_static is what you mean -- e.g. flit might well write something like:
    # flit/backend.py
    if os.path.exists(".git"):
        def build_sdist(config_settings):
            ...
and that's totally getattr_static compliant.
I'd technically be OK with backends taking that approach - the minimal design guideline is that the exposed backend API should be consistent for a given working directory and execution platform, and import time games are able to comply with that fairly readily. Of course, the *easiest* way to respect the consistency principle is to just not play any conditional availability games at all, and instead put any relevant environmental checks inside the "get_requires_for_build_*" call.
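[Editorial note: a minimal sketch of the capability check Nick describes. The module names and the `backend_supports` helper are invented for illustration; only `inspect.getattr_static` itself is real stdlib API.]

```python
import inspect
import types

# Two hypothetical backend modules, built in-memory for the example.
# A real backend is an importable module named in pyproject.toml.
full_backend = types.ModuleType("backend_with_sdist")

def build_sdist(sdist_directory, config_settings=None):
    ...  # a real backend would write an sdist into sdist_directory

full_backend.build_sdist = build_sdist
wheel_only_backend = types.ModuleType("backend_without_sdist")

def backend_supports(backend, hook_name):
    """Probe a backend's advertised capability with inspect.getattr_static,
    which inspects the module without triggering __getattr__-style dynamic
    lookup -- the 'dull and boring' introspection the thread asks for."""
    try:
        inspect.getattr_static(backend, hook_name)
        return True
    except AttributeError:
        return False

print(backend_supports(full_backend, "build_sdist"))        # True
print(backend_supports(wheel_only_backend, "build_sdist"))  # False
```

Note that flit's conditional `if os.path.exists(".git")` trick is compatible with this: the hook either exists or doesn't at import time, and getattr_static sees whichever it is.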
- Add a TODO to decide how to handle backends that don't want to have multiple hooks called from the same process, including some discussion of the options.
[snip]
If you read the text, I pretty much came to the same conclusion :-). I wanted to flag it as TODO though b/c AFAICT there hasn't been any discussion of the trade-offs on the list, so the text I wrote is raising new topics.
That said, if you have prepare_build_metadata and prepare_build_files, then those *also* raise the issue of whether you really want to have to spawn a new process. They ~double the number of process spawns, which is significant overhead on Windows.
I suspect the right answer in any case is to default to running each hook in a new process, and either now or later add a flag the backend can provide to say that it's happy to run multiple hooks in the same process. (Adding this later is totally backwards compatible if the default is separate processes.)
Aye, that reasoning sounds solid to me.

I'm not going to make the revised PR today after all (since I want to give folks a chance to read and respond to my latest posts to this thread first), so how do you want to proceed with the next step? I see two main options:

- you work up your changes into a PR against PEP 517, submit that to the PEPs repo, and I merge it after Thomas signs off on the updates
- same basic idea, but I create the PR based on your current draft

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 1 July 2017 at 22:53, Nathaniel Smith <njs@pobox.com> wrote:
Hi all, .... If either hook is missing, or returns the built-in constant ``NotImplemented``. (Note that this is the object ``NotImplemented``, *not* the string ``"NotImplemented"``),
Thank you for the clarification.

I am unclear why you *return* that rather than raising NotImplementedError? NotImplementedError permits embedding details about the cause of the failure, whereas the singleton does not.

It seems to me cleaner - thinking in a type sense - to raise than to return a value from a different domain.

-Rob
On Thu, Jul 6, 2017 at 3:09 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 1 July 2017 at 22:53, Nathaniel Smith <njs@pobox.com> wrote:
Hi all, .... If either hook is missing, or returns the built-in constant ``NotImplemented``. (Note that this is the object ``NotImplemented``, *not* the string ``"NotImplemented"``),
thank you for the clarification.
I am unclear why you *return* that rather than raising NotImplementedError ? NotImplementedError permits embedding details about the cause of the failure, whereas the singleton does not.
It seems to me cleaner - thinking in a type sense - to raise than to return a value from a different domain.
Basically the options I thought of are:

- Create a dedicated exception type just for this, like: class PEP517OperationNotSupported(Exception). But... for technical reasons, there's no obvious way to define this class in such a way that the frontend and backend can both see it. There are some non-obvious approaches that could probably be made to work, but they're ugly and complicated and then we'd have to argue about them, and I don't want to do that.

- Re-use one of the built-in exception types. But then, well, it's an existing exception type, which means that it already has some other use outside of our interface. And that means that if a backend happens to have this exception raised internally for some other reason, and doesn't catch the error, then it might bubble out to the frontend/backend interface and be misinterpreted.

- Return a sentinel value. This avoids all the problems of the above solutions. There's also a solid precedent, since this is exactly how most __dunder__ methods work. For example, __eq__'s return type is "bool or NotImplemented", and NotImplemented means "do some kind of fallback or raise an error or whatever is contextually appropriate". int.__add__'s return type is "int or NotImplemented" and likewise.

And it's true that exceptions allow an extra error message payload, but this is just an unstructured string and the only thing we can do with it is print it. So the backend can just as well print the message and then return NotImplemented.

So I figured the last option was the best. Though it's not a huge distinction either way, given that NotImplementedError is sort of a weird holdover from before ABCs were added and doesn't have real usage.

-n -- Nathaniel J. Smith -- https://vorpus.org
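[Editorial note: the sentinel convention can be sketched as below. The hook body and the frontend helper are illustrative, not the PEP's normative wording; the point is the identity check against the NotImplemented object, mirroring how __eq__ and int.__add__ signal "not handled here".]

```python
def build_sdist(sdist_directory, config_settings=None):
    """Stand-in backend hook: declares the operation unsupported."""
    # e.g. no VCS metadata available, so an sdist can't be produced.
    # Any diagnostics can simply be printed before returning the sentinel.
    print("build_sdist: no VCS checkout found")
    return NotImplemented

def frontend_build_sdist(backend_hook, sdist_directory):
    result = backend_hook(sdist_directory)
    # Identity check against the singleton -- the *object* NotImplemented,
    # not the string "NotImplemented".
    if result is NotImplemented:
        return None  # fall back / report an error, as contextually appropriate
    return result

# The __dunder__ precedent: int.__eq__ returns the same sentinel when it
# can't compare against the other operand's type.
print((1).__eq__("x"))  # NotImplemented
print(frontend_build_sdist(build_sdist, "dist/"))  # None
```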
participants (14)
-
C Anthony Risinger
-
Daniel Holth
-
Donald Stufft
-
Greg Ewing
-
Jeremy Kloth
-
Jeremy Stanley
-
Matthias Bussonnier
-
Nathaniel Smith
-
Nick Coghlan
-
Paul Moore
-
Ralf Gommers
-
Robert Collins
-
Steve Dower
-
Thomas Kluyver