On Jul 3, 2017, at 10:03 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 1 July 2017 at 20:53, Nathaniel Smith <njs@pobox.com> wrote:
Hi all,

I just attempted an experimental refactor/streamlining of PEP 517, to
match what I think it should look like :-). I haven't submitted it as
a PR to the PEPs repository yet since I don't know if others will
agree with the changes, but I've pasted the full text below, or you
can see the text online at:

   https://github.com/njsmith/peps/blob/517-refactor-streamline/pep-0517.txt

and the diff at:

   https://github.com/python/peps/compare/master...njsmith:517-refactor-streamline?expand=1

Briefly, the changes are:

- Rearrange text into (hopefully) better signposted sections with
 better organization

This is definitely needed, so thanks for that.

- Clarify a number of details that have come up in discussion (e.g.,
 be more explicit that the hooks are run with the process working
 directory set to the source tree, and why)

This is again very helpful (we also need to be clearer on which sets
of dependencies must be installed prior to running the various hooks)

- Drop prepare_wheel_metadata and prepare_wheel_build_files (for now);
 add detailed rationale for why we might want to add them back later.

I want prepare_wheel_metadata there as a straightforward way for
backends to expose the equivalent of "setup.py egg_info". Losing that
would be a significant regression relative to the status quo, since
pip relies on it during the dependency resolution process (and a
backtracking resolver will be even more dependent on being able to do
that efficiently).


To speak to a point Nathaniel made about being able to cache built wheels, we’re also planning on caching the result of ``setup.py egg-info`` in the GSoC student’s project. The exact semantics of this remain to be nailed down (for instance, it’s silly to cache it after we’ve built the wheel, we can just use that, but if we called it then later decided not to use it, then caching is beneficial). I don’t see any reason why we wouldn’t also cache the output of this new hook.

Adding it on later is a bit tricky because we can’t pass the existing metadata into the build_wheel hook then, so they can ensure that they generate the same output. The frontend could still of course validate that after the fact (which pip likely will) but that removes the chance for the project to ensure they work in that case rather than having that case be a failure. OTOH I don’t know how likely it is that a project actually does something different based on the metadata directory that was passed in other than fail if it would do something different.

At the end of the day I think we can add this either now or later and I have a slight preference to now and I think either way we’ll end up adding it, but I don’t really care if we add it now or later.


For the build preparation hook, Thomas already rejected promising that
build_sdist will always work in flit when given only an unpacked
sdist, while Donald and Paul rejected relying on full tree copies as a
fallback for build_sdist failing.

Since I'm not interested in rehashing that debate, the build
preparation hook also stays.


From my end, I’m happy either with the hook or with the hook Nathaniel has spec’d. I think the primary difference is if someone takes a .tar.gz, unpacks it, and then runs ``pip install that/dir`` with Nathaniel’s hook we would fail if the build_sdist hook didn’t build a sdist (e.g. in the case of how I understand flit’s use case.

IOW we have a few possibilities of when we might call this API:

1) We have a sdist already and pip downloads it, unpacks it, and builds a wheel from it.
    - Works on both proposals for setuptools and flit (since for both proposals we would just unpack to a temporary location and jump straight to wheel building).
2) We have a VCS directory or “original development source” or whatever you want to call the thing you have before a sdist that typically gets into a sdist.
    - Works on both proposals for setuptools and flit (since both can go from a VCS to a sdist).
    - Thomas might have said he’d be unhappy if this case goes through a real sdist… I forget the specifics of that discussion now. 
    - If build_sdist failed in Nathaniel’s proposal, pip would probably fail and surface that error to the user (for cases where e.g. you need git installed or something).
3) We have a directory on disk that represents an unpacked sdist (with or without modifications).
    - Works on the original proposal IF the build tool implements the prepare hook or the build_sdist hook can function as the identity function.
    - Fails on Nathaniel’s proposal IF the build tool doesn’t implement the build_sdist hook as the identity function (e.g. flit).


So we basically only care about this for the (3) case, and even then only for projects that can’t or won’t detect that they’re in an unpacked sdist that they’ve already created and simply archive up the entire directory without any additional logic (or maybe they add additional logic to munge the version for instance if they detect there was file modifications).

From my POV, I don’t really care if (3) fails or not for individual build tools. I don’t think (3) is a very popular path (and in both cases, we’re not covering (3) 100% of the time, we just provide a different mechanism to make it work) and from pip’s POV it’s easy to surface an error from flit that says like “Cannot build an sdist from this, try from a VCS clone” or something. We can’t make this work in 100% of situations either way so whatever.

So I don’t really care if we add it nor or we use the semantics Nathaniel proposed.



However, I'd be fine with renaming these hooks using the
"prepare_X_for_Y" scheme you suggest below:

- prepare_metadata_for_build_wheel
- prepare_input_for_build_wheel

That makes it clearer where they fit into the overall wheel building
process, as well as which particular step they implement within that
process.

Paint the bike shed whatever color you want.


- Add an "extensions" hook namespace to allow prototyping of future
 extensions.

No, on grounds of YAGNI. You can't reject the two hooks we've actually
identified as needed (aka the matrix multiplication operators of this
situation), and then turn around and argue in favour of an overly
general scheme that supports arbitrary extension hooks (which would be
somewhat akin to having taken the PEP 225 approach to resolving the
syntactic ambiguity between elementwise multiplication and matrix
multiplication).

We're not fumbling around based on a complete absence of information
here - we're formalising a backend build interface based on more than
a decade of experience with the successes and challenges of the
existing system based on distutils, setuptools and PyPI.

Besides, if frontends want to invoke arbitrary interfaces that are
specific to particular backends, we already have ways for them to do
that: Python's regular import system, and plugin management approaches
like pkg_resources entry points.

I don’t care if we have extensions or not, in either form of the proposal.


- Rename get_build_*_requires -> get_requires_for_build_* to make the
 naming parallelism more obvious

This sounds like a good change to me.

- Add the option to declare an operation unsupported by returning
 NotImplemented

No. Supported operations for a backend must be identifiable without
installing any dependencies, and without running any backend code.

To tighten that requirement up even further: if the backend's
capabilities can't be accurately determined using
"inspect.getattr_static", then the backend is not compliant with the
spec. The build frontend/backend API is not a space where we want
people to try to be clever - we want them to be completely dull and
boring and use the most mundane code they can possibly come up with.

However, it's fine for operations to fail with an exception if there
are external dependencies that haven't been satisfied (e.g. requiring
a C/C++/FORTRAN/Rust/Go/etc compiler for extension modules when there
isn't one available, or requiring some other non-Python dependency
that needs to be set up prior to requesting the sdist or wheel build).
Backends are also free to raise NotImplementedError when it seems
appropriate to do so (e.g. when the project doesn't support the
current CPU architecture) - frontends will handle the same way they
will any other backend exception.

Agreed, I don’t like the NotImplemented thing.


- Instead of defining a default value for get_requires_for_build_*,
 make it mandatory for get_require_for_build_* and build_* to appear
 together; this seems simpler now that we have multiple high-level
 operations defined in the same PEP, and also simplifies the
 definition of the NotImplemented semantics.

I don't see the value in requiring four lines of boiler plate in every
backend definition that doesn't support dynamic dependencies:

   def get_requires_for_build_sdist(*args, **kwargs):
       return ()

   def get_requires_for_build_wheel(*args, **kwargs):
       return ()

That's just pointless noise when we can instead say "Leave those hooks
out of the backend implementation entirely if all build dependencies
will be declared in pyproject.toml (and if you really want your
backend work to support dynamic build dependencies, please make sure
you have a really good rationale for doing so that isn't already
covered by pyproject.toml and the environment marker system defined in
PEP 508)”

Agreed.


- Update title to better match the scope we ended up with

I'd prefer to retain the current title, as it describes the desired
*outcome* of the change, whereas the proposed revision only describes
one of the technical pre-requiisites of achieving that outcome (i.e.
eliminating the historical dependency on setup.py as the build
system's executable interface).

- Add a TODO to decide how to handle backends that don't want to have
 multiple hooks called from the same process, including some
 discussion of the options.

I don't think that's a TODO: I'm happy with the option of restricting
frontends to "one hook invocation per subprocess call".

It only becomes an open question in this revised draft by virtue of
making the get_requires_* hooks mandatory, and I have a different
resolution for that: keep those hooks optional, so that only backends
that genuinely support dynamic build time dependencies will define
them (others should either just get users to list any additional
static build dependencies in pyproject.toml, or else list any
always-needed static dependencies in the backend's own
install_requires).

Agreed.


Donald Stufft