On Jul 6, 2017, at 12:35 PM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:

Thanks Nick for the detailed reply. I have read it carefully, and you've
probably convinced me to get back on board. Some more responses inline:

On Thu, Jul 6, 2017, at 03:38 PM, Nick Coghlan wrote:
While I can completely understand how the current debate over whether
or not the prepare_input_for_build_wheel hook is necessary
would make you feel that way, I hope I can convince you that we're
really just quibbling over a genuinely trivial arcane technical detail
that I'd never let get in the way of flit being a full-fledged
participant in the Python packaging ecosystem.

To be clear, I don't particularly care for the hook. I can see that it's
something of a kludge between two competing approaches.

What is important to me is that if a user installs from source the
obvious way (pip install . ), failure to build an sdist does not result
in a failure to install. The extra hook was one approach to that, but
it's also OK by me if it tries to make an sdist and falls back to either
copytree or an inplace build.

I *think* this would be reasonable to me if we had some way to signal expected failure vs unexpected failure. I wouldn’t want the fallback to trigger on just any failure, but if we used Nathaniel’s NotImplemented idea or something similar to distinguish “hey, I can’t build an sdist here for expected reasons” from “hey, I tried to build the sdist, but something went wrong”, I think that would be workable.

In pip, I think we’d most likely implement the fallback as a copytree (at least to start; once we have more experience with other build backends, that could possibly be relaxed to an in-place build).
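
To make the expected-vs-unexpected distinction concrete, here is a minimal sketch of how a frontend might handle it. Everything in it is an assumption for illustration: prepare_build_tree, build_sdist and unpack_sdist are hypothetical names, not pip's code or an agreed hook signature, and returning NotImplemented is just one possible way to spell the signal.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_build_tree(source_dir, build_sdist, unpack_sdist):
    """Return (strategy, path) for the tree a wheel should be built from.

    build_sdist and unpack_sdist are hypothetical callables standing in for
    whatever the backend and frontend would really provide.
    """
    work_dir = Path(tempfile.mkdtemp(prefix="build-"))

    # Ask the backend for an sdist. Returning NotImplemented means
    # "I can't build an sdist here, for expected reasons".
    sdist_path = build_sdist(source_dir, work_dir)

    if sdist_path is NotImplemented:
        # Expected failure: fall back to copying the whole tree.
        tree = work_dir / "tree"
        shutil.copytree(source_dir, tree)
        return "copytree", tree

    # Any exception raised by build_sdist is deliberately not caught:
    # an unexpected failure should surface as an error, not a fallback.
    return "sdist", unpack_sdist(sdist_path, work_dir)
```

The point of the design is that only the explicit signal triggers the fallback; a backend that tried and crashed still produces a visible error.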

That is, the current point of contention is specifically about how we
want tools to behave when we're starting with a source directory that:

1. Doesn't include VCS metadata (e.g. it's been exported as a tarball
rather than cloned)
2. The build frontend doesn't want to use as the basis for an in-place
build
3. The build frontend doesn't want to blindly copy into a separate
build directory

So just by way of those preconditions, we're already well outside the
most common package installation workflows.

One of my concerns in this debate is that this is presented as a very
rare corner case that we don't have to worry about too much. I agree
that it's not the most common case, but I think it's common enough that
we should care about making it easy, given that:

- Condition 1 also covers directories with VCS metadata where the VCS
tools are not on $PATH. Another case occurred to me recently: Windows
users who have installed git but not added it to the default PATH.
- Conditions 2 and 3 seem likely to be the default for a source install
with pip.

As an order of magnitude, I'd estimate this is ~10% of installs from a
source directory - which is to say, moderately common.

Unfortunately, metrics are hard in OSS. I’d love for pip to have metrics so we could bring real numbers to the discussion and figure out which cases are more common than others, and by how much. I do know that pip downloaded 12 million sdists from PyPI yesterday (and 28 million wheels), but how that compares to the number of people doing ``pip install .`` for varying states of a tree in ``.``, we really don’t know beyond guessing.

That perspective is embodied in the hypothetical proposal to add a
"--build-strategy" option to pip that would allow folks building
wheels to choose between:

- creating and unpacking an sdist and building a wheel from that
- copying the directory tree and building a wheel from that
- building a wheel directly from the original directory

(Perhaps with a variant that tries to create and unpack the sdist
first, and only if that fails falls back to copying the entire tree)

This could be useful flexibility for advanced users. But I worry that
pip will use the 'sdist' build strategy by default, and expect users to
handle cases where that fails. I think this would be a mistake. From a
user perspective, it would mean:

- "pip install ." is the recommended way to install from source, but in
some situations it doesn't work.
- Adding the mystic incantation "--build-strategy direct" makes it work,
and from a user perspective makes absolutely no difference to the
result.

Of course, I also have a vested interest in things not working this way:
I would get a steady trickle of people asking "why does flit require a
VCS to install from source?" From my perspective, it doesn't require
that, but I would be unable to 'fix' it.

From my perspective, I would prefer not to add a --build-strategy flag [1] to pip and would rather have some generic solution that just generally works OR raises a clear error. I agree: I suspect that for most people this flag would just end up being some “make it work” turd they cargo cult around (which likely made one scenario work, but broke another scenario). Maybe it’s useful as something for advanced users, but that’s more of a pip discussion than a discussion for this PEP.

Donald:
I think it is a complete non-starter to suggest removing installation from sdist support from pip

I'm certainly not suggesting that (hopefully this was already clear, but
just in case ;-)

Oh no, I didn’t think you were advocating for that. Rather I was trying to explain why I arrive at the “go via sdist” route, because I start at “How can we eliminate additional routes a package takes from “VCS” to “Installed”” and since I don’t think we can get rid of sdist, then my mind immediately goes to “well, can we make everything go through sdist?”.

the question then becomes do we want to try and push things towards only having *one* primary flow through the state machine of Python’s packaging, or do we want to support transitions that allow you to “skip” steps.

My idealised view of the state machine is something like this:

wheel <-- source tree <--> sdist

I agree that there's a problem with losing important data when you go
[source tree --> sdist --> source tree] - in fact this is one of the
pain points I was trying to avoid with flit. But I don't like the idea
of solving that by saying that all wheels must have passed through an
sdist; it feels like a redundant there-and-back-again journey.

So how else could we tackle the systematic problem? It's definitely a
good idea to ensure that [stree --> sdist --> stree --> wheel] doesn't
miss out anything that [stree --> wheel] includes, but I'd focus on
doing this in developer tools, e.g.:

1. Tools such as flit could check it when you're building a release
2. Tools running on CI services could build both and compare them
3. Bots could scan PyPI for projects with both a .whl and a .tar.gz,
build a wheel from the tarball, compare them, and notify the maintainer
if there's a problem.

In the short term, I reckon that 2 is the most promising - we can make a
convenient pip-installable tool and promote it as good practice for
testing that your builds work. But in any case, I see a range of options
for tackling this while leaving open the direct [stree --> wheel]
pathway.

Yea, I absolutely don’t think going through sdist is the *only* way to tackle the problem.

It’s attractive to me because in my mind it is entirely automatic, so it doesn’t require a hypothetical developer to learn another tool, set up infrastructure, etc. to handle it. The common stumbling block I see people (new and experienced alike) hit is when ``pip install .`` and ``pip install foo-1.0.tar.gz`` result in something different. Focusing on the developer side provides tooling that helps them detect when they’ve done something that might trigger that, but doesn’t actively prevent it.

A similar-ish scenario: in the future I hope to start validating the rendering of long_description on upload to PyPI, and rejecting invalid syntax. While readme_renderer exists and people can use it (and it lets them detect problems earlier on), forcing all uploads to PyPI to have their long_description checked completely sidesteps that class of problems.

If things don’t go the way I would prefer and we decide that we’re going to just deal with the problems that “many paths” creates (because as a collective, we liked the tradeoffs better), then I think that (2) is likely to be a good “second best” solution.
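
For what it's worth, the comparison in (2) could be quite small. The sketch below assumes the two wheels have already been built (one directly from the source tree, one from an unpacked sdist) and only compares their contents; compare_wheels and archive_digests are hypothetical names, not an existing tool. It relies on nothing more than a wheel being a zip archive.

```python
import hashlib
import zipfile


def archive_digests(path):
    """Map each file in a zip archive (a wheel is one) to its SHA-256 digest."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def compare_wheels(direct_wheel, via_sdist_wheel):
    """Report files that differ between the two build routes."""
    direct = archive_digests(direct_wheel)
    via_sdist = archive_digests(via_sdist_wheel)
    return {
        # In the direct build but lost on the trip through the sdist.
        "missing_via_sdist": sorted(direct.keys() - via_sdist.keys()),
        # Only present when building from the sdist.
        "extra_via_sdist": sorted(via_sdist.keys() - direct.keys()),
        # Present in both, but with different contents.
        "changed": sorted(
            name
            for name in direct.keys() & via_sdist.keys()
            if direct[name] != via_sdist[name]
        ),
    }
```

Note that whenever any file differs, the wheel's RECORD file will show up under "changed" too, since it lists hashes of the other files. A CI job would fail if any of the three lists is non-empty.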

When I looked at flit it also suffered from the same problem if you forgot to commit a file to the VCS repository (which meant it wouldn’t get added to the sdist).

You have to explicitly ignore a file to hit this. If you have untracked
but non-ignored files in your repo, flit will refuse to build an sdist
at all. I recognise that this is quite strict and still doesn't entirely
prevent the issue, and I may refine it in the future, but I hope it
makes such problems hard to hit accidentally.

Ah yes, I think I saw that chunk of code but it didn’t fully register what the effect of it was going to be. So I’ll still assert that this isn’t a problem that is specific to distutils/setuptools but that flit itself does make it harder to hit than I originally thought.
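
For readers following along, a check of the kind Thomas describes might look roughly like the sketch below. This is not flit's actual code: the function names and the error class are hypothetical, and it assumes a git checkout with git available on PATH. The one real piece is ``git ls-files --others --exclude-standard``, which lists files that are untracked but not ignored.

```python
import subprocess


class UntrackedFilesError(Exception):
    """Raised when the tree has untracked files that are not ignored."""


def parse_nul_list(output):
    """Split NUL-separated output (as produced by git's -z option) into names."""
    return [name for name in output.split("\0") if name]


def list_untracked(repo_dir):
    """Ask git for files that are untracked and not covered by ignore rules."""
    result = subprocess.run(
        ["git", "ls-files", "--others", "--exclude-standard", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_nul_list(result.stdout)


def ensure_no_untracked(untracked):
    """Refuse to continue if anything would silently be left out of the sdist."""
    if untracked:
        raise UntrackedFilesError(
            "untracked files not in the VCS or ignore list: "
            + ", ".join(untracked)
        )
```

A build step would call ``ensure_no_untracked(list_untracked("."))`` before creating the sdist, so a forgotten file can only be dropped if it has been explicitly ignored.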