On Jun 3, 2017, at 1:40 AM, Nathaniel Smith <njs@pobox.com> wrote:

On Fri, Jun 2, 2017 at 8:38 PM, Donald Stufft <donald@stufft.io> wrote:

On Jun 2, 2017, at 10:14 PM, Nathaniel Smith <njs@pobox.com> wrote:

So far my belief is that packages with expensive build processes are
going to ignore you and implement, ship, document, and recommend the
direct source-tree->wheel path for developer builds. You can force the
make-a-wheel-from-a-directory-without-copying-and-then-install-it
command have a name that doesn't start with "pip", but it's still
going to exist and be used. Why wouldn't it? It's trivial to implement
and it works, and I haven't heard any alternative proposals that have
either of those properties. [1]


If someone wants to implement a direct-to-wheel build tool and have it
compete with ``pip install .`` they’re more than welcome to. Competition is
healthy and at the very worst case it could validate either the idea that
direct-to-wheel is important enough that people will gladly overcome the
relatively small barrier of having to install another tool and then we have
data to indicate maybe we need to rethink things or it could validate the
idea that it’s not important enough and leave things as they are.

I went and looked through all 105 pages of pip’s issues (open and closed)
and made several searches using any keyword I could think of looking for any
issue where someone asked for this. The only times I can find anyone asking
for this were you and Ralf Gommers as part of the extended discussion around
this set of PEPs and I’ve not been able to find a single other person asking
for it or complaining about it.

That's because until now, the message that everyone has received over
and over is that the way you install a package from a directory on
disk is:

 cd directory
 python setup.py install

and this does incremental builds. (My experience is that even today,
most people are surprised to learn that 'pip install' accepts
directory paths.)


I’m not sure this is true? I mean I’m sure it’s true that for *some* people that’s the message they get, but we see a reasonable volume of bug reports and the like from people passing a path to pip that it’s not hardly an un(der)used feature in the slightest. We have zero metrics so I suspect there is no way to answer which one is used more than the other though. I think writing either one off as “nobody uses that method” is likely to be the wrong answer.

Pip is generally used widely enough in enough different scenarios that it is unusual for any feature it has (even ones that are completely undocumented and requires diving in the source code!) to not be used by a decent chunk of people. I’m not saying that zero people exist that would want this (indeed, there is at least two!) but that it has obviously not been pressing enough to need it that someone felt the need to open a ticket or ask for it before.



In our glorious PEP 517 future, we have to teach everyone to stop
using 'setup.py install' and instead use 'pip install .'. This switch
enables a glorious new future of non-distutils-based build systems and
fixes a bunch of other brokenness at the same time, hooray, BUT
currently switching to 'pip install' also causes a regression for
everyone who's used to incremental builds working.

Ralf and I noticed this because we were looking at getting a head
start on the glorious future by making 'pip install' mandatory for
numpy and scipy. The reason no-one else has noticed is that we're
among the few people that have tried using 'pip install' as their
standard install-from-working-tree command. But soon there will be
more.

One thing I’d note is that as far as I can tell, neither the current copy of the PEP, nor my PR or Thomas’ PR or any of the discussions here have talked about having the interface itself mandate calling build_sdist prior to calling build_wheel. I’m personally fine if we want the interface to allow it (and I assumed we would TBH) and explicitly calling that out.

That means that whether or not you call build_sdist (or some copy files hook) first effectively ends up being an implementation detail of the frontend in question. That would allow pip to build the sdist (or do the copy thing) first but another tool could implement it the other way (and installing a local wheel is so simple that you could pretty easily implement that and just shell out to pip for the rest if you wanted).

That also means that we can adjust our answer to it in the future. If such a tool gets built and a lot of people end up using it and asking for it in pip, we can revisit that decision in a future version of pip. Part of the stand off here is the pip developers view it as a regression if we stop building in isolation and you view it as a regression if incremental/inplace builds are not supported. Both can be true! It’s the opinion of the pip developers who have spoken so far that for us, the risk of our regressions is high enough we don’t currently feel comfortable changing that behavior. 



However, what I was able to find was what appears to be the original reason
pip started copying the directory to begin with,
https://github.com/pypa/pip/issues/178 which was caused by the build system
reusing the build directory between two different virtual environments and
causing an invalid installation to happen. The ticket is old enough that I
can get at specifics it because it was migrated over from bitbucket. However
the fact that we *used* to do exactly what you want and it caused exactly
one of problem I was worried about seems to suggest to me that pip is
absolutely correct in keeping this behavior.

Hmm, it looks to me like that bug is saying that at the time, if you
ran 'python setup.py install' *inside the pip source tree*, and then
tried to run pip's test suite (possibly via 'setup.py test'), then it
broke. I don't think this is related to the behavior of 'pip install
.', and I feel like we would know if it were currently true that
running 'setup.py install' twice in the same directory produced broken
shebang lines. (Again, most people who install from source directories
are currently using setup.py install!)

Meh yea, I misread the bug report. Reading 105 pages of reports will do that I guess :P


The source tree copying was originally added in:

  https://github.com/pypa/pip/commit/57bd8163e4483b7138342da93f5f6bb8460f0e4a

(which is dated ~2 months before that bug you found, and if I'm
reading it right tweaks a code path that previously only worked for
'pip install foo.zip' so it also works for 'pip install foo/'). AFAICT
the reason it was written this way is that pip started out with the
assumption that it was always going to be downloading and unpacking
archives, so the logic went:

1) make a temporary directory
2) unpack the sdist into this temporary directory
3) build from this temporary directory

Then, when it came time to add support for building from directories,
the structure of the logic meant that by the time pip got to step (2)
and realized that it already had a source directory, it was too late
-- it was already committed to using the selected temporary directory.
So instead of refactoring all this code, they made the minimal change
of implementing the "unpack this sdist into this directory" operation
for source directories by using shutil.copytree.

I think this chain of reasoning will feel very familiar to anyone
working with the modern pip source 5 years later...

It's absolutely true that there are cases where incremental builds can
screw things up, especially when using distutils/setuptools. But I
don't think this is why pip does things this way originally :-).

It’s not that I don’t trust the backend, it’s that I believe in putting in
systems that make it harder to do the wrong thing than the right thing. As
it is now building in place correctly requires the build backend to do extra
work to ensure that some file that wouldn’t be included in the sdist doesn’t
influence the build in some way. Given that I’m pretty sure literally every
build tool in existence for Python currently fails this test, I think that
is a pretty reasonable statement to say that it might continue to be a
problem into the future.

Copying the files makes that harder to do (but still easier than always
going through the sdist). If you want to argue that we should always go
through the sdist and we shouldn’t have a copy_files hook, I’m ok with that.
I’m only partially in favor of it as a performance trade off because I think
it passes a high enough bar that it’s unlikely enough for mistakes to be
made (and when they do, they’ll be more obvious).

What do you think of letting build backends opt-in to in-place builds?

I think the PEP already allows the interface to be used for in-place builds, though it could use some language specifying that and making sure it’s explicit that is an option the PEP allows. I think I’m neutral on allowing backends to express a preference for in-place builds, but again I don’t think I would be comfortable listening to that to allow a backend to do an in-place build, at least not by default or at first. Could I see us get to a place where we did allow that? Maybe. I wouldn’t promise anything one way or another (and TBH how pip chooses to implement a PEP isn’t really something that the PEP or distutils-sig gets to choose) but I’m fine with leaving the mechanisms in place to allow that in the future.


Other unresolved issues:

- Donald had some concerns about get_wheel_metadata and they've led to
several suggestions, none of which has made everyone go "oh yeah
obviously that's the solution". To me this suggests we should go ahead
and drop it from PEP 517 and add it back later if/when the need is
more obvious. It's optional anyway, so adding it later doesn't hurt
anything.

My main concern is the metadata diverging between the get_wheel_metadata and
the building wheel phase. The current PEP solves that in a reasonable enough
way (and in a way I can assert against). My other concerns are mostly just
little API niggles to make it harder to mess up.

I think this one is important to support because we do not to be able to get
at the dependencies, and invoking the entire build chain to do that seems
like it will be extraordinarily slow.

It's only slow in the case where (a) there's no wheel (obviously), and
(b) after getting the dependencies we decide we don't want to install
this sdist after all. I imagine numpy for example won't bother
implementing get_wheel_metadata because we provide wheels for all the
platforms we support and because we have no dependencies, so it is
doubly useless AFAICT. But yeah in other cases it could matter. I'm
not opposed to including it in general, just thought this might be a
way to help get the minimal PEP 517 out the door.

Yea I don’t think this is really a sticking point, I think it’s mostly just around:

1) Do we include a build_sdist hook?
    A) Can pip use this as part of the process of going from a VCS to a wheel
2) Do we include a copy_the_files hook?
3) Minor decisions like unpacked vs packed wheels/sdists and cwd vs pass in a path.


- It sounds like there's some real question about how exactly a build
frontend should handle the output from build_wheel; in particular, the
PEP should say what happens if there are multiple files deposited into
the output dir. My original idea when writing the PEP was that the
build frontend would know the name/version of the wheel it was looking
for, and so it would ignore any other files found in the output dir,
which would be forward compatible with a future PEP allowing
build_wheel to drop multiple wheels into the output dir (i.e., old
pip's would just ignore them). It's clear from the discussion that
this isn't how others were imagining it. Which is fine, I don't think
this is a huge problem, but we should nail it down so we're not
surprised later.


How do you determine the name/version for ``pip install .`` except by
running get_wheel_metadata or build_wheel or build_sdist?

Well, I was imagining that the semantics of 'pip install .' in a
multi-wheel world would be to install all the generated wheels :-).
But yeah, it's not really well-specified as currently written.

Possibly the simplest solution is to say that build_wheel has to
return a string which names the wheel, and then in the future we could
add build_wheel2 which is identical but returns a list of strings, and
backwards compatibility would be:

def build_wheel2(...):
   return build_wheel(...)[0]

That seems reasonable to me.


-n

[1] Donald's suggestion of silently caching intermediate files in some
global cache dir is unreasonably difficult to implement in a
user-friendly way – cache management is Hard, and I frankly I still
don't think users will accept individual package's build systems
leaving hundreds of megabytes of random gunk inside hidden
directories. We could debate the details here, but basically, if this
were a great idea to do by default, then surely one of
cmake/autoconf/... would already do it? Also, my understanding is the
main reason pip wants to copy files in the first place is to avoid
accidental pollution between different builds using the same local
tree; but if a build system implements a global cache like this then
surprise, now you can get pollution between arbitrary builds using
different trees, or between builds that don't even use a local tree at
all (e.g. running 'pip install numpy==1.12.0' can potentially cause a
later run of 'pip install numpy==1.12.1' to be corrupted). And, it
assumes that all build systems can easily support out-of-tree
incremental builds, which is often true but not guaranteed when your
wheel build has to wrap some random third party C library's build
system.



Make it opt-in

If it's opt-in, then I might as well tell people to run 'pip
devinstall .' or 'in-place-install .' or whatever instead, and it'll
be much easier all around. But instead of making it opt-in, I'd much
rather it Just Work. It's frustrating that at the same time we're
moving to the glorious simplified future, we're also picking up a new
piece of arcane wisdom that devs will need to be taught, and another
place where numerical Python devs will roll their eyes at how the
standard Python tooling doesn't care about them. (And I totally
understand that the motivation on your end is also to make things Just
Work, but I feel like in the specific case where someone is
*repeatedly* building out of the *same source directory* – which is
the one in dispute here – we should optimize for developer
experience.)


People repeatably build out of the same source directory for lots of different reasons. I just closed a ticket the other day about someone who was trying to do a ``-e .`` in a read only directory because they had it mounted inside of a container or a VM or so and were editing it from another machine and it didn’t work (because -e obviously builds in place).

One more specific example I’m concerned about with regressions here is a case like:

docker run -it ubuntu:latest -v $PWD:/app/ pip install /app/
docker run -it centos:latest -v $PWD:/app/ pip install /app/

I’ve seen people doing similar things in the wild today, and if an in place build directory just uses the normal platform tuples to differentiate build directories, there is a good chance that is going to fail miserably, likely with some sort of god awful linking error or segfault.


Donald Stufft