[Distutils] [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead

Sat Nov 7 18:37:53 EST 2015

On Sat, Nov 7, 2015 at 6:57 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> 2. (For here) Builds are not isolated from what's in the development
> directory. So if you have your sdist definition wrong, what you build
> locally may work, but when you release it it may fail. Obviously that
> can be fixed by proper development and testing practices, but pip is
> designed currently to isolate builds to protect against mistakes like
> this, we'd need to remove that protection for cases where we wanted to
> do in-place builds.

I agree that it would be nice to make sdist generation more reliable
and tested by default, but I don't think this quite works as a
solution.

1) There's no guarantee that building an sdist from some dirty working
tree will produce anything like what you'd have for a release sdist,
or even a clean isolated build. (E.g. a very common mistake is adding
a new file to the working directory but forgetting to run 'git/hg
add'. To protect against this, you either have to have a build
system that's smart enough to talk to the VCS when figuring out what
files to include, or better yet you have to work from a clean
checkout.) And as currently specified these "isolated" build trees
might even end up including partial build detritus from previous
in-place builds, copied from the source directory into the temporary
directory.
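
To make the "forgot to run git add" failure mode concrete, here's a
minimal sketch of the kind of check a VCS-aware build system could run.
The function name is made up for illustration, it takes the
tracked-file list as a plain argument (e.g. the output of 'git
ls-files') instead of calling the VCS itself, and it does not honor
ignore files:

```python
import os


def files_missing_from_vcs(root, tracked):
    """Return files under root that are not in `tracked`.

    `tracked` holds paths relative to root, e.g. the lines printed by
    'git ls-files'. Hypothetical helper; ignore files are not consulted.
    """
    tracked = {os.path.normpath(path) for path in tracked}
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip the VCS's own bookkeeping directories.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".hg")]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = os.path.normpath(rel)
            if rel not in tracked:
                missing.append(rel)
    return sorted(missing)
```

Anything this returns would be present in a dirty-tree build but
absent from a release built from a clean checkout.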

2) Sometimes people will want to download an sdist, unpack it, and
then run 'pip install .' from it. In your proposal this would require
first building a new sdist from the unpacked working tree. But there's
no guarantee that you can generate an sdist from an sdist. None of the
proposals for a new build system interface have contemplated adding an
"sdist" command, and even if they did, then a clever sdist command
might well fail, e.g. because it is only designed to build sdists from
a checkout with full VCS metadata that it can use to figure out what
files to include :-).

3) And anyway, it's pretty weird logically to include a mandatory
sdist command inside an interface that 99% of the time will be working
*from* an sdist :-). The rule of thumb I've used for the build
interface stuff so far is that it should be the minimal stuff that is
needed to provide a convenient interface for people who just want to
install packages, because the actual devs on a particular project can
use whatever project/build-system-specific interfaces make sense for
their workflow. And end-users don't build sdists. But for the
operations that pip does provide, like 'pip wheel' and 'pip install',
they should be usable by devs, because devs will use them.

> 3. The logic inside pip for doing builds is already pretty tricky.
> Adding code to sometimes build in place and sometimes in a temporary
> directory is going to make it even more complex. That might not be a
> concern for end users, but it makes maintaining pip harder, and risks
> there being subtle bugs in the logic that could bite end users. If you
> want specifics, I can't give them at the moment, because I don't know
> what the code to do the proposed in-place building would look like.

Yeah, this is always a concern for any change. The tradeoff is that
you get to delete the code for "downloading" unpacked directories into
a temporary directory (which currently doesn't even use sdist -- it
just blindly copies everything, including e.g. the full git history).
And you get to skip specifying a standard build-an-sdist interface
that pip and every build system backend would all have to support and
interoperate on.

Basically AFAICT the logic should be:

1) Arrange for the existence of a build directory:
  If building from a directory:
    great, we have one, use that
  else if building from a file/url:
    download it and unpack it, then use that
2) do the build using the build directory
3) if it's a temporary directory and the build succeeded, clean up

(Possibly with some complications like providing options for people to
specify a non-temporary directory to use for unpacking downloaded
sdists.)

It might need a bit of refactoring so that the "arrange for the
existence of a build directory" step returns the chosen build
directory instead of taking it as a parameter like I assume it does
now, but it doesn't seem like the intrinsic complexity is very high.
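
As a rough sketch of that shape (this is not pip's actual code; the
function names are hypothetical, only local directories and local
tarballs are handled, and the download step is left out):

```python
import os
import shutil
import tarfile
import tempfile


def arrange_build_dir(source):
    """Return (build_dir, is_temporary) for a directory or a local tarball.

    Hypothetical helper, not part of pip. A real sdist usually unpacks
    into a name-version/ subdirectory; this sketch ignores that detail.
    """
    if os.path.isdir(source):
        # Building from a directory: great, we have one, use that.
        return source, False
    # Building from a file: unpack it into a fresh temporary directory.
    build_dir = tempfile.mkdtemp(prefix="build-")
    with tarfile.open(source) as archive:
        archive.extractall(build_dir)
    return build_dir, True


def build(source, do_build):
    """Run do_build(build_dir); remove a temporary dir only on success."""
    build_dir, is_temporary = arrange_build_dir(source)
    # If do_build raises, the directory is left behind for inspection.
    result = do_build(build_dir)
    if is_temporary:
        shutil.rmtree(build_dir)
    return result
```

The point is just that the in-place case is the branch that does
nothing, so it shouldn't add much logic on top of the temporary-directory
case.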

> I hope that helps. It's probably not as specific or explicit as you'd
> like, but to be fair, nor is the proposal.
>
> What we currently have on the table is "If 'pip (install/wheel) .' is
> supposed to become the standard way to build things, then it should
> probably build in-place by default." For my personal use cases, I
> don't actually agree with any of that, but my use cases are not even
> remotely like those of numpy developers, so I don't want to dismiss
> the requirement. But if it's to go anywhere, it needs to be better
> explained.
>
> Just to be clear, *my* position (for projects simpler than numpy and
> friends) is:
>
> 1. The standard way to install should be "pip install <requirement or wheel>".
> 2. The standard way to build should be "pip wheel <sdist or
> directory>". The directory should be a clean checkout of something you
> plan to release, with a unique version number.
> 3. The standard way to develop should be "pip install -e ."
> 4. Builds (pip wheel) should always unpack to a temporary location and
> build there. When building from a directory, in effect build a sdist
> and unpack it to the temporary location.
>
> I hear the message that for things like numpy these rules won't work.
> But I'm completely unclear on why. Sure, builds take ages unless done
> incrementally. That's what pip install -e does, I don't understand why
> that's not acceptable.

To me this feels like mixing two orthogonal issues. 'pip install' and
'pip install -e' have different *semantics* -- one installs a snapshot
into an environment, and one installs a symlink-like-thing into an
environment -- and that's orthogonal to the question of whether you
want to implement that using a "clean build" or not. (Also, it's
totally reasonable to want partial builds in 'pip wheel': 'pip wheel
.', get a compiler error, fix it, try again...)

Furthermore, I actually really dislike 'pip install -e' and am
surprised to see so many people talking about it as if it were the
obvious choice for all development :-). I understand it takes all
kinds, etc., I'm not arguing that it should be removed or anything
(though I probably would if I thought it had any chance of getting
consensus :-)). But from my point of view, 'pip install -e' is a weird
intrinsically-kinda-broken wart that provides no value outside of some
rare use cases that most people never encounter.

I say "intrinsically-kinda-broken" because as soon as you do an
editable install, the metadata in .egg-info/.dist-info starts to drift out
of sync from your actual source tree, so that it necessarily makes the
installed package database less reliable, undermining a lot of the
work that's being done to make installation and resolution more
robust.

I also am really unsure about why people use it. I generally don't
*want* to install code-under-development into a full-fledged
virtualenv. I see lots of people who have a primary virtualenv that
they use for day-to-day work, and they 'pip install -e' all the
packages that they work on into this environment, and then run into
all kinds of weird problems because they're using a bunch of untested
code together, or they switch to a different branch of one package to
check something and then forget about it when they context switch to
some other project and everything is broken. And then they try to
install some other package, and it depends on foo >= 1.2, and they
have an editable install of foo that claims to be 1.1 (because that
was the last time the .egg-info was regenerated) but really it's 1.3
and all kinds of weird things happen.
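
A toy illustration of that version skew, assuming plain dotted numeric
versions and a hand-rolled comparison (real installers use much more
careful version parsing; the helper names here are made up):

```python
def parse_version(text):
    """Turn '1.2' into (1, 2). Toy parser for dotted numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """True if the installed version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


recorded = "1.1"  # what the stale .egg-info claims
actual = "1.3"    # what the source tree really is
required = "1.2"  # some other package depends on foo >= 1.2

# The installer only sees the recorded metadata, so it thinks foo is too old.
print(satisfies_minimum(recorded, required))
# The code actually on disk would have satisfied the requirement.
print(satisfies_minimum(actual, required))
```

The two answers disagree, and the installer can only ever see the
first one.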

And for packages with binary extensions, it doesn't really work,
anyway, because you still have to rebuild every time (and you can get
extra bonus forms of weird skew, where when you import the package
then you get the up-to-date version of some source files -- the .py
ones -- combined with out-of-date versions of others -- the .pyx / .c
/ .cpp ones). Even if I do decide that I want to install a
non-official release into some virtualenv, I'd like to install a
consistent snapshot that gets upgraded or uninstalled all together as
an atomic unit. What I actually do when working on NumPy is that I use
a little script [1] that does the equivalent of:

  $ rm -rf ./.tmpdir
  $ pip install . -t ./.tmpdir
  $ cd ./.tmpdir
  $ python -c 'import numpy; numpy.test()'

OTOH, for packages without binary extensions, I just run my tests or
start a REPL from the root of my source dir, and that works fine
without the hassle of creating and activating a virtualenv, or
polluting my normal environment with untested code.

Also, 'pip install -e' intrinsically pollutes your source tree with
build artifacts. I come from the build system tradition that says that
build artifacts should all be shunted to the side and leave the actual
directories uncluttered:

  https://www.gnu.org/software/automake/manual/html_node/VPATH-Builds.html

and I think that a valid approach that build system authors might want
to take is to enforce the invariant that the build system never writes
to anywhere outside of $srcdir/build/ or similar. If we insist that
editable installs are the only way to work, then we take this option
away from projects.

So there simply isn't any problem I have where editable installs are
the best solution, and I see them causing problems for people all the
time.

That said, there are two theoretical advantages I can see to editable installs:

1) Unlike starting an interpreter from the root of your source tree,
they trigger the install of runtime dependencies. I solve this by just
installing those into my working environment myself, but for projects
with complex dependencies I guess 'install -e' might ATM be the most
convenient way to get this set up. This isn't a very compelling
argument, though, because one could trivially provide better support
for just this ('pip install-dependencies .' or something) without
bringing along the intrinsically tricky bits of editable installs.

2) For people working on complex projects that involve multiple
pure-python packages that are distributed separately but that require
coordinated changes in sync (maybe OpenStack is like this?), so each
round of your edit/test cycle involves edits to multiple different
projects, then 'pip install -e' kinda solves a genuine problem,
because it lets you assemble a single working environment that
contains the editable versions of everything together. This seems like
a genuine use case -- but it's what I meant at the top about how they
seem like a very specialized tool for rare cases, because very few
people are working on meta-projects composed of multiple pure-python
sub-projects evolving in lock-step.

Anyway, like I said, I'm not trying to argue that 'pip install -e'
should be deprecated -- I understand that many people love it for
reasons that I don't fully understand. My goal is just to help those
who think 'pip install -e' is obviously the one-and-only way to do
python development to understand my perspective, and why we might want
to support other options as well.

I think the actual bottom line for pip as a project is: we all agree
that sooner or later we have to move users away from running 'setup.py
install'. Practically speaking, that's only going to happen if 'pip
install' actually functions as a real replacement, and doesn't create
regressions in people's workflows. Right now it does create such
regressions. The thing that started this whole thread is that numpy had
actually settled on going ahead and making the switch to requiring pip
install, but then got derailed by issues like these...

-n

[1] https://github.com/numpy/numpy/blob/master/runtests.py

--
Nathaniel J. Smith -- http://vorpus.org