Re: [Distutils] [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead
[Adding distutils-sig to the CC as a heads-up. The context is that
numpy is looking at deprecating the use of 'python setup.py install'
and enforcing the use of 'pip install .' instead, and running into
some issues that will probably need to be addressed if 'pip install .'
is going to become the standard interface to work with source trees.]
On Sun, Nov 1, 2015 at 3:16 PM, Ralf Gommers
Hmm, after some more testing I'm going to have to bring up a few concerns myself:
1. ``pip install .`` still has a clear bug; it starts by copying everything (including .git/ !) to a tempdir with shutil, which is very slow. And the fix for that will go via ``setup.py sdist``, which is still slow.
Ugh. If 'pip (install/wheel) .' is supposed to become the standard way to build things, then it should probably build in-place by default. Working in a temp dir makes perfect sense for 'pip install <requirement>' or 'pip install <url>', but if the user supplies an actual named on-disk directory then presumably the user is expecting this directory to be used, and to be able to take advantage of incremental rebuilds etc., no?
2. ``pip install .`` silences build output, which may make sense for some use cases, but for numpy it just sits there for minutes with no output after printing "Running setup.py install for numpy". Users will think it hangs and Ctrl-C it. https://github.com/pypa/pip/issues/2732
I tend to agree with the commentary there that for end users this is different but no worse than the current situation where we spit out pages of "errors" that don't mean anything :-). I posted a suggestion on that bug that might help with the apparent hanging problem.
3. ``pip install .`` refuses to upgrade an already installed development version. For released versions that makes sense, but if I'm in a git tree then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1).
Ugh, this is clearly just a bug -- `pip install .` should always unconditionally install, IMO. (Did you file a bug yet?) At least the workaround is just 'pip uninstall numpy; pip install .', which is still better than running 'setup.py install' and having it blithely overwrite some files and not others.

The first and last issues seem like ones that will mostly only affect developers, who should mostly have the ability to deal with these weird issues (or just use setup.py install --force if that's what they prefer)? This still seems like a reasonable trade-off to me if it also has the effect of reducing the number of weird broken installs among our thousands-of-times-larger userbase.

-n
-- Nathaniel J. Smith -- http://vorpus.org
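For what it's worth, the copy-everything behaviour in point (1) is easy to demonstrate with plain `shutil`. This is a toy sketch, not pip's actual code, showing how an unfiltered `copytree` drags `.git/` along and how an ignore filter would at least skip it:

```python
import os
import shutil
import tempfile

# Build a small fake source tree with a .git directory in it.
src = tempfile.mkdtemp()
os.makedirs(os.path.join(src, ".git", "objects"))
os.makedirs(os.path.join(src, "numpy"))
with open(os.path.join(src, "setup.py"), "w") as f:
    f.write("# placeholder setup.py\n")

# What pip effectively does here: copy *everything*, .git included.
naive = os.path.join(tempfile.mkdtemp(), "build")
shutil.copytree(src, naive)
print(".git" in os.listdir(naive))   # True: VCS data came along for the ride

# Sketch of the obvious mitigation: filter out VCS directories.
filtered = os.path.join(tempfile.mkdtemp(), "build")
shutil.copytree(src, filtered, ignore=shutil.ignore_patterns(".git", ".hg", ".svn"))
print(".git" in os.listdir(filtered))  # False
```

For a repo like numpy's, the `.git` objects can easily dwarf the working tree, which is where the slowness comes from.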
On 3 November 2015 at 14:57, Nathaniel Smith
Ugh. If 'pip (install/wheel) .' is supposed to become the standard way to build things, then it should probably build in-place by default. Working in a temp dir makes perfect sense for 'pip install <requirement>' or 'pip install <url>', but if the user supplies an actual named on-disk directory then presumably the user is expecting this directory to be used, and to be able to take advantage of incremental rebuilds etc., no?
That's what 'pip install -e .' does: 'setup.py develop' -> 'pip install -e .'
3. ``pip install .`` refuses to upgrade an already installed development version. For released versions that makes sense, but if I'm in a git tree then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1).
Ugh, this is clearly just a bug -- `pip install .` should always unconditionally install, IMO. (Did you file a bug yet?) At least the workaround is just 'pip uninstall numpy; pip install .', which is still better the running 'setup.py install' and having it blithely overwrite some files and not others.
There is a bug open. https://github.com/pypa/pip/issues/536
-Rob
--
Robert Collins
On Nov 2, 2015 6:51 PM, "Robert Collins"
That's what 'pip install -e .' does: 'setup.py develop' -> 'pip install -e .'
I'm not talking about in-place installs, I'm talking about e.g. building a wheel and then tweaking one file and rebuilding -- traditionally build systems go to some effort to keep track of intermediate artifacts and reuse them across builds when possible, but if you always copy the source tree into a temporary directory before building then there's not much the build system can do.
Ugh, this is clearly just a bug -- `pip install .` should always unconditionally install, IMO. (Did you file a bug yet?) At least the workaround is just 'pip uninstall numpy; pip install .', which is still better the running 'setup.py install' and having it blithely overwrite some files and not others.
There is a bug open. https://github.com/pypa/pip/issues/536
Thanks! -n
On 3 November 2015 at 16:02, Nathaniel Smith
I'm not talking about in place installs, I'm talking about e.g. building a wheel and then tweaking one file and rebuilding -- traditionally build systems go to some effort to keep track of intermediate artifacts and reuse them across builds when possible, but if you always copy the source tree into a temporary directory before building then there's not much the build system can do.
Ah yes. So I don't think pip should do what it does. It's a violation of the abstractions we all want to see within it. However it's not me you need to convince ;).
-Rob
--
Robert Collins
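The make-style artifact reuse described above boils down to timestamp comparisons between sources and build products. Here is a minimal sketch (a toy, not any real build tool's code) of why copying the tree to a fresh tempdir defeats it:

```python
import os
import tempfile

def needs_rebuild(source, artifact):
    """Rebuild only if the artifact is missing or older than its source."""
    if not os.path.exists(artifact):
        return True
    return os.path.getmtime(source) > os.path.getmtime(artifact)

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "module.c")
obj = os.path.join(workdir, "module.o")

with open(src, "w") as f:
    f.write("/* source */")
print(needs_rebuild(src, obj))       # True: no artifact yet

with open(obj, "w") as f:            # pretend we compiled it
    f.write("object code")
print(needs_rebuild(src, obj))       # False: artifact is up to date

# Copying the source tree into a fresh temporary directory discards
# module.o, so every build starts from needs_rebuild(...) == True again.
```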
BTW scipy list is rejecting all my emails (vs eg moderating), so I'm
going to drop the cc in all future replies.
-Rob
I'm not talking about in place installs, I'm talking about e.g. building a wheel and then tweaking one file and rebuilding -- traditionally build systems go to some effort to keep track of intermediate artifacts and reuse them across builds when possible, but if you always copy the source tree into a temporary directory before building then there's not much the build system can do.
This strikes me as an optimization -- is it an important one? If I'm doing a lot of tweaking and re-running, I'm usually in develop mode. I can see that when you build a wheel, you may build it, test it, discover a wheel-specific error, and then need to repeat the cycle -- but is that a major use-case?

That being said, I have been pretty frustrated debugging conda-build scripts -- there is a lot of overhead setting up the build environment each time you do a build... But with wheel building there is much less overhead, and far fewer complications requiring the edit-build cycle.

And couldn't make-style this-has-already-been-done checking happen with a copy anyway?

CHB
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On Tue, Nov 3, 2015 at 6:10 PM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
This strikes me as an optimization -- is it an important one?
Yes, I think it is. At least if we want to move people towards `pip install .` instead of `python setup.py`.
If I'm doing a lot of tweaking and re-running, I'm usually in develop mode.
Everyone has a slightly different workflow. What if you install into a bunch of different venvs between tweaks? The non-caching for a package like scipy pushes rebuild time from <30 sec to ~10 min.
I can see that when you build a wheel, you may build it, test it, discover a wheel-specific error, and then need to repeat the cycle -- but is that a major use-case?
That being said, I have been pretty frustrated debugging conda-build scripts -- there is a lot of overhead setting up the build environment each time you do a build...
But with wheel building there is much less overhead, and far fewer complications requiring the edit-build cycle.
And couldn't make-style this-has-already-been-done checking happen with a copy anyway?
The whole point of the copy is that it's a clean environment. Pip currently creates tempdirs and removes them when it's done building. So no. Ralf
I'm not at my computer, but does ``pip install --no-clean --build <insert build dir>`` make this work?
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Thu, Nov 5, 2015 at 11:29 PM, Donald Stufft
I'm not at my computer, but does ``pip install --no-clean --build <insert build dir>`` make this work?
No, that option seems to not work at all. I tried with both a relative and an absolute path to --build. In the specified dir there are subdirs created (src.linux-i686-2.7/<pkgname>), but they're empty. The actual build still happens in a tempdir.

Ralf

P.S. Adding flags for the various issues (/ things under discussion), this is what I actually had to try: pip install . --no-clean --build build/ -v --upgrade --no-deps :(
On Thu, Nov 5, 2015 at 11:44 PM, Ralf Gommers
Commented on the source of the problem with both `--build` and `--no-clean` in https://github.com/pypa/pip/issues/804 Ralf
If ``pip install --build … --no-clean …`` worked to do incremental builds, would that satisfy this use case? (Without the --upgrade and --no-deps; --no-deps is only needed because of --upgrade, and --upgrade is needed because of another ticket that I think will get fixed at some point.)
On Fri, Nov 6, 2015 at 12:37 AM, Donald Stufft
Then there's at least a way to do it, but it's all very unsatisfying. Users are again going to have a hard time finding this, and I'd hate to have to type that every time. Robert and Nathaniel have argued the main points already so I'm not going to go into more detail, but I think the main points are:

- we want to replace `python setup.py install` with `pip install .` in order to get proper uninstalls and dependency handling.
- except for those two things, `python setup.py install` does the expected thing, while pip is trying to be way too clever, which is unhelpful.

Ralf
On 6 November 2015 at 07:39, Ralf Gommers
While I understand what you're trying to achieve (and I'm in favour, in general), it should be remembered that pip's core goal is installing packages - not being a component of a development workflow.

We absolutely need to make pip useful in the development workflow type of situation (that's why pip install -e exists, after all). But I don't think it's so much pip "trying to be too clever" as incremental rebuilds not being the use case that "pip install ." was designed for. What we'll probably have to do is be *more* clever, to special-case the situations where development-style support for incremental rebuilds is more appropriate than the current behaviour.

Paul
While I understand what you're trying to achieve (and I'm in favour, in general) it should be remembered that pip's core goal is installing packages - not being a component of a development workflow.
Yes -- clear separation of concerns here! So what IS supposed to be used in the development workflow? The new mythical build system? This brings me back to my setuptools-lite concept -- while we are waiting for a new build system, you can use setuptools-lite, and get a setup.py install or setup.py develop that does what it's supposed to do and nothing else.... OK, I'll go away now :-) -Chris
On 7 November 2015 at 01:26, Chris Barker - NOAA Federal
So what IS supposed to be used in the development workflow? The new mythical build system?
Fair question. Unfortunately, the answer is honestly that there's no simple answer - pip is not a bad option, but it's not its core use case so there are some rough edges. I'd argue that the best way to use pip is with pip install -e, but others in this thread have said that doesn't suit their workflow, which is fine. I don't know of any other really good options, though.

I think it would be good to see if we can ensure pip is useful for this use case as well; all I was pointing out was that people shouldn't assume that it "should" work right now, and that changing it to work might involve some trade-offs that we don't want to make, if it compromises the core functionality of installing packages.

Paul
On Sat, Nov 7, 2015 at 2:02 PM, Paul Moore
On 7 November 2015 at 01:26, Chris Barker - NOAA Federal wrote: So what IS supposed to be used in the development workflow? The new mythical build system?

I'd like to point out again that this is not just about development workflow. This is just as much about simply *installing* from a local git repo, or downloaded sources/sdist. The "pip install . should reinstall" discussion in https://github.com/pypa/pip/issues/536 is also pretty much the same argument.

Fair question. Unfortunately, the answer is honestly that there's no simple answer - pip is not a bad option, but it's not its core use case so there are some rough edges.

My impression is that right now pip's core use-case is not "installing", but "installing from PyPI (and similar repos)". There are a lot of rough edges around installing from anything on your own hard drive.
I think it would be good to see if we can ensure pip is useful for this use case as well, all I was pointing out was that people shouldn't assume that it "should" work right now, and that changing it to work might involve some trade-offs that we don't want to make, if it compromises the core functionality of installing packages.
It might be helpful to describe the actual trade-offs then, because as far as I can tell no one has actually described how this would either hurt another use-case or make pip internals much more complicated. Ralf
On November 7, 2015 at 8:56:02 AM, Ralf Gommers (ralf.gommers@gmail.com) wrote:
I'd like to point out again that this is not just about development workflow. This is just as much about simply *installing* from a local git repo, or downloaded sources/sdist.
The "pip install . should reinstall" discussion in https://github.com/pypa/pip/issues/536 is also pretty much the same argument.
I think that everyone on that ticket has agreed that ``pip install .`` (where . is any local path) should reinstall. I think the thing that is being asked for here, though, is for pip to use that directory as the build directory, rather than copying everything to a temporary directory and using that.

I'm hesitant to do that because it's going to add another slightly different way that things could be installed, and I'm trying to reduce those (and instead have two "paths" for installation, the normal one and the development one). IOW, I think in development ``-e`` is the right answer if you want to build and use the local directory. Otherwise you shouldn't expect it to modify your current directory or the tarball at all.

I do think we can make sure that specifying a build directory and instructing us not to clean it will function to have incremental builds, though.
My impression is that right now pip's core use-case is not "installing", but "installing from PyPi (and similar repos". There are a lot of rough edges around installing from anything on your own hard drive.
This is probably true, in the sense that the bulk of the time when people use pip, they are using it to install from a remote repository. There are rough edges for stuff on your own hard drive, but I think we can clean them up; we just need to figure out what the answer is for each of those rough cases.
On 7 November 2015 at 13:55, Ralf Gommers
I'd like to point out again that this is not just about development workflow. This is just as much about simply *installing* from a local git repo, or downloaded sources/sdist.
Possibly I'm misunderstanding here.
The "pip install . should reinstall" discussion in https://github.com/pypa/pip/issues/536 is also pretty much the same argument.
Well, that one is about pip reinstalling if you install from a local directory, and not skipping the install if the local directory version is the same as the installed version. As I noted there, I'm OK with this; it seems reasonable to me to say that if someone has a directory of files, they may have updated something but not (yet) bumped the version. The debate over there has gone on to whether we force reinstall for a local *file* (wheel or sdist), which I'm less comfortable with. But that's being covered over there.

The discussion *here* is, I thought, about skipping build steps when possible because you can reuse build artifacts. That's not "should pip do the install?", but rather "*how* should pip do the install?" Specifically, to reuse build artifacts it's necessary to *not* do what pip currently does for all (non-editable) installs, which is to isolate the build in a temporary directory and do a clean build.

That's a sensible debate to have, but it's very different from the issue you referenced. IMO, the discussions currently are complex enough that isolating independent concerns is crucial if anyone is to keep track. (It certainly is for me!)
My impression is that right now pip's core use-case is not "installing", but "installing from PyPi (and similar repos". There are a lot of rough edges around installing from anything on your own hard drive.
Not true. The rough edges are around installing things where (a) you don't want to rely on the invariant that name and version uniquely identify an installation (that's issue 536), and (b) you don't want to do a clean build, because building is complex, slow, or otherwise something you want to optimise (that's this discussion).

I routinely download wheels and use them to install. I also sometimes download sdists and install from them, although 99.99% of the time, I download them, build them into wheels and install them from wheels. It *always* works exactly as I'd expect. But if I'm doing development, I use -e. That seems to be the problem here: there are rough edges if you want a development workflow that doesn't rely on editable installs. I think that's what I already said :-)
I'd argue that the best way to use pip is with pip install -e, but others in this thread have said that doesn't suit their workflow, which is fine. I don't know of any other really good options, though.
I think it would be good to see if we can ensure pip is useful for this use case as well, all I was pointing out was that people shouldn't assume that it "should" work right now, and that changing it to work might involve some trade-offs that we don't want to make, if it compromises the core functionality of installing packages.
It might be helpful to describe the actual trade-offs then, because as far as I can tell no one has actually described how this would either hurt another use-case or make pip internals much more complicated.
1. (For issue 536, not this thread) Pip and users can't rely on the invariant that name and version uniquely identify a release. You could have version 1.2dev4 installed, and it may have come from your local working directory (with changes you made), or from a wheel that's on your local hard drive that you built last week, or from the release on PyPI you made last month. All 3 may behave differently. Also, wheel caching is based on name/version - it would need to be switched off in cases where name/version doesn't guarantee repeatable code.

2. (For here) Builds are not isolated from what's in the development directory. So if you have your sdist definition wrong, what you build locally may work, but when you release it it may fail. Obviously that can be fixed by proper development and testing practices, but pip is currently designed to isolate builds to protect against mistakes like this; we'd need to remove that protection for cases where we wanted to do in-place builds.

3. The logic inside pip for doing builds is already pretty tricky. Adding code to sometimes build in place and sometimes in a temporary directory is going to make it even more complex. That might not be a concern for end users, but it makes maintaining pip harder, and risks there being subtle bugs in the logic that could bite end users.

If you want specifics, I can't give them at the moment, because I don't know what the code to do the proposed in-place building would look like. I hope that helps. It's probably not as specific or explicit as you'd like, but to be fair, nor is the proposal. What we currently have on the table is "If 'pip (install/wheel) .' is supposed to become the standard way to build things, then it should probably build in-place by default." For my personal use cases, I don't actually agree with any of that, but my use cases are not even remotely like those of numpy developers, so I don't want to dismiss the requirement. But if it's to go anywhere, it needs to be better explained.
Just to be clear, *my* position (for projects simpler than numpy and friends) is:

1. The standard way to install should be "pip install <requirement or wheel>".
2. The standard way to build should be "pip wheel <sdist or directory>". The directory should be a clean checkout of something you plan to release, with a unique version number.
3. The standard way to develop should be "pip install -e ."
4. Builds (pip wheel) should always unpack to a temporary location and build there. When building from a directory, in effect build an sdist and unpack it to the temporary location.

I hear the message that for things like numpy these rules won't work. But I'm completely unclear on why. Sure, builds take ages unless done incrementally. That's what pip install -e does; I don't understand why that's not acceptable. If the discussion needs to go to the next level of detail, maybe that applies to the requirements as well as to the objections?

Paul

PS Alternatively, feel free to ignore my comments. I'm not likely to ever have the time to code any of the proposals being discussed here, but I won't block other pip developers either doing so or merging code, so my comments are not intended as anything more than input from someone who knows a bit about how pip is coded, how it's currently used, and what issues our users currently encounter. Seriously - I'm happy to say my piece and leave it at that if you prefer.
On Sat, Nov 7, 2015 at 3:57 PM, Paul Moore
On 7 November 2015 at 13:55, Ralf Gommers
wrote: On Sat, Nov 7, 2015 at 2:02 PM, Paul Moore
wrote: On 7 November 2015 at 01:26, Chris Barker - NOAA Federal
wrote: So what IS supposed to be used in the development workflow? The new mythical build system?
I'd like to point out again that this is not just about development workflow. This is just as much about simply *installing* from a local git repo, or downloaded sources/sdist.
Possibly I'm misunderstanding here.
I had an example above of installing into different venvs. Full rebuilds for that each time are very expensive. And this whole thread is basically about `pip install .`, not about in-place builds for development. As another example of why even for a single build/install it's helpful to just let the build system do what it wants to do instead of first copying stuff over, here are some timing results. This is for PyWavelets, which isn't that complicated a build (mostly pure Python, 1 Cython extension):

1. python setup.py install: 40 s

2. pip install . --upgrade --no-deps: 58 s
   # OK, (2) is slow due to using shutil, to be fixed to work like (3)

3. python setup.py sdist: 8 s
   pip install dist/PyWavelets-0.4.0.dev0+da1c6b4.tar.gz: 41 s
   # so total time for (3) will be 41 + 8 = 49 s

# and a better alternative to (1):
4. python setup.py bdist_wheel: 34 s
   pip install dist/PyWavelets-xxx.whl: 6 s
   # so total time for (4) will be 34 + 6 = 40 s

Not super-scientific, but the conclusion is clear: what pip does is a lot slower than what for me is the expected behavior. And note that without the Cython compile, the difference in timing will get even larger. That expected behavior is: (a) just ask the build system to spit out a wheel (without any magic); (b) install that wheel (always).
The "pip install . should reinstall" discussion in https://github.com/pypa/pip/issues/536 is also pretty much the same argument.
Well, that one is about pip reinstalling if you install from a local directory, and not skipping the install if the local directory version is the same as the installed version. As I noted there, I'm OK with this, it seems reasonable to me to say that if someone has a directory of files, they may have updated something but not (yet) bumped the version.
The debate over there has gone on to whether we force reinstall for a local *file* (wheel or sdist), which I'm less comfortable with. But that is being covered over there.
The discussion *here* is, I thought, about skipping build steps when possible because you can reuse build artifacts. That's not "should pip do the install?", but rather "*how* should pip do the install?" Specifically, to reuse build artifacts it's necessary to *not* do what pip currently does for all (non-editable) installs, which is to isolate the build in a temporary directory and do a clean build. That's a sensible debate to have, but it's very different from the issue you referenced.
IMO, the discussions currently are complex enough that isolating independent concerns is crucial if anyone is to keep track. (It certainly is for me!)
Agreed that the discussions are complex now. But imho they're mostly complex because the basic principles of what pip should be doing are not completely clear, at least to me. If it's "build a wheel, install the wheel" then a lot of things become simpler.
Fair question. Unfortunately, the answer is honestly that there's no
simple answer - pip is not a bad option, but it's not its core use case so there are some rough edges.
My impression is that right now pip's core use-case is not "installing", but "installing from PyPI (and similar repos)". There are a lot of rough edges around installing from anything on your own hard drive.
Not true. The rough edges are around installing things where (a) you don't want to rely on the invariant that name and version uniquely identify an installation (that's issue 536) and (b) where you don't want to do a clean build, because building is complex, slow, or otherwise something you want to optimise (that's this discussion).
I routinely download wheels and use them to install. I also sometimes download sdists and install from them, although 99.99% of the time, I download them, build them into wheels and install them from wheels. It *always* works exactly as I'd expect. But if I'm doing development, I use -e. That seems to be the problem here, there are rough edges if you want a development workflow that doesn't rely on editable installs. I think that's what I already said :-)
It always works as you expect because you're very familiar with how things work, I suspect. I honestly started working on docs/code to make people use `pip install .` and immediately ran into 3 issues (start of this thread). This build caching is #4. And that doesn't even count --upgrade (that was issue #0). There are a vast number of users that are used to `setup.py install`. They'll be downloading a released/dev version or do a git/hg clone, and run that `setup.py install` command. If we tell them to replace that with `pip install .`, then at the moment there are a lot of rough edges that they are going to run into. Now some of those rough edges are bugs, some are things like "does pip build from where you run it or in an isolated tmpdir". I'd like to get to the situation where:

- the bugs are fixed
- the behavior/performance is >= `setup.py install`
- with the difference then being some UI tweaks, like hiding the build log by default
I think it would be good to see if we can ensure pip is useful for this use case as well, all I was pointing out was that people shouldn't assume that it "should" work right now, and that changing it to work might involve some trade-offs that we don't want to make, if it compromises the core functionality of installing packages.
It might be helpful to describe the actual trade-offs then, because as far as I can tell no one has actually described how this would either hurt another use-case or make pip internals much more complicated.
2. (For here) Builds are not isolated from what's in the development directory. So if you have your sdist definition wrong, what you build locally may work, but when you release it it may fail. Obviously that can be fixed by proper development and testing practices, but pip is designed currently to isolate builds to protect against mistakes like this, we'd need to remove that protection for cases where we wanted to do in-place builds.
Now this is an actual development work feature/choice. "sdist definition wrong" may help developers who don't test installs via sdist in their CI. It doesn't really help end users directly.
3. The logic inside pip for doing builds is already pretty tricky. Adding code to sometimes build in place and sometimes in a temporary directory is going to make it even more complex. That might not be a concern for end users, but it makes maintaining pip harder, and risks there being subtle bugs in the logic that could bite end users. If you want specifics, I can't give them at the moment, because I don't know what the code to do the proposed in-place building would look like.
I hope that helps. It's probably not as specific or explicit as you'd like, but to be fair, nor is the proposal.
It does help, thanks. I don't think I can make the proposal much more concrete than "build a wheel, install a wheel (without magic)" though. At least without starting to implement that proposal.
What we currently have on the table is "If 'pip (install/wheel) .' is supposed to become the standard way to build things, then it should probably build in-place by default." For my personal use cases, I don't actually agree with any of that, but my use cases are not even remotely like those of numpy developers, so I don't want to dismiss the requirement. But if it's to go anywhere, it needs to be better explained.
Just to be clear, *my* position (for projects simpler than numpy and friends) is:
1. The standard way to install should be "pip install <requirement or wheel>".
2. The standard way to build should be "pip wheel <sdist or directory>". The directory should be a clean checkout of something you plan to release, with a unique version number.
3. The standard way to develop should be "pip install -e ."
Agree with all of those.
4. Builds (pip wheel) should always unpack to a temporary location and build there. When building from a directory, in effect build a sdist and unpack it to the temporary location.
Here we seem to disagree. Your only concrete argument for it so far is aimed at developers, and I think it (a) is an extra step that adds complexity to the implementation, and (b) is inherently slower. I hear the message that for things like numpy these rules won't work.
But I'm completely unclear on why. Sure, builds take ages unless done incrementally. That's what pip install -e does, I don't understand why that's not acceptable.
I hope my replies above make clear why -e isn't too relevant here.
If the discussion needs to go to the next level of detail, maybe that applies to the requirements as well as to the objections?
Maybe this isn't 100% correct because I'm not that familiar with pip internals yet, but I'll give it a try:
For `pip install
Paul
PS Alternatively, feel free to ignore my comments.
I won't, your detailed reply was quite helpful. Ralf
I'm not likely to ever have the time to code any of the proposals being discussed here, but I won't block other pip developers either doing so or merging code, so my comments are not intended as anything more than input from someone who knows a bit about how pip is coded, how it's currently used, and what issues our users currently encounter. Seriously - I'm happy to say my piece and leave it at that if you prefer.
On 7 November 2015 at 16:33, Ralf Gommers
I had an example above of installing into different venvs. Full rebuilds for that each time are very expensive.
Why doesn't wheel caching solve this problem? That's what it's *for*, surely? Paul
On November 7, 2015 at 5:45:24 PM, Paul Moore (p.f.moore@gmail.com) wrote:
On 7 November 2015 at 16:33, Ralf Gommers wrote:
I had an example above of installing into different venvs. Full rebuilds for that each time are very expensive.
Why doesn't wheel caching solve this problem? That's what it's *for*, surely?
I’m pretty sure we don’t cache wheels for local file paths.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 7 November 2015 at 22:46, Donald Stufft
On November 7, 2015 at 5:45:24 PM, Paul Moore (p.f.moore@gmail.com) wrote:
On 7 November 2015 at 16:33, Ralf Gommers wrote:
I had an example above of installing into different venvs. Full rebuilds for that each time are very expensive.
Why doesn't wheel caching solve this problem? That's what it's *for*, surely?
I’m pretty sure we don’t cache wheels for local file paths.
So is this an argument that we should? Paul
On November 7, 2015 at 5:46:53 PM, Paul Moore (p.f.moore@gmail.com) wrote:
On 7 November 2015 at 22:46, Donald Stufft wrote:
On November 7, 2015 at 5:45:24 PM, Paul Moore (p.f.moore@gmail.com) wrote:
On 7 November 2015 at 16:33, Ralf Gommers wrote:
I had an example above of installing into different venvs. Full rebuilds for that each time are very expensive.
Why doesn't wheel caching solve this problem? That's what it's *for*, surely?
I’m pretty sure we don’t cache wheels for local file paths.
So is this an argument that we should? Paul
Only if we think we can trust the version numbers to be unique from random paths on the file system.
On 7 November 2015 at 22:47, Donald Stufft
Only if we think we can trust the version numbers to be unique from random paths on the file system.
Precisely. And that's the sort of trade-off that Ralf was asking to be clarified. Here, the trade off is that if we *are* allowed to rely on the fact that name/version uniquely identifies the build, then we can optimise build times via wheel cacheing. If we can't make that assumption, we can't do the optimisation. The request here seems to be that we provide the best of both worlds - provide optimal builds *without* making the assumptions we use for the "install a released version" case. Paul
On November 7, 2015 at 6:07:47 PM, Paul Moore (p.f.moore@gmail.com) wrote:
On 7 November 2015 at 22:47, Donald Stufft wrote:
Only if we think we can trust the version numbers to be unique from random paths on the file system.
Precisely. And that's the sort of trade-off that Ralf was asking to be clarified. Here, the trade off is that if we *are* allowed to rely on the fact that name/version uniquely identifies the build, then we can optimise build times via wheel cacheing. If we can't make that assumption, we can't do the optimisation.
The request here seems to be that we provide the best of both worlds - provide optimal builds *without* making the assumptions we use for the "install a released version" case. Paul
Well, you can get the optimized builds by not copying the path into a temporary location when you do ``pip install .`` and just letting the build system handle whether or not it caches the build output between multiple runs. I don’t want to start doing this, because I want to make a different change that will make it harder (impossible?) to do that.

I want to reduce the “paths” that an installation can go down. Right now we have:

1. I have a wheel and pip installs it.
2. I have an sdist and pip turns it into a wheel and then pip installs it.
3. I have an sdist and pip installs it.
4. I have a directory and pip installs it.
5. I have a directory and pip installs it in editable mode.

The outcome of all of these types of installs is subtly different, and we’ve had a number of users regularly get confused when they act differently over the years. I do not think it’s possible to make (5) act like anything else because it is inherently different, however I think we can get to the point where 1-4 all act the exact same way, and I think the way to do it is to change these so instead it is like:

1. I have a wheel and pip installs it.
2. I have an sdist and pip turns it into a wheel and then pip installs it.
3. I have a directory and pip turns it into an sdist and then pip turns that sdist into a wheel and then pip installs it.
4. I have a directory and pip installs it in editable mode.

Essentially, this is removing two “different” types of installations: one where we install directly from an sdist (without ever going through a wheel) and one where we install directly from a path (without ever going through an sdist or a wheel). Omitting the whole editable mode from consideration, we get to a point where installs only ever go from an “Arbitrary Directory” to an sdist to a wheel to installation, and the only real difference is at what point in that process the item we’re trying to install already is.
Of course development/editable installs are always going to be weird because they are in-place.
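The proposed funnel can be modeled in a few lines of Python. This is only a toy sketch with made-up function names, not pip's actual internals: the point is that every non-editable install is normalized through the same directory -> sdist -> wheel -> install chain, regardless of where it starts.

```python
# Toy model of the proposed single install path (hypothetical names,
# not real pip code). Artifacts are plain dicts tagged with a "kind".

def build_sdist(directory):
    """Pretend to create an sdist from a source directory."""
    return {"kind": "sdist", "from": directory}

def build_wheel(sdist):
    """Pretend to build a wheel from an sdist."""
    return {"kind": "wheel", "from": sdist}

def install_wheel(wheel):
    """Pretend to install a wheel."""
    return {"kind": "installed", "from": wheel}

def install(artifact):
    """Normalize any input down the one path, then install the wheel."""
    if artifact["kind"] == "directory":
        artifact = build_sdist(artifact)
    if artifact["kind"] == "sdist":
        artifact = build_wheel(artifact)
    assert artifact["kind"] == "wheel"
    return install_wheel(artifact)

# A directory, an sdist, and a wheel all converge on the same final step.
for start in ({"kind": "directory"},
              {"kind": "sdist", "from": None},
              {"kind": "wheel", "from": None}):
    assert install(dict(start))["kind"] == "installed"
```

The only behavioral differences left are how far along the chain the input already is when pip sees it.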
On November 7, 2015 at 6:16:34 PM, Donald Stufft (donald@stufft.io) wrote:
I want to reduce the “paths” that an installation can go down.
I decided I’d make a little visual aid to help explain what I mean here (omitting development/editable installs because they are weird and will always be weird)! Here’s essentially the way that installs can happen right now: https://caremad.io/s/Ol1TuV6R9K/. Each of these types of installations acts subtly differently in ways that are not very obvious to most people. Here’s what I want it to be: https://caremad.io/s/uJYeVzBlQG/. In this way, no matter what a user is installing from (Wheel, Source Dist, Directory), the outcome will be the same and there won’t be subtly different behaviors based on what is being provided.
On Sun, Nov 8, 2015 at 12:38 AM, Donald Stufft
On November 7, 2015 at 6:16:34 PM, Donald Stufft (donald@stufft.io) wrote:
I want to reduce the “paths” that an installation can go down.
I decided I’d make a little visual aid to help explain what I mean here (omitting development/editable installs because they are weird and will always be weird)!
Here’s essentially the way that installs can happen right now https://caremad.io/s/Ol1TuV6R9K/. Each of these types of installations act subtly different in ways that are not very obvious to most people.
Here’s what I want it to be: https://caremad.io/s/uJYeVzBlQG/. In this way no matter what a user is installing from (Wheel, Source Dist, Directory) the outcome will be the same and there won’t be subtly different behaviors based on what is being provided.
Thanks, clear figures. Your final situation is definitely way better than what we have now. Here is what I proposed, in a picture: https://github.com/pypa/pip/pull/3219#issuecomment-154810578

Comparison:
- same number of arrows in the flowchart
- total path length in my proposal is one shorter
- my proposal requires one less build system interface to be specified (sdist)

Ralf
On Sat, Nov 7, 2015 at 3:16 PM, Donald Stufft
The outcome of all of these types of installs are subtly different and we’ve had a number of users regularly get confused when they act differently over the years. I do not think it’s possible to make (5) act like anything else because it is inherently different, however I think we can get to the point that 1-4 all act the exact same way. and I think the way to do it is to change these so instead it is like:
1. I have a wheel and pip installs it. 2. I have an sdist and pip turns it into a wheel and then pip installs it. 3. I have a directory and pip turns it into a sdist and then pip turns that sdist into a wheel and then pip installs it. 4. I have a directory and pip installs it in editable mode.
I wrote some more detailed comments on this idea in the reply I just posted to Paul's message, but briefly, the alternative way to approach this would be:

1. I have a wheel and pip installs it.
2. I have an sdist and pip unpacks it into a directory and builds a wheel from that directory and then pip installs it.
3. I have a directory and pip builds a wheel from that directory and then pip installs it.
4. I have a directory and pip installs it in editable mode.

This is actually simpler, because we've eliminated the "create an sdist" operation and replaced it with the far-more-trivial "unpack an sdist". And it isn't even a replacement, because your 2 and my 2 are actually identical when you look at what it means to turn an sdist into a wheel :-).

-n
--
Nathaniel J. Smith -- http://vorpus.org
On November 7, 2015 at 6:43:50 PM, Nathaniel Smith (njs@pobox.com) wrote:
On Sat, Nov 7, 2015 at 3:16 PM, Donald Stufft wrote: [...]
The outcome of all of these types of installs are subtly different and we’ve had a number of users regularly get confused when they act differently over the years. I do not think it’s possible to make (5) act like anything else because it is inherently different, however I think we can get to the point that 1-4 all act the exact same way. and I think the way to do it is to change these so instead it is like:
1. I have a wheel and pip installs it. 2. I have an sdist and pip turns it into a wheel and then pip installs it. 3. I have a directory and pip turns it into a sdist and then pip turns that sdist into a wheel and then pip installs it. 4. I have a directory and pip installs it in editable mode.
I wrote some more detailed comments on this idea in the reply I just posted to Paul's message, but briefly, the alternative way to approach this would be:
1. I have a wheel and pip installs it 2. I have an sdist and pip unpacks it into a directory and builds a wheel from that directory and then pip installs it. 3. I have a directory and pip builds a wheel from that directory and then pip installs it. 4. I have a directory and pip installs it in editable mode.
This is actually simpler, because we've eliminated the "create an sdist" operation and replaced it with the far-more-trivial "unpack an sdist". And it isn't even a replacement, because your 2 and my 2 are actually identical when you look at what it means to turn an sdist into a wheel :-).
The problem is that an sdist and a directory are not the same thing, even though they may trivially appear to be. A very common problem people run into right now is that they don’t adjust their MANIFEST.in so that some new file they’ve added gets included in the sdist. In the current system and your proposed system, if someone types ``pip install .`` that just silently works. Then they go “Ok great, my package works” and they create an sdist and send that off… except the sdist is broken because it’s missing that file they needed.

Since we’ve disabled the ability to delete + reupload files to PyPI, I get probably once or twice a week someone contacting me asking if I can let them re-upload a file because they created an sdist that was missing a file. A decent number of those told me that they had “tested” it by running ``pip install .`` or ``setup.py install`` into a fresh virtual environment and that it had worked.

It’s true that the MANIFEST.in system exacerbates this problem by being a pretty crummy and error-prone system to begin with, however the same thing is going to exist for any system where you have a path that builds an sdist (and may or may not include a file in that sdist) and a path that goes direct to wheel.

It might not only be files that didn’t get included because of a mistake, either. Some files might not get generated until sdist build time; for example, LXML generates .c files from Cython sources at sdist creation time and then builds a wheel from those .c files. They do this to prevent people from needing to have Cython available on a machine other than a development machine. In your proposed workflow their “build wheel” command needs to be able to deal with the fact that the .c files may or may not be available (and will need to figure out a way to indicate that Cython is a build dependency if they are not).
In my proposed workflow their wheel build command gets to be simpler: it only needs to deal with .c files, and their sdist command gets used to create the .c files while the sdist is being generated.
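The MANIFEST.in failure mode described above can be simulated in a toy example (hypothetical file names, not real setuptools behavior): the working directory contains a file the manifest omits, so a build straight from the directory succeeds while a build from the resulting sdist is broken.

```python
# Toy simulation of the "forgot to update MANIFEST.in" failure mode.
# File sets are modeled as plain Python sets of path strings.
working_dir = {"setup.py", "pkg/__init__.py", "pkg/data.json"}
manifest    = {"setup.py", "pkg/__init__.py"}   # pkg/data.json forgotten

def make_sdist(files, manifest):
    """An sdist only ships the files the manifest lists."""
    return files & manifest

def build_ok(files):
    """Pretend the build needs the data file to succeed."""
    return "pkg/data.json" in files

# Installing straight from the directory masks the mistake...
assert build_ok(working_dir)
# ...but the sdist that users download is silently broken.
assert not build_ok(make_sdist(working_dir, manifest))
```

Routing every directory install through an sdist, as proposed, makes the second line fail on the developer's own machine instead of on the end user's.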
On Sat, Nov 7, 2015 at 4:02 PM, Donald Stufft
On November 7, 2015 at 6:43:50 PM, Nathaniel Smith (njs@pobox.com) wrote:
On Sat, Nov 7, 2015 at 3:16 PM, Donald Stufft wrote: [...]
The outcome of all of these types of installs are subtly different and we’ve had a number of users regularly get confused when they act differently over the years. I do not think it’s possible to make (5) act like anything else because it is inherently different, however I think we can get to the point that 1-4 all act the exact same way. and I think the way to do it is to change these so instead it is like:
1. I have a wheel and pip installs it. 2. I have an sdist and pip turns it into a wheel and then pip installs it. 3. I have a directory and pip turns it into a sdist and then pip turns that sdist into a wheel and then pip installs it. 4. I have a directory and pip installs it in editable mode.
I wrote some more detailed comments on this idea in the reply I just posted to Paul's message, but briefly, the alternative way to approach this would be:
1. I have a wheel and pip installs it 2. I have an sdist and pip unpacks it into a directory and builds a wheel from that directory and then pip installs it. 3. I have a directory and pip builds a wheel from that directory and then pip installs it. 4. I have a directory and pip installs it in editable mode.
This is actually simpler, because we've eliminated the "create an sdist" operation and replaced it with the far-more-trivial "unpack an sdist". And it isn't even a replacement, because your 2 and my 2 are actually identical when you look at what it means to turn an sdist into a wheel :-).
The problem is that an sdist and a directory are not the same things even though they may trivially appear to be. A very common problem people run into right now is that they don’t adjust their MANIFEST.in so that some new file they’ve added gets included in the sdist. In the current system and your proposed system if someone types ``pip install .`` that just silently works. Then they go “Ok great, my package works” and they create a sdist and send that off… except the sdist is broken because it’s missing that file they needed.
Since we’ve disabled the ability to delete + reupload files to PyPI I get probably once or twice a week someone contacting me asking if I can let them re-upload a file because they created an sdist that was missing a file. A decent number of those told me that they had “tested” it by running ``pip install .`` or ``setup.py install`` into a fresh virtual environment and that it had worked.
It’s true that the MANIFEST.in system exacerbates this problem by being a pretty crummy and error prone system to begin with, however the same thing is going to exist for any system where you have a path that builds a sdist (and may or may not include a file in that sdist) and a path that goes direct to wheel.
It might not only be files that didn’t get to be included because of a mistake either. Some files might not get generated until sdist build time, something like LXML generates .c files from Cython sources at sdist creation time and then they build a Wheel from those .c files. They do this to prevent people from needing to have Cython available on a machine other than a development machine. In your proposed work flow their “build wheel” command needs to be able to deal with the fact that the .c files may or may not be available (and will need to figure out a way to indicate that Cython is a build dependency if they are not). In my proposed workflow their wheel build command gets to be simpler, it only needs to deal with .c files and their sdist command gets used to create the .c files while the sdist is being generated.
I'm not sure how to respond, because I sympathize and agree with all of these points, but I just think that the trade-offs are such that pip is the wrong place to try and fix this.

Even if pip always copies the source tree to a temp dir, or even builds an sdist and unpacks it to a temp dir, then this doesn't actually guarantee that the final distribution will work, because of the reasons I mentioned in my other email -- you can still forget to check things in, have random detritus in your working directory (orphaned .pyc files create all kinds of fun, since python will happily import them even if the corresponding .py file has been deleted), etc. Which isn't to say that it's hopeless to try and improve matters, but I don't think we should do so at the expense of adding otherwise unneeded complexity to the pip <-> project-build-system interface. ("Otherwise unneeded" because nothing else in pip cares about generating sdists.)

And just in general, your plan to improve matters has an ocean-boiling feel to it to me, because you're pushing against a huge weight of history and conventions that expect builds to happen inside the source tree and to re-use partial build results etc. Convincing people to lengthen their edit/compile/test cycle is almost always a losing proposition, no matter how good your reason is. If people just refuse to use pip in favor of setup.py install, then what have we really gained? Paving cow paths, remember...

So I think that on balance, the right place to tackle this problem is within the build system itself. Heck, bdist_wheel could probably today be modified to call sdist, unpack the resulting sdist, and then perform the resulting build, right? It'd be slow, but it'd work. And it would leave our options open for later, without enshrining this into the Standard Build System Interface that can't be changed without PEPs and elaborate transition plans.
And there are all sorts of strategies that a new build system could use to guarantee reliable sdists (e.g. using the same source-of-truth for locating files to build as it uses for locating files to include in the sdist!) without giving up on incremental rebuilds, so long as pip doesn't just rule out incremental builds entirely. LXML's build system can use the sdist->wheel strategy if they think it makes sense to them -- they don't need anything from pip to do that. -n -- Nathaniel J. Smith -- http://vorpus.org
On Sat, Nov 7, 2015 at 2:44 PM, Paul Moore
On 7 November 2015 at 16:33, Ralf Gommers
wrote: I had an example above of installing into different venvs. Full rebuilds for that each time are very expensive.
Why doesn't wheel caching solve this problem? That's what it's *for*, surely?
The wheel cache maps (name, version) -> wheel. If I hand you a source directory, you may not even be able to determine the (name, version) except via building a wheel (depending on the resolution to that other thread about egg_info/dist-info commands). And it's certainly not true in general that you can trust the (name, version) from a working directory to indicate anything meaningful -- e.g. every commit to pip mainline right now creates a new, different "pip 8.0.0.dev0". So what would you even use for your cache key? I don't see how wheel caching can really help here. -n -- Nathaniel J. Smith -- http://vorpus.org
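The cache-key problem Nathaniel describes can be shown in a few lines (this is a toy dict standing in for the wheel cache, not pip's actual implementation):

```python
# A cache keyed only on (name, version) cannot distinguish wheels built
# from different commits of the same development version.
cache = {}

def cache_key(name, version):
    return (name.lower(), version)

# Hypothetical wheels built from two different commits of pip mainline,
# both reporting version "8.0.0.dev0":
cache[cache_key("pip", "8.0.0.dev0")] = "wheel-from-commit-aaaa"
cache[cache_key("pip", "8.0.0.dev0")] = "wheel-from-commit-bbbb"

# Only one entry survives; a lookup would happily return a wheel built
# from the wrong commit.
assert len(cache) == 1
```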
On 7 November 2015 at 16:33, Ralf Gommers
Your only concrete argument for it so far is aimed at developers
I feel that there's some confusion over the classes of people involved here ("developers", "users", etc).

For me, the core user base for pip is people who use "pip install" to install *released* distributions of packages. For those people, name and version uniquely identifies a build, they often won't have a build environment installed, etc. These people *do* however sometimes download wheels manually and install them locally (the main example of this is Christoph Gohlke's builds, which are not published as a custom PyPI-style index, and so have to be downloaded and installed from a local directory).

The other important category of user is people developing those released distributions. They often want to do "pip install -e", they install their own package from a working directory where the code may change without a corresponding version change, they expect to build from source and want that build cycle to be fast. Historically, they have *not* used pip, they have used setup.py directly (or setup.py develop, or maybe custom build tools like bento). So pip is not optimised for their use cases.

Invocations like "pip install <requirement>" cater for the first category. Invocations like "pip install <local directory>" cater for the second (although currently, mostly by treating the local directory as an unpacked sdist, which as I say is not optimised for this use case). Invocations like "pip install <local file>" are in the grey area - but I'd argue that it's more often used by the first category of users, I can't think of a development workflow that would need it.

Regarding the point you made this comment about:
4. Builds (pip wheel) should always unpack to a temporary location and build there. When building from a directory, in effect build a sdist and unpack it to the temporary location.
I see building a wheel as a release activity. As such, it should produce a reproducible result, and so should not be affected by arbitrary state in the development directory. I don't know whether you consider "ensuring the wheels aren't wrong" as aimed at developers or at end users; it seems to me that both parties benefit.

Personally, I'm deeply uncomfortable about *ever* encountering, or producing (as a developer), sdists or wheels with the same version number but functional differences. I am OK with installing a development version (i.e., direct from a development directory into a site-packages, either as -e or as a normal install) where the version number doesn't change even though the code does, but for me the act of producing release artifacts (wheels and sdists) should freeze the version number. I've been bitten too often by confusion caused by trying to install something with the same version but different code, to want to see that happen. Paul
On Sat, Nov 7, 2015 at 4:03 PM, Paul Moore
I see building a wheel as a release activity. As such, it should produce a reproducible result, and so should not be affected by arbitrary state in the development directory. I don't know whether you consider "ensuring the wheels aren't wrong" as aimed at developers or at end users, it seems to me that both parties benefit.
Personally, I'm deeply uncomfortable about *ever* encountering, or producing (as a developer) sdists or wheels with the same version number but functional differences. I am OK with installing a development version (i.e., direct from a development directory into a site-packages, either as -e or as a normal install) where the version number doesn't change even though the code does, but for me the act of producing release artifacts (wheels and sdists) should freeze the version number.
The problem with this is that we want to get rid of "direct installs" entirely, and move to doing wheel-based installs always -- direct installs require that every build system has to know about every possible install configuration, and it's just not viable.

I think the way to approach this is to assume that 'pip install <whatever>' will always 100% of the time involve a wheel; the distinction is that sometimes that wheel is treated as a reliable artifact that can be cached etc., and sometimes it's treated as a temporary intermediate format that's immediately discarded.

(As a separate point I do think it would be good to encourage people to use + versions like 1.2+dev for VCS trees, or better yet 1.2+dev.<hash>, to emphasize that these are not real reliable version numbers. (Recall that PEP 440 defines + as introducing a "local version" that's explicitly somewhat unreliable, not allowed on index servers, etc.) But even pip itself doesn't follow this rule right now, so for the foreseeable future we'll have to assume that source directories have unreliable version numbers.) -n -- Nathaniel J. Smith -- http://vorpus.org
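The "+" local-version convention Nathaniel mentions can be illustrated with a deliberately simplified split on the separator (the full PEP 440 parsing and ordering rules live in the third-party `packaging` library; this sketch only shows the public/local distinction):

```python
# Split a PEP 440 version into its public version and local segment.
def split_local(version):
    public, _, local = version.partition("+")
    return public, local

# Two dev builds from different commits share the same public version:
a = split_local("1.2+dev.abc1234")
b = split_local("1.2+dev.def5678")
assert a[0] == b[0]   # identical public version "1.2"
assert a[1] != b[1]   # but the local segment records which tree it came from
```

The local segment is exactly the part that is not allowed on index servers, which is why it is a reasonable place to record a VCS hash.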
On Sun, Nov 8, 2015 at 1:03 AM, Paul Moore
On 7 November 2015 at 16:33, Ralf Gommers
wrote: Your only concrete argument for it so far is aimed at developers
I feel that there's some confusion over the classes of people involved here ("developers", "users", etc).
Good point. I meant your second category below.
For me, the core user base for pip is people who use "pip install" to install *released* distributions of packages. For those people, name and version uniquely identifies a build, they often won't have a build environment installed, etc. These people *do* however sometimes download wheels manually and install them locally (the main example of this is Christoph Gohlke's builds, which are not published as a custom PyPI-style index, and so have to be downloaded and installed from a local directory).
The other important category of user is people developing those released distributions. They often want to do "pip install -e", they install their own package from a working directory where the code may change without a corresponding version change, they expect to build from source and want that build cycle to be fast. Historically, they have *not* used pip, they have used setup.py directly (or setup.py develop, or maybe custom build tools like bento). So pip is not optimised for their use cases.
You only have two categories? I'm missing at least one important category: users who install things from a vcs or manually downloaded code (pre-release that's not on pypi for example). This category is probably a lot larger than that of developers.
Invocations like "pip install <requirement>" cater for the first category. Invocations like "pip install <local directory>" cater for the second (although currently, mostly by treating the local directory as an unpacked sdist, which as I say is not optimised for this use case). Invocations like "pip install <local file>" are in the grey area - but I'd argue that it's more often used by the first category of users, I can't think of a development workflow that would need it.
Regarding the point you made this comment about:
4. Builds (pip wheel) should always unpack to a temporary location and build there. When building from a directory, in effect build a sdist and unpack it to the temporary location.
I see building a wheel as a release activity.
It's not just that. My third category of users above is building wheels all the time. Often without even realizing it, if they use pip.
As such, it should produce a reproducible result, and so should not be affected by arbitrary state in the development directory. I don't know whether you consider "ensuring the wheels aren't wrong" as aimed at developers or at end users, it seems to me that both parties benefit.
Ensuring wheels aren't wrong is something that developers need to do. End users may benefit, but they benefit from many things developers do.

Personally, I'm deeply uncomfortable about *ever* encountering, or
producing (as a developer) sdists or wheels with the same version number but functional differences.
As soon as you produce a wheel with any compiled code inside, it matters with which compiler (and build flags, etc.) you build it. There are typically subtle, and sometimes very obvious, functional differences. Same for sdists, contents for example depend on the Cython version you have installed when you generate it.
I am OK with installing a development version (i.e., direct from a development directory into a site-packages, either as -e or as a normal install) where the version number doesn't change even though the code does, but for me the act of producing release artifacts (wheels and sdists) should freeze the version number. I've been bitten too often by confusion caused by trying to install something with the same version but different code, to want to see that happen.
"wheels and sdists" != "release artifacts"

I fully agree of course that we want things on PyPI (which are release artifacts) to have unique version numbers etc. But wheels and sdists are produced all the time, and only sometimes are they release artifacts. Ralf
On 8 November 2015 at 11:13, Ralf Gommers
You only have two categories? I'm missing at least one important category: users who install things from a vcs or manually downloaded code (pre-release that's not on pypi for example). This category is probably a lot larger than that of developers.
Hmm, I very occasionally will install the dev version of pip to get a fix I need. But I don't consider myself in that role as someone who pip should cater for - rather I expect to manage doing so myself, whether that's by editing the pip code to add a local version ID, or just by dealing with the odd edge cases. I find it hard to imagine that there are a significant number of users who install from development sources but who aren't developers (at least to the extent that testers of pre-release code are also developers).
As soon as you produce a wheel with any compiled code inside, it matters with which compiler (and build flags, etc.) you build it. There are typically subtle, and sometimes very obvious, functional differences. Same for sdists, contents for example depend on the Cython version you have installed when you generate it.
By "functional differences" I mean code changes, not build flag or compiler version changes. My rule is that if 2 wheels have the same version, they should be buildable from the exact same source code. (Ideally I'd like to say sdist here, but I don't want to get sucked into the distinction between source directory and sdist again, which is yet another completely independent debate).
"wheels and sdists" != "release artifacts"
Please explain. All you've done here is state that you don't agree with me, but given no reasons. Let me restate my comment, without using the disputed term: "for me the act of producing wheels and sdists should freeze the version number".

I find it hard to understand what the point of a version number *is*, if it's not to identify a specific set of source code that has been used to generate the wheels and sdists that are tagged with that version number. Temporary development builds can be in all sorts of inconsistent states, and one of those states might be "the code has been changed but the version number hasn't"; but as soon as you give a wheel or sdist to anyone else, you have a responsibility to identify the source code that the wheel/sdist came from, and the version number is how you do that.

Personally, I think the issue here is that there are a lot of people in the scientific community whom people outside that community would class as "developers", but who aren't considered that way from within the community. I tend to try to assign to these people the expertise and responsibilities that I would expect of a developer, not of an end user. If in fact they are a distinct class of user, then I think the scientific community needs to explain more clearly what expertise and responsibilities pip can expect of these users, and why treating them as developers isn't reasonable. Paul
On Sun, Nov 8, 2015 at 2:23 PM, Paul Moore
On 8 November 2015 at 11:13, Ralf Gommers
wrote: "wheels and sdists" != "release artifacts"
Please explain. All you've done here is state that you don't agree with me, but given no reasons.
Come on, I elaborated in the sentence right below it, which you cut out in your reply. Here it is again: "I fully agree of course that we want things on PyPI (which are release artifacts) to have unique version numbers etc. But wheels and sdists are produced all the time, and only sometimes are they release artifacts." Ralf
On 8 November 2015 at 13:34, Ralf Gommers
On Sun, Nov 8, 2015 at 2:23 PM, Paul Moore
wrote: On 8 November 2015 at 11:13, Ralf Gommers
wrote: "wheels and sdists" != "release artifacts"
Please explain. All you've done here is state that you don't agree with me, but given no reasons.
Come on, I elaborated in the sentence right below it. Which you cut out in your reply. Here it is again:
"I fully agree of course that we want things on PyPI (which are release artifacts) to have unique version numbers etc. But wheels and sdists are produced all the time, and only sometimes are they release artifacts."
Sorry, my mistake. I didn't see how this part related (and still don't). What are wheels and sdists if they are not "release artifacts"? Are we just quibbling about what the term "release artifact" means? If so, I'll revert to using "wheels and sdists" as I did in my response.

I thought it was obvious that wheels and sdists *are* the release artifacts in the process of producing Python packages. It doesn't matter where they are released *to*: it can be to PyPI, or a local server, or just to a wheelhouse or other directory on your PC that you keep for personal use only. Once they are created by you as anything other than a temporary file in a multi-step install process, they are "release artifacts" as I understand/mean the term. But terminology's not a big deal, as long as we understand each other. Paul
On Sun, Nov 8, 2015 at 2:45 PM, Paul Moore
On 8 November 2015 at 13:34, Ralf Gommers
wrote: On Sun, Nov 8, 2015 at 2:23 PM, Paul Moore
wrote: On 8 November 2015 at 11:13, Ralf Gommers
wrote:
"wheels and sdists" != "release artifacts"
Please explain. All you've done here is state that you don't agree with me, but given no reasons.
Come on, I elaborated in the sentence right below it. Which you cut out in your reply. Here it is again:
"I fully agree of course that we want things on PyPI (which are release artifacts) to have unique version numbers etc. But wheels and sdists are produced all the time, and only sometimes are they release artifacts."
Sorry, my mistake. I didn't see how this part related (and still don't). What are wheels and sdists if they are not "release artifacts"? Are we just quibbling about what the term "release artifact" means?
I'm not sure about that, I don't think it's just terminology (see below). They obviously can be release artifacts, but they don't have to be - that's what I meant with !=.
If so, I'll revert to using "wheels and sdists" as I did in my response. I thought it was obvious that wheels and sdists *are* the release artifacts in the process of producing Python packages. It doesn't matter where they are released *to*: it can be to PyPI, or a local server, or just to a wheelhouse or other directory on your PC that you keep for personal use only. Once they are created by you as anything other than a temporary file in a multi-step install process, they are "release artifacts" as I understand/mean the term.
To me there's a fairly fundamental difference between things that are actually released (by the release manager of a project usually, or maybe someone building a local wheelhouse) and things that are produced under the hood by pip. For someone typing `pip install .`, sdist/wheel is an implementation detail that is invisible to him/her and he/she shouldn't have to care about imho.
But terminology's not a big deal, as long as we understand each other.
Agreed. Ralf
On Sun, Nov 8, 2015 at 2:23 PM, Paul Moore
On 8 November 2015 at 11:13, Ralf Gommers wrote:
You only have two categories? I'm missing at least one important category: users who install things from a vcs or manually downloaded code (pre-release that's not on pypi for example). This category is probably a lot larger than that of developers.
Hmm, I very occasionally will install the dev version of pip to get a fix I need. But I don't consider myself in that role as someone who pip should cater for - rather I expect to manage doing so myself, whether that's by editing the pip code to add a local version ID, or just by dealing with the odd edge cases.
I find it hard to imagine that there are a significant number of users who install from development sources but who aren't developers
There are way more of those users than actual developers, I'm quite sure of that. See below for numbers.
(at least to the extent that testers of pre-release code are also developers). ...
That's not a very helpful way to look at it from my point of view. Those users may just want to check that their code still works, or they need a bugfix that's not in the released version, or ....
Personally, I think the issue here is that there are a lot of people in the scientific community who people outside that community would class as "developers",
Then I guess those "outside" would be web/app developers? For anyone developing a library or some other infrastructure to be used somewhere other than via a graphical or command line UI, I think the distinction I make (I'll elaborate below) will be clear.
but who aren't considered that way from within the community. I tend to try to assign to these people the expertise and responsibilities that I would expect of a developer, not of an end user. If in fact they are a distinct class of user, then I think the scientific community need to explain more clearly what expertise and responsibilities pip can expect of these users. And why treating them as developers isn't reasonable.
To give an example for Numpy:
- there are 5-10 active developers with commit rights
- there are 50-100 contributors who submit PRs
- there are O(1000) people who read the mailing list
- there are O(1 million) downloads/installs per year

Downloads/users are hard to count correctly, but there are at least 1000x more users than developers (this will be the case for many popular packages). Those users are often responsible for installing the package themselves. They aren't trained programmers, only know Python to the extent that they can get their work done, and they don't know much (if anything) about packaging, wheels, etc. All they know may be "I have to execute `python setup.py install`". Those are the users I'm concerned about. There's no reasonable way you can classify/treat them as developers, I think.

By the way, everything we discuss here has absolutely no impact on what you defined as "user" (the released-version-only PyPI user), while it's critical for what I defined as "the second kind of user". Ralf
On 8 November 2015 at 13:51, Ralf Gommers
To give an example for Numpy:
- there are 5-10 active developers with commit rights
- there are 50-100 contributors who submit PRs
- there are O(1000) people who read the mailing list
- there are O(1 million) downloads/installs per year
Downloads/users are hard to count correctly, but there are at least 1000x more users than developers (this will be the case for many popular packages). Those users are often responsible for installing the package themselves. They aren't trained programmers, only know Python to the extent that they can get their work done, and they don't know much (if anything) about packaging, wheels, etc. All they know may be "I have to execute `python setup.py install`". Those are the users I'm concerned about. There's no reasonable way you can classify/treat them as developers I think.
Agreed. But they (by which I assume you mean the 3rd and 4th categories in your list) should be using released versions, surely? So they should be using "pip install <requirement>", not downloading source and building it. Maybe they have to download and build right now, but that's precisely what we're trying to move away from, surely? Only the 100 or so developers and contributors (plus maybe as many more redistributors who build platform-specific wheels for platforms the project doesn't support directly) need to build from source; everyone else just installs released wheels. Paul
On Nov 8, 2015 5:23 AM, "Paul Moore"
[...]
I find it hard to imagine that there are a significant number of users who install from development sources but who aren't developers (at least to the extent that testers of pre-release code are also developers).
I'm not sure exactly what's at stake in this terminological/ontological debate, but it certainly is fairly common for developers to have conversations like "thanks for reporting that issue, I think it's fixed in master but can't reproduce myself so can you try 'pip install https://github.com/pydata/patsy/archive/master.zip' and report back whether it helps?" And often the person on the other end of this conversation knows absolutely nothing about python packaging, might have started learning python last week, etc. (Or maybe more to the point, you as a developer have absolutely no idea how much they know or what reasonable or unreasonable things they'll try if something goes wrong, and don't have time to have a long tutorial discussion to figure it out, so you need to be able to give instructions that are robust enough to work regardless of your interlocutor's actual knowledge level.)

Probably the absolute numbers aren't large, but when you're one of the 5-10 people maintaining a package that has complicated build/install/OS issues and is used by O(a million) people, many of whom are learning programming for the first time via your package and immediately using it for their real work, then these kinds of fuzzy middle cases take up a lot of time :-).

(Maybe this should become an official UI metric: "how easy is your tool to use interactively", "how easy is your tool to use in an automated way via shell script", "how easy is your tool to use in a semi-automated way where we've replaced the shell with a human being who first learned what the terminal was one week ago".) -n
On 8 November 2015 at 17:42, Nathaniel Smith
I'm not sure exactly what's at stake in this terminological/ontological debate, but it certainly is fairly common for developers to have conversations like "thanks for reporting that issue, I think it's fixed in master but can't reproduce myself so can you try 'pip install https://github.com/pydata/patsy/archive/master.zip' and report back whether it helps?"
Well, reviewing this scenario is probably much more useful than the endless terminology debates that I seem to be forever starting, so thanks for stopping me!

It seems to me that in this situation, optimising rebuild times probably isn't too important. The user is likely to only be building once or twice, so reusing object files from a previous build isn't likely to be a killer benefit.

However, if the user does as you asked here, they'd likely be pretty surprised (and it'd be a nasty situation for you to debug) if pip didn't install what the user asked. In all honesty, you could argue that this implies that pip should unconditionally install files specified on the command line, but I'd suggest that you should actually be asking the user to run 'pip install --ignore-installed https://github.com/pydata/patsy/archive/master.zip'. That avoids any risk that whatever the user has currently installed could mess things up, and is explicit that it's doing so (and equally, it's explicit that it'll overwrite the currently installed version, which the user might not want to do in his main environment).

Maybe you could argue that you want --ignore-installed to be the default (probably only when a file is specified rather than a requirement, assuming that distinguishing between a file and a requirement is practical). But if we did that, we'd still need a --dont-ignore-installed flag to restore the current behaviour. For example, because Christoph Gohlke's builds must be manually downloaded, I find it's quite common to download a wheel from his site and "pip install" it in a number of environments, with the meaning "only if it'd be an upgrade to whatever is currently installed".

So this specific example seems to me to be entirely covered by current pip behaviour. Paul
On Sun, Nov 8, 2015 at 12:52 PM, Paul Moore
On 8 November 2015 at 17:42, Nathaniel Smith
wrote: I'm not sure exactly what's at stake in this terminological/ontological debate, but it certainly is fairly common for developers to have conversations like "thanks for reporting that issue, I think it's fixed in master but can't reproduce myself so can you try 'pip install https://github.com/pydata/patsy/archive/master.zip' and report back whether it helps?"
Well, reviewing this scenario is probably much more useful than the endless terminology debates that I seem to be forever starting, so thanks for stopping me!
It seems to me that in this situation, optimising rebuild times probably isn't too important. The user is likely to only be building once or twice, so reusing object files from a previous build isn't likely to be a killer benefit.
Sure. And there's no reasonable way to optimize rebuild times anyway when the input is a remote URL -- it's only when the input is an on-disk directory that worrying about incremental builds even makes sense.
However, if the user does as you asked here, they'd likely be pretty surprised (and it'd be a nasty situation for you to debug) if pip didn't install what the user asked. In all honesty, you could argue that this implies that pip should unconditionally install files specified on the command line,
Yes, that is what I do argue :-)
but I'd suggest that you should actually be asking the user to run 'pip install --ignore-installed https://github.com/pydata/patsy/archive/master.zip'. That avoids any risk that whatever the user has currently installed could mess things up, and is explicit that it's doing so (and equally, it's explicit that it'll overwrite the currently installed version, which the user might not want to do in his main environment).
Problem 1 is that I don't actually know what --ignore-installed does. My first guess is that it would cause pip to skip uninstalling packages before upgrading them, resulting in an inconsistent/corrupt environment. (No, this doesn't sound like particularly useful behavior to me either, but most operations/switches in pip have semantics that are somewhat skewed from what I would consider intuitive, so who knows. It's right next to --no-deps in the --help output, and --no-deps is literally a "please give me an inconsistent/corrupt environment" switch, so it's totally plausible that --ignore-installed is intended for similarly ill-conceived uses.)

Or maybe it causes pip to pretend that the environment is totally empty when picking the set of (package, version) tuples to install, triggering upgrades of dependent packages? I would actually guess both of those before guessing that it means "please actually install the thing I asked you to install, but otherwise act normally", and as of right now I still actually have no idea which of these is correct (if any). AFAICT there aren't any docs -- maybe I'm just failing to search properly.

Problem 2 is that even if --ignore-installed does do the appropriate thing, and even if there is some way for me to figure this out, then it will still inevitably happen that 1 in 10 times I will forget to mention it, not notice that I have forgotten to mention it, and the user will not realize that nothing has happened, and just report that "they installed the new version but they still get the same error", and then I spend hours tearing out my hair trying to figure out why not (because I "know" that they actually installed the new version).
If you want to optimize your UI to frustrate people and waste their time, then a really impressively good technique is to include a special switch that usually does nothing, but every once in a while is necessary, and if you forget it then the computer and the user's mental model will get totally out of sync. Otherwise, though... :-/
Maybe you could argue that you want --ignore-installed to be the default (probably only when a file is specified rather than a requirement, assuming that distinguishing between a file and a requirement is practical). But if we did that, we'd still need a --dont-ignore-installed flag to restore the current behaviour. For example, because Christoph Gohlke's builds must be manually downloaded, I find it's quite common to download a wheel from his site and "pip install" it in a number of environments, with the meaning "only if it'd be an upgrade to whatever is currently installed".
Sure, I have no objection to a pip install --only-if-upgrade flag. -n -- Nathaniel J. Smith -- http://vorpus.org
On Sat, Nov 7, 2015 at 6:57 AM, Paul Moore
2. (For here) Builds are not isolated from what's in the development directory. So if you have your sdist definition wrong, what you build locally may work, but when you release it it may fail. Obviously that can be fixed by proper development and testing practices, but pip is designed currently to isolate builds to protect against mistakes like this, we'd need to remove that protection for cases where we wanted to do in-place builds.
I agree that it would be nice to make sdist generation more reliable and tested by default, but I don't think this quite works as a solution.

1) There's no guarantee that building an sdist from some dirty working tree will produce anything like what you'd have for a release sdist, or even a clean isolated build. (E.g. a very common mistake is adding a new file to the working directory and forgetting to run 'git/hg add'. To protect against this, you either have to have a build system that's smart enough to talk to the VCS when figuring out what files to include, or better yet you have to work from a clean checkout.) And as currently specified, these "isolated" build trees might even end up including partial build detritus from previous in-place builds, copied from the source directory into the temporary directory.

2) Sometimes people will want to download an sdist, unpack it, and then run 'pip install .' from it. In your proposal this would require first building a new sdist from the unpacked working tree. But there's no guarantee that you can generate an sdist from an sdist. None of the proposals for a new build system interface have contemplated adding an "sdist" command, and even if they did, then a clever sdist command might well fail, e.g. because it is only designed to build sdists from a checkout with full VCS metadata that it can use to figure out what files to include :-).

3) And anyway, it's pretty weird logically to include a mandatory sdist command inside an interface that 99% of the time will be working *from* an sdist :-).

The rule of thumb I've used for the build interface stuff so far is that it should be the minimal stuff that is needed to provide a convenient interface for people who just want to install packages, because the actual devs on a particular project can use whatever project/build-system-specific interfaces make sense for their workflow. And end-users don't build sdists.
But for the operations that pip does provide, like 'pip wheel' and 'pip install', they should be usable by devs, because devs will use them.
3. The logic inside pip for doing builds is already pretty tricky. Adding code to sometimes build in place and sometimes in a temporary directory is going to make it even more complex. That might not be a concern for end users, but it makes maintaining pip harder, and risks there being subtle bugs in the logic that could bite end users. If you want specifics, I can't give them at the moment, because I don't know what the code to do the proposed in-place building would look like.
Yeah, this is always a concern for any change. The tradeoff is that you get to delete the code for "downloading" unpacked directories into a temporary directory (which currently doesn't even use sdist -- it just blindly copies everything, including e.g. the full git history). And you get to skip specifying a standard build-an-sdist interface that pip and every build system backend would have to support and interoperate on.

Basically, AFAICT the logic should be:

1) Arrange for the existence of a build directory:
   - if building from a directory: great, we have one, use that
   - else, if building from a file/url: download it and unpack it, then use that
2) Do the build using the build directory.
3) If it's a temporary directory and the build succeeded, clean up.

(Possibly with some complications like providing options for people to specify a non-temporary directory to use for unpacking downloaded sdists.) It might need a bit of refactoring so that the "arrange for the existence of a build directory" step returns the chosen build directory instead of taking it as a parameter, as I assume it does now, but the intrinsic complexity doesn't seem very high.
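Sketched as (very rough) Python, with `download_and_unpack` and `run_build_system` as hypothetical stand-ins for pip's real machinery, the three steps might look like:

```python
import shutil
import tempfile
from pathlib import Path

def download_and_unpack(url, dest):
    # Stand-in for pip's real download/unpack machinery (hypothetical).
    raise NotImplementedError("network fetch elided in this sketch")

def run_build_system(build_dir):
    # Stand-in for invoking the build backend; here it just records
    # which directory the build ran in.
    return "wheel-built-in:%s" % build_dir

def obtain_build_dir(source):
    """Step 1: arrange for the existence of a build directory.

    Returns (build_dir, is_temporary). A named local directory is used
    in place; anything else is treated as a file/URL to be downloaded
    and unpacked into a fresh temporary directory.
    """
    if Path(source).is_dir():
        return Path(source), False            # build in place
    tmp = Path(tempfile.mkdtemp(prefix="pip-build-"))
    download_and_unpack(source, tmp)
    return tmp, True

def build(source):
    build_dir, is_temporary = obtain_build_dir(source)
    try:
        return run_build_system(build_dir)    # step 2
    finally:
        # Step 3: only clean up directories we created ourselves, so an
        # in-place build keeps its incremental state for next time.
        if is_temporary:
            shutil.rmtree(build_dir, ignore_errors=True)
```

The point of the sketch is the shape of the control flow: the caller no longer passes a build directory in; it gets one back, along with a flag saying whether cleanup is pip's responsibility.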
I hope that helps. It's probably not as specific or explicit as you'd like, but to be fair, nor is the proposal.
What we currently have on the table is "If 'pip (install/wheel) .' is supposed to become the standard way to build things, then it should probably build in-place by default." For my personal use cases, I don't actually agree with any of that, but my use cases are not even remotely like those of numpy developers, so I don't want to dismiss the requirement. But if it's to go anywhere, it needs to be better explained.
Just to be clear, *my* position (for projects simpler than numpy and friends) is:
1. The standard way to install should be "pip install <requirement or wheel>".
2. The standard way to build should be "pip wheel <sdist or directory>". The directory should be a clean checkout of something you plan to release, with a unique version number.
3. The standard way to develop should be "pip install -e .".
4. Builds (pip wheel) should always unpack to a temporary location and build there. When building from a directory, in effect build a sdist and unpack it to the temporary location.
I hear the message that for things like numpy these rules won't work. But I'm completely unclear on why. Sure, builds take ages unless done incrementally -- but that's what pip install -e does, and I don't understand why that's not acceptable.
To me this feels like mixing two orthogonal issues. 'pip install' and 'pip install -e' have different *semantics* -- one installs a snapshot into an environment, and one installs a symlink-like-thing into an environment -- and that's orthogonal to the question of whether you want to implement that using a "clean build" or not. (Also, it's totally reasonable to want partial builds in 'pip wheel': 'pip wheel .', get a compiler error, fix it, try again...)

Furthermore, I actually really dislike 'pip install -e' and am surprised to see so many people talking about it as if it were the obvious choice for all development :-). I understand it takes all kinds, etc., and I'm not arguing that it should be removed or anything (though I probably would if I thought it had any chance of getting consensus :-)). But from my point of view, 'pip install -e' is a weird, intrinsically-kinda-broken wart that provides no value outside of some rare use cases that most people never encounter.

I say "intrinsically-kinda-broken" because as soon as you do an editable install, the metadata in .egg/dist-info starts to drift out of sync with your actual source tree, so it necessarily makes the installed-package database less reliable, undermining a lot of the work that's being done to make installation and resolution more robust.

I'm also really unsure why people use it. I generally don't *want* to install code-under-development into a full-fledged virtualenv. I see lots of people who have a primary virtualenv that they use for day-to-day work, and they 'pip install -e' all the packages they work on into this environment, and then run into all kinds of weird problems because they're using a bunch of untested code together, or because they switch to a different branch of one package to check something and then forget about it when they context-switch to some other project, and everything is broken.
And then they try to install some other package, and it depends on foo >= 1.2, and they have an editable install of foo that claims to be 1.1 (because that was the last time the .egg-info was regenerated) but is really 1.3, and all kinds of weird things happen. And for packages with binary extensions it doesn't really work anyway, because you still have to rebuild every time (and you can get extra bonus forms of weird skew, where importing the package gets you the up-to-date version of some source files -- the .py ones -- combined with out-of-date versions of others -- the .pyx / .c / .cpp ones).

Even if I do decide that I want to install a non-official release into some virtualenv, I'd like to install a consistent snapshot that gets upgraded or uninstalled all together as an atomic unit. What I actually do when working on NumPy is use a little script [1] that does the equivalent of:

$ rm -rf ./.tmpdir
$ pip install . -d ./.tmpdir
$ cd ./.tmpdir
$ python -c 'import numpy; numpy.test()'

OTOH, for packages without binary extensions, I just run my tests or start a REPL from the root of my source dir, and that works fine without the hassle of creating and activating a virtualenv, or polluting my normal environment with untested code.

Also, 'pip install -e' intrinsically pollutes your source tree with build artifacts. I come from the build-system tradition that says build artifacts should all be shunted off to the side, leaving the actual source directories uncluttered: https://www.gnu.org/software/automake/manual/html_node/VPATH-Builds.html -- and I think a valid choice that build-system authors might want to make is to enforce the invariant that the build system never writes anywhere outside of $srcdir/build/ or similar. If we insist that editable installs are the only way to work, then we take this option away from projects.
So there simply isn't any problem I have where editable installs are the best solution, and I see them causing problems for people all the time. That said, there are two theoretical advantages I can see to editable installs:

1) Unlike starting an interpreter from the root of your source tree, they trigger the install of runtime dependencies. I solve this by just installing those into my working environment myself, but for projects with complex dependencies I guess 'install -e' might ATM be the most convenient way to get this set up. This isn't a very compelling argument, though, because one could trivially provide better support for just this ('pip install-dependencies .' or something) without bringing along the intrinsically tricky bits of editable installs.

2) For people working on complex projects that involve multiple pure-Python packages that are distributed separately but require coordinated changes in sync (maybe OpenStack is like this?), so that each round of the edit/test cycle involves edits to multiple projects, 'pip install -e' kinda solves a genuine problem, because it lets you assemble a single working environment that contains the editable versions of everything together. This seems like a genuine use case -- but it's what I meant at the top about editable installs seeming like a very specialized tool for rare cases, because very few people are working on meta-projects composed of multiple pure-Python sub-projects evolving in lock-step.

Anyway, like I said, I'm not trying to argue that 'pip install -e' should be deprecated -- I understand that many people love it for reasons I don't fully understand. My goal is just to help those who think 'pip install -e' is obviously the one-and-only way to do Python development understand my perspective, and why we might want to support other options as well.
I think the actual bottom line for pip as a project is: we all agree that sooner or later we have to move users away from running 'setup.py install'. Practically speaking, that's only going to happen if 'pip install' actually functions as a real replacement and doesn't create regressions in people's workflows -- and right now it does create such regressions. The thing that started this whole thread is that numpy had actually settled on going ahead and making the switch to requiring pip install, but then got derailed by issues like these...

-n

[1] https://github.com/numpy/numpy/blob/master/runtests.py

--
Nathaniel J. Smith -- http://vorpus.org
On Mon, Nov 2, 2015 at 5:57 PM, Nathaniel Smith
On Sun, Nov 1, 2015 at 3:16 PM, Ralf Gommers
wrote:

2. ``pip install .`` silences build output, which may make sense for some usecases, but for numpy it just sits there for minutes with no output after printing "Running setup.py install for numpy". Users will think it hangs and Ctrl-C it. https://github.com/pypa/pip/issues/2732
I tend to agree with the commentary there that for end users this is different but no worse than the current situation where we spit out pages of "errors" that don't mean anything :-). I posted a suggestion on that bug that might help with the apparent hanging problem.
For the record, this is now fixed in pip's "develop" branch and should be in the next release. For commands like 'setup.py install', pip now displays a spinner that ticks over whenever the underlying process prints to stdout/stderr. So if the underlying process hangs, then the spinner will stop (it's not just lying to you), but normally it works nicely.

https://github.com/pypa/pip/pull/3224

-n

--
Nathaniel J. Smith -- http://vorpus.org
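The idea behind that fix is small enough to show in a few lines: tick the spinner one frame per line of subprocess output, so silence becomes visible as a frozen spinner. A toy version of the principle (pip's real implementation is more elaborate; this is just an illustration):

```python
import subprocess
import sys

SPINNER = r"-\|/"

def run_with_spinner(cmd):
    """Run cmd, advancing a spinner one frame per line of child output.

    Because the spinner only moves when the child actually prints
    something, a genuinely hung build shows up as a frozen spinner
    instead of the spinner lying that progress is being made.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    ticks = 0
    for _line in proc.stdout:   # blocks while the child is silent
        sys.stderr.write("\r" + SPINNER[ticks % len(SPINNER)])
        sys.stderr.flush()
        ticks += 1
    sys.stderr.write("\rdone\n")
    return proc.wait(), ticks
```

For example, `run_with_spinner([sys.executable, "-c", "print('hi')"])` would tick the spinner once and then report the child's exit status.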
participants (6)

- Chris Barker - NOAA Federal
- Donald Stufft
- Nathaniel Smith
- Paul Moore
- Ralf Gommers
- Robert Collins