Should I use pip install numpy on Linux?

Dear all, I know that in Windows, we should use either Christoph's packages or Anaconda for MKL-optimized numpy. On Linux, the Fortran compiler issue is solved, so should I just use pip install numpy to get numpy with a reasonable BLAS library? Thanks! Shawn -- Yuxiang "Shawn" Wang Gerling Haptics Lab University of Virginia yw5aj@virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/

pip install numpy should work fine; whether it gives you a reasonable BLAS library will depend on whether you have the development files for a reasonable BLAS library installed, and whether numpy's build system is able to automatically locate them. Generally this means that if you're on a regular distribution and remember to install a decent BLAS -dev or -devel package, then you'll be fine. On Debian/Ubuntu, 'apt install libopenblas-dev' is probably enough to ensure something reasonable happens. Anaconda is also an option on linux if you want MKL (or openblas). -n -- Nathaniel J. Smith -- http://vorpus.org
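A quick way to check which BLAS a given install actually picked up (just a sketch; the exact sections printed vary with the numpy version):

```
import numpy as np

# Prints the BLAS/LAPACK information numpy's build system recorded at compile
# time. A build that found OpenBLAS shows an openblas/blas_opt section; a
# fallback build shows only the bundled lapack_lite.
np.show_config()
```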

Dear Nathaniel, Gotcha. That's very helpful. Thank you so much! Shawn
-- Yuxiang "Shawn" Wang Gerling Haptics Lab University of Virginia yw5aj@virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/

Hi,
I wrote a page on using pip with Debian / Ubuntu here: https://matthew-brett.github.io/pydagogue/installing_on_debian.html Cheers, Matthew

Does anyone know if there's been any movement with the PyPI folks on allowing Linux wheels to be uploaded? I know you can never be certain what's provided by the distro, but it seems like if Anaconda can solve the cross-distro-binary-distribution-of-compiled-python-extensions problem, there shouldn't be much technically different for Linux wheels. -Robert

Anaconda controls all of the dependent non-Python libraries, which are outside of the pip/PyPI ecosystem. Pip/wheel doesn't have that option until such libraries are packaged up for PyPI (e.g. pyopenblas). -- Oscar

Well, it's always possible to copy the dependencies like libopenblas.so into the wheel and fix up the RPATHs, similar to the way the Windows wheels work. I'm not sure if this is the right path for numpy or not, but it seems like something that would be suitable for some projects with compiled extensions. But it's categorically ruled out by the PyPI policy, IIUC. Perhaps this is OT for this thread, and I should ask on distutils-sig. -Robert
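A rough sketch of that RPATH fix-up, assuming the patchelf tool is available; the package layout and file names below are made up for illustration:

```
import subprocess

# Hypothetical layout: the wheel bundles libopenblas.so under mypkg/.libs/,
# next to the extension module that needs it. patchelf rewrites the RPATH so
# the runtime loader searches that directory relative to the module itself.
subprocess.check_call([
    "patchelf", "--set-rpath", "$ORIGIN/.libs",
    "build/lib.linux-x86_64-2.7/mypkg/_ext.so",
])
```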

On Fri, Jan 8, 2016 at 1:58 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
I'm not sure if this is the right path for numpy or not,
probably not -- AFAICT, the PyPA folks aren't interested in solving the problems we have in the scipy community -- we can tweak around the edges, but we won't get there without a commitment to really solve the issues -- and if pip did that, it would essentially be conda -- no one wants to re-implement conda.
Perhaps this is OT for this thread, and I should ask on distutils-sig.
there has been a lot of discussion of this issue on the distutils-sig in the last couple months (quiet lately). So yes, that's the place to go. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Hi,
Well - as the OP was implying, it really should not be too difficult. We (here in Berkeley) have discussed how to do this for Linux, including (Nathaniel mainly) what would be sensible for pypi to do, in terms of platform labels.

Both Anaconda and Canopy build on a base default Linux system so that the built binaries will work on many Linux systems.

At the moment, Linux wheels have the platform tag of either linux_i686 (32-bit) or linux_x86_64 - example filenames:

numpy-1.9.2-cp27-none-linux_i686.whl
numpy-1.9.2-cp27-none-linux_x86_64.whl

Obviously these platform tags are rather useless, because they don't tell you very much about whether this wheel will work on your own system.

If we started building Linux wheels on a base system like that of Anaconda or Canopy we might like another platform tag that tells you that this wheel is compatible with a wide range of systems. So the job of negotiating with distutils-sig is trying to find a good name for this base system - we thought that 'manylinux' was a good one - and then put in a pull request to pip to recognize 'manylinux' as compatible when running pip install from a range of Linux systems.

Cheers,

Matthew
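For reference, a small sketch of where today's linux_x86_64 tag originates; pip and wheel derive it more or less straight from distutils, with '-' and '.' mapped to '_':

```
import distutils.util

# On a 64-bit Linux box this prints 'linux-x86_64', which becomes the
# 'linux_x86_64' wheel platform tag. A 'manylinux' tag would have to be
# agreed on and special-cased rather than derived from this value.
print(distutils.util.get_platform())
```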

Both Anaconda and Canopy build on a base default Linux system so that the built binaries will work on many Linux systems.
I think the base linux system is CentOS 5, and from my experience, it seems like this approach has worked very well. Those packages are compatible with essentially all Linuxes that are more recent than CentOS 5 (which is ancient). I have not heard of anyone complaining that the packages they install through conda don't work on their CentOS 4 or Ubuntu 6.06 box. I assume Python / pip is probably used on a wider diversity of linux flavors than conda is, so I'm sure that binaries built on CentOS 5 won't work for absolutely _every_ linux user, but it does seem to cover the substantial majority of linux users.

Building redistributable linux binaries that work across a large number of distros and distro versions is definitely tricky. If you run ``python setup.py bdist_wheel`` on your Fedora Rawhide box, you can't really expect the wheel to work for too many other linux users. So given that, I can see why PyPI would want to be careful about accepting Linux wheels.

But it seems like, if they made the upload something like

```
twine upload numpy-1.9.2-cp27-none-linux_x86_64.whl \
    --yes-yes-i-know-this-is-dangerous-but-i-know-what-i'm-doing
```

then this would potentially let packages like numpy serve their linux users better without risking too much junk being uploaded to PyPI.

-Robert

I could well understand it if the PyPA folks thought that was a bad idea. There are so many Linux distributions, and therefore so many ways for this to go wrong, that the most likely outcome would be a relentless flood of people saying "ouch this doesn't work for me", and therefore decreased trust for pip / Linux in general. On the other hand, having a base build system and matching platform tag seems like it is well within reach, and, if we provided a proof of concept, I guess that PyPA would agree. Cheers, Matthew

On Fri, Jan 8, 2016 at 4:31 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
I think the base linux system is CentOS 5, and from my experience, it seems like this approach has worked very well. Those packages are compatible with all essentially all Linuxes that are more recent than CentOS 5 (which is ancient). I have not heard of anyone complaining that the packages they install through conda don't work on their CentOS 4 or Ubuntu 6.06 box.
Right. There's a small problem which is that the base linux system isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries that you're allowed to link to: ...", where that list is empirically chosen to include only stuff that really is installed on ~all linux machines and for which the ABI really has been stable in practice over multiple years and distros (so e.g. no OpenSSL). So the key next step is for someone to figure out and write down that list. Continuum and Enthought both have versions of it that we know are good... Does anyone know who maintains Anaconda's linux build environment?
I assume Python / pip is probably used on a wider diversity of linux flavors than conda is, so I'm sure that binaries built on CentOS 5 won't work for absolutely _every_ linux user, but it does seem to cover the substantial majority of linux users.
Building redistributable linux binaries that work across a large number of distros and distro versions is definitely tricky. If you run ``python setup.py bdist_wheel`` on your Fedora Rawhide box, you can't really expect the wheel to work for too many other linux users. So given that, I can see why PyPI would want to be careful about accepting Linux wheels.
But it seems like, if they make the upload something like
``` twine upload numpy-1.9.2-cp27-none-linux_x86_64.whl \ --yes-yes-i-know-this-is-dangerous-but-i-know-what-i'm-doing ```
that this would potentially be able to let packages like numpy serve their linux users better without risking too much junk being uploaded to PyPI.
That will never fly. But like Matthew says, I think we can probably get them to accept a PEP saying "here's a new well-specified platform tag that means that this wheel works on all linux systems that meet the following list of criteria: ...", and then allow that new platform tag onto PyPI. -n -- Nathaniel J. Smith -- http://vorpus.org

Doesn't building on CentOS 5 also mean using a quite old version of gcc? I've never tested this, but I've seen claims on the anaconda mailing list of ~25% slowdowns compared to building from source or using system packages, which were attributed to building using an older gcc that doesn't optimize as well as newer versions.

On Fri, Jan 8, 2016 at 7:17 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
Doesn't building on CentOS 5 also mean using a quite old version of gcc?
Yes. IIRC CentOS 5 ships with gcc 4.4, and you can bump that up to gcc 4.8 by using the Redhat Developer Toolset release (which is gcc + special backport libraries to let it generate RHEL5/CentOS5-compatible binaries). (I might have one or both of those version numbers slightly wrong.)
I've never tested this, but I've seen claims on the anaconda mailing list of ~25% slowdowns compared to building from source or using system packages, which was attributed to building using an older gcc that doesn't optimize as well as newer versions.
I'd be very surprised if that were a 25% slowdown in general, as opposed to a 25% slowdown on some particular inner loop that happened to neatly match some new feature in a new gcc (e.g. something where the new autovectorizer kicked in). But yeah, in general this is just an inevitable trade-off when it comes to distributing binaries: you're always going to pay some penalty for achieving broad compatibility as compared to artisanally hand-tuned binaries specialized for your machine's exact OS version, processor, etc. Not much to be done, really. At some point the baseline for compatibility will switch to "compile everything on CentOS 6", and that will be better but it will still be worse than binaries that target CentOS 7, and so on and so forth. -n -- Nathaniel J. Smith -- http://vorpus.org

Doesn't building on CentOS 5 also mean using a quite old version of gcc?
I have had pretty good luck using the (awesomely named) Holy Build Box <http://phusion.github.io/holy-build-box/>, which is a CentOS 5 docker image with a newer gcc version installed (but I guess the same old libc). I'm not 100% sure how it works, but it's quite nice. For example, you can use C++11 and still keep all the binary compatibility benefits of CentOS 5. -Robert

On Fri, Jan 8, 2016 at 7:41 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
I have had pretty good luck using the (awesomely named) Holy Build Box, which is a CentOS 5 docker image with a newer gcc version installed (but I guess the same old libc). I'm not 100% sure how it works, but it's quite nice. For example, you can use c++11 and still keep all the binary compatibility benefits of CentOS 5.
They say they have gcc 4.8: https://github.com/phusion/holy-build-box#isolated-build-environment-based-o... so I bet they're using RH's devtools gcc. This means that it works via the labor of some unsung programmers at RH who went through all the library changes between gcc 4.4 and 4.8, and put together a version of 4.8 that for every important symbol knows whether it's available in the old 4.4 libraries or not; for the ones that are, it dynamically links them; for the ones that aren't, it has a special static library that it pulls them out of. Like sewer cleaning, it's the kind of very impressive, incredibly valuable infrastructure work that I'm really glad someone does. Someone else who's not me... Continuum and Enthought both have a whole list of packages beyond glibc that are safe enough to link to, including a bunch of ones that would be big pains to statically link everywhere (libX11, etc.). That's the useful piece of information that goes beyond just CentOS5 + RH devtools + static linking -- can't tell if the "Holy Build Box" has anything like that. -n -- Nathaniel J. Smith -- http://vorpus.org

Continuum and Enthought both have a whole list of packages beyond glibc that are safe enough to link to, including a bunch of ones that would be big pains to statically link everywhere (libX11, etc.). That's the useful piece of information that goes beyond just CentOS5 + RH devtools + static linking -- can't tell of the "Holy Build Box" has anything like that.
Probably-crazy Idea: One could reconstruct that list by downloading all of https://repo.continuum.io/pkgs/free/linux-64/, untarring everything, and running `ldd` on all of the binaries and .so files. Can't be that hard... right? -Robert

On Jan 8, 2016 20:08, "Robert McGibbon" <rmcgibbo@gmail.com> wrote:
Probably-crazy Idea: One could reconstruct that list by downloading all of https://repo.continuum.io/pkgs/free/linux-64/, untarring everything, and running `ldd` on all of the binaries and .so files. Can't be that hard... right?

You'd have to be slightly careful to not count libraries that they ship themselves, and then use the resulting list to recreate a build environment that contains those libraries (using a dockerfile, I guess). But yeah, should work. Are you feeling inspired? :-) -n
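A rough sketch of that scan, assuming the packages have already been downloaded and untarred into one directory (the directory name and filtering are illustrative only):

```
import os
import re
import subprocess

# Walk an unpacked tree of conda packages, run ldd on every shared object,
# and collect the sonames that resolve outside the tree itself (libraries
# shipped inside the packages are skipped, since those are not external
# dependencies).
TREE = os.path.abspath("unpacked_pkgs")  # hypothetical staging directory
external = set()
for root, _, files in os.walk(TREE):
    for name in files:
        if not (name.endswith(".so") or ".so." in name):
            continue
        try:
            out = subprocess.check_output(["ldd", os.path.join(root, name)],
                                          stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError:
            continue  # not a dynamic object; ldd refused it
        for line in out.decode("utf-8", "replace").splitlines():
            m = re.match(r"\s*(\S+)\s+=>\s+(/\S+)", line)
            if m and not m.group(2).startswith(TREE):
                external.add(m.group(1))

print("\n".join(sorted(external)))
```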

I have over the years put in one gcc-specific optimization after the other, so yes, using an ancient version will make many parts significantly slower. Though that is not really a problem; updating a compiler is easy even without Red Hat's devtoolset.

At least as far as numpy is concerned, linux binaries should not be a very big problem. The only dependency where the version matters is glibc, which has updated the interfaces we use (in a backward compatible way) many times. But if we use an old enough baseline glibc (e.g. centos5 or ubuntu 10.04) we are fine, at reasonable performance cost, basically only slower memcpy.

Scipy on the other hand is a larger problem, as it contains C++ code. Linux systems are now transitioning to C++11, which is binary incompatible in parts with the old standard. A lot of testing is necessary to check whether we are affected. How does Anaconda deal with C++11?
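One way to see how old a glibc baseline a given extension actually needs is to list the versioned symbols it imports; a sketch (the .so path is just an example):

```
import re
import subprocess

# List the glibc symbol versions referenced by an extension module. If the
# highest version printed is old (say GLIBC_2.3.x), the binary should load
# on correspondingly old baseline systems.
out = subprocess.check_output(
    ["objdump", "-T", "numpy/core/multiarray.so"]).decode()
print(sorted(set(re.findall(r"GLIBC_[0-9.]+", out))))
```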

On Sat, Jan 9, 2016 at 12:12 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
Scipy on the other hand is a larger problem as it contains C++ code. Linux systems are now transitioning to C++11 which is binary incompatible in parts to the old standard. There a lot of testing is necessary to check if we are affected. How does Anaconda deal with C++11?
For Canopy packages, we use the RH devtoolset w/ gcc 4.8.X and statically link the C++ stdlib. It has worked so far for the few packages requiring C++11 and gcc > 4.4 (llvm/llvmlite/dynd), but that's not a solution I am a fan of myself, as the implications are not always very clear. David

On Jan 9, 2016 04:12, "Julian Taylor" <jtaylor.debian@googlemail.com> wrote:
I have over the years put in one gcc specific optimization after the other so yes using an ancient version will make many parts significantly slower. Though that is not really a problem, updating a compiler is easy even without redhats devtoolset.
At least as far as numpy is concerned linux binaries should not be a very big problem. The only dependency where the version matters is glibc which has updated its interfaces we use (in a backward compatible way) many times. But here if we use a old enough baseline glibc (e.g. centos5 or ubuntu 10.04) we are fine at reasonable performance costs, basically only slower memcpy.
Are you saying that it's easy to use, say, gcc 5.3's C compiler to produce binaries that will run on an out-of-the-box CentOS 5 install? I assumed that there'd be issues with things like new symbol versions in libgcc, not just glibc, but if not then that would be great...
Scipy on the other hand is a larger problem as it contains C++ code. Linux systems are now transitioning to C++11 which is binary incompatible in parts to the old standard. There a lot of testing is necessary to check if we are affected. How does Anaconda deal with C++11?
IIUC the situation with the C++ stdlib changes in gcc 5 is that old binaries will continue to work on new systems. The only thing that breaks is that if two libraries want to pass objects of the affected types back and forth (e.g. std::string), then either they both need to be compiled with the old abi or they both need to be compiled with the new abi. (And when using a new compiler it's still possible to choose the old abi with a #define; old compilers of course only support the old abi.) See: http://developerblog.redhat.com/2015/02/05/gcc5-and-the-c11-abi/ So the answer is that most python packages don't care, because even the ones written in C++ don't generally talk C++ across package boundaries, and for the ones that do care then the people making the binary packages will have to coordinate to use the same abi. And for local builds on modern systems that link against binary packages built using the old abi, people might have to use -D_GLIBCXX_USE_CXX11_ABI=0. -n
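For local builds against old-ABI binaries, that define can be passed through an ordinary setup script; a minimal sketch with hypothetical package and file names:

```
from distutils.core import setup, Extension

# Force the pre-GCC-5 libstdc++ ABI for a C++ extension that has to exchange
# std::string / std::list objects with libraries built against the old ABI.
ext = Extension(
    "mypkg._cxx_ext",                        # hypothetical extension
    sources=["mypkg/_cxx_ext.cpp"],
    extra_compile_args=["-D_GLIBCXX_USE_CXX11_ABI=0"],
)

setup(name="mypkg", version="0.1", ext_modules=[ext])
```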

Right. There's a small problem which is that the base linux system isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries that you're allowed to link to: ...", where that list is empirically chosen to include only stuff that really is installed on ~all linux machines and for which the ABI really has been stable in practice over multiple years and distros (so e.g. no OpenSSL).
So the key next step is for someone to figure out and write down that list. Continuum and Enthought both have versions of it that we know are good...
Does anyone know who maintains Anaconda's linux build environment?
I strongly suspect it was originally set up by Aaron Meurer. Who maintains it now that he is no longer at Continuum is a good question.

We build "generic Linux binaries" for Julia and I co-maintain that environment. It's using CentOS 5 (for now, until we hit an issue that is easiest to fix by upgrading the builders to CentOS 6 - I don't think we have any real users on CentOS 5 any more so no one would notice the bump), but we avoid using the Red Hat devtoolset. We tried it initially, but had issues due to the way they attempt to statically link libstdc++ and libgfortran. The -static-libgfortran GCC flag has been broken since GCC 4.5 because libgfortran now depends on libquadmath, and it requires rather messy modification of every link line to change -lgfortran to an explicit absolute path to libgfortran.a (and possibly also libquadmath.a) to really get libraries like BLAS and LAPACK statically linked. And if you want to statically link the GCC runtime libraries into a .so, your distribution's copies of libstdc++.a, libgfortran.a, etc. must have been built with -fPIC, which many are not. Some of the issues where this was discussed and worked out were:

https://github.com/JuliaLang/julia/issues/8397
https://github.com/JuliaLang/julia/issues/8433
https://github.com/JuliaLang/julia/pull/8442
https://github.com/JuliaLang/julia/pull/10043

So for Julia we wound up building our own copy of the latest GCC from source on CentOS 5 buildbots, and shipping private shared-library copies of libgfortran, libstdc++, libgcc_s, and a few others. This works on pretty much any glibc-using Linux distribution as new as or newer than the buildbot. It sadly doesn't work on musl distributions, ah well. Rust has been experimenting with truly static libraries/executables that statically link musl libc in addition to everything else; I'm not sure how practical that would be for numpy, scipy, etc.

If you go this route of building gcc from source and depending on private copies of the shared runtime libraries, it ends up important that you pick a newer version of gcc than any of your users. The reason is for packages that require compilation on the user's machine. If you build and distribute a set of shared libraries using GCC 4.8, but then someone on Ubuntu 15.10 tries to build something else (say, as an example, pyipopt, which needs a library that uses both C++ and Fortran) from source using GCC 5.2, the GCC 4.8 runtime libraries that you built against will get loaded first, and they won't contain the newer-ABI symbols needed by the GCC 5.2-built user library. Sometimes this can be fixed by just deleting the older bundled copies of the runtime libraries and relying on the users' system copies, but for example libgfortran typically needs to be manually installed first.
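The "explicit absolute path" trick described above boils down to asking the compiler where its static runtime libraries live; a sketch:

```
import subprocess

# Ask gfortran for the absolute paths of its static runtime libraries, then
# put those paths on the link line in place of -lgfortran / -lquadmath.
for lib in ("libgfortran.a", "libquadmath.a"):
    path = subprocess.check_output(
        ["gfortran", "-print-file-name=" + lib]).strip().decode()
    print(lib, "->", path)
```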

Right. There's a small problem which is that the base linux system isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries that you're allowed to link to: ...", where that list is empirically chosen to include only stuff that really is installed on ~all linux machines and for which the ABI really has been stable in practice over multiple years and distros (so e.g. no OpenSSL).
Does anyone know who maintains Anaconda's linux build environment?
I strongly suspect it was originally set up by Aaron Meurer. Who maintains it now that he is no longer at Continuum is a good question.
From looking at all of the external libraries referenced by binaries included in Anaconda and the conda repos, I am not confident that they have a totally strict policy here, or at least not one that is enforced by tooling. The sonames I listed here <https://mail.scipy.org/pipermail/numpy-discussion/2016-January/074602.html> cover all of the external dependencies used by the latest Anaconda release, but earlier releases and other conda-installable packages from the default channel are not so strict.
-Robert

The Anaconda "build environment" was set up by Ilan and me. Aaron helped to build packages while he was at Continuum but spent most of his time on the open-source conda project.

It is important to understand the difference between Anaconda and conda in this respect. Anaconda is a particular dependency foundation that Continuum supports and releases -- it will have a particular set of expected libraries on each platform (we work to keep this fairly limited, and on Linux currently use CentOS 5 as the base). conda is a general package manager that is open-source and that anyone can use to produce a set of consistent binaries (there can be many conda-based distributions). It solves the problem of multiple binary dependency chains generally, using the concept of "features". This concept of "features" allows you to create environments with different base dependencies. What packages you install when you "conda install" depends on which channels you are pointing to and which features you have "turned on" in the environment. It's a general system that extends the notions that were started by the PyPA.

-Travis
-- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io

On Fri, Jan 8, 2016 at 7:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
That will never fly. But like Matthew says, I think we can probably get them to accept a PEP saying "here's a new well-specified platform tag that means that this wheel works on all linux systems meet the following list of criteria: ...", and then allow that new platform tag onto PyPI.
The second step is a trick though -- how does pip know, when being run on a client, that the system meets those requirements? Do we put a bunch of code in that checks for those libs, etc.? (One possible check is sketched after this message.)

If we get all that worked out, we still haven't made any progress toward the non-standard libs that aren't python. This is the big "scipy problem" -- fortran, BLAS, hdf, ad infinitum.

I argued for years that we could build binary wheels that hold each of these, and other python packages could depend on them, but pypa never seemed to like that idea. In the end, if you did all this right, you'd have something like conda -- so why not just use conda?

All that being said, if you folks can get the core scipy stack set up to pip install on OS X, Windows, and Linux, that would be pretty nice -- so keep at it!

-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
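One possible client-side check, as a sketch of the idea rather than an agreed mechanism: ask the running C library for its version and compare it against whatever baseline the platform tag promises.

```
import ctypes

# glibc exports its own version string; a client-side installer could compare
# this against the minimum version a "compatible everywhere" tag guarantees.
libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
print(libc.gnu_get_libc_version())  # prints something like b'2.12' on CentOS 6
```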

The other half of the fun is how to deal with weird binary issues with libraries like libgeos_c, libhdf5 and such. You have to get all of the compile options right for your build of those libraries to get your build of GDAL and pyhdf working right. You also have packages like gdal and netcdf4 that have diamond dependencies -- not only are they built and linked against numpy binaries, some of their binary dependencies are built against numpy binaries as well. Joys!

I don't envy anybody that tries to take on the packaging problem in any language.

Ben Root

On Mon, Jan 11, 2016 at 6:25 PM, Chris Barker <chris.barker@noaa.gov> wrote:
The second step is a trick though -- how does pip know, when being run on a client, that the system meets those requirements? Do we put a bunch of code in that checks for those libs, etc???
You could make that option an opt-in at first, and gradually autodetect it for the main distros.
If we get all that worked out, we still haven't made any progress toward the non-standard libs that aren't python. This is the big "scipy problem" -- fortran, BLAS, hdf, ad infinitum.
I argued for years that we could build binary wheels that hold each of these, and other python packages could depend on them, but pypa never seemed to like that idea.
I don't think that's an accurate statement. There are issues to solve around this, but I did not encounter pushback, either on the ML or face to face w/ various PyPA members at PyCon, etc. There may be pushback on a particular detail, but making "pip install scipy" or "pip install matplotlib" a reality on every platform is something everybody agrees on.

Sure, everyone wants that. But when it gets deeper, they don't want to have a bunch of pip-installable binary wheels that are simply C libs re-packaged as a dependency. And then you have the problem of those being "binary wheel" dependencies, rather than "package" dependencies. E.g.: this particular build of Pillow depends on the libpng and libjpeg wheels, but the Pillow package, in general, does not. And you would have different dependencies on Windows, OS X, and Linux. pip/wheel simply was not designed for that, and I didn't get any warm and fuzzy feelings from distutils-sig that it ever would be. And again, then you are re-designing conda.

So the only way to do all that is to statically link all the dependent libs in with each binary wheel (or ship a dll). Somehow it always bothered me to ship the same lib with multiple packages -- isn't that why shared libs exist? In practice, maybe it doesn't matter, memory is cheap. But I also got sick of building the damn things! I like that Anaconda comes with libs someone else has built, and I can use them with my packages, too. And when it comes to ugly stuff like HDF and GDAL, I'm really happy someone else has built them!

Anyway -- carry on -- being able to pip install the scipy stack would be very nice.

-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

fuzzy feelings from dist-utils sig that the it ever would. And again, then you are re-designing conda. I agree that talking about such things on distutils-sig tends to elicit a certain amount of puzzled incomprehension, but I don't think it matters -- wheels already have everything you need to support this. E.g. wheels for different platforms can trivially have different dependencies. (They even go to some lengths to make sure this is possible for pure python packages where the same wheel can be used on multiple platforms.) When distributing a library-in-a-wheel then you need a little bit of hackishness to make sure the runtime loader can find the library, which conda would otherwise handle for you, but AFAICT it's like 10 lines of code or something. And in any case we have lots of users who don't use conda and are thus doomed to support both ecosystems regardless, so we might as well make the best of it :-). -n
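[For what it's worth, the "little bit of hackishness" can be as simple as loading the bundled library from the package's own __init__.py before anything links against it. A minimal sketch, assuming a hypothetical library-only wheel called "libfoo" -- the package name and .so filename are made up, and this is the Linux spelling; OS X and Windows need their own variants:

    # libfoo/__init__.py -- hypothetical wheel that ships only a shared library
    import ctypes
    import os

    _here = os.path.dirname(os.path.abspath(__file__))

    # Load the bundled library with RTLD_GLOBAL so that extension modules in
    # other packages that were linked against it can resolve its symbols,
    # without the dynamic loader having to know where the file lives.
    _lib = ctypes.CDLL(os.path.join(_here, "libfoo.so.1"),
                       mode=ctypes.RTLD_GLOBAL)

    def get_library_dir():
        """Directory containing the bundled library, for downstream builds."""
        return _here

A downstream package would then just "import libfoo" before importing its own extension module.]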

And in any case we have lots of users who don't use conda and are thus doomed to support both ecosystems regardless, so we might as well make the best of it :-).
Yes, this is the key. Conda is a great tool for a lot of users / use cases, but it's not for everyone.

Anyways, I think I've made a pretty good start on the tooling for a wheel ABI tag for an LSB-style base system that represents a common set of shared libraries and symbol versions provided by "many" linuxes (previously discussed by Nathaniel here: https://code.activestate.com/lists/python-distutils-sig/26272/).

-Robert

On Mon, Jan 11, 2016 at 5:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
I agree that talking about such things on distutils-sig tends to elicit a certain amount of puzzled incomprehension, but I don't think it matters -- wheels already have everything you need to support this.
well, that's what I figured -- and I started down that path a while back and got no support whatsoever (OK, some from Matthew Brett -- thanks!). But I know myself well enough to know I wasn't going to get the critical mass required to make it useful by myself, so I've moved on to an ecosystem that is doing most of the work already.

Also, you have the problem that there is one PyPi -- so where do you put your nifty wheels that depend on other binary wheels? you may need to fork every package you want to build :-(

But sure, let's get the core scipy stack pip-installable as much as possible -- and maybe some folks with more energy than me can move this all forward. Sorry to be a downer -- keep up the good work and energy!

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 13 January 2016 at 22:23, Chris Barker <chris.barker@noaa.gov> wrote:
On Mon, Jan 11, 2016 at 5:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
I agree that talking about such things on distutils-sig tends to elicit a certain amount of puzzled incomprehension, but I don't think it matters -- wheels already have everything you need to support this.
well, that's what I figured -- and I started down that path a while back and got no support whatsoever (OK, some from Matthew Brett -- thanks!). But I know myself well enough to know I wasn't going to get the critical mass required to make it useful by myself, so I've moved on to an ecosystem that is doing most of the work already.
I think the problem with discussing these things on distutils-sig is that the discussions are often very theoretical. In reality PyPA are waiting for people to adopt the infrastructure that they have created so far by uploading sets of binary wheels. Once that process really kicks off then as issues emerge there will be real specific problems to solve and a more concrete discussion of what changes are needed to wheel/pip/PyPI can emerge. The main exceptions to this are wheels for Linux and non-setuptools build dependencies for sdists so it's definitely good to pursue those problems and try to complete the basic infrastructure.
Also, you have the problem that there is one PyPi -- so where do you put your nifty wheels that depend on other binary wheels? you may need to fork every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some situation where this wheel to wheel dependency will occur that won't just be solved in some other way? -- Oscar

Also, you have the problem that there is one PyPi -- so where do you put your nifty wheels that depend on other binary wheels? you may need to fork every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some situation where this wheel to wheel dependency will occur that won't just be solved in some other way?
It's real -- at least during the whole bootstrapping period. Say I build a nifty hdf5 binary wheel -- I could probably just grab the name "libhdf5" on PyPI. So far so good. But the goal here would be to have netcdf and pytables and GDAL and who knows what else then link against that wheel. But those projects are all supported by different people, that all have their own distribution strategy. So where do I put binary wheels of each of those projects that depend on my libhdf5 wheel? _maybe_ I would put it out there, and it would all grow organically, but neither the culture nor the tooling support that approach now, so I'm not very confident you could gather adoption.

Even beyond the adoption period, sometimes you need to do stuff in more than one way -- look at the proliferation of channels on Anaconda.org. This is more likely to work if there is a good infrastructure for third parties to build and distribute the binaries -- e.g. Anaconda.org. Or the Linux distro model -- for the most part, the people developing a given library are not packaging it.

-CHB
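[As a concrete (and entirely hypothetical -- none of these wheels exist as such) sketch of what "depend on my libhdf5 wheel" could look like on the declaring side: a downstream project could require the library wheel only on the platforms where one is published, using PEP 508 environment markers, assuming a setuptools/pip recent enough to understand the "; marker" syntax:

    # setup.py for a hypothetical package linking against a binary "libhdf5"
    # wheel; all package names and markers here are illustrative only.
    from setuptools import setup

    setup(
        name="example-netcdf-bindings",
        version="0.1",
        packages=["example_netcdf_bindings"],
        install_requires=[
            # pull the bundled-library wheel only where such a build exists
            'libhdf5 ; platform_system == "Linux" or platform_system == "Darwin"',
            'libhdf5-win ; platform_system == "Windows"',
        ],
    )

The mechanics are there; the open question in this part of the thread is the social one -- who builds, owns, and maintains those library wheels.]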

On Thu, Jan 14, 2016 at 9:14 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Also, you have the problem that there is one PyPi -- so where do you put your nifty wheels that depend on other binary wheels? you may need to fork every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some situation where this wheel to wheel dependency will occur that won't just be solved in some other way?
It's real -- at least during the whole bootstrapping period. Say I build a nifty hdf5 binary wheel -- I could probably just grab the name "libhdf5" on PyPI. So far so good. But the goal here would be to have netcdf and pytables and GDAL and who knows what else then link against that wheel. But those projects are all supported by different people, that all have their own distribution strategy. So where do I put binary wheels of each of those projects that depend on my libhdf5 wheel? _maybe_ I would put it out there, and it would all grow organically, but neither the culture nor the tooling support that approach now, so I'm not very confident you could gather adoption.
I don't think there's a very large amount of cultural work - but some to be sure. We already have the following on OSX:

pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py

where all the wheels come from pypi. So, I don't think this is really outside our range, even if the problem is a little more difficult for Linux.
Even beyond the adoption period, sometimes you need to do stuff in more than one way -- look at the proliferation of channels on Anaconda.org.
This is more likely to work if there is a good infrastructure for third parties to build and distribute the binaries -- e.g. Anaconda.org.
I thought that Anaconda.org allows pypi channels as well? Matthew

On Thu, Jan 14, 2016 at 10:58 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
but neither the culture nor the tooling support that approach now, so I'm not very confident you could gather adoption.
I don't think there's a very large amount of cultural work - but some to be sure.
We already have the following on OSX:
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
where all the wheels come from pypi. So, I don't think this is really outside our range, even if the problem is a little more difficult for Linux.
I'm actually less concerned about the Linux issue, I think that can be solved reasonably with "manylinux" -- which would put us in a very similar position to OS-X, and pretty similar to Windows -- i.e. a common platform with the basic libs (libc, etc), but not a whole lot else. I'm concerned about all the other libs various packages depend on. It's not too bad if you want the core scipy stack -- a decent BLAS being the real challenge there, but there is enough coordination between numpy and scipy that at least efforts to solve that will be shared. But we're still stuck with delivering dependencies on libs along with each package -- usually statically linked.
I thought that Anaconda.org allows pypi channels as well?
I think you can host pip-compatible wheels, etc on anaconda.org -- though that may be deprecated... but anyway, I thought the goal here was a simple "pip install", which will point only to PyPi -- I don't think there is a way, ala conda, to add "channels" that will then get automatically searched by pip. But I may be wrong there. So here is the problem I want to solve:
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
last I checked, each of those is self-contained, except for python-level dependencies, most notably on numpy. So it doesn't help me solve my problem. For instance, I have my own C/C++ code that I'm wrapping that requires netcdf (https://github.com/NOAA-ORR-ERD/PyGnome), and another that requires image libs like libpng, libjpeg, etc. (https://github.com/NOAA-ORR-ERD/py_gd)

netcdf is not too ugly itself, but it depends on hdf5, libcurl, zlib (others?). So I have all these libs I need. As it happens, matplotlib probably has the image libs I need, and h5py has hdf5 (and libcurl? and zlib?). But even then, as far as I can tell, I need to build and provide these libs myself for my code. Which is a pain in the @%$ and then I'm shipping (and running) multiple copies of the same libs all over the place -- will there be compatibility issues? apparently not, but it still wastes the advantage of shared libs, and makes things a pain for all of us.

With conda, on the other hand, I get netcdf libs, hdf5 libs, libpng, libjpeg, libtiff, and I can build my stuff against those and depend on them -- saves me a lot of pain, and my users get a better system. Oh, and add on the GIS stuff: GDAL, etc. (seriously a pain to build), and I get a lot of value. And, many of these libs (GDAL, netcdf) come with nifty command line utilities -- I get those too.

So, pip+wheel _may_ be able to support all that, but AFAICT, no one is doing it. And it's really not going to support shipping cmake, and perl, and who knows what else I might need in my toolchain that's not python or "just" a library.

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Jan 15, 2016 10:23 AM, "Chris Barker" <chris.barker@noaa.gov> wrote:
So here is the problem I want to solve:
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
[...]

last I checked, each of those is self-contained, except for python-level dependencies, most notably on numpy. So it doesn't help me solve my problem. For instance, I have my own C/C++ code that I'm wrapping that requires netcdf (https://github.com/NOAA-ORR-ERD/PyGnome), and another that requires image libs like libpng, libjpeg, etc. (https://github.com/NOAA-ORR-ERD/py_gd)

netcdf is not too ugly itself, but it depends on hdf5, libcurl, zlib (others?). So I have all these libs I need. As it happens, matplotlib probably has the image libs I need, and h5py has hdf5 (and libcurl? and zlib?). But even then, as far as I can tell, I need to build and provide these libs myself for my code. Which is a pain in the @%$ and then I'm shipping (and running) multiple copies of the same libs all over the place -- will there be compatibility issues? apparently not, but it still wastes the advantage of shared libs, and makes things a pain for all of us.

With conda, on the other hand, I get netcdf libs, hdf5 libs, libpng, libjpeg, libtiff, and I can build my stuff against those and depend on them -- saves me a lot of pain, and my users get a better system.

Sure. Someone's already packaged those for conda, and no one has packaged them for pypi, so it makes sense that conda is more convenient for you. If someone does the work of packaging them for pypi, then that difference goes away. I'm not planning to do that work myself :-). My focus in these discussions around pip/pypi is selfishly focused on the needs of numpy. pip/pypi is clearly the weakest link in the space of packaging and distribution systems that our users care about, so improvements there raise the minimum baseline we can assume. But if/when we sort out the hard problems blocking numpy wheels (Linux issues, windows issues, etc.) then I suspect that we'll start seeing people packaging up those dependencies that you're worrying about and putting them on pypi, just because there won't be any big road blocks anymore to doing so.

-n

On Fri, Jan 15, 2016 at 1:21 PM, Nathaniel Smith <njs@pobox.com> wrote:
Sure. Someone's already packaged those for conda, and no one has packaged them for pypi, so it makes sense that conda is more convenient for you. If someone does the work of packaging them for pypi, then that difference goes away. I'm not planning to do that work myself :-). My focus in these discussions around pip/pypi is selfishly focused on the needs of numpy. pip/pypi is clearly the weakest link in the space of packaging and distribution systems that our users care about, so improvements there raise the minimum baseline we can assume. But if/when we sort out the hard problems blocking numpy wheels (Linux issues, windows issues, etc.) then I suspect that we'll start seeing people packaging up those dependencies that you're worrying about and putting them on pypi, just because there won't be any big road blocks anymore to doing so.
I still submit that this is not the best use of time. Conda *already* solves the problem. My sadness is that people keep working to create an ultimately inferior solution rather than just help make a better solution more accessible. People mistakenly believe that wheels and conda packages are equivalent. They are not. If they were we would not have created conda. We could not do what was necessary with wheels, and contorting wheels to become conda packages was and still is a lot more work. Now, obviously, it's just code and you can certainly spend effort and time to migrate wheels so that they function equivalently to conda packages --- but what is the point, really? Why don't we work together to make the open-source conda project and open-source conda packages more universally accessible?

The other very real downside is that these efforts to promote numpy as wheels further encourages people to not use the better solution that already exists in conda. I have to deal with people that *think* pip will solve all their problems all the time. It causes a lot of difficulty when they end up with work-around after work-around that is actually all solved already with conda. It's a weird situation to be in. I'm really baffled by the resistance of this community to just help make conda *the* solution for the scientific python community.

I think it would be better to spend time:

1) helping smooth out the pip/conda divide. There are many ways to do this that have my full support:
   * making sure pip install conda works well
   * creating "shadow packages" in conda for things that pip has installed
   * making it possible for pip to install conda packages directly (and ignore the extra features that conda makes possible that pip does not support). A pull-request to pip that did that would be far more useful than trying to cram things into the wheel concept.

2) creating community conda packages to alleviate whatever concerns might exist about the "control" of the packages that people can install with conda.
   * Continuum has volunteered resources to Numfocus so that it can be the governing body of what goes into a "pycommunity" channel for conda that will be as easy to get as a "conda install pycommunity"

I have not yet heard any really valid reason why this community should not just adopt conda packages and encourage others to do the same. The only thing I have heard are "chicken-and-egg" stories that come down to "we want people to be able to use pip." So, good, then let's make it so that pip can install conda packages and that conda packages with certain restrictions can be hosted on pypi or anywhere else that you have an "index". At least if there were valid reasons they could be addressed. But, this head-in-the-sand attitude towards a viable technology that is freely available is really puzzling to me.

There are millions of downloads of Anaconda and many millions of downloads of conda packages each year. That is just with one company doing it. There could be many millions more with other companies and organizations hosting conda packages and indexes. The conda user-base is already very large. A great benefit to the Python ecosystem would be to allow pip users and conda users to share each other's work --- rather than to spend time reproducing work that is already done and freely available.

-Travis
-- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io

On Fri, Jan 15, 2016 at 11:56 AM, Travis Oliphant <travis@continuum.io> wrote:
I still submit that this is not the best use of time. Conda *already*
solves the problem. My sadness is that people keep working to create an ultimately inferior solution rather than just help make a better solution more accessible. People mistakenly believe that wheels and conda packages are equivalent. They are not. If they were we would not have created conda. We could not do what was necessary with wheels and contorting wheels to become conda packages was and still is a lot more work. Now, obviously, it's just code and you can certainly spend effort and time to migrate wheels so that they function equivalently to conda packages --- but what is the point, really?
Why don't we work together to make the open-source conda project and open-source conda packages more universally accessible?

The factors that motivate my interest in making wheels for Linux (i.e. the proposed manylinux tag) work on PyPI are:

- All (new) Python installations come with pip. As a package author writing documentation, I count on users having pip installed, but I can't count on conda.
- I would like to see Linux have feature parity with OS X and Windows with respect to pip and PyPI.
- I want the PyPA tools like pip to be as good as possible.
- I'm confident that the manylinux proposal will work, and it's very straightforward.

-Robert

On Fri, Jan 15, 2016 at 11:56 AM, Travis Oliphant <travis@continuum.io> wrote:
[...]
I still submit that this is not the best use of time. Conda *already* solves the problem. My sadness is that people keep working to create an ultimately inferior solution rather than just help make a better solution more accessible. People mistakenly believe that wheels and conda packages are equivalent. They are not. If they were we would not have created conda. We could not do what was necessary with wheels and contorting wheels to become conda packages was and still is a lot more work. Now, obviously, it's just code and you can certainly spend effort and time to migrate wheels so that they function equivalently to conda packages --- but what is the point, really?
Sure, conda definitely solves problems that wheels don't. And I recommend Anaconda to people all the time :-) But let me comment specifically on the topic of numpy-and-pip. (I don't want to be all "this is off-topic!" because there isn't really a better list of general scientific-python-ecosystem discussion, but the part of the conda-and-pip discussion that actually impacts on numpy development, or where the numpy maintainers can affect anything, is very small, and this is numpy-discussion, so...)

Last month, numpy had ~740,000 downloads from PyPI, and there are probably hundreds of third-party projects that distribute via PyPI and depend on numpy. So concretely our options as a project are:

1) drop support for pip/PyPI and abandon those users
2) continue to support those users fairly poorly, and at substantial ongoing cost
3) spend some resources now to make pip/pypi work better, so we can support them better and at lower ongoing cost

Option 1 would require overwhelming consensus of the community, which for better or worse is presumably not going to happen while substantial portions of that community are still using pip/PyPI. If folks interested in pushing things forward can convince them to switch to conda instead then the calculation changes, but that's just not the kind of thing that numpy-the-project can do or not do. So between the possible options, I've been spending some time trying to drag pip/PyPI into working better for us because I like (3) better than (2). It's not a referendum on anything else. I assume that others have similar motives, though I won't try speaking for them.

I think beyond that there are also (currently) some unique benefits to supporting pip/PyPI, like the social/political benefits of not forking away from the broader Python community, and the fact that pip is currently the only functioning way we have of distributing numpy prereleases to users for testing. (The fine-grained numpy ABI tracking in conda is neat, but for this particular use case it's actually a bit of a problem :-).) But it doesn't much matter either way, because we can't/won't just abandon all those users regardless.

-n

-- Nathaniel J. Smith -- http://vorpus.org

Last month, numpy had ~740,000 downloads from PyPI,
Hm, given that Windows and Linux wheels have not been available, then that's mostly source installs anyway. Or failed installs -- no shortage of folks trying to pip install numpy on Windows and then having questions about why it doesn't work. Unfortunately, there is no way to know if pip downloads are successful, or if people pip install Numpy, then find out they need some other non-pip-installable packages, and go find another system.
and there are probably hundreds of third-party projects that distribute via PyPI and depend on numpy.
I'm not so sure -- see above -- as pip install has not been reliable for Numpy for ages, I doubt it. Not that they aren't there, but I doubt it's the primary distribution mechanism. There's been an explosion in the use of conda, and there have been multiple other options for ages: Canopy, python(x,y), Gohlke's builds, etc. So at this point, I think the only people using pip are folks that are set up to build -- mostly Linux. (Though Matthew's efforts with the Mac wheels may have created a different story on the Mac.)
So concretely our options as a project are: 1) drop support for pip/PyPI and abandon those users
There is no one to abandon -- except the Mac users -- we haven't supported them yet.
2) continue to support those users fairly poorly, and at substantial ongoing cost
I'm curious what the cost is for this poor support -- throw the source up on PyPi, and we're done. The cost comes in when trying to build binaries...
Option 1 would require overwhelming consensus of the community, which for better or worse is presumably not going to happen while substantial portions of that community are still using pip/PyPI.
Are they? Which community are we talking about? The community I'd like to target are web developers that aren't doing what they think of as "scientific" applications, but could use a little of the SciPy stack. These folks are committed to pip, and are very reluctant to introduce a difficult dependency. Binary wheels would help these folks, but that is not a community that exists yet (or it's small, anyway). All that being said, I'd be happy to see binary wheels for the core SciPy stack on PyPi. It would be nice for people to be able to do a bit with Numpy or pandas, or MPL, without having to jump ship to a whole new way of doing things. But we should be realistic about how far it can go.
If folks interested in pushing things forward can convince them to switch to conda instead then the calculation changes, but that's just not the kind of thing that numpy-the-project can do or not do.
We can't convince anybody, but we can decide where to expend our efforts. -CHB

On Tue, Jan 19, 2016 at 5:57 PM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
2) continue to support those users fairly poorly, and at substantial ongoing cost
I'm curious what the cost is for this poor support -- throw the source up on PyPi, and we're done. The cost comes in when trying to build binaries...
I'm sure Nathaniel means the cost to users of failed installs and of numpy losing users because of that, not the cost of building binaries.
Option 1 would require overwhelming consensus of the community, which
for better or worse is presumably not going to happen while substantial portions of that community are still using pip/PyPI.
Are they? Which community are we talking about? The community I'd like to target are web developers that aren't doing what they think of as "scientific" applications, but could use a little of the SciPy stack. These folks are committed to pip, and are very reluctant to introduce a difficult dependency. Binary wheels would help these folks, but that is not a community that exists yet (or it's small, anyway)
All that being said, I'd be happy to see binary wheels for the core SciPy stack on PyPi. It would be nice for people to be able to do a bit with Numpy or pandas, or MPL, without having to jump ship to a whole new way of doing things.
This is indeed exactly why we need binary wheels. Efforts to provide those will not change our strong recommendation to our users that they're better off using a scientific Python distribution. Ralf

Hi all,

Just as a heads up: Nathaniel and I wrote a draft PEP on binary linux wheels that is now being discussed on distutils-sig, so you can check that out and participate in the conversation if you're interested.

- PEP on python.org: https://www.python.org/dev/peps/pep-0513/
- PEP on github with some typos fixed: https://github.com/manylinux/manylinux/blob/master/pep-513.rst
- Email archive: https://mail.python.org/pipermail/distutils-sig/2016-January/027997.html

-Robert
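[For readers who don't want to wade into the PEP: the core compatibility check it describes boils down to "is the glibc on this machine at least as new as the old baseline the wheels were built against?". A rough sketch of that check, adapted from the approach discussed in the PEP rather than copied from it:

    # Ask glibc itself for its version via ctypes; a "manylinux1"-style wheel
    # roughly requires glibc 2.5 or newer.
    import ctypes

    def glibc_version():
        """Return (major, minor) of the C library, or None if it isn't glibc."""
        try:
            libc = ctypes.CDLL(None)
            gnu_get_libc_version = libc.gnu_get_libc_version
        except (OSError, AttributeError, TypeError):
            return None  # not glibc (musl, Windows, OS X, ...)
        gnu_get_libc_version.restype = ctypes.c_char_p
        version_str = gnu_get_libc_version()
        if not isinstance(version_str, str):
            version_str = version_str.decode("ascii")
        major, minor = (int(x) for x in version_str.split(".")[:2])
        return major, minor

    def have_compatible_glibc(required=(2, 5)):
        found = glibc_version()
        return found is not None and found >= required

    print(have_compatible_glibc())]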

On Fri, Jan 15, 2016 at 11:21 AM, Nathaniel Smith <njs@pobox.com> wrote:
Sure. Someone's already packaged those for conda, and no one has packaged them for pypi, so it makes sense that conda is more convenient for you. If someone does the work of packaging them for pypi, then that difference goes away.
This is what I meant by "cultural" issues :-)

but pypi has been an option for Windows and OS-X for ages and those platforms are the bigger problem anyway -- and no one has done it for those platforms -- Linux is not the issue here. I really did try to get that effort started a while back -- I got zero support, nothing, nada. Which doesn't mean I couldn't have gone ahead and done more myself, but I only have so much time, and I wanted to work on my actual problems, and if it hadn't gained any support, it would have been a big waste. Maybe the world is ready for it now...

But besides the cultural/community/critical mass issues, pip+wheel isn't very well set up to support these use cases anyway -- so we have a 50% solution now, and if we did this nifty binary-wheels-of-libs thing we'd have a 90% solution -- and still struggle with the other 10%, until we re-invented conda.

Anyway -- not trying to be a drag here -- just telling my story. As for the manylinux idea -- that's a great one -- hope we can get that working.

-CHB

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 15, 2016 at 12:09 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Fri, Jan 15, 2016 at 11:21 AM, Nathaniel Smith <njs@pobox.com> wrote:
Sure. Someone's already packaged those for conda, and no one has packaged them for pypi, so it makes sense that conda is more convenient for you. If someone does the work of packaging them for pypi, then that difference goes away.
This is what I meant by "cultural" issues :-)
but pypi has been an option for Windows and OS-X for ages and those platforms are the bigger problem anyway -- and no one has done it for those platforms
I think what's going on here is that Windows *hasn't* been an option for numpy/scipy due to the toolchain issues, and on OS-X we've just been using the platform BLAS (Accelerate), so the core packages haven't had any motivation to sort out the whole library dependency issue, and no-one else is motivated enough to do it. My prediction is that after the core packages sort it out in order to solve their own problems then we might see others picking it up. -n -- Nathaniel J. Smith -- http://vorpus.org

Travis - I will preface the following by pointing out how valuable miniconda and anaconda have been for our workplace, because we were running into issues with ensuring that everyone in our mixed-platform office had access to all the same tools, particularly GDAL, NetCDF4 and such. For the longest time, we were all stuck on an ancient "Corporate Python" that our IT staff managed to put together, but never had the time to update. So, I do absolutely love conda for the problems that it solved for us. That being said...

I take exception to your assertion that anaconda is *the* solution to the packaging problem. I still have a number of issues, particularly with the interactions of GDAL, shapely, and Basemap (they all seek out libgeos_c differently), and I have to use my own build of GDAL to enable many of the features that we use (the vanilla GDAL put out by Continuum just has the default options, and is quite limited). If I don't set up my environment *just right*, one of those packages will fail to import in some way due to being unable to find their particular version of libgeos_c. I haven't figured out exactly why this happens, but it is very easy to break such an environment this way after an update. But that problem is a solvable problem within the framework of conda, so I am not too concerned about that.

The bigger problem is Apache. In particular, mod_wsgi. About a year ago, one of our developers was happily developing a webtool that utilized numpy, NetCDF4 and libxml via conda environments. All of the testing was done in flask and everything was peachy. We figured that final deployment out to the Apache server would be a cinch, right? Wrong. Because our mod_wsgi was built against the system's python because it came in through RPMs, it was completely incompatible with the compiled numpy because of the differences in the python compile options. In a clutch, we had our IT staff manually build mod_wsgi against anaconda's python, but they weren't too happy about that, due to mod_wsgi no longer getting updated via yum.

If anaconda was the end-all, be-all solution, then it should just be a simple matter to do "conda install mod_wsgi". But the design of conda is that it is intended to be a user-space package-manager. mod_wsgi is installed via the root/apache user, which is siloed off from the user. I would have to (in theory) go and install conda for the apache user and likely have to install a conda "apache" package and mod_wsgi package. I seriously doubt this would be an acceptable solution for many IT administrators who would rather depend upon the upstream distributions, who are going to be very quick about getting updates out the door and are updated automatically through yum or apt.

So, again, I love conda for what it can do when it works well. I only take exception to the notion that it can address *all* problems, because there are some problems that it just simply isn't properly situated for.

Cheers!
Ben Root

On 01/15/2016 04:08 PM, Benjamin Root wrote:
So, again, I love conda for what it can do when it works well. I only take exception to the notion that it can address *all* problems, because there are some problems that it just simply isn't properly situated for.
Actually, I would say you didn't mention any ... ;) The issue is not that it "isn't properly situated for" (whatever that means) the problems you describe, but that -- in the case you mention, for example -- no one has conda-packaged those solutions yet.

FWIW, our sysadmins and I use conda for django / apache / mod_wsgi sites and we are very happy with it. IMO, compiling mod_wsgi in the conda environment and keeping it up is trivial compared to the awkwardnesses introduced by using pip/virtualenv in those cases. We also use conda for sites with nginx and the conda-packaged uwsgi, which works great and even permits the use of a separate env (with, if necessary, different versions of django, etc.) for each application. No need to set up an entire VM for each app! *My* sysadmins love conda -- as soon as they saw how much better than pip/virtualenv it was, they have never looked back.

IMO, conda is by *far* the best packaging solution the python community has ever seen (and I have been using python for more than 20 years). I too have been stunned by some of the resistance to conda that one sometimes sees in the python packaging world. I've had a systems package maintainer tell me "it solves a different problem [than pip]" ... hmmm ... I would say it solves the same problem *and more*, *better*. I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).

Cheers, Steve Waterbury NASA/GSFC

Hi, On Fri, Jan 15, 2016 at 1:56 PM, Steve Waterbury <waterbug@pangalactic.us> wrote:
On 01/15/2016 04:08 PM, Benjamin Root wrote:
So, again, I love conda for what it can do when it works well. I only take exception to the notion that it can address *all* problems, because there are some problems that it just simply isn't properly situated for.
Actually, I would say you didn't mention any ... ;) The issue is not that it "isn't properly situated for" (whatever that means) the problems you describe, but that -- in the case you mention, for example -- no one has conda-packaged those solutions yet.
FWIW, our sysadmins and I use conda for django / apache / mod_wsgi sites and we are very happy with it. IMO, compiling mod_wsgi in the conda environment and keeping it up is trivial compared to the awkwardnesses introduced by using pip/virtualenv in those cases.
We also use conda for sites with nginx and the conda-packaged uwsgi, which works great and even permits the use of a separate env (with, if necessary, different versions of django, etc.) for each application. No need to set up an entire VM for each app! *My* sysadmins love conda -- as soon as they saw how much better than pip/virtualenv it was, they have never looked back.
IMO, conda is by *far* the best packaging solution the python community has ever seen (and I have been using python for more than 20 years).
Yes, I think everyone would agree that, until recently, Python packaging was in a mess.
I too have been stunned by some of the resistance to conda that one sometimes sees in the python packaging world.
You can correct me if you see evidence to the contrary, but I think all of the argument is that it is desirable that pip should work as well.
I've had a systems package maintainer tell me "it solves a different problem [than pip]" ... hmmm ... I would say it solves the same problem *and more*, *better*.
I know what you mean, but I suppose the person you were talking to may have been thinking that many of us already have Python distributions that we are using, and for those, we want to use pip.
I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example, I hadn't worked on pip at all before conda came along. Best, Matthew

On 01/15/2016 05:07 PM, Matthew Brett wrote:
I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example, I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip, not those who *use* pip for packaging things. Are you contributing to the development of pip, or merely using it for creating packages? Steve

On Fri, Jan 15, 2016 at 2:15 PM, Steve Waterbury <waterbug@pangalactic.us> wrote:
On 01/15/2016 05:07 PM, Matthew Brett wrote:
I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example, I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip, not those who *use* pip for packaging things. Are you contributing to the development of pip, or merely using it for creating packages?
Sorry - I assumed you were talking about us here on the list planning to build Linux and Windows wheels. It certainly doesn't seem surprising to me that the pip developers would continue to develop pip rather than switch to conda. Has there been any attempt to persuade the pip developers to do this? Best, Matthew

On 01/15/2016 05:19 PM, Matthew Brett wrote:
On Fri, Jan 15, 2016 at 2:15 PM, Steve Waterbury <waterbug@pangalactic.us> wrote:
On 01/15/2016 05:07 PM, Matthew Brett wrote:
I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example, I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip, not those who *use* pip for packaging things. Are you contributing to the development of pip, or merely using it for creating packages?
Sorry - I assumed you were talking about us here on the list planning to build Linux and Windows wheels.
No, I was definitely *not* talking about those on the list planning to build Linux and Windows wheels when I referred to the folks "working on pip". However, that said, I *completely* agree with Travis's remark: "The other very real downside is that these efforts to promote numpy as wheels further encourages people to not use the better solution that already exists in conda."
It certainly doesn't seem surprising to me that the pip developers would continue to develop pip rather than switch to conda. Has there been any attempt to persuade the pip developers to do this?
Not that I know of, but as I said, I have asked pip developers and core python developers for their opinions on conda, and my impression has always been one of, shall we say, a circling of the wagons. Yes, even the nice guys in the python community (and I've known some of them a *long* time) sometimes do that ... ;) Cheers, Steve

On Fri, Jan 15, 2016 at 2:33 PM, Steve Waterbury <waterbug@pangalactic.us> wrote:
On 01/15/2016 05:19 PM, Matthew Brett wrote:
On Fri, Jan 15, 2016 at 2:15 PM, Steve Waterbury <waterbug@pangalactic.us> wrote:
On 01/15/2016 05:07 PM, Matthew Brett wrote:
I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).
I must say, I don't personally recognize those reasons. For example, I hadn't worked on pip at all before conda came along.
By "working on pip", I was referring to *developers* of pip, not those who *use* pip for packaging things. Are you contributing to the development of pip, or merely using it for creating packages?
Sorry - I assumed you were talking about us here on the list planning to build Linux and Windows wheels.
No, I was definitely *not* talking about those on the list planning to build Linux and Windows wheels when I referred to the folks "working on pip". However, that said, I *completely* agree with Travis's remark:
"The other very real downside is that these efforts to promote numpy as wheels further encourages people to not use the better solution that already exists in conda."
I think there's a distinction between 'promote numpy as wheels' and 'make numpy available as a wheel'. I don't think you'll see much evidence of "promotion" here - it's not really the open-source way. I'm not quite sure what you mean about 'circling the wagons', but the general approach of staying on course and seeing how things shake out seems to me entirely sensible. Cheers too, Matthew

On Fri, Jan 15, 2016 at 2:38 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
I think there's a distinction between 'promote numpy as wheels' and 'make numpy available as a wheel'. I don't think you'll see much evidence of "promotion" here - it's not really the open-source way.
Depends on how you define "promotion" I suppose. But I think that supporting something is indeed promoting it. I've been a fan of getting the scipy stack available from pip for a long time -- I think it could be really useful for lots of folks not doing heavy scipy-style work, but folks are very wary of introducing a new, hard-to-install dependency. But now I'm not so sure -- the trick is what you tell newbies.

When I teach Python (not scipy), I start folks off with the python.org python. Then they can pip install ipython, which is the only part of the "scipy stack" I want them to have if they are not doing real numerical work. But what if they are? Now it's pretty clear to me that anyone interested in getting into data analysis, etc. with python should just start off with Anaconda (or Canopy) -- or maybe Gohlke's binaries for Windows users.

But what if we have wheels on all platforms for the scipy stack (and a few others)? Now they can learn python, pip install numpy, scipy, etc, learn some more, get excited -- delve into some domain-specific work, and WHAM -- hit the wall of installation nightmares. NOW, they need to switch to Anaconda or Canopy... I think it's too bad to get that far into it and then have to switch and learn something new. -- Again, is there really a point to a 90% solution?

So this is the point -- heading down this road takes us to a place where people can get much farther before hitting the wall -- but hit the wall they will, so we'll still need other solutions, so maybe it would be better for us to put our energies into those other solutions. By the way, I'm seeing more and more folks that are not scipy-focused starting off with conda, even for web apps.
I'm not quite sure what you mean about 'circling the wagons', but the general approach of staying on course and seeing how things shake out seems to me entirely sensible.
well, what I've observed in the PyPa community may not be circling the wagons, but it has been a "we're not ever going to solve the scipy problems" attitude. I'm pretty convinced that pip/wheel will never be even a 90% solution without some modifications -- so why go there? Indeed, it's not uncommon for folks on the distutils list to say "go use conda" in response to issues that pip does not address well. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Jan 15, 2016, at 21:22, Chris Barker <chris.barker@noaa.gov> wrote:
Indeed, it's not uncommon for folks on the distutils list to say "go use conda" in response to issues that pip does not address well.
I was in the room at the very first proto-PyData conference when Guido told the assembled crowd "if pip doesn't satisfy the needs of the SciPy community, they should build something that does, on their own." Well, here we are. Bryan

On Fri, Jan 15, 2016 at 9:56 PM, Steve Waterbury <waterbug@pangalactic.us> wrote:
On 01/15/2016 04:08 PM, Benjamin Root wrote:
So, again, I love conda for what it can do when it works well. I only take exception to the notion that it can address *all* problems, because there are some problems that it just simply isn't properly situated for.
Actually, I would say you didn't mention any ... ;) The issue is not that it "isn't properly situated for" (whatever that means) the problems you describe, but that -- in the case you mention, for example -- no one has conda-packaged those solutions yet.
FWIW, our sysadmins and I use conda for django / apache / mod_wsgi sites and we are very happy with it. IMO, compiling mod_wsgi in the conda environment and keeping it up is trivial compared to the awkwardnesses introduced by using pip/virtualenv in those cases.
We also use conda for sites with nginx and the conda-packaged uwsgi, which works great and even permits the use of a separate env (with, if necessary, different versions of django, etc.) for each application. No need to set up an entire VM for each app! *My* sysadmins love conda -- as soon as they saw how much better than pip/virtualenv it was, they have never looked back.
IMO, conda is by *far* the best packaging solution the python community has ever seen (and I have been using python for more than 20 years). I too have been stunned by some of the resistance to conda that one sometimes sees in the python packaging world. I've had a systems package maintainer tell me "it solves a different problem [than pip]" ... hmmm ... I would say it solves the same problem *and more*, *better*. I attribute some of the conda-ignoring to "NIH" and, to some extent, possibly defensiveness (I would be defensive too if I had been working on pip as long as they had when conda came along ;).
Conda and pip solve some of the same problems, but pip also does quite a bit more than conda (and vice versa, as conda also acts akin to rvm-for-python). Conda works so well because it supports a subset of what pip does: install things from binaries. This is the logical thing to do when you want to distribute binaries, because in the python world, for historical reasons, metadata are dynamic by the very nature of setup.py.

For a long time, pip worked from sources instead of binaries (this is actually the reason why it was started, following easy_install), and thus had to cope w/ those dynamic metadata. It also has to deal w/ building packages and the whole distutils/setuptools interoperability mess. Conda, being solely a packaging solution, can sidestep all this complexity (which is again the logical and smart thing to do if what you care about is deployment). Having pip understand conda packages is a non trivial endeavour: since conda packages are relocatable, they are not compatible with the usual python interpreters, and setuptools metadata is neither a subset nor a superset of conda metadata.

Regarding conda/pip interoperability, there are things that conda could (and IMO should) do, such as writing the expected metadata in site-packages (PEP 376). Currently, conda does not recognize packages installed by pip (because it does not implement PEP 376 and co), so if you do a "pip install ." of a package, it will likely break an existing package if present.

David
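[To make the PEP 376 point concrete: the metadata in question is just the *.dist-info directories that pip writes into site-packages. A simplified sketch of reading it -- a real tool would use pkg_resources or pip's own machinery rather than parsing directory names:

    # List the distributions that pip (or any PEP 376-compliant installer) has
    # recorded in site-packages -- the record conda would need to consult (and
    # write) in order to recognize pip-installed packages.
    import os
    import sysconfig

    def installed_distributions(site_packages=None):
        """Yield (project, version) pairs from *.dist-info directory names."""
        if site_packages is None:
            site_packages = sysconfig.get_paths()["purelib"]
        for entry in sorted(os.listdir(site_packages)):
            if entry.endswith(".dist-info"):
                # directory names look like "numpy-1.10.4.dist-info"
                project, _, version = entry[:-len(".dist-info")].partition("-")
                yield project, version

    for project, version in installed_distributions():
        print(project, version)]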

hmm -- didn't mean to rev this up quite so much -- sorry! But it's a good conversation to have, so... On Fri, Jan 15, 2016 at 1:08 PM, Benjamin Root <ben.v.root@gmail.com> wrote:
That being said... I take exception to your assertion that anaconda is *the* solution to the packaging problem.
I think we need to keep some things straight here: "conda" is a binary package management system. "Anaconda" is a python (and other stuff) distribution, built with conda. In practice, everyone (I know of) uses the Anaconda distribution (or at least the default conda channel) when using conda, but in theory, you could maintain your own entirely distinct distribution with conda as the tool. Also in practice, conda is so easy because Continuum has done the hard work of building a lot of the packages we all need -- there are still a lot being maintained by the community in various ways, but frankly, we do depend on Continuum for all the hard work. But working on/with conda does not lock you into that if you think it's not serving your needs. And this discussion (for me anyway) is about tools and the way forward, not existing packages. So onward!
I still have a number of issues, particularly with the interactions of GDAL, shapely, and Basemap (they all seek out libgeos_c differently), and I have to use my own build of GDAL to enable many of the features that we use (the vanilla GDAL put out by Continuum just has the default options, and is quite limited).
Yeah, GDAL/OGR is a F%$#ing nightmare -- and I do wish that Anaconda had a better build, but frankly, there is no system that's going to make that any easier -- do any of the Linux distros ship a really good, compatible, up-to-date set of these libs -- and OS-X and Windows? yow! (Though Christoph Gohlke is a wonder!)
If I don't set up my environment *just right* one of those packages will fail to import in some way due to being unable to find their particular version of libgeos_c. I haven't figured out exactly why this happens, but it is very easy to break such an environment this way after an update.
Maybe conda could be improved to make this easier, I don't know (though do check out the IOOS channel on anaconda.org -- Filipe has done some nice work on this)
In a clutch, we had our IT staff manually build mod_wsgi against anaconda's python, but they weren't too happy about that, due to mod_wsgi no longer getting updated via yum.
I'm not sure how pip helps you out here, either. Sure, for easy-to-compile-from-source packages you can just pip install, and you'll get a package compatible with your (system) python. But binary wheels will give you the same headaches -- so you're back to expecting your linux distro to provide everything, which they don't :-( I understand that the IT folks want everything to come from their OS vendor -- they like that -- but it simply isn't practical for scipy-based web services. And once you've got most of your stack coming from another source, is it really a big deal for python to come from somewhere else also (and apache, and ???) -- conda at least is a technology that _can_ provide an integrated system that includes all this -- I don't think you're going to be pip-installing apache anytime soon!
But the design of conda is that it is intended to be a user-space package-manager. mod_wsgi is installed via root/apache user, which is siloed off from the user. I would have to (in theory) go and install conda for the apache user and likely have to install a conda "apache" package and mod_wsgi package.
This seems quite reasonable to me frankly. Plus, you can install conda centrally as well, and then the apache user can get access to it (but not modify it)
I seriously doubt this would be an acceptable solution for many IT administrators who would rather depend upon the upstream distributions who are going to be very quick about getting updates out the door and are updated automatically through yum or apt.
This is true -- but nothing to do with the technology -- it's a social problem. And the auto updates? Ha! the real problem with admins that want to use the system package managers is that they still insist you run python 2.6 for god's sake :-)
So, again, I love conda for what it can do when it works well. I only take exception to the notion that it can address *all* problems, because there are some problems that it just simply isn't properly situated for.
still true, of course, but it can address many more of them than pip can. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
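A small aside on the libgeos_c confusion discussed above: one quick check is to ask which geos library the loader would pick up in the current environment. This is only a hedged sketch (it reports what ctypes would find on the search path, not what GDAL, shapely, or Basemap actually loaded):
    # Sketch: report which geos_c shared library the loader would find in the
    # current environment, to help debug the "wrong libgeos_c" problem above.
    import os
    from ctypes.util import find_library

    print("find_library('geos_c') ->", find_library("geos_c"))
    # Conda environments normally put their libs under $CONDA_PREFIX/lib, so
    # the search-path variables are usually the first thing worth checking:
    for var in ("CONDA_PREFIX", "LD_LIBRARY_PATH"):
        print(var, "=", os.environ.get(var))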

hmm -- didn't mean to rev this up quite so much -- sorry! But it's a good conversation to have, so... On Fri, Jan 15, 2016 at 1:08 PM, Benjamin Root <ben.v.root@gmail.com> wrote:
That being said... I take exception to your assertion that anaconda is *the* solution to the packaging problem.
I think we need to keep some things straight here: "conda" is a binary package management system. "Anaconda" is a python (and other stuff) distribution, built with conda. In practice, everyone (I know of) uses the Anaconda distribution (or at least the default conda channel) when using conda, but in theory, you could maintain an entirely distinct distribution with conda as the tool. Also in practice, conda is so easy because Continuum has done the hard work of building a lot of the packages we all need -- there are still a lot being maintained by the community in various ways, but frankly, we do depend on Continuum for all the hard work. But working on/with conda does not lock you into that if you think it's not serving your needs. And this discussion (for me anyway) is about tools and the way forward, not existing packages. So onward!
I still have a number of issues, particularly with the interactions of GDAL, shapely, and Basemap (they all seek out libgeos_c differently), and I have to use my own build of GDAL to enable many of the features that we use (the vanilla GDAL put out by Continuum just has the default options, and is quite limited).
Yeah, GDAL/OGR is a F%$#ing nightmare -- and I do wish that Anaconda had a better build, but frankly, there is no system that's going to make that any easier -- do any of the Linux distros ship a really good, compatible, up-to-date set of these libs -- and OS-X and Windows? yow! (Though Christoph Gohlke is a wonder!)
If I don't set up my environment *just right* one of those packages will fail to import in some way due to being unable to find their particular version of libgeos_c. I haven't figured out exactly why this happens, but it is very easy to break such an environment this way after an update.
Maybe conda could be improved to make this easier, I don't know (though do check out the IOOS channel on anaconda.org -- Filipe has done some nice work on this)
In a clutch, we had our IT staff manually build mod_wsgi against anaconda's python, but they weren't too happy about that, due to mod_wsgi no longer getting updated via yum.
I'm not sure how pip helps you out here, either. Sure, for easy-to-compile-from-source packages you can just pip install, and you'll get a package compatible with your (system) python. But binary wheels will give you the same headaches -- so you're back to expecting your linux distro to provide everything, which they don't :-( I understand that the IT folks want everything to come from their OS vendor -- they like that -- but it simply isn't practical for scipy-based web services. And once you've got most of your stack coming from another source, is it really a big deal for python to come from somewhere else also (and apache, and ???) -- conda at least is a technology that _can_ provide an integrated system that includes all this -- I don't think you're going to be pip-installing apache anytime soon! (or node, or ???)
If anaconda was the end-all, be-all solution, then it should just be a simple matter to do "conda install mod_wsgi". But the design of conda is that it is intended to be a user-space package-manager.
Then you can either install it as the web user (or apache user), or install it centrally with system-wide access. I haven't done this, but I don't think it's all that hard -- you're then going to need to sudo to install/upgrade anything new, but that's expected. So, again, I love conda for what it can do when it works well. I only take
exception to the notion that it can address *all* problems, because there are some problems that it just simply isn't properly situated for.
But pip isn't situated for any of these either -- I'm still confused as to the point here. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Thu, Jan 14, 2016 at 12:58 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
On Thu, Jan 14, 2016 at 9:14 AM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Also, you have the problem that there is one PyPi -- so where do you put your nifty wheels that depend on other binary wheels? you may need to fork every package you want to build :-(
Is this a real problem or a theoretical one? Do you know of some situation where this wheel to wheel dependency will occur that won't just be solved in some other way?
It's real -- at least during the whole bootstrapping period. Say I build a nifty hdf5 binary wheel -- I could probably just grab the name "libhdf5" on PyPI. So far so good. But the goal here would be to have netcdf and pytables and GDAL and who knows what else then link against that wheel. But those projects are all supported by different people, that all have their own distribution strategy. So where do I put binary wheels of each of those projects that depend on my libhdf5 wheel? _maybe_ I would put it out there, and it would all grow organically, but neither the culture nor the tooling support that approach now, so I'm not very confident you could gather adoption.
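To make the wheel-depending-on-a-wheel idea concrete, here is a minimal sketch of what a downstream setup.py might look like if a hypothetical "libhdf5" wheel existed that shipped the shared library and exposed its paths. Every name here (the libhdf5 package, get_include(), get_library_dir()) is made up for illustration; no such package is being claimed to exist:
    # Hypothetical sketch: a C extension linking against a shared library that
    # ships inside another wheel.  "libhdf5" and its helpers are illustrative.
    from setuptools import setup, Extension
    import libhdf5  # hypothetical wheel bundling the HDF5 shared library

    ext = Extension(
        "mypkg._hdf5_backend",
        sources=["mypkg/_hdf5_backend.c"],
        include_dirs=[libhdf5.get_include()],              # headers from the wheel
        library_dirs=[libhdf5.get_library_dir()],          # bundled .so location
        runtime_library_dirs=[libhdf5.get_library_dir()],  # rpath at run time
        libraries=["hdf5"],
    )

    setup(
        name="mypkg",
        version="0.1",
        packages=["mypkg"],
        ext_modules=[ext],
        install_requires=["libhdf5"],  # the binary wheel everyone links against
    )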
I don't think there's a very large amount of cultural work - but some to be sure.
We already have the following on OSX:
pip install numpy scipy matplotlib scikit-learn scikit-image pandas h5py
where all the wheels come from pypi. So, I don't think this is really outside our range, even if the problem is a little more difficult for Linux.
Even beyond the adoption period, sometimes you need to do stuff in more than one way -- look at the proliferation of channels on Anaconda.org.
This is more likely to work if there is a good infrastructure for third parties to build and distribute the binaries -- e.g. Anaconda.org.
I thought that Anaconda.org allows pypi channels as well?
It does: http://pypi.anaconda.org/ -Travis
Matthew
-- *Travis Oliphant* *Co-founder and CEO* @teoliphant 512-222-5440 http://www.continuum.io

On 09/01/16 00:13, Nathaniel Smith wrote:
Right. There's a small problem which is that the base linux system isn't just "CentOS 5", it's "CentOS 5 and here's the list of libraries that you're allowed to link to: ...", where that list is empirically chosen to include only stuff that really is installed on ~all linux machines and for which the ABI really has been stable in practice over multiple years and distros (so e.g. no OpenSSL).
So the key next step is for someone to figure out and write down that list. Continuum and Enthought both have versions of it that we know are good...
You mean something more empirical than http://refspecs.linuxfoundation.org/lsb.shtml ? I tend to cross-reference with that when adding stuff to Ureka and just err on the side of including things where feasible, then of course test it on the main target platforms. We have also been building on CentOS 5-6 BTW (I believe the former is about to be unsupported). Just skimming the thread... Cheers, James.
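Nathaniel's "write down the list of allowed libraries" can be checked mechanically. Below is a rough sketch (assuming a Linux system with ldd on the PATH; the whitelist is illustrative, not the vetted list Continuum and Enthought use) that reports which shared libraries an extension module links against beyond an agreed-upon base set:
    # Sketch: compare an extension module's shared-library dependencies against
    # a whitelist of libraries assumed to exist on every Linux machine.
    # The whitelist below is illustrative only, not the real vetted list.
    import subprocess
    import sys

    WHITELIST = {
        "linux-vdso.so.1", "libc.so.6", "libm.so.6", "libpthread.so.0",
        "libdl.so.2", "librt.so.1", "libgcc_s.so.1", "libstdc++.so.6",
    }

    def external_deps(path):
        out = subprocess.check_output(["ldd", path]).decode()
        deps = set()
        for line in out.splitlines():
            name = line.strip().split(" ")[0]
            if name and not name.startswith("/"):   # skip the ld-linux entry
                deps.add(name)
        return deps

    if __name__ == "__main__":
        for mod in sys.argv[1:]:
            extra = sorted(external_deps(mod) - WHITELIST)
            print(mod, "->", extra if extra else "only whitelisted libraries")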

On 8 January 2016 at 23:27, Chris Barker <chris.barker@noaa.gov> wrote:
On Fri, Jan 8, 2016 at 1:58 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
I'm not sure if this is the right path for numpy or not,
probably not -- AFAICT, the PyPA folks aren't interested in solving the problems we have in the scipy community -- we can tweak around the edges, but we won't get there without a commitment to really solve the issues -- and if pip did that, it would essentially be conda -- no one wants to re-implement conda.
I think that's a little unfair to the PyPA people. They would like to solve all of these problems; it's just a question of priority and expertise. As always in open source you have to scratch your own itch, and those guys are working on other things like the security, stability, and scalability of the infrastructure, consistency of pip's version handling and dependency resolution, etc.
Linux wheels is a problem that has been discussed on distutils-sig. The reason it hasn't happened is that it's a lower priority than wheels for OSX/Windows because:
1) Most distros already package this stuff i.e. apt-get numpy
2) On Linux it's much easier to get the appropriate compilers so that pip can build e.g. numpy.
3) The average Linux user is more capable of solving these problems.
4) Getting binary distribution to work across all Linux distributions is significantly harder than for Windows/OSX because of the myriad different distros/versions.
Considering point 2, pip install numpy etc. already works for a lot of Linux users, even if it is slow because of the time taken to compile (3 minutes on my system). Depending on your use case that problem is partially solved by wheel caching. So if Linux wheels were allowed but didn't always work then that would be a regression for many users. On OSX binary wheels for numpy are already available and work fine AFAIK. The absence of binary numpy wheels for Windows is not down to PyPA.
Considering point 4, the idea of compiling on an old base Linux system has been discussed on distutils-sig before and it seems likely to work. The problem is really about the external non-libc dependencies though. The reason progress there has stalled is not because the PyPA folks don't want to solve it but rather because they have other priorities and are hoping that people with more expertise in that area will step up to address those problems. Most of the issues stem from the scientific Python community so ideally someone from the scientific Python community would address how to solve those problems.
Recently Nathaniel brought some suggestions to distutils-sig to address the problem of build-requires, which is a particular pain point. I think that people there appreciated the effort from someone who understands the needs of hard-to-build packages to improve the way that pip/PyPI works in that area. There was a lot of confusion from people not understanding each other's needs but ultimately I thought there was agreement on how to move forward. (Although what happened to that in the end?) The same can happen with other problems like Linux wheels. If you guys here have a clear idea of how to solve the external dependency problem then I'm sure they'll be receptive.
Personally I think the best approach is the pyopenblas approach: internalise the external dependency so that pip can work with it. This is precisely what Anaconda does and there's actually no need to make substantive changes to the way pip/pypi/wheel works in order to achieve that. It just needs someone to package the external dependencies as sdist/wheel (and for PyPI to allow Linux wheels). -- Oscar
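The "internalise the dependency" idea can also be sketched: a wrapper package whose wheel bundles the shared library and tells dependents where to find it. The layout and names below (py_openblas, get_library_dir, .libs) are loosely modelled on the pyopenblas idea and are illustrative, not its actual API:
    # Hypothetical __init__.py for a wheel that internalises an external
    # library.  All names are illustrative only.
    #
    #   py_openblas/
    #       __init__.py              <- this file
    #       include/                 <- bundled headers
    #       .libs/libopenblas.so.0   <- bundled shared library
    import os

    _here = os.path.abspath(os.path.dirname(__file__))

    def get_library_dir():
        """Directory with the bundled shared library, for use in a dependent
        package's library_dirs / runtime_library_dirs."""
        return os.path.join(_here, ".libs")

    def get_include():
        """Directory with the bundled headers."""
        return os.path.join(_here, "include")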

I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution. -- Sandro "morph" Tosi My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi G+: https://plus.google.com/u/0/+SandroTosi

Hi Sandro, On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi <morph@debian.org> wrote:
I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs. I know what you mean, but I can't yet see how to write a page that would be good for explaining the benefits / tradeoffs of using deb packages vs mainly or only pip packages vs a mix of the two. Do you have any thoughts? Cheers, Matthew

On Sat, Jan 9, 2016 at 6:08 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi Sandro,
On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi <morph@debian.org> wrote:
I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that would be good for explaining the benefits / tradeoffs of using deb packages vs mainly or only pip packages vs a mix of the two. Do you have any thoughts?
you can start by making it extremely clear that this is not the Debian-supported way to install python modules on a Debian system, that if a user uses pip to do it, it's very likely other applications or modules will fail, and that if they have any problem with anything python related, they are on their own as they "broke" their system on purpose. thanks for considering -- Sandro "morph" Tosi My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi G+: https://plus.google.com/u/0/+SandroTosi

On Sat, Jan 9, 2016 at 6:57 PM, Sandro Tosi <morph@debian.org> wrote:
On Sat, Jan 9, 2016 at 6:08 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi Sandro,
On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi <morph@debian.org> wrote:
I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that would be good for explaining the benefits / tradeoffs of using deb packages vs mainly or only pip packages vs a mix of the two. Do you have any thoughts?
you can start by making it extremely clear that this is not the Debian-supported way to install python modules on a Debian system, that if a user uses pip to do it, it's very likely other applications or modules will fail, and that if they have any problem with anything python related, they are on their own as they "broke" their system on purpose. thanks for considering
I updated the page with more on reasons to prefer Debian packages over installing with pip: https://matthew-brett.github.io/pydagogue/installing_on_debian.html Is that enough to get the message across? Cheers, Matthew

On Sun, Jan 10, 2016 at 3:55 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I updated the page with more on reasons to prefer Debian packages over installing with pip:
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Is that enough to get the message across?
That looks a lot better, thanks! I also kinda agree with all Nathaniel said on the matter -- Sandro "morph" Tosi My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi G+: https://plus.google.com/u/0/+SandroTosi

On Sun, Jan 10, 2016 at 3:40 AM, Sandro Tosi <morph@debian.org> wrote:
On Sun, Jan 10, 2016 at 3:55 AM, Matthew Brett <matthew.brett@gmail.com> wrote:
I updated the page with more on reasons to prefer Debian packages over installing with pip:
https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Is that enough to get the message across?
That looks a lot better, thanks! I also kinda agree with all Nathaniel said on the matter
Good so. You probably saw, I already removed use of sudo on the page as well. Matthew

On Jan 9, 2016 10:09, "Matthew Brett" <matthew.brett@gmail.com> wrote:
Hi Sandro,
On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi <morph@debian.org> wrote:
I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that would be good for explaining the benefits / tradeoffs of using deb packages vs mainly or only pip packages vs a mix of the two. Do you have any thoughts?
Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs between Debian-installed packages versus pip --user installed packages are subtle, and both are good options. Personally, I'd generally recommend anyone actively developing python code to skip straight to pip for most things, since you'll eventually end up there anyway, but this is definitely debatable and situation dependent. On the other hand, "sudo pip" specifically is something I'd never recommend, and indeed has the potential to totally break your system. -n
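For anyone weighing the "pip --user" suggestion (in practice, "pip install --user"), the standard library will tell you where such installs actually land. A small sketch, assuming a reasonably recent Python with the site module's user-site helpers:
    # Sketch: show where "pip install --user" puts packages for this Python.
    import site
    import sys

    user_site = site.getusersitepackages()   # e.g. ~/.local/lib/pythonX.Y/site-packages
    print("user base:    ", site.getuserbase())  # e.g. ~/.local
    print("user site dir:", user_site)
    print("on sys.path:  ", user_site in sys.path)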

On Sat, Jan 9, 2016 at 8:49 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Jan 9, 2016 10:09, "Matthew Brett" <matthew.brett@gmail.com> wrote:
Hi Sandro,
On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi <morph@debian.org> wrote:
I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that would be good for explaining the benefits / tradeoffs of using deb packages vs mainly or only pip packages vs a mix of the two. Do you have any thoughts?
Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs between Debian-installed packages versus pip --user installed packages are subtle, and both are good options. Personally, I'd generally recommend anyone actively developing python code to skip straight to pip for most things, since you'll eventually end up there anyway, but this is definitely debatable and situation dependent. On the other hand, "sudo pip" specifically is something I'd never recommend, and indeed has the potential to totally break your system.
Sure, but I don't think the page is suggesting doing ``sudo pip`` for anything other than upgrading pip and virtualenv(wrapper) - and I don't think that is likely to break the system. Matthew

On Sat, Jan 9, 2016 at 8:58 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
On Sat, Jan 9, 2016 at 8:49 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Jan 9, 2016 10:09, "Matthew Brett" <matthew.brett@gmail.com> wrote:
Hi Sandro,
On Sat, Jan 9, 2016 at 4:44 AM, Sandro Tosi <morph@debian.org> wrote:
I wrote a page on using pip with Debian / Ubuntu here : https://matthew-brett.github.io/pydagogue/installing_on_debian.html
Speaking with my numpy debian maintainer hat on, I would really appreciate it if you don't suggest using pip to install packages in Debian, or at least not as the only solution.
I'm very happy to accept alternative suggestions or PRs.
I know what you mean, but I can't yet see how to write a page that would be good for explaining the benefits / tradeoffs of using deb packages vs mainly or only pip packages vs a mix of the two. Do you have any thoughts?
Why not replace all the "sudo pip" calls with "pip --user"? The trade-offs between Debian-installed packages versus pip --user installed packages are subtle, and both are good options. Personally, I'd generally recommend anyone actively developing python code to skip straight to pip for most things, since you'll eventually end up there anyway, but this is definitely debatable and situation dependent. On the other hand, "sudo pip" specifically is something I'd never recommend, and indeed has the potential to totally break your system.
Sure, but I don't think the page is suggesting doing ``sudo pip`` for anything other than upgrading pip and virtualenv(wrapper) - and I don't think that is likely to break the system.
It could... a quick glance suggests that currently installing virtualenvwrapper like that will also pull in some random pypi snapshot of stevedore, which will shadow the built-in package version. And then stevedore is used by tons of different debian packages, including large parts of openstack... But more to the point, the target audience for your page is hardly equipped to perform that kind of analysis, never mind in the general case of using 'sudo pip' for arbitrary Python packages, and your very first example is one that demonstrates bad habits... So personally I'd avoid mentioning the possibility of 'sudo pip', or better yet explicitly warn against it. -n -- Nathaniel J. Smith -- http://vorpus.org
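As a footnote to the stevedore example: a quick way to check whether a pip-installed copy of a module is shadowing the distro-packaged one is simply to ask Python where it would import that module from. A hedged sketch (the module names are just the ones mentioned in the email):
    # Sketch: show where Python would import a module from, to spot a pip
    # snapshot shadowing the Debian-packaged copy.
    import importlib.util

    for name in ("stevedore", "virtualenv"):
        spec = importlib.util.find_spec(name)
        print(name, "->", spec.origin if spec else "not installed")
        # Paths under /usr/lib/python*/dist-packages come from Debian packages;
        # paths under /usr/local/... or ~/.local/... come from pip.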
participants (18)
- Benjamin Root
- Bryan Van de Ven
- Chris Barker
- Chris Barker - NOAA Federal
- David Cournapeau
- James E.H. Turner
- Julian Taylor
- Matthew Brett
- Nathan Goldbaum
- Nathaniel Smith
- Oscar Benjamin
- Ralf Gommers
- Robert McGibbon
- Sandro Tosi
- Steve Waterbury
- Tony Kelman
- Travis Oliphant
- Yuxiang Wang