Handling the binary dependency management problem
I've had a couple of conversations recently that suggest it's high time I posted this particular idea publicly, so here's my suggestion for dealing with (or, more accurately, avoiding) the binary dependency management problem for upstream distribution in the near term.

= Defining the problem =

The core packaging standards and tools need to deal with a couple of different use cases. We want them to be usable for beginners and folks who aren't professional developers running Python locally, but we also want them to be suitable for professionally administered server environments, and to integrate reasonably well with downstream redistributors (most notably Linux distributions, but also other redistributors like ActiveState, Enthought and Continuum Analytics).

For pure Python modules, these two objectives are easy to reconcile. For simple, self-contained C extension modules, it gets harder, which is why PyPI only allows wheel files for Mac OS X and Windows, and expects them to be compatible with the binary installers provided on python.org. Resolving this to allow platform specific wheel files on PyPI for Linux distributions, Cygwin, MacPorts, homebrew, etc. is going to require some work on better defining the "platform" tag in PEP 425.

For arbitrary binary dependencies, however, I contend that reconciling the two different use cases is simply infeasible, as pip and venv have to abide by the following two restrictions:

1. Allow low impact security updates to components low in the stack (e.g. deploying a CPython security update shouldn't require rebuilding any other components)
2. Allow interoperability with a variety of packaging systems, including supporting access to Python modules provided solely through the system package manager (e.g. yum/rpm related modules on Fedora and derived distros)

At this level, it's often impractical to offer prebuilt binary components with external dependencies, as you *also* need a mechanism to deliver the external binary dependencies themselves, one that platform integrators can override if they want.

= Proposed approach =

Since supporting both cross-platform external binaries *and* allowing the substitution of platform specific binaries in the same tool is difficult, I propose that we don't even try (at least not until we're much further down the packaging improvement road, and have solved all the *other* issues facing the packaging infrastructure).

Instead, I suggest that the Python Packaging User Guide *explicitly* recommend a two level solution:

1. For simple cases without complex binary dependencies, and for cases where platform integration and/or low impact security updates are needed, use the core pip/virtualenv toolchain. This may sometimes mean needing to build components with external dependencies from source, and that's OK (it's the price we pay at this level for supporting platform integration).
2. For cross-platform handling of external binary dependencies, we recommend bootstrapping the open source conda toolchain, and using that to install pre-built binaries (currently administered by the Continuum Analytics folks). Specifically, commands like the following should work on POSIX systems without needing any local build machinery, and without needing all the projects in the chain to publish wheels: "pip install conda && conda init && conda install ipython"

For many end users just running things locally (especially beginners and non-developers), using conda will be the quickest and easiest way to get up and running. For professional developers and administrators, option 1 will provide the finer control and platform interoperability that they often need.
conda already has good coverage of the scientific stack, but may need additional contributions to cover other notoriously hard to build components (such as crypto libraries and game development libraries).

= What would this mean for the wheel format? =

conda has its own binary distribution format, using hash based dependencies. It's this mechanism which allows it to provide reliable cross platform binary dependency management, but it's also the same mechanism that prevents low impact security updates and interoperability with platform provided packages.

So wheels would remain the core binary format, and we'd continue to recommend publishing them whenever feasible. However, we'd postpone indefinitely the question of attempting to deal with arbitrary external dependency support.

= Blocking issues =

Even if we agree this is a good way forward in the near term, there are a couple of technical issues that would need to be resolved before making it official in the user guide.

There is currently at least one key blocking issue on the conda side, where it breaks if run inside a symlinked virtual environment (which both virtualenv and pyvenv create by default in POSIX environments): https://github.com/ContinuumIO/conda/issues/360

I believe conda also currently conflicts with virtualenvwrapper regarding the meaning of activate/deactivate at the command line.

There aren't any blockers I am aware of on the pip/virtualenv side, but the fact that --always-copy is currently broken in virtualenv prevents a possible workaround for the conda issue above: https://github.com/pypa/virtualenv/issues/495

(pyvenv doesn't offer an --always-copy option, just the option to use symlinks on Windows, where they're not used by default due to the associated permissions restrictions, so the primary resolution still needs to be for conda to correctly handle that situation and convert the venv Python to a copy rather than failing)

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
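[Editor's illustration] The hash based dependency mechanism described above can be sketched with a toy example. This is not conda's actual on-disk format, just the general idea of pinning dependents to an exact build of a dependency:

```python
import hashlib

# Toy sketch of hash based dependency pinning (illustrative only).
# A dependent package records the exact hash of the build of its
# dependency that it was compiled and tested against.
def artifact_hash(build: bytes) -> str:
    return hashlib.sha256(build).hexdigest()[:12]

openssl_build_1 = b"libssl 1.0.1 build 1"
pin = artifact_hash(openssl_build_1)  # recorded by everything linked against it

# A security update produces a new build, and therefore a new hash, so
# the pin no longer matches: every dependent must be rebuilt too.
openssl_build_2 = b"libssl 1.0.1 build 2 (security fix)"
print(artifact_hash(openssl_build_2) == pin)  # False
```

This is why the same mechanism that gives reliable cross-platform binaries (you always get the precise build you were tested against) also rules out the low impact security updates described in restriction 1.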
On 1 December 2013 04:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
2. For cross-platform handling of external binary dependencies, we recommend bootstrapping the open source conda toolchain, and using that to install pre-built binaries (currently administered by the Continuum Analytics folks). Specifically, commands like the following should work on POSIX systems without needing any local build machinery, and without needing all the projects in the chain to publish wheels: "pip install conda && conda init && conda install ipython"
Hmm, this is a somewhat surprising change of direction.

You mention POSIX here - but do you intend this to be the standard approach on Windows too? Just as a test, I tried the above, on Python 3.3 on Windows 64-bit. This is python.org python, installed in a virtualenv. I'm just going off what you said above - if there are more explicit docs, I can try using them (but I *don't* want to follow the "official" Anaconda docs, as they talk about using Anaconda python, and about using conda to manage environments, rather than virtualenv).

pip install conda worked OK, but it installed a pure-Python version of PyYAML (presumably because the C accelerator needs libyaml, so it can't be built without a bit of extra work - that's a shame, but see below). conda init did something - no idea what, but it seemed to be fine. conda install ipython then worked, and it seems to have installed a binary version of pyyaml.

Then, however, conda install numpy fails:

>conda install numpy
failed to create process.

It looks like the binary yaml module is broken. Doing "import yaml" in a python session gives a runtime error: "An application has made an attempt to load the C runtime library incorrectly".

I can report this as a bug to conda, I guess (I won't, because I don't know where to report conda bugs, and I don't expect to have time to find out or help diagnose the issues when the developers investigate - it was something I tried purely for curiosity). But I wouldn't be happy to see this as the recommended approach until it's more robust than this.

Paul
On Dec 1, 2013 1:10 PM, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 1 December 2013 04:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
2. For cross-platform handling of external binary dependencies, we recommend bootstrapping the open source conda toolchain, and using that to install pre-built binaries (currently administered by the Continuum Analytics folks). Specifically, commands like the following should work on POSIX systems without needing any local build machinery, and without needing all the projects in the chain to publish wheels: "pip install conda && conda init && conda install ipython"
Hmm, this is a somewhat surprising change of direction.
Indeed it is. Can you clarify a little more how you've come to this conclusion, Nick, and perhaps explain what conda is? I looked at conda some time ago and it seemed to be aimed at HPC (high performance computing) clusters, which is a niche use case where you have large networks of computation nodes containing identical hardware (unless I'm conflating it with something else).

Oscar
On 2 Dec 2013 01:04, "Vinay Sajip" <vinay_sajip@yahoo.co.uk> wrote:
On Sun, 1/12/13, Nick Coghlan <ncoghlan@gmail.com> wrote:
(pyvenv doesn't offer an --always-copy option, just the option to use symlinks on
It does - you should be able to run pyvenv with --copies to force copying, even on POSIX.
Hmm, I didn't see that in --help. I'll take another look.

Cheers,
Nick.
Regards,
Vinay Sajip
For arbitrary binary dependencies, however, I contend that reconciling the two different use cases is simply infeasible, as pip and venv have to abide by the following two restrictions:
To be clear, what's a good example of a common non-science PyPI package that has an "arbitrary binary dependency"? psycopg2?
For many end users just running things locally (especially beginners and non-developers), using conda will be the quickest and easiest way to get up and running.
Conda/Anaconda is an alien world right now to most non-science people (including me). Working in an alien world is never the "quickest" or "easiest" way at first, but I'm curious to try. Some PyPA people actually need to try using it for real, and get comfortable with it.
sometimes mean needing to build components with external dependencies from source
you mean build once (or maybe after system updates for wheels with external binary deps), and cache as a local wheel, right?
On 1 December 2013 19:21, Marcus Smith <qwcode@gmail.com> wrote:
sometimes mean needing to build components with external dependencies from source
you mean build once (or maybe after system updates for wheels with external binary deps), and cache as a local wheel, right?
Note that it is possible to convert binary packages from other formats to wheel. We already support egg and bdist_wininst. If conda has built packages that are worth supporting, I doubt that a conda-to-wheel converter would be hard to write. I'd be willing to write one if someone can point me at a spec for the conda package format (I couldn't find one from a brief look through the docs). Conversely, if I have a wininst installer, a wheel or an egg, is there a converter to conda format? I can't see having to use conda for some things and pip for others as being a move towards lessening user confusion.

I see conda (and enthought) as more like distributions - they sit in the same space as rpms and debs. I don't see them as alternatives to pip. Am I wrong? After all, both conda and enthought supply Python interpreter builds as well as package repositories. And both feel like unified ecosystems (you manage everything Python-related with conda) rather than tools like pip (that works with existing Python standards and interoperates with easy_install, distutils, etc).

If the issue is simply around defining compatibility tags that better describe the various environments around, then let's just get on with that - we're going to have to do it in the end anyway, why temporarily promote an alternative solution just to change our recommendation later?

Paul
On Sun, 1/12/13, Paul Moore <p.f.moore@gmail.com> wrote:
If the issue is simply around defining compatibility tags that better describe the various environments around, then let's just get on with that - we're going to have to do it in the end anyway, why temporarily promote an alternative solution just to change our recommendation later?
This makes sense to me. We should refine the compatibility tags as much as is required. It would be nice if there were some place (on PyPI, or elsewhere) where users could request binary distributions for specific packages for particular environments, and then some kind people with those environments might be able to build those wheels and upload them ... a bit like Christoph Gohlke does for Windows.

Regards,
Vinay Sajip
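[Editor's illustration] For concreteness, the compatibility tags under discussion are embedded directly in the wheel filename defined by PEP 425/427; a minimal parsing sketch (the filenames are just examples):

```python
# A wheel filename has the form:
#   {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
# The last three dash-separated components are the PEP 425 compatibility tags.
def wheel_tags(filename: str):
    stem = filename[: -len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag

print(wheel_tags("lxml-3.2.4-cp33-cp33m-win_amd64.whl"))
# -> ('cp33', 'cp33m', 'win_amd64')
print(wheel_tags("six-1.4.1-py2.py3-none-any.whl"))
# -> ('py2.py3', 'none', 'any')
```

"Refining the tags" means making that third component expressive enough to distinguish the environments being discussed here, rather than the current coarse values like win_amd64 or linux_x86_64.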
On 12/01/2013 05:07 PM, Vinay Sajip wrote:
On Sun, 1/12/13, Paul Moore <p.f.moore@gmail.com> wrote:
If the issue is simply around defining compatibility tags that better describe the various environments around, then let's just get on with that - we're going to have to do it in the end anyway, why temporarily promote an alternative solution just to change our recommendation later?
This makes sense to me. We should refine the compatibility tags as much as is required. It would be nice if there was some place (on PyPI, or elsewhere) where users could request binary distributions for specific packages for particular environments, and then some kind people with those environments might be able to build those wheels and upload them ... a bit like Christoph Gohlke does for Windows.
The issue is combinatorial explosion in the compatibility tag space. There is basically zero chance that even Linux users (even RedHat users across RHEL versions) would benefit from pre-built binary wheels (as opposed to packages from their distribution). Wheels on POSIX allow caching of the build process for deployment across a known set of hosts: they won't insulate you from the need to build in the first place.

Wheels *might* be in play in the for-pay market, where a vendor supports a limited set of platforms, but those solutions will use separate indexes anyway.

Tres.

--
===================================================================
Tres Seaver          +1 540-429-0999          tseaver@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
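[Editor's illustration] A back-of-the-envelope count shows how quickly that tag space grows once the platform tag has to distinguish Linux variants (the axis values below are illustrative, not an exhaustive list):

```python
# Illustrative only: each axis multiplies the number of wheels a single
# project would need to publish for one release.
python_versions = ["cp26", "cp27", "cp32", "cp33"]
architectures = ["i686", "x86_64"]
linux_variants = ["rhel_6", "fedora_19", "fedora_20",
                  "debian_7", "ubuntu_12_04", "ubuntu_13_10"]

wheels_needed = len(python_versions) * len(architectures) * len(linux_variants)
print(wheels_needed)  # 48 binary wheels, before counting ABI flags or libc versions
```

And that count still assumes every external shared library on each of those platforms is ABI-identical, which is exactly the assumption that fails for arbitrary binary dependencies.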
On Mon, 2/12/13, Tres Seaver <tseaver@palladion.com> wrote:
The issue is combinatorial explosion in the compatibility tag space. There is basically zero chance that even Linux users (even RedHat users across RHEL version) would benefit from pre-built binary wheels (as opposed to packages from their distribution). Wheels on POSIX allow caching of the build process for deployment across a known set of hosts: they won't insulate you from the need to build in the first place.
The combinations are number of Python X.Y versions x the no. of platform architectures/ABI variants, or do you mean something more than this? The wheel format is supposed to be a cross-platform binary package format; are you saying it is completely useless for POSIX except as a cache for identical hosts? What about for the cases like simple C extensions which have no external dependencies, but are only for speedups? What about POSIX environments where compilers aren't available (e.g. restricted/embedded environments, or due to security policies)? Regards, Vinay Sajip
On 12/02/2013 12:23 PM, Vinay Sajip wrote:
On Mon, 2/12/13, Tres Seaver <tseaver@palladion.com> wrote:
The issue is combinatorial explosion in the compatibility tag space. There is basically zero chance that even Linux users (even RedHat users across RHEL version) would benefit from pre-built binary wheels (as opposed to packages from their distribution). Wheels on POSIX allow caching of the build process for deployment across a known set of hosts: they won't insulate you from the need to build in the first place.
The combinations are number of Python X.Y versions x the no. of platform architectures/ABI variants, or do you mean something more than this?
Trying to mark up wheels so that they can be safely shared with unknown POSIXy systems seems like a halting problem to me: the chance I can build a wheel on my machine that you can use on yours (the only reason to distribute a wheel, rather than the sdist, in the first place) drops off sharply as the wheel's "binariness" comes into play. I'm arguing that wheel is not an interesting *distribution* format for POSIX systems (at least, for non-Mac ones). It could still play out in *deployment* scenarios (as you note below).

Note that wheel's main deployment advantage over a binary egg (installable by pip) is exactly reversed if you use 'easy_install' or 'zc.buildout'. Otherwise, in a controlled deployment, they are pretty much equivalent.
The wheel format is supposed to be a cross-platform binary package format; are you saying it is completely useless for POSIX except as a cache for identical hosts? What about for the cases like simple C extensions which have no external dependencies, but are only for speedups?
I have a lot of packages on PyPI which have such optimization-only speedups. The time difference to build such extensions is trivial (e.g., for zope.interface, ~1 second on my old slow laptop, versus 0.4 seconds without the extension). Even for lxml (Daniel's original motivating case), the difference is ~45 seconds to build from source vs. 1 second to install a wheel (or an egg). The instant I have to think about whether the binary form might be subtly incompatible, that 1 second *loses* to the 45 seconds I spend over here arguing with you guys while it builds again from source. :)
What about POSIX environments where compilers aren't available (e.g. restricted/embedded environments, or due to security policies)?
Such environments are almost certainly driven by development teams who can build wheels specifically for deployment to them (assuming the policies allow anything other than distro-package-managed software). This is still really a "cache the build" optimization for known platforms (w/ all binary dependencies the same), rather than distribution.

Tres.

--
===================================================================
Tres Seaver          +1 540-429-0999          tseaver@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
On 2 Dec 2013 06:48, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 1 December 2013 19:21, Marcus Smith <qwcode@gmail.com> wrote:
sometimes mean needing to build components with external dependencies from source
you mean build once (or maybe after system updates for wheels with external binary deps), and cache as a local wheel, right?
Right. The advantage of conda is that the use of hash based dependencies lets them distribute arbitrary binary dependencies, whether they're Python libraries or not.
Note that it is possible to convert binary packages from other formats to wheel. We already support egg and bdist_wininst. If conda has built packages that are worth supporting, I doubt that a conda-to-wheel converter would be hard to write. I'd be willing to write one if someone can point me at a spec for the conda package format (I couldn't find one from a brief look through the docs).
Conversely, if I have a wininst installer, a wheel or an egg, is there a converter to conda format? I can't see having to use conda for some things and pip for others as being a move towards lessening user confusion.
I see conda as existing at a similar level to apt and yum from a packaging point of view, with zc.buildout as a DIY equivalent at that level.

For example, I installed Nikola into a virtualenv last night. That required installing the development headers for libxml2 and libxslt, but the error that tells you that is a C compiler one. I've been a C programmer longer than I have been a Python one, but I still had to resort to Google to try to figure out what dev libraries I needed. Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)

We have the option to leave handling the arbitrary binary dependency problem to platforms, and I think we should take it. This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.

Longer term, I expect we can expand pip's parallel install capabilities to support a better multi-version experience (after Python 3.4 is out the door, I intend to devote some quality time to trying to improve on the existing pkg_resources multi-version API, now that I've been using it long enough to understand its strengths and limitations) and potentially come up with a scheme for publishing supporting binaries inside prebuilt wheel files, but there are plenty of problems ahead of those on the todo list.

By contrast, conda exists now and is being used happily by many members of the scientific and data analysis communities. It's not vapourware - it already works for its original target audience, and they love it.
I see conda (and enthought) as more like distributions - they sit in the same space as rpms and debs. I don't see them as alternatives to pip. Am I wrong? After all, both conda and enthought supply Python interpreter builds as well as package repositories. And both feel like unified ecosystems (you manage everything Python-related with conda) rather than tools like pip (that works with existing Python standards and interoperates with easy_install, distutils, etc).
Exactly correct - in my view, the binary dependency management problem is exactly what defines the distinction between a cross-platform toolkit like pip and virtualenv and a new platform like conda. It's precisely the fact that conda defines a new platform that mostly ignores the underlying system that:

- would make it a terrible choice for the core cross-platform software distribution toolkit
- makes it much easier for them to consistently support a compiler free experience for end users

Hence the layering proposal: if you're willing to build binary extensions from source occasionally, pip and virtualenv are already all you need. If you want someone else to handle the builds for you, rely on a platform instead. Like all platforms, conda has gaps in what it provides, but streamlining the PyPI to conda pipeline is going to be easier than streamlining inclusion in Linux distros, and Windows and Mac OS X have no equivalent binary dependency management system in the first place.
If the issue is simply around defining compatibility tags that better describe the various environments around, then let's just get on with that - we're going to have to do it in the end anyway, why temporarily promote an alternative solution just to change our recommendation later?
That will get us to the point of correctly supporting self-contained extensions that only rely on the Python ABI and the platform C/C++ runtime, and I agree that's a problem we need to solve at the cross-platform toolkit level.

Conda is about solving the arbitrary binary dependency problem, in the same way other platforms do: by creating a pre-integrated collection of built libraries to cut down on the combinatorial explosion of possibilities. It isn't a coincidence that the "cross-platform platform" approach grew out of the scientific community - a solution like that is essential given the combination of ancient software with arcane build systems and end users that just want to run their data analysis rather than mess about with making the software work.

My key point is that if you substitute "beginner" for "scientist", the desired end user experience is similar, but if you substitute "professional software developer" or "system integrator", it isn't - we often *want* (or need) to build from source and do our own integration, so a pre-integrated approach like conda is inappropriate to our use cases.

I hadn't previously brought this up because I'm not entirely convinced conda is mature enough yet either, but as noted in the original post, it's become clear to me lately that people are *already* confused about how conda relates to pip, and that the lack of a shared understanding of how they differ has been causing friction on both sides. That friction is unnecessary - conda is no more a potential replacement for pip and virtualenv than zc.buildout is, but like zc.buildout, imposing additional constraints allows conda to solve problems that are difficult, perhaps even impossible, to handle at the core toolkit level.
This also came up when Donald asked people to tell him about the problems they saw in packaging:

Different tools: https://github.com/pypa/packaging-problems/issues/23
Build problems with standard tools: https://github.com/pypa/packaging-problems/issues/35

By being clear that we understand *why* the scientific community found a need to create their own tools, we can also start to make it clearer why their solutions to their specific problems don't extend to the full spectrum of use cases that need to be supported at the cross-platform toolkit level.

Cheers,
Nick.
Paul
On 1 December 2013 22:17, Nick Coghlan <ncoghlan@gmail.com> wrote:
For example, I installed Nikola into a virtualenv last night. That required installing the development headers for libxml2 and libxslt, but the error that tells you that is a C compiler one.
I've been a C programmer longer than I have been a Python one, but I still had to resort to Google to try to figure out what dev libraries I needed.
But that's a *build* issue, surely? How does that relate to installing Nikola from a set of binary wheels? I understand you are thinking about non-Python libraries, but all I can say is that this has *never* been an issue to my knowledge in the Windows world. People either ship DLLs with the Python extension, or build statically. I understand that things are different in the Unix world, but to be blunt why should Windows users care?
Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)
Build issues again...
We have the option to leave handling the arbitrary binary dependency problem to platforms, and I think we should take it.
Again, can we please be clear here? On Windows, there is no issue that I am aware of. Wheels solve the binary distribution issue fine in that environment (I know this is true, I've been using wheels for months now - sure there may be specialist areas that need some further work because they haven't had as much use yet, but that's details)
This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.
Excuse me if I'm feeling a bit negative towards this announcement. I've spent many months working on, and promoting, the wheel + pip solution, to the point where it is now part of Python 3.4. And now you're saying that you expect us to abandon that effort and work on conda instead?

I never saw wheel as a pure-Python solution; installs from source were fine for me in that area. The only reason I worked so hard on wheel was to solve the Windows binary distribution issue. If the new message is that people should not distribute wheels for (for example) lxml, pyyaml, pyzmq, numpy, scipy, pandas, gmpy, and pyside (to name a few that I use in wheel format relatively often) then effectively the work I've put in has been wasted.

I'm hoping I've misunderstood here. Please clarify. Preferably with specifics for Windows (as "conda is a known stable platform" simply isn't true for me...) - I accept you're not a Windows user, so a pointer to already-existing documentation is fine (I couldn't find any myself).

Paul.
On Mon, Dec 2, 2013 at 12:38 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 1 December 2013 22:17, Nick Coghlan <ncoghlan@gmail.com> wrote:
For example, I installed Nikola into a virtualenv last night. That required installing the development headers for libxml2 and libxslt, but the error that tells you that is a C compiler one.
I've been a C programmer longer than I have been a Python one, but I still had to resort to Google to try to figure out what dev libraries I needed.
But that's a *build* issue, surely? How does that relate to installing Nikola from a set of binary wheels?
I understand you are thinking about non-Python libraries, but all I can say is that this has *never* been an issue to my knowledge in the Windows world. People either ship DLLs with the Python extension, or build statically. I understand that things are different in the Unix world, but to be blunt why should Windows users care?
Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)
Build issues again...
We have the option to leave handling the arbitrary binary dependency problem to platforms, and I think we should take it.
Again, can we please be clear here? On Windows, there is no issue that I am aware of. Wheels solve the binary distribution issue fine in that environment (I know this is true, I've been using wheels for months now - sure there may be specialist areas that need some further work because they haven't had as much use yet, but that's details)
This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.
Excuse me if I'm feeling a bit negative towards this announcement. I've spent many months working on, and promoting, the wheel + pip solution, to the point where it is now part of Python 3.4. And now you're saying that you expect us to abandon that effort and work on conda instead? I never saw wheel as a pure-Python solution; installs from source were fine for me in that area. The only reason I worked so hard on wheel was to solve the Windows binary distribution issue. If the new message is that people should not distribute wheels for (for example) lxml, pyyaml, pyzmq, numpy, scipy, pandas, gmpy, and pyside (to name a few that I use in wheel format relatively often) then effectively the work I've put in has been wasted.
Hi, scipy developer here. In the scientific python community people are definitely interested in and intending to standardize on wheels. Your work on wheel + pip is much appreciated.

The problems above that you say are "build issues" aren't really build issues (where build means what distutils/bento do to build a package). Maybe the following concepts, shamelessly stolen from the thread linked below, help:

- *build systems* handle the actual building of software, eg Make, CMake, distutils, Bento, autotools, etc
- *package managers* handle the distribution and installation of built (or source) software, eg pip, apt, brew, ports
- *build managers* are separate from the above and handle the automatic(?) preparation of packages from the results of build systems

Conda is a package manager to the best of my understanding, but because it controls the whole stack it can also already do parts of the job of a build manager. This is not something that pip aims to do. Conda is fairly new and not well understood in our community either, but maybe this (long) thread helps: https://groups.google.com/forum/#!searchin/numfocus/build$20managers/numfocu....

Regards,
Ralf

I'm hoping I've misunderstood here. Please clarify. Preferably with
specifics for Windows (as "conda is a known stable platform" simply isn't true for me...) - I accept you're not a Windows user, so a pointer to already-existing documentation is fine (I couldn't find any myself).
Paul. _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On 2 December 2013 09:38, Paul Moore <p.f.moore@gmail.com> wrote:
On 1 December 2013 22:17, Nick Coghlan <ncoghlan@gmail.com> wrote:
For example, I installed Nikola into a virtualenv last night. That required installing the development headers for libxml2 and libxslt, but the error that tells you that is a C compiler one.
I've been a C programmer longer than I have been a Python one, but I still had to resort to Google to try to figure out what dev libraries I needed.
But that's a *build* issue, surely? How does that relate to installing Nikola from a set of binary wheels?
Because libxml2 and libxslt aren't Python programs - they're external shared libraries. You would have to build statically for a wheel to work (which is, to be fair, exactly what we do in many cases - CPython itself bundles all its dependencies on Windows)
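[Editorial note: the failure mode described above can be made concrete with a small sketch. This is not part of the original thread or any pip feature; it's a hypothetical check, and the library names assume a typical Linux setup.]

```python
import ctypes.util

def missing_native_deps(libs):
    """Return the library names the system's loader cannot locate.

    Note: finding the runtime shared library doesn't guarantee the
    development headers are installed, so this is only a rough probe.
    """
    return [name for name in libs if ctypes.util.find_library(name) is None]

# lxml compiles against libxml2 and libxslt; probing for them up front
# would give a clearer hint than the C compiler error pip surfaces mid-build.
missing = missing_native_deps(["xml2", "xslt"])
if missing:
    print("Missing native libraries:", ", ".join(missing))
```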
I understand you are thinking about non-Python libraries, but all I can say is that this has *never* been an issue to my knowledge in the Windows world. People either ship DLLs with the Python extension, or build statically. I understand that things are different in the Unix world, but to be blunt why should Windows users care?
Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)
Build issues again...
Yes, that's the point. wheels can solve the problem for cases where all external dependencies can be statically linked, or otherwise bundled inside the wheel. They *don't* work if you need multiple Python projects binding to the *same* copy of the external dependency.
We have the option to leave handling the arbitrary binary dependency problem to platforms, and I think we should take it.
Again, can we please be clear here? On Windows, there is no issue that I am aware of. Wheels solve the binary distribution issue fine in that environment (I know this is true, I've been using wheels for months now - sure there may be specialist areas that need some further work because they haven't had as much use yet, but that's details)
Wheels work fine if they're self contained (whether through static linking or bundling), or only depending on other projects within the PyPI ecosystem, or if you're building custom wheels from source that only need to work in an environment you control. They *don't* work as soon as multiple components need to share a common external binary dependency that isn't part of the PyPI ecosystem, and you want to share your wheels with someone else - at that point you need to have a mechanism for providing the external dependencies as well. It's that second problem that some members of the scientific community have solved through conda, and that's the one I am saying we should postpone trying to solve at the pip level indefinitely (because it's damn hard, and we don't need to - the people that care about that feature generally won't care about the system integration features that pip offers over conda).
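[Editorial note: "self contained" here can be checked mechanically to a first approximation. A wheel is just a zip archive, so the sketch below lists the compiled artifacts it bundles. This is an illustrative heuristic, not a real pip or wheel feature, and it deliberately cannot see the other half of the problem: dynamic links from those artifacts to external libraries outside the wheel.]

```python
import io
import zipfile

BINARY_SUFFIXES = (".so", ".dll", ".dylib", ".pyd")

def bundled_binaries(wheel):
    """List compiled artifacts shipped inside a wheel (a zip archive)."""
    with zipfile.ZipFile(wheel) as wf:
        return [name for name in wf.namelist() if name.endswith(BINARY_SUFFIXES)]

# Build a toy wheel in memory: one extension module, one pure-Python file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("demo/__init__.py", "")
    z.writestr("demo/_speedups.so", "\x7fELF...")
print(bundled_binaries(buf))  # ['demo/_speedups.so']
```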
This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.
Excuse me if I'm feeling a bit negative towards this announcement. I've spent many months working on, and promoting, the wheel + pip solution, to the point where it is now part of Python 3.4. And now you're saying that you expect us to abandon that effort and work on conda instead?
No, conda doesn't work for most of our use cases - it only works for the "I want this pre-integrated stack of software on this system, and I don't care about the details" use case. However, this is an area where pip/virtualenv can fall short because of the external shared binary dependency problem, and because there is a lot of bad software out there with appallingly unreliable build systems. If things are statically linked or bundled instead, then they fall into the scope that I agree pip/virtualenv *should* be able to handle (i.e. wheels that are self-contained aside from the dependencies declared in their metadata). It's just the "arbitrary external shared binary dependency" problem that I want to put into the "will not solve" bucket. What that means is that remotely built wheels would *not* try to use system libraries, they would always use static linking and/or bundling for external binary dependencies. If you're already doing that on Windows, then *great*, that's working sensibly within the constraints of the design. Anyone that wanted dynamic linking of external shared dependencies would then need to either build from source (for integration with their local environment), or use a third party pre-integrated stack (like conda).
I never saw wheel as a pure-Python solution, installs from source were fine for me in that area. The only reason I worked so hard on wheel was to solve the Windows binary distribution issue. If the new message is that people should not distribute wheels for (for example) lxml, pyyaml, pyzmq, numpy, scipy, pandas, gmpy, and pyside (to name a few that I use in wheel format relatively often) then effectively the work I've put in has been wasted.
If a distribution can be sensibly published as a self-contained wheel with no external binary dependencies, then great, it makes sense to do that (e.g. an lxml wheel containing or statically linked to libxml2 and libxslt). The only problem I want to take off the table is the one where multiple wheel files try to share a dynamically linked external binary dependency.
I'm hoping I've misunderstood here. Please clarify. Preferably with specifics for Windows (as "conda is a known stable platform" simply isn't true for me...) - I accept you're not a Windows user, so a pointer to already-existing documentation is fine (I couldn't find any myself).
Windows already has a culture of bundling all its dependencies, so what I'm suggesting isn't the least bit radical there. It *is* radical in other environments, as is the suggestion that the bundling philosophy of the CPython Windows installer be extended to all Windows targeted wheel files (at least for folks that mostly work on Linux). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 2 December 2013 07:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
The only problem I want to take off the table is the one where multiple wheel files try to share a dynamically linked external binary dependency.
OK. Thanks for the clarification. Can I suggest that we need to be very careful how any recommendation in this area is stated? I certainly didn't get that impression from your initial posting, and from the other responses it doesn't look like I was the only one. We're only just starting to get real credibility for wheel as a distribution format, and we need to get a very strong message out that wheel is the future, and people should be distributing wheels as their primary binary format. My personal litmus test is the scientific community - when Christoph Gohlke is distributing his (Windows) binary builds as wheels, and projects like numpy, ipython, scipy etc are distributing wheels on PyPI, rather than bdist_wininst, I'll feel like we have got to the point where wheels are "the norm". The problem is, of course, that with conda being a scientific distribution at heart, any message we issue that promotes conda in any context will risk confusion in that community. My personal interest is as a non-scientific user who does a lot of data analysis, and finds IPython, Pandas, matplotlib, numpy etc useful. At the moment I can pip install the tools I need (with a quick wheel convert from wininst format). I don't want to find that in the future I can't do that, but instead have to build from source or learn a new tool (conda). Paul
On 2 December 2013 09:19, Paul Moore <p.f.moore@gmail.com> wrote:
On 2 December 2013 07:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
The only problem I want to take off the table is the one where multiple wheel files try to share a dynamically linked external binary dependency.
OK. Thanks for the clarification.
Can I suggest that we need to be very careful how any recommendation in this area is stated? I certainly didn't get that impression from your initial posting, and from the other responses it doesn't look like I was the only one.
I understood what Nick meant but I still don't understand how he's come to this conclusion.
We're only just starting to get real credibility for wheel as a distribution format, and we need to get a very strong message out that wheel is the future, and people should be distributing wheels as their primary binary format. My personal litmus test is the scientific community - when Christoph Gohlke is distributing his (Windows) binary builds as wheels, and projects like numpy, ipython, scipy etc are distributing wheels on PyPI, rather than bdist_wininst, I'll feel like we have got to the point where wheels are "the norm". The problem is, of course, that with conda being a scientific distribution at heart, any message we issue that promotes conda in any context will risk confusion in that community.
Nick's proposal is basically incompatible with allowing Christoph Gohlke to use pip and wheels. Christoph provides a bewildering array of installers for prebuilt packages that are interchangeable with other builds at the level of Python code but not necessarily at the binary level. So, for example, his scipy is incompatible with the "official" (from SourceForge) Windows numpy build because it links with the non-free Intel MKL library and it needs numpy to link against the same. Installing his scipy over the other numpy results in this: https://mail.python.org/pipermail//python-list/2013-September/655669.html So Christoph can provide wheels and people can manually download them and install from them but would beginners find that any easier than running the .exe installers? The .exe installers are more powerful and can do things like the numpy super-pack that distributes binaries for different levels of SSE support (as discussed previously on this list the wheel format cannot currently achieve this). Beginners will also find .exe installers more intuitive than running pip on the command line and will typically get better error messages etc. than pip provides. So I don't really see why Christoph should bother switching formats (as noted by Paul before anyone who wants a wheel cache can easily convert his installers into wheels). AFAICT what Nick is saying is that it's not possible for pip and PyPI to guarantee the compatibility of different binaries because unlike apt-get and friends only part of the software stack is controlled. However I think this is not the most relevant difference between pip and apt-get here. The crucial difference is that apt-get communicates with repositories where all code and all binaries are under control of a single organisation. Pip (when used normally) communicates with PyPI and no single organisation controls the content of PyPI.
So there's no way for pip/PyPI to guarantee *anything* about the compatibility of the code that they distribute/install, whether the problems are to do with binary compatibility or just compatibility of pure Python code. For pure Python distributions package authors are expected to solve the compatibility problems and pip provides version specifiers etc that they can use to do this. For built distributions they could do the same - except that pip/PyPI don't provide a mechanism for them to do so. Because PyPI is not a centrally controlled single software stack it needs a different model for ensuring compatibility - one driven by the community. People in the Python community are prepared to spend a considerable amount of time, effort and other resources solving this problem. Consider how much time Christoph Gohlke must spend maintaining such a large internally consistent set of built packages. He has created a single compatible binary software stack for scientific computation. It's just that PyPI doesn't give him any way to distribute it. If perhaps he could own a tag like "cgohlke" and upload numpy:cgohlke and scipy:cgohlke then his scipy:cgohlke wheel could depend on numpy:cgohlke and numpy:cgohlke could somehow communicate the fact that it is incompatible with any other scipy distribution. This is one way in which pip/PyPI could facilitate the Python community to solve the binary compatibility problems. [As an aside I don't know whether Christoph's Intel license would permit distribution via PyPI.] Another way would be to allow the community to create compatibility tags so that projects like numpy would have mechanisms to indicate e.g. Fortran ABI compatibility. In this model no one owns a particular tag but projects that depend on one another could simply use them in a consistent way that pip could understand.
The impression I got from Nick's initial post is that, having discovered that the compatibility tags used in the wheel format are insufficient for the needs of the Python community and that it's not possible to enumerate the tags needed, pip/PyPI should just give up on the problem of binary compatibility. I think it would be better to think about simple mechanisms that the authors of the concerned packages could use so that people in the Python community can solve these problems for each of the packages they contribute to. There is enough will out there to make this work for all the big packages and problematic operating systems if only PyPI will allow it. Oscar
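[Editorial note: the "owned tag" idea above can be sketched as follows. The `variant` field and the `:cgohlke` naming are purely illustrative - nothing like this exists in pip or the wheel spec; the sketch just shows how a resolver could treat the label as part of binary compatibility.]

```python
from collections import namedtuple

# Hypothetical build metadata: a variant label rides alongside name/version.
Build = namedtuple("Build", "name version variant")

def compatible(installed, candidate):
    """Builds of cooperating projects must share a variant to be mixed."""
    return installed.variant == candidate.variant

numpy_official = Build("numpy", "1.8.0", "default")
numpy_cgohlke = Build("numpy", "1.8.0", "cgohlke")
scipy_cgohlke = Build("scipy", "0.13.1", "cgohlke")

# The MKL mismatch case: scipy:cgohlke refuses the SourceForge numpy build.
assert not compatible(numpy_official, scipy_cgohlke)
assert compatible(numpy_cgohlke, scipy_cgohlke)
```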
On 2 December 2013 10:45, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Nick's proposal is basically incompatible with allowing Christoph Gohlke to use pip and wheels. Christoph provides a bewildering array of installers for prebuilt packages that are interchangeable with other builds at the level of Python code but not necessarily at the binary level. So, for example, his scipy is incompatible with the "official" (from SourceForge) Windows numpy build because it links with the non-free Intel MKL library and it needs numpy to link against the same. Installing his scipy over the other numpy results in this: https://mail.python.org/pipermail//python-list/2013-September/655669.html
Ah, OK. I had not seen this issue as I've always either used Christoph's builds or not used them. I've never tried or needed to mix builds. This is probably because I'm very much only a casual user of the scientific stack, so my needs are pretty simple.
So Christoph can provide wheels and people can manually download them and install from them but would beginners find that any easier than running the .exe installers? The .exe installers are more powerful and can do things like the numpy super-pack that distributes binaries for different levels of SSE support (as discussed previously on this list the wheel format cannot currently achieve this). Beginners will also find .exe installers more intuitive than running pip on the command line and will typically get better error messages etc. than pip provides. So I don't really see why Christoph should bother switching formats (as noted by Paul before anyone who wants a wheel cache can easily convert his installers into wheels).
The crucial answer here is that exe installers don't recognise virtualenvs. Again, I can imagine that a scientific user would naturally install Python and put all the scientific modules into the system Python - but precisely because I'm a casual user, I want to keep big dependencies like numpy/scipy out of my system Python, and so I use virtualenvs. The big improvement pip/wheel give over wininst is a consistent user experience, whether installing into the system Python, a virtualenv, or a Python 3.3+ venv. (I used to use wininsts in preference to pip, so please excuse a certain level of the enthusiasm of a convert here :-))
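[Editorial note: the isolation Paul is relying on can be shown with the stdlib `venv` module (Python 3.3+); a wininst installer, by contrast, only targets the registry-registered system Python. The temporary location below is just for illustration.]

```python
import sys
import tempfile
import venv
from pathlib import Path

# Create an isolated environment; pip/wheel installs into it leave the
# system Python untouched, which is the workflow wininst can't support.
env_dir = Path(tempfile.mkdtemp()) / "env"
venv.EnvBuilder(with_pip=False).create(env_dir)  # with_pip=True also bootstraps pip

python = env_dir / ("Scripts/python.exe" if sys.platform == "win32" else "bin/python")
print(python.exists())  # True
```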
AFAICT what Nick is saying is that it's not possible for pip and PyPI to guarantee the compatibility of different binaries because unlike apt-get and friends only part of the software stack is controlled. However I think this is not the most relevant difference between pip and apt-get here. The crucial difference is that apt-get communicates with repositories where all code and all binaries are under control of a single organisation. Pip (when used normally) communicates with PyPI and no single organisation controls the content of PyPI. So there's no way for pip/PyPI to guarantee *anything* about the compatibility of the code that they distribute/install, whether the problems are to do with binary compatibility or just compatibility of pure Python code. For pure Python distributions package authors are expected to solve the compatibility problems and pip provides version specifiers etc that they can use to do this. For built distributions they could do the same - except that pip/PyPI don't provide a mechanism for them to do so.
Agreed. Expecting the same level of compatibility guarantees from PyPI as is provided by RPM/apt is unrealistic, in my view. Heck, even pure Python packages don't give any indication as to whether they are Python 3 compatible in some cases (I just hit this today with the binstar package, as an example). This is a fact of life with a repository that doesn't QA uploads.
Because PyPI is not a centrally controlled single software stack it needs a different model for ensuring compatibility - one driven by the community. People in the Python community are prepared to spend a considerable amount of time, effort and other resources solving this problem. Consider how much time Christoph Gohlke must spend maintaining such a large internally consistent set of built packages. He has created a single compatible binary software stack for scientific computation. It's just that PyPI doesn't give him any way to distribute it. If perhaps he could own a tag like "cgohlke" and upload numpy:cgohlke and scipy:cgohlke then his scipy:cgohlke wheel could depend on numpy:cgohlke and numpy:cgohlke could somehow communicate the fact that it is incompatible with any other scipy distribution. This is one way in which pip/PyPI could facilitate the Python community to solve the binary compatibility problems.
Exactly.
[As an aside I don't know whether Christoph's Intel license would permit distribution via PyPI.]
Yes, I'd expect Christoph's packages would likely always have to remain off PyPI (if for no other reason than the fact that he isn't the owner of the packages he's providing distributions for). But if he did provide a PyPI style index, compatibility could be handled manually via `pip install --index-url <Christoph's repo>`. And even if they are not on PyPI, your custom tag suggestion above would still be beneficial.
Another way would be to allow the community to create compatibility tags so that projects like numpy would have mechanisms to indicate e.g. Fortran ABI compatibility. In this model no one owns a particular tag but projects that depend on one another could simply use them in a consistent way that pip could understand.
The impression I got from Nick's initial post is that, having discovered that the compatibility tags used in the wheel format are insufficient for the needs of the Python community and that it's not possible to enumerate the tags needed, pip/PyPI should just give up on the problem of binary compatibility. I think it would be better to think about simple mechanisms that the authors of the concerned packages could use so that people in the Python community can solve these problems for each of the packages they contribute to. There is enough will out there to make this work for all the big packages and problematic operating systems if only PyPI will allow it.
Agreed - completely giving up on the issue in favour of a separately curated solution seems wrong to me, at least in the sense that it abandons people who don't fit cleanly into either solution (e.g., someone who needs to use virtualenv rather than conda environments, but needs numpy, and a package that Christoph distributes, but conda doesn't - who should that person turn to for help?). I don't see any problem with admitting we can't solve every aspect of the problem automatically, but that doesn't preclude providing extensibility to allow communities to solve the parts of the issue that impact their own area. As a quick sanity check question - what is the long-term advice for Christoph (and others like him)? Continue distributing wininst installers? Move to wheels? Move to conda packages? Do whatever you want, we don't care? We're supposedly pushing pip as "the officially supported solution to package management" - how can that be reconciled with *not* advising builders[1] to produce pip-compatible packages? Paul [1] At least, those who are not specifically affiliated with a group offering a curated solution (like conda or ActiveState).
On 2 Dec 2013 21:57, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 2 December 2013 10:45, Oscar Benjamin <oscar.j.benjamin@gmail.com>
wrote:
Nick's proposal is basically incompatible with allowing Christoph Gohlke to use pip and wheels. Christoph provides a bewildering array of installers for prebuilt packages that are interchangeable with other builds at the level of Python code but not necessarily at the binary level. So, for example, his scipy is incompatible with the "official" (from SourceForge) Windows numpy build because it links with the non-free Intel MKL library and it needs numpy to link against the same. Installing his scipy over the other numpy results in this:
https://mail.python.org/pipermail//python-list/2013-September/655669.html
Ah, OK. I had not seen this issue as I've always either used Christoph's builds or not used them. I've never tried or needed to mix builds. This is probably because I'm very much only a casual user of the scientific stack, so my needs are pretty simple.
So Christoph can provide wheels and people can manually download them and install from them but would beginners find that any easier than running the .exe installers? The .exe installers are more powerful and can do things like the numpy super-pack that distributes binaries for different levels of SSE support (as discussed previously on this list the wheel format cannot currently achieve this). Beginners will also find .exe installers more intuitive than running pip on the command line and will typically get better error messages etc. than pip provides. So I don't really see why Christoph should bother switching formats (as noted by Paul before anyone who wants a wheel cache can easily convert his installers into wheels).
The crucial answer here is that exe installers don't recognise virtualenvs. Again, I can imagine that a scientific user would naturally install Python and put all the scientific modules into the system Python - but precisely because I'm a casual user, I want to keep big dependencies like numpy/scipy out of my system Python, and so I use virtualenvs.
The big improvement pip/wheel give over wininst is a consistent user experience, whether installing into the system Python, a virtualenv, or a Python 3.3+ venv. (I used to use wininsts in preference to pip, so please excuse a certain level of the enthusiasm of a convert here :-))
And the conda folks are working on playing nice with virtualenv - I don't think we'll see a similar offer from Microsoft for MSI any time soon :)
AFAICT what Nick is saying is that it's not possible for pip and PyPI to guarantee the compatibility of different binaries because unlike apt-get and friends only part of the software stack is controlled. However I think this is not the most relevant difference between pip and apt-get here. The crucial difference is that apt-get communicates with repositories where all code and all binaries are under control of a single organisation. Pip (when used normally) communicates with PyPI and no single organisation controls the content of PyPI. So there's no way for pip/PyPI to guarantee *anything* about the compatibility of the code that they distribute/install, whether the problems are to do with binary compatibility or just compatibility of pure Python code. For pure Python distributions package authors are expected to solve the compatibility problems and pip provides version specifiers etc that they can use to do this. For built distributions they could do the same - except that pip/PyPI don't provide a mechanism for them to do so.
Agreed. Expecting the same level of compatibility guarantees from PyPI as is provided by RPM/apt is unrealistic, in my view. Heck, even pure Python packages don't give any indication as to whether they are Python 3 compatible in some cases (I just hit this today with the binstar package, as an example). This is a fact of life with a repository that doesn't QA uploads.
Exactly, this is the difference between pip and conda - conda is a solution for installing from curated *collections* of packages. It's somewhat related to the tagging system people are speculating about for PyPI, but instead of being purely hypothetical, it already exists. Because it uses hash based dependencies, there's no chance of things getting mixed up. That design has other problems which limit the niche where a tool like conda is the right answer, but within that niche, hash based dependency management helps bring the combinatorial explosion of possible variations under control.
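[Editorial note: the hash-based dependency idea can be reduced to a toy model - each built artifact is identified by a digest of its exact contents, and a dependent only accepts the precise build it was pinned against. This is a sketch of the general approach, not conda's actual metadata format.]

```python
import hashlib

def artifact_id(payload: bytes) -> str:
    """Identify a built artifact by a digest of its exact contents."""
    return hashlib.sha256(payload).hexdigest()[:12]

def check_link(dep_payload: bytes, pinned_id: str) -> bool:
    """A dependent accepts only the precise build it was linked against."""
    return artifact_id(dep_payload) == pinned_id

# A curated stack pins scipy to the exact numpy build it was compiled with;
# any other build of "the same" numpy, however similar, is rejected.
numpy_mkl = b"numpy-1.8.0 built against MKL"
pinned = artifact_id(numpy_mkl)

assert check_link(numpy_mkl, pinned)
assert not check_link(b"numpy-1.8.0 reference BLAS build", pinned)
```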
Because PyPI is not a centrally controlled single software stack it needs a different model for ensuring compatibility - one driven by the community. People in the Python community are prepared to spend a considerable amount of time, effort and other resources solving this problem. Consider how much time Christoph Gohlke must spend maintaining such a large internally consistent set of built packages. He has created a single compatible binary software stack for scientific computation. It's just that PyPI doesn't give him any way to distribute it. If perhaps he could own a tag like "cgohlke" and upload numpy:cgohlke and scipy:cgohlke then his scipy:cgohlke wheel could depend on numpy:cgohlke and numpy:cgohlke could somehow communicate the fact that it is incompatible with any other scipy distribution. This is one way in which pip/PyPI could facilitate the Python community to solve the binary compatibility problems.
Exactly.
[As an aside I don't know whether Christoph's Intel license would permit distribution via PyPI.]
Yes, I'd expect Christoph's packages would likely always have to remain off PyPI (if for no other reason than the fact that he isn't the owner of the packages he's providing distributions for). But if he did provide a PyPI style index, compatibility could be handled manually via `pip install --index-url <Christoph's repo>`. And even if they are not on PyPI, your custom tag suggestion above would still be beneficial.
Yes, I think a "variant" tag would be a useful feature for wheels as well. It's not a substitute for hash identified fully curated stacks, though.
Another way would be to allow the community to create compatibility tags so that projects like numpy would have mechanisms to indicate e.g. Fortran ABI compatibility. In this model no one owns a particular tag but projects that depend on one another could simply use them in a consistent way that pip could understand.
The impression I got from Nick's initial post is that, having discovered that the compatibility tags used in the wheel format are insufficient for the needs of the Python community and that it's not possible to enumerate the tags needed, pip/PyPI should just give up on the problem of binary compatibility. I think it would be better to think about simple mechanisms that the authors of the concerned packages could use so that people in the Python community can solve these problems for each of the packages they contribute to. There is enough will out there to make this work for all the big packages and problematic operating systems if only PyPI will allow it.
Agreed - completely giving up on the issue in favour of a separately curated solution seems wrong to me, at least in the sense that it abandons people who don't fit cleanly into either solution (e.g., someone who needs to use virtualenv rather than conda environments, but needs numpy, and a package that Christoph distributes, but conda doesn't - who should that person turn to for help?). I don't see any problem with admitting we can't solve every aspect of the problem automatically, but that doesn't preclude providing extensibility to allow communities to solve the parts of the issue that impact their own area.
We already have more than enough work to do in the packaging space - why come up with a new solution for publication of curated stacks when the conda folks already have one, and it works cross-platform for the stack with the most complex external dependencies? If static linking or bundling is an option, then wheels can already handle it. If targeting a particular controlled environment (or a user base prepared to get the appropriate bits in place themselves), wheels can also handle shared external binary dependencies. What they can't easily do is share a fully curated stack of software, including external binary dependencies, in a way that works across platforms and isn't trivially easy to screw up by accidentally installing from the wrong place. Just as a while ago I started thinking it made more sense to rehabilitate setuptools than it did to try to replace it any time soon, and we mostly let the TUF folks run with the end-to-end security problem, I now think it makes sense to start working on collaborating with the conda folks on the "distribution of curated software stacks" problem, as that reduces the pressure on the core tools to address that concern. PyPI wheels would then be about publishing "default" versions of components, with the broadest compatibility, while conda would be a solution for getting access to alternate builds that may be faster, but require external shared dependencies.
As a quick sanity check question - what is the long-term advice for Christoph (and others like him)? Continue distributing wininst installers? Move to wheels? Move to conda packages? Do whatever you want, we don't care? We're supposedly pushing pip as "the officially supported solution to package management" - how can that be reconciled with *not* advising builders[1] to produce pip-compatible packages?
What Christoph is doing is producing a cross-platform curated binary software stack, including external dependencies. That's precisely the problem I'm suggesting we *not* try to solve in the core tools any time soon, but instead support bootstrapping conda to solve the problem at a different layer. So the pip compatible builds for those tools would likely miss out on some of the external acceleration features, while curated stacks built with other options could be made available through conda (using hash based dependencies to ensure consistency). Revising the wheel spec to include variant tags, or improve the way platforms are identified, is a long way down the todo list, and that's without allowing for the subsequent time needed to update pip and other tools. By ceding the "distribution of cross-platform curated software stacks with external binary dependencies" problem to conda, users would get a solution to that problem that they can use *now*, rather than in some indefinite future after the metadata 2.0 and end-to-end security changes in the core tools have been resolved. Now, it may be we take a closer look at conda and decide there are critical issues that need to be addressed before it can be recommended rather than just mentioned (e.g. I don't know off the top of my head if the server comms is appropriately secured). This thread was mostly about pointing out that there's a thorny subset of the cross-platform software distribution problem that I think we can offload to someone else and have something that mostly works *today*, rather than only getting to it in some speculative future after a bunch of additional currently hand-wavey design work has been completed. Cheers, Nick.
Paul
[1] At least, those who are not specifically affiliated with a group offering a curated solution (like conda or ActiveState).
On 2 December 2013 13:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
As a quick sanity check question - what is the long-term advice for Christoph (and others like him)? Continue distributing wininst installers? Move to wheels? Move to conda packages? Do whatever you want, we don't care? We're supposedly pushing pip as "the officially supported solution to package management" - how can that be reconciled with *not* advising builders[1] to produce pip-compatible packages?
What Christoph is doing is producing a cross-platform curated binary software stack, including external dependencies. That's precisely the problem I'm suggesting we *not* try to solve in the core tools any time soon, but instead support bootstrapping conda to solve the problem at a different layer.
OK. From my perspective, that's *not* what Christoph is doing (I concede that it might be from his perspective, though). As far as I know, the only place where Christoph's builds are incompatible with standard builds is where numpy is involved (where he uses Intel compiler extensions). But what he does *for me* is to provide binary builds of lxml, pyyaml, matplotlib, pyside and a number of other packages that I haven't got the infrastructure set up locally to build. [He also provides apparently-incompatible binary builds of scientific packages like numpy/scipy/pandas, but that's a side-issue, and as I get *all* of my scientific packages from him, the incompatibility is not a visible problem for me.]

If the named projects provided Windows binaries, then there would be no issue with Christoph's stuff. But AFAIK, there is no place I can get binary builds of matplotlib *except* from Christoph. And lxml provides limited sets of binaries - there's no Python 3.3 version, for example. I could continue :-)

Oh, and by the way, in what sense do you mean "cross-platform" here? Win32 and Win64? Maybe I'm being narrow minded, but I tend to view "cross platform" as meaning "needs to think about at least two of Unix, Windows and OSX". The *platform* issues on Windows (and OSX, I thought) are solved - it's the ABI issues that we've ignored thus far (successfully till now :-))

But Christoph's site won't go away because of this debate, and as long as I can find wininst, egg or wheel binaries somewhere, I can maintain my own personal wheel index. So I don't really care much, and I'll stop moaning for now. I'll focus my energies on building that personal index instead.

Paul
On 2 December 2013 13:54, Paul Moore <p.f.moore@gmail.com> wrote:
If the named projects provided Windows binaries, then there would be no issue with Christoph's stuff. But AFAIK, there is no place I can get binary builds of matplotlib *except* from Christoph. And lxml provides limited sets of binaries - there's no Python 3.3 version, for example. I could continue :-)
The matplotlib folks provide a list of binaries for Windows and OSX hosted by SourceForge: http://matplotlib.org/downloads.html So do numpy and scipy.
Oh, and by the way, in what sense do you mean "cross-platform" here? Win32 and Win64? Maybe I'm being narrow minded, but I tend to view "cross platform" as meaning "needs to think about at least two of Unix, Windows and OSX". The *platform* issues on Windows (and OSX, I thought) are solved - it's the ABI issues that we've ignored thus far (successfully till now :-))
Exactly. A Python extension that uses Fortran needs to indicate which of the two Fortran ABIs it uses. Scipy must use the same ABI as the BLAS/LAPACK library that numpy was linked with. This is core compatibility data, but there's no way to communicate it to pip. There's no need to actually provide downloadable binaries for both ABIs, but there is a need to be able to detect incompatibilities.

Basically, if:
1) there is at least one single consistent set of built wheels for Windows/OSX for any popular set of binary-interdependent packages, and
2) there is a way to automatically detect incompatibilities and to automatically find compatible built wheels,
then *a lot* of packaging problems have been solved.

Part 1 already exists. There are multiple consistent sets of built installers (not wheels yet) for many hard to build packages. Part 2 requires at least some changes in pip/PyPI.

I read somewhere that numpy is the most frequently cited dependency on PyPI. It can be built in multiple binary-incompatible ways. If there is at least a way for the installer to know that it was built in "the standard way" (for Windows/OSX), then there can be a set of binaries built to match that. There's no need for a combinatorial explosion of compatibility tags - just a single set of compatibility tags that has complete binaries (where the definition of complete obviously depends on your field). People who want to build in different incompatible ways can do so themselves, although it would still be nice to get an install time error message when you subsequently try to install something incompatible.

For Linux this problem is basically solved as far as beginners are concerned, because they can just use apt.

Oscar
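[Editor's note] Oscar's part 2 - automatically detecting incompatible builds - is essentially an extension of the PEP 425 compatibility tag matching that installers already do for wheel filenames. A simplified sketch of that mechanism (it ignores the optional build tag and is not pip's actual implementation): an installer accepts a wheel only if its expanded tag set intersects the tags the target interpreter supports, and a hypothetical ABI-variant marker could be folded into the ABI tag in the same way.

```python
# Simplified sketch of PEP 425-style wheel tag matching (ignores the
# optional build tag; not pip's actual implementation).

def parse_wheel_tags(filename):
    """Expand a wheel filename into its set of (python, abi, platform) tags."""
    name, version, py_tag, abi_tag, plat_tag = filename[:-len(".whl")].split("-")
    # Each tag component may be a '.'-compressed set, e.g. "py2.py3".
    return {
        (py, abi, plat)
        for py in py_tag.split(".")
        for abi in abi_tag.split(".")
        for plat in plat_tag.split(".")
    }

def is_compatible(filename, supported_tags):
    """A wheel is installable if any of its tags is supported locally."""
    return bool(parse_wheel_tags(filename) & set(supported_tags))

# Tags a hypothetical CPython 3.3 on 64-bit Windows might support:
supported = [("cp33", "none", "win_amd64"), ("py3", "none", "any")]

print(is_compatible("lxml-3.2.3-cp33-none-win_amd64.whl", supported))    # True
print(is_compatible("lxml-3.2.3-cp27-none-linux_x86_64.whl", supported)) # False
```

The missing piece Oscar describes is that nothing in the tag triple currently records which BLAS/LAPACK or Fortran ABI a build assumed, so two wheels with identical tags can still be binary-incompatible.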
On 2 December 2013 14:19, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Basically if 1) There is at least one single consistent set of built wheels for Windows/OSX for any popular set of binary-interdependent packages. 2) A way to automatically detect incompatibilities and to automatically find compatible built wheels. then *a lot* of packaging problems have been solved.
Part 1 already exists. There are multiple consistent sets of built installers (not wheels yet) for many hard to build packages. Part 2 requires at least some changes in pip/PyPI.
Precisely. But isn't part 2 at least sort-of solved by users manually pointing at "the right" index? The only files on PyPI are compatible with each other, and externally hosted files (thanks for the pointer to the matplotlib binaries, BTW) won't get picked up automatically by pip, so users have to set up their own index (possibly converting wininst->wheel) and so can manually manage the compatibility process if they are careful. If people start uploading incompatible binaries to PyPI, I expect a rash of bug reports followed very quickly by people settling down to a community-agreed standard (in fact, that's probably already happened). Incompatible builds will remain on external hosts like Christoph's. It's not perfect, certainly, but it's no worse than currently.

For any sort of better solution to part 2, you need *installed metadata* recording the ABI / shared library details for the installed files. So this is a Metadata 2.0 question, and not a compatibility tag / wheel issue (except that when Metadata 2.0 gets such information, Wheel 2.0 probably needs to be specified to validate against it or something). And on that note, I agree with Nick that we don't want to be going there at the moment, if ever. I just disagree with what I thought he was saying, that we should be so quick to direct people to conda (at some point we could debate why conda rather than ActiveState or Enthought, but tbh I really don't care...)

I'd go with something along the lines of:

"""
Wheels don't attempt to solve the issue of one package depending on another one that has been built with specific options/compilers, or links to specific external libraries. The binaries on PyPI should always be compatible with each other (although nothing checks this, it's simply a matter of community standardisation), but if you use distributions hosted outside of PyPI or build your own, you need to manage such compatibility yourself. Most of the time, outside of specialised areas, it should not be an issue[1]. If you want guaranteed compatibility, you should use a distribution that validates and guarantees compatibility of all hosted files. This might be your platform package manager (apt or RPM) or a bundled Python distribution like Enthought, Conda or ActiveState.
"""

[1] That statement is based on *my* experience. If problems are sufficiently widespread, we can tone it down, but let's not reach the point of FUD.

Paul
hash based dependencies
In the conda build guide, the yaml spec files reference dependencies by name/version (and the type of conda environment you're in will determine the rest) http://docs.continuum.io/conda/build.html#specifying-versions-in-requirement... Where does the hash come in? what do you mean? publication of curated stacks when the conda folks already have one,
so, I see the index: http://repo.continuum.io/pkgs/index.html Is there a way to contribute to this index yet? or is that what would need to be worked out. otherwise, I guess the option is you have to build out "recipes" for anything else you need from pypi, right? or is it easier than that?
In the conda build guide, the yaml spec files reference dependencies by name/version (and the type of conda environment you're in will determine the rest)
http://docs.continuum.io/conda/build.html#specifying-versions-in-requirement... Where does the hash come in? what do you mean?
e.g. here's the requirement section from the spec file for their recipe for fabric: https://github.com/ContinuumIO/conda-recipes/blob/master/fabric/meta.yaml#L2...

requirements:
  build:
    - python
    - distribute
    - paramiko
  run:
    - python
    - distribute
    - paramiko
publication of curated stacks when the conda folks already have one,
so, I see the index: http://repo.continuum.io/pkgs/index.html Is there a way to contribute to this index yet? or is that what would need to be worked out.
probably a dumb question, but would it be possible to convert all the anaconda packages to wheels? even the non-python ones, like qt-4.7.4-0.tar.bz2 (http://repo.continuum.io/pkgs/free/linux-64/qt-4.7.4-0.tar.bz2). certainly not the intent of wheels, but just wondering if it could be made to work? but I'm guessing there's pieces in the core anaconda distribution itself, that makes it all work? the point here being to provide a way to use the effort of conda in any kind of "normal" python environment, as long as you consistently point at an index that just contains the "conda" wheels.
On 3 Dec 2013 08:17, "Marcus Smith" <qwcode@gmail.com> wrote:
publication of curated stacks when the conda folks already have one,
so, I see the index: http://repo.continuum.io/pkgs/index.html Is there a way to contribute to this index yet? or is that what would need to be worked out.
probably a dumb question, but would it be possible to convert all the anaconda packages to wheels? even the non-python ones, like qt-4.7.4-0.tar.bz2. certainly not the intent of wheels, but just wondering if it could be made to work? but I'm guessing there's pieces in the core anaconda distribution itself, that makes it all work?
the point here being to provide a way to use the effort of conda in any kind of "normal" python environment, as long you consistently point at an index that just contains the "conda" wheels.
I'm not sure about the conda -> wheel direction, but "pip install conda && conda init" mostly works already if you're in a virtualenv that owns its copy of Python (this is also the answer to "why not ActiveState or Enthought" - the Continuum Analytics software distribution stuff is truly open source, and able to be used completely independently of their services).

Their docs aren't that great in terms of explaining the *why* of conda - I'm definitely influenced by spending time talking about how it works with Travis and some of the other Continuum Analytics folks at PyCon US and the Austin Python user group. However, their approach to distribution of fully curated stacks seems basically sound, the scientific and data analysis users I know that have tried it have loved it, the devs have expressed a willingness to work on improving their interoperability with the standard tools (and followed through on that at least once by creating the "conda init" command), and they're actively interested in participating in the broader community (hence the presentation at the packaging mini-summit at PyCon US, as well as assorted presentations at SciPy and PyData conferences).

People are already confused about the differences between pip and conda and when they should use each, and unless we start working with the conda devs to cleanly define the different use cases, that's going to remain the case. POSIX users need ready access to a prebuilt scientific stack just as much as (or more than) Mac OS X and Windows users (there's a reason "Scientific Linux" is a distribution in its own right), and that space is moving fast enough that the Linux distros (even SL) end up being too slow to update. conda solves that problem, and it solves it in a way that works on Windows as well.
On the wheel side of things we haven't even solved the POSIX platform tagging problem yet, and I don't believe we should make users wait until we have figured that out when there's an existing solution to that particular problem that already works. Cheers, Nick.
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
I'm not sure about the conda -> wheel direction, but "pip install conda && conda init" mostly works already if you're in a virtualenv that owns its copy of Python
ok, I just tried conda in a throw-away altinstall of py2.7. I was thinking I would have to "conda create" new isolated environments from there, but there literally is a "conda init" (*not* documented on the website) like you mentioned that gets conda going in the current environment. pip and conda were both working, except that pip didn't know about everything conda had installed, like sqlite, which is expected. and I found all the conda metadata, which was helpful to look at. I still don't know what you mean by "hash based dependencies". I'm not seeing any requirements being locked by hashes in the metadata? what do you mean?
On Mon, Dec 2, 2013 at 5:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
And the conda folks are working on playing nice with virtualenv - I don't think we'll see a similar offer from Microsoft for MSI any time soon :)
nice to know...
a single organisation. Pip (when used normally) communicates with PyPI and no single organisation controls the content of PyPI.
can't you point pip to a "wheelhouse"? How is that different?
For built distributions they could do the same - except that pip/PyPI don't provide a mechanism for them to do so.
I'm still confused as to what conda provides here -- as near as I can tell, conda has a nice hash-based way to ensure binary compatibility -- which is a good thing. But the "curated set of packages" is an independent issue. What's stopping anyone from creating a nice curated set of packages with binary wheels (like the Gohlke repo....) And wouldn't it be better to make wheel a bit more robust in this regard than add yet another recommended tool to the mix?
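[Editor's note] The "hash based" idea being referenced can be illustrated with a small sketch. This shows the general technique only - it is not conda's actual metadata format, and the package name and contents below are made up: a curated stack locks each dependency to an exact artifact via its SHA-256 digest, so every install of the stack gets byte-identical, mutually compatible builds.

```python
# Illustrative sketch of hash-pinned dependencies (the general technique
# under discussion, not conda's actual metadata format): lock each
# dependency to an exact artifact by its SHA-256 digest.
import hashlib

def sha256_digest(data):
    return hashlib.sha256(data).hexdigest()

# The curator publishes name/version plus the expected digest of the build
# that the rest of the stack was tested against (contents made up here).
pinned = {
    "examplepkg-1.0-linux-64.tar.bz2": sha256_digest(b"pretend archive bytes"),
}

def verify(filename, data):
    """Refuse artifacts whose digest doesn't match the curated pin."""
    return pinned.get(filename) == sha256_digest(data)

print(verify("examplepkg-1.0-linux-64.tar.bz2", b"pretend archive bytes"))  # True
print(verify("examplepkg-1.0-linux-64.tar.bz2", b"tampered bytes"))         # False
```

Pinning by digest rather than by name/version is what lets a curator guarantee that two packages in the stack were built against the same ABI: the digest identifies one specific build, not just a release.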
Exactly, this is the difference between pip and conda - conda is a solution for installing from curated *collections* of packages. It's somewhat related to the tagging system people are speculating about for PyPI, but instead of being purely hypothetical, it already exists.
Does it? I only know of one repository of conda packages -- and it provides poor support for some things (like wxPython -- does it support any desktop GUI on OS-X?) So why do we think that conda is a better option for these unknown curated repos? Also, I'm not sure I WANT any more curated repos -- I'd rather a standard set by python.org that individual package maintainers can choose to support.

PyPI wheels would then be about publishing "default" versions of components, with the broadest compatibility, while conda would be a solution for getting access to alternate builds that may be faster, but require external shared dependencies.
I'm still confused as to why packages need to share external dependencies (though I can see why it's nice...). But what's the new policy here? Anaconda and Canopy exist already? Do we need to endorse them? Why? If you want "PyPI wheels would then be about publishing "default" versions of components, with the broadest compatibility" -- then we still need to improve things a bit, but we can't say "we're done".

What Christoph is doing is producing a cross-platform curated binary software stack, including external dependencies. That's precisely the problem I'm suggesting we *not* try to solve in the core tools any time soon, but instead support bootstrapping conda to solve the problem at a different layer.
So we are advocating that others, like Christoph, create curated stacks with conda? Aside from whether conda really provides much more than wheel to support doing this, I think it's a BAD idea to encourage it: I'd much rather encourage package maintainers to build "standard" packages, so we can get some extra interoperability.

Example: you can't use wxPython with Anaconda (on the Mac, anyway). At least not without figuring out how to build it yourself, and I'm not sure it will even work then (and it is a fricking nightmare to build). But it's getting harder to find "standard" packages for the Mac for the SciPy stack, so people are really stuck.

So the pip compatible builds for those tools would likely miss out on some of the external acceleration features,
that's fine -- but we still need those pip compatible builds .... and the nice thing about pip-compatible builds (really python.org-compatible builds...) is that they play well with the other binary installers --
By ceding the "distribution of cross-platform curated software stacks with external binary dependencies" problem to conda, users would get a solution to that problem that they can use *now*,
Well, to be fair, I've been starting a project to provide binaries for various packages for OS X and did intend to give conda a good look-see, but I really had hoped that wheels were "the way" now... oh well.

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
Thanks for the robust feedback folks - it's really helping me to clarify what I think, and why I consider this an important topic :) On 3 Dec 2013 10:36, "Chris Barker" <chris.barker@noaa.gov> wrote:
On Mon, Dec 2, 2013 at 5:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
And the conda folks are working on playing nice with virtualenv - I don't think we'll see a similar offer from Microsoft for MSI any time soon :)
nice to know...
a single organisation. Pip (when used normally) communicates with PyPI, and no single organisation controls the content of PyPI.
can't you point pip to a "wheelhouse"? How is that different?
Right, you can do integrated environments with wheels, that's one of the use cases they excel at.
For built distributions they could do the same - except that pip/PyPI don't provide a mechanism for them to do so.
I'm still confused as to what conda provides here -- as near as I can tell, conda has a nice hash-based way to ensure binary compatibility -- which is a good thing. But the "curated set of packages" is an independent issue. What's stopping anyone from creating a nice curated set of packages with binary wheels (like the Gohlke repo....)

Hmm, has anyone tried running devpi on a PaaS? :)

And wouldn't it be better to make wheel a bit more robust in this regard than add yet another recommended tool to the mix?

Software that works today is generally more useful to end users than software that might possibly handle their use case at some currently unspecified point in the future :)

Exactly, this is the difference between pip and conda - conda is a solution for installing from curated *collections* of packages. It's somewhat related to the tagging system people are speculating about for PyPI, but instead of being purely hypothetical, it already exists.

Does it? I only know of one repository of conda packages -- and it provides poor support for some things (like wxPython -- does it support any desktop GUI on OS-X?) So why do we think that conda is a better option for these unknown curated repos?

Because it already works for the scientific stack, and if we don't provide any explicit messaging around where conda fits into the distribution picture, users are going to remain confused about it for a long time.
Also, I'm not sure I WANT any more curated repos -- I'd rather a standard set by python.org that individual package maintainers can choose to support.
PyPI wheels would then be about publishing "default" versions of components, with the broadest compatibility, while conda would be a solution for getting access to alternate builds that may be faster, but require external shared dependencies.
I'm still confused as to why packages need to share external dependencies (though I can see why it's nice...) .
Because they reference shared external data, communicate through shared memory, or otherwise need compatible memory layouts. It's exactly the same reason all C extensions need to be using the same C runtime as CPython on Windows: because things like file descriptors break if they don't.

But what's the new policy here? Anaconda and Canopy exist already? Do we need to endorse them? Why? If you want "PyPI wheels would then be about publishing "default" versions of components, with the broadest compatibility" -- then we still need to improve things a bit, but we can't say "we're done".

Conda solves a specific problem for the scientific community, but in their enthusiasm, the developers are pitching it as a general purpose packaging solution. It isn't, but in the absence of a clear explanation of its limitations from us, both its developers and other Python users are likely to remain confused about the matter.
What Christoph is doing is producing a cross-platform curated binary software stack, including external dependencies. That's precisely the problem I'm suggesting we *not* try to solve in the core tools any time soon, but instead support bootstrapping conda to solve the problem at a different layer.
So we are advocating that others, like Christoph, create curated stacks with conda? Aside from whether conda really provides much more than wheel to support doing this, I think it's a BAD idea to encourage it: I'd much rather encourage package maintainers to build "standard" packages, so we can get some extra interoperability.
Example: you can't use wxPython with Anaconda (on the Mac, anyway). At least not without figuring out how to build it yourself, and I'm not sure it will even work then (and it is a fricking nightmare to build). But it's getting harder to find "standard" packages for the Mac for the SciPy stack, so people are really stuck.
So the pip compatible builds for those tools would likely miss out on some of the external acceleration features,
that's fine -- but we still need those pip compatible builds ....
and the nice thing about pip-compatible builds (really python.org-compatible builds...) is that they play well with the other binary installers --
By ceding the "distribution of cross-platform curated software stacks with external binary dependencies" problem to conda, users would get a solution to that problem that they can use *now*,
Well, to be fair, I've been starting a project to provide binaries for various packages for OS X and did intend to give conda a good look-see, but I really had hoped that wheels were "the way" now... oh well.
Wheels *are* the way if one or both of the following conditions hold:
- you don't need to deal with build variants
- you're building for a specific target environment

That covers an awful lot of ground, but there's one thing it definitely doesn't cover: distributing multiple versions of NumPy built with different options, and cohesive ecosystems on top of that. Now, there are various ideas for potentially making wheels handle that, from the scientific community all agreeing on a common set of build settings and publishing consistent wheels, to a variant tagging system, to having pip remember the original source of a software distribution, but they're all vapourware at this point, with no concrete plans to change that, and plenty of higher priority problems to deal with.

By contrast, conda already exists, and already works, as it was designed *specifically* to handle the scientific Python stack. Unfortunately, the folks making conda don't quite grasp the full breadth of the use cases that pip handles, so their docs do a lousy job of explaining conda's limitations. It meets their needs perfectly though, along with the needs of many other people, so they're correspondingly enthusiastic in wanting to share its benefits with others.

This means that one key reason I want to recommend it for the cases where it is a good fit (i.e. the scientific Python stack) is so we can explicitly advise *against* using it in other cases where it will just add complexity without adding value. Saying nothing is not an option, since people are already confused. Saying to never use it isn't an option either, since bootstrapping conda first *is* a substantially simpler cross-platform way to get up to date scientific Python software on to your system. The alternatives are platform specific and (at least in the Linux distro case) slower to get updates.

Cheers,
Nick.
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On 3 December 2013 08:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
And wouldn't it be better to make wheel a bit more robust in this regard than add yet another recommended tool to the mix?
Software that works today is generally more useful to end users than software that might possibly handle their use case at some currently unspecified point in the future :)
See my experience with conda under Windows. While I'm not saying that conda "doesn't work", being directed to software that turns out to have its own set of bugs, different to the ones you're used to, is a pretty frustrating experience. (BTW, I raised a bug report. Let's see what the response is like...) Paul
On 3 December 2013 09:11, Paul Moore <p.f.moore@gmail.com> wrote:
On 3 December 2013 08:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
And wouldn't it be better to make wheel a bit more robust in this regard than add yet another recommended tool to the mix?
Software that works today is generally more useful to end users than software that might possibly handle their use case at some currently unspecified point in the future :)
See my experience with conda under Windows. While I'm not saying that conda "doesn't work", being directed to software that turns out to have its own set of bugs, different to the ones you're used to, is a pretty frustrating experience. (BTW, I raised a bug report. Let's see what the response is like...)
Looks like the conda stack is built around msvcr90, whereas python.org Python 3.3 is built around msvcr100. So conda is not interoperable *at all* with standard python.org Python 3.3 on Windows :-( Paul
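[Editor's note] Paul's diagnosis can be checked mechanically: Windows CPython builds embed the compiler version as "MSC v.XXXX" in sys.version, and that maps onto the C runtime the interpreter links against. A rough illustrative sketch follows - only the two compiler versions relevant to this thread are mapped, and guess_crt is a made-up helper, not part of any real tool.

```python
# Rough sketch: infer which Microsoft C runtime a Windows CPython build
# links against from the "MSC v.XXXX" marker embedded in sys.version.
# Only the two compilers relevant to this thread are mapped; guess_crt
# is a made-up illustrative helper, not part of any real tool.
import re

_MSC_TO_RUNTIME = {
    1500: "msvcr90.dll",   # Visual Studio 2008 (python.org 2.6-3.2 builds)
    1600: "msvcr100.dll",  # Visual Studio 2010 (python.org 3.3 builds)
}

def guess_crt(version_string):
    match = re.search(r"MSC v\.(\d+)", version_string)
    if match is None:
        return None  # not an MSVC build (e.g. Linux or OS X)
    return _MSC_TO_RUNTIME.get(int(match.group(1)), "unknown")

print(guess_crt("3.3.3 [MSC v.1600 64 bit (AMD64)]"))  # msvcr100.dll
print(guess_crt("2.7.6 [MSC v.1500 32 bit (Intel)]"))  # msvcr90.dll
```

Extensions linked against msvcr90 and an interpreter linked against msvcr100 each get their own copies of runtime state such as file descriptors, which is why mixing the two in one process is unsafe.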
On 3 December 2013 19:11, Paul Moore <p.f.moore@gmail.com> wrote:
On 3 December 2013 08:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
And wouldn't it be better to make wheel a bit more robust in this regard than add yet another recommended tool to the mix?
Software that works today is generally more useful to end users than software that might possibly handle their use case at some currently unspecified point in the future :)
See my experience with conda under Windows. While I'm not saying that conda "doesn't work", being directed to software that turns out to have its own set of bugs, different to the ones you're used to, is a pretty frustrating experience.
Yeah, I hit the one where it tries to upgrade the symlinked Python in a virtualenv on POSIX systems and fails: https://github.com/ContinuumIO/conda/issues/360
(BTW, I raised a bug report.
For anyone else that is curious: https://github.com/ContinuumIO/conda/issues/396 In looking for a clear explanation of the runtime compatibility requirements for extensions, I realised that such a thing doesn't appear to exist. And then I realised I wasn't aware of the existence of *any* good overview of C extensions for Python, their benefits, their limitations, alternatives to creating them by hand, and that such a thing might be a good addition to the "Advanced topics" section of the packaging user guide: https://bitbucket.org/pypa/python-packaging-user-guide/issue/36/add-a-sectio...
Let's see what the response is like...)
Since venv in Python 3.4 has a working --copies option, I bashed away at the conda+venv combination a bit more, and filed another couple of conda bugs: Gets shebang lines wrong in a virtual environment: https://github.com/ContinuumIO/conda/issues/397 Doesn't currently support "python -m conda": https://github.com/ContinuumIO/conda/issues/398 Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 3 December 2013 08:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
This means that one key reason I want to recommend it for the cases where it is a good fit (i.e. the scientific Python stack) is so we can explicitly advise *against* using it in other cases where it will just add complexity without adding value.
Saying nothing is not an option, since people are already confused. Saying to never use it isn't an option either, since bootstrapping conda first *is* a substantially simpler cross-platform way to get up to date scientific Python software on to your system. The alternatives are platform specific and (at least in the Linux distro case) slower to get updates.
But you're not saying "use conda for the scientific Python stack". You're saying to use it "when you have binary external dependencies", which is a phrase that I (and I suspect many Windows users) don't really understand, and will take to mean "C extensions, or at least ones that interface to another library, such as pyyaml, lxml, ...".

Also, this presumes an either/or situation. What about someone who just wants to use matplotlib to display a graph of some business data? Is matplotlib part of "the scientific stack"? Should I use conda *just* to get matplotlib in an otherwise wheel-based application? Or how about a scientist that wants wxPython (to use Chris' example)? Apparently the conda repo doesn't include wxPython, so do they need to learn how to install pip into a conda environment? (Note that there's no wxPython wheel, so this isn't a good example yet, but I'd hope it will be in due course...)

Reducing confusion is good, I'm all for that. But we need to have a clear picture of what we're saying before we can state it clearly...

Paul
On 3 December 2013 19:22, Paul Moore <p.f.moore@gmail.com> wrote:
On 3 December 2013 08:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
This means that one key reason I want to recommend it for the cases where it is a good fit (i.e. the scientific Python stack) is so we can explicitly advise *against* using it in other cases where it will just add complexity without adding value.
Saying nothing is not an option, since people are already confused. Saying to never use it isn't an option either, since bootstrapping conda first *is* a substantially simpler cross-platform way to get up to date scientific Python software on to your system. The alternatives are platform specific and (at least in the Linux distro case) slower to get updates.
But you're not saying "use conda for the scientific Python stack". You're saying to use it "when you have binary external dependencies", which is a phrase that I (and I suspect many Windows users) don't really understand, and will take to mean "C extensions, or at least ones that interface to another library, such as pyyaml, lxml, ...".
That's not what I meant though - I only mean the case where there's a binary dependency that's completely outside the Python ecosystem and can't be linked or bundled because it needs to be shared between multiple components on the Python side. However, there haven't been any compelling examples presented other than the C runtime (which wheel needs to handle as part of the platform tag and/or the ABI tag) and the scientific stack, so I agree limiting the recommendation to the scientific stack is a reasonable approach. Only folks that actually understand the difference between static and dynamic linking and wrapper modules vs self-contained accelerator modules are likely to understand what "shared external binary dependency" means, so I agree it's not a useful phrase to use in a recommendation aimed at folks that aren't already experienced developers. If Windows and Mac OS X users have alternatives they strongly favour over conda that are virtualenv compatible, then sure, we can consider those as well, but I'm not aware of any (as the "virtualenv compatible" bit rules out anything based on platform installers).
Also, this presumes an either/or situation. What about someone who just wants to use matplotlib to display a graph of some business data? Is matplotlib part of "the scientific stack"? Should I use conda *just* to get matplotlib in an otherwise wheel-based application?
Ultimately, it depends on whether matplotlib is coupled to the NumPy build options or not. However, I think the more practical recommendation would be to say:

- if there's no wheel
- and you can't build it from source yourself
- then you can try "pip install conda && conda init && conda install <pkg>" as a fallback option.

And then we encourage the conda devs to follow the installation database standard properly (if they aren't already), so things installed with conda play nice with things installed with pip. It sounds like we also need to get them to ensure they're using the right compiler/C runtime on Windows so their packages are interoperable with the standard python.org installers.
Or how about a scientist that wants wxPython (to use Chris' example)? Apparently the conda repo doesn't include wxPython, so do they need to learn how to install pip into a conda environment? (Note that there's no wxPython wheel, so this isn't a good example yet, but I'd hope it will be in due course...)
No, it's the other way around - for cases where wheels aren't yet available, but conda provides it, then we should try to ensure that "pip install conda && conda init && conda install <package>" does the right thing (including conda upgrading previously pip installed packages when necessary, as well as bailing out gracefully when it needs to). At the moment, we're getting people trying to use conda as the base, and stuff falling apart at a later stage, since conda isn't structured properly to handle use cases other than the scientific one, where simplicity and repeatability for people that aren't primarily developers trumps platform integration and easier handling of security updates.
Reducing confusion is good, I'm all for that. But we need to have a clear picture of what we're saying before we can state it clearly...
Agreed, that's a large part of why I started this thread. It's definitely clarified several points for me. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
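[Editor's note: as an aside on the "installation database standard" mentioned above - any installer that writes standard ``*.dist-info`` metadata (PEP 376) becomes visible to every other tool. A minimal sketch of reading that database, using the modern ``importlib.metadata`` module for illustration (the 2013-era equivalent would have been ``pkg_resources``):]

```python
# Sketch: enumerate installed distributions via the PEP 376
# installation database (*.dist-info directories on sys.path).
# Any installer that writes this metadata -- pip, or a conda that
# follows the standard -- shows up here for every other tool.
from importlib import metadata

def installed_versions():
    """Map each installed distribution name to its version."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip broken/nameless entries
    }

versions = installed_versions()
```

This is exactly the interoperability point being made: if conda recorded its installs this way, pip would see them, and vice versa.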
On 3 December 2013 20:19, Nick Coghlan <ncoghlan@gmail.com> wrote:
Only folks that actually understand the difference between static and dynamic linking and wrapper modules vs self-contained accelerator modules are likely to understand what "shared external binary dependency" means, so I agree it's not a useful phrase to use in a recommendation aimed at folks that aren't already experienced developers.
"... aren't already experienced C/C++/etc developers". There are lots of higher level languages (including Python itself) that people can be experienced in and still have never had the "pleasure" of learning the ins and outs of dynamic linking and binary ABIs. Foundations made of sand - it isn't surprising that software sometimes fails, it's a miracle that it ever works at all :) Cheers, Nick.
On 3 December 2013 10:19, Nick Coghlan <ncoghlan@gmail.com> wrote:
Or how about a scientist that wants wxPython (to use Chris' example)? Apparently the conda repo doesn't include wxPython, so do they need to learn how to install pip into a conda environment? (Note that there's no wxPython wheel, so this isn't a good example yet, but I'd hope it will be in due course...)
No, it's the other way around - for cases where wheels aren't yet available, but conda provides it, then we should try to ensure that "pip install conda && conda init && conda install <package>" does the right thing (including conda upgrading previously pip installed packages when necessary, as well as bailing out gracefully when it needs to).
Perhaps it would help if there were wheels for conda and its dependencies. "pycosat" (whatever that is) breaks when I pip install conda:

$ pip install conda
Downloading/unpacking pycosat (from conda)
  Downloading pycosat-0.6.0.tar.gz (58kB): 58kB downloaded
  Running setup.py egg_info for package pycosat
Downloading/unpacking pyyaml (from conda)
  Downloading PyYAML-3.10.tar.gz (241kB): 241kB downloaded
  Running setup.py egg_info for package pyyaml
Installing collected packages: pycosat, pyyaml
  Running setup.py install for pycosat
    building 'pycosat' extension
    q:\tools\MinGW\bin\gcc.exe -mdll -O -Wall -Iq:\tools\Python27\include -IQ:\venv\PC -c pycosat.c -o build\temp.win32-2.7\Release\pycosat.o
    In file included from pycosat.c:18:0:
    picosat.c: In function 'picosat_stats':
    picosat.c:8179:4: warning: unknown conversion type character 'l' in format [-Wformat]
    picosat.c:8179:4: warning: too many arguments for format [-Wformat-extra-args]
    picosat.c:8180:4: warning: unknown conversion type character 'l' in format [-Wformat]
    picosat.c:8180:4: warning: too many arguments for format [-Wformat-extra-args]
    In file included from pycosat.c:18:0:
    picosat.c: At top level:
    picosat.c:8210:26: fatal error: sys/resource.h: No such file or directory
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    Complete output from command Q:\venv\Scripts\python.exe -c "import setuptools;__file__='Q:\\venv\\build\\pycosat\\setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\docume~1\enojb\locals~1\temp\pip-lobu76-record\install-record.txt --single-version-externally-managed --install-headers Q:\venv\include\site\python2.7:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-2.7
    copying test_pycosat.py -> build\lib.win32-2.7
    running build_ext
    building 'pycosat' extension
    creating build\temp.win32-2.7
    creating build\temp.win32-2.7\Release
    q:\tools\MinGW\bin\gcc.exe -mdll -O -Wall -Iq:\tools\Python27\include -IQ:\venv\PC -c pycosat.c -o build\temp.win32-2.7\Release\pycosat.o
    In file included from pycosat.c:18:0:
    picosat.c: In function 'picosat_stats':
    picosat.c:8179:4: warning: unknown conversion type character 'l' in format [-Wformat]
    picosat.c:8179:4: warning: too many arguments for format [-Wformat-extra-args]
    picosat.c:8180:4: warning: unknown conversion type character 'l' in format [-Wformat]
    picosat.c:8180:4: warning: too many arguments for format [-Wformat-extra-args]
    In file included from pycosat.c:18:0:
    picosat.c: At top level:
    picosat.c:8210:26: fatal error: sys/resource.h: No such file or directory
    compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
Cleaning up...
Command Q:\venv\Scripts\python.exe -c "import setuptools;__file__='Q:\\venv\\build\\pycosat\\setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\docume~1\enojb\locals~1\temp\pip-lobu76-record\install-record.txt --single-version-externally-managed --install-headers Q:\venv\include\site\python2.7 failed with error code 1 in Q:\venv\build\pycosat
Storing complete log in c:/Documents and Settings/enojb\pip\pip.log

Oscar
On 3 December 2013 10:36, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Perhaps it would help if there were wheels for conda and its dependencies.
That may well be a good idea. One thing pip does is go to great lengths to *not* have any dependencies (by vendoring everything it needs, and relying only on pure Python code). It looks like the conda devs haven't (yet? ;-)) found the need to do that. So a suitable set of wheels would go a long way to improving the bootstrap experience. Having to have MSVC (or gcc, I guess, if they can get your build issues fixed) if you want to bootstrap conda is a pretty significant roadblock... Paul
El 03/12/2013 10:22, Paul Moore escribió:
On 3 December 2013 08:48, Nick Coghlan <ncoghlan@gmail.com> wrote:
This means that one key reason I want to recommend it for the cases where it is a good fit (i.e. the scientific Python stack) is so we can explicitly advise *against* using it in other cases where it will just add complexity without adding value.
Saying nothing is not an option, since people are already confused. Saying to never use it isn't an option either, since bootstrapping conda first *is* a substantially simpler cross-platform way to get up to date scientific Python software on to your system. The alternatives are platform specific and (at least in the Linux distro case) slower to get updates. But you're not saying "use conda for the scientific Python stack". You're saying to use it "when you have binary external dependencies", which is a phrase that I (and I suspect many Windows users) don't really understand and will take to mean "C extensions, or at least ones that interface to another library, such as pyyaml, lxml, ...".
Also, this presumes an either/or situation. What about someone who just wants to use matplotlib to display a graph of some business data? Is matplotlib part of "the scientific stack"? Should I use conda *just* to get matplotlib in an otherwise wheel-based application? Or how about a scientist that wants wxPython (to use Chris' example)? Apparently the conda repo doesn't include wxPython, so do they need to learn how to install pip into a conda environment? (Note that there's no wxPython wheel, so this isn't a good example yet, but I'd hope it will be in due course...)
Reducing confusion is good, I'm all for that. But we need to have a clear picture of what we're saying before we can state it clearly...
Paul

_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig

A first and non-native try to get a clearer wording for this:
"Some collections of Python packages may have further compatibility needs than those expressed by the current set of platform tags used in wheels. That is the case for the Python scientific stack, where interoperability depends on the choice of a shared binary data format that is decided at build time. This problem can be solved by packagers' consensus on a common choice of compatibility options, or by using curated indices. Also, package managers like conda do additional checks to ensure a coherent set of Python and non-Python packages, and may offer at this time a better user experience for package collections with such complex dependencies." Regards, -- Pachi
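[Editor's note: for context on the platform tags discussed above - PEP 425 derives a wheel's platform tag from the build platform string, with ``-`` and ``.`` replaced by ``_``. A rough sketch of that derivation (just the tag scheme, not any official tool):]

```python
import sysconfig

def wheel_platform_tag():
    # PEP 425: the platform tag is the distutils platform string
    # with '-' and '.' replaced by '_', e.g. 'win32' stays 'win32'
    # and 'macosx-10.6-intel' becomes 'macosx_10_6_intel'.
    return sysconfig.get_platform().replace("-", "_").replace(".", "_")

print(wheel_platform_tag())
```

Pachi's point is that this single string cannot express things like a Fortran ABI or a BLAS choice, which is where the "further compatibility needs" come from.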
On Tue, Dec 3, 2013 at 12:48 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Because it already works for the scientific stack, and if we don't provide any explicit messaging around where conda fits into the distribution picture, users are going to remain confused about it for a long time.
Do we have to have explicit messaging for every useful third-party package out there?
I'm still confused as to why packages need to share external dependencies (though I can see why it's nice...) .
Because they reference shared external data, communicate through shared memory, or otherwise need compatible memory layouts. It's exactly the same reason all C extensions need to be using the same C runtime as CPython on Windows: because things like file descriptors break if they don't.
OK -- maybe we need a better term than shared external dependencies -- that makes me think shared library. Also, even the scipy stack is not as dependent on build environment as we seem to think it is -- I don't think there is any reason you can't use the "standard" MPL with Gohlke's MKL-built numpy, for instance. And I'm pretty sure that even scipy and numpy don't need to share their build environment more than any other extension (i.e. they could use different BLAS implementations, etc.)... numpy version matters, but that's handled by the usual dependency handling. The reason Gohlke's repo, and Anaconda and Canopy all exist is because it's a pain to build some of this stuff, period, not complex compatibility issues -- and the real pain goes beyond the standard scipy stack (VTK is a killer!)
Conda solves a specific problem for the scientific community,
well, we are getting Anaconda, the distribution, and conda, the package manager, conflated here: Having a nice full distribution of all the packages you are likely to need is great, but you could do that with wheels, and Gohlke is already doing it with MSIs (which don't handle dependencies at all -- which is a problem).
but in their enthusiasm, the developers are pitching it as a general purpose packaging solution. It isn't,
It's not? Aside from momentum, and all that, could it not be a replacement for pip and wheel?
Wheels *are* the way if one or both of the following conditions hold:
- you don't need to deal with build variants - you're building for a specific target environment
That covers an awful lot of ground, but there's one thing it definitely doesn't cover: distributing multiple versions of NumPy built with different options and cohesive ecosystems on top of that.
hmm -- I'm not sure; you could have an Anaconda-like repo built with wheels, could you not? Granted, it would be easier to make a mistake and pull wheels from two different wheelhouses that are incompatible, so there is a real advantage to conda there.
By contrast, conda already exists, and already works, as it was designed *specifically* to handle the scientific Python stack.
I'm not sure how well it works -- it works for Anaconda, and good point about the scientific stack -- does it work equally well for other stacks? Or mixing and matching?
This means that one key reason I want to recommend it for the cases where it is a good fit (i.e. the scientific Python stack) is so we can explicitly advise *against* using it in other cases where it will just add complexity without adding value.
I'm actually pretty concerned about this: lately the scipy community has defined a core "scipy stack": http://www.scipy.org/stackspec.html

Along with this is a push to encourage users to just go with a scipy distribution to get that "stack": http://www.scipy.org/install.html and http://ipython.org/install.html

I think this is in response to years of pain of each package trying to build binaries for various platforms, and keeping it all in sync, etc. I feel their pain, and "just go with Anaconda or Canopy" is good advice for folks who want to get the "stack" up and running as easily as possible. But it does not serve everyone else well -- web developers that need MPL for some plotting, scientific users that need a desktop GUI toolkit, Python newbies that want IPython, but none of that other stuff...

What would serve all those folks well is a "standard build" of packages -- i.e. built to go with the python.org builds, that can be downloaded with: pip install the_package. And I think, with binary wheels, we have the tools to do that.
Saying nothing is not an option, since people are already confused. Saying to never use it isn't an option either, since bootstrapping conda first *is* a substantially simpler cross-platform way to get up to date scientific Python software on to your system.
again, it is Anaconda that helps here, not conda itself.

Or how about a scientist that wants wxPython (to use Chris' example)? Apparently the conda repo doesn't include wxPython, so do they need to learn how to install pip into a conda environment? (Note that there's no wxPython wheel, so this isn't a good example yet, but I'd hope it will be in due course...)
Actually, if only it were as simple as "install pip", but as you point out, there is no wxPython binary wheel; if there were, it would be compatible with the python.org Python, and maybe not Anaconda (would conda catch that?)

Looks like the conda stack is built around msvcr90, whereas python.org Python 3.3 is built around msvcr100. So conda is not interoperable *at all* with standard python.org Python 3.3 on Windows :-(
again, Anaconda the distribution, is not, but I assume conda, the package manager, is. And IIUC, conda would catch that incompatibility if you tried to install incompatible packages. That's the whole point, yes? And this would help the recent concerns from the Stackless folks about building a Python binary for Windows with a newer MSVC (see python-dev)

- if there's no wheel
- and you can't build it from source yourself
- then you can try "pip install conda && conda init && conda install <pkg>" as a fallback option.

And then we encourage the conda devs to follow the installation database standard properly (if they aren't already), so things installed with conda play nice with things installed with pip. It sounds like we also need to get them to ensure they're using the right compiler/C runtime on Windows so their packages are interoperable with the standard python.org installers.
maybe we should just have conda talk to PyPI? As it stands, one of the POINTS of Anaconda is that it ISN'T the standard python.org installer! But really, this just puts us back in the state that we want to avoid -- a bunch of binary-incompatible builds out there to get confused by -- again though, at least conda apparently won't let you install binary incompatible packages...

However, there haven't been any compelling examples presented other than the C runtime (which wheel needs to handle as part of the platform tag and/or the ABI tag) and the scientific stack,
Again, I'm pretty sure it doesn't even apply to the scientific stack in any special way...

At the moment, we're getting people trying to use conda as the base, and stuff falling apart at a later stage,

Still confused: conda the package manager, or Anaconda the distribution?

well, except that the anaconda index covers non-python projects like "qt", which a private wheel index wouldn't cover (at least with the normal intended use of wheels)
umm, why not? you couldn't have a pySide wheel???

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker@noaa.gov
well, except that the anaconda index covers non-python projects like "qt",
which a private wheel index wouldn't cover (at least with the normal intended use of wheels)
umm, why not? you couldn't have a pySide wheel???
just saying that the anaconda index literally has packages for "qt" itself, the c++ library: http://repo.continuum.io/pkgs/free/linux-64/qt-4.8.5-0.tar.bz2 and its pyside packages require that. my understanding is that you could build a pyside wheel that was statically linked to qt. as to whether a wheel could just package "qt" -- that's what I don't know, and even if it could, the wheel spec doesn't cover that use case.
On Tue, Dec 3, 2013 at 3:50 PM, Marcus Smith <qwcode@gmail.com> wrote:
umm, why not? you couldn't have a pySide wheel???
just saying that the anaconda index literally has packages for "qt" itself, the c++ library. http://repo.continuum.io/pkgs/free/linux-64/qt-4.8.5-0.tar.bz2
and its pyside packages require that.
my understanding is that you could build a pyside wheel that was statically linked to qt.
which is how it's usually done for stuff like that -- or a lot of shared libs are bundled in.

That appears to be how Anaconda does things -- creates a conda package that holds just the shared libs, then another one that holds the python packages that use those libs. (I think there is a freetype one, for instance) It's a nice way to let multiple packages share the same dynamic libs, while still keeping them part of the packaging and dependency system. But I don't see why you couldn't do the same thing with wheels... (and I'm not sure there's a point if there is nothing else that uses the same libs...)
as to whether a wheel could just package "qt" -- that's what I don't know, and even if it could, the wheel spec doesn't cover that use case.
you'd have to make up a "fake" python package -- but it wouldn't have to have anything in it (Or not much...) -Chris
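[Editor's note: the "fake package" idea above can be sketched concretely - a wheel whose only payload is a shared library plus a helper that loads it. Everything here (the names, the layout, the loader function) is invented for illustration; the wheel spec doesn't bless this pattern:]

```python
# Hypothetical sketch of a "fake" Python package whose only job is
# to carry a shared library (e.g. qt's or freetype's) for other
# wheels to use. Names and layout are invented for illustration.
import ctypes
import os

def load_bundled(libname, package_dir):
    """Load a shared library shipped inside an installed package.

    A dependent extension wrapper would call something like
    load_bundled("libfreetype.so.6", os.path.dirname(shared_libs.__file__)),
    where shared_libs is the hypothetical carrier package.
    """
    path = os.path.join(package_dir, libname)
    if not os.path.exists(path):
        raise OSError("bundled library not found: %s" % path)
    return ctypes.CDLL(path)
```

This mirrors what Anaconda's split packages do, just expressed through the normal Python package/dependency machinery instead of a separate repo format.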
On 3 December 2013 22:18, Chris Barker <chris.barker@noaa.gov> wrote:
On Tue, Dec 3, 2013 at 12:48 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Because it already works for the scientific stack, and if we don't provide any explicit messaging around where conda fits into the distribution picture, users are going to remain confused about it for a long time.
Do we have to have explicit messaging for every useful third-party package out there?
I'm still confused as to why packages need to share external dependencies (though I can see why it's nice...) .
Because they reference shared external data, communicate through shared memory, or otherwise need compatible memory layouts. It's exactly the same reason all C extensions need to be using the same C runtime as CPython on Windows: because things like file descriptors break if they don't.
OK -- maybe we need a better term than shared external dependencies -- that makes me think shared library. Also, even the scipy stack is not as dependent on build environment as we seem to think it is -- I don't think there is any reason you can't use the "standard" MPL with Gohlke's MKL-built numpy, for instance. And I'm pretty sure that even scipy and numpy don't need to share their build environment more than any other extension (i.e. they could use different BLAS implementations, etc.)... numpy version matters, but that's handled by the usual dependency handling.
Sorry, I was being vague earlier. The BLAS information is not important but the Fortran ABI it exposes is: http://docs.scipy.org/doc/numpy/user/install.html#fortran-abi-mismatch MPL - matplotlib for those unfamiliar with the acronym - depends on the numpy C API/ABI but not the Fortran ABI. So it would be incompatible with, say, a pure Python implementation of numpy (or with numpypy) but it should work fine with any of the numpy binaries currently out there. (Numpy's C ABI has been unchanged from version 1.0 to 1.7 precisely because changing it has been too painful to contemplate).
The reason Gohlke's repo, and Anaconda and Canopy all exist is because it's a pain to build some of this stuff, period, not complex compatibility issues -- and the real pain goes beyond the standard scipy stack (VTK is a killer!)
I agree that the binary compatibility issues are not as complex as some are making out but it is a fact that his binaries are sometimes binary-incompatible with other builds. I have seen examples of it going wrong and he gives a clear warning at the top of his downloads page: http://www.lfd.uci.edu/~gohlke/pythonlibs/
but in their enthusiasm, the developers are pitching it as a general purpose packaging solution. It isn't,
It's not? Aside from momentum, and all that, could it not be a replacement for pip and wheel?
Conda/binstar could indeed be a replacement for pip and wheel and PyPI. It currently lacks many packages but less so than PyPI if you're mainly interested in binaries. For me pip+PyPI is a non-starter (as a complete solution) if I can't install numpy and matplotlib.
By contrast, conda already exists, and already works, as it was designed *specifically* to handle the scientific Python stack.
I'm not sure we how well it works -- it works for Anoconda, and good point about the scientifc stack -- does it work equally well for other stacks? or mixing and matching?
I don't even know how well it works for the "scientific stack". It didn't work for me! But I definitely know that pip+PyPI doesn't yet work for me, and working around that has caused me a lot more pain than it would be to diagnose and fix the problem I had with conda. They might even accept a one line, no-brainer pull request for my fix in less than 3 months :) https://github.com/pypa/pip/pull/1187
This means that one key reason I want to recommend it for the cases where it is a good fit (i.e. the scientific Python stack) is so we can explicitly advise *against* using it in other cases where it will just add complexity without adding value.
I'm actually pretty concerned about this: lately the scipy community has defined a core "scipy stack":
http://www.scipy.org/stackspec.html
Along with this is a push to encourage users to just go with a scipy distribution to get that "stack":
http://www.scipy.org/install.html
and
http://ipython.org/install.html
I think this is in response to years of pain of each package trying to build binaries for various platforms, and keeping it all in sync, etc. I feel their pain, and "just go with Anaconda or Canopy" is good advice for folks who want to get the "stack" up and running as easily as possible.
The scientific Python community are rightfully worried about potential users losing interest in Python because these installation problems occur for every noob who wants to use Python. In scientific usage, Python just isn't fully installed until numpy/scipy/matplotlib etc. are. It makes perfect sense to try and get people introduced to Python for scientific use in a way that minimises (delays?) their encounter with the packaging problems in the Python ecosystem.
But it does not serve everyone else well -- web developers that need MPL for some plotting, scientific users that need a desktop GUI toolkit, Python newbies that want IPython, but none of that other stuff...
What would serve all those folks well is a "standard build" of packages -- i.e. built to go with the python.org builds, that can be downloaded with:
pip install the_package.
And I think, with binary wheels, we have the tools to do that.
Yes, but there will never (rather, not any time soon) be a single universally compatible binary configuration, even for, say, Windows 32 bit. I think that binary wheels do need more compatibility information. You mentioned C runtimes, also Fortran ABIs. But I don't think the solution is for a PEP to enumerate the possibilities - although it might be worth having an official stance on C runtimes for Windows and for POSIX. I think the solution is to have community-extensible compatibility information. [snip]
maybe we should just have conda talk to PyPi?
As it stands, one of the POINTS of Anaconda is that it ISN'T the standard python.org installer!
Actually I think conda does (or will soon) just invoke pip under the hood for packages that aren't in the binstar/Anaconda channels but are on PyPI. Oscar
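[Editor's note: on Oscar's "more compatibility information" point - today a wheel's entire compatibility claim is the three PEP 425 tags embedded in its filename. A small sketch of pulling them out, using only the PEP 427 naming convention (filename parsing, not any official library):]

```python
def wheel_tags(filename):
    """Extract the (python, abi, platform) tags from a PEP 427
    wheel filename, e.g. 'numpy-1.7.1-cp27-none-win32.whl'.

    Wheel names are: name-version(-build)?-python-abi-platform.whl,
    so the last three dash-separated fields are the tags.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    pyver, abi, plat = filename[:-4].split("-")[-3:]
    return pyver, abi, plat
```

None of these three fields has anywhere to record a C runtime or a Fortran ABI, which is the gap being discussed.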
On 3 December 2013 22:18, Chris Barker <chris.barker@noaa.gov> wrote:
Looks like the conda stack is built around msvcr90, whereas python.org Python 3.3 is built around msvcr100. So conda is not interoperable *at all* with standard python.org Python 3.3 on Windows :-(
again, Anaconda the distribution, is not, but I assume conda, the package manager, is. And IIUC, conda would catch that incompatibility if you tried to install incompatible packages. That's the whole point, yes? And this would help the recent concerns from the Stackless folks about building a Python binary for Windows with a newer MSVC (see python-dev)
conda the installer only looks in the Anaconda repos (at the moment, and by default - you can add your own conda-format repos if you have any). So no, this *is* a problem with conda, not just Anaconda. And no, it doesn't catch the incompatibility, which says something about the robustness of their compatibility checking solution, I guess... Paul
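[Editor's note: a cheap way to see which Microsoft C runtime a given CPython links against is the "MSC v.NNNN" tag CPython embeds in ``sys.version`` on Windows builds - MSC 1500 is VS2008/msvcr90, 1600 is VS2010/msvcr100. A sketch, with the mapping hard-coded to just the two versions at issue in this thread:]

```python
import re
import sys

# MSC compiler version -> C runtime DLL (only the two relevant here)
_CRT = {1500: "msvcr90", 1600: "msvcr100"}

def crt_of(version_string):
    """Infer the MSVC runtime from a CPython sys.version string."""
    m = re.search(r"MSC v\.(\d+)", version_string)
    if m is None:
        return None  # not an MSVC build (gcc/clang on POSIX, MinGW, ...)
    return _CRT.get(int(m.group(1)), "unknown")

# On a python.org 3.3 Windows build, crt_of(sys.version) gives
# 'msvcr100'; on the Anaconda stack described above it would give
# 'msvcr90' -- which is exactly the mismatch Paul is pointing at.
```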
On 12/01/2013 06:38 PM, Paul Moore wrote:
I understand that things are different in the Unix world, but to be blunt why should Windows users care?
You're kidding, right? 90% or more of the reason for wheels in the first place is because Windows users can't build their own software from source. The amount of effort put in by non-Windows package owners to support them dwarfs whatever is bothering you here.

Tres.

--
===================================================================
Tres Seaver          +1 540-429-0999          tseaver@palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
On 2 December 2013 13:38, Tres Seaver <tseaver@palladion.com> wrote:
On 12/01/2013 06:38 PM, Paul Moore wrote:
I understand that things are different in the Unix world, but to be blunt why should Windows users care?
You're kidding, right? 90% or more of the reason for wheels in the first place is because Windows users can't build their own software from source. The amount of effort put in by non-Windows package owners to support them dwarfs whatever is bothering you here.
My point is that most of the complex binary compatibility problems seem to be Unix-related, and as you imply, Unix users don't seem to have much interest in using wheels except for local caching. So why build that complexity into the spec if the main users (Windows, and Unix users who won't ever publish wheels outside their own systems) don't have a need for it? Let's just stick with something simple that has limitations but works (practicality beats purity).

My original bdist_simple proposal was a pure-Windows replacement for wininst. Daniel developed that into wheels which cater for non-Windows systems (I believe, precisely because he had an interest in the local cache use case). We're now seeing the complexities of the Unix world affect the design of wheels, and it's turning out to be a hard problem.

All I'm trying to say is let's not give up on binary wheels for Windows, just because we have unsolved issues on Unix. Whether solving the Unix issues is worth it is the Unix users' call - I'll help solve the issues, if they choose to, but I won't support abandoning the existing Windows solution just because it can't be extended to cater for Unix as well.

I'm immensely grateful for the amount of work projects which are developed on Unix (and 3rd parties like Christoph) put into supporting Windows. Far from dismissing that, I want to avoid making things any harder than they already are for such people - current wheels are no more complex to distribute than wininst installers, and I want to keep the impact on non-Windows projects at that level. If I come across as ungrateful, I apologise.

Paul
On 3 Dec 2013 00:19, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 2 December 2013 13:38, Tres Seaver <tseaver@palladion.com> wrote:
On 12/01/2013 06:38 PM, Paul Moore wrote:
I understand that things are different in the Unix world, but to be blunt why should Windows users care?
You're kidding, right? 90% or more of the reason for wheels in the first place is because Windows users can't build their own software from source. The amount of effort put in by non-Windows package owners to support them dwarfs whatever is bothering you here.
My point is that most of the complex binary compatibility problems seem to be Unix-related, and as you imply, Unix users don't seem to have much interest in using wheels except for local caching. So why build that complexity into the spec if the main users (Windows, and Unix users who won't ever publish wheels outside their own systems) don't have a need for it? Let's just stick with something simple that has limitations but works (practicality beats purity). My original bdist_simple proposal was a pure-Windows replacement for wininst. Daniel developed that into wheels which cater for non-Windows systems (I believe, precisely because he had an interest in the local cache use case). We're now seeing the complexities of the Unix world affect the design of wheels, and it's turning out to be a hard problem. All I'm trying to say is let's not give up on binary wheels for Windows, just because we have unsolved issues on Unix.
Huh? This is *exactly* what I am saying we should do - wheels *already* work so long as they're self-contained. They *don't* work (automatically) when they have an external dependency: users have to obtain the external dependency by other means, and ensure that everything is properly configured to find it, and that everything is compatible with the retrieved version. You're right that Christoph is doing two different things, though, so our advice to him (or anyone that wanted to provide the cross-platform equivalent of his current Windows-only stack) would be split:

- for all self-contained installers, also publish a wheel file on a custom index server (although having a "builder" role on PyPI where project owners can grant someone permission to upload binaries after the sdist is published could be interesting)

- for those installers which actually form an integrated stack with shared external binary dependencies, use the mechanisms provided by conda rather than getting users to manage the external dependencies by hand (as licensing permits, anyway)
Whether solving the Unix issues is worth it is the Unix users' call - I'll help solve the issues, if they choose to, but I won't support abandoning the existing Windows solution just because it can't be extended to cater for Unix as well.
You appear to still be misunderstanding my proposal, as we're actually in violent agreement. All that extra complexity you're worrying about is precisely what I'm saying we should *leave out* of the wheel spec. In most cases of accelerator and wrapper modules, the static linking and/or bundling solutions will work fine, and that's the domain I believe we should *deliberately* restrict wheels to, so we don't get distracted trying to solve an incredibly hard external dependency management problem that we don't actually need to solve at the wheel level, since anyone that actually needs it solved can just bootstrap conda instead.
I'm immensely grateful for the amount of work projects which are developed on Unix (and 3rd parties like Christoph) put into supporting Windows. Far from dismissing that, I want to avoid making things any harder than they already are for such people - current wheels are no more complex to distribute than wininst installers, and I want to keep the impact on non-Windows projects at that level. If I come across as ungrateful, I apologise.
The only problem I want to explicitly declare out of scope for wheel files is the one the wininst installers can't handle cleanly either: the subset of Christoph's installers which need a shared external binary dependency, and any other components in a similar situation. Using wheels or native Windows installers can get you in trouble in that case, since you may accidentally set up conflicts in your environment. The solution is curation of a software stack built around that external dependency (or dependencies), backed up by a packaging system that prevents conflicts within a given local installation.

The mainstream Linux distros approach this problem by mapping everything to platform-specific packages and trying to get parallel installation working cleanly (a part of the problem I plan to work on improving post Python 3.4), but that approach doesn't scale well and is one of the factors responsible for the notorious time lags between software being released on PyPI and it being available in the Linux system package managers (streamlining that conversion is one of my main goals for metadata 2.0).

The conda folks approach it differently by using hashes to lock in *exact* versions of dependencies. From a compatibility management perspective, it's functionally equivalent to bundling a complete stack into a single monolithic installer, but has the benefit of still making it easy to only install the pieces that you actually need locally.

Regards, Nick.
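Nick's point about hash-locked dependencies can be made concrete with a tiny sketch (the function names and data layout here are purely illustrative, not conda's actual mechanism):

```python
import hashlib

def sha256_digest(payload):
    """Hex digest of a build artifact's raw bytes."""
    return hashlib.sha256(payload).hexdigest()

def verify_pinned(payload, version, pinned_version, pinned_digest):
    """Only accept an artifact if both the exact version and the exact
    build hash match the curated stack's pin - functionally similar to
    shipping one monolithic installer, but installed piece by piece."""
    return version == pinned_version and sha256_digest(payload) == pinned_digest
```

The key property is that a pinned stack can never silently pick up an ABI-incompatible rebuild of a dependency, which is exactly the failure mode loose version ranges allow.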
Paul _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On 2 December 2013 22:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
Whether solving the Unix issues is worth it is the Unix users' call - I'll help solve the issues, if they choose to, but I won't support abandoning the existing Windows solution just because it can't be extended to cater for Unix as well.
You appear to still be misunderstanding my proposal, as we're actually in violent agreement. All that extra complexity you're worrying about is precisely what I'm saying we should *leave out* of the wheel spec. In most cases of accelerator and wrapper modules, the static linking and/or bundling solutions will work fine, and that's the domain I believe we should *deliberately* restrict wheels to, so we don't get distracted trying to solve an incredibly hard external dependency management problem that we don't actually need to solve at the wheel level, since anyone that actually needs it solved can just bootstrap conda instead.
OK. I think I've finally seen what you're suggesting, and yes, it's essentially the same as I'd like to see (at least for now). I'd hoped that wheels could be more useful for Unix users than seems likely now - mainly because I really do think that a lot of the benefits of binary distributions are *not* restricted to Windows, and if Unix users could use them, it'd lessen the tendency to think that supporting anything other than source installs was purely "to cater for Windows users not having a compiler" :-) But if that's not a practical possibility (and I defer to the Unix users' opinions on that matter) then so be it.

On the other hand, I still don't see where the emphasis on conda in your original message came from. There are lots of "full stack" solutions available - I'd have thought system packages like RPM and apt are the obvious first suggestion for users that need a curated stack. If they are not appropriate, then there are Enthought, ActiveState and Anaconda/conda that I know of. Why single out conda to be blessed?

Also, I'd like the proposal to explicitly point out that 99% of the time, Windows is the simple case (because static linking and bundling DLLs is common). Getting Windows users to switch to wheels will be enough change to ask, without confusing the message. A key point here is that packages like lxml, matplotlib, or Pillow would have "arbitrary binary dependency issues" on Unix, but (because of static linking/bundling) be entirely appropriate for wheels on Windows. Let's make sure the developers don't miss this point!

Paul
On 3 Dec 2013 09:03, "Paul Moore" <p.f.moore@gmail.com> wrote:
On 2 December 2013 22:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
Whether solving the Unix issues is worth it is the Unix users' call - I'll help solve the issues, if they choose to, but I won't support abandoning the existing Windows solution just because it can't be extended to cater for Unix as well.
You appear to still be misunderstanding my proposal, as we're actually in violent agreement. All that extra complexity you're worrying about is precisely what I'm saying we should *leave out* of the wheel spec. In most cases of accelerator and wrapper modules, the static linking and/or bundling solutions will work fine, and that's the domain I believe we should *deliberately* restrict wheels to, so we don't get distracted trying to solve an incredibly hard external dependency management problem that we don't actually need to solve at the wheel level, since anyone that actually needs it solved can just bootstrap conda instead.
OK. I think I've finally seen what you're suggesting, and yes, it's essentially the same as I'd like to see (at least for now). I'd hoped that wheels could be more useful for Unix users than seems likely now - mainly because I really do think that a lot of the benefits of binary distributions are *not* restricted to Windows, and if Unix users could use them, it'd lessen the tendency to think that supporting anything other than source installs was purely "to cater for Windows users not having a compiler" :-) But if that's not a practical possibility (and I defer to the Unix users' opinions on that matter) then so be it.
On the other hand, I still don't see where the emphasis on conda in your original message came from. There are lots of "full stack" solutions available - I'd have thought system packages like RPM and apt are the obvious first suggestion for users that need a curated stack. If they are not appropriate, then there are Enthought, ActiveState and Anaconda/conda that I know of. Why single out conda to be blessed?
Also, I'd like the proposal to explicitly point out that 99% of the time, Windows is the simple case (because static linking and bundling DLLs is common). Getting Windows users to switch to wheels will be enough change to ask, without confusing the message. A key point here is that packages like lxml, matplotlib, or Pillow would have "arbitrary binary dependency issues" on Unix, but (because of static linking/bundling) be entirely appropriate for wheels on Windows. Let's make sure the developers don't miss this point!
Once we solve the platform tagging problem, wheels will also work on any POSIX system for the simple cases of accelerator and wrapper modules. Long term the only persistent problem is with software stacks that need consistent build settings and offer lots of build options. That applies to Windows as well - the SSE build variants of NumPy were one of the original cases brought up as not being covered by the wheel compatibility tag format. Near term, platform independent stacks *also* serve as a workaround for the POSIX platform tagging issues and the fact there isn't yet a "default" build configuration for the scientific stack.

As for "Why conda?":

- open source
- cross platform
- can be installed with pip
- gets new releases of Python components faster than Linux distributions
- uses Continuum Analytics services by default, but can be configured to use custom servers
- created by the creator of NumPy

For ActiveState and Enthought, as far as I am aware, their package managers are closed source and tied fairly closely to their business model, while the Linux distros are not only platform specific, but have spotty coverage of PyPI packages, and even those which are covered often aren't reliably kept up to date (although I hope metadata 2.0 will help improve that situation by streamlining the conversion to policy compliant system packages).

Cheers, Nick.
Paul
Side note about naming: I'm no expert, but I'm pretty sure "Anaconda" is a python distribution -- python itself and a set of pre-built packages. "conda" is the package manager that is used by Anaconda -- kind of like rpm is used by RedHat -- conda is an open-source project, and thus could be used by any of us completely apart from the Anaconda distribution.

On Sun, Dec 1, 2013 at 3:38 PM, Paul Moore <p.f.moore@gmail.com> wrote:

had to resort to Google to try to figure out what dev libraries I needed.

But that's a *build* issue, surely? How does that relate to installing Nikola from a set of binary wheels?

Exactly -- I've mostly dealt with this for OS-X -- there are a cadre of users that want binaries, and want them to "just work" -- we've had mpkg packages for a good while, analogous to Windows installers. Binary eggs never worked quite right, 'cause setuptools didn't understand "universal" binaries -- but it wasn't that far from working. Not really tested much yet, but it looks like binary wheels should be just fine. The concern there is that someone will be running, say, a homebrew-built python, and accidentally install a binary wheel built for the python.org python -- we should address that with better platform tags (and making sure pip at least gives a warning if you try to install an incompatible wheel...)

So what problem are we trying to solve here?

1) It's still a pain to actually build the packages -- similarly to Windows, you really need to build the dependent libraries statically and link them in - and you need to make sure that you build them with the right SDK, and universally -- this is hard to do right. - does Conda help you do any of that???

2) non-python binary dependencies: As it turns out, a number of python packages depend on the same third-party non-python dependencies: I have quite a few that use libpng, libfreetype, libhdf, ??? currently if you want to distribute binary python packages, you need to statically link or supply the dlls, so we end up with multiple copies of the same lib -- is this a problem? Maybe not -- memory is pretty cheap these days, and maybe different packages actually rely on different versions of the dependencies -- this way, at least the package builder controls that.

Anaconda (the distribution) seems to address this by having conda packages that are essentially containers for the shared libs, and other packages that need those libs depend on them. I like this method, but it seems to me to be more a feature of the Anaconda distribution than the conda package manager -- in fact, I've been thinking of doing this exact same thing with binary wheels -- I haven't tried it yet, but don't see why it wouldn't work.

I understand you are thinking about non-Python libraries, but all I can say is that this has *never* been an issue to my knowledge in the Windows world.
yes, it's a HUGE issue in the Windows world -- in fact such a huge issue that almost no one ever tries to build things themselves, or build a different python distro -- so, in fact, when someone does make a binary, it's pretty likely to work. But those binaries are a major pain to build!

(by the way, over on python-dev there has been a recent discussion about stackless building a new python2.7 windows binary with a newer MS compiler -- which will then create exactly these issues...)
Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)
Build issues again...
Yes, major ones.

(another side note: you can't get wxPython for OS-X to work with Anaconda -- there is no conda binary package, and python itself is not built in a way that it can access the window manager ... so no, this stuff is NOT suddenly easier with conda.)

Again, can we please be clear here? On Windows, there is no issue that I am aware of. Wheels solve the binary distribution issue fine in that environment
They will if/when we make sure that the wheel contains meta-data about what compiler (really run-time version) was used for the python build and wheel build -- but we should, indeed, do that.
This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.
and have yet another way to do it? AARRG! I'm also absolutely unclear on what conda offers that isn't quite easy to address with binary wheels. And it seems to need help too, so it will play better with virtualenv.... If conda really is a better solution, then I suppose we could go deprecate wheel before it gets too much "traction"...;-) But let's please not add another one to the mix to confuse people. Excuse me if I'm feeling a bit negative towards this announcement.
I've spent many months working on, and promoting, the wheel + pip solution, to the point where it is now part of Python 3.4.
And I was really looking forward to it as a solution for OS-X too....

I'm hoping I've misunderstood here. Please clarify. Preferably with specifics for Windows (as "conda is a known stable platform" simply isn't true for me...) - I accept you're not a Windows user, so a pointer to already-existing documentation is fine (I couldn't find any myself).
I appreciate the Windows viewpoint -- and I think we really do want one solution for all. The linux distros already do all this - let them keep doing it....

-Chris

PS: a number of scipy-related packages have been promoting Anaconda and Canopy as a way to get their package without dealing with building (rather than, say, providing binary wheels). That works great for the SciPy Stack, but has NOT worked well for others. My example: I'm teaching an Intro to Python class -- I really like IPython, so have been using it for demos and recommend it to students. The IPython web site makes it look like you really need to go get Anaconda or Canopy if you want IPython -- so a number of my students went and did that. All well. Until I did the extra class on wxPython, and now I've got a bunch of students that have no way to install it. And they mostly had no idea that they were running a different Python at all...

Anyway -- this is a mess, particularly on OS-X (now we have python binaries from: Apple, python.org, fink, macports, homebrew, Anaconda, Canopy, ???) So yes, we need a solution, but I think binary wheels are a pretty good one, and I'm not sure what conda buys us...

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
I think Wheels are the way forward for Python dependencies. Perhaps not for things like Fortran. I hope that the scientific community can start publishing wheels, at least in addition. I don't believe that Conda will gain the mindshare that pip has outside of the scientific community, so I hope we don't end up with two systems that can't interoperate.
On Dec 2, 2013, at 7:00 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Side note about naming:
I'm no expert, but I'm pretty sure "Anaconda" is a python distribution -- python itself and a set of pre-built packages. "conda" is the package manager that is used by Anaconda -- kind of like rpm is used by RedHat -- conda is an open-source project, and thus could be used by any of us completely apart from the Anaconda distribution.
On Sun, Dec 1, 2013 at 3:38 PM, Paul Moore <p.f.moore@gmail.com> wrote:
had to resort to Google to try to figure out what dev libraries I needed.
But that's a *build* issue, surely? How does that relate to installing Nikola from a set of binary wheels? Exactly -- I've mostly dealt with this for OS-X -- there are a cadre of users that want binaries, and want them to "just work" -- we've had mpkg packages for a good while, analogous to Windows installers. Binary eggs never worked quite right, 'cause setuptools didn't understand "universal" binaries -- but it wasn't that far from working. Not really tested much yet, but it looks like binary wheels should be just fine. The concern there is that someone will be running, say, a homebrew-built python, and accidentally install a binary wheel built for the python.org python -- we should address that with better platform tags (and making sure pip at least gives a warning if you try to install an incompatible wheel...)
So what problem are we trying to solve here?
1) It's still a pain to actually build the packages -- similarly to Windows, you really need to build the dependent libraries statically and link them in - and you need to make sure that you build them with the right SDK, and universally -- this is hard to do right. - does Conda help you do any of that???
2) non-python binary dependencies: As it turns out, a number of python packages depend on the same third-party non-python dependencies: I have quite a few that use libpng, libfreetype, libhdf, ??? currently if you want to distribute binary python packages, you need to statically link or supply the dlls, so we end up with multiple copies of the same lib -- is this a problem? Maybe not -- memory is pretty cheap these days, and maybe different packages actually rely on different versions of the dependencies -- this way, at least the package builder controls that.
Anaconda (the distribution) seems to address this by having conda packages that are essentially containers for the shared libs, and other packages that need those libs depend on them. I like this method, but it seems to me to be more a feature of the Anaconda distribution than the conda package manager -- in fact, I've been thinking of doing this exact same thing with binary wheels -- I haven't tried it yet, but don't see why it wouldn't work.
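Chris's idea of shipping a shared library as its own package, which other binary packages then depend on, can be sketched roughly like this (the helper names and layout are hypothetical, not anything conda or the wheel spec actually defines):

```python
# Hypothetical helper for a wheel that ships only a shared library as
# package data; extension-module wheels would depend on this package
# and load the library from its install location instead of bundling
# their own copy.
import ctypes
import os

def find_bundled_lib(package_dir, name):
    """Locate a shared library shipped as package data, trying the
    common extension for each platform; returns None if absent."""
    for ext in (".so", ".dylib", ".dll"):
        candidate = os.path.join(package_dir, name + ext)
        if os.path.exists(candidate):
            return candidate
    return None

def load_bundled_lib(package_dir, name):
    """ctypes-load the bundled library so dependent extensions share one copy."""
    path = find_bundled_lib(package_dir, name)
    if path is None:
        raise OSError("bundled library %r not found in %r" % (name, package_dir))
    return ctypes.CDLL(path)
```

The open question this leaves, and the one the thread keeps circling, is whether wheel metadata can reliably express "depends on the package that carries libpng" across platforms.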
I understand you are thinking about non-Python libraries, but all I can say is that this has *never* been an issue to my knowledge in the Windows world.
yes, it's a HUGE issue in the Windows world -- in fact such a huge issue that almost no one ever tries to build things themselves, or build a different python distro -- so, in fact, when someone does make a binary, it's pretty likely to work. But those binaries are a major pain to build!
(by the way, over on python-dev there has been a recent discussion about stackless building a new python2.7 windows binary with a newer MS compiler -- which will then create exactly these issues...)
Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)
Build issues again...
Yes, major ones.
(another side note: you can't get wxPython for OS-X to work with Anaconda -- there is no conda binary package, and python itself is not built in a way that it can access the window manager ... so no, this stuff is NOT suddenly easier with conda.)
Again, can we please be clear here? On Windows, there is no issue that I am aware of. Wheels solve the binary distribution issue fine in that environment
They will if/when we make sure that the wheel contains meta-data about what compiler (really run-time version) was used for the python build and wheel build -- but we should, indeed, do that.
This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.
and have yet another way to do it? AARRG! I'm also absolutely unclear on what conda offers that isn't quite easy to address with binary wheels. And it seems to need help too, so it will play better with virtualenv....
If conda really is a better solution, then I suppose we could go deprecate wheel before it gets too much "traction"...;-) But let's please not add another one to the mix to confuse people.
Excuse me if I'm feeling a bit negative towards this announcement. I've spent many months working on, and promoting, the wheel + pip solution, to the point where it is now part of Python 3.4.
And I was really looking forward to it as a solution for OS-X too....
I'm hoping I've misunderstood here. Please clarify. Preferably with specifics for Windows (as "conda is a known stable platform" simply isn't true for me...) - I accept you're not a Windows user, so a pointer to already-existing documentation is fine (I couldn't find any myself).
I appreciate the Windows viewpoint -- and I think we really do want one solution for all.
The linux distros already do all this let them keep doing it....
-Chris
PS: a number of scipy-related packages have been promoting Anaconda and Canopy as a way to get their package without dealing with building (rather than, say, providing binary wheels). That works great for the SciPy Stack, but has NOT worked well for others. My example:
I'm teaching an Intro to Python class -- I really like iPython, so have been using it for demos and recommend it to Students. The IPython web site makes it look like you really need to go get Anaconda or Canopy if you want iPython -- so a number of my students went and did that. All well. Until I did the extra class on wxPython, and now I've got a bunch of students that have no way to install it. And they mostly had no idea that they were running a different Python at all...
Anyway -- this is a mess, particularly on OS-X (now we have python binaries from: Apple, python.org, fink, macports, homebrew, Anaconda, Canopy, ???) So yes, we need a solution, but I think binary wheels are a pretty good one, and I'm not sure what conda buys us...
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
Anaconda (the distribution) seems to address this by having conda packages that are essentially containers for the shared libs, and other packages that need those libs depend on them. I like this method, but it seems to me to be more a feature of the Anaconda distribution than the conda package manager -- in fact, I've been thinking of doing this exact same thing with binary wheels -- I haven't tried it yet, but don't see why it wouldn't work.
3 or 4 of us now have mentioned curiosity in converting Anaconda packages to wheels (with specific interest in the non-python lib dependencies as wheels). Anyone who tries this, please post your success or lack thereof. I'm pretty curious.
The IPython web site makes it look like you really need to go get Anaconda or Canopy if you want iPython
interesting...
On 3 December 2013 21:34, Marcus Smith <qwcode@gmail.com> wrote:
Anaconda (the distribution) seems to address this by having conda packages that are essentially containers for the shared libs, and other packages that need those libs depend on them. I like this method, but it seems to me to be more a feature of the Anaconda distribution than the conda package manager -- in fact, I've been thinking of doing this exact same thing with binary wheels -- I haven't tried it yet, but don't see why it wouldn't work.
3 or 4 of us now have mentioned curiosity in converting Anaconda packages to wheels (with specific interest in the non-python lib dependencies as wheels). Anyone who tries this, please post your success or lack thereof. I'm pretty curious.
I couldn't find a spec for the conda format files. If it's documented somewhere I'd be happy to try writing a converter. But it'd be useless for Python 3.3 on Windows because the conda binaries are built against the wrong version of the C runtime. Might be interesting on other platforms, though. Paul
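For anyone experimenting with such a converter, a first step might look like the sketch below. The paths are assumptions based on conda packages of this era (a .tar.bz2 archive with metadata in info/index.json and the Python payload under a site-packages directory); the real layout should be checked against actual packages before relying on it:

```python
import json
import tarfile

def read_conda_metadata(path):
    """Pull the package name and version out of a conda archive.

    Assumes metadata lives in info/index.json, as it appears to in
    conda packages of this vintage.
    """
    with tarfile.open(path, "r:bz2") as tf:
        meta = json.load(tf.extractfile("info/index.json"))
    return meta["name"], meta["version"]

def list_site_packages(path):
    """List payload files that would map into a wheel's purelib tree."""
    marker = "site-packages/"
    with tarfile.open(path, "r:bz2") as tf:
        return sorted(m.name.split(marker, 1)[1]
                      for m in tf.getmembers()
                      if m.isfile() and marker in m.name)
```

A converter for pure-Python packages would then just re-zip those payload files into a wheel layout; compiled extensions raise exactly the C-runtime mismatch Paul mentions.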
On 3 December 2013 21:34, Marcus Smith <qwcode@gmail.com> wrote:
The IPython web site makes it look like you really need to go get Anaconda or Canopy if you want iPython
interesting...
You really don't. I've been running IPython on Windows from wheels for ages. And you can actually build from source easily if you don't want the Qt console - the only component that includes C code is pyzmq and that builds out of the box if you have a C compiler. Paul.
On Tue, Dec 3, 2013 at 4:13 PM, Donald Stufft <donald@stufft.io> wrote:
I think Wheels are the way forward for Python dependencies. Perhaps not for things like fortran. I hope that the scientific community can start publishing wheels at least in addition too.
I don't believe that Conda will gain the mindshare that pip has outside of the scientific community so I hope we don't end up with two systems that can't interoperate.
On Dec 2, 2013, at 7:00 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Side note about naming:
I'm no expert, but I'm pretty sure "Anaconda" is a python distribution -- python itself and a set of pre-built packages. "conda" is the package manager that is used by Anaconda -- kind of like rpm is used by RedHat -- conda is an open-source project, and thus could be used by any of us completely apart from the Anaconda distribution.
On Sun, Dec 1, 2013 at 3:38 PM, Paul Moore <p.f.moore@gmail.com> wrote:
had to resort to Google to try to figure out what dev libraries I needed.
But that's a *build* issue, surely? How does that relate to installing Nikola from a set of binary wheels?
Exactly -- I've mostly dealt with this for OS-X -- there are a cadre of users that want binaries, and want them to "just work" -- we've had mpkg packages for a good while, analogous to Windows installers. Binary eggs never worked quite right, 'cause setuptools didn't understand "universal" binaries -- but it wasn't that far from working. Not really tested much yet, but it looks like binary wheels should be just fine. The concern there is that someone will be running, say, a homebrew-built python, and accidentally install a binary wheel built for the python.org python -- we should address that with better platform tags (and making sure pip at least gives a warning if you try to install an incompatible wheel...)
We are at least as worried about the homebrew user uploading a popular package as a binary wheel, and having it fail to work for the (more common?) non-homebrew user.
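The kind of guard being asked for here - pip refusing, or at least warning, when a wheel's PEP 425 platform tag doesn't match the running interpreter - could be sketched like this (a deliberately crude check; real tag matching has many more cases):

```python
import sysconfig

def wheel_platform_tag(wheel_filename):
    """Extract the platform tag: the last dash-separated component
    of a PEP 425 wheel file name, before the .whl suffix."""
    return wheel_filename[:-len(".whl")].split("-")[-1]

def platform_compatible(wheel_filename):
    """Crude check: pure-Python wheels ('any') always pass; binary
    wheels must match the local platform tag exactly. Real matching
    would also need to distinguish which Python *build* is running,
    which is the homebrew-vs-python.org problem discussed here."""
    tag = wheel_platform_tag(wheel_filename)
    local = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return tag == "any" or tag == local
```

As the thread notes, the hard part is that a homebrew CPython and a python.org CPython can report the same platform string while being binary-incompatible, so the tag alone can't catch every mismatch.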
So what problem are we trying to solve here?
1) It's still a pain to actually build the packages -- similarly to Windows, you really need to build the dependent libraries statically and link them in - and you need to make sure that you build them with the right SDK, and universally -- this is hard to do right. - does Conda help you do any of that???
2) non-python binary dependencies: As it turns out, a number of python packages depend on the same third-party non-python dependencies: I have quite a few that use libpng, libfreetype, libhdf, ??? currently if you want to distribute binary python packages, you need to statically link or supply the dlls, so we end up with multiple copies of the same lib -- is this a problem? Maybe not -- memory is pretty cheap these days, and maybe different packages actually rely on different versions of the dependencies -- this way, at least the package builder controls that.
Anaconda (the distribution) seems to address this by having conda packages that are essentially containers for the shared libs, and other packages that need those libs depend on them. I like this method, but it seems to me to be more a feature of the Anaconda distribution than the conda package manager -- in fact, I've been thinking of doing this exact same thing with binary wheels -- I haven't tried it yet, but don't see why it wouldn't work.
I understand you are thinking about non-Python libraries, but all I can say is that this has *never* been an issue to my knowledge in the Windows world.
yes, it's a HUGE issue in the Windows world -- in fact such a huge issue that almost no one ever tries to build things themselves, or build a different python distro -- so, in fact, when someone does make a binary, it's pretty likely to work. But those binaries are a major pain to build!
(by the way, over on python-dev there has been a recent discussion about stackless building a new python2.7 windows binary with a newer MS compiler -- which will then create exactly these issues...)
Outside the scientific space, crypto libraries are also notoriously hard to build, as are game engines and GUI toolkits. (I guess database bindings could also be a problem in some cases)
Build issues again...
Yes, major ones.
(Another side note: you can't get wxPython for OS X to work with Anaconda -- there is no conda binary package, and Python itself is not built in a way that it can access the window manager... so no, this stuff is NOT suddenly easier with conda.)
Again, can we please be clear here? On Windows, there is no issue that I am aware of. Wheels solve the binary distribution issue fine in that environment.
They will if/when we make sure that the wheel contains meta-data about what compiler (really run-time version) was used for the python build and wheel build -- but we should, indeed, do that.
This is why I suspect there will be a better near term effort/reward trade-off in helping the conda folks improve the usability of their platform than there is in trying to expand the wheel format to cover arbitrary binary dependencies.
and have yet another way to do it? AARRG! I'm also absolutely unclear on what conda offers that isn't quite easy to address with binary wheels. And it seems to need help too, so that it will play better with virtualenv...
If conda really is a better solution, then I suppose we could deprecate wheel before it gets too much "traction"... ;-) But let's please not add another one to the mix to confuse people.
Excuse me if I'm feeling a bit negative towards this announcement. I've spent many months working on, and promoting, the wheel + pip solution, to the point where it is now part of Python 3.4.
And I was really looking forward to it as a solution for OS X too...
I'm hoping I've misunderstood here. Please clarify. Preferably with specifics for Windows (as "conda is a known stable platform" simply isn't true for me...) - I accept you're not a Windows user, so a pointer to already-existing documentation is fine (I couldn't find any myself).
I appreciate the Windows viewpoint -- and I think we really do want one solution for all.
The Linux distros already do all this -- let them keep doing it...
-Chris
PS: a number of SciPy-related packages have been promoting Anaconda and Canopy as a way to get their package without dealing with building (rather than, say, providing binary wheels). That works great for the SciPy stack, but has NOT worked well for others. My example:
I'm teaching an Intro to Python class -- I really like IPython, so I have been using it for demos and recommending it to students. The IPython web site makes it look like you really need to go get Anaconda or Canopy if you want IPython -- so a number of my students went and did that. All went well, until I did the extra class on wxPython, and now I've got a bunch of students that have no way to install it. And they mostly had no idea that they were running a different Python at all...
Anyway -- this is a mess, particularly on OS X (now we have Python binaries from: Apple, python.org, fink, macports, homebrew, Anaconda, Canopy, ???). So yes, we need a solution, but I think binary wheels are a pretty good one, and I'm not sure what conda buys us...
The most striking difference may be that conda also installs and manages Python itself. For example, conda create -n py33 python=3.3 will download and install Python 3.3 into a new environment named py33. This is completely different than pip, which tends to run inside the same Python environment that it's installing into.

Wheels were only designed to hold Python libraries to be installed into site-packages, with package names registered on PyPI, and wheels were designed to allow the parts of the distribution to be relocated at install time, as in putting scripts into Scripts\ or bin/ depending on the platform.

If you wanted to interoperate you would probably install pip with "conda install pip" and then play with some wheels, or you could probably create a virtualenv from your conda environment and proceed. I would not recommend trying to package the CPython interpreter as a wheel. It wasn't designed for that. If you try, let us know what happened :-)

It's interesting that there is so much talk about compatibility with numpy-built-with-different-options. When I was thinking about the problem of binary compatibility I was more concerned about dynamically linking with different versions of system libraries (glibc) than the ones that the pre-built [wheel] package expects.

In summary conda is very different than pip+virtualenv.
Filed https://github.com/ContinuumIO/conda-recipes/issues/42 :(

On Dec 3, 2013, at 4:48 PM, Donald Stufft <donald@stufft.io> wrote:
On Dec 3, 2013, at 4:46 PM, Daniel Holth <dholth@gmail.com> wrote:
In summary conda is very different than pip+virtualenv.
Conda is a cross-platform Homebrew.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
The most striking difference may be that conda also installs and manages Python itself. For example, conda create -n py33 python=3.3 will download and install Python 3.3 into a new environment named py33. This is completely different than pip which tends to run inside the same Python environment that it's installing into.
We've been talking about (and I've tried) "conda init", not "conda create". That sure seems to set up conda in your *current* Python. I had pip (the one that installed conda) and conda working in the same environment.
"conda init" isn't in the website docs. On Tue, Dec 3, 2013 at 2:00 PM, Marcus Smith <qwcode@gmail.com> wrote:
The most striking difference may be that conda also installs and manages Python itself. For example, conda create -n py33 python=3.3 will download and install Python 3.3 into a new environment named py33. This is completely different than pip which tends to run inside the same Python environment that it's installing into.
we've been talking about (and I've tried) "conda init" , not "conda create". that sure seems to setup conda in your *current* python. I had pip (the one that installed conda) and conda working in the same environment.
On 3 December 2013 21:13, Donald Stufft <donald@stufft.io> wrote:
I think Wheels are the way forward for Python dependencies. Perhaps not for things like Fortran. I hope that the scientific community can start publishing wheels, at least in addition.
The Fortran issue is not that complicated. Very few packages are affected by it. It can easily be fixed with some kind of compatibility tag that can be used by the small number of affected packages.
I don't believe that Conda will gain the mindshare that pip has outside of the scientific community so I hope we don't end up with two systems that can't interoperate.
Maybe conda won't gain mindshare outside the scientific community but wheel really needs to gain mindshare *within* the scientific community. The root of all this is numpy. It is the biggest dependency on PyPI, is hard to build well, and has the Fortran ABI issue. It is used by very many people who wouldn't consider themselves part of the "scientific community". For example matplotlib depends on it. The PyPy devs have decided that it's so crucial to the success of PyPy that numpy is basically being rewritten in their stdlib (along with the C API).

A few times I've seen Paul Moore refer to numpy as the "litmus test" for wheels. I actually think that it's more important than that. If wheels are going to fly then there *needs* to be wheels for numpy. As long as there isn't a wheel for numpy then there will be lots of people looking for a non-pip/PyPI solution to their needs.

One way of getting the scientific community more on board here would be to offer them some tangible advantages. So rather than saying "oh well, scientific use is a special case so they should just use conda or something", the message should be "the wheel system provides solutions to many long-standing problems and is even better than conda in (at least) some ways because it cleanly solves the Fortran ABI issue, for example".

Oscar
On Dec 3, 2013, at 7:36 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I’d love to get Wheels to the point where they are more suitable for SciPy stuff than they are now. I’m not sure what the diff between the current state and what they need to be is, but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
I’d love to get Wheels to the point they are more suitable than they are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.

I’m not sure what the diff between the current state and what they need to be are but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.

How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?

If this is too difficult at the moment, an easier (but much less important) one would be to get the result of ``paver bdist_wininst_simple`` as a wheel.

For now I think it's OK that the wheels would just target 32-bit Windows and python.org compatible Pythons (given that that's all we currently distribute). Once that works we can look at OS X and 64-bit Windows.

Ralf
On 4 December 2013 07:40, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I’m not sure what the diff between the current state and what they need to be are but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
I think that needs a compatibility tag. Certainly it isn't immediately soluble now. Could you confirm how the correct one of the 3 builds is selected (i.e., what the code is to detect which one is appropriate)? I could look into what options we have here.
If this is too difficult at the moment, an easier (but much less important) one would be to get the result of ``paver bdist_wininst_simple`` as a wheel.
That I will certainly look into. Simple answer is "wheel convert <wininst>". But maybe it would be worth adding a "paver bdist_wheel" command. That should be doable in the same way setuptools added a bdist_wheel command.
For now I think it's OK that the wheels would just target 32-bit Windows and python.org compatible Pythons (given that that's all we currently distribute). Once that works we can look at OS X and 64-bit Windows.
Ignoring the SSE issue, I believe that simply wheel-converting Christoph Gohlke's repository gives you that right now. The only issues there are (1) the MKL license limitation, (2) hosting, and (3) whether Christoph would be OK with doing this (he goes to lengths on his site to prevent spidering his installers). I genuinely believe that "a scientific stack for non-scientists" is trivially solved in this way. For scientists, of course, we'd need to look deeper, but having a base to start from would be great. Paul
On 4 December 2013 08:13, Paul Moore <p.f.moore@gmail.com> wrote:
If this is too difficult at the moment, an easier (but much less important) one would be to get the result of ``paver bdist_wininst_simple`` as a wheel.
That I will certainly look into. Simple answer is "wheel convert <wininst>". But maybe it would be worth adding a "paver bdist_wheel" command. That should be doable in the same way setuptools added a bdist_wheel command.
Actually, I just installed paver and wheel into a virtualenv, converted a trivial project to use paver, and ran "paver bdist_wheel" and it worked out of the box. I don't know if there could be problems with more complex projects, but if you hit any issues, flag them up and I'll take a look. Paul
On Wed, Dec 4, 2013 at 9:13 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Could you confirm how the correct one of the 3 builds is selected (i.e., what the code is to detect which one is appropriate)? I could look into what options we have here.
The stuff under tools/win32build I mentioned above. Specifically: https://github.com/numpy/numpy/blob/master/tools/win32build/cpuid/cpuid.c
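For the non-Windows case, much the same check can be approximated in pure Python. This is only an illustrative sketch under my own assumptions -- the function name is hypothetical and it parses a Linux /proc/cpuinfo dump rather than using CPUID as numpy's cpuid.c does (note that /proc/cpuinfo reports SSE3 as "pni"):

```python
# Illustrative sketch only: approximate the cpuid.c check by parsing a
# /proc/cpuinfo dump (Linux). Names here are hypothetical, not numpy's.

def sse_level(cpuinfo_text):
    """Return the highest SSE level found: 'sse3', 'sse2', 'sse' or 'none'."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    if "pni" in flags or "sse3" in flags:  # /proc/cpuinfo spells SSE3 "pni"
        return "sse3"
    if "sse2" in flags:
        return "sse2"
    if "sse" in flags:
        return "sse"
    return "none"

sample = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 pni\n"
print(sse_level(sample))  # -> sse3
```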
Ignoring the SSE issue, I believe that simply wheel converting Christoph Gohlke's repository gives you that right now. The only issues there are (1) the MKL license limitation, (2) hosting, and (3) whether Christoph would be OK with doing this (he goes to lengths on his site to prevent spidering his installers).
Besides the issues you mention, the problem is that it creates a single point of failure. I really appreciate everything Christoph does, but it's not appropriate as the default way to provide binary releases for a large number of projects. There needs to be a reproducible way that the devs of each project can build wheels - this includes the right metadata, but ideally also a good way to reproduce the whole build environment including compilers, blas/lapack implementations, dependencies etc. The latter part is probably out of scope for this list, but is discussed right now on the numfocus list.
I genuinely believe that "a scientific stack for non-scientists" is trivially solved in this way.
That would be nice, but no. The only thing you'd have achieved is to take a curated stack of .exe installers and converted it to the same stack of wheels. Which is nice and a step forward, but doesn't change much in the bigger picture. The problem is certainly nontrivial. Ralf
On 4 December 2013 21:13, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Besides the issues you mention, the problem is that it creates a single point of failure. I really appreciate everything Christoph does, but it's not appropriate as the default way to provide binary releases for a large number of projects. There needs to be a reproducible way that the devs of each project can build wheels - this includes the right metadata, but ideally also a good way to reproduce the whole build environment including compilers, blas/lapack implementations, dependencies etc. The latter part is probably out of scope for this list, but is discussed right now on the numfocus list.
You're right - what I said ignored the genuine work being done by the rest of the scientific community to solve the real issues involved. I apologise, that wasn't at all fair. Paul
On Wed, Dec 4, 2013 at 10:59 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 4 December 2013 21:13, Ralf Gommers <ralf.gommers@gmail.com> wrote:

Besides the issues you mention, the problem is that it creates a single point of failure. I really appreciate everything Christoph does, but it's not appropriate as the default way to provide binary releases for a large number of projects. There needs to be a reproducible way that the devs of each project can build wheels - this includes the right metadata, but ideally also a good way to reproduce the whole build environment including compilers, blas/lapack implementations, dependencies etc. The latter part is probably out of scope for this list, but is discussed right now on the numfocus list.
You're right - what I said ignored the genuine work being done by the rest of the scientific community to solve the real issues involved. I apologise, that wasn't at all fair.
No need to apologize at all. Ralf
On 4 December 2013 07:40, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
I’d love to get Wheels to the point they are more suitable than they are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.
Thanks Ralf. Please let me know what you think of the following.
I’m not sure what the diff between the current state and what they need to be are but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
This was discussed previously on this list: https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html

Essentially the current wheel format and specification does not provide a way to do this directly. There are several different possible approaches.

One possibility is that the wheel spec can be updated to include a post-install script (I believe this will happen eventually - someone correct me if I'm wrong). Then the numpy for Windows wheel can just do the same as the superpack installer: ship all variants, then delete/rename in a post-install script so that the correct variant is in place after install.

Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html

The variant field could be used to upload multiple variants, e.g.

numpy-1.7.1-cp27-cp27m-win32.whl
numpy-1.7.1-cp27-cp27m-win32-sse.whl
numpy-1.7.1-cp27-cp27m-win32-sse2.whl
numpy-1.7.1-cp27-cp27m-win32-sse3.whl

then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.

Of course how would the user know if their CPU supports SSE3? I know roughly what SSE is but I don't know what level of SSE is available on each of the machines I use. There is a Python script/module in numexpr that can detect this: https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py

When I run that script on this machine I get:

$ python cpuinfo.py
CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2 is_32bit is_Core2 is_Intel is_i686

So perhaps someone could break that script out of numexpr and release it as a separate package on PyPI. Then the instructions for installing numpy could be something like

"""
You can install numpy with

$ pip install numpy

which will download the default version without any CPU-specific optimisations. If you know what level of SSE support your CPU has then you can download a more optimised numpy with either of:

$ pip install numpy:sse2
$ pip install numpy:sse3

To determine whether or not your CPU has SSE2 or SSE3 or no SSE support you can install and run the cpuinfo script. For example on this machine:

$ pip install cpuinfo
$ python -m cpuinfo --sse
This CPU supports the SSE3 instruction set.

That means we can install numpy:sse3.
"""

Of course it would be a shame to have a solution that is so close to automatic without quite being automatic. Also the problem is that having no SSE support in the default numpy means that lots of people would lose out on optimisations. For example if numpy is installed as a dependency of something else then the user would always end up with the unoptimised no-SSE binary.

Another possibility is that numpy could depend on the cpuinfo package so that it gets installed automatically before numpy. Then if the cpuinfo package has a traditional setup.py sdist (not a wheel) it could detect the CPU information at install time and store that in its package metadata. Then pip would be aware of this metadata and could use it to determine which wheel is appropriate. I don't quite know if this would work but perhaps the cpuinfo package could announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel could "Requires" cpuinfo:sse2 or something along these lines. Or perhaps this is better handled by the metadata extensions Nick suggested earlier in this thread.

I think it would be good to work out a way of doing this with e.g. a cpuinfo package. Many other packages beyond numpy could make good use of that metadata if it were available. Similarly having an extensible mechanism for selecting wheels based on additional information about the user's system could be used for many more things than just CPU architectures.

Oscar
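[Editorial note: the "Provides"/"Requires" capability idea in the message above could be modelled roughly as follows. This is a toy sketch under my own assumptions -- the `cpuinfo:sse2` spelling comes from the message, but the data structure and function are invented and are not real pip metadata handling.]

```python
# Toy model of the hypothetical "Provides: cpuinfo:sse2" idea: capabilities
# recorded at cpuinfo install time, checked against a wheel's requirements.
provided = {"cpuinfo": {"sse", "sse2"}}  # recorded at install time

def satisfies(requirement, provided):
    """requirement like 'cpuinfo:sse2' -> True if that capability was recorded."""
    name, _, cap = requirement.partition(":")
    return cap in provided.get(name, set())

print(satisfies("cpuinfo:sse2", provided))  # -> True
print(satisfies("cpuinfo:sse3", provided))  # -> False
```

An installer with access to such recorded capabilities could then prefer the most capable wheel whose requirements are all satisfied.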
On 4 December 2013 20:41, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
This was discussed previously on this list: https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
Essentially the current wheel format and specification does not provide a way to do this directly. There are several different possible approaches.
One possibility is that the wheel spec can be updated to include a post-install script (I believe this will happen eventually - someone correct me if I'm wrong). Then the numpy for Windows wheel can just do the same as the superpack installer: ship all variants, then delete/rename in a post-install script so that the correct variant is in place after install.
Yes, export hooks in metadata 2.0 would support this approach. However, export hooks require allowing just-downloaded code to run with elevated privileges, so we're trying to minimise the number of cases where they're needed.
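[Editorial note: a ship-all-variants post-install hook of the kind described above could look something like this sketch. The `_core_<variant>` directory layout and the function are entirely hypothetical, invented for illustration -- this is not numpy's actual layout or any real wheel hook API.]

```python
# Hypothetical post-install step: keep the build matching the detected CPU,
# delete the others. Directory names (_core_nosse etc.) are invented.
import shutil
from pathlib import Path

def prune_variants(pkg_dir, detected):
    """Promote the build matching `detected` ('none', 'sse2' or 'sse3')
    into the package directory and remove the unused variants."""
    pkg = Path(pkg_dir)
    keep = "nosse" if detected == "none" else detected
    for variant in ("nosse", "sse2", "sse3"):
        vdir = pkg / ("_core_" + variant)
        if not vdir.is_dir():
            continue
        if variant == keep:
            for item in vdir.iterdir():
                shutil.move(str(item), str(pkg / item.name))
        shutil.rmtree(vdir, ignore_errors=True)
```

The security concern above is visible even in this toy: the hook runs arbitrary filesystem operations with whatever privileges the installer has.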
Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
The variant field could be used to upload multiple variants, e.g.

numpy-1.7.1-cp27-cp27m-win32.whl
numpy-1.7.1-cp27-cp27m-win32-sse.whl
numpy-1.7.1-cp27-cp27m-win32-sse2.whl
numpy-1.7.1-cp27-cp27m-win32-sse3.whl

then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.
That was what I was originally thinking for the variant field, but I later realised it makes more sense to treat the "variant" marker as part of the *platform* tag, rather than being an independent tag in its own right: https://bitbucket.org/pypa/pypi-metadata-formats/issue/15/enhance-the-platfo...

Under that approach, pip would figure out all the variants that applied to the current system (with some default preference order between variants for platforms where one system may support multiple variants). Using the Linux distro variants (based on ID and VERSION_ID in /etc/os-release) as an example rather than the Windows SSE variants, this might look like:

cp33-cp33m-linux_x86_64_fedora_19
cp33-cp33m-linux_x86_64_fedora
cp33-cp33m-linux_x86_64

The Windows SSE variants might look like:

cp33-cp33m-win32_sse3
cp33-cp33m-win32_sse2
cp33-cp33m-win32_sse
cp33-cp33m-win32
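[Editorial note: to make the preference ordering concrete, here is a small sketch of how an installer might rank such tags and pick a wheel. The matching logic and function names are invented for illustration and are not how pip actually resolves tags.]

```python
# Illustrative sketch: expand a base platform tag into an ordered
# most-specific-first preference list, then pick the first wheel that
# matches. Tag spellings follow the examples in the message above.

def candidate_tags(base, variants):
    """Most-specific variant first, plain platform tag last."""
    return [base + "_" + v for v in variants] + [base]

def pick_wheel(available, base, variants):
    for tag in candidate_tags(base, variants):
        for whl in available:
            if whl.endswith(tag + ".whl"):
                return whl
    return None

wheels = [
    "numpy-1.7.1-cp33-cp33m-win32.whl",
    "numpy-1.7.1-cp33-cp33m-win32_sse2.whl",
]
# A CPU with SSE3 prefers sse3, then sse2, then sse, then the plain tag:
print(pick_wheel(wheels, "cp33-cp33m-win32", ["sse3", "sse2", "sse"]))
# -> numpy-1.7.1-cp33-cp33m-win32_sse2.whl
```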
Of course how would the user know if their CPU supports SSE3? I know roughly what SSE is but I don't know what level of SSE is available on each of the machines I use.
Asking this question is how I realised the variant tag should probably be part of the platform field and handled automatically by pip rather than users needing to request it explicitly. However, it's not without its problems (more on that below)
There is a Python script/module in numexpr that can detect this: https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py
When I run that script on this machine I get:

$ python cpuinfo.py
CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2 is_32bit is_Core2 is_Intel is_i686
So perhaps someone could break that script out of numexpr and release it as a separate package on PyPI. Then the instructions for installing numpy could be something like """ You can install numpy with
$ pip install numpy
which will download the default version without any CPU-specific optimisations.
If you know what level of SSE support your CPU has then you can download a more optimised numpy with either of:
$ pip install numpy:sse2
$ pip install numpy:sse3
To determine whether or not your CPU has SSE2 or SSE3 or no SSE support you can install and run the cpuinfo script. For example on this machine:
$ pip install cpuinfo
$ python -m cpuinfo --sse
This CPU supports the SSE3 instruction set.
That means we can install numpy:sse3. """
Of course it would be a shame to have a solution that is so close to automatic without quite being automatic. Also the problem is that having no SSE support in the default numpy means that lots of people would lose out on optimisations. For example if numpy is installed as a dependency of something else then the user would always end up with the unoptimised no-SSE binary.
The other question I asked that made me realise the SSE information should be an optional part of the platform tag :)
Another possibility is that numpy could depend on the cpuinfo package so that it gets installed automatically before numpy. Then if the cpuinfo package has a traditional setup.py sdist (not a wheel) it could detect the CPU information at install time and store that in its package metadata. Then pip would be aware of this metadata and could use it to determine which wheel is appropriate.
I don't quite know if this would work but perhaps the cpuinfo could announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel could "Requires" cpuinfo:sse2 or something along these lines. Or perhaps this is better handled by the metadata extensions Nick suggested earlier in this thread.
I think it would be good to work out a way of doing this with e.g. a cpuinfo package. Many other packages beyond numpy could make good use of that metadata if it were available. Similarly having an extensible mechanism for selecting wheels based on additional information about the user's system could be used for many more things than just CPU architectures.
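Oscar's "Provides"/"Requires" matching idea could be sketched as a toy resolver like the following (the capability-tag syntax and the "prefer the most specific wheel" rule are hypothetical, not existing metadata):

```python
# Toy matcher: each candidate wheel declares the capability tags it
# requires (e.g. "cpuinfo:sse2"); an installed cpuinfo package declares
# what the current machine provides. All names here are invented.
def pick_wheel(candidate_wheels, provides):
    """candidate_wheels: {wheel filename: set of required capability tags};
    provides: set of capability tags detected on this machine.
    Prefer wheels with more (i.e. more specific) requirements."""
    ranked = sorted(candidate_wheels.items(),
                    key=lambda item: len(item[1]), reverse=True)
    for name, requires in ranked:
        if requires <= provides:  # all requirements satisfied
            return name
    return None
```

With such a scheme, pip could pick the sse2 wheel automatically once cpuinfo had recorded `cpuinfo:sse2` at install time.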
Yes, the lack of extensibility is the one concern I have with baking the CPU SSE info into the platform tag. On the other hand, the CPU architecture info is already in there, so appending the vectorisation support isn't an obviously bad idea, is orthogonal to the "python.expects" consistency enforcement metadata and would cover the NumPy use case, which is the one we really care about at this point. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 4 December 2013 12:10, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 December 2013 20:41, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
The variant field could be used to upload multiple variants, e.g.

numpy-1.7.1-cp27-cp27m-win32.whl
numpy-1.7.1-cp27-cp27m-win32-sse.whl
numpy-1.7.1-cp27-cp27m-win32-sse2.whl
numpy-1.7.1-cp27-cp27m-win32-sse3.whl

then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.
That was what I was originally thinking for the variant field, but I later realised it makes more sense to treat the "variant" marker as part of the *platform* tag, rather than being an independent tag in its own right: https://bitbucket.org/pypa/pypi-metadata-formats/issue/15/enhance-the-platfo...
Under that approach, pip would figure out all the variants that applied to the current system (with some default preference order between variants for platforms where one system may support multiple variants). Using the Linux distro variants (based on ID and VERSION_ID in /etc/os-release) as an example rather than the Windows SSE variants, this might look like:
cp33-cp33m-linux_x86_64_fedora_19
cp33-cp33m-linux_x86_64_fedora
cp33-cp33m-linux_x86_64
I find that a bit strange to look at since I expect it to be like a taxonomic hierarchy, like so:

cp33-cp33m-linux
cp33-cp33m-linux_fedora
cp33-cp33m-linux_fedora_19
cp33-cp33m-linux_fedora_19_x86_64

Really you always need the architecture information though, so:

cp33-cp33m-linux_x86_64
cp33-cp33m-linux_fedora_x86_64
cp33-cp33m-linux_fedora_19_x86_64
The Windows SSE variants might look like:
cp33-cp33m-win32_sse3
cp33-cp33m-win32_sse2
cp33-cp33m-win32_sse
cp33-cp33m-win32
I would have thought something like:

cp33-cp33m-win32
cp33-cp33m-win32_nt
cp33-cp33m-win32_nt_vista
cp33-cp33m-win32_nt_vista_sp2

Also CPU information isn't hierarchical, so what happens when e.g. pyfftw wants to ship wheels with and without MMX instructions?
I think it would be good to work out a way of doing this with e.g. a cpuinfo package. Many other packages beyond numpy could make good use of that metadata if it were available. Similarly having an extensible mechanism for selecting wheels based on additional information about the user's system could be used for many more things than just CPU architectures.
Yes, the lack of extensibility is the one concern I have with baking the CPU SSE info into the platform tag. On the other hand, the CPU architecture info is already in there, so appending the vectorisation support isn't an obviously bad idea, is orthogonal to the "python.expects" consistency enforcement metadata and would cover the NumPy use case, which is the one we really care about at this point.
An extensible solution would be a big win. Maybe there should be an explicit metadata option that says "to get this piece of metadata you should install the following package and then run this command (without elevated privileges?)". Oscar
On 04.12.2013 11:41, Oscar Benjamin wrote:
On 4 December 2013 07:40, Ralf Gommers <ralf.gommers@gmail.com> wrote:
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
This was discussed previously on this list: https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
Essentially the current wheel format and specification does not provide a way to do this directly. There are several different possible approaches.
One possibility is that the wheel spec can be updated to include a post-install script (I believe this will happen eventually - someone correct me if I'm wrong). Then the numpy for Windows wheel can just do the same as the superpack installer: ship all variants, then delete/rename in a post-install script so that the correct variant is in place after install.
Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
The variant field could be used to upload multiple variants, e.g.

numpy-1.7.1-cp27-cp27m-win32.whl
numpy-1.7.1-cp27-cp27m-win32-sse.whl
numpy-1.7.1-cp27-cp27m-win32-sse2.whl
numpy-1.7.1-cp27-cp27m-win32-sse3.whl

then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.
Why does numpy not create a universal distribution, where the actual extensions used are determined at runtime? This would simplify the installation (all the stuff that you describe would not be required). Another benefit would be for users that create and distribute 'frozen' executables (py2exe, py2app, cx_freeze, pyinstaller): the exe would work on any machine, independent of the SSE level. Thomas
On Wed, Dec 4, 2013 at 11:41 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com>wrote:
On 4 December 2013 07:40, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
I’d love to get Wheels to the point they are more suitable than they are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.
Thanks Ralf. Please let me know what you think of the following.
I’m not sure what the diff between the current state and what they need to be is, but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
This was discussed previously on this list: https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
Thanks, I'll go read that.

Essentially the current wheel format and specification does not provide a way to do this directly. There are several different possible approaches.
One possibility is that the wheel spec can be updated to include a post-install script (I believe this will happen eventually - someone correct me if I'm wrong). Then the numpy for Windows wheel can just do the same as the superpack installer: ship all variants, then delete/rename in a post-install script so that the correct variant is in place after install.
Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
The variant field could be used to upload multiple variants, e.g.

numpy-1.7.1-cp27-cp27m-win32.whl
numpy-1.7.1-cp27-cp27m-win32-sse.whl
numpy-1.7.1-cp27-cp27m-win32-sse2.whl
numpy-1.7.1-cp27-cp27m-win32-sse3.whl

then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.
Of course, how would the user know if their CPU supports SSE3? I know roughly what SSE is but I don't know what level of SSE is available on each of the machines I use. There is a Python script/module in numexpr that can detect this: https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py
When I run that script on this machine I get:

$ python cpuinfo.py
CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2 is_32bit is_Core2 is_Intel is_i686
So perhaps someone could break that script out of numexpr and release it as a separate package on PyPI.
That's similar to what numpy has - actually it's a copy from numpy.distutils.cpuinfo
Then the instructions for installing numpy could be something like:

"""
You can install numpy with
$ pip install numpy
which will download the default version without any CPU-specific optimisations.
If you know what level of SSE support your CPU has then you can download a more optimised numpy with either of:
$ pip install numpy:sse2
$ pip install numpy:sse3
To determine whether or not your CPU has SSE2 or SSE3 or no SSE support you can install and run the cpuinfo script. For example on this machine:
$ pip install cpuinfo
$ python -m cpuinfo --sse
This CPU supports the SSE3 instruction set.
That means we can install numpy:sse3. """
The problem with all of the above is indeed that it's not quite automatic. You don't want your user to have to know or care about what SSE is. Nor do you want to create a new package just to hack around a pip limitation. I like the post-install (or pre-install) option much better.
Of course it would be a shame to have a solution that is so close to automatic without quite being automatic. Also the problem is that having no SSE support in the default numpy means that lots of people would lose out on optimisations. For example if numpy is installed as a dependency of something else then the user would always end up with the unoptimised no-SSE binary.
Another possibility is that numpy could depend on the cpuinfo package so that it gets installed automatically before numpy. Then if the cpuinfo package has a traditional setup.py sdist (not a wheel) it could detect the CPU information at install time and store that in its package metadata. Then pip would be aware of this metadata and could use it to determine which wheel is appropriate.
I don't quite know if this would work but perhaps the cpuinfo could announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel could "Requires" cpuinfo:sse2 or something along these lines. Or perhaps this is better handled by the metadata extensions Nick suggested earlier in this thread.
I think it would be good to work out a way of doing this with e.g. a cpuinfo package. Many other packages beyond numpy could make good use of that metadata if it were available. Similarly having an extensible mechanism for selecting wheels based on additional information about the user's system could be used for many more things than just CPU architectures.
I agree extensibility is quite important. Whatever scheme you'd think of with pre-defined tags will fail the next time anyone has a new idea (random example: what if we start shipping parallel sets of binaries that only differ in whether they're linked against ATLAS, OpenBLAS or MKL). Ralf
On 5 Dec 2013 07:29, "Ralf Gommers" <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 11:41 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 4 December 2013 07:40, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
I’d love to get Wheels to the point they are more suitable than they are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.
Thanks Ralf. Please let me know what you think of the following.
I’m not sure what the diff between the current state and what they need to be is, but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
This was discussed previously on this list: https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
Thanks, I'll go read that.
Essentially the current wheel format and specification does not provide a way to do this directly. There are several different possible approaches.
One possibility is that the wheel spec can be updated to include a post-install script (I believe this will happen eventually - someone correct me if I'm wrong). Then the numpy for Windows wheel can just do the same as the superpack installer: ship all variants, then delete/rename in a post-install script so that the correct variant is in place after install.
Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
The variant field could be used to upload multiple variants, e.g.

numpy-1.7.1-cp27-cp27m-win32.whl
numpy-1.7.1-cp27-cp27m-win32-sse.whl
numpy-1.7.1-cp27-cp27m-win32-sse2.whl
numpy-1.7.1-cp27-cp27m-win32-sse3.whl

then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.
Of course, how would the user know if their CPU supports SSE3? I know roughly what SSE is but I don't know what level of SSE is available on each of the machines I use. There is a Python script/module in numexpr that can detect this: https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py
When I run that script on this machine I get:

$ python cpuinfo.py
CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2 is_32bit is_Core2 is_Intel is_i686
So perhaps someone could break that script out of numexpr and release it as a separate package on PyPI.
That's similar to what numpy has - actually it's a copy from numpy.distutils.cpuinfo
Then the instructions for installing numpy could be something like:

"""
You can install numpy with
$ pip install numpy
which will download the default version without any CPU-specific optimisations.
If you know what level of SSE support your CPU has then you can download a more optimised numpy with either of:
$ pip install numpy:sse2
$ pip install numpy:sse3
To determine whether or not your CPU has SSE2 or SSE3 or no SSE support you can install and run the cpuinfo script. For example on this machine:
$ pip install cpuinfo
$ python -m cpuinfo --sse
This CPU supports the SSE3 instruction set.
That means we can install numpy:sse3. """
The problem with all of the above is indeed that it's not quite automatic. You don't want your user to have to know or care about what SSE is. Nor do you want to create a new package just to hack around a pip limitation. I like the post-install (or pre-install) option much better.
Of course it would be a shame to have a solution that is so close to automatic without quite being automatic. Also the problem is that having no SSE support in the default numpy means that lots of people would lose out on optimisations. For example if numpy is installed as a dependency of something else then the user would always end up with the unoptimised no-SSE binary.
Another possibility is that numpy could depend on the cpuinfo package so that it gets installed automatically before numpy. Then if the cpuinfo package has a traditional setup.py sdist (not a wheel) it could detect the CPU information at install time and store that in its package metadata. Then pip would be aware of this metadata and could use it to determine which wheel is appropriate.
I don't quite know if this would work but perhaps the cpuinfo could announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel could "Requires" cpuinfo:sse2 or something along these lines. Or perhaps this is better handled by the metadata extensions Nick suggested earlier in this thread.
I think it would be good to work out a way of doing this with e.g. a cpuinfo package. Many other packages beyond numpy could make good use of that metadata if it were available. Similarly having an extensible mechanism for selecting wheels based on additional information about the user's system could be used for many more things than just CPU architectures.
I agree extensibility is quite important. Whatever scheme you'd think of with pre-defined tags will fail the next time anyone has a new idea (random example: what if we start shipping parallel sets of binaries that only differ in whether they're linked against ATLAS, OpenBLAS or MKL).

Hmm, rather than adding complexity most folks don't need directly to the base wheel spec, here's a possible "multiwheel" notion - embed multiple wheels with different names inside the multiwheel, along with a self-contained selector function for choosing which ones to actually install on the current system. This could be used not only for the NumPy use case, but also allow the distribution of external dependencies while allowing their installation to be skipped if they're already present on the target system. Cheers, Nick.
Ralf
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On 4 December 2013 23:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hmm, rather than adding complexity most folks don't need directly to the base wheel spec, here's a possible "multiwheel" notion - embed multiple wheels with different names inside the multiwheel, along with a self-contained selector function for choosing which ones to actually install on the current system.
That sounds like a reasonable approach. I'd be willing to try to put together a proof of concept implementation, if people think it's viable. What would we need to push this forward? A new PEP?
This could be used not only for the NumPy use case, but also allow the distribution of external dependencies while allowing their installation to be skipped if they're already present on the target system.
I'm not sure how this would work - wheels don't seem to me to be appropriate for installing "external dependencies", but as I'm not 100% clear on what you mean by that term I may be misunderstanding. Can you provide a concrete example? Paul
On 5 December 2013 19:40, Paul Moore <p.f.moore@gmail.com> wrote:
On 4 December 2013 23:31, Nick Coghlan <ncoghlan@gmail.com> wrote:
Hmm, rather than adding complexity most folks don't need directly to the base wheel spec, here's a possible "multiwheel" notion - embed multiple wheels with different names inside the multiwheel, along with a self-contained selector function for choosing which ones to actually install on the current system.
That sounds like a reasonable approach. I'd be willing to try to put together a proof of concept implementation, if people think it's viable. What would we need to push this forward? A new PEP?
This could be used not only for the NumPy use case, but also allow the distribution of external dependencies while allowing their installation to be skipped if they're already present on the target system.
I'm not sure how this would work - wheels don't seem to me to be appropriate for installing "external dependencies", but as I'm not 100% clear on what you mean by that term I may be misunderstanding. Can you provide a concrete example?
If you put stuff in the "data" scheme dir, it allows you to install files anywhere you like relative to the installation root. That means you can already use the wheel format to distribute arbitrary files, you may just have to build it via some mechanism other than bdist_wheel. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 5 December 2013 09:52, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm not sure how this would work - wheels don't seem to me to be appropriate for installing "external dependencies", but as I'm not 100% clear on what you mean by that term I may be misunderstanding. Can you provide a concrete example?
If you put stuff in the "data" scheme dir, it allows you to install files anywhere you like relative to the installation root. That means you can already use the wheel format to distribute arbitrary files, you may just have to build it via some mechanism other than bdist_wheel.
Ah, OK. I see. Paul
On Dec 5, 2013, at 1:40 AM, Paul Moore <p.f.moore@gmail.com> wrote:
I'm not sure how this would work - wheels don't seem to me to be appropriate for installing "external dependencies", but as I'm not 100% clear on what you mean by that term
One of the key features of conda is that it is not specifically tied to Python - it can manage any binary package for a system. This is a key reason for its existence: Continuum wants to support its users with one way to install all the stuff they need to do their work, with one cross-platform solution. This includes not just libraries that Python extensions require, but also non-Python stuff like Fortran compilers, other languages (like R), or who knows what? As wheels and conda packages are both just archives, there's no reason wheel couldn't grow that capability - but I'm not at all sure we want it to. -Chris
Ralf,

Great to have you on this thread!

Note: supporting "variants" in one way or another is a great idea, but for right now, maybe we can get pretty far without it. There are options for "serious" scipy users that need optimum performance, and newbies that want the full stack. So our primary audience for "default" installs and PyPI wheels are folks that need the core packages (maybe a web dev that wants some MPL plots) and need things to "just work" more than anything optimized. So a lowest common denominator wheel would be very, very useful.

As for what that would be: the superpack is great, but it's been around a while (a long while in computer years). How many non-SSE machines are there still out there? How many non-SSE2? And how big is the performance boost anyway? What I'm getting at is that we may well be able to build a reasonable win32 binary wheel that we can put up on PyPI right now, with currently available tools. Then MPL and pandas and IPython... Scipy is trickier - what with the Fortran and all - but I think we could do Win32 anyway. And what's the hold-up with win64? Is that Fortran and scipy? If so, then why not do win64 for the rest of the stack? (I, for one, have been a heavy numpy user since the Numeric days, and I still hardly use scipy.)

By the way, we can/should do OS X too - it seems easier in fact (fewer hardware options to support, and the Mac's universal binaries). -Chris

Note on OS X: how long has it been since Apple shipped a 32-bit machine? Can we dump default 32-bit support? I'm pretty sure we don't need to do PPC anymore...

On Dec 3, 2013, at 11:40 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:

On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
On Dec 3, 2013, at 7:36 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 3 December 2013 21:13, Donald Stufft <donald@stufft.io> wrote:
I think Wheels are the way forward for Python dependencies. Perhaps not for things like fortran. I hope that the scientific community can start publishing wheels at least in addition too.
The Fortran issue is not that complicated. Very few packages are affected by it. It can easily be fixed with some kind of compatibility tag that can be used by the small number of affected packages.
I don't believe that Conda will gain the mindshare that pip has outside of the scientific community so I hope we don't end up with two systems that can't interoperate.
Maybe conda won't gain mindshare outside the scientific community but wheel really needs to gain mindshare *within* the scientific community. The root of all this is numpy. It is the biggest dependency on PyPI, is hard to build well, and has the Fortran ABI issue. It is used by very many people who wouldn't consider themselves part of the "scientific community". For example matplotlib depends on it. The PyPy devs have decided that it's so crucial to the success of PyPy that numpy's basically being rewritten in their stdlib (along with the C API).
A few times I've seen Paul Moore refer to numpy as the "litmus test" for wheels. I actually think that it's more important than that. If wheels are going to fly then there *needs* to be wheels for numpy. As long as there isn't a wheel for numpy then there will be lots of people looking for a non-pip/PyPI solution to their needs.
One way of getting the scientific community more on board here would be to offer them some tangible advantages. So rather than saying "oh well scientific use is a special case so they should just use conda or something", the message should be "the wheel system provides solutions to many long-standing problems and is even better than conda in (at least) some ways because it cleanly solves the Fortran ABI issue for example".
Oscar
I’d love to get Wheels to the point they are more suitable than they are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.

I’m not sure what the diff between the current state and what they need to be is, but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.

How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?

If this is too difficult at the moment, an easier (but much less important) one would be to get the result of ``paver bdist_wininst_simple`` as a wheel. For now I think it's OK that the wheels would just target 32-bit Windows and python.org-compatible Pythons (given that that's all we currently distribute). Once that works we can look at OS X and 64-bit Windows.

Ralf
On Wed, Dec 4, 2013 at 5:05 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Ralf,
Great to have you on this thread!
Note: supporting "variants" in one way or another is a great idea, but for right now, maybe we can get pretty far without it.
There are options for "serious" scipy users that need optimum performance, and newbies that want the full stack.
So our primary audience for "default" installs and pypi wheels are folks that need the core packages ( maybe a web dev that wants some MPL plots) and need things to "just work" more than anything optimized.
The problem is explaining to people what they want - no one reads docs before grabbing a binary. On the other hand, using wheels does solve the issue that people download 32-bit installers for 64-bit Windows systems.
So a lowest common denominator wheel would be very, very, useful.
As for what that would be: the superpack is great, but it's been around a while (long while in computer years)
How many non-sse machines are there still out there? How many non-sse2?
Hard to tell. Probably <2%, but that's still too much. Some older Athlon XPs don't have it, for example. And what if someone submits performance optimizations (there has been a focus on those recently) to numpy that use SSE4 or AVX, for example? You don't want to reject those based on the limitations of your distribution process.

And how big is the performance boost anyway?
Large. For a long time we've put a non-SSE installer for numpy on PyPI so that people would stop complaining that ``easy_install numpy`` didn't work. Then there were regular complaints about dot products being an order of magnitude slower than Matlab or R.

What I'm getting at is that we may well be able to build a reasonable win32 binary wheel that we can put up on PyPI right now, with currently available tools.
Then MPL and pandas and I python...
Scipy is trickier-- what with the Fortran and all, but I think we could do Win32 anyway.
And what's the hold up with win64? Is that fortran and scipy? If so, then why not do win64 for the rest of the stack?
Yes, 64-bit MinGW + gfortran doesn't yet work (no place to install DLLs from the binary, long story). A few people including David C are working on this issue right now. Visual Studio + Intel Fortran would work, but going with only an expensive toolset like that is kind of a no-go - especially since I think you'd force everyone else that builds other Fortran extensions to then also use the same toolset.

(I, for one, have been a heavy numpy user since the Numeric days, and I still hardly use scipy)
By the way, we can/should do OS-X too-- it seems easier in fact (fewer hardware options to support, and the Mac's universal binaries)
-Chris
Note on OS-X : how long has it been since Apple shipped a 32 bit machine? Can we dump default 32 bit support? I'm pretty sure we don't need to do PPC anymore...
I'd like to, but we decided to ship the exact same set of binaries as python.org - which means compiling on OS X 10.5/10.6 and including PPC + 32-bit Intel. Ralf
On Dec 3, 2013, at 11:40 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
On Dec 3, 2013, at 7:36 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 3 December 2013 21:13, Donald Stufft <donald@stufft.io> wrote:
I think Wheels are the way forward for Python dependencies. Perhaps not for things like fortran. I hope that the scientific community can start publishing wheels, at least in addition, too.
The Fortran issue is not that complicated. Very few packages are affected by it. It can easily be fixed with some kind of compatibility tag that can be used by the small number of affected packages.
I don't believe that Conda will gain the mindshare that pip has outside of the scientific community so I hope we don't end up with two systems that can't interoperate.
Maybe conda won't gain mindshare outside the scientific community but wheel really needs to gain mindshare *within* the scientific community. The root of all this is numpy. It is the biggest dependency on PyPI, is hard to build well, and has the Fortran ABI issue. It is used by very many people who wouldn't consider themselves part of the "scientific community". For example matplotlib depends on it. The PyPy devs have decided that it's so crucial to the success of PyPy that numpy's basically being rewritten in their stdlib (along with the C API).
A few times I've seen Paul Moore refer to numpy as the "litmus test" for wheels. I actually think that it's more important than that. If wheels are going to fly then there *needs* to be wheels for numpy. As long as there isn't a wheel for numpy then there will be lots of people looking for a non-pip/PyPI solution to their needs.
One way of getting the scientific community more on board here would be to offer them some tangible advantages. So rather than saying "oh well scientific use is a special case so they should just use conda or something", the message should be "the wheel system provides solutions to many long-standing problems and is even better than conda in (at least) some ways because it cleanly solves the Fortran ABI issue for example".
Oscar
I’d love to get Wheels to the point where they are more suitable than they currently are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.
I’m not sure what the diff is between the current state and what they need to be, but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
If this is too difficult at the moment, an easier (but much less important one) would be to get the result of ``paver bdist_wininst_simple`` as a wheel.
For now I think it's OK that the wheels would just target 32-bit Windows and python.org compatible Pythons (given that that's all we currently distribute). Once that works we can look at OS X and 64-bit Windows.
Ralf
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On Wed, Dec 4, 2013 at 12:56 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
The problem is explaining to people what they want - no one reads docs before grabbing a binary.
right -- so we want a default "pip install" install that will work for most people. And I think "works for most people" is far more important than "optimized for your system" How many non-sse machines are there still out there? How many non-sse2?
Hard to tell. Probably <2%, but that's still too much.
I have no idea how to tell, but I agree 2% is too much, however, 0.2% would not be too much (IMHO) -- anyway, I'm just wondering how much we are making this hard for very little return. Anyway, best would be a select-at-runtime option -- I think that's what MKL does. If someone can figure that out, great, but I still think a numpy wheel that works for most would still be worth doing, and we can do it now. Some older Athlon XPs don't have it for example. And what if someone
submits performance optimizations (there has been a focus on those recently) to numpy that use SSE4 or AVX for example? You don't want to reject those based on the limitations of your distribution process.
No, but we also don't want to distribute nothing because we can't distribute the best thing. And how big is the performance boost anyway?
Large. For a long time we've put a non-SSE installer for numpy on pypi so that people would stop complaining that ``easy_install numpy`` didn't work. Then there were regular complaints about dot products being an order of magnitude slower than Matlab or R.
Does SSE buy you that? Or do you need a good BLAS? But same point, anyway. Though I think we lose more users by people not getting an install at all than we lose by people installing and then finding out they need to install an optimized version to get a good "dot".
Yes, 64-bit MinGW + gfortran doesn't yet work (no place to install dlls from the binary, long story). A few people including David C are working on this issue right now. Visual Studio + Intel Fortran would work, but going with only an expensive toolset like that is kind of a no-go -
too bad there is no MS-fortran-express... On the other hand, saying "no one can have a 64 bit scipy, because people that want to build fortran extensions that are compatible with it are out of luck" is less than ideal. Right now, we are giving the majority of potential scipy users nothing for Win64. You know what they say: "done is better than perfect".
[Side note: scipy really shouldn't be a monolithic package with everything and the kitchen sink in it -- this would all be a lot easier if it was a namespace package and people could get the non-Fortran stuff by itself... but I digress.]
Note on OS-X: how long has it been since Apple shipped a 32 bit machine?
Can we dump default 32 bit support? I'm pretty sure we don't need to do PPC anymore...
I'd like to, but we decided to ship the exact same set of binaries as python.org - which means compiling on OS X 10.5/10.6 and including PPC + 32-bit Intel.
no it doesn't -- if we decide not to ship the 3.9 (PPC + 32-bit Intel) binary -- why should that mean that we can't ship the Intel 32+64 bit one? And as for that -- if someone gets a binary with only 64 bit in it, it will run fine with the 32+64 bit build, as long as it's run on a 64 bit machine. So if, in fact, no one has a 32 bit Mac anymore (I'm not saying that's the case) we don't need to build for it.
And maybe the next python.org builds could be 64 bit Intel only. Probably not yet, but we shouldn't be locked in forever....
-Chris
-- Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Thu, Dec 5, 2013 at 1:09 AM, Chris Barker <chris.barker@noaa.gov> wrote:
On Wed, Dec 4, 2013 at 12:56 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
The problem is explaining to people what they want - no one reads docs before grabbing a binary.
right -- so we want a default "pip install" install that will work for most people. And I think "works for most people" is far more important than "optimized for your system"
How many non-sse machines are there still out there? How many non-sse2?
Hard to tell. Probably <2%, but that's still too much.
I have no idea how to tell, but I agree 2% is too much, however, 0.2% would not be too much (IMHO) -- anyway, I'm just wondering how much we are making this hard for very little return.
I also don't know.
Anyway, best would be a select-at-runtime option -- I think that's what MKL does. If someone can figure that out, great, but I still think a numpy wheel that works for most would still be worth doing, and we can do it now.
I'll start playing with wheels in the near future.
Some older Athlon XPs don't have it for example. And what if someone
submits performance optimizations (there has been a focus on those recently) to numpy that use SSE4 or AVX for example? You don't want to reject those based on the limitations of your distribution process.
No, but we also don't want to distribute nothing because we can't distribute the best thing.
And how big is the performance boost anyway?
Large. For a long time we've put a non-SSE installer for numpy on pypi so that people would stop complaining that ``easy_install numpy`` didn't work. Then there were regular complaints about dot products being an order of magnitude slower than Matlab or R.
Does SSE buy you that? Or do you need a good BLAS? But same point, anyway. Though I think we lose more users by people not getting an install at all than we lose by people installing and then finding out they need to install an optimized version to get a good "dot".
Yes, 64-bit MinGW + gfortran doesn't yet work (no place to install dlls from the binary, long story). A few people including David C are working on this issue right now. Visual Studio + Intel Fortran would work, but going with only an expensive toolset like that is kind of a no-go -
too bad there is no MS-fortran-express...
On the other hand, saying "no one can have a 64 bit scipy, because people that want to build fortran extensions that are compatible with it are out of luck" is less than ideal. Right now, we are giving the majority of potential scipy users nothing for Win64.
There are multiple ways to get a win64 install - Anaconda, EPD, WinPython, Christoph's installers. So there's no big hurry here.
You know what they say "done is better than perfect"
[Side note: scipy really shouldn't be a monolithic package with everything and the kitchen sink in it -- this would all be a lot easier if it was a namespace package and people could get the non-Fortran stuff by itself...but I digress.]
Namespace packages have been tried with scikits - there's a reason why scikit-learn and statsmodels spent a lot of effort dropping them. They don't work. Scipy, while monolithic, works for users.
Note on OS-X : how long has it been since Apple shipped a 32 bit
machine? Can we dump default 32 bit support? I'm pretty sure we don't need to do PPC anymore...
I'd like to, but we decided to ship the exact same set of binaries as python.org - which means compiling on OS X 10.5/10.6 and including PPC + 32-bit Intel.
no it doesn't -- if we decide not to ship the 3.9 (PPC + 32-bit Intel) binary -- why should that mean that we can't ship the Intel 32+64 bit one?
But we do ship the 32+64-bit one (at least for Python 2.7 and 3.3). So there shouldn't be any issue here. Ralf
And as for that -- if someone gets a binary with only 64 bit in it, it will run fine with the 32+64 bit build, as long as it's run on a 64 bit machine. So if, in fact, no one has a 32 bit Mac anymore (I'm not saying that's the case) we don't need to build for it.
And maybe the next python.org builds could be 64 bit Intel only. Probably not yet, but we shouldn't be locked in forever....
-Chris
On 5 December 2013 17:35, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Namespace packages have been tried with scikits - there's a reason why scikit-learn and statsmodels spent a lot of effort dropping them. They don't work. Scipy, while monolithic, works for users.
The namespace package emulation that was all that was available in versions prior to 3.3 can certainly be a bit fragile at times. The native namespace packages in 3.3+ should be more robust (although even one package erroneously including an __init__.py file can still cause trouble). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
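[Editor's note: a minimal, self-contained sketch of the native namespace packages (PEP 420, Python 3.3+) mentioned above. The package name "nspkg" and the module names are made up for illustration; two sys.path entries each hold one portion of the package, and neither directory contains an __init__.py file.]

```python
import importlib
import os
import sys
import tempfile

# Build two separate directories that each contribute a portion of the
# same "nspkg" package, with no __init__.py anywhere.
base = tempfile.mkdtemp()
for part, mod in (("part_a", "mod1"), ("part_b", "mod2")):
    pkg_dir = os.path.join(base, part, "nspkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, mod + ".py"), "w") as f:
        f.write("value = %r\n" % mod)
    sys.path.insert(0, os.path.join(base, part))

importlib.invalidate_caches()

# Both portions are importable under the single "nspkg" namespace.
m1 = importlib.import_module("nspkg.mod1")
m2 = importlib.import_module("nspkg.mod2")
print(m1.value, m2.value)
```

As the message notes, a stray __init__.py in any portion turns it back into a regular package and hides the other portions.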
On Dec 4, 2013, at 11:35 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I'm just wondering how much we are making this hard for very little return. I also don't know.
I wonder if a poll on the relevant lists would be helpful...
I'll start playing with wheels in the near future.
Great! Thanks!
There are multiple ways to get a win64 install - Anaconda, EPD, WinPython, Christoph's installers. So there's no big hurry here.
Well, this discussion is about pip-installability, but yes, some of those are python.org compatible: I know I always point people to Christoph's repo.
[Side note: scipy really shouldn't be a monolithic package with everything and the kitchen sink in it -- this would all be a lot easier if it was a namespace package and people could get the non-Fortran stuff by itself...but I digress.]
Namespace packages have been tried with scikits - there's a reason why scikit-learn and statsmodels spent a lot of effort dropping them. They don't work. Scipy, while monolithic, works for users.
True -- I've been trying out namespace packages for some far easier problems, and you're right -- not a robust solution. That really should be fixed -- but a whole new topic!
Note on OS-X: how long has it been since Apple shipped a 32 bit machine?
Can we dump default 32 bit support? I'm pretty sure we don't need to do PPC anymore...
I'd like to, but we decided to ship the exact same set of binaries as python.org - which means compiling on OS X 10.5/10.6 and including PPC + 32-bit Intel.
no it doesn't -- if we decide not to ship the 3.9 (PPC + 32-bit Intel) binary -- why should that mean that we can't ship the Intel 32+64 bit one?
But we do ship the 32+64-bit one (at least for Python 2.7 and 3.3). So there shouldn't be any issue here.
Right -- we just need the wheel. Which should be trivial for numpy on OS-X -- not the same SSE issues. Thanks for working on this. - Chris
On 4 December 2013 20:56, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 5:05 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
So a lowest common denominator wheel would be very, very, useful.
As for what that would be: the superpack is great, but it's been around a while (long while in computer years)
How many non-sse machines are there still out there? How many non-sse2?
Hard to tell. Probably <2%, but that's still too much. Some older Athlon XPs don't have it for example. And what if someone submits performance optimizations (there has been a focus on those recently) to numpy that use SSE4 or AVX for example? You don't want to reject those based on the limitations of your distribution process.
And how big is the performance boost anyway?
Large. For a long time we've put a non-SSE installer for numpy on pypi so that people would stop complaining that ``easy_install numpy`` didn't work. Then there were regular complaints about dot products being an order of magnitude slower than Matlab or R.
Yes, I wouldn't want that kind of bad PR getting around about scientific Python -- "Python is slower than Matlab" etc.
It seems as if there is a need to extend the pip+wheel+PyPI system before this can fully work for numpy. I'm sure that the people here who have been working on all of this would be very interested to know what kinds of solutions would work best for numpy and related packages.
You mentioned in another message that a post-install script seems best to you. I suspect there is a little reluctance to go this way because one of the goals of the wheel system is to reduce the situation where users execute arbitrary code from the internet with admin privileges, e.g. "sudo pip install X" will download and run the setup.py from X with root privileges. Part of the point about wheels is that they don't need to be "executed" for installation. I know that post-install scripts are common in .deb and .rpm packages but I think that the use case there is slightly different, as the files are downloaded from controlled repositories whereas PyPI has no quality assurance.
BTW, how do the distros handle e.g. SSE? My understanding is that they just strip out all the SSE and related non-portable extensions and ship generic 686 binaries. My experience is with Ubuntu and I know they're not very good at handling BLAS with numpy, and they don't seem to be able to compile fftpack as well as Christoph can.
Perhaps a good near-term plan might be to:
1) Add the bdist_wheel command to numpy - which may actually be almost automatic with new enough setuptools/pip and wheel installed.
2) Upload wheels for OSX to PyPI - for OSX, SSE support can be inferred from OS version, which wheels can currently handle.
3) Upload wheels for Windows to somewhere other than PyPI, e.g. SourceForge, pending a distribution solution that can detect SSE support on Windows.
I think it would be good to have a go at wheels even if they're not fully ready for PyPI (just in case some other issue surfaces in the process).
Oscar
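[Editor's note: a small sketch of why point (2) of the plan above can work without new machinery. A wheel's platform tag is derived from the build platform string, and on OS X that string embeds the deployment target version, so an installer can decline wheels built for a newer OS X than the one it is running on. The tag-mangling rule shown (dashes and dots to underscores) follows PEP 425.]

```python
import sysconfig

# The build platform string, e.g. 'macosx-10.6-intel' or 'linux-x86_64'.
plat = sysconfig.get_platform()

# PEP 425 platform tags replace '-' and '.' with '_'.
wheel_tag = plat.replace("-", "_").replace(".", "_")
print(wheel_tag)
```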
On Thu, Dec 5, 2013 at 10:12 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 4 December 2013 20:56, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 5:05 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
So a lowest common denominator wheel would be very, very, useful.
As for what that would be: the superpack is great, but it's been around a while (long while in computer years)
How many non-sse machines are there still out there? How many non-sse2?
Hard to tell. Probably <2%, but that's still too much. Some older Athlon XPs don't have it for example. And what if someone submits performance optimizations (there has been a focus on those recently) to numpy that use SSE4 or AVX for example? You don't want to reject those based on the limitations of your distribution process.
And how big is the performance boost anyway?
Large. For a long time we've put a non-SSE installer for numpy on pypi so that people would stop complaining that ``easy_install numpy`` didn't work. Then there were regular complaints about dot products being an order of magnitude slower than Matlab or R.
Yes, I wouldn't want that kind of bad PR getting around about scientific Python "Python is slower than Matlab" etc.
It seems as if there is a need to extend the pip+wheel+PyPI system before this can fully work for numpy. I'm sure that the people here who have been working on all of this would be very interested to know what kinds of solutions would work best for numpy and related packages.
You mentioned in another message that a post-install script seems best to you. I suspect there is a little reluctance to go this way because one of the goals of the wheel system is to reduce the situation where users execute arbitrary code from the internet with admin privileges e.g. "sudo pip install X" will download and run the setup.py from X with root privileges. Part of the point about wheels is that they don't need to be "executed" for installation. I know that post-install scripts are common in .deb and .rpm packages but I think that the use case there is slightly different as the files are downloaded from controlled repositories whereas PyPI has no quality assurance.
I don't think it's avoidable - anything that is transparent to the user will have to execute code. The multiwheel idea of Nick looks good to me.
BTW, how do the distros handle e.g. SSE?
I don't know exactly to be honest.
My understanding is that they just strip out all the SSE and related non-portable extensions and ship generic 686 binaries. My experience is with Ubuntu and I know they're not very good at handling BLAS with numpy and they don't seem to be able to compile fftpack as well as Cristoph can.
Perhaps a good near-term plan might be to:
1) Add the bdist_wheel command to numpy - which may actually be almost automatic with new enough setuptools/pip and wheel installed.
2) Upload wheels for OSX to PyPI - for OSX, SSE support can be inferred from OS version, which wheels can currently handle.
3) Upload wheels for Windows to somewhere other than PyPI, e.g. SourceForge, pending a distribution solution that can detect SSE support on Windows.
That's a reasonable plan. I have an OS X wheel already, which required only a minor change to numpy's setup.py.
I think it would be good to have a go at wheels even if they're not fully ready for PyPI (just in case some other issue surfaces in the process).
Agreed. Ralf
On Dec 5, 2013, at 1:12 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Yes, I wouldn't want that kind of bad PR getting around about scientific Python "Python is slower than Matlab" etc.
Well, is that better or worse than 2% or fewer people finding they can't run it on their old machines....
It seems as if there is a need to extend the pip+wheel+PyPI system before this can fully work for numpy.
Maybe, in this case, but with the whole fortran ABI thing, yes.
You mentioned in another message that a post-install script seems best to you.
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels... Not sure how hard that would be to do, though.
3) Upload wheels for Windows to somewhere other than PyPI e.g. SourceForge pending a distribution solution that can detect SSE support on Windows.
The hard-core "I want to use python instead of matlab" users are being re-directed to Anaconda or Canopy anyway. So maybe sub-optimal binaries on pypi are OK. By the way, anyone know what Anaconda and Canopy do about SSE and a good BLAS?
I think it would be good to have a go at wheels even if they're not fully ready for PyPI (just in case some other issue surfaces in the process).
Absolutely! - Chris
On Dec 5, 2013, at 8:48 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels...
Not sure how hard that would be to do, though.
Install time selectors probably aren’t a huge deal as long as there’s a way to force a particular variant to install and to disable the executing code. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
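[Editor's note: a sketch of the "force a particular variant" requirement above. "WHEEL_VARIANT" is a hypothetical environment variable, not a real pip feature; the point is only that any install-time selector needs an override path that skips the detection code entirely.]

```python
import os

def select_variant(detected, override_env="WHEEL_VARIANT"):
    """Return the user-forced variant if set, otherwise the detected one."""
    forced = os.environ.get(override_env)
    return forced if forced else detected

# Detection wins when no override is present...
os.environ.pop("WHEEL_VARIANT", None)
print(select_variant("_sse2"))

# ...and the override wins when the user sets it.
os.environ["WHEEL_VARIANT"] = "_nosse"
print(select_variant("_sse2"))
```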
On Thu, Dec 5, 2013 at 5:52 PM, Donald Stufft <donald@stufft.io> wrote:
On Dec 5, 2013, at 8:48 PM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels...
Not sure how hard that would be to do, though.
Install time selectors probably aren’t a huge deal as long as there’s a way to force a particular variant to install and to disable the executing code.
I was proposing run-time -- so the same package would work "right" when moved to another machine via py2exe, etc. I imagine that's harder, particularly with permissions issues... -Chris
On 6 December 2013 11:52, Donald Stufft <donald@stufft.io> wrote:
On Dec 5, 2013, at 8:48 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels...
Not sure how hard that would be to do, though.
Install time selectors probably isn’t a huge deal as long as there’s a way to force a particular variant to install and to disable the executing code.
Hmm, I just had an idea for how to do the runtime selection thing. It actually shouldn't be that hard, so long as the numpy folks are OK with a bit of __path__ manipulation in package __init__ modules. Specifically, what could be done is this:
- all of the built SSE level dependent modules would move out of their current package directories into a suitably named subdirectory (say "_nosse", "_sse2", "_sse3")
- in the __init__.py file for each affected subpackage, you would have a snippet like:

    numpy._add_sse_subdir(__path__)

where _add_sse_subdir would be something like:

    def _add_sse_subdir(search_path):
        if len(search_path) > 1:
            return  # Assume the SSE dependent dir has already been added
        # Could likely do this SSE availability check once at import time
        if _have_sse3():
            sub_dir = "_sse3"
        elif _have_sse2():
            sub_dir = "_sse2"
        else:
            sub_dir = "_nosse"
        main_dir = search_path[0]
        search_path.append(os.path.join(main_dir, sub_dir))

With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory). To avoid having the implicit namespace packages in 3.3+ cause any problems with this approach, the SSE subdirectories should contain __init__.py files that explicitly raise ImportError.
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
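[Editor's note: a runnable distillation of the selection logic in the message above. The _have_sse* checks in the original sketch are hypothetical helpers, so here the detected CPU capabilities are passed in as plain booleans.]

```python
import os

def pick_sse_subdir(have_sse2, have_sse3):
    """Choose the variant subdirectory for the detected CPU."""
    if have_sse3:
        return "_sse3"
    if have_sse2:
        return "_sse2"
    return "_nosse"

def add_sse_subdir(search_path, have_sse2, have_sse3):
    """Append the variant dir to a package __path__ list, at most once."""
    if len(search_path) > 1:
        return search_path  # assume the SSE dependent dir is already there
    main_dir = search_path[0]
    variant = pick_sse_subdir(have_sse2, have_sse3)
    search_path.append(os.path.join(main_dir, variant))
    return search_path

print(add_sse_subdir(["numpy"], have_sse2=True, have_sse3=False))
```

Calling add_sse_subdir a second time on the same list is a no-op, mirroring the length guard in the original sketch.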
On 06.12.2013 06:47, Nick Coghlan wrote:
On 6 December 2013 11:52, Donald Stufft <donald@stufft.io> wrote:
On Dec 5, 2013, at 8:48 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels...
Not sure how hard that would be to do, though.
Install time selectors probably isn’t a huge deal as long as there’s a way to force a particular variant to install and to disable the executing code.
Hmm, I just had an idea for how to do the runtime selection thing. It actually shouldn't be that hard, so long as the numpy folks are OK with a bit of __path__ manipulation in package __init__ modules.
Manipulation of __path__ at runtime usually makes it harder for modulefinder to find all the required modules. Thomas
On 6 December 2013 17:10, Thomas Heller <theller@ctypes.org> wrote:
On 06.12.2013 06:47, Nick Coghlan wrote:
Hmm, I just had an idea for how to do the runtime selection thing. It actually shouldn't be that hard, so long as the numpy folks are OK with a bit of __path__ manipulation in package __init__ modules.
Manipulation of __path__ at runtime usually makes it harder for modulefinder to find all the required modules.
Not usually, always. That's why http://docs.python.org/2/library/modulefinder#modulefinder.AddPackagePath exists :) However, the interesting problem in this case is that we want to package 3 different versions of the modules, choosing one of them at runtime, and modulefinder definitely *won't* cope with that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
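[Editor's note: a short sketch of the stdlib API discussed above. AddPackagePath registers an extra directory for a package so modulefinder can see modules that would otherwise only be found via runtime __path__ manipulation; the numpy path given here is purely illustrative.]

```python
import modulefinder
import os
import tempfile

# Declare an extra package directory up front (illustrative path).
modulefinder.AddPackagePath("numpy", "/opt/numpy/_sse2")

# Basic ModuleFinder usage: analyse a tiny throwaway script.
fd, script = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("import os\nimport json\n")

finder = modulefinder.ModuleFinder()
finder.run_script(script)
os.unlink(script)

print(sorted(name for name in finder.modules if name in ("os", "json")))
```

As Nick points out, this only covers a statically declared extra directory; it cannot express "one of three variant directories, chosen at runtime".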
On 06.12.2013 13:22, Nick Coghlan wrote:
On 6 December 2013 17:10, Thomas Heller <theller@ctypes.org> wrote:
On 06.12.2013 06:47, Nick Coghlan wrote:
Hmm, I just had an idea for how to do the runtime selection thing. It actually shouldn't be that hard, so long as the numpy folks are OK with a bit of __path__ manipulation in package __init__ modules.
Manipulation of __path__ at runtime usually makes it harder for modulefinder to find all the required modules.
Not usually, always. That's why http://docs.python.org/2/library/modulefinder#modulefinder.AddPackagePath exists :)
Well, as the py2exe author and the (inactive, I admit) modulefinder maintainer I already know this.
However, the interesting problem in this case is that we want to package 3 different versions of the modules, choosing one of them at runtime, and modulefinder definitely *won't* cope with that.
The new importlib implementation in python3.3 offers a lot of new possibilities, and probably not all of them have been explored yet. For example, I have written a ModuleMapper object that, when inserted into sys.meta_path, allows transparent mapping of module names between Python2 and Python3 - no need to use six. And the new modulefinder(*) that I've written works great with that. Thomas (*) which will be part of py2exe for python3, but it is too late for python3.4.
On Fri, Dec 6, 2013 at 5:16 AM, Thomas Heller <theller@ctypes.org> wrote:
On 06.12.2013 13:22, Nick Coghlan wrote:
Manipulation of __path__ at runtime usually makes it harder for
modulefinder to find all the required modules.
Not usually, always. That's why http://docs.python.org/2/library/modulefinder#modulefinder.AddPackagePath exists :)
Well, as the py2exe author and the (inactive, I admit) modulefinder maintainer I already know this.
modulefinder fails often enough that I've never been able to package a non-trivial app without a bit of "force-include all of this package" (and don't-include this other thing!). So while too bad, this should not be considered a deal breaker.
-Chris
On Fri, Dec 6, 2013 at 6:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 December 2013 11:52, Donald Stufft <donald@stufft.io> wrote:
On Dec 5, 2013, at 8:48 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels...
Not sure how hard that would be to do, though.
Install time selectors probably aren’t a huge deal as long as there’s a way to force a particular variant to install and to disable the executing code.
Hmm, I just had an idea for how to do the runtime selection thing. It actually shouldn't be that hard, so long as the numpy folks are OK with a bit of __path__ manipulation in package __init__ modules.
Specifically, what could be done is this:
- all of the built SSE level dependent modules would move out of their current package directories into a suitably named subdirectory (say "_nosse", "_sse2", "_sse3")
- in the __init__.py file for each affected subpackage, you would have a snippet like:
numpy._add_sse_subdir(__path__)
where _add_sse_subdir would be something like:
    def _add_sse_subdir(search_path):
        if len(search_path) > 1:
            return  # Assume the SSE dependent dir has already been added
        # Could likely do this SSE availability check once at import time
        if _have_sse3():
            sub_dir = "_sse3"
        elif _have_sse2():
            sub_dir = "_sse2"
        else:
            sub_dir = "_nosse"
        main_dir = search_path[0]
        search_path.append(os.path.join(main_dir, sub_dir))
With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory).
Hmm, taking a compile flag and encoding it in the package layout seems like a fundamentally wrong approach. And in order to not litter the source tree and all installs with lots of empty dirs, the changes to __init__.py will have to be made at build time based on whether you're building Windows binaries or something else. Path manipulation is usually fragile as well. So I suspect this is not going to fly. Ralf
To avoid having the implicit namespace packages in 3.3+ cause any problems with this approach, the SSE subdirectories should contain __init__.py files that explicitly raise ImportError.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 6 December 2013 17:21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Dec 6, 2013 at 6:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory).
Hmm, taking a compile flag and encoding it in the package layout seems like a fundamentally wrong approach. And in order to not litter the source tree and all installs with lots of empty dirs, the changes to __init__.py will have to be made at build time based on whether you're building Windows binaries or something else. Path manipulation is usually fragile as well. So I suspect this is not going to fly.
In the absence of the perfect solution (i.e. picking the right variant out of no SSE, SSE2, SSE3 automatically), would it be a reasonable compromise to standardise on SSE2 as "lowest acceptable common denominator"?

Users with no SSE capability at all, or that wanted to take advantage of the SSE3 optimisations, would need to grab one of the Windows installers or something from conda, but for a lot of users, a "pip install numpy" that dropped the SSE2 version onto their system would be just fine, and a much lower barrier to entry than "well, first install this other packaging system that doesn't interoperate with your OS package manager at all...".

Are we letting perfect be the enemy of better, here? (punting on the question for 6 months and seeing if we can deal with the install-time variant problem in pip 1.6 is certainly an option, but if we don't *need* to wait that long...)

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
How does conda handle SSE vs SSE2 vs SSE3? I'm digging through its source code and just installed numpy with it and I can't seem to find any handling of that? On Dec 6, 2013, at 7:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 December 2013 17:21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Dec 6, 2013 at 6:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory).
Hmm, taking a compile flag and encoding it in the package layout seems like a fundamentally wrong approach. And in order to not litter the source tree and all installs with lots of empty dirs, the changes to __init__.py will have to be made at build time based on whether you're building Windows binaries or something else. Path manipulation is usually fragile as well. So I suspect this is not going to fly.
In the absence of the perfect solution (i.e. picking the right variant out of no SSE, SSE2, SSE3 automatically), would it be a reasonable compromise to standardise on SSE2 as "lowest acceptable common denominator"?
Users with no sse capability at all or that wanted to take advantage of the SSE3 optimisations, would need to grab one of the Windows installers or something from conda, but for a lot of users, a "pip install numpy" that dropped the SSE2 version onto their system would be just fine, and a much lower barrier to entry than "well, first install this other packaging system that doesn't interoperate with your OS package manager at all...".
Are we letting perfect be the enemy of better, here? (punting on the question for 6 months and seeing if we can deal with the install-time variant problem in pip 1.6 is certainly an option, but if we don't *need* to wait that long...)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Fri, Dec 6, 2013 at 12:44 PM, Donald Stufft <donald@stufft.io> wrote:
How does conda handle SSE vs SSE2 vs SSE3? I'm digging through its source code and just installed numpy with it and I can't seem to find any handling of that?
I can't speak for conda, but @enthought, we solve it by using the MKL, which selects the right implementation at runtime. Linux distributions have a system to cope with it (the hwcap capability of ld), but even there few packages use it. ATLAS and libc are the ones I am aware of. And this breaks anyway when you use static linking, obviously. David
On Dec 6, 2013, at 7:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 December 2013 17:21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Dec 6, 2013 at 6:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory).
Hmm, taking a compile flag and encoding it in the package layout seems like a fundamentally wrong approach. And in order to not litter the source tree and all installs with lots of empty dirs, the changes to __init__.py will have to be made at build time based on whether you're building Windows binaries or something else. Path manipulation is usually fragile as well. So I suspect this is not going to fly.
In the absence of the perfect solution (i.e. picking the right variant out of no SSE, SSE2, SSE3 automatically), would it be a reasonable compromise to standardise on SSE2 as "lowest acceptable common denominator"?
Users with no sse capability at all or that wanted to take advantage of the SSE3 optimisations, would need to grab one of the Windows installers or something from conda, but for a lot of users, a "pip install numpy" that dropped the SSE2 version onto their system would be just fine, and a much lower barrier to entry than "well, first install this other packaging system that doesn't interoperate with your OS package manager at all...".
Are we letting perfect be the enemy of better, here? (punting on the question for 6 months and seeing if we can deal with the install-time variant problem in pip 1.6 is certainly an option, but if we don't *need* to wait that long...)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
On Fri, Dec 6, 2013 at 1:33 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 December 2013 17:21, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Dec 6, 2013 at 6:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory).
Hmm, taking a compile flag and encoding it in the package layout seems like a fundamentally wrong approach. And in order to not litter the source tree and all installs with lots of empty dirs, the changes to __init__.py will have to be made at build time based on whether you're building Windows binaries or something else. Path manipulation is usually fragile as well. So I suspect this is not going to fly.
In the absence of the perfect solution (i.e. picking the right variant out of no SSE, SSE2, SSE3 automatically), would it be a reasonable compromise to standardise on SSE2 as "lowest acceptable common denominator"?
Maybe, yes. It's hard to figure out the impact of this, but I'll bring it up on the numpy list. If no one has a good way to get some statistics on CPUs that don't support these instruction sets, it may be worth a try for one of the Python versions to see how many users run into the issue. By accident we've released an incorrect binary once before, by the way (scipy 0.8.0 for Python 2.5), and that became a problem fairly quickly: https://github.com/scipy/scipy/issues/1697. That was 2010 though.
Users with no sse capability at all or that wanted to take advantage of the SSE3 optimisations, would need to grab one of the Windows installers or something from conda, but for a lot of users, a "pip install numpy" that dropped the SSE2 version onto their system would be just fine, and a much lower barrier to entry than "well, first install this other packaging system that doesn't interoperate with your OS package manager at all...".
Well, for most Windows users grabbing a .exe and clicking on it is a lower barrier than opening a console and typing "pip install numpy" :)
Are we letting perfect be the enemy of better, here? (punting on the question for 6 months and seeing if we can deal with the install-time variant problem in pip 1.6 is certainly an option, but if we don't *need* to wait that long...)
Let's first get the OS X wheels up, that can be done now. And then see what is decided on the numpy list for the compromise you propose above. Ralf
On Fri, Dec 6, 2013 at 4:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
In the absence of the perfect solution (i.e. picking the right variant out of no SSE, SSE2, SSE3 automatically), would it be a reasonable compromise to standardise on SSE2 as "lowest acceptable common denominator"?
+1
Users with no sse capability at all or that wanted to take advantage of the SSE3 optimisations, would need to grab one of the Windows installers or something from conda, but for a lot of users, a "pip install numpy" that dropped the SSE2 version onto their system would be just fine, and a much lower barrier to entry than "well, first install this other packaging system that doesn't interoperate with your OS package manager at all...".
exactly -- for example, I work with a web dev that could really use Matplotlib for a little task -- if I could tell him to "pip install matplotlib", he'd do it, but he just sees it as too much hassle at this point...
Are we letting perfect be the enemy of better, here?
I think so, yes.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Thu, Dec 5, 2013 at 11:21 PM, Ralf Gommers <ralf.gommers@gmail.com>wrote:
Hmm, taking a compile flag and encoding it in the package layout seems like a fundamentally wrong approach.
well, it's a pretty ugly hack, but sometimes an ugly hack that does the job is better than nothing. IIUC, the Intel MKL libs do some sort of dynamic switching at run time too -- and that is a great feature.
And in order to not litter the source tree and all installs with lots of empty dirs,
where "lots" is what, 3? Is that so bad in a project the size of numpy?

the changes to __init__.py will have to be made at build time based on whether you're building Windows binaries or something else.
That might in fact be nicer than the "litter", but also may be a less robust and more annoying way to do it.
Path manipulation is usually fragile as well.
My first instinct was that you'd re-name directories on the fly, which might be more robust, but wouldn't work in any kind of secure environment, so a no-go. But could you elaborate on the fragile nature of sys.path manipulation? What might go wrong there? Also, it's not out of the question that once such a system was in place, it could be used on systems other than Windows.... -Chris
On Fri, Dec 6, 2013 at 5:47 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 December 2013 11:52, Donald Stufft <donald@stufft.io> wrote:
On Dec 5, 2013, at 8:48 PM, Chris Barker - NOAA Federal <
chris.barker@noaa.gov> wrote:
What would really be best is run-time selection of the appropriate lib -- it would solve this problem, and allow users to re-distribute working binaries via py2exe, etc. And not require opening a security hole in wheels...
Not sure how hard that would be to do, though.
Install time selectors probably aren't a huge deal as long as there's a way to force a particular variant to install and to disable the executing code.
Hmm, I just had an idea for how to do the runtime selection thing. It actually shouldn't be that hard, so long as the numpy folks are OK with a bit of __path__ manipulation in package __init__ modules.
As Ralf said, I think it is overkill. The problem of SSE vs non SSE is because of one library, ATLAS, which has IMO the design flaw of being arch specific. I always hoped we could get away from this when I built those special installers for numpy :) MKL does not have this issue, and now that openblas (under a BSD license) can be used as well, we can alleviate this for deployment. Building a deployment story for this is not justified. David
Specifically, what could be done is this:
- all of the built SSE level dependent modules would move out of their current package directories into a suitably named subdirectory (say "_nosse", "_sse2", "_sse3")
- in the __init__.py file for each affected subpackage, you would have a snippet like:
numpy._add_sse_subdir(__path__)
where _add_sse_subdir would be something like:
import os

def _add_sse_subdir(search_path):
    if len(search_path) > 1:
        return  # Assume the SSE dependent dir has already been added
    # Could likely do this SSE availability check once at import time
    if _have_sse3():
        sub_dir = "_sse3"
    elif _have_sse2():
        sub_dir = "_sse2"
    else:
        sub_dir = "_nosse"
    main_dir = search_path[0]
    search_path.append(os.path.join(main_dir, sub_dir))
With that approach, the existing wheel model would work (no need for a variant system), and numpy installations could be freely moved between machines (or shared via a network directory).
To avoid having the implicit namespace packages in 3.3+ cause any problems with this approach, the SSE subdirectories should contain __init__.py files that explicitly raise ImportError.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 6 December 2013 13:06, David Cournapeau <cournape@gmail.com> wrote:
As Ralf said, I think it is overkill. The problem of SSE vs non SSE is because of one library, ATLAS, which has IMO the design flaw of being arch specific. I always hoped we could get away from this when I built those special installers for numpy :)
MKL does not have this issue, and now that openblas (under a BSD license) can be used as well, we can alleviate this for deployment. Building a deployment story for this is not justified.
Oh, okay that's great. How hard would it be to get openblas numpy wheels up and running? Would they be compatible with the existing scipy etc. binaries? Oscar
On Fri, Dec 6, 2013 at 2:48 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com>wrote:
On 6 December 2013 13:06, David Cournapeau <cournape@gmail.com> wrote:
As Ralf said, I think it is overkill. The problem of SSE vs non SSE is because of one library, ATLAS, which has IMO the design flaw of being arch specific. I always hoped we could get away from this when I built those special installers for numpy :)
MKL does not have this issue, and now that openblas (under a BSD license) can be used as well, we can alleviate this for deployment. Building a deployment story for this is not justified.
Oh, okay that's great. How hard would it be to get openblas numpy wheels up and running? Would they be compatible with the existing scipy etc. binaries?
OpenBLAS is still pretty buggy compared to ATLAS (although performance in many cases seems to be on par); I don't think that will be well received for the official releases. We actually did discuss it as an alternative for Accelerate on OS X, but there was quite a bit of opposition. Ralf
On Fri, Dec 6, 2013 at 5:06 AM, David Cournapeau <cournape@gmail.com> wrote:
As Ralf said, I think it is overkill. The problem of SSE vs non SSE is because of one library, ATLAS, which has IMO the design flaw of being arch specific.
yup -- really designed for the end user to build it themselves....
MKL does not have this issue, and now that openblas (under a BSD license) can be used as well, we can alleviate this for deployment. Building a deployment story for this is not justified.
So OpenBLAS has run-time selection of the right binary? very cool! So are we done here? -Chris
On Fri, Dec 6, 2013 at 5:50 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Fri, Dec 6, 2013 at 5:06 AM, David Cournapeau <cournape@gmail.com>wrote:
As Ralf said, I think it is overkill. The problem of SSE vs non SSE is because of one library, ATLAS, which has IMO the design flaw of being arch specific.
yup -- really designed for the end user to build it themselves....
MKL does not have this issue, and now that openblas (under a BSD license) can be used as well, we can alleviate this for deployment. Building a deployment story for this is not justified.
So OpenBLAS has run-time selection of the right binary? very cool! So are we done here?
Not that I know of, but you can easily build one for a given architecture, which is essentially impossible to do reliably with ATLAS. I did not know about openblas instabilities, though. I guess we will have to do some more testing. David
On 12/01/2013 05:17 PM, Nick Coghlan wrote:
I see conda as existing at a similar level to apt and yum from a packaging point of view, with zc.buildout as a DIY equivalent at that level.
FTR: zc.buildout does nothing to insulate you from the need for a compiler; it does allow you to create repeatable builds from source for non-Python components which would otherwise vary with the underlying platform. The actual recipes for such components often involve a *lot* of yak shaving. ;)

Tres.

--
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
2. For cross-platform handling of external binary dependencies, we recommend bootstrapping the open source conda toolchain, and using that to install pre-built binaries (currently administered by the Continuum Analytics folks).
We already have a recommendation for conda: "If you’re looking for management of fully integrated cross-platform software stacks". That's admittedly pretty abstract. I certainly see no harm in expanding this so that it does a better job of attracting users that would seriously benefit. Maybe even link to a section on "Conda Environments" on the Advanced page. I don't have a good sense for the scope of what's currently available in anaconda to know how many users it could help. I guess I'd have to see the wording and placement of the recommendation, to judge it fully. My concern is just to prevent a horde of users trying to use conda environments that don't really need it.
On 1 December 2013 04:15, Nick Coghlan <ncoghlan@gmail.com> wrote:
conda has its own binary distribution format, using hash based dependencies. It's this mechanism which allows it to provide reliable cross platform binary dependency management, but it's also the same mechanism that prevents low impact security updates and interoperability with platform provided packages.
Nick can you provide a link to somewhere that explains the hash based dependency thing please? I've read the following...

http://docs.continuum.io/conda/
https://speakerdeck.com/teoliphant/packaging-and-deployment-with-conda
http://docs.continuum.io/anaconda/index.html
http://continuum.io/blog/new-advances-in-conda
http://continuum.io/blog/conda
http://docs.continuum.io/conda/build.html

...but I see no reference to hash-based dependencies. In fact the only place I have seen a reference to hash-based dependencies is your comment at the bottom of this github issue: https://github.com/ContinuumIO/conda/issues/292

AFAICT conda/binstar are alternatives for pip/PyPI that happen to host binaries for some packages that don't have binaries on PyPI. (conda also provides a different - incompatible - take on virtualenvs but that's not relevant to this proposal.)

Oscar
On 3 December 2013 21:22, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
AFAICT conda/binstar are alternatives for pip/PyPI that happen to host binaries for some packages that don't have binaries on PyPI. (conda also provides a different - incompatible - take on virtualenvs but that's not relevant to this proposal).
It sounds like I may have been confusing two presentations at the packaging mini-summit, as I would have sworn conda used hashes to guarantee a consistent set of packages. I know I have mixed up features between hashdist and conda in the past (and there have been some NixOS features mixed in there as well), so it wouldn't be the first time that has happened - the downside of mining different distribution systems for ideas is that sometimes I forget where I encountered particular features :) If conda doesn't offer such an internal consistency guarantee for published package sets, then I agree with the criticism that it's just an alternative to running a private PyPI index server hosting wheel files pre-built with particular options, and thus it becomes substantially less interesting to me :( Under that model, what conda is doing is *already covered* in the draft metadata 2.0 spec (as of the changes I posted about the other day), since that now includes an "integrator suffix" (to indicate when a downstream rebuilder has patched the software), as well as a "python.integrator" metadata extension to give details of the rebuild. The namespacing in the wheel case is handled by not allowing rebuilds to be published on PyPI - they have to be published on a separate index server, and thus can be controlled based on where you tell pip to look. So, I apologise for starting the thread based on what appears to be a fundamentally false premise, although I think it has still been useful despite that error on my part (as the user confusion is real, even though my specific proposal no longer seems as useful as I first thought). 
I believe helping the conda devs to get it to play nice with virtual environments is still a worthwhile exercise though (even if just by pointing out areas where it *doesn't* currently interoperate well, as we've been doing in the last day or so), and if the conda bootstrapping issue is fixed by publishing wheels (or vendoring dependencies), then "try conda if there's no wheel" may still be a reasonable fallback recommendation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 3 December 2013 11:54, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 December 2013 21:22, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
AFAICT conda/binstar are alternatives for pip/PyPI that happen to host binaries for some packages that don't have binaries on PyPI. (conda also provides a different - incompatible - take on virtualenvs but that's not relevant to this proposal).
It sounds like I may have been confusing two presentations at the packaging mini-summit, as I would have sworn conda used hashes to guarantee a consistent set of packages. I know I have mixed up features between hashdist and conda in the past (and there have been some NixOS features mixed in there as well), so it wouldn't be the first time that has happened - the downside of mining different distribution systems for ideas is that sometimes I forget where I encountered particular features :)
I had the same confusion with hashdist at the start of this thread when I said that conda was targeted at HPC. So if we both make the same mistake I guess it's forgiveable :)
If conda doesn't offer such an internal consistency guarantee for published package sets, then I agree with the criticism that it's just an alternative to running a private PyPI index server hosting wheel files pre-built with particular options, and thus it becomes substantially less interesting to me :(
Perhaps Travis who is still CC'ed here could comment on this since it is apparent that no one here really understands what conda is and he apparently works for Continuum Analytics so should (hopefully) know a little more...
Under that model, what conda is doing is *already covered* in the draft metadata 2.0 spec (as of the changes I posted about the other day), since that now includes an "integrator suffix" (to indicate when a downstream rebuilder has patched the software), as well as a "python.integrator" metadata extension to give details of the rebuild. The namespacing in the wheel case is handled by not allowing rebuilds to be published on PyPI - they have to be published on a separate index server, and thus can be controlled based on where you tell pip to look.
Do you mean to say that PyPI can (should) only host a binary-compatible set of wheels and that other index servers should do the same? I still think that there needs to be some kind of compatibility tags either way.
So, I apologise for starting the thread based on what appears to be a fundamentally false premise, although I think it has still been useful despite that error on my part (as the user confusion is real, even though my specific proposal no longer seems as useful as I first thought).
I believe helping the conda devs to get it to play nice with virtual environments is still a worthwhile exercise though (even if just by pointing out areas where it *doesn't* currently interoperate well, as we've been doing in the last day or so), and if the conda bootstrapping issue is fixed by publishing wheels (or vendoring dependencies), then "try conda if there's no wheel" may still be a reasonable fallback recommendation.
Well for a start conda (at least according to my failed build) over-writes the virtualenv activate scripts with its own scripts that do something completely different and can't even be called with the same signature. So it looks to me as if there is no intention of virtualenv compatibility.

As for "try conda if there's no wheel": according to what I've read, that seems to be what people who currently use conda do.

I thought about another thing during the course of this thread. To what extent can Provides/Requires help out with the binary incompatibility problems? For example numpy really does provide multiple interfaces:
1) An importable Python module that can be used from Python code.
2) A C-API that can be used by compiled C-extensions.
3) BLAS/LAPACK libraries with a particular Fortran ABI, exposed to any other libraries in the same process.

Perhaps the solution is that a build of a numpy wheel should clarify explicitly what it Provides at each level e.g.:

Provides: numpy
Provides: numpy-capi-v1
Provides: numpy-openblas-g77

Then a built wheel for scipy can Require the same things. Christoph Gohlke could provide a numpy wheel with:

Provides: numpy
Provides: numpy-capi-v1
Provides: numpy-intelmkl

And his scipy wheel can require the same. This would mean that pip would understand the binary dependency problems during dependency resolution, and could reject an incompatible wheel at install time as well as being able to find a compatible wheel automatically if one exists on the server. Unlike the hash-based dependencies, we can see that it is possible to depend on the numpy C-API without necessarily depending on any particular BLAS/LAPACK library and Fortran compiler combination.

The confusing part would be that then a built wheel doesn't Provide the same thing as the corresponding sdist. How would anyone know what would be Provided by an sdist without first building it into a wheel?
Would there need to be a way for pip to tell the sdist what pip wants it to Provide when building it? Oscar
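Oscar's Provides/Requires idea above can be sketched as a toy resolver check. The interface names like "numpy-capi-v1" come from his example; the `compatible` helper and the data layout are hypothetical, not anything pip actually implements:

```python
# Toy model of install-time binary-compatibility checking: each installed
# distribution declares the interfaces it Provides, and a candidate wheel
# is accepted only if everything it Requires is provided.

installed = {
    "numpy": {"numpy", "numpy-capi-v1", "numpy-openblas-g77"},
}

def compatible(requires):
    """True if every required interface is provided by some installed
    distribution."""
    provided = set().union(*installed.values())
    return all(req in provided for req in requires)

# A scipy wheel built against the same numpy configuration installs fine...
scipy_reqs = {"numpy", "numpy-capi-v1", "numpy-openblas-g77"}

# ...while one built against an MKL-based numpy would be rejected.
mkl_scipy_reqs = {"numpy", "numpy-capi-v1", "numpy-intelmkl"}
```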
On 3 December 2013 22:49, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 3 December 2013 11:54, Nick Coghlan <ncoghlan@gmail.com> wrote:
I believe helping the conda devs to get it to play nice with virtual environments is still a worthwhile exercise though (even if just by pointing out areas where it *doesn't* currently interoperate well, as we've been doing in the last day or so), and if the conda bootstrapping issue is fixed by publishing wheels (or vendoring dependencies), then "try conda if there's no wheel" may still be a reasonable fallback recommendation.
Well for a start conda (at least according to my failed build) over-writes the virtualenv activate scripts with its own scripts that do something completely different and can't even be called with the same signature. So it looks to me as if there is no intention of virtualenv compatibility.
Historically there hadn't been much work in that direction, but I think there's been some increasing awareness of the importance of compatibility with the standard tools recently (I'm not certain, but the acceptance of PEP 453 may have had some impact there). I also consider Travis a friend, and have bent his ear over some of the compatibility issues, as well as the fact that pip has to handle additional usage scenarios that just aren't relevant to most of the scientific community, but are critical for professional application developers and system integrators :) The recent addition of "conda init" (in order to reuse a venv or virtualenv environment) was a big step in the right direction, and there's an issue filed about activate getting clobbered: https://github.com/ContinuumIO/conda/issues/374 (before conda init, you couldn't really mix conda and virtualenv, so the fact they both had activate scripts didn't matter. Now it does, since it affects the usability of conda init)
As for "try conda if there's no wheel" according to what I've read that seems to be what people who currently use conda do.
I thought about another thing during the course of this thread. To what extent can Provides/Requires help out with the binary incompatibility problems? For example numpy really does provide multiple interfaces:
1) An importable Python module that can be used from Python code.
2) A C-API that can be used by compiled C-extensions.
3) BLAS/LAPACK libraries with a particular Fortran ABI to any other libraries in the same process.
Perhaps the solution is that a build of a numpy wheel should clarify explicitly what it Provides at each level e.g.:
Provides: numpy
Provides: numpy-capi-v1
Provides: numpy-openblas-g77
Then a built wheel for scipy can Require the same things. Christoph Gohlke could provide a numpy wheel with:
Provides: numpy
Provides: numpy-capi-v1
Provides: numpy-intelmkl
Hmm, I likely wouldn't build it into the core requirement system (that all operates at the distribution level), but the latest metadata updates split out a bunch of the optional stuff to extensions (see https://bitbucket.org/pypa/pypi-metadata-formats/src/default/standard-metada...).

What we're really after at this point is the ability to *detect* conflicts if somebody tries to install incompatible builds into the same virtual environment (e.g. you installed from a custom index server originally, but later you forget and install from PyPI). So perhaps we could have a "python.expects" extension, where we can assert certain things about the metadata of other distributions in the environment.

So, say that numpy were to define a custom extension where they can define the exported binary interfaces:

"extensions": {
  "numpy.compatibility": {
    "api_version": 1,
    "fortran_abi": "openblas-g77"
  }
}

And for the Gohlke rebuilds:

"extensions": {
  "numpy.compatibility": {
    "api_version": 1,
    "fortran_abi": "intelmkl"
  }
}

Then another component might have in its metadata:

"extensions": {
  "python.expects": {
    "numpy": {
      "extensions": {
        "numpy.compatibility": {
          "fortran_abi": "openblas-g77"
        }
      }
    }
  }
}

The above would be read as: "this distribution expects the numpy distribution in this environment to publish the 'numpy.compatibility' extension in its metadata, with the 'fortran_abi' field set to 'openblas-g77'". If you attempted to install that component into an environment with the intelmkl Fortran ABI declared, it would fail, since the expectation wouldn't match the reality.
And his scipy wheel can require the same. This would mean that pip would understand the binary dependency problem during dependency resolution: it could reject an incompatible wheel at install time, and it could automatically find a compatible wheel if one exists on the server. Unlike hash-based dependencies, this makes it possible to depend on the numpy C-API without necessarily depending on any particular BLAS/LAPACK library and Fortran compiler combination.
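The "find a compatible wheel automatically" step could be sketched as a filter over candidate builds, each declaring the tags it Provides. The tag names follow the examples above; the data structures and the `compatible_wheels` function are illustrative, not pip's actual API:

```python
def compatible_wheels(candidates, required_tags):
    """Return the candidate builds whose provided tags satisfy every
    requirement, e.g. the numpy C-API plus a specific Fortran ABI."""
    return [c for c in candidates
            if set(required_tags) <= set(c["provides"])]

# Hypothetical index entries for two numpy builds.
candidates = [
    {"file": "numpy-openblas.whl",
     "provides": ["numpy", "numpy-capi-v1", "numpy-openblas-g77"]},
    {"file": "numpy-mkl.whl",
     "provides": ["numpy", "numpy-capi-v1", "numpy-intelmkl"]},
]

# A wheel that only needs the C-API matches both builds; one that also
# pins the Fortran ABI matches only the openblas build.
print([c["file"] for c in compatible_wheels(candidates, ["numpy", "numpy-capi-v1"])])
print([c["file"] for c in compatible_wheels(candidates, ["numpy-capi-v1", "numpy-openblas-g77"])])
```

This illustrates the point in the paragraph above: depending only on numpy-capi-v1 leaves the BLAS/LAPACK choice open, while adding an ABI tag narrows the selection.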
I like the general idea of being able to detect conflicts through the published metadata, but would like to use the extension mechanism to avoid name conflicts.
The confusing part would be that then a built wheel doesn't Provide the same thing as the corresponding sdist. How would anyone know what would be Provided by an sdist without first building it into a wheel? Would there need to be a way for pip to tell the sdist what pip wants it to Provide when building it?
I think that's a separate (harder) problem, but one that the expectation approach potentially solves, since we'd just have to provide a list of expectations for a distribution to the build process (and individual distributions would have full control over defining what expectations will influence the build process, most likely through custom extensions). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 3 December 2013 13:53, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 December 2013 22:49, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Hmm, I likely wouldn't build it into the core requirement system (that all operates at the distribution level), but the latest metadata updates split out a bunch of the optional stuff to extensions (see https://bitbucket.org/pypa/pypi-metadata-formats/src/default/standard-metada...). What we're really after at this point is the ability to *detect* conflicts if somebody tries to install incompatible builds into the same virtual environment (e.g. you installed from custom index server originally, but later you forget and install from PyPI).
So perhaps we could have a "python.expects" extension, where we can assert certain things about the metadata of other distributions in the environment. So, say that numpy were to define a custom extension where they can define the exported binary interfaces:
"extensions": { "numpy.compatibility": { "api_version": 1, "fortran_abi": "openblas-g77" } } [snip]
I like the general idea of being able to detect conflicts through the published metadata, but would like to use the extension mechanism to avoid name conflicts.
Helping to prevent broken installs in this way would definitely be an improvement. It would be a real shame though if PyPI contained all the metadata needed to match up compatible binary wheels but pip only used it to show error messages rather than to actually locate the wheel that the user wants. Oscar
On 4 Dec 2013 01:31, "Oscar Benjamin" <oscar.j.benjamin@gmail.com> wrote:
On 3 December 2013 13:53, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 December 2013 22:49, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Hmm, I likely wouldn't build it into the core requirement system (that all operates at the distribution level), but the latest metadata updates split out a bunch of the optional stuff to extensions (see https://bitbucket.org/pypa/pypi-metadata-formats/src/default/standard-metada...).
What we're really after at this point is the ability to *detect* conflicts if somebody tries to install incompatible builds into the same virtual environment (e.g. you installed from custom index server originally, but later you forget and install from PyPI).
So perhaps we could have a "python.expects" extension, where we can assert certain things about the metadata of other distributions in the environment. So, say that numpy were to define a custom extension where they can define the exported binary interfaces:
"extensions": { "numpy.compatibility": { "api_version": 1, "fortran_abi": "openblas-g77" } } [snip]
I like the general idea of being able to detect conflicts through the published metadata, but would like to use the extension mechanism to avoid name conflicts.
Helping to prevent broken installs in this way would definitely be an improvement. It would be a real shame though if PyPI contained all the metadata needed to match up compatible binary wheels but pip only used it to show error messages rather than to actually locate the wheel that the user wants.
One of the side benefits of the TUF end-to-end security proposal is that it allows for more reliable local metadata caching. That means that even if a mechanism like this was initially only used for conflict detection, it could eventually also be used to choose between multiple possible resolutions of a dependency. Tangentially related, I'm considering adding an additional recommended URL field to the "python.integrator" extension, which would be a pointer to a custom index server operated by the integrator. Cheers, Nick.
Oscar
If conda doesn't offer such an internal consistency guarantee for published package sets, then I agree with the criticism that it's just an alternative to running a private PyPI index server hosting wheel files pre-built with particular options, and thus it becomes substantially less interesting to me :(
Well, except that the anaconda index covers non-Python projects like "qt", which a private wheel index wouldn't cover (at least with the normal intended use of wheels).
On 4 Dec 2013 05:54, "Marcus Smith" <qwcode@gmail.com> wrote:
If conda doesn't offer such an internal consistency guarantee for published package sets, then I agree with the criticism that it's just an alternative to running a private PyPI index server hosting wheel files pre-built with particular options, and thus it becomes substantially less interesting to me :(
well, except that the anaconda index covers non-python projects like "qt", which a private wheel index wouldn't cover (at least with the normal intended use of wheels)

Ah, true - there's still the non-trivial matter of getting hold of the external dependencies *themselves*.

Anyway, this thread has at least satisfied me that we don't need to rush anything at this point - we can see how the conda folks go handling the interoperability issues, come up with an overview of the situation for creating and publishing binary extensions, keep working on getting the Python 3.4 + pip 1.5 combination out the door, and then decide later exactly how we think conda fits into the overall picture, as well as what influence the problems it solves for the scientific stack should have on the metadata 2.0 design. Cheers, Nick.
participants (14)
- Chris Barker
- Chris Barker - NOAA Federal
- Daniel Holth
- David Cournapeau
- Donald Stufft
- Marcus Smith
- Nick Coghlan
- Oscar Benjamin
- Pachi
- Paul Moore
- Ralf Gommers
- Thomas Heller
- Tres Seaver
- Vinay Sajip