On 5 Dec 2013 07:29, "Ralf Gommers" <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 11:41 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
On 4 December 2013 07:40, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald@stufft.io> wrote:
I’d love to get Wheels to the point they are more suitable than they are for SciPy stuff,
That would indeed be a good step forward. I'm interested to try to help get to that point for Numpy and Scipy.
Thanks Ralf. Please let me know what you think of the following.
I’m not sure what the diff between the current state and what they need to be is, but if someone spells it out (I’ve only just skimmed your last email so perhaps it’s contained in that!) I’ll do the arguing for it. I just need someone who actually knows what’s needed to advise me :)
To start with, the SSE stuff. Numpy and scipy are distributed as "superpack" installers for Windows containing three full builds: no SSE, SSE2 and SSE3. Plus a script that runs at install time to check which version to use. These are built with ``paver bdist_superpack``, see https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and CPU selector scripts are under tools/win32build/.
How do I package those three builds into wheels and get the right one installed by ``pip install numpy``?
This was discussed previously on this list: https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
Thanks, I'll go read that.
Essentially the current wheel format and specification does not provide a way to do this directly. There are several different possible approaches.
One possibility is that the wheel spec can be updated to include a post-install script (I believe this will happen eventually - someone correct me if I'm wrong). Then the numpy for Windows wheel can just do the same as the superpack installer: ship all variants, then delete/rename in a post-install script so that the correct variant is in place after install.
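A rough sketch of what such a post-install step might look like, assuming a hypothetical layout in which the wheel ships one directory per build under `_variants/` (`nosse`, `sse2`, `sse3`) and the set of supported CPU features is already known. None of these names come from the wheel spec or from numpy; they are illustrative only.

```python
import os
import shutil

# Hypothetical variant names, most optimised first.
PREFERENCE = ["sse3", "sse2", "nosse"]


def choose_variant(cpu_features, available):
    """Return the most optimised shipped variant that the CPU supports."""
    for name in PREFERENCE:
        if name in available and (name == "nosse" or name in cpu_features):
            return name
    raise RuntimeError("no usable variant shipped")


def activate_variant(package_dir, cpu_features):
    """Rename the chosen variant into place and delete the others."""
    variants_dir = os.path.join(package_dir, "_variants")
    available = set(os.listdir(variants_dir))
    chosen = choose_variant(cpu_features, available)
    # Move the chosen build to its final (hypothetical) location "core".
    os.rename(os.path.join(variants_dir, chosen),
              os.path.join(package_dir, "core"))
    # Remove the variants that were not selected.
    shutil.rmtree(variants_dir)
    return chosen
```

This mirrors the superpack approach of shipping every build and deciding at install time; the cost is that every download carries all the variants.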
Another possibility is that the pip/wheel/PyPI/metadata system can be changed to allow a "variant" field for wheels/sdists. This was also suggested in the same thread by Nick Coghlan: https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
The variant field could be used to upload multiple variants, e.g.
    numpy-1.7.1-cp27-cp22m-win32.whl
    numpy-1.7.1-cp27-cp22m-win32-sse.whl
    numpy-1.7.1-cp27-cp22m-win32-sse2.whl
    numpy-1.7.1-cp27-cp22m-win32-sse3.whl
Then if the user requests 'numpy:sse3' they will get the wheel with sse3 support.
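To make the variant idea concrete, here is a minimal sketch of how an installer might pick among such files. It assumes the filename layout from the example above, name-version-python-abi-platform with an optional trailing variant. That trailing field is not part of the current wheel spec, the sketch ignores the spec's optional build tag, and both function names are hypothetical.

```python
def parse_wheel_name(filename):
    """Split a wheel filename into its tags plus a hypothetical variant."""
    stem = filename[:-len(".whl")]
    parts = stem.split("-")
    name, version, python, abi, platform = parts[:5]
    # Anything after the platform tag is treated as the proposed variant.
    variant = parts[5] if len(parts) > 5 else None
    return {
        "name": name,
        "version": version,
        "python": python,
        "abi": abi,
        "platform": platform,
        "variant": variant,
    }


def pick_wheel(filenames, requested_variant=None):
    """Return the first wheel whose variant matches the request, else None."""
    for filename in filenames:
        if parse_wheel_name(filename)["variant"] == requested_variant:
            return filename
    return None
```

With no variant requested, `pick_wheel` returns the plain wheel, which corresponds to the default `pip install numpy` case.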
Of course how would the user know if their CPU supports SSE3? I know roughly what SSE is but I don't know what level of SSE is available on each of the machines I use. There is a Python script/module in numexpr that can detect this: https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py
When I run that script on this machine I get:
    $ python cpuinfo.py
    CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2 is_32bit is_Core2 is_Intel is_i686
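As an illustration, output like that could be reduced to a single SSE level with a few lines of Python. This is only a sketch: it assumes the report lists supported features as `has_<feature>` tokens, as in the output above, and `best_sse_level` is a hypothetical helper, not something cpuinfo.py provides.

```python
def best_sse_level(report):
    """Return the highest SSE level named in a cpuinfo report, or None."""
    tokens = set(report.split())
    # Check from most to least capable so the best level wins.
    for level in ("sse3", "sse2", "sse"):
        if "has_" + level in tokens:
            return level
    return None
```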
So perhaps someone could break that script out of numexpr and release it as a separate package on PyPI.
That's similar to what numpy has - actually it's a copy from numpy.distutils.cpuinfo
Then the instructions for installing numpy could be something like:
"""
You can install numpy with
    $ pip install numpy
which will download the default version without any CPU-specific optimisations.
If you know what level of SSE support your CPU has then you can download a more optimised numpy with either of:
    $ pip install numpy:sse2
    $ pip install numpy:sse3
To determine whether or not your CPU has SSE2 or SSE3 or no SSE support you can install and run the cpuinfo script. For example on this machine:
    $ pip install cpuinfo
    $ python -m cpuinfo --sse
    This CPU supports the SSE3 instruction set.
That means we can install numpy:sse3.
"""
The problem with all of the above is indeed that it's not quite automatic. You don't want your user to have to know or care about what SSE is. Nor do you want to create a new package just to hack around a pip limitation. I like the post-install (or pre-install) option much better.
Of course it would be a shame to have a solution that is so close to automatic without quite being automatic. Also the problem is that having no SSE support in the default numpy means that lots of people would lose out on optimisations. For example if numpy is installed as a dependency of something else then the user would always end up with the unoptimised no-SSE binary.
Another possibility is that numpy could depend on the cpuinfo package so that it gets installed automatically before numpy. Then if the cpuinfo package has a traditional setup.py sdist (not a wheel) it could detect the CPU information at install time and store that in its package metadata. Then pip would be aware of this metadata and could use it to determine which wheel is appropriate.
I don't quite know if this would work but perhaps the cpuinfo could announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel could "Requires" cpuinfo:sse2 or something along these lines. Or perhaps this is better handled by the metadata extensions Nick suggested earlier in this thread.
I think it would be good to work out a way of doing this with e.g. a cpuinfo package. Many other packages beyond numpy could make good use of that metadata if it were available. Similarly having an extensible mechanism for selecting wheels based on additional information about the user's system could be used for many more things than just CPU architectures.
I agree extensibility is quite important. Whatever scheme you'd think of with pre-defined tags will fail the next time anyone has a new idea (random example: what if we start shipping parallel sets of binaries that only differ in whether they're linked against ATLAS, OpenBLAS or MKL).
Hmm, rather than adding complexity most folks don't need directly to the base wheel spec, here's a possible "multiwheel" notion - embed multiple wheels with different names inside the multiwheel, along with a self-contained selector function for choosing which ones to actually install on the current system. This could be used not only for the NumPy use case, but also allow the distribution of external dependencies while allowing their installation to be skipped if they're already present on the target system.
Cheers,
Nick.
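The multiwheel notion could be sketched roughly as follows. Everything here is an assumption made for illustration: the manifest layout (`name`, `wheel`, `needs`, `skip_if_installed`), the `select_wheels` function and the `is_installed` callback are hypothetical, and no such format exists in the wheel spec.

```python
def select_wheels(embedded, cpu_features, is_installed):
    """Choose which embedded wheels to install on the current system.

    embedded: list of dicts, ordered most optimised first, each with
        "name"  - distribution name,
        "wheel" - embedded wheel filename,
        "needs" - set of CPU features the build requires,
        "skip_if_installed" - optional flag for external dependencies.
    is_installed: callable reporting whether a distribution is present.
    """
    chosen = []
    handled = set()
    for entry in embedded:
        name = entry["name"]
        if name in handled:
            continue  # a better build of this distribution was already picked
        if entry.get("skip_if_installed") and is_installed(name):
            handled.add(name)  # dependency already on the target system
            continue
        if entry["needs"] <= cpu_features:
            chosen.append(entry["wheel"])
            handled.add(name)
    return chosen
```

Keeping the selector as a plain function over a manifest means new selection criteria (for example, which BLAS library a build links against) would only change the manifest and selector, not the base wheel format.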
Ralf
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig