[Distutils] Handling the binary dependency management problem

Wed Dec 4 13:10:15 CET 2013

On 4 December 2013 20:41, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> On 4 December 2013 07:40, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>> On Wed, Dec 4, 2013 at 1:54 AM, Donald Stufft <donald at stufft.io> wrote:
>>>
>>> I’d love to get Wheels to the point they are more suitable than they are
>>> for SciPy stuff,
>>
>> That would indeed be a good step forward. I'm interested to try to help get
>> to that point for Numpy and Scipy.
>
> Thanks Ralf. Please let me know what you think of the following.
>
>>> I’m not sure what the diff between the current state and what
>>> they need to be is but if someone spells it out (I’ve only just skimmed
>>> your last email so perhaps it’s contained in that!) I’ll do the arguing
>>> for it. I just need someone who actually knows what’s needed to advise
>>> me :)
>>
>> To start with, the SSE stuff. Numpy and scipy are distributed as "superpack"
>> installers for Windows containing three full builds: no SSE, SSE2 and SSE3.
>> Plus a script that runs at install time to check which version to use. These
>> are built with ``paver bdist_superpack``, see
>> https://github.com/numpy/numpy/blob/master/pavement.py#L224. The NSIS and
>> CPU selector scripts are under tools/win32build/.
>>
>> How do I package those three builds into wheels and get the right one
>> installed by ``pip install numpy``?
>
> This was discussed previously on this list:
> https://mail.python.org/pipermail/distutils-sig/2013-August/022362.html
>
> Essentially the current wheel format and specification does not
> provide a way to do this directly. There are several different
> possible approaches.
>
> One possibility is that the wheel spec can be updated to include a
> post-install script (I believe this will happen eventually - someone
> correct me if I'm wrong). Then the numpy for Windows wheel can just do
> the same as the superpack installer: ship all variants, then
> delete/rename in a post-install script so that the correct variant is
> in place after install.

Yes, export hooks in metadata 2.0 would support this approach.
However, export hooks require allowing just-downloaded code to run
with elevated privileges, so we're trying to minimise the number of
cases where they're needed.

> Another possibility is that the pip/wheel/PyPI/metadata system can be
> changed to allow a "variant" field for wheels/sdists. This was also
> suggested in the same thread by Nick Coghlan:
> https://mail.python.org/pipermail/distutils-sig/2013-August/022432.html
>
> The variant field could be used to upload multiple variants e.g.
> numpy-1.7.1-cp27-cp27m-win32.whl
> numpy-1.7.1-cp27-cp27m-win32-sse.whl
> numpy-1.7.1-cp27-cp27m-win32-sse2.whl
> numpy-1.7.1-cp27-cp27m-win32-sse3.whl
> then if the user requests 'numpy:sse3' they will get the wheel with
> sse3 support.

That was what I was originally thinking for the variant field, but I
later realised it makes more sense to treat the "variant" marker as
part of the *platform* tag, rather than being an independent tag in
its own right: https://bitbucket.org/pypa/pypi-metadata-formats/issue/15/enhance-the-platform-tag-definition-for

Under that approach, pip would figure out all the variants that
applied to the current system (with some default preference order
between variants for platforms where one system may support multiple
variants). Using the Linux distro variants (based on ID and VERSION_ID
in /etc/os-release) as an example rather than the Windows SSE
variants, this might look like:

  cp33-cp33m-linux_x86_64_fedora_19
  cp33-cp33m-linux_x86_64_fedora
  cp33-cp33m-linux_x86_64

The Windows SSE variants might look like:

  cp33-cp33m-win32_sse3
  cp33-cp33m-win32_sse2
  cp33-cp33m-win32_sse
  cp33-cp33m-win32
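As a rough sketch of how such a preference-ordered list could be built
(the function name and the tag layout are purely illustrative; nothing
like this exists in pip today), the installer would append each
applicable variant to the base platform tag and finish with the plain
tag as the fallback:

```python
def candidate_platform_tags(base, variants):
    """Return platform tags ordered from most to least specific.

    `base` is the plain platform tag (e.g. "win32") and `variants` is a
    list of variant suffixes that apply to this machine, most preferred
    first. Both the suffix scheme and this helper are hypothetical.
    """
    tags = ["%s_%s" % (base, variant) for variant in variants]
    tags.append(base)  # the unqualified tag is always the last resort
    return tags


print(candidate_platform_tags("linux_x86_64", ["fedora_19", "fedora"]))
print(candidate_platform_tags("win32", ["sse3", "sse2", "sse"]))
```

The two calls reproduce the Linux and Windows lists above, minus the
cp33-cp33m prefix.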

> Of course how would the user know if their CPU supports SSE3? I know
> roughly what SSE is but I don't know what level of SSE is available on
> each of the machines I use.

Asking this question is how I realised the variant tag should probably
be part of the platform field and handled automatically by pip rather
than users needing to request it explicitly. However, it's not without
its problems (more on that below).

> There is a Python script/module in
> numexpr that can detect this:
> https://github.com/eleddy/numexpr/blob/master/numexpr/cpuinfo.py
>
> When I run that script on this machine I get:
> $ python cpuinfo.py
> CPU information: CPUInfoBase__get_nbits=32 getNCPUs=2 has_mmx has_sse2
> is_32bit is_Core2 is_Intel is_i686
>
> So perhaps someone could break that script out of numexpr and release
> it as a separate package on PyPI. Then the instructions for installing
> numpy could be something like
> """
> You can install numpy with
>
>     $ pip install numpy
>
> which will download the default version without any CPU-specific optimisations.
>
> If you know what level of SSE support your CPU has then you can
> download a more optimised numpy with either of:
>
>     $ pip install numpy:sse2
>     $ pip install numpy:sse3
>
> To determine whether or not your CPU has SSE2 or SSE3 or no SSE
> support you can install and run the cpuinfo script. For example on
> this machine:
>
>     $ pip install cpuinfo
>     $ python -m cpuinfo --sse
>     This CPU supports the SSE3 instruction set.
>
> That means we can install numpy:sse3.
> """
>
> Of course it would be a shame to have a solution that is so close to
> automatic without quite being automatic. Also the problem is that
> having no SSE support in the default numpy means that lots of people
> would lose out on optimisations. For example if numpy is installed as
> a dependency of something else then the user would always end up with
> the unoptimised no-SSE binary.

The other question I asked that made me realise the SSE information
should be an optional part of the platform tag :)
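
To illustrate what "handled automatically" could mean, here is a minimal
sketch that maps a set of CPU feature flags to the SSE variants a
machine supports. It assumes the flags have already been collected
somehow (for example from the "flags" line of /proc/cpuinfo on Linux,
where SSE3 is reported as "pni"); the function name is hypothetical and
the detection itself is left out:

```python
def sse_variants(cpu_flags):
    """Return the supported SSE variants, most capable first.

    `cpu_flags` is any iterable of CPU feature flag names. The mapping
    below is an assumption for illustration: Linux reports SSE3 as
    "pni", while other tools may call it "sse3".
    """
    flags = set(cpu_flags)
    levels = [
        ("sse3", {"sse3", "pni"}),
        ("sse2", {"sse2"}),
        ("sse", {"sse"}),
    ]
    # Keep every level for which at least one known alias is present.
    return [name for name, aliases in levels if flags & aliases]


print(sse_variants("fpu mmx sse sse2 pni".split()))
print(sse_variants(["fpu", "mmx"]))
```

An empty result would simply mean the installer falls back to the
plain, unqualified platform tag.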

> Another possibility is that numpy could depend on the cpuinfo package
> so that it gets installed automatically before numpy. Then if the
> cpuinfo package has a traditional setup.py sdist (not a wheel) it
> could detect the CPU information at install time and store that in its
> package metadata. Then pip would be aware of this metadata and could
> use it to determine which wheel is appropriate.
>
> I don't quite know if this would work but perhaps the cpuinfo could
> announce that it "Provides" e.g. cpuinfo:sse2. Then a numpy wheel
> could "Requires" cpuinfo:sse2 or something along these lines. Or
> perhaps this is better handled by the metadata extensions Nick
> suggested earlier in this thread.
>
> I think it would be good to work out a way of doing this with e.g. a
> cpuinfo package. Many other packages beyond numpy could make good use
> of that metadata if it were available. Similarly having an extensible
> mechanism for selecting wheels based on additional information about
> the user's system could be used for many more things than just CPU
> architectures.

Yes, the lack of extensibility is the one concern I have with baking
the CPU SSE info into the platform tag. On the other hand, the CPU
architecture info is already in there, so appending the vectorisation
support isn't an obviously bad idea, is orthogonal to the
"python.expects" consistency enforcement metadata and would cover the
NumPy use case, which is the one we really care about at this point.
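
For what it's worth, the selection step itself would be simple. The
sketch below is only an illustration under the assumptions above (the
extended platform tags are a proposal, and pick_wheel is a made-up
helper, not pip's actual logic): given the wheel filenames available
for a release and the installer's tag list in preference order, pick
the file whose python-abi-platform triple ranks best.

```python
def pick_wheel(filenames, supported_tags):
    """Return the best matching wheel filename, or None if none match.

    `supported_tags` lists "python-abi-platform" triples, most preferred
    first. The triple is taken from the last three dash-separated fields
    of the wheel filename.
    """
    rank = {tag: index for index, tag in enumerate(supported_tags)}
    best = None  # (rank, filename) of the best candidate seen so far
    for filename in filenames:
        if not filename.endswith(".whl"):
            continue
        parts = filename[:-len(".whl")].split("-")
        if len(parts) < 5:
            continue  # not name-version-python-abi-platform
        tag = "-".join(parts[-3:])
        if tag in rank and (best is None or rank[tag] < best[0]):
            best = (rank[tag], filename)
    return best[1] if best else None


available = [
    "numpy-1.7.1-cp33-cp33m-win32.whl",
    "numpy-1.7.1-cp33-cp33m-win32_sse2.whl",
    "numpy-1.7.1-cp33-cp33m-win32_sse3.whl",
]
# A machine with SSE2 but not SSE3 never considers the sse3 wheel.
supported = [
    "cp33-cp33m-win32_sse2",
    "cp33-cp33m-win32_sse",
    "cp33-cp33m-win32",
]
print(pick_wheel(available, supported))
```

An installer that knows nothing about the variants would only list
cp33-cp33m-win32 and so still get the unoptimised wheel.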

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia