[Numpy-discussion] Intel random number package

Nathaniel Smith njs at pobox.com
Wed Oct 26 15:24:48 EDT 2016


On Wed, Oct 26, 2016 at 9:10 AM, Julian Taylor
<jtaylor.debian at googlemail.com> wrote:
> On 10/26/2016 06:00 PM, Julian Taylor wrote:
>>
>> On 10/26/2016 10:59 AM, Ralf Gommers wrote:
>>>
>>>
>>>
>>> On Wed, Oct 26, 2016 at 8:33 PM, Julian Taylor
>>> <jtaylor.debian at googlemail.com <mailto:jtaylor.debian at googlemail.com>>
>>> wrote:
>>>
>>>     On 26.10.2016 06:34, Charles R Harris wrote:
>>>     > Hi All,
>>>     >
>>>     > There is a proposed random number package PR now up on github:
>>>     > https://github.com/numpy/numpy/pull/8209
>>>     <https://github.com/numpy/numpy/pull/8209>. It is from
>>>     > oleksandr-pavlyk <https://github.com/oleksandr-pavlyk
>>>     <https://github.com/oleksandr-pavlyk>> and implements
>>>     > the number random number package using MKL for increased speed.
>>> I think
>>>     > we are definitely interested in the improved speed, but I'm not
>>> sure
>>>     > numpy is the best place to put the package. I'd welcome any
>>> comments on
>>>     > the PR itself, as well as any thoughts on the best way organize
>>> or use
>>>     > of this work. Maybe scikit-random
>>>
>>>
>>> Note that this thread is a continuation of
>>> https://mail.scipy.org/pipermail/numpy-discussion/2016-July/075822.html
>>>
>>>
>>>
>>>     I'm not a fan of putting code depending on a proprietary library
>>>     into numpy.
>>>     This should be a standalone package which may provide the same
>>> interface
>>>     as numpy.
>>>
>>>
>>> I don't really see a problem with that in principle. Numpy can use Intel
>>> MKL (and Accelerate) as well if it's available. It needs some thought
>>> put into the API though - a ``numpy.random_intel`` module is certainly
>>> not what we want.
>>>
>>
>> For me there is a difference between being able to optionally use a
>> proprietary library as an alternative to free software libraries if the
>> user wishes to do so and offering functionality that only works with
>> non-free software.
>> We are providing a form of advertisement for them by allowing it (hey if
>> you buy this black box that you cannot modify or use freely you get this
>> neat numpy feature!).
>>
>> I prefer for the full functionality of numpy to stay available with a
>> stack of community owned software, even if it may be less powerful that
>> way.
>
> But then if this is really just the same random numbers numpy already
> provides just faster, it is probably acceptable in principle. I haven't
> actually looked at the PR yet.

The RNG stream is totally different, so yeah, it can't just be a
silent drop-in replacement like BLAS/LAPACK.

The patch also adds ~10,000 lines of code; here's an example of what
some of it looks like:

    https://github.com/oleksandr-pavlyk/numpy/blob/b53880432c19356f4e54b520958272516bf391a2/numpy/random_intel/mklrand/mkl_distributions.cpp#L1724-L1833

I don't see how we can realistically commit to maintaining this.

I'm also not really seeing how shipping it as part of numpy provides
extra benefits to maintainers or users? AFAICT right now it's
basically structured as a standalone library that's been dropped into
the numpy source tree, and it would be just as easy to ship separately
(or am I wrong?). And since the public API is that all the
functionality comes from importing this specific new module
('numpy.random_intel'), it'd be a one-line change for users to import
from a non-numpy namespace, like 'mkl.random' or whatever. If it were
more integrated with the rest of numpy then the trade-offs would be
more complicated, but in its present form this seems like an easy
call.

The other question is whether it could/should change to *become* more
integrated... that's more tricky. There's been some work towards
supporting swappable backends inside np.random; but the focus has
mostly been on allowing new core generators, though, and this code
seems to want to take over the whole thing (core generator +
distributions), so even once the swappable backends stuff is working
I'm not sure it would be relevant here. The one case I can think of
that does seem promising is that if we get an API for users to say "I
don't care about stream compatibility, just give me un-reproducible
variates as fast as you can", then it might make sense for that to
silently use MKL if available -- this would be pretty analogous to the
use of MKL in np.linalg. But we don't have that API yet, I'm not sure
how the MKL fallback could be maintainably implemented given that it
would require somehow swapping the entire RandomState implementation,
and it's entirely possible that once we figure out solutions to those
then it'd still make sense for the actual MKL wrappers to live in a
third-party library that numpy imports.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org



More information about the NumPy-Discussion mailing list