[Numpy-discussion] NumPy-Discussion Digest, Vol 100, Issue 28

Lars Buitinck larsmans at gmail.com
Fri Jan 16 09:14:24 EST 2015


2015-01-16 13:29 GMT+01:00  <numpy-discussion-request at scipy.org>:
> Date: Fri, 16 Jan 2015 12:43:43 +0100
> From: Julian Taylor <jtaylor.debian at googlemail.com>
> Subject: Re: [Numpy-discussion] Sorting refactor
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
>
> On 16.01.2015 12:33, Lars Buitinck wrote:
>> * Quicksort has a quadratic time worst case. I think it should be
>> turned into an introsort [1] for O(n log n) worst case; we have the
>> heapsort needed to do that.
>
> This probably rarely happens in numeric data, and we do have guaranteed
> n log n runtime algorithms available.

It's no more likely or unlikely than in any other type of data
(AFAIK), but it does make it possible for an adversary to DoS any (web
server) code that uses np.sort.

> I was also thinking about this; an advantage of a sorting network is
> that it can be vectorized to be significantly faster than an insertion
> sort. Handling NaNs should also be directly possible.

I tried that, and it didn't give any speedup, at least not without
explicit vector instructions.

Just moving the NaNs aside didn't cost anything in my preliminary
benchmarks (without sorting nets); the cost of that extra pass was
almost exactly compensated by the simpler comparisons afterwards.
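
For concreteness, a rough sketch of that pre-pass (not the actual
benchmarked code): partition the NaNs to the back of the array, then
sort the NaN-free prefix with plain comparisons. The NaNs end up last,
which is where np.sort puts them anyway.

#include <math.h>
#include <stddef.h>

/* Move every NaN to the tail of a[0..n-1] and return the length of the
 * NaN-free prefix, which can then be sorted NaN-obliviously. */
static size_t
move_nans_to_end(double *a, size_t n)
{
    size_t i = 0, last = n;
    while (i < last) {
        if (isnan(a[i])) {
            double tmp = a[i];
            a[i] = a[--last];    /* swap the NaN into the tail */
            a[last] = tmp;
        }
        else {
            i++;
        }
    }
    return last;
}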

> The issue is that it's probably a lot of complicated code for only a very
> small gain.

Maybe. The thing is that the selection algorithms are optimized for
NaNs and seem to depend on the current comparison code. We'd need
distinct <TYPE>_LT and <TYPE>_LT_NONAN for each <TYPE>.
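
Roughly, for doubles, it would be something like the following
(DOUBLE_LT models the current NaN-aware comparison that sorts NaNs to
the end; DOUBLE_LT_NONAN is the new, hypothetical variant):

/* NaN-aware: a < b, or b is NaN and a isn't, so NaNs sort last. */
#define DOUBLE_LT(a, b)        ((a) < (b) || ((b) != (b) && (a) == (a)))
/* Only safe after the NaNs have been moved out of the way. */
#define DOUBLE_LT_NONAN(a, b)  ((a) < (b))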

The sorting nets themselves aren't complicated, just lengthy. My
branch has the length-optimal (not optimally parallel) ones for n <=
16.
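
For example, the n = 4 network is just five compare-exchange steps
(a sketch with plain doubles and a plain <; the real thing would use
the NaN-aware comparison and be generated per element type):

#define CEX(a, i, j)                        \
    do {                                    \
        if ((a)[j] < (a)[i]) {              \
            double t = (a)[i];              \
            (a)[i] = (a)[j];                \
            (a)[j] = t;                     \
        }                                   \
    } while (0)

/* Length-optimal sorting network for n = 4: five comparators. */
static void
sortnet4(double *a)
{
    CEX(a, 0, 1); CEX(a, 2, 3);
    CEX(a, 0, 2); CEX(a, 1, 3);
    CEX(a, 1, 2);
}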

> But I assume you will get the same trouble as with OpenMP, though that
> needs testing. Also, cilk+ in gcc is not really production-ready yet; I
> got lots of crashes when I last tried it (it will be in 5.0 though).

The data-parallel constructs tend to crash the compiler, but task
spawning seems to be stable in 4.9.2. I have yet to see how it handles
multiprocessing/fork.

What do you mean by "it will be in 5.0"? Did they do a big push?


> Date: Fri, 16 Jan 2015 13:28:56 +0100
> From: Daπid <davidmenhur at gmail.com>
> Subject: Re: [Numpy-discussion] Sorting refactor
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
>
> On 16 January 2015 at 13:15, Eelco Hoogendoorn
> <hoogendoorn.eelco at gmail.com> wrote:
>> Perhaps an algorithm can be made faster, but often these multicore
>> algorithms are also less efficient, and a less data-dependent way of putting
>> my cores to good use would have been preferable. Potentially, other code
>> could slow down due to cache thrashing if too many tasks run in
>> parallel. I'd rather be in charge of such matters myself, but I imagine
>> adding a keyword arg for these matters would not be much trouble?
>
> As I understand it, that is where the strength of Cilk+ lies. It does
> not force parallelisation, it just suggests it. Whether to actually
> spawn in parallel is decided at runtime, depending on the load of the
> other cores.

cilk+ guarantees that the amount of space used by a pool of P threads
is at most P times the stack space used by the sequential version (+ a
constant). The idea is that you can say

for (int i = 0; i < 1000000; i++) {
    cilk_spawn f(a[i]);
}

and it will never create more than P work items in memory, rather than
1e6, even if each f() spawns a bunch itself. Of course, it can't
guarantee that OpenMP won't also spawn P threads, and it can't check
that you're one of P processes cooperating on a task using
multiprocessing.

Personally I'd rather have an np.setnumthreads() to turn this on or
off for a process and have the runtime distribute work for me instead
of having to do it myself.
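
If I remember the runtime API correctly, the per-process knob already
exists on the Cilk side; an np.setnumthreads() would presumably just
forward to something like this (or to the CILK_NWORKERS environment
variable):

#include <cilk/cilk_api.h>

static void
limit_cilk_workers(void)
{
    /* Must run before the runtime starts, i.e. before the first
     * cilk_spawn; "4" is just an example value. */
    __cilkrts_set_param("nworkers", "4");
}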


