[Numpy-discussion] argsort speed

Sat Feb 22 00:35:50 EST 2014

On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
>
> On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
>>
>> On 17.02.2014 15:18, Francesc Alted wrote:
>> > On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote:
>> >> On Sun, Feb 16, 2014 at 6:12 PM, Daπid <davidmenhur at gmail.com> wrote:
>> >>> On 16 February 2014 23:43, <josef.pktd at gmail.com> wrote:
>> >>>> What's the fastest argsort for a 1d array with around 28 Million
>> >>>> elements, roughly uniformly distributed, random order?
>> >>>
>> >>> On numpy latest version:
>> >>>
>> >>> for kind in ['quicksort', 'mergesort', 'heapsort']:
>> >>>      print kind
>> >>>      %timeit np.sort(data, kind=kind)
>> >>>      %timeit np.argsort(data, kind=kind)
>> >>>
>> >>>
>> >>> quicksort
>> >>> 1 loops, best of 3: 3.55 s per loop
>> >>> 1 loops, best of 3: 10.3 s per loop
>> >>> mergesort
>> >>> 1 loops, best of 3: 4.84 s per loop
>> >>> 1 loops, best of 3: 9.49 s per loop
>> >>> heapsort
>> >>> 1 loops, best of 3: 12.1 s per loop
>> >>> 1 loops, best of 3: 39.3 s per loop
>> >>>
>> >>>
>> >>> It looks quicksort is quicker sorting, but mergesort is marginally
>> >>> faster
>> >>> sorting args. The diference is slim, but upon repetition, it remains
>> >>> significant.
>> >>>
>> >>> Why is that? Probably part of the reason is what Eelco said, and part
>> >>> is
>> >>> that in sort comparison are done accessing the array elements
>> >>> directly, but
>> >>> in argsort you have to index the array, introducing some overhead.
>> >> Thanks, both.
>> >>
>> >> I also gain a second with mergesort.
>> >>
>> >> matlab would be nicer in my case, it returns both.
>> >> I still need to use the argsort to index into the array to also get
>> >> the sorted array.
>> >
>> > Many years ago I needed something similar, so I made some functions for
>> > sorting and argsorting in one single shot.  Maybe you want to reuse
>> > them.  Here it is an example of the C implementation:
>> >
>> > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
>> >
>> > and here the Cython wrapper for all of them:
>> >
>> >
>> > https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
>> >
>> > Francesc
>> >
>>
>> that doesn't really make a big difference if the data is randomly
>> distributed.
>> the sorting operation is normally much more expensive than latter
>> applying the indices:
>>
>> In [1]: d = np.arange(10000000)
>>
>> In [2]: np.random.shuffle(d)
>>
>> In [3]: %timeit  np.argsort(d)
>> 1 loops, best of 3: 1.99 s per loop
>>
>> In [4]: idx = np.argsort(d)
>>
>> In [5]: %timeit d[idx]
>> 1 loops, best of 3: 213 ms per loop
>>
>>
>>
>> But if your data is not random it can make a difference as even
>> quicksort can be a lot faster then.
>> timsort would be a nice addition to numpy, it performs very well for
>> partially sorted data. Unfortunately its quite complicated to implement.
>
>
> Quicksort and shellsort gain speed by having simple inner loops. I have the
> impression that timsort is optimal when compares and memory access are
> expensive, but I haven't seen any benchmarks for native types in contiguous
> memory.

I found some benchmarks for continuous memory here:

https://github.com/swenson/sort/
https://github.com/gfx/cpp-TimSort

The first one seems the best, it probably can be directly reused in numpy.
The only issue is that it only sorts the array, but does not provide argsort.

Ondrej