[Numpy-discussion] argsort speed
josef.pktd at gmail.com
josef.pktd at gmail.com
Mon Feb 17 10:11:32 EST 2014
On Mon, Feb 17, 2014 at 9:18 AM, Francesc Alted <francesc at continuum.io> wrote:
> On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote:
>> On Sun, Feb 16, 2014 at 6:12 PM, Daπid <davidmenhur at gmail.com> wrote:
>>> On 16 February 2014 23:43, <josef.pktd at gmail.com> wrote:
>>>> What's the fastest argsort for a 1d array with around 28 Million
>>>> elements, roughly uniformly distributed, random order?
>>>
>>> On numpy latest version:
>>>
>>> for kind in ['quicksort', 'mergesort', 'heapsort']:
>>> print kind
>>> %timeit np.sort(data, kind=kind)
>>> %timeit np.argsort(data, kind=kind)
>>>
>>>
>>> quicksort
>>> 1 loops, best of 3: 3.55 s per loop
>>> 1 loops, best of 3: 10.3 s per loop
>>> mergesort
>>> 1 loops, best of 3: 4.84 s per loop
>>> 1 loops, best of 3: 9.49 s per loop
>>> heapsort
>>> 1 loops, best of 3: 12.1 s per loop
>>> 1 loops, best of 3: 39.3 s per loop
>>>
>>>
>>> It looks quicksort is quicker sorting, but mergesort is marginally faster
>>> sorting args. The diference is slim, but upon repetition, it remains
>>> significant.
>>>
>>> Why is that? Probably part of the reason is what Eelco said, and part is
>>> that in sort comparison are done accessing the array elements directly, but
>>> in argsort you have to index the array, introducing some overhead.
>> Thanks, both.
>>
>> I also gain a second with mergesort.
>>
>> matlab would be nicer in my case, it returns both.
>> I still need to use the argsort to index into the array to also get
>> the sorted array.
>
> Many years ago I needed something similar, so I made some functions for
> sorting and argsorting in one single shot. Maybe you want to reuse
> them. Here it is an example of the C implementation:
>
> https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
>
> and here the Cython wrapper for all of them:
>
> https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
Thanks Francesc
That would be very useful to have.
However, I don't speak C, and it would require too much maintenance
time if I were to include that in statsmodels.
I'm trying to concentrate more on stats (and my backlog there), and
leave other parts to developers that know those better.
Josef
>
> Francesc
>
>>
>> Josef
>>
>>
>>> I seem unable to find the code for ndarray.sort, so I can't check. I have
>>> tried to grep it tring all possible combinations of "def ndarray",
>>> "self.sort", etc. Where is it?
>>>
>>>
>>> /David.
>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> --
> Francesc Alted
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
More information about the NumPy-Discussion
mailing list