<br><br><div class="gmail_quote">On Sun, Apr 7, 2013 at 5:56 PM, Charles R Harris <span dir="ltr">&lt;<a href="mailto:charlesr.harris@gmail.com" target="_blank">charlesr.harris@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br><br><div class="gmail_quote"><div class="im">On Sun, Apr 7, 2013 at 5:23 PM, Tom Aldcroft <span dir="ltr">&lt;<a href="mailto:aldcroft@head.cfa.harvard.edu" target="_blank">aldcroft@head.cfa.harvard.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I'm seeing about a factor of 50 difference in performance between<br>

sorting a random integer array and sorting that same array viewed<br>

as a structured array.  Am I doing anything wrong here?<br>

<br>

In [2]: x = np.random.randint(10000, size=10000)<br>

<br>

In [3]: xarr = x.view(dtype=[('a', np.int)])<br>

<br>

In [4]: timeit np.sort(x)<br>

1000 loops, best of 3: 588 us per loop<br>

<br>

In [5]: timeit np.sort(xarr)<br>

10 loops, best of 3: 29 ms per loop<br>

<br>

In [6]: timeit np.sort(xarr, order=('a',))<br>

10 loops, best of 3: 28.9 ms per loop<br>

<br>

I was wondering if this slowdown is expected (maybe the comparison is<br>

falling back to pure Python?).  I'm showing a simple example<br>

here, but in reality I'm working with non-trivial structured arrays<br>

where I might want to sort on multiple columns.<br>

<br>

Does anyone have suggestions for speeding things up, or have a sort<br>

implementation (perhaps Cython) that has better performance for<br>

structured arrays?<br></blockquote></div><div><br>This is probably due to the comparison function used. For plain integers the C `&lt;` operator is used directly; for structured dtypes, the dtype's comparison function is passed to the sort routines as a function pointer and called for each comparison. I doubt Cython would make any difference in this case, but improving the dtype comparison routine would probably help a lot. For all I know, the dtype gets parsed on every call to the comparison function.<br>
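<br>If the slowdown does come from the structured-dtype comparison, one workaround worth trying is to sort the integer field on its own, which goes through the ordinary integer sort, and then reorder the whole records with the resulting indices. A minimal sketch, assuming a single key field `a` as in the question and an explicit `np.int64` field (the `np.int` alias has been removed from recent NumPy releases); `sort_structured` and `sort_by_field` are hypothetical helper names, and the printed timings will vary by machine and NumPy version, so treat them only as a relative comparison:<br>

```python
import timeit

import numpy as np

# Same setup as the question, but with an explicit 64-bit integer field
# (np.int64 is an assumption standing in for the old np.int alias).
rng = np.random.default_rng(0)
x = rng.integers(10000, size=10000, dtype=np.int64)
xarr = x.view(dtype=[("a", np.int64)])


def sort_structured(arr):
    # Generic path: records are compared through the structured dtype.
    return np.sort(arr, order="a")


def sort_by_field(arr):
    # Workaround: argsort the plain integer field, then reorder the records.
    return arr[np.argsort(arr["a"], kind="stable")]


# Both approaches should give the same ordering of the key column.
assert np.array_equal(sort_structured(xarr)["a"], sort_by_field(xarr)["a"])

for func in (sort_structured, sort_by_field):
    seconds = timeit.timeit(lambda: func(xarr), number=20) / 20
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms per call")
```

Using `kind="stable"` keeps records with equal keys in their original relative order, which matters once the records carry more than one field.<br>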


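<br>For the multi-column case mentioned in the question, the same idea can be tried with `np.lexsort`, which takes a sequence of plain key arrays and returns the indices that sort by all of them. A minimal sketch, assuming a hypothetical two-field table with fields `a` and `b`; whether it is actually faster than the structured sort for a given dtype is something to measure rather than assume:<br>

```python
import numpy as np

# Hypothetical two-field table standing in for the "non-trivial structured
# arrays" in the question; the field names and sizes are illustrative only.
rng = np.random.default_rng(1)
table = np.zeros(10000, dtype=[("a", np.int64), ("b", np.float64)])
table["a"] = rng.integers(100, size=table.size)
table["b"] = rng.random(table.size)

# np.lexsort treats the LAST key as the primary one, so pass the secondary
# key first: this orders by "a", breaking ties with "b".
idx = np.lexsort((table["b"], table["a"]))
by_lexsort = table[idx]

# Reference result from the structured-dtype sort on the same two fields.
by_order = np.sort(table, order=["a", "b"])

same_order = all(
    np.array_equal(by_lexsort[name], by_order[name]) for name in ("a", "b")
)
print(same_order)
```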
<br></div></div></blockquote><div><br>Note that even sorting the same data viewed as 8-byte strings is notably faster:<br><br>In [13]: sarr = x.view(dtype='&lt;S8')<br><br>In [14]: timeit np.sort(sarr)<br>1000 loops, best of 3: 1.31 ms per loop<br>
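<br>One caveat with the byte-string view: strings are compared byte by byte, so on a little-endian machine (the usual case) the least significant byte is compared first and the result is generally not in numeric order. Storing the integers big-endian first makes bytewise order agree with numeric order, but only for non-negative values, since the sign bit would place negative numbers after positive ones. A minimal sketch of both, assuming 8-byte integers as the S8 view in the timing does; `sort_as_bytes` is a hypothetical helper:<br>

```python
import numpy as np


def sort_as_bytes(values, dtype):
    # Hypothetical helper: store the integers with the given byte order,
    # reinterpret each 8-byte integer as an 8-byte string, sort the strings
    # bytewise, then reinterpret the sorted result as integers again.
    arr = np.asarray(values).astype(dtype)
    return np.sort(arr.view("S8")).view(dtype)


little = np.dtype(np.int64).newbyteorder("little")
big = np.dtype(np.int64).newbyteorder("big")

values = [1, 256, 3, 65536]

# Little-endian: the least significant byte is compared first, so the
# result is generally NOT in numeric order.
print(sort_as_bytes(values, little).tolist())

# Big-endian: for non-negative integers, bytewise order matches numeric order.
print(sort_as_bytes(values, big).tolist())
```

Whether the big-endian copy keeps most of the speed advantage is worth timing separately, since `astype` adds a full pass over the data.<br>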

<br>Chuck<br></div><br></div>