[Numpy-discussion] String sort

Francesc Altet faltet at carabos.com
Sat Feb 9 08:47:25 EST 2008


A Friday 08 February 2008, Charles R Harris escrigué:
> On Feb 8, 2008 10:31 AM, Francesc Altet <faltet at carabos.com> wrote:
> > A Friday 08 February 2008, Francesc Altet escrigué:
> > > A Friday 08 February 2008, Charles R Harris escrigué:
> > > > > Also, in the context of my work in indexing, and because of
> > > > > the slowness of the current implementation in NumPy, I've
> > > > > ended with an implementation of the quicksort method for 1-D
> > > > > array strings. For moderately large arrays, it is about
> > > > > 2.5x-3x faster than the (supposedly) mergesort version in
> > > > > NumPy, not only due to the quicksort, but also because I've
> > > > > implemented a couple of macros for efficient string swapping
> > > > > and copy.  If this is of interest for NumPy developers, tell
> > > > > me and I will provide the code.
> > > >
> > > > I have some code for this too and was going to merge it. Send
> > > > yours along and I'll get to it this weekend.
> > >
> > > Ok, great.  I'm attaching it.  Tell me if you need some
> > > clarification on the code.
> >
> > Ops.  I've introduced a last-minute problem in my code.  To fix
> > this, just replace the flawed opt_strncmp() that I sent before by:
> >
> > static int inline opt_strncmp(char *a, char *b, int n) {
> >  int i;
> >  for (i=0; i<n; i++) {
> >    if (a[i] > b[i]) return i+1;
> >    if (a[i] < b[i]) return -(i+1);
> >    /* Another way, but seems equivalent in speed, at least here */
> > /*     if (a[i] != b[i]) */
> > /*       return (((unsigned char *)a)[i] - ((unsigned char
> > *)b)[i]); */ }
> >  return 0;
> > }
> >
> > Apparently, this version works just fine.
>
> Did you find this significantly faster than strncmp? There is also a
> unicode compare, do you have thoughts about that?

Well, for the unicode case it wouldn't be enough by replacing 'char' 
by 'Py_ArrayUCS4'?  Maybe this afternoon I can do some benchmarking too 
in this regard.

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"



More information about the NumPy-Discussion mailing list