[Numpy-discussion] searchsorted() and memory cache

Andrew Straw strawman at astraw.com
Tue May 13 19:59:12 EDT 2008


Thanks for all the comments on my original question. I was offline more
than I intended between sending it and now, so I'm sorry I wasn't able
to participate in the discussion right away.

Anyhow, after working on this a bit more, I came up with a few
implementations of search algorithms that do just what I needed, all
with the same interface. They are available via Bazaar on Launchpad at
http://launchpad.net/~astraw/+junk/fastsearch (MIT license). I have
attached the output of the plot_comparisons.py benchmarking script to
this email (note that this benchmarking is pretty crude).

For the problem I originally wrote about, I get a nearly unbelievable
speedup of ~250x using the fastsearch.downsamp.DownSampledPreSearcher
class, which is very similar in spirit to Charles' suggestion. It takes
1000 values from the original array to create a new first-level array
that is itself localized in memory and points into a more localized
region of the full original array. I also get a similar (though
slightly slower) result with AVL trees via the
fastsearch.avlsearch.AvlSearcher class, which uses pyavl
( http://sourceforge.net/projects/pyavl ).
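
In case it helps make the discussion concrete, here is a rough sketch
of the two-level idea in pure numpy. This is not the actual
fastsearch.downsamp code -- the class name, the step parameter, and
the details are invented for illustration -- but it shows the
mechanism: the small first-level array stays hot in cache across many
queries, so only the final, short binary search touches cold memory.

import numpy as np

class DownsampledSearcherSketch:
    """Sketch of a two-level search over a sorted array.

    A small first-level array (every step-th element) narrows each
    query to a step-sized slice of the full array before the final
    binary search.
    """

    def __init__(self, sorted_arr, step=1000):
        self.arr = sorted_arr
        self.step = step
        # copy() gives the first-level array its own small,
        # contiguous, cache-friendly block of memory.
        self.coarse = sorted_arr[::step].copy()

    def searchsorted(self, v):
        # The coarse search picks the bucket that must contain the
        # insertion point...
        i = int(self.coarse.searchsorted(v))
        lo = max(i - 1, 0) * self.step
        hi = min(i * self.step + 1, len(self.arr))
        # ...and a short binary search in that slice finishes the job.
        return lo + int(self.arr[lo:hi].searchsorted(v))

# Quick self-check against numpy's own searchsorted():
a = np.cumsum(np.random.randint(1, 10, size=10**6)).astype(np.int64)
s = DownsampledSearcherSketch(a)
q = a[123456]
assert s.searchsorted(q) == int(a.searchsorted(q))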

Using the benchmarking code included in the bzr branch, I don't see
anything like this speedup (see the attached figure), so I'm not sure
exactly what's going on at this point. But I'm not going to argue with
a 250x speedup, so the fastsearch.downsamp code is now being put to use
in one of my projects.

Stefan -- I think your code simply implements the classic binary search
-- I don't see how it will reduce cache misses.

Anyhow, perhaps someone will find the above useful. I guess it would
still be a substantial amount of work to make a numpy-types-aware
implementation of AVL trees or similar algorithms. These sorts of
binary search trees seem like the right way to solve this problem, so
there might be an interesting project in this. I imagine that a
numpy-types-aware Cython would make such an implementation
significantly easier, and still blazingly fast compared to the binary
search implemented in searchsorted(), given today's cached memory
architectures.
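
For anyone curious what "arranging items in memory so cache misses are
rare" can look like, here is a rough, purely illustrative sketch of
one such scheme from the literature, the Eytzinger (breadth-first)
layout, in plain Python/numpy. It is not part of the fastsearch
branch, and the recursive build is far too slow for 25 million
elements as written -- exactly the sort of thing a numpy-aware Cython
could make fast:

import numpy as np

def build_eytzinger(sorted_arr):
    """Lay out a sorted array in breadth-first (Eytzinger) order,
    1-indexed: the children of slot k live at 2k and 2k+1."""
    n = len(sorted_arr)
    vals = np.empty(n + 1, dtype=sorted_arr.dtype)
    rank = np.empty(n + 1, dtype=np.intp)  # map slot -> sorted index
    def fill(i, k):
        # In-order traversal of the implicit tree assigns sorted
        # values to breadth-first slots; i is the next sorted index.
        if k <= n:
            i = fill(i, 2 * k)
            vals[k], rank[k] = sorted_arr[i], i
            i = fill(i + 1, 2 * k + 1)
        return i
    fill(0, 1)
    return vals, rank

def eytzinger_searchsorted(vals, rank, v):
    """Equivalent to sorted_arr.searchsorted(v, side='left')."""
    n = len(vals) - 1
    k = 1
    while k <= n:
        # Descend: take the right child iff the current value < v.
        k = 2 * k + int(vals[k] < v)
    # Strip the trailing 1-bits (plus one) to recover the last node
    # where we went left, i.e. the first element >= v.
    k >>= ((k + 1) & -(k + 1)).bit_length()
    return n if k == 0 else int(rank[k])

# Quick self-check against numpy's own searchsorted():
a = np.sort(np.random.randint(0, 10**6, size=10**5)).astype(np.int64)
vals, rank = build_eytzinger(a)
q = np.int64(424242)
assert eytzinger_searchsorted(vals, rank, q) == int(a.searchsorted(q))

The point of the layout is that the first few levels of the implicit
tree -- which every query visits -- are packed together at the front
of the array and so share cache lines across queries, unlike the
widely scattered midpoints touched by a classic binary search.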

-Andrew


Andrew Straw wrote:
> I've got a big array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search, which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly making such an indexing arrangement would take time, but in my
> particular case, I can spare the time to create an index if searching
> was faster, since I'd make the index once but do the searching many times.
>
> Is there an implementation of such an algorithm that works easily with
> numpy? Also, can you offer any advice, suggestions, and comments if I
> attempted to implement such an algorithm?
>
> Thanks,
> Andrew

[Attachment: fastsearch.png (image/png, 14205 bytes) -- benchmark
comparison plot from plot_comparisons.py:
<http://mail.python.org/pipermail/numpy-discussion/attachments/20080513/b9852b01/attachment.png>]

