[Numpy-discussion] searchsorted() and memory cache

Thu May 8 23:55:56 EDT 2008

On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <strawman at astraw.com> wrote:
> I've got a big element array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search,

Yes.

> which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly making such an indexing arrangement would take time, but in my
> particular case, I can spare the time to create an index if searching
> was faster, since I'd make the index once but do the searching many times.
>
> Is there an implementation of such an algorithm that works easilty with
> numpy? Also, can you offer any advice, suggestions, and comments to me
> if I attempted to implement such an algorithm?

I'm no help. You seem to know more than I do. Sadly, the first few
Google hits I get for "binary search minimize cache misses" are
patents. I don't know what the substantive content of those patents
are; I have a strict policy of not reading patents.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco