On Thu, May 8, 2008 at 11:06 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:


On Thu, May 8, 2008 at 10:30 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:


On Thu, May 8, 2008 at 9:55 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Thu, May 8, 2008 at 10:51 PM, Andrew Straw <strawman@astraw.com> wrote:
> I've got a big element array (25 million int64s) that searchsorted()
> takes a long time to grind through. After a bit of digging in the
> literature and the numpy source code, I believe that searchsorted() is
> implementing a classic binary search,

Yes.

> which is pretty bad in terms of
> cache misses. There are several modern implementations of binary search
> which arrange items in memory such that cache misses are much more rare.
> Clearly making such an indexing arrangement would take time, but in my
> particular case, I can spare the time to create an index if searching
> was faster, since I'd make the index once but do the searching many times.
>
> Is there an implementation of such an algorithm that works easilty with
> numpy? Also, can you offer any advice, suggestions, and comments to me
> if I attempted to implement such an algorithm?

I'm no help. You seem to know more than I do. Sadly, the first few
Google hits I get for "binary search minimize cache misses" are
patents. I don't know what the substantive content of those patents
are; I have a strict policy of not reading patents.

I would be interested in adding such a thing if it wasn't patent encumbered. A good start would be a prototype in python to show how it all went together and whether it needed a separate indexing/lookup function or could be fit into the current setup.

One way I can think of doing this is to have two indices. One is the usual sorted list, the second consists of, say, every 1024'th entry in the first list. Then search the second list first to find the part of the first list to search. That won't get you into the very best cache, but it could buy you a factor of 2x-4x in speed. It's sort of splitting the binary tree into two levels.

You could even do it with your data from python, generating the second list and calling searchsorted multiple times. If you are searching for a bunch of values, it's probably good to also sort them first so they bunch together in the same part of the big list.

Chuck