[Numpy-discussion] searchsorted() and memory cache

Charles R Harris charlesr.harris at gmail.com
Wed May 14 15:43:13 EDT 2008


On Wed, May 14, 2008 at 8:09 AM, Andrew Straw <strawman at astraw.com> wrote:

>
> > I will post any new insights as I continue to work on this...
> >
> OK, I save isolated a sample of my data that illustrates the terrible
> performance with the binarysearch. I have uploaded it as a pytables file
> to http://astraw.com/framenumbers.h5 in case anyone wants to have a look
> themselves. Here's an example of the type of benchmark I've been running:
>
> import fastsearch.downsamp
> import fastsearch.binarysearch
> import tables
>
> h5=tables.openFile('framenumbers.h5',mode='r')
> framenumbers=h5.root.framenumbers.read()
> keys=h5.root.keys.read()
> h5.close()
>
> def bench( implementation ):
>    for key in keys:
>        implementation.index( key )
>
> downsamp = fastsearch.downsamp.DownSampledPreSearcher( framenumbers )
> binary = fastsearch.binarysearch.BinarySearcher( framenumbers )
>
> # The next two lines are IPython-specific, and the 2nd takes a looong time:
>
> %timeit bench(downsamp)
> %timeit bench(binary)
>
>
>
> Running the above gives:
>
> In [14]: %timeit bench(downsamp)
> 10 loops, best of 3: 64 ms per loop
>
> In [15]: %timeit bench(binary)
>
> 10 loops, best of 3: 184 s per loop
>
> Quite a difference (a factor of about 3000)! At this point, I haven't
> delved into the dataset to see what makes it so pathological --
> performance is nowhere near this bad for the binary search algorithm
> with other sets of keys.
>

It can't be that bad Andrew, something else is going on. And 191 MB isn's
*that* big, I expect it should bit in memory with no problem.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080514/79a84d49/attachment.html>


More information about the NumPy-Discussion mailing list