[Numpy-discussion] Keyword added to searchsorted.

Charles R Harris charlesr.harris at gmail.com
Sat Sep 2 22:26:42 EDT 2006


Hi all,

I added the keyword side to the searchsorted method and functions.
Documentation:

"""a.searchsorted(values=v, side='left') -> array of indices.

    Required Arguments:
        v -- keys to be searched for in a.

    Keyword arguments
        side -- {'left', 'right'}, default('left').

    If a is a 1-D array in ascending order, then

        a.searchsorted(v, side='left')

    returns an array of indices i such that for each element of values the
    following holds:

           a[j] < key <= a[i] for all j < i,

    If such an index does not exist, a.size() is used. The result is such
that
    if the key were to be inserted in the slot before the index i, then the
    order of a would be preserved and i would be the smallest index with
that
    property.

    If a is a 1-D array in ascending order, then

        a.searchsorted(v, side='right')

    returns an array of indices i such that for each element of values the
    following holds:

           a[j] <= key < a[i] for all j < i,

    If such an index does not exist, a.size() is used. The result is that if
the
    key were to be inserted in the slot before the index i, then the order
of a
    would be preserved and i would be the largest index with that property.

"""

I also replaced the old search routine as it was linear time worst case:

In [27]: t1 = t.Timer('N.searchsorted(a,1,side="left")','import numpy as N;
a = N.ones(1000000)')

In [28]: t2 = t.Timer('N.searchsorted(a,1,side="right")','import numpy as N;
a = N.ones(1000000)')

In [29]: t1.repeat(3,100)
Out[29]: [0.5301368236541748, 0.4924161434173584, 0.46317386627197266]

In [30]: t2.repeat(3,100)
Out[30]: [0.0011379718780517578, 0.00081586837768554688,
0.00083994865417480469]

where left was the original routine. It is now noticeably faster in some
situations:

In [2]: t1 = t.Timer('N.searchsorted(a,1,side="left")','import numpy as N; a
= N.ones(1000000)')

In [3]: t1.repeat(3,100)
Out[3]: [0.00082802772521972656, 0.00077795982360839844,
0.00076913833618164062]

I am open to suggestions as to better names for the keyword kind. It also
might be worth it to make type-specific routines if anyone commonly uses
large lists of keys and large sorted arrays.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20060902/19673625/attachment.html>


More information about the NumPy-Discussion mailing list