[Python-Dev] Re: Re: lists v. tuples

Tim Peters tim_one@email.msn.com
Thu, 20 Mar 2003 00:58:55 -0500

> The binary-search routines in the C++ standard library mostly avoid
> having to do != comparisons by defining their interfaces in the
> following clever way:
>         binary_search   returns a boolean that indicates whether the
>                         value sought is in the sequence.  It does not
>                         say where that value is.
>         lower_bound     returns the first position ahead of which
>                         the given value could be inserted without
>                         disrupting the ordering of the sequence.
>         upper_bound     returns the last position ahead of which
>                         the given value could be inserted without
>                         disrupting the ordering of the sequence.

These last two are quite like Python's bisect.bisect_{left, right}, which
are implemented using only __lt__ element comparison.

>         equal_range     returns (lower_bound, upper_bound) as a pair.
> In Python terms:
>         binary_search([3, 5, 7], 6)  would yield False
>         binary_search([3, 5, 7], 7)  would yield True
>         lower_bound([1, 3, 5, 7, 9, 11], 9)    would yield 4
>         lower_bound([1, 3, 5, 7, 9, 11], 8)    would also yield 4
>         upper_bound([1, 3, 5, 7, 9, 11], 9)    would yield 5

>>> import bisect
>>> x = [1, 3, 5, 7, 9, 11]
>>> bisect.bisect_left(x, 9)
>>> bisect.bisect_left(x, 8)
>>> bisect.bisect_right(x, 9)

We conclude that C++ did something right <wink>.

>         equal_range([1, 1, 3, 3, 3, 5, 5, 5, 7], 3)
>                                 would yield (2, 5).
> If you like, equal_range(seq, x) returns (l, h) such that all the
> elements of seq[l:h] are equal to x.  If l == h, the subsequence is
> the empty sequence between the two adjacent elements with values that
> bracket x.
> These definitions turn out to be useful in practice, and are also
> easy to implement efficiently using only < comparisons.

I think Python got the most valuable of these, and they're useful in Python
too.  Nevertheless, if you're coding an explicit conventional binary search
tree (nodes containing a value, a reference to "a left" node, and a
reference to "a right" node), cmp() is more convenient; and even more so if
you're coding a ternary search tree.

Sometimes cmp allows for more compact code.  Python's previous samplesort
implementation endured a *little* clumsiness to infer equality (a == b) from
not (a<b or a>b).  The current adaptive mergesort feels the restriction to <
more acutely and in more places.  For example, when merging two runs A and
B, part of the adaptive strategy is to precompute, via a form of binary
search, where A[0] belongs in B, and where B[-1] belongs in A.  This sounds
like two instances of the same task, but they're maddeningly different
because-- in order to preserve stability --the first search needs to be of
the bisect_left flavor and the second of bisect_right.  Combining both modes
of operation in a single search routine with a flag argument, and sticking
purely to __lt__, leads to horridly obscure code, so these searches are
actually implemented by distinct functions.  If it were able to use cmp()
instead, folding them into one routine would have been unobjectionable (if <
is needed, check for cmp < 0; if <= is needed, check for cmp <= 0 same-as
cmp < 1; so 0 or 1 could be passed in to select between < and <= very
efficiently and reasonably clearly).