[ python-Feature Requests-1185383 ] Make bisect.* functions accept an optional compare function

Fri Jun 3 03:25:23 CEST 2005

Feature Requests item #1185383, was opened at 2005-04-18 14:26
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1185383&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Marcin Ciura (mciura)
Assigned to: Nobody/Anonymous (nobody)
Summary: Make bisect.* functions accept an optional compare function

Initial Comment:
The usability of bisect.bisect_right, bisect.bisect_left,
bisect.insort_right and bisect.insort_left would increase
if they accepted an optional less_than function to compare
the keys.

Here is my use case: I have a sorted list of reversed words
and a parallel list of flags associated with these words
(taken from ispell). The list of reversed words is the one
searched; the flags are the result of a search.

Issue #1:  Now I cannot use simply a list of tuples
(rev_word,flags) and a custom less_than function that
compares only the first elements of two tuples.

Issue #2: When a word is not found in the list, I'd
like to make
an educated guess as to its flags (this makes more
sense in non-English languages, where e.g. infinitives
have a special ending), using bisect_left and bisect_right:

from bisect import *

less_than = lambda x,y: x[:3]<y[:3]
lp = bisect_left(word_list, given_rev_word, lt=less_than)
rp = bisect_right(word_list, given_rev_word, lt=less_than)
# return the union of flag_list[lp:rp]

An example (given_rev_word = 'abcpqr'):
word_list:
'abbx',
'abcaa', <- lp
'abcdd',
'abcss',
'abdf'   <- rp

Currently, the first search could be replaced with
lp = bisect_left(word_list, given_rev_word[:3])
but I can see only non-nice ways to replace the other
search.

Rolling my own class that stores a word and its flags,
with __lt__ depending on some global setting is not
thread-safe, and I find such a solution too heavyweight.

I hope that I expressed myself clearly enough. If not, let
me know, and I'll try to clarify my point.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2005-06-02 20:25

Message:
Logged In: YES 
user_id=80475

Overall, I'm -1 on this RFE.

The comparison to nsmallest() and nlargest() is inaccurate.
 They run start-to-finish in one function call.  The other
heapq methods do not use key functions because they have to
leave the original data structure unmolested between calls;
hence, there is no ability to limit the key function calls
to one per record.

Likewise, with this request, the key function calls get
wasted.  The sort() method calls key() for every record and
tosses the result afterwards.  Each subsequect call to
bisect() would need to repeat those calls for a log(N)
subset of the data.  Hence, accepting this request would
create an API that encourages a wasteful design.

----------------------------------------------------------------------

Comment By: Jonas Kölker (jonaskoelker)
Date: 2005-04-28 15:13

Message:
Logged In: YES 
user_id=1153395

In a similar vein, I'd like to see that a `key' keyword
argument was added to bisect.* (and perhaps `reversed'
too)--it's really annoying to sort something by (not __lt__)
and not be able to bsearch it. It got added to min/max and
heapq.n(smallest|largest) in 2.5 fwiw ;)

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-04-18 15:09

Message:
Logged In: YES 
user_id=80475

A few thoughts:

* If bisect provided an optional lt argument, it would still
not be thread-safe.  The code for the lt function can call
arbitrary Python code that runs through the eval-loop and is
hence subject to a thread-switch.

* An advantage of the class wrapper approach is that  the
prefix function gets computed just once per word.  This is
not a big deal for the specific case of [:3], but it could
be a significant benefit for other functions that are
expensive to compute (because repeated calls to bisect will
access the lt function more than once).

* My own approach to the particular use case would be to map
prefixes to a list of revwords or (revword, flag) pairs:
  d = dict(abb=['abbc'], abc=['abcaa', 'abcdd', 'abcss'],
abd=['abdf'])
This gives O(1) lookup time and limits the calls to the
prefix function to once per word.

Will leave this request open as it may yet be a good idea. 
My concern is that it will clutter the code and the API for
only a small benefit. 

Also, I'm looking at a more general purpose solution that
would make this feature request moot.  This idea is to
create a fast comparison decorator class used like this:

   dwordlist = map(ComparisonDecorator(lambda x: x[:3]),
wordlist)
   lp = bisect_left(dwordlist, given_rev_word)
   rp = bisect_right(dwordlist, given_rev_word)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1185383&group_id=5470