difflib + object sequences

Chris Green cmg at dok.org
Tue Mar 23 15:04:55 EST 2004


Hello Folks,

I'm attempting to use difflib's SequenceMatcher to pull out the
matching subsequences of two lists of objects.  The code is at the
bottom of this message; here is its output:

object matches
a[0] and b[0] match for 3 elements
a[5] and b[5] match for 0 elements
integer matches
a[0] and b[0] match for 3 elements
a[4] and b[4] match for 1 elements
a[5] and b[5] match for 0 elements

Note that the expected match at a[4] and b[4] is missing for the
object sequences.

I've narrowed this down to difflib.py (both the 2.3 and CVS versions)
using a dictionary on the backend: two equal objects hash differently,
i.e. hash(Misc(1)) != hash(Misc(1)), so the dictionary lookup misses.
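
To make the mismatch concrete, here is a minimal sketch of the failing lookup as I understand it. The class name Plain and the dictionary `index` are hypothetical, used only for illustration; `index` is a toy stand-in for the element-to-positions dictionary, not difflib's actual code. I spell out `__eq__` and pin `__hash__` to `object.__hash__` so that the snippet behaves the same on interpreters that do not consult `__cmp__`.

```python
class Plain(object):
    """Hypothetical stand-in for Misc: equality by value, hash by identity."""
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val == other.val
    def __ne__(self, other):
        return not self.__eq__(other)
    # Explicitly keep the inherited identity-based hash.
    __hash__ = object.__hash__

# Keep both objects alive at once so they are guaranteed distinct.
first = Plain(1)
second = Plain(1)

# Toy index mapping an element to the positions where it occurs.
index = {first: [0]}

equal = (first == second)                  # compared by value
same_hash = (hash(first) == hash(second))  # hashed by identity
found = (second in index)                  # lookup goes through the hash first

print("equal: %s, same hash: %s, found in dict: %s" % (equal, same_hash, found))
```

The two objects compare equal, but because their hashes differ the dictionary never gets as far as comparing them, which would explain the missing match.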

Is this expected behavior?
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=660098&group_id=5470
shows some comments on objects inheriting a default __hash__ that
returns id(self).

If I define __hash__ as return hash(self.val), things work as
expected.  Now the question is, what should I log as a bug?
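
For reference, a minimal sketch of that workaround, assuming the value is itself hashable. HashedMisc and matching_blocks are hypothetical names introduced just for this illustration, and I have written out `__eq__` and `__ne__` alongside `__hash__` so the snippet does not depend on `__cmp__` being consulted.

```python
from difflib import SequenceMatcher

class HashedMisc(object):
    """Hypothetical variant of Misc whose hash agrees with its equality."""
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val == other.val
    def __ne__(self, other):
        return not self.__eq__(other)
    def __hash__(self):
        # Equal values now hash equally, so dictionary lookups can succeed.
        return hash(self.val)

def matching_blocks(seq_a, seq_b):
    # New sequence matcher with no junk defined, as in the test code.
    sm = SequenceMatcher(None, seq_a, seq_b)
    return [tuple(block) for block in sm.get_matching_blocks()]

vals_a = [1, 2, 3, 96, 24]
vals_b = [1, 2, 3, 42, 24]

obj_blocks = matching_blocks([HashedMisc(v) for v in vals_a],
                             [HashedMisc(v) for v in vals_b])
int_blocks = matching_blocks(vals_a, vals_b)

for block in obj_blocks:
    print("a[%d] and b[%d] match for %d elements" % block)
```

With the hash tied to the value, the object sequences report the same blocks as the integer sequences, including the one at a[4] and b[4].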

The existing bug 660098?
Documentation could use an example like this test?
Something busted with difflib?

Thanks for your help. Looking forward to PyCon tomorrow.
Chris

Code:

from difflib import SequenceMatcher

class Misc(object):
    def __init__(self, val):
        self.val = val
    def __cmp__(self, other):
        return cmp(self.val,other.val)
    def __str__(self):
        return str(self.val)

def test_sm(seqA, seqB):        
    # new sequence matcher with no junk defined
    sm = SequenceMatcher(None, seqA, seqB)
    
    blockList = sm.get_matching_blocks()
    
    for block in blockList:
        print "a[%d] and b[%d] match for %d elements" % block

if __name__=='__main__':


    # define 2 very similar sequences with a cmp operator
    objA = [Misc(1), Misc(2), Misc(3), Misc(96), Misc(24)]
    objB = [Misc(1), Misc(2), Misc(3), Misc(42), Misc(24)]

    intA = [1,2,3,96,24]
    intB = [1,2,3,42,24]

    print "object matches"
    test_sm(objA,objB)

    print "integer matches"
    test_sm(intA,intB)
-- 
Chris Green <cmg at dok.org>
Fame may be fleeting but obscurity is forever.


