difflib + object sequences

Chris Green cmg at dok.org
Tue Mar 23 21:04:55 CET 2004

Hello Folks,

I'm attempting to use difflib's SequenceMatcher to pull out the
matching subsequences of a list of objects.  Code is below.

object matches
a[0] and b[0] match for 3 elements
a[5] and b[5] match for 0 elements
integer matches
a[0] and b[0] match for 3 elements
a[4] and b[4] match for 1 elements
a[5] and b[5] match for 0 elements

Note that the expected match at a[4] and b[4] is missing with the
object sequences.

I've narrowed this down to difflib.py (both 2.3 and the CVS version)
using a dictionary on the backend, while hash(Misc(1)) != hash(Misc(1)).
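
To illustrate why differing hashes matter for a dictionary-backed
lookup, here is a minimal sketch. It is not the original test: the class
name IdHashed is hypothetical, it uses __eq__ instead of __cmp__ so that
it also runs on newer Pythons, and having __hash__ return id(self) is an
assumption meant to imitate an inherited identity-based default hash.

```python
class IdHashed(object):
    """Hypothetical class: equal by value, hashed by identity."""
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val == other.val
    def __hash__(self):
        # Assumption: stands in for an inherited identity-based hash.
        return id(self)

a = IdHashed(1)
b = IdHashed(1)

# Index one object by position, roughly as a dict-backed matcher might.
index = {b: [0]}

equal_by_value = (a == b)
found_in_index = (a in index)

print("a == b: %s" % equal_by_value)
print("a in index: %s" % found_in_index)
```

The two objects compare equal, but the dictionary lookup is keyed on the
hash first, so an equal object with a different hash is not found.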

Is this expected behavior?
shows some comments on objects inheriting a default __hash__ ret

If I define __hash__ as return hash(self.val), things work as
expected.  Now the question is, what should I log as a bug?

The existing bug 660098?
Documentation could use an example like this test?
Something busted with difflib?
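
For reference, the __hash__ workaround mentioned above can be sketched
as follows. This is an illustration under assumptions, not the test
script itself: the class name HashedMisc is hypothetical, and it defines
__eq__ rather than __cmp__ so that the same code runs on newer Pythons.

```python
from difflib import SequenceMatcher

class HashedMisc(object):
    """Hypothetical variant of Misc with a value-based hash."""
    def __init__(self, val):
        self.val = val
    def __eq__(self, other):
        return self.val == other.val
    def __hash__(self):
        # Equal values now hash equally, so dict lookups can find them.
        return hash(self.val)

objA = [HashedMisc(v) for v in (1, 2, 3, 96, 24)]
objB = [HashedMisc(v) for v in (1, 2, 3, 42, 24)]

# No junk function, same as in the test script.
sm = SequenceMatcher(None, objA, objB)
blocks = [tuple(block) for block in sm.get_matching_blocks()]

for block in blocks:
    print("a[%d] and b[%d] match for %d elements" % block)
```

With a value-based hash, the object sequences should yield the same
blocks as the integer lists, including the match at a[4] and b[4].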

Thanks for your help. Looking forward to PyCon tomorrow.


from difflib import SequenceMatcher

class Misc(object):
    def __init__(self, val):
        self.val = val
    def __cmp__(self, other):
        return cmp(self.val,other.val)
    def __str__(self):
        return str(self.val)

def test_sm(seqA, seqB):
    # new sequence matcher with no junk defined
    sm = SequenceMatcher(None, seqA, seqB)
    blockList = sm.get_matching_blocks()
    for block in blockList:
        print "a[%d] and b[%d] match for %d elements" % block

if __name__=='__main__':

    # define 2 very similar sequences with a cmp operator
    objA = [Misc(1), Misc(2), Misc(3), Misc(96), Misc(24)]
    objB = [Misc(1), Misc(2), Misc(3), Misc(42), Misc(24)]

    intA = [1,2,3,96,24]
    intB = [1,2,3,42,24]

    print "object matches"
    test_sm(objA, objB)

    print "integer matches"
    test_sm(intA, intB)

Chris Green <cmg at dok.org>
Fame may be fleeting but obscurity is forever.
