difflib + object sequences
Chris Green
cmg at dok.org
Tue Mar 23 15:04:55 EST 2004
Hello Folks,
I'm attempting to use difflib's SequenceMatcher to pull out the
matching subsequences of two lists of objects. The code is below; it prints:
    object matches
    a[0] and b[0] match for 3 elements
    a[5] and b[5] match for 0 elements
    integer matches
    a[0] and b[0] match for 3 elements
    a[4] and b[4] match for 1 elements
    a[5] and b[5] match for 0 elements
Note that the expected match at a[4] and b[4] is missing for the
object lists, although it shows up for the integer lists.
I've narrowed this down to difflib.py (in both 2.3 and CVS):
SequenceMatcher uses a dictionary on the backend, and
hash(Misc(1)) != hash(Misc(1)), so equal Misc objects never find
each other in it.
Is this expected behavior?
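A minimal sketch of the underlying problem, assuming Python 3 rather than the 2.3 used in the post. Python 3 has no __cmp__, so equality is defined with __eq__, and `__hash__ = object.__hash__` is set explicitly to mimic the identity-based hash that Misc inherits in 2.3. The dictionary name `lookup` is illustrative only. Two equal instances get different hashes, so a dictionary keyed on one cannot find the other:

```python
class Misc:
    """Compares by value but hashes by identity (mimics the 2.3 class)."""

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    # Python 3 sets __hash__ to None when __eq__ is defined; restore
    # object's identity-based hash to reproduce the 2.3 behaviour.
    __hash__ = object.__hash__


a, b = Misc(1), Misc(1)
print(a == b)              # True: the values are equal
print(hash(a) == hash(b))  # False: the hashes come from object identity
lookup = {b: "found"}      # hypothetical dict, standing in for difflib's
print(a in lookup)         # False: lookup goes by hash first, and they differ
```

This is the same mismatch difflib runs into: equality says "same", hashing says "different".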
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=660098&group_id=5470
shows some comments on objects inheriting a default __hash__ that
returns id(self).
If I define __hash__ to return hash(self.val), things work as
expected. Now the question is: what should I log as a bug?
The existing bug 660098?
A documentation request, since the docs could use an example like this test?
Something broken in difflib?
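The same effect, and the fix mentioned above, can be shown directly with SequenceMatcher. This is a sketch assuming Python 3; the class names IdHashed and ValHashed and the helper blocks() are hypothetical. It runs the two value lists from the post through get_matching_blocks() once with an identity hash and once with __hash__ returning hash(self.val). In Python 3, get_matching_blocks() returns Match named tuples, so they are converted to plain tuples for printing:

```python
from difflib import SequenceMatcher


class IdHashed:
    """Hypothetical: equal by value, hashed by identity (the problem case)."""

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    __hash__ = object.__hash__  # identity hash, like the 2.3 default


class ValHashed(IdHashed):
    """Hypothetical: same equality, but the hash agrees with it."""

    def __hash__(self):
        return hash(self.val)


def blocks(cls, xs, ys):
    """Hypothetical helper: wrap values in cls and return (i, j, size) tuples."""
    seq_a = [cls(x) for x in xs]
    seq_b = [cls(y) for y in ys]
    sm = SequenceMatcher(None, seq_a, seq_b)  # no junk, as in the post
    return [tuple(m) for m in sm.get_matching_blocks()]


xs = [1, 2, 3, 96, 24]
ys = [1, 2, 3, 42, 24]
print(blocks(IdHashed, xs, ys))   # [(0, 0, 3), (5, 5, 0)]
print(blocks(ValHashed, xs, ys))  # [(0, 0, 3), (4, 4, 1), (5, 5, 0)]
```

The identity-hashed run still finds the leading three elements, apparently because difflib extends a candidate match by comparing neighbouring elements with ==, but the isolated a[4]/b[4] match can only be found through the dictionary, so it is lost.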
Thanks for your help. Looking forward to PyCon tomorrow.
Chris
Code:
from difflib import SequenceMatcher

class Misc(object):
    # No __hash__ here, so Misc inherits object's identity-based hash.
    def __init__(self, val):
        self.val = val
    def __cmp__(self, other):
        return cmp(self.val, other.val)
    def __str__(self):
        return str(self.val)

def test_sm(seqA, seqB):
    # new sequence matcher with no junk defined
    sm = SequenceMatcher(None, seqA, seqB)
    blockList = sm.get_matching_blocks()
    for block in blockList:
        print "a[%d] and b[%d] match for %d elements" % block

if __name__ == '__main__':
    # define 2 very similar sequences with a cmp operator
    objA = [Misc(1), Misc(2), Misc(3), Misc(96), Misc(24)]
    objB = [Misc(1), Misc(2), Misc(3), Misc(42), Misc(24)]
    intA = [1, 2, 3, 96, 24]
    intB = [1, 2, 3, 42, 24]
    print "object matches"
    test_sm(objA, objB)
    print "integer matches"
    test_sm(intA, intB)
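For readers on Python 3, the situation changes: __cmp__ is gone, and a class that defines __eq__ without __hash__ gets __hash__ set to None, which makes it unhashable. The sketch below assumes a straight port of Misc that uses __eq__ in place of __cmp__ (an assumption about how one would port it; the variable `error` is illustrative). SequenceMatcher then fails loudly while building its internal dictionary instead of silently dropping matches:

```python
from difflib import SequenceMatcher


class Misc:
    """Assumed Python 3 port: __eq__ replaces __cmp__, still no __hash__."""

    def __init__(self, val):
        self.val = val

    def __eq__(self, other):
        return self.val == other.val

    def __str__(self):
        return str(self.val)


print(Misc.__hash__ is None)  # True: defining __eq__ disabled hashing

objA = [Misc(v) for v in (1, 2, 3, 96, 24)]
objB = [Misc(v) for v in (1, 2, 3, 42, 24)]

error = None
try:
    # The constructor indexes the second sequence in a dict right away.
    SequenceMatcher(None, objA, objB)
except TypeError as exc:
    error = exc
print("TypeError:", error)  # TypeError: unhashable type: 'Misc'
```

Adding a __hash__ that returns hash(self.val), as in the post, makes the port work and gives the same blocks as the integer lists.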
--
Chris Green <cmg at dok.org>
Fame may be fleeting but obscurity is forever.