[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Tim Peters tim.peters at gmail.com
Thu Jul 8 15:39:08 CEST 2010


[Antoine Pitrou]
> I don't think 2.7 should get any change at all here. Only 3.2 should be
> modified. As Tim said, difflib works ok for its intended use (regular
> text diffs).

That was the use case that drove the implementation, but it's going
too far to say that was the only "intended" case.  I believe (but
can't prove) that it remains the most common use (and overwhelmingly
so), but it was indeed _intended_ to work for any sequences of hashable
elements.

And it always did, and it still does, in the sense that it computes a
diff that transforms the first sequence into the second sequence.  The
problem is that I introduced a heuristic speedup with the primary use
case in mind that turned out to vastly damage the _quality_ of the
results for some other uses (a correct diff isn't necessarily a useful
diff - for example, "delete the entire sequence you started with, then
insert the entire new sequence" is a correct diff for any pair of
input sequences, but not a useful diff for most purposes).
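
The distinction between a correct diff and a useful one can be
sketched in a few lines. This is only an illustration, not code from
difflib itself: apply_opcodes is a hypothetical helper, the two sample
sequences are made up, and the autojunk keyword assumes a Python
version whose SequenceMatcher accepts it (it is not mentioned in this
message).

```python
from difflib import SequenceMatcher


def apply_opcodes(a, b, opcodes):
    """Rebuild b from a by following (tag, i1, i2, j1, j2) opcodes.

    Hypothetical helper for illustration; not part of difflib.
    """
    out = []
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            out.extend(a[i1:i2])      # keep this slice of a unchanged
        elif tag in ("replace", "insert"):
            out.extend(b[j1:j2])      # take the new elements from b
        # "delete" contributes nothing to the output
    return out


a = list("abcdefgh")
b = list("abXdefgYh")

# A diff computed by SequenceMatcher: it keeps the common runs.
useful = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()

# The degenerate diff: delete all of a, then insert all of b.
trivial = [
    ("delete", 0, len(a), 0, 0),
    ("insert", len(a), len(a), 0, len(b)),
]

# Both are "correct": each one transforms a into b.
print(apply_opcodes(a, b, useful) == b)
print(apply_opcodes(a, b, trivial) == b)
```

Both opcode lists rebuild b exactly, but only the first one tells a
reader which parts of the input survived.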

> Making it work for other uses is a new feature, not a bugfix.

Definitely not a new feature.  These other cases used to deliver much
better diffs, before I introduced the heuristic in question.  People
with these other cases are asking for a way to get the results they
used to get - and we know that's so because a few figured out they get
what they want just by (in effect) reverting the checkin (made about 8
years ago) that _introduced_ the heuristic.  So they're looking for a
way to restore older behavior, not to introduce new behavior.  Of
course this is obscured by the fact that the change happened so long
ago that I bet most of them don't know at first that it _was_ the old
behavior.
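
A rough way to see the effect being described, assuming a Python
version where SequenceMatcher takes the autojunk keyword (added after
this message was written; as the difflib documentation describes it,
the heuristic treats elements that are very frequent in a second
sequence of at least 200 items as junk). The input data here is
invented purely to trigger that condition.

```python
from difflib import SequenceMatcher

# Made-up input: a long second sequence built from only two distinct
# elements, so each one is far more frequent than the heuristic allows.
b = ["0", "1"] * 150
# The first sequence is the same thing with one extra leading element.
a = ["x"] + b

# Default behavior: the popularity heuristic is enabled.
with_heuristic = SequenceMatcher(None, a, b)

# Heuristic switched off, approximating the older behavior.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False)

print(with_heuristic.ratio())
print(without_heuristic.ratio())
```

With the heuristic off, the matcher finds the long common run and
reports the sequences as nearly identical; with it on, the reported
similarity is much lower even though the resulting diff is still a
correct one.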

