[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Wed Jul 7 03:47:41 CEST 2010


On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> [Also posted to http://bugs.python.org/issue2986
> A much faster way to find the first mismatch would be
>   i = 0
>   while first[i] == second[i]:
>      i+=1
> The match ratio, based on the initial matching prefix only, is spuriously
> low.
>
>
I don't have much experience with the Python sequence matcher, but many
classical edit distance and alignment algorithms benefit from stripping any
common prefix and suffix before engaging in heavy-lifting.  This is
trivially optimal for Hamming-like distances and easily shown to be for
Levenshtein and Damerau type distances.

-Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20100706/d91dcdcc/attachment.html>


More information about the Python-Dev mailing list