[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken
Kevin Jacobs <jacobs@bioinformed.com>
bioinformed at gmail.com
Wed Jul 7 03:47:41 CEST 2010
On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> [Also posted to http://bugs.python.org/issue2986
> A much faster way to find the first mismatch would be
> i = 0
> while first[i] == second[i]:
> i+=1
> The match ratio, based on the initial matching prefix only, is spuriously
> low.
>
>
I don't have much experience with the Python sequence matcher, but many
classical edit distance and alignment algorithms benefit from stripping any
common prefix and suffix before engaging in heavy-lifting. This is
trivially optimal for Hamming-like distances and easily shown to be for
Levenshtein and Damerau type distances.
-Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20100706/d91dcdcc/attachment.html>
More information about the Python-Dev
mailing list