<div class="gmail_quote">On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy <span dir="ltr"><<a href="mailto:tjreedy@udel.edu">tjreedy@udel.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
[Also posted to <a href="http://bugs.python.org/issue2986" target="_blank">http://bugs.python.org/issue2986</a><br>A much faster way to find the first mismatch would be<br>
i = 0<br>
while first[i] == second[i]:<br>
&nbsp;&nbsp;&nbsp;&nbsp;i += 1<br>
The match ratio, based on the initial matching prefix only, is spuriously low.<br><br></blockquote><div><br></div><div>I don't have much experience with the Python sequence matcher, but many classical edit-distance and alignment algorithms benefit from stripping any common prefix and suffix before doing the heavy lifting. Stripping trivially leaves the result unchanged for Hamming-like distances, and the same is easily shown for Levenshtein- and Damerau-type distances.</div>
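<div><br></div><div>A minimal sketch of the prefix-and-suffix stripping idea, assuming plain Python sequences that support indexing and slicing. The names strip_common_affixes, levenshtein and levenshtein_stripped are hypothetical helpers made up for illustration; none of this is taken from difflib or from the patch on the tracker. The prefix loop is bounded so that it stops cleanly when one sequence is a prefix of the other.</div>

```python
def strip_common_affixes(a, b):
    """Return (prefix_len, suffix_len, a_core, b_core) for two sequences.

    Hypothetical helper for illustration; not part of difflib.
    """
    limit = min(len(a), len(b))

    # Length of the common prefix. The bound check avoids an IndexError
    # when one sequence is a prefix of the other.
    prefix = 0
    while prefix != limit and a[prefix] == b[prefix]:
        prefix += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while suffix != limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1

    return prefix, suffix, a[prefix:len(a) - suffix], b[prefix:len(b) - suffix]


def levenshtein(a, b):
    """Plain row-by-row dynamic-programming Levenshtein distance."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def levenshtein_stripped(a, b):
    """Same distance, computed only on the parts that actually differ."""
    _, _, a_core, b_core = strip_common_affixes(a, b)
    return levenshtein(a_core, b_core)
```

<div>For two long inputs that differ only in a small middle region, the quadratic step then runs on the short cores rather than on the full sequences, while the distance returned is the same as for the unstripped inputs.</div>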
<div><br></div><div>-Kevin</div><div><br></div></div>