On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy <tjreedy@udel.edu> wrote:
[Also posted to http://bugs.python.org/issue2986]
A much faster way to find the first mismatch would be
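i = 0
# Stop at the end of the shorter sequence to avoid an IndexError.
while i < min(len(first), len(second)) and first[i] == second[i]:
    i += 1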
The match ratio, based on the initial matching prefix only, is spuriously low.
I don't have much experience with the Python sequence matcher, but many classical edit-distance and alignment algorithms benefit from stripping any common prefix and suffix before doing the heavy lifting. This is trivially optimal for Hamming-like distances and is easily shown to be optimal for Levenshtein- and Damerau-type distances.
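
To make the idea concrete, here is a rough sketch, assuming plain Levenshtein distance. The function names (strip_common_affixes, levenshtein, trimmed_levenshtein) are made up for illustration; none of this is part of difflib or SequenceMatcher, and I have not checked how it would interact with the ratio computation there.

```python
def strip_common_affixes(first, second):
    """Return (first_core, second_core, prefix_len, suffix_len).

    Hypothetical helper: trims the longest common prefix, then the longest
    common suffix of what remains, from two indexable sequences.
    """
    limit = min(len(first), len(second))

    # Longest common prefix.
    prefix = 0
    while prefix < limit and first[prefix] == second[prefix]:
        prefix += 1

    # Longest common suffix that does not overlap the prefix.
    suffix = 0
    while suffix < limit - prefix and first[-1 - suffix] == second[-1 - suffix]:
        suffix += 1

    first_core = first[prefix:len(first) - suffix]
    second_core = second[prefix:len(second) - suffix]
    return first_core, second_core, prefix, suffix


def levenshtein(a, b):
    """Textbook two-row dynamic-programming Levenshtein distance."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def trimmed_levenshtein(first, second):
    """Levenshtein distance computed only on the differing middle part."""
    first_core, second_core, _, _ = strip_common_affixes(first, second)
    return levenshtein(first_core, second_core)
```

For two long sequences that differ only in a small region, the quadratic step then runs on that region alone, and the linear scans for the prefix and suffix are cheap by comparison.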
-Kevin