difflib

Wed Apr 25 20:21:17 EDT 2001

[Mikkel Rasmussen]
> I have written my own sequence matcher as a part of my master thesis in
> computational linguistics (I have made a spelling checker for
> dyslexics), so I have a fair amount of information on various fuzzy
> match algorithms, but I couldn't find anything useful via Google on the
> difflib algorithm. The origin of the algorithm wasn't exactly clear
> from the documentation :-)

I expect it's been rediscovered independently many times.  If you want a
history lesson, reading Knuth is *always* more productive than reading
comments in production libraries, and even if Knuth doesn't mention the topic
at hand <wink>.

> I'm going to test your algorithm later today to see if it
> outperforms my own both in quality and speed. Quality is the most
> important. Speed is just nice.

You later reported getting good results.  I was surprised!  SequenceMatcher
was written for ndiff.py specifically:  something seeking to reconstruct how
one well-formed version of a text document got transformed into a another
well-formed version.  It's not at all trying to recover from noise or
mistakes, and so has no interest in being graceful in the presence of, e.g.,
transpositions or reversals.  Insertions and deletions, yes.  So I wouldn't
expect it to be particularly effective for a spell-checker.