difflib

Mikkel Rasmussen footech at get2net.dk
Sun Apr 22 06:54:30 EDT 2001


Does anybody have any references for the algorithm used in difflib? The
documentation says:

"The basic algorithm predates, and is a little fancier than, an algorithm
published in the late 1980's by Ratcliff and Obershelp under the hyperbolic
name ``gestalt pattern matching.'' The idea is to find the longest
contiguous matching subsequence that contains no ``junk'' elements (the
Ratcliff and Obershelp algorithm doesn't address junk). The same idea is
then applied recursively to the pieces of the sequences to the left and to
the right of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that ``look right'' to people."

There is also a link to Dr. Dobb's Journal, but the article is only
available on CD-ROM.
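To make the quoted description concrete, here is a rough Python sketch of the recursive idea. It finds the longest contiguous matching block, then recurses on the pieces to its left and to its right. This is a simplification and not difflib's actual code. It ignores junk entirely (including the autojunk heuristic), and the helper names `longest_match`, `matching_blocks` and `gestalt_ratio` are hypothetical, made up for illustration. Under those assumptions its blocks should agree with `SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()`, minus the trailing zero-size sentinel that difflib appends.

```python
from difflib import SequenceMatcher


def longest_match(a, b, alo, ahi, blo, bhi):
    """Longest block a[i:i+k] == b[j:j+k] inside a[alo:ahi], b[blo:bhi].

    Returns (i, j, k). Ties go to the smallest i, then the smallest j.
    Plain O(len(a) * len(b)) dynamic programming; no junk handling (assumption).
    """
    best_i, best_j, best_k = alo, blo, 0
    prev = {}  # prev[j]: length of the common run ending at a[i-1] and b[j]
    for i in range(alo, ahi):
        cur = {}
        for j in range(blo, bhi):
            if a[i] == b[j]:
                # Extend the run ending at a[i-1], b[j-1] by one element.
                k = cur[j] = prev.get(j - 1, 0) + 1
                if k > best_k:  # strict '>' keeps the earliest of equal-size blocks
                    best_i, best_j, best_k = i - k + 1, j - k + 1, k
        prev = cur
    return best_i, best_j, best_k


def matching_blocks(a, b):
    """Hypothetical helper: take the longest block, then recurse on the
    unmatched pieces to its left and to its right."""
    blocks = []

    def recurse(alo, ahi, blo, bhi):
        i, j, k = longest_match(a, b, alo, ahi, blo, bhi)
        if k:
            recurse(alo, i, blo, j)          # pieces to the left
            blocks.append((i, j, k))
            recurse(i + k, ahi, j + k, bhi)  # pieces to the right

    recurse(0, len(a), 0, len(b))
    return blocks  # left-first recursion yields blocks in increasing order


def gestalt_ratio(a, b):
    """Ratcliff/Obershelp-style similarity: 2 * matched / total length."""
    total = len(a) + len(b)
    matched = sum(k for _, _, k in matching_blocks(a, b))
    return 2.0 * matched / total if total else 1.0


if __name__ == "__main__":
    a, b = "abxcd", "abcd"
    print(matching_blocks(a, b))  # [(0, 0, 2), (3, 2, 2)]
    sm = SequenceMatcher(None, a, b, autojunk=False)
    print(sm.get_matching_blocks())  # same blocks as Match tuples, plus a size-0 sentinel
    print(gestalt_ratio(a, b), sm.ratio())
```

The recursion is what makes the result "look right" rather than minimal: once the longest block is fixed, nothing on one side of it can ever be matched against the other side, even if that would produce a shorter edit script.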

Are there any explanations available elsewhere?


Mikkel Rasmussen
