How fuzzy is get_close_matches() in difflib?

John Henry john106henry at hotmail.com
Fri Nov 17 14:00:39 EST 2006


Steven D'Aprano wrote:

<snip>

>
> >>> s = difflib.SequenceMatcher(None, "HIDEDCT1", "HIDESCT1")
> >>> t = difflib.SequenceMatcher(None, "HIDEDST1", "HIDESCT1")
> >>>
> >>> for block in s.get_matching_blocks():
> ...     print "a[%d] and b[%d] match for %d elements" % block
> ...
> a[0] and b[0] match for 4 elements
> a[5] and b[5] match for 3 elements
> a[8] and b[8] match for 0 elements
> >>>
> >>> for block in t.get_matching_blocks():
> ...     print "a[%d] and b[%d] match for %d elements" % block
> ...
> a[0] and b[0] match for 4 elements
> a[5] and b[4] match for 1 elements
> a[6] and b[6] match for 2 elements
> a[8] and b[8] match for 0 elements
> >>>
>
> I think what you are seeing is an artifact of the precise details of the
> pattern matcher. Feel free to subclass the SequenceMatcher or Differ
> classes to get the results you expect :-)
>
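For readers on Python 3, here is a sketch of the quoted session rewritten with the print() function. It is not part of the original thread. It also prints ratio(), which get_close_matches() uses as its score. The candidate is passed as sequence a and the target word as sequence b, which matches how CPython's get_close_matches() sets up its matcher. Both candidates share 7 matched characters out of 16 total, so ratio() = 2*7/16 = 0.875 for each. The score alone therefore cannot rank HIDEDCT1 ahead of HIDEDST1.

```python
import difflib

word = "HIDESCT1"
ratios = {}
for candidate in ("HIDEDCT1", "HIDEDST1"):
    # Candidate is sequence a, the target word is sequence b.
    sm = difflib.SequenceMatcher(None, candidate, word)
    for block in sm.get_matching_blocks():
        # Each block is a Match(a, b, size) named tuple, so % formatting works.
        print("a[%d] and b[%d] match for %d elements" % block)
    # ratio() is 2*M/T: M = matched characters, T = combined length.
    ratios[candidate] = sm.ratio()
    print(candidate, ratios[candidate])  # 0.875 for both candidates

# Both candidates pass the default cutoff of 0.6 with the same score.
matches = difflib.get_close_matches(word, list(ratios))
print(matches)
```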

Looks like, for this particular case, looking at the number of matched
groups worked better.
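One way to act on that observation, sketched under assumptions: "matched groups" is taken to mean the non-empty blocks returned by get_matching_blocks(), and fewer groups is treated as the better match when ratio() ties. Both matched_groups() and close_matches_by_groups() are hypothetical helpers, not part of difflib.

```python
import difflib


def matched_groups(word, candidate):
    """Hypothetical helper: count the non-empty matching blocks.

    The final (len(a), len(b), 0) sentinel block has size 0 and is skipped.
    """
    sm = difflib.SequenceMatcher(None, candidate, word)
    return sum(1 for block in sm.get_matching_blocks() if block.size > 0)


def close_matches_by_groups(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical variant of get_close_matches().

    Keeps candidates whose ratio() reaches the cutoff, then sorts by
    ratio (highest first) and breaks ties by fewer matched groups.
    """
    scored = []
    for candidate in possibilities:
        sm = difflib.SequenceMatcher(None, candidate, word)
        score = sm.ratio()
        if score >= cutoff:
            # Negate the score so an ascending sort puts the best first.
            scored.append((-score, matched_groups(word, candidate), candidate))
    scored.sort()
    return [candidate for _, _, candidate in scored[:n]]


print(close_matches_by_groups("HIDESCT1", ["HIDEDST1", "HIDEDCT1"]))
```

Sorting on (-score, groups, candidate) keeps ratio() as the primary key, so the group count only breaks ties. Here HIDEDCT1 matches in 2 groups and HIDEDST1 in 3, so HIDEDCT1 comes first. Weighting the group count more heavily than the ratio would be a different design and is not tested here.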



