How fuzzy is get_close_matches() in difflib?
john106henry at hotmail.com
Fri Nov 17 07:59:41 CET 2006
I ran into a case where I was trying to match "HIDESST1" and
"HIDESCT1" against ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"].
They both hit "HIDEDST1" as the first match, which is not exactly
the result I was looking for. I don't understand why "HIDESCT1" would
not hit "HIDEDCT1" as its first choice.
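
For anyone who wants to reproduce this, here is a minimal sketch that prints
the score difflib assigns to each candidate. The helper name scores() is my
own, not part of difflib. It passes the candidate as the first sequence and
the search word as the second, which is the order get_close_matches() uses
internally.

```python
from difflib import SequenceMatcher, get_close_matches

candidates = ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]


def scores(word, possibilities):
    """Map each candidate to its SequenceMatcher ratio against word.

    Hypothetical helper for illustration; the candidate is seq1 and the
    word is seq2, matching the order used inside get_close_matches().
    """
    return {c: SequenceMatcher(None, c, word).ratio() for c in possibilities}


if __name__ == "__main__":
    for word in ("HIDESST1", "HIDESCT1"):
        # Default arguments: n=3 results, cutoff=0.6
        print(word, "->", get_close_matches(word, candidates))
        for cand, ratio in scores(word, candidates).items():
            print("    %s  %.3f" % (cand, ratio))
```

If the top two scores for "HIDESCT1" come out equal, then the order between
them is not a statement about closeness. As far as I can tell from the
CPython source, candidates are ranked as (score, string) pairs, so a tie in
score falls back to comparing the strings themselves.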
Steven D'Aprano wrote:
> On Thu, 16 Nov 2006 20:19:50 -0800, John Henry wrote:
> > I did try them and I am impressed. It helped me find a lot of useful
> > info. I just want to get a feel as to what constitutes a "match".
> The source code has lots of comments, but they don't explain the basic
> algorithm (at least not in the difflib.py supplied with Python 2.3).
> There is no single diff algorithm, but I believe that the basic idea is to
> look for insertions and/or deletions of strings. If you want more
> detail, google "diff". Once you have a list of differences, the closest
> match is the search string with the fewest differences.
> As for getting a feel of what constitutes a match, I really can't make any
> better suggestion than just try lots of examples with the interactive
> Python shell.
> Steven D'Aprano
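
To get a feel for what counts as a "match" beyond trial and error, it may
help to look at the matching blocks that SequenceMatcher finds. The sketch
below rests on the documented definition of ratio() as 2.0*M/T, where M is
the number of matched characters and T is the total length of both strings.
The function name explain() is made up for this example.

```python
from difflib import SequenceMatcher


def explain(candidate, word):
    """Return (matched substrings, score) for candidate vs. word.

    Hypothetical helper: recomputes the score by hand from the matching
    blocks, using the documented formula ratio = 2.0 * M / T.
    """
    sm = SequenceMatcher(None, candidate, word)
    # get_matching_blocks() ends with a zero-length sentinel; skip it.
    blocks = [candidate[m.a:m.a + m.size]
              for m in sm.get_matching_blocks() if m.size]
    matched = sum(len(b) for b in blocks)
    score = 2.0 * matched / (len(candidate) + len(word))
    return blocks, score


if __name__ == "__main__":
    for cand in ("HIDEDST1", "HIDEDCT1"):
        print(cand, explain(cand, "HIDESCT1"))
```

Note that the "S" in "HIDEDST1" still counts as a matched character even
though it sits one position away from the "S" in "HIDESCT1", so the score
measures shared ordered characters rather than position-by-position
agreement.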