How fuzzy is get_close_matches() in difflib?
john106henry at hotmail.com
Fri Nov 17 19:44:02 CET 2006
I suppose you are right. I guess I ended up with an odd case.
I was thinking along these lines:
To change "HIDE*S*ST1" into "HIDE*D*ST1", all you do is remove the "*S*"
from the source and the "*D*" from the target.
To change "HIDE*SC*T1" into "HIDE*DS*T1", I thought you had to
remove two characters, *SC*, from the source. Then I realized that this
is not true. If you remove the "C" from the source and the "D" from the
*DS* of the target, the two strings match (!)
So, yes, they have the same distance!
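
A quick way to check this is to compute the similarity scores directly.
The sketch below assumes that get_close_matches() ranks candidates by
SequenceMatcher.ratio(); the helper name score() and the variable names
are my own, made up for illustration.

```python
from difflib import SequenceMatcher, get_close_matches

candidates = ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]
word = "HIDESCT1"

def score(word, candidate):
    # Hypothetical helper: similarity ratio between a candidate and the word.
    # ratio() is 2.0 * M / T, where M is the number of matched characters
    # and T is the total length of both strings.
    return SequenceMatcher(None, candidate, word).ratio()

scores = {c: score(word, c) for c in candidates}
for candidate, ratio in scores.items():
    print(candidate, ratio)

# The two best candidates share the same ratio, so which one is listed
# first depends on how ties are ordered, not on one being "closer".
print(get_close_matches(word, candidates))
```

Both "HIDEDST1" and "HIDEDCT1" match 7 of the 8 characters, so both
score 2 * 7 / 16 = 0.875 against "HIDESCT1".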
Antoon Pardon wrote:
> On 2006-11-17, John Henry <john106henry at hotmail.com> wrote:
> > I encountered a case where I am trying to match "HIDESST1" and
> > "HIDESCT1" against ["HIDEDST1", "HIDEDCT1", "HIDEDCT2", "HIDEDCT3"]
> > Well, they both hit "HIDEDST1" as the first match which is not exactly
> > the result I was looking for. I don't understand why "HIDESCT1" would
> > not hit "HIDEDCT1" as a first choice.
>     H I D E D S T 1     H I D E D C T 1
>   H .                   .
>   I   .                   .
>   D     .                   .
>   E       .                   .
>   S           .
>   C                               .
>   T             .                   .
>   1               .                   .
> As far as I can see the distance of HIDESCT1 to HIDEDCT1 is
> the same as the distance of HIDESCT1 to HIDEDST1. In both
> cases you have to remove one character from the target as well
> as one character from the candidate in order to get the
> same substring.
> Antoon Pardon