gregpinero at gmail.com
Tue Jan 31 22:24:46 CET 2006
I wonder which algorithm determines the similarity between two strings better?
On 1/31/06, Kent Johnson <kent at kentsjohnson.com> wrote:
> Gregory Piñero wrote:
> > Ok, ok, I got it! The Pythonic way is to use an existing library ;-)
> > import difflib
> > CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7)
> > I wrote a script to delete duplicate mp3's by filename a few years
> > back with this. If anyone's interested in seeing it, I'll post a blog
> > entry on it. I'm betting it uses a similiar algorithm your functions.
> A quick trip to difflib.py produces this description of the matching
> The basic
> algorithm predates, and is a little fancier than, an algorithm
> published in the late 1980's by Ratcliff and Obershelp under the
> hyperbolic name "gestalt pattern matching". The basic idea is to find
> the longest contiguous matching subsequence that contains no "junk"
> elements (R-O doesn't address junk). The same idea is then applied
> recursively to the pieces of the sequences to the left and to the
> right of the matching subsequence.
> So no, it doesn't seem to be using Levenshtein distance.
Chief Innovation Officer
More information about the Python-list