Fuzzy string matching?
Magnus L. Hetland
mlh at idt.ntnu.no
Sat Aug 28 11:28:55 EDT 1999
"John Machin" <sjmachin at lexicon.net> writes:
> On 26 Aug 99, at 21:31, Magnus L. Hetland wrote:
>
> > mlh at idt.ntnu.no (Magnus L. Hetland) writes:
> >
[...]
>
> Magnus,
> One presumes that your first assignment for your students is to go to
> the library and/or think a little and come back with the well-known
> O(min(m,n)) instead of O(mn) space requirement optimisation to the
> algorithm you presented and therefore you would prefer not to have it
> posted to this list, just in case they read it.
Nope. I can only plead ignorance here... :)
(I'm not really the lecturer - just responsible for the exercises,
so... ;)
> Oh and for what it's
> worth, although the time is still O(mn), it decreases --- I have yet to
> see a language where foo[i,j] takes no more time than bar[j]. Python,
> with foo being a dictionary and bar being a list, is definitely not
> such a language.
I see.
I didn't know *that* much about this algorithm, really - not even that
the measure was called Levinshtein distance - until I did some
research on the net... (Nice thing that, the net :)
Found some nice stuff at
http://www.dir.univ-rouen.fr/~charras/seqcomp/
which had several algorithms for sequence comparisons (useful for
fuzzy matching). I didn't find any description of how to reduce the
space requirements, thoug... So I had to figure that out myself ;)
A new and improved version:
def distance(a,b):
n = len(a); m = len(b)
if n>m: a,b,n,m = b,a,m,n
c = []
for i in range(0,n+1):
c.append(i)
for i in range(1,m+1):
p = c; c = [i]
for j in range(1,n+1):
x = p[j]+1
y = c[j-1]+1
if a[j-1] == b[i-1]:
z = p[j-1]
else:
z = p[j-1]+1
c.append(min(x,y,z))
return c[n]
Could perhaps be a bit more readable, though... (Maybe I'll work on
that...)
> Cheers,
> John
>
--
Magnus Making no sound / Yet smouldering with passion
Lie The firefly is still sadder / Than the moaning insect
Hetland : Minamoto Shigeyuki
More information about the Python-list
mailing list