Fuzzy string matching?

Magnus L. Hetland mlh at idt.ntnu.no
Sat Aug 28 17:28:55 CEST 1999

```"John Machin" <sjmachin at lexicon.net> writes:

> On 26 Aug 99, at 21:31, Magnus L. Hetland wrote:
>
> > mlh at idt.ntnu.no (Magnus L. Hetland) writes:
> >
[...]
>
> Magnus,
> One presumes that your first assignment for your students is to go to
> the library and/or think a little and come back with the well-known
> O(min(m,n)) instead of O(mn) space requirement optimisation to the
> algorithm you presented and therefore you would prefer not to have it
> posted to this list, just in case they read it.

Nope. I can only plead ignorance here... :)

(I'm not really the lecturer - just responsible for the exercises,
so... ;)

> Oh and for what it's
> worth, although the time is still O(mn), it decreases --- I have yet to
> see a language where foo[i,j] takes no more time than bar[j]. Python,
> with foo being a dictionary and bar being a list, is definitely not
> such a language.

I see.

I didn't know *that* much about this algorithm, really - not even that
the measure was called Levinshtein distance - until I did some
research on the net... (Nice thing that, the net :)

Found some nice stuff at

http://www.dir.univ-rouen.fr/~charras/seqcomp/

which had several algorithms for sequence comparisons (useful for
fuzzy matching). I didn't find any description of how to reduce the
space requirements, thoug... So I had to figure that out myself ;)

A new and improved version:

def distance(a,b):
n = len(a); m = len(b)
if n>m: a,b,n,m = b,a,m,n
c = []
for i in range(0,n+1):
c.append(i)
for i in range(1,m+1):
p = c; c = [i]
for j in range(1,n+1):
x = p[j]+1
y = c[j-1]+1
if a[j-1] == b[i-1]:
z = p[j-1]
else:
z = p[j-1]+1
c.append(min(x,y,z))
return c[n]

Could perhaps be a bit more readable, though... (Maybe I'll work on
that...)

> Cheers,
> John
>

--

Magnus              Making no sound / Yet smouldering with passion
Lie          The firefly is still sadder / Than the moaning insect
Hetland                                       : Minamoto Shigeyuki

```