Fuzzy string matching?

Magnus L. Hetland mlh at idt.ntnu.no
Sat Aug 28 11:28:55 EDT 1999


"John Machin" <sjmachin at lexicon.net> writes:

> On 26 Aug 99, at 21:31, Magnus L. Hetland wrote:
> 
> > mlh at idt.ntnu.no (Magnus L. Hetland) writes:
> > 
[...]
> 
> Magnus,
> One presumes that your first assignment for your students is to go to 
> the library and/or think a little and come back with the well-known 
> O(min(m,n)) instead of O(mn) space requirement optimisation to the 
> algorithm you presented and therefore you would prefer not to have it 
> posted to this list, just in case they read it.

Nope. I can only plead ignorance here... :)

(I'm not really the lecturer - just responsible for the exercises,
so... ;)

> Oh and for what it's 
> worth, although the time is still O(mn), it decreases --- I have yet to 
> see a language where foo[i,j] takes no more time than bar[j]. Python, 
> with foo being a dictionary and bar being a list, is definitely not 
> such a language.

I see.

I didn't know *that* much about this algorithm, really - not even that
the measure was called Levinshtein distance - until I did some
research on the net... (Nice thing that, the net :)

Found some nice stuff at

  http://www.dir.univ-rouen.fr/~charras/seqcomp/

which had several algorithms for sequence comparisons (useful for
fuzzy matching). I didn't find any description of how to reduce the
space requirements, thoug... So I had to figure that out myself ;)

A new and improved version:

def distance(a,b):
    n = len(a); m = len(b)
    if n>m: a,b,n,m = b,a,m,n
    c = []
    for i in range(0,n+1):
        c.append(i)
    for i in range(1,m+1):
        p = c; c = [i]
        for j in range(1,n+1):
            x = p[j]+1
            y = c[j-1]+1
            if a[j-1] == b[i-1]:
                z = p[j-1]
            else:
                z = p[j-1]+1
            c.append(min(x,y,z))
    return c[n]


Could perhaps be a bit more readable, though... (Maybe I'll work on
that...)

> Cheers,
> John
> 

--

  Magnus              Making no sound / Yet smouldering with passion
  Lie          The firefly is still sadder / Than the moaning insect
  Hetland                                       : Minamoto Shigeyuki




More information about the Python-list mailing list