Fuzzy string matching?
Magnus L. Hetland
mlh at idt.ntnu.no
Sat Aug 28 13:31:23 EDT 1999
Moshe Zadka <moshez at server.python.net> writes:
> On 28 Aug 1999, Magnus L. Hetland wrote:
[...]
> > Could perhaps be a bit more readable, though... (Maybe I'll work on
> > that...)
>
> No need, my good man. We have a saying in Hebrew "The work of the rightous
> gets done by others"
OK - thanks :)
>
> Added a comment, made a few variables more meaningful, precalculated
> ranges, changed some ; to , and here is the changed, and a bit tested,
> version:
Awright... I guess it's bit more readable now... However, there are a
few things I think I would do differently - surly only a matter of
taste... (and not meant as a sign of ingratitude :)
>
> def distance(a,b):
> n, m = map(len, (a,b))
Is the use of map really clearer than not using it here, with only two
variables? (It may be faster since you don't have to look up len
twice, but...)
> if n>m: # Make sure n<=m
> a,b,n,m = b,a,m,n
I guess I wrote this statement in the original - but I think perhaps I
would split the strings and lengths into two different assignments to
make it more readable, really...
> range_m, range_n= map(range, (m, n))
Hm. Why? Why is range_m better than range(m)?
> curr = range(0, n+1)
> for i in range_m:
> prev, curr = curr, [i+1]
> for j in range_n:
> add, dele = prev[j+1]+1, curr[j]+1
> change = prev[j] + (a[j]!=b[i] and 1 or 0)
Hm. This is perhaps more compact - and the and/or thing *is* an idiom,
but I don't think it's more readable than a plain if-statement...
> curr.append(min(add, dele, change))
> return curr[n]
>
Here's another take - mainly an expression of my tastes in the matter
(for lack of anything more meaningful to do in the evening... ;)
def distance(a,b):
"Calculates the Levenshtein distance between a and b."
n, m = len(a), len(b)
if n > m:
# Make sure n <= m, to use O(min(n,m)) space
a,b = b,a
n,m = m,n
current = range(0, n+1)
for i in range(m):
previous, current = current, [i+1]
for j in range(n):
add, delete = previous[j+1]+1, current[j]+1
change = previous[j]
if a[j] != b[i]:
change = change + 1
current.append(min(add, delete, change))
return current[n]
Oh well... There's nothing like fiddling with a little piece of code.
Much more fun than writing large programs methinks :)
--
Magnus Making no sound / Yet smouldering with passion
Lie The firefly is still sadder / Than the moaning insect
Hetland : Minamoto Shigeyuki
More information about the Python-list
mailing list