how to find difference in number of characters
Peter Otten
__peter__ at web.de
Sat Oct 9 05:45:30 EDT 2010
harryos wrote:
> I am trying to write a compare method which takes two strings and find
> how many characters have changed.
>
>
> def compare_strings(s1,s2):
> pass
>
>
> text1="goat milk"
> text2="cow milk"
> print compare_strings(text1,text2)
>
> This must give 3 ,since 3 characters are changed between strings.I was
> advised to use levenshtein algorithm ..but then the matrix ops take a
> long time for nontrivial strings of say 20000 characters ..Can this
I tried it with a string of 20000 chars and the python-levenshtein package
that comes with ubuntu. It took about one second to calculate the distance:
import functools
import random
import string
import time
from Levenshtein import distance
def make_strings(n, delete, insert, swap, replace,
charset=string.ascii_letters):
def gen_chars():
while True:
yield random.choice(charset)
chars = gen_chars()
a = [next(chars) for i in xrange(n)]
s = "".join(a)
for i in range(delete):
del a[random.randrange(len(a))]
for i in range(insert):
a.insert(random.randrange(len(a)+1), next(chars))
for i in range(swap):
p = random.randrange(len(a)-1)
a[p], a[p+1] = a[p+1], a[p]
for i in range(replace):
a[random.randrange(len(a))] = next(chars)
t = "".join(a)
return s, t
N = 20000
M = 100
ms = functools.partial(make_strings, N, M, M, M, M)
def measure(f, *args):
start = time.time()
try:
return f(*args)
finally:
print time.time() - start
if __name__ == "__main__":
import sys
args = sys.argv[1:]
if args:
N, M = map(int, args)
s, t = make_strings(N, M, M, M, M)
print measure(distance, s, t)
$ python levenshtein_demo.py 10000 1000
0.225363969803
3644
$ python levenshtein_demo.py 20000 1000
1.05217313766
4197
$ python levenshtein_demo.py 30000 1000
2.38736391068
4390
$ python levenshtein_demo.py 40000 1000
4.1686527729
4558
What would be an acceptable time?
Peter
More information about the Python-list
mailing list