# how to find difference in number of characters

Peter Otten __peter__ at web.de
Sat Oct 9 11:45:30 CEST 2010

```harryos wrote:

> I am trying to write a compare method which takes two strings and find
> how many characters have changed.
>
>
> def compare_strings(s1,s2):
>     pass
>
>
> text1="goat milk"
> text2="cow milk"
> print compare_strings(text1,text2)
>
> This must give 3 ,since 3 characters are changed between strings.I was
> advised to use levenshtein algorithm ..but then the matrix ops take a
> long time for nontrivial strings of say 20000 characters ..Can this

I tried it with a string of 20000 chars and the python-levenshtein package
that comes with ubuntu. It took about one second to calculate the distance:

import functools
import random
import string
import time

from Levenshtein import distance

def make_strings(n, delete, insert, swap, replace,
charset=string.ascii_letters):
def gen_chars():
while True:
yield random.choice(charset)
chars = gen_chars()
a = [next(chars) for i in xrange(n)]
s = "".join(a)
for i in range(delete):
del a[random.randrange(len(a))]
for i in range(insert):
a.insert(random.randrange(len(a)+1), next(chars))
for i in range(swap):
p = random.randrange(len(a)-1)
a[p], a[p+1] = a[p+1], a[p]
for i in range(replace):
a[random.randrange(len(a))] = next(chars)
t = "".join(a)
return s, t

N = 20000
M = 100
ms = functools.partial(make_strings, N, M, M, M, M)

def measure(f, *args):
start = time.time()
try:
return f(*args)
finally:
print time.time() - start

if __name__ == "__main__":
import sys
args = sys.argv[1:]
if args:
N, M = map(int, args)

s, t = make_strings(N, M, M, M, M)
print measure(distance, s, t)

\$ python levenshtein_demo.py 10000 1000
0.225363969803
3644
\$ python levenshtein_demo.py 20000 1000
1.05217313766
4197
\$ python levenshtein_demo.py 30000 1000
2.38736391068
4390
\$ python levenshtein_demo.py 40000 1000
4.1686527729
4558

What would be an acceptable time?

Peter

```