String comparison
S.Selvam Siva
s.selvamsiva at gmail.com
Mon Jan 26 07:30:05 CET 2009
Thank You Gabriel,
On Sun, Jan 25, 2009 at 7:12 AM, Gabriel Genellina
<gagsl-py2 at yahoo.com.ar> wrote:
> On Sat, 24 Jan 2009 15:08:08 -0200, S.Selvam Siva <s.selvamsiva at gmail.com>
> wrote:
>
>
>> I am developing a spell checker for my local language (Tamil) using Python.
>> I need to generate a list of alternative words for a misspelled word from the
>> dictionary of words. The alternatives must be as close as possible to the
>> misspelled word. As we know, ordinary string comparison won't work here.
>> Any suggestion for this problem is welcome.
>>
>
> I think it would be better to add Tamil support to some existing library like
> GNU aspell: http://aspell.net/
That was my plan earlier, but I am not sure how aspell integrates with other
editors. I will ask on the aspell mailing list.
> You are looking for "fuzzy matching":
> http://en.wikipedia.org/wiki/Fuzzy_string_searching
> In particular, the Levenshtein distance is widely used; I think there is a
> Python extension providing those calculations.
>
> --
> Gabriel Genellina
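As a quick baseline that needs no extension module, the standard library's difflib offers get_close_matches. Note that it ranks candidates by SequenceMatcher's similarity ratio, which is not the Levenshtein distance, so treat this only as a rough sketch of fuzzy matching. The English word list is made up for illustration:

```python
import difflib

# Illustrative word list (an assumption); a real spell checker would
# load its dictionary from a file.
words = ["spelling", "spilling", "selling", "spell", "smelling"]

# Return up to n candidates whose similarity ratio is at least cutoff,
# best match first.
matches = difflib.get_close_matches("speling", words, n=3, cutoff=0.6)
print(matches)
```

Raising cutoff makes the matching stricter; lowering it admits more distant words.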
The following code served my purpose (thanks to some unknown contributors):
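import sys

def distance(a, b):
    # Levenshtein distance: c[i, j] is the edit distance between a[:i] and b[:j].
    c = {}
    n = len(a); m = len(b)
    for i in range(0, n + 1):
        c[i, 0] = i
    for j in range(0, m + 1):
        c[0, j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x = c[i-1, j] + 1          # deletion
            y = c[i, j-1] + 1          # insertion
            if a[i-1] == b[j-1]:
                z = c[i-1, j-1]        # characters match
            else:
                z = c[i-1, j-1] + 1    # substitution
            c[i, j] = min(x, y, z)
    return c[n, m]

# Note: under Python 2, decode sys.argv to unicode first so that Tamil text
# is compared per character rather than per byte.
a = sys.argv[1]
b = sys.argv[2]
d = distance(a, b)
print("d= %d" % d)
longer = float(max((len(a), len(b))))
shorter = float(min((len(a), len(b))))
r = ((longer - d) / longer) * (shorter / longer)
# r ranges between 0 and 1 (undefined when both strings are empty, since longer == 0)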
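
To get from a single distance to a list of alternatives, one option is to score every dictionary word against the misspelled word and keep the closest few. This is only a minimal sketch, assuming the dictionary fits in memory as a list of strings. The names levenshtein, suggest, words and limit are made up for illustration. It uses a two-row variant of the same recurrence so that the block stands on its own:

```python
def levenshtein(a, b):
    # Same recurrence as the table-based version, keeping only two rows.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def suggest(misspelled, words, limit=5):
    # Hypothetical helper: rank candidates by edit distance, smallest first.
    # Ties are broken alphabetically so the result is deterministic.
    scored = sorted((levenshtein(misspelled, w), w) for w in words)
    return [w for _, w in scored[:limit]]


if __name__ == "__main__":
    # Illustrative English word list (an assumption, not real dictionary data).
    words = ["spelling", "spilling", "selling", "spell", "smelling"]
    print(suggest("speling", words, limit=3))
```

Scanning the whole dictionary costs one distance computation per word, which may be too slow for a large word list. Filtering candidates by length first is one possible way to cut that down.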
--
Yours,
S.Selvam