help in algorithm
Bengt Richter
bokr at oz.net
Thu Aug 11 14:19:12 EDT 2005
On Wed, 10 Aug 2005 16:51:55 +0200, Paolino <paolo_veronelli at tiscali.it> wrote:
>I have a self-organizing net whose aim is clustering words.
>Suppose the clustering is based on their sets of 2-grams.
>Words are then instances of this class:
>
>class clusterable(str):
>    def __abs__(self):  # the set of q-grams (to be calculated only once)
>        return set([(self+self[0])[n:n+2] for n in range(len(self))])
>    def __sub__(self, other):  # the q-grams distance between 2 words
>        set1 = abs(self)
>        set2 = abs(other)
>        return len(set1|set2) - len(set1&set2)
>
>I'm looking for the medium of a set of words, i.e. the word which
>minimizes the sum of the distances from those words.
>
>In other words, the medium that minimizes: sum([medium-word for word in words])
>
>
>Thanks for ideas, Paolino
>
Just wondering if this is a desired result:
>>> clusterable('banana')-clusterable('bananana')
0
i.e., resulting from
>>> abs(clusterable('banana'))-abs(clusterable('bananana'))
set([])
>>> abs(clusterable('banana'))
set(['na', 'ab', 'ba', 'an'])
>>> abs(clusterable('bananana'))
set(['na', 'ab', 'ba', 'an'])
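If that zero distance is not what is wanted, one possible alternative is to count the 2-grams with multiplicity instead of collapsing them into a set. The following is only a rough sketch under that assumption, written for Python 3; the names qgrams, distance and medoid are hypothetical and not from the original post. It also assumes the candidate "medium" is picked from the given words themselves, which may or may not be what was intended.

```python
from collections import Counter

def qgrams(word, q=2):
    """Multiset of cyclic q-grams; the word wraps around as in the quoted class."""
    wrapped = word + word[:q - 1]
    return Counter(wrapped[n:n + q] for n in range(len(word)))

def distance(a, b):
    """Size of the multiset symmetric difference of the two q-gram counts."""
    ca, cb = qgrams(a), qgrams(b)
    # Counter subtraction keeps only positive counts, so add both directions.
    return sum(((ca - cb) + (cb - ca)).values())

def medoid(words):
    """Word from `words` with the smallest summed distance to all of them.

    Assumption: candidates are restricted to the input words.
    """
    return min(words, key=lambda cand: sum(distance(cand, w) for w in words))

print(distance('banana', 'bananana'))            # 2, no longer 0
print(medoid(['banana', 'bananana', 'banan']))   # banana
```

With counts, 'bananana' differs from 'banana' by one extra 'an' and one extra 'na', hence the distance of 2. The brute-force medoid search is quadratic in the number of words, which is probably fine for small clusters.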
Regards,
Bengt Richter