help in algorithm

Bengt Richter bokr at oz.net
Thu Aug 11 20:19:12 CEST 2005

```On Wed, 10 Aug 2005 16:51:55 +0200, Paolino <paolo_veronelli at tiscali.it> wrote:

>I have a self-organizing net whose aim is clustering words.
>Let's say the clustering is based on their sets of 2-grams.
>Words are then instances of this class.
>
>class clusterable(str):
>   def __abs__(self):# the set of q-grams (to be calculated only once)
>     return set([(self+self[0])[n:n+2] for n in range(len(self))])
>   def __sub__(self,other): # the q-grams distance between 2 words
>     set1=abs(self)
>     set2=abs(other)
>     return len(set1|set2)-len(set1&set2)
>
>I'm looking for the medium of a set of words, that is, the word which
>minimizes the sum of the distances from those words.
>
>I.e., the medium minimizes: sum([medium-word for word in words])
>
>
>Thanks for ideas, Paolino
>
Just wondering if this is a desired result:

>>> clusterable('banana')-clusterable('bananana')
0

i.e., resulting from

>>> abs(clusterable('banana'))-abs(clusterable('bananana'))
set([])
>>> abs(clusterable('banana'))
set(['na', 'ab', 'ba', 'an'])
>>> abs(clusterable('bananana'))
set(['na', 'ab', 'ba', 'an'])

Regards,
Bengt Richter

```
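For anyone following along on a current Python, here is a minimal sketch of the same 2-gram distance written with plain functions, plus a brute-force search for the "medium" word. The names `qgrams`, `distance` and `medium_of` are hypothetical and not from the original post, and the search assumes the candidate must be one of the given words, which the original question does not actually state.

```python
def qgrams(word):
    """Set of 2-grams of word, wrapping around to the first character
    (same construction as the quoted __abs__ method)."""
    if not word:
        return set()
    padded = word + word[0]
    return {padded[n:n + 2] for n in range(len(word))}


def distance(a, b):
    """2-gram distance: len(s1 | s2) - len(s1 & s2), which is the size
    of the symmetric difference of the two sets."""
    return len(qgrams(a) ^ qgrams(b))


def medium_of(words):
    """Assumption: the candidate is restricted to the given words.
    Returns the word with the smallest summed distance to all words."""
    return min(words, key=lambda cand: sum(distance(cand, w) for w in words))


# The collision pointed out above: both words have the same 2-gram set.
print(distance('banana', 'bananana'))               # 0
print(medium_of(['banana', 'bandana', 'cabana']))   # banana
```

Because the distance only looks at the set of 2-grams, any two words with the same set are indistinguishable, so the minimizer is in general not unique.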