[Spambayes] sharing wordlists - better numbers

bill parducci bill at parducci.net
Tue May 27 09:02:47 EDT 2003


Brad Clements wrote:
> The presumption is that members of a group, by sharing the same language and 
> similar social settings will have a higher percentage of their words in common. So if 
> 95% of your words are also in Alex's list, and 92% of his words are in your list, that's 
> a lot better than only 50-50.

even if this were 100%...

> Note that this doesn't mean members of the group weight their words the same, only 
> that they see the same words.

...you would still have to have some sort of local index to refer to the 
common tokens (as pointed out earlier in the discussion), so the size 
savings would be limited.
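
to make the point concrete, here is a minimal sketch in python. everything in it is hypothetical: the two wordlists, the (spam_count, ham_count) layout and the helper names are made up for illustration and are not the spambayes database format. it measures the overlap brad describes, then shows that a shared vocabulary only stores each token string once, while every user still keeps one local entry per token they have seen.

```python
# Hypothetical per-user wordlists: token -> (spam_count, ham_count).
# The layout is an assumption for illustration, not the Spambayes format.
alice = {"viagra": (12, 0), "meeting": (0, 9), "python": (0, 30), "free": (7, 2)}
alex = {"viagra": (5, 0), "meeting": (1, 4), "free": (9, 1), "invoice": (2, 6)}


def overlap_pct(mine, theirs):
    """Percentage of the tokens in `mine` that also appear in `theirs`."""
    if not mine:
        return 0.0
    return 100.0 * len(mine.keys() & theirs.keys()) / len(mine)


def build_shared_vocab(*wordlists):
    """Assign one integer id to every distinct token across all wordlists."""
    vocab = {}
    for wordlist in wordlists:
        for token in wordlist:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_local_index(wordlist, vocab):
    """Re-key a user's counts by shared token id instead of token string."""
    return {vocab[token]: counts for token, counts in wordlist.items()}


vocab = build_shared_vocab(alice, alex)
alice_local = to_local_index(alice, vocab)

print(overlap_pct(alice, alex))        # share of alice's tokens alex also has
print(len(vocab))                      # token strings stored once, shared
print(len(alice_local), len(alice))    # local entries do not shrink
```

the only thing saved per user is the token string itself; the id and the counts still have to live locally, which is why the savings stay limited even at 100% overlap.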

> Is it possible/practical to target this problem by using a
> split database?

the answers would seem to be yes (possible) and no (practical) 
respectively, once you consider the additional overhead of referring to 
external data and merging tokens, plus the increased fragility (a single 
point of failure).
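
a rough sketch of where those costs would show up, again in python and again purely hypothetical: the `SplitWordDB` class, the `shared_lookup` callable and the idea of summing local and shared counts are assumptions made for illustration, not anything spambayes does or anything proposed in this thread.

```python
class SplitWordDB:
    """Hypothetical split store: local counts plus a shared, external store."""

    def __init__(self, local, shared_lookup):
        self.local = local                  # dict: token -> (spam, ham)
        self.shared_lookup = shared_lookup  # callable: token -> (spam, ham)

    def counts(self, token):
        local_spam, local_ham = self.local.get(token, (0, 0))
        try:
            # Overhead: every token lookup also goes to the external store.
            shared_spam, shared_ham = self.shared_lookup(token)
        except OSError:
            # Fragility: if the shared store is down, every user is
            # silently reduced to local data only.
            shared_spam, shared_ham = 0, 0
        # Merging: the two sets of counts have to be combined per token.
        return local_spam + shared_spam, local_ham + shared_ham


shared_counts = {"free": (9, 1)}


def working_lookup(token):
    return shared_counts.get(token, (0, 0))


def broken_lookup(token):
    raise OSError("shared store unreachable")


local_counts = {"free": (7, 2)}
print(SplitWordDB(local_counts, working_lookup).counts("free"))
print(SplitWordDB(local_counts, broken_lookup).counts("free"))
```

each classification touches many tokens, so the extra lookup and merge are paid many times per message, and a failure of the one shared store affects all users at once.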

this is a very interesting idea, but after working it through in my 
head, it doesn't seem to offer an architectural improvement over the 
existing system (even for 7000 users). this is not to say that the 
pursuit of commonality, etc. won't bear fruit down the road, but that i 
personally don't think the original intent will be served.

just my two cents...

b



