[Spambayes] sharing wordlists - better numbers
Skip Montanaro
skip at pobox.com
Tue May 27 10:36:31 EDT 2003
Brad> Are you suggesting including a "starter database" in the spambayes
Brad> distribution, by noting which words are common to more than one
Brad> person?
I thought that was the direction you were headed with this exercise.
I guess I misunderstood.
Brad> I wonder... if we only used words which were "common" when
Brad> determining spaminess, how well would that work?
Should work pretty well if we include "common" words which turn out to be
strong spam or ham indicators for a suitable cross-section of the group.
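Here is a minimal sketch of that selection rule. It assumes each person's database can be reduced to a hypothetical `word -> (spam_count, ham_count)` mapping plus that person's total spam and ham message counts. A word is kept only if at least `min_users` people have it and it is a strong indicator in the same direction for every one of them. The probability used here is a plain ratio of per-class frequencies. It illustrates the idea and is not SpamBayes' own scoring formula. The names `spam_prob`, `starter_words`, `min_users` and `strong` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical per-user layout: word -> (spam_count, ham_count).
WordInfo = Dict[str, Tuple[int, int]]
# (wordinfo, total spam messages trained, total ham messages trained)
User = Tuple[WordInfo, int, int]


def spam_prob(spam_count: int, ham_count: int, nspam: int, nham: int) -> float:
    """Ratio-of-frequencies estimate (illustrative; not SpamBayes' exact formula)."""
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    total = spam_ratio + ham_ratio
    return spam_ratio / total if total else 0.5


def starter_words(users: List[User], min_users: int = 2,
                  strong: float = 0.9) -> Dict[str, str]:
    """Return {word: "spam" | "ham"} for words that at least min_users users
    have and that are strongly spammy (or strongly hammy) for all of them."""
    result: Dict[str, str] = {}
    all_words = set().union(*(info.keys() for info, _, _ in users))
    for word in sorted(all_words):
        # Probabilities from every user whose database contains the word.
        probs = [spam_prob(*info[word], nspam, nham)
                 for info, nspam, nham in users if word in info]
        if len(probs) < min_users:
            continue  # not "common" to enough people
        if all(p >= strong for p in probs):
            result[word] = "spam"
        elif all(p <= 1 - strong for p in probs):
            result[word] = "ham"
    return result
```

Requiring agreement across everyone who has the word is one way to read "a suitable cross-section of the group". A looser rule, such as a majority vote, would keep more words but would also admit more indicators that are personal rather than shared.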
Brad> Let's suppose in a "semi-shared database" mode, there was a
Brad> mechanism for 'upscaling' hapaxes into the "common word list", so
Brad> that long term the collective wordlist would continue to evolve.
Alex and I both have fairly large word databases. I suspect hapaxes will
remain hapaxes. I'm thinking of just a starter database of a reasonable
size. It could be shipped in plain-text form and then installed using Tim
Stone's (I believe) database importer/exporter tool.
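Here is a minimal sketch of shipping such a starter list as plain text and reading it back. It assumes a hypothetical CSV layout with one `word,spamcount,hamcount` row per token and a header line. The importer/exporter tool mentioned above has its own format, which this sketch does not attempt to reproduce. `dump_starter` and `load_starter` are illustrative names.

```python
import csv
import io
from typing import Dict, TextIO, Tuple


def dump_starter(words: Dict[str, Tuple[int, int]], out: TextIO) -> None:
    """Write a starter wordlist as CSV (hypothetical layout)."""
    writer = csv.writer(out)
    writer.writerow(["word", "spamcount", "hamcount"])
    for word in sorted(words):  # sorted output diffs cleanly between releases
        spam, ham = words[word]
        writer.writerow([word, spam, ham])


def load_starter(inp: TextIO) -> Dict[str, Tuple[int, int]]:
    """Read the CSV written by dump_starter back into a dict."""
    reader = csv.reader(inp)
    next(reader)  # skip the header row
    return {row[0]: (int(row[1]), int(row[2])) for row in reader}


if __name__ == "__main__":
    # Round-trip through an in-memory buffer instead of a real file.
    buf = io.StringIO(newline="")
    dump_starter({"python": (1, 60), "viagra": (40, 0)}, buf)
    buf.seek(0)
    print(load_starter(buf))
```

A plain-text format keeps the shipped list readable and easy to review. It also lets the list be loaded into whichever database backend a user has configured.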
Brad> Do Skip and Alex have a small std deviation in their virtual
Brad> group? ;-)
I realize there's a smiley, but what do you mean by "virtual group"?
Skip