[Spambayes] SpamBayes for 500.000 users

Skip Montanaro skip at pobox.com
Tue Dec 16 17:00:26 EST 2003


    Chris> But I can give you some first-hand knowledge from a much smaller
    Chris> user base.  I'm setting the same thing up for an office of 5
    Chris> people, and here's the bare-bones fact; I need a separate
    Chris> database for each user.  I've tried using one database for
    Chris> everyone, and it does work.  But it only catches about 30-40
    Chris> percent of spam.  Not sure why this is the case, but it is
    Chris> (unbalanced training?).

Does your shared database draw fairly equally on mail sent to all five
people?  If not, you may find that some of the clues in the header will
"poison" your database.  Tim discovered this effect in spades during early
testing.  I believe the messages in one of the larger spam collections he
used initially had all been sent to one person.  The recipient-oriented
clues related to that user poisoned his tests.
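
To make the poisoning effect concrete, here is a minimal sketch.  It is not
SpamBayes code and does not use its scoring formula.  The token names
("to:alice" and so on), the toy messages, and the token_spam_ratio helper are
all made up for illustration.  It only shows how a recipient token can pick up
a strong spam score when all the spam in the training set went to one person.

```python
from collections import Counter

def token_spam_ratio(spam_msgs, ham_msgs):
    """Return {token: share of its normalized frequency that came from spam}.

    Simplified stand-in for a per-token spam probability (an assumption,
    not the SpamBayes formula). Both corpora must be non-empty.
    """
    # Count each token at most once per message.
    spam_counts = Counter(tok for msg in spam_msgs for tok in set(msg))
    ham_counts = Counter(tok for msg in ham_msgs for tok in set(msg))
    ratios = {}
    for tok in set(spam_counts) | set(ham_counts):
        # Normalize by corpus size so unequal corpora don't skew the ratio.
        s = spam_counts[tok] / len(spam_msgs)
        h = ham_counts[tok] / len(ham_msgs)
        ratios[tok] = s / (s + h)
    return ratios

# Hypothetical training data: all spam was sent to alice, ham to three users.
spam = [["to:alice", "viagra"], ["to:alice", "mortgage"], ["to:alice", "viagra"]]
ham = [["to:bob", "meeting"], ["to:carol", "lunch"], ["to:alice", "meeting"]]

ratios = token_spam_ratio(spam, ham)
for tok in ("to:alice", "to:bob", "viagra"):
    print(tok, round(ratios[tok], 2))
```

In this toy data the recipient tokens end up about as decisive as the content
tokens: anything addressed to bob leans toward ham and anything addressed to
alice leans toward spam, whatever the body says.  A shared database trained
unevenly across five users could plausibly suffer the same way.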

Skip


