[Spambayes] SpamBayes for 500.000 users

Christopher Jastram cej at intech.com
Wed Dec 17 11:54:17 EST 2003


Skip Montanaro wrote:

>    Chris> But I can give you some first-hand knowledge from a much smaller
>    Chris> user base.  I'm setting the same thing up for an office of 5
>    Chris> people, and here's the bare-bones fact; I need a separate
>    Chris> database for each user.  I've tried using one database for
>    Chris> everyone, and it does work.  But it only catches about 30-40
>    Chris> percent of spam.  Not sure why this is the case, but it is
>    Chris> (unbalanced training?).
>
>Does your shared database draw fairly equally on mail sent to all five
>people?  If not, you may find that some of the clues in the header will
>"poison" your database.  Tim discovered this effect in spades during early
>testing.  I believe one of the larger spam databases he used initially was
>all sent to one person.  The recipient-oriented clues related to that user
>poisoned his tests.
>
>Skip
>
Nope.  Not at all.  My script scans messages stored in "Junk" or "Spam",
grabs an equal number of messages as ham from folders other than Inbox,
Trash, Outbox, Spam, and Junk, and trains the shared database on
everyone's mail.  Quite clumsy.
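
A minimal sketch of that selection step, for illustration only.  It is not
the actual script: the mailbox layout (a dict of user -> folder -> list of
messages), the function name select_training_set, and the random sampling
of ham are all assumptions, and no SpamBayes calls are shown.

```python
import random

# Folders whose contents are treated as spam.
SPAM_FOLDERS = {"Junk", "Spam"}
# Folders never sampled as ham: the spam folders plus Inbox, Trash, Outbox.
EXCLUDED_FROM_HAM = SPAM_FOLDERS | {"Inbox", "Trash", "Outbox"}


def select_training_set(mailboxes, seed=None):
    """Pool spam and an equal-sized ham sample across all users.

    mailboxes is a hypothetical layout: {user: {folder_name: [message, ...]}}.
    Returns (spam, ham); ham is no longer than spam.
    """
    rng = random.Random(seed)
    spam, ham_pool = [], []
    for user in sorted(mailboxes):
        folders = mailboxes[user]
        for name in sorted(folders):
            if name in SPAM_FOLDERS:
                spam.extend(folders[name])
            elif name not in EXCLUDED_FROM_HAM:
                ham_pool.extend(folders[name])
    # Take as many ham messages as there are spam messages, or every
    # available ham message if there are fewer.
    ham = rng.sample(ham_pool, min(len(spam), len(ham_pool)))
    return spam, ham
```

Because spam and ham are pooled before sampling, nothing here keeps the
per-user contribution even: one person with a large Junk folder can
dominate the shared database.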

Chris



