[Spambayes] SpamBayes for 500.000 users
cej at intech.com
Wed Dec 17 11:54:17 EST 2003
Skip Montanaro wrote:
> Chris> But I can give you some first-hand knowledge from a much smaller
> Chris> user base. I'm setting the same thing up for an office of 5
> Chris> people, and here's the bare-bones fact; I need a separate
> Chris> database for each user. I've tried using one database for
> Chris> everyone, and it does work. But it only catches about 30-40
> Chris> percent of spam. Not sure why this is the case, but it is
> Chris> (unbalanced training?).
>Does your shared database draw fairly equally on mail sent to all five
>people? If not, you may find that some of the clues in the header will
>"poison" your database. Tim discovered this effect in spades during early
>testing. I believe one of the larger spam databases he used initially was
>all sent to one person. The recipient-oriented clues related to that user
>poisoned his tests.
Nope. Not at all. My script scans the messages stored in "Junk" or "Spam,"
grabs an equal number of messages from folders other than
Inbox/Trash/Outbox/Spam/Junk to use as ham, and trains on everyone's mail
at once. Quite clumsy.
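
For anyone who wants to picture the selection step, here is a rough Python sketch of it. This is not the actual script: the mailbox layout (a dict of user -> folder name -> list of messages) and the function names are made up for illustration, and it stops short of calling SpamBayes itself. The per-user count at the end is only there to check Skip's question about whether the pooled spam is drawn evenly from all users.

```python
import random

# Folder names taken from the description above.
SPAM_FOLDERS = {"Junk", "Spam"}
NOT_HAM_FOLDERS = {"Inbox", "Trash", "Outbox", "Spam", "Junk"}


def pick_training_sets(mailboxes, seed=0):
    """Return (spam, ham) lists pooled across all users.

    mailboxes is a hypothetical structure:
        {user: {folder_name: [message, ...]}}
    """
    spam, ham_pool = [], []
    for folders in mailboxes.values():
        for name, messages in folders.items():
            if name in SPAM_FOLDERS:
                spam.extend(messages)
            elif name not in NOT_HAM_FOLDERS:
                ham_pool.extend(messages)
    # Take as many ham messages as there are spam messages,
    # or all available ham if there is not enough of it.
    rng = random.Random(seed)
    ham = rng.sample(ham_pool, min(len(spam), len(ham_pool)))
    return spam, ham


def spam_per_user(mailboxes):
    """Count spam messages contributed by each user."""
    return {
        user: sum(len(msgs) for name, msgs in folders.items()
                  if name in SPAM_FOLDERS)
        for user, folders in mailboxes.items()
    }
```

If spam_per_user() comes back lopsided, the pooled database leans on one person's mail, which is the situation Skip describes.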