[Spambayes] SpamBayes for 500.000 users
Dreas van Donselaar
dreas at emailaccount.nl
Tue Dec 16 18:25:58 EST 2003
So does this mean that Bayesian filtering won't be effective for such a
large user database?
Regards,
Dreas van Donselaar
-----Original Message-----
From: spambayes-bounces+dreas=emailaccount.nl at python.org
[mailto:spambayes-bounces+dreas=emailaccount.nl at python.org] On Behalf Of
Skip Montanaro
Sent: Tuesday, 16 December 2003 23:00
To: Christopher Jastram
Cc: spambayes at python.org
Subject: Re: [Spambayes] SpamBayes for 500.000 users
Chris> But I can give you some first-hand knowledge from a much smaller
Chris> user base. I'm setting the same thing up for an office of 5
Chris> people, and here's the bare-bones fact: I need a separate
Chris> database for each user. I've tried using one database for
Chris> everyone, and it does work. But it only catches about 30-40
Chris> percent of spam. Not sure why this is the case, but it is
Chris> (unbalanced training?).
Does your shared database draw fairly equally on mail sent to all five
people? If not, you may find that some of the clues in the header will
"poison" your database. Tim discovered this effect in spades during early
testing. I believe one of the larger spam databases he used initially
consisted entirely of messages sent to one person. The recipient-oriented
clues related to that user poisoned his tests.
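
One rough way to check for this kind of skew before training a shared
database is to count how the training mail is spread across recipients. The
sketch below uses only the Python standard library and is not part of
SpamBayes; the function names (recipient_counts, dominant_share) and the
example addresses are made up for illustration. It assumes the messages are
available as raw RFC 2822 text and looks only at the To and Cc headers.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def recipient_counts(raw_messages):
    """Count how often each To/Cc address appears across raw messages."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Collect every To and Cc header value (a header may repeat).
        fields = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, addr in getaddresses(fields):
            if addr:
                counts[addr.lower()] += 1
    return counts


def dominant_share(counts):
    """Fraction of all recipient mentions that go to the most common address."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total


# Hypothetical spam corpus: three messages to one user, one to another.
spam = [
    "To: alice@example.com\nSubject: offer\n\nbuy now",
    "To: alice@example.com\nSubject: offer\n\nbuy now",
    "To: Alice <ALICE@example.com>\nSubject: offer\n\nbuy now",
    "To: bob@example.com\nSubject: offer\n\nbuy now",
]
counts = recipient_counts(spam)
share = dominant_share(counts)
```

If the share for the spam corpus is far higher than for the ham corpus, the
recipient address itself may end up acting as a spam clue, which is the
effect described above. What counts as "far higher" is a judgment call; this
check only reports the distribution.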
Skip