[Spambayes] SpamBayes for 500.000 users

Dreas van Donselaar dreas at emailaccount.nl
Tue Dec 16 18:25:58 EST 2003


So does this mean that Bayesian filtering won't be effective for such a
large user database?

Regards,

Dreas van Donselaar

-----Original Message-----
From: spambayes-bounces+dreas=emailaccount.nl at python.org
[mailto:spambayes-bounces+dreas=emailaccount.nl at python.org] On Behalf Of
Skip Montanaro
Sent: Tuesday, 16 December 2003 23:00
To: Christopher Jastram
Cc: spambayes at python.org
Subject: Re: [Spambayes] SpamBayes for 500.000 users


    Chris> But I can give you some first-hand knowledge from a much smaller
    Chris> user base.  I'm setting the same thing up for an office of 5
    Chris> people, and here's the bare-bones fact: I need a separate
    Chris> database for each user.  I've tried using one database for
    Chris> everyone, and it does work.  But it only catches about 30-40
    Chris> percent of spam.  Not sure why this is the case, but it is
    Chris> (unbalanced training?).

Does your shared database draw fairly equally on mail sent to all five
people?  If not, you may find that some of the clues in the header will
"poison" your database.  Tim discovered this effect in spades during early
testing.  I believe the messages in one of the larger spam databases he used
initially were all sent to one person.  The recipient-oriented clues related
to that user poisoned his tests.
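
A toy sketch of how that kind of poisoning can arise.  It assumes a
simplified per-token estimate (not the actual SpamBayes scoring), and the
tokens such as "to:alice" and the training messages are made up for
illustration:

```python
def token_spam_prob(token, spam_msgs, ham_msgs):
    """Simplified per-token spam probability from message fractions.

    This is an illustrative estimate only, not the SpamBayes formula.
    """
    spam_ratio = sum(token in m for m in spam_msgs) / len(spam_msgs)
    ham_ratio = sum(token in m for m in ham_msgs) / len(ham_msgs)
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_ratio / (spam_ratio + ham_ratio)


# Hypothetical training data: each message is a set of tokens.
# All the spam was collected from alice's mailbox; the ham is mostly bob's.
spam = [
    {"to:alice", "viagra"},
    {"to:alice", "mortgage"},
    {"to:alice", "viagra"},
    {"to:alice", "lottery"},
]
ham = [
    {"to:alice", "meeting"},
    {"to:bob", "meeting"},
    {"to:bob", "lunch"},
    {"to:bob", "report"},
]

p_alice = token_spam_prob("to:alice", spam, ham)  # recipient acts as a spam clue
p_bob = token_spam_prob("to:bob", spam, ham)      # recipient acts as a ham clue

print("to:alice", p_alice)
print("to:bob", p_bob)
```

In this made-up data the recipient token alone pushes alice's mail toward
spam and bob's mail toward ham, regardless of the message content.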

Skip
