[Spambayes] RE: SpamBayes for 500.000 users

Robert K. Coe bob at 1776.com
Wed Dec 17 17:39:23 EST 2003


If you genuinely have to go to a server-based solution, you're probably dealing with a user community that's abnormally spam-averse (e.g., it may include lots of small children) or whose management is unusually sensitive to the time users spend dealing with spam. Either way, you won't be able to populate your database with contributions from users. I think any server-based solution that depends on user contributions is probably doomed.

Bob


> -----Original Message-----
> From: Skip Montanaro [mailto:skip at pobox.com]
> Sent: Wednesday, December 17, 2003 10:47 AM
> To: Dreas van Donselaar
> Cc: spambayes at python.org
> Subject: RE: [Spambayes] SpamBayes for 500.000 users
> 
> 
> 
>     Dreas> So does this mean that Bayesian won't be effective for such a
>     Dreas> large user database?
> 
> Dunno.  You'll definitely have to do some testing.  It might be sufficient
> to do one or more of the following:
> 
>     * suppress tokenizing of headers which would generate such a strong
>       user-oriented bias
> 
>     * make it easy for your users to submit mail for inclusion in the
>       database
> 
> The first is a surmountable problem.  The key option to tweak is
> address_headers in the Tokenizer section.  The second will be more
> difficult.  In my limited multi-user experience:
> 
>     * people only submit spam
> 
>     * people (understandably) never submit anything very sensitive
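To make Skip's first suggestion concrete: a shared database picks up a user-oriented bias when the tokenizer mines headers that name the individual recipient, presumably To and Cc. The sketch below is a hypothetical toy, not SpamBayes' actual tokenizer; the `address_tokens` helper and its token format are invented for illustration. It only shows how the set of mined address headers controls whether per-recipient tokens end up in the database. In SpamBayes itself the corresponding knob is the address_headers option in the Tokenizer section.

```python
from email import message_from_string
from email.utils import getaddresses


def address_tokens(msg, address_headers):
    """Return toy tokens for the addresses found in the chosen headers.

    Hypothetical illustration only: this is not SpamBayes' tokenizer,
    just a sketch of how mined address headers turn into tokens.
    """
    tokens = []
    for header in address_headers:
        # get_all() is case-insensitive and returns [] if the header is absent.
        values = msg.get_all(header, [])
        for _realname, addr in getaddresses(values):
            if addr:
                tokens.append(f"{header}:addr:{addr.lower()}")
    return tokens


RAW = """\
From: newsletter@example.com
To: alice@isp.example
Cc: bob@isp.example
Subject: hello

body
"""

msg = message_from_string(RAW)

# Mining To/Cc yields tokens that identify individual recipients, so the
# shared database learns "mail to alice" rather than "this kind of mail".
per_user = address_tokens(msg, ("from", "to", "cc"))

# Mining only From keeps tokens that say nothing about which of the
# server's users received the message.
shared = address_tokens(msg, ("from",))

print(per_user)
print(shared)
```

With half a million users, tokens like `to:addr:alice@isp.example` would each be seen by only one person's mail, so dropping them costs little and removes the per-user skew. Whether that's enough is exactly the kind of thing Skip says you'd have to test.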
