[Spambayes] SpamBayes for 500.000 users

Skip Montanaro skip at pobox.com
Wed Dec 17 10:46:37 EST 2003


    Dreas> So does this mean that Bayesian won't be effective for such a
    Dreas> large user database?

Dunno.  You'll definitely have to do some testing.  It might be sufficient
to do one or more of the following:

    * suppress tokenizing of headers which would generate such a strong
      user-oriented bias

    * make it easy for your users to submit mail for inclusion in the
      database

The first is a surmountable problem.  The key option to tweak is
address_headers in the Tokenizer section.  The second will be more
difficult.  In my limited multi-user experience:

    * people only submit spam

    * people (understandably) never submit anything very sensitive
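
For the first item, a minimal sketch of the address_headers tweak might
look like the fragment below.  It assumes the option lives in your
SpamBayes customization file (commonly bayescustomize.ini) and that the
default list includes recipient headers such as to and cc; check the
defaults in your version before relying on this.  The choice of which
headers to keep is only an illustration, not a tested recommendation.

```ini
[Tokenizer]
# Assumption: the default list includes recipient headers (to, cc),
# which tie tokens to individual users.  Keep sender-side headers only.
address_headers: from sender reply-to
```

Retrain and compare results against the default settings before rolling
a change like this out to everyone.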

Skip


