[Spambayes] Obvious Spam Missed...

Skip Montanaro skip at pobox.com
Wed Sep 17 10:19:04 EDT 2003

    Tim> One thing to watch out for is that if you put "too much" data into
    Tim> the starter database, more additional training is needed to cater
    Tim> to personal quirks than if a new user starts with an empty
    Tim> database.

If we do something like that, I think we should train on a very small set of
mails, maybe no more than 20-30 of each class.  After all, the goal is only to
give the new user's incoming mail an initial nudge in the right direction.
Ideally, the mails should be spread across domains, senders and recipients.
If that's not possible, header clues that relate to the senders or recipients
should be deleted before the database is shipped.
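A rough sketch of that recipe, purely for illustration: pick a small, balanced
starter set spread round-robin across sender domains, train token counts on it,
then drop address-bearing header tokens before shipping.  The message tuples,
the `from:addr:` token format, the `ADDRESS_PREFIXES` list and all function
names here are hypothetical; this is not the SpamBayes tokenizer or classifier
API, just a toy under those assumptions.

```python
import random
from collections import defaultdict
from email.utils import parseaddr

# Hypothetical prefixes for header tokens that identify senders/recipients.
ADDRESS_PREFIXES = ("from:", "to:", "cc:", "reply-to:", "sender:")


def pick_starter_set(messages, per_class=25, seed=0):
    """messages: list of (label, sender, body), label is 'ham' or 'spam'.

    Returns at most per_class messages per label, taken round-robin across
    sender domains so that no single domain dominates the starter set."""
    rng = random.Random(seed)
    chosen = []
    for label in ("ham", "spam"):
        by_domain = defaultdict(list)
        for msg in messages:
            if msg[0] == label:
                addr = parseaddr(msg[1])[1]
                by_domain[addr.rpartition("@")[2].lower()].append(msg)
        buckets = list(by_domain.values())
        for bucket in buckets:
            rng.shuffle(bucket)
        rng.shuffle(buckets)
        picked = []
        # One message per domain per pass until the quota or supply runs out.
        while len(picked) < per_class and any(buckets):
            for bucket in buckets:
                if bucket and len(picked) < per_class:
                    picked.append(bucket.pop())
        chosen.extend(picked)
    return chosen


def train(messages):
    """Toy trainer: token -> (spam_count, ham_count), one count per message."""
    counts = defaultdict(lambda: [0, 0])
    for label, sender, body in messages:
        tokens = set(body.lower().split())
        # Hypothetical header clue naming the sender's address.
        tokens.add("from:addr:" + parseaddr(sender)[1].lower())
        idx = 0 if label == "spam" else 1
        for tok in tokens:
            counts[tok][idx] += 1
    return {tok: tuple(c) for tok, c in counts.items()}


def strip_address_clues(counts):
    """Drop header tokens that relate to senders or recipients."""
    return {tok: c for tok, c in counts.items()
            if not tok.lower().startswith(ADDRESS_PREFIXES)}
```

Usage: `strip_address_clues(train(pick_starter_set(msgs)))` yields a small
database whose clues come only from message content, so it should not carry
the original senders' or recipients' addresses to the new user.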

