[Spambayes] Obvious Spam Missed...
Skip Montanaro
skip at pobox.com
Wed Sep 17 10:19:04 EDT 2003
Tim> One thing to watch out for is that if you put "too much" data into
Tim> the starter database, more additional training is needed to cater
Tim> to personal quirks than if a new user starts with an empty
Tim> database.
If we do something like that, I think we should train on a very small set of
mails, maybe no more than 20-30 of each class. After all, I think all we're
trying to do is give the new user's incoming mail an initial nudge in the
right direction. Ideally, the mails should be spread across domains,
senders and recipients. If that's not possible, header clues which relate
to the senders or recipients should be deleted before shipping.
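
To make that concrete, here is a rough sketch of what preparing such a starter set might look like. It uses only the Python standard library `email` package, not any Spambayes API. The function names, the header list and the default limit of 25 are assumptions made up for illustration. The idea is to pick messages round-robin across sender domains so no single domain dominates, then delete the headers that identify senders or recipients.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import parseaddr

# Hypothetical list of headers that identify senders or recipients;
# adjust to taste before shipping anything.
PERSONAL_HEADERS = ("From", "To", "Cc", "Bcc", "Reply-To", "Sender",
                    "Return-Path", "Delivered-To", "Received", "Message-ID")


def sender_domain(msg):
    """Return the lower-cased domain of the From address ('' if missing)."""
    addr = parseaddr(msg.get("From", ""))[1]
    return addr.rpartition("@")[2].lower()


def pick_spread(messages, limit=25):
    """Pick up to `limit` messages, cycling round-robin over sender domains."""
    by_domain = defaultdict(list)
    for msg in messages:
        by_domain[sender_domain(msg)].append(msg)
    picked = []
    queues = list(by_domain.values())
    while queues and len(picked) < limit:
        for queue in queues:
            if len(picked) >= limit:
                break
            picked.append(queue.pop(0))
        # Drop domains that have no messages left.
        queues = [queue for queue in queues if queue]
    return picked


def strip_personal_headers(msg):
    """Delete every occurrence of the sender/recipient headers, in place."""
    for name in PERSONAL_HEADERS:
        del msg[name]  # no error if the header is absent
    return msg
```

You would run `pick_spread` once over the ham candidates and once over the spam candidates, call `strip_personal_headers` on each result, and train the starter database on what is left.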
Skip