[spambayes-dev] Generating a sample training database
Tim Peters
tim.one at comcast.net
Wed Sep 17 22:35:36 EDT 2003
[Skip]
>> I'll take the lead in grabbing the ham and spam and putting
>> together a sample training database (pickle format seems
>> easiest). If you'd like to contribute (no more than two ham
>> and two spam per person please), forward such messages to me
[Tony Meyer]
> Is there any particular sort of message that we should contribute?
> Something extremely hammy/spammy? Something that we think is really
> generic? Or just any random message we click on?
I suggest only spam msgs that scored 1.00 (rounded) and ham msgs that scored
0.00 (rounded) when originally received (not after training on them) -- we're
trying to catch a good deal of blatant spam with a starter database, and can't
fine-tune it anyway.
anyway. You probably don't want to forward ham containing personal details.
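The selection rule above can be sketched in Python. This is a minimal, hypothetical sketch, not part of SpamBayes. It assumes each candidate message is a hypothetical `(message_id, score_when_first_received, is_spam)` tuple. It treats "rounded" as the score displayed to two decimals, and it applies the two-ham/two-spam-per-person limit from the original request.

```python
from typing import List, Tuple

# Hypothetical record layout (not a SpamBayes type):
# (message_id, score_when_first_received, is_spam)
Message = Tuple[str, float, bool]


def is_blatant(score: float, is_spam: bool) -> bool:
    """True if the score displays as 1.00 (spam) or 0.00 (ham) at two decimals."""
    shown = f"{score:.2f}"  # round the way a two-decimal score display would
    return shown == ("1.00" if is_spam else "0.00")


def pick_contributions(messages: List[Message], per_class: int = 2) -> List[str]:
    """Return at most `per_class` blatant spam ids followed by blatant ham ids."""
    spam: List[str] = []
    ham: List[str] = []
    for msg_id, score, is_spam in messages:
        # The score must be the one assigned on arrival, before any training.
        if not is_blatant(score, is_spam):
            continue
        bucket = spam if is_spam else ham
        if len(bucket) < per_class:
            bucket.append(msg_id)
    return spam + ham


if __name__ == "__main__":
    candidates = [
        ("s1", 0.9991, True),
        ("s2", 0.97, True),    # spam, but not blatant enough
        ("s3", 1.0, True),
        ("s4", 0.998, True),   # blatant, but the spam quota is already full
        ("h1", 0.0004, False),
        ("h2", 0.03, False),   # ham, but not clearly hammy
    ]
    print(pick_contributions(candidates))  # ['s1', 's3', 'h1']
```

Note that the rule deliberately ignores borderline messages. A starter database is meant to catch obvious spam, and individual users can refine the filter later by training on their own mail.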