[spambayes-dev] Generating a sample training database

Skip Montanaro skip at pobox.com
Wed Sep 17 13:54:42 EDT 2003

    bill> since the skew can work both ways (should someone like tim include
    bill> their extracurricular activities in the ham training sample :o),
    bill> wouldn't it make sense to create a number of initial databases
    bill> with *only* spam in them and let the user train an appropriate
    bill> amount of ham as part of the install? anecdotal evidence suggests
    bill> that just about everyone has some ham laying around, yet not
    bill> everyone keeps spam about.

In theory, yes.  However, I don't think it's trivial for people using the
Outlook plugin to train on a single mail message.  They'd have to move
several messages from valid hammy mailboxes to a new one, train on it, then
move the messages back to their original locations.

We'll just have to try it and see.  

I'll take the lead in grabbing the ham and spam and putting together a
sample training database (pickle format seems easiest).  If you'd like to
contribute (no more than two ham and two spam per person please), forward
such messages to me and make sure the Subject: includes "Sample Ham" or
"Sample Spam".  I will filter such messages out using procmail before
SpamBayes can see them.
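
For anyone who wants to do the same sorting without procmail, here is a
rough Python sketch of the Subject: check.  It is only an illustration, not
what I actually run: the name sample_kind is made up, and matching the tag
case-insensitively anywhere in the Subject: is my assumption.

```python
import email
import re

# Hypothetical pattern: matches the "Sample Ham" / "Sample Spam" tags
# requested above.  Case-insensitive matching is an assumption.
SAMPLE_RE = re.compile(r"sample\s+(ham|spam)", re.IGNORECASE)


def sample_kind(raw_message):
    """Return 'ham', 'spam', or None for a raw message given as a string."""
    msg = email.message_from_string(raw_message)
    # A missing Subject: header is treated as an empty one.
    subject = msg.get("Subject", "")
    match = SAMPLE_RE.search(subject)
    return match.group(1).lower() if match else None
```

Messages for which sample_kind returns None would be delivered normally;
the rest would be set aside in ham and spam piles for building the sample
database.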

