[Spambayes] Importing to Db

Tony Meyer tameyer at ihug.co.nz
Thu Apr 15 00:30:50 EDT 2004


> I am trying to set up one big DB to send to other users. I
> have been saving spam as .msg files and sending them to trend
> to add to the server, BUT this is a long process and the spam
> keeps coming faster and stronger. I was wondering if I could
> import these .msg files directly into the spam db as spam and
> then save and send the db to other users?

If you can put all the saved spam in an Outlook folder, then you can do
this:

 1. Close Outlook.

 2. Move aside your own databases (default_bayes_database.db and
default_message_database.db in your Data directory; the FAQ explains how to
find that directory).

 3. Start Outlook up again and cancel out of the Wizard if it appears.

 4. Go to the "SpamBayes Manager" dialog from the SpamBayes toolbar.

 5. On the "Training" tab, use the top two "Browse" buttons to select no ham
folders, and to select the spam folder(s) into which you have put the saved
spam.

 6. Click "Start Training".

Once this is done, you can send the default_bayes_database.db and
default_message_database.db files to anyone you like, for them to use as a
starting database.  Combining databases is trickier.  To get back to your
original setup, do this:

 7. Close Outlook.

 8. Replace the new database files (after sending them!) with the ones you
moved aside in step 2.

 9. Open Outlook again, and you're back to where you were.
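Steps 2 and 8 are plain file operations, so they can be scripted.  The
Python sketch below is only an illustration, not part of SpamBayes: the two
file names come from the steps above, but the Data directory path is
something you must look up yourself (see the FAQ), and the backup folder and
the function names move_aside() and restore() are hypothetical.  Run it only
while Outlook is closed.

```python
import shutil
from pathlib import Path

# File names from the steps above.  The Data directory location is NOT
# fixed here: it is an assumption you supply after checking the FAQ.
DB_FILES = ("default_bayes_database.db", "default_message_database.db")


def move_aside(data_dir: Path, backup_dir: Path) -> list:
    """Step 2 (hypothetical helper): move the existing databases out of
    the Data directory into backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in DB_FILES:
        src = data_dir / name
        if src.exists():
            dest = backup_dir / name
            shutil.move(str(src), str(dest))
            moved.append(dest)
    return moved


def restore(data_dir: Path, backup_dir: Path) -> None:
    """Step 8 (hypothetical helper): replace the newly trained databases
    with the saved ones.  Copy or send the new files BEFORE calling this,
    because they are deleted here."""
    for name in DB_FILES:
        saved = backup_dir / name
        if saved.exists():
            target = data_dir / name
            if target.exists():
                target.unlink()  # discard the freshly trained copy
            shutil.move(str(saved), str(target))
```

The sketch moves rather than copies in step 2 so that SpamBayes really does
start from an empty database when Outlook is reopened.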

Some additional notes:

 * SpamBayes learns fairly quickly; maybe you don't need to kickstart
others' training?

 * A strength of SpamBayes is that it learns personal definitions of ham and
spam.  Are you certain that your definition of spam will match theirs?

 * SpamBayes works best with a roughly even number of ham and spam trained.
Are they going to have enough ham to be able to train to match the amount of
trained spam you're giving them?

 * I've never seen test results showing that "Training on everything" is a
good option, and the majority (AFAICT) of people don't use it in practice.
Training on mistakes is a good, simple solution.  It also keeps the
database relatively small, which is good.  Does this mesh well with the
spam you're giving them?  (It should almost certainly be 'hard to
classify' spam.)

 * A small database often beats a large one (see above point).  Giving
people a large amount of spam training to start with might not give good
results.  Giving them a smaller database might be better.
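To make the "train on mistakes" idea concrete, here is a toy Python sketch.
ToyClassifier and train_on_mistakes are hypothetical names, and the scoring
is a deliberately simplistic per-token ratio, NOT SpamBayes' real classifier
or API.  The only point is the training rule: a message is trained only when
the filter gets it wrong or is unsure, so easy, repetitive spam adds nothing
to the database.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a token-based filter (not SpamBayes)."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.trained = {"ham": 0, "spam": 0}

    def score(self, tokens):
        # Average, over distinct tokens, of the token's spam ratio;
        # tokens never seen in either class count as 0.5 (unsure).
        probs = []
        for t in set(tokens):
            h = self.counts["ham"][t] / max(self.trained["ham"], 1)
            s = self.counts["spam"][t] / max(self.trained["spam"], 1)
            probs.append(s / (h + s) if h + s else 0.5)
        return sum(probs) / len(probs) if probs else 0.5

    def train(self, tokens, label):
        # Count each token at most once per message.
        self.counts[label].update(set(tokens))
        self.trained[label] += 1


def train_on_mistakes(clf, messages, cutoff=0.5):
    """Train only on messages that are misclassified or scored as unsure."""
    for tokens, label in messages:
        s = clf.score(tokens)
        predicted = "spam" if s > cutoff else "ham" if s < cutoff else None
        if predicted != label:
            clf.train(tokens, label)
    return clf
```

For example, feeding it ten copies each of one spam and one ham message
trains only two messages (the first of each); the other eighteen are
already classified correctly and are skipped.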

Hope this is of use.

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



