[Spambayes] POP3 Server Performance Issue Win2K SP4

Kenny Pitt kennypitt at hotmail.com
Mon Dec 8 14:32:38 EST 2003


W. Eliot Kimber wrote:
> Kenny Pitt wrote:
>> It usually isn't necessary to train on every message you receive,
>> and if you've received 25,000 spams in about 3 months then I suspect
>> your training is heavily over-balanced toward spam anyway.  I would
>> suggest retraining on a much smaller set of messages (50-100 of each
>> is probably more than sufficient).  After that, be more selective
>> about which messages you actually train on.  If most of your
>> messages classify correctly then you shouldn't need to train much at
>> all. 
> 
> I'm feeling a bit slow--but what is the process for doing this sort of
> re-training? Do I simply delete hammie.db and then retrain again using
> either new messages or old messages that I think are representative?
> 
> I couldn't find any docs that spoke to this process directly.

I use sb_server mostly when testing and not on a day-to-day basis, so I
may not be the best person to address that, but I'll give it a shot
anyway.

Yes, you can delete your database and start over.  You should probably
delete both your statistics_database and your message_info_database just
to make sure they stay in sync.  You can then retrain using a small,
representative subset of messages.

IIRC, you said you were using Mozilla Mail?  If that is correct, then
each of the folders in your Local Folders is stored in "mbox" format
which is understood by the training option.  You can create two folders
such as "Ham Training" and "Spam Training" and copy the messages that
you want to train on into those folders.  You can then browse to the
storage files for those folders (which are buried deep under your
Mozilla profile directory) and feed each to the training option with the
appropriate classification.
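
If hand-picking messages is tedious, something like the following could
pull a small random sample out of an existing mbox file.  This is only a
rough sketch using Python's standard mailbox module, not anything that
ships with SpamBayes.  The function name sample_mbox and the paths are
made up for illustration, and it assumes the destination is a new file
that no mail client has open.

```python
import mailbox
import random


def sample_mbox(src_path, dest_path, count, seed=None):
    """Copy up to `count` randomly chosen messages from one mbox to another.

    Hypothetical helper: returns the number of messages written.
    The source file is only read, never modified.
    """
    src = mailbox.mbox(src_path, create=False)
    try:
        keys = list(src.keys())
        rng = random.Random(seed)
        # Never ask for more messages than the folder actually holds.
        chosen = rng.sample(keys, min(count, len(keys)))
        dest = mailbox.mbox(dest_path)  # created if it does not exist
        try:
            for key in sorted(chosen):  # keep the original folder order
                dest.add(src[key])
            dest.flush()
        finally:
            dest.close()
    finally:
        src.close()
    return len(chosen)
```

For example, sample_mbox("Spam Training", "spam-sample.mbox", 100) would
write at most 100 messages to spam-sample.mbox, which you could then
feed to the training option as spam.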

After that, just watch your mail for unsures and mistakes.  IMHO, there
really isn't much reason to Review Messages as long as SpamBayes
classifies everything correctly.  Unreviewed messages in the cache will
eventually expire, and you can use the Advanced Configuration page to
set how long they are kept.  When you do get some unsures or mistakes,
use Review Messages to correct them.  I would also recommend training
on a few of the other messages there, just enough to keep your training
set balanced, and discarding the rest.
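
To put a rough number on "balanced", here is a small sketch of the
arithmetic.  The helper name extra_needed and the 2:1 limit are my own
assumptions for illustration, not a SpamBayes setting or an official
recommendation.  It just reports how many more messages of the
under-represented class you would train to bring the ratio back under
the limit.

```python
import math


def extra_needed(ham_trained, spam_trained, max_ratio=2.0):
    """Return (kind, n): train n more messages of `kind` to rebalance.

    Hypothetical helper.  `max_ratio` is an assumed upper bound on
    larger/smaller counts; (None, 0) means no extra training is needed.
    """
    larger = max(ham_trained, spam_trained)
    smaller = min(ham_trained, spam_trained)
    # Smallest size of the minority class that satisfies the ratio.
    target = math.ceil(larger / max_ratio)
    if target <= smaller:
        return (None, 0)
    kind = "ham" if ham_trained < spam_trained else "spam"
    return (kind, target - smaller)


print(extra_needed(50, 200))  # ('ham', 50)
```

So with 50 ham and 200 spam trained, picking up another 50 ham from the
review page would bring the set back to 2:1.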

-- 
Kenny Pitt



