[Spambayes] RE: POP3 proxy blew up while trying to train - Uh, never mind...

Tony Meyer tameyer at ihug.co.nz
Mon Feb 23 21:15:18 EST 2004


> Replying to my own message:
> 
> I simply exited SpamBayes, renamed hammie.db, and restarted it.

Yes, your database was corrupted.  If you can do this repeatedly, then we
would be *very* interested to hear how (and which version you are using).
We see occasional reports of this, but haven't been able to track down all
the causes, as yet.

> However, I noticed while doing this that I have 5162 hams 
> cached, and only 577 spams cached. I train on everything, 
> which may not be the best strategy, but it's the easiest I can see.

Try training on mistakes, which typically does better than
train-on-everything.  IOW, just train on unsures, false positives and false
negatives.  With 1.0a9, you can set the default values of the radio buttons
in the review pages to make this easier (Ham->Discard, Unsure->Defer,
Spam->Discard, for example).
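
As a rough illustration of what "train on mistakes" means, here is a minimal
Python sketch.  It is not SpamBayes code: the function names are made up for
this example, and the 0.20 / 0.90 cutoffs are assumed values that you should
replace with whatever your own configuration uses.

```python
# Hypothetical helpers for illustration only; not part of SpamBayes.
HAM_CUTOFF = 0.20   # assumed value: scores below this are treated as ham
SPAM_CUTOFF = 0.90  # assumed value: scores above this are treated as spam


def classify(score):
    """Map a spam probability (0.0 - 1.0) to 'ham', 'unsure' or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score > SPAM_CUTOFF:
        return "spam"
    return "unsure"


def should_train_on_mistake(score, actual):
    """Return True if the message should be trained.

    `actual` is the user's verdict, 'ham' or 'spam'.  Unsures never match
    the verdict, so they are always trained, as are false positives and
    false negatives.  Correctly classified messages are skipped.
    """
    return classify(score) != actual
```

So a ham that scored 0.02 would be skipped, while a ham that scored 0.50
(unsure) or 0.97 (false positive) would be trained.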

> As a matter of hygiene, would it make a difference if I 
> started cleaning out the ham cache, to bring it in to line 
> with the size of the spam cache?

Note that it doesn't matter how many files are in those directories:
messages are moved there once they have been trained on, and aren't used
afterwards (unless you correct the training).  If you want to undo training,
the only way to do this with sb_server at the moment is to rename or remove
your hammie.db file.

> If that doesn't make a difference, is there a relatively easy 
> way to implement one of the other strategies?

1.0a9 also has some new options to help with "nonedge" training, where you
train everything inside certain edges (say 0.05 - 0.95).  You can set the
review page to display only messages within these ranges.  From most
reports, nonedge or mistake-based-training works best*.
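
The nonedge idea can be sketched in a few lines of Python.  Again, this is
only an illustration under assumptions, not how sb_server implements it: the
function names are hypothetical, the 0.05 and 0.95 edges are just the example
values above, and treating the edges as inclusive is my own choice.

```python
# Hypothetical helpers for illustration only; not part of SpamBayes.
def in_nonedge_band(score, low=0.05, high=0.95):
    """Return True if the score lies inside the edges (inclusive here)."""
    return low <= score <= high


def select_for_training(scored_messages, low=0.05, high=0.95):
    """Pick the ids of messages whose scores fall inside the edges.

    `scored_messages` is an iterable of (message_id, score) pairs.
    """
    return [msg_id for msg_id, score in scored_messages
            if in_nonedge_band(score, low, high)]
```

For example, given scores of 0.01, 0.50 and 0.99, only the 0.50 message would
be selected.  This sketch covers the edge test only; a message that was
confidently misclassified falls outside the edges and would need to be
handled separately.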

=Tony Meyer

* Ignoring train-to-exhaustion, which sb_server isn't set up for.

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



