[Spambayes] RE: POP3 proxy blew up while trying to train - Uh,
tameyer at ihug.co.nz
Mon Feb 23 21:15:18 EST 2004
> Replying to my own message:
> I simply exited SpamBayes, renamed hammie.db, and restarted it.
Yes, your database was corrupted. If you can reproduce this reliably, we
would be *very* interested to hear how (and which version you are using).
We see occasional reports of this, but haven't yet been able to track down
all the causes.
> However, I noticed while doing this that I have 5162 hams
> cached, and only 577 spams cached. I train on everything,
> which may not be the best strategy, but it's the easiest I can see.
Try training on mistakes, which typically does better than
train-on-everything. IOW, just train on unsures, false positives and false
negatives. With 1.0a9, you can set the default values of the radio buttons
in the review pages to make this easier (Ham->Discard, Unsure->Defer,
Spam->Discard, for example).
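
To make the rule concrete, here is a minimal sketch of a train-on-mistakes
decision. It is not SpamBayes code: the function names are hypothetical, and
the 0.20/0.90 cutoffs are assumed values chosen for illustration. A message is
trained only when the classifier's verdict (ham, unsure, or spam) differs from
what the message really is, so every unsure counts as a mistake.

```python
# Illustrative only: these names are hypothetical and not part of SpamBayes.
HAM_CUTOFF = 0.20   # assumed: scores below this are called ham
SPAM_CUTOFF = 0.90  # assumed: scores at or above this are called spam


def verdict(score):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def train_on_mistake(score, actual):
    """Return True if a message whose true class is `actual` ('ham' or
    'spam') should be trained under a train-on-mistakes policy."""
    # An 'unsure' verdict never equals 'ham' or 'spam', so unsures are
    # always trained, along with false positives and false negatives.
    return verdict(score) != actual
```

With a policy like this, correctly classified mail is simply discarded on the
review page, which is what the Ham->Discard and Spam->Discard defaults achieve.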
> As a matter of hygiene, would it make a difference if I
> started cleaning out the ham cache, to bring it in to line
> with the size of the spam cache?
Note that it doesn't matter how many files are in those directories -
they're moved there once they are trained, and not used afterwards (unless
you correct training). If you want to undo training, the only way to do
this with sb_server at the moment is to rename/remove your hammie.db file.
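
If you would rather keep the old database than delete it, the rename can be
scripted. This is only a sketch: the helper name and the timestamped ".bak"
suffix are my own invention, and it assumes sb_server has been shut down
first and that you pass the path to your own hammie.db.

```python
import time
from pathlib import Path


def set_aside_database(db_path):
    """Rename the database file to a timestamped backup.

    Hypothetical helper, not part of SpamBayes. Returns the backup path,
    or None if there was no file to rename.
    """
    db = Path(db_path)
    if not db.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = db.with_name(f"{db.name}.{stamp}.bak")
    db.rename(backup)  # the next start finds no database at db_path
    return backup
```

Keeping the renamed file around also means a suspect database can be examined
later.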
> If that doesn't make a difference, is there a relatively easy
> way to implement one of the other strategies?
1.0a9 also has some new options to help with "nonedge" training, where you
train everything inside certain edges (say 0.05 - 0.95). You can set the
review page to display only messages within these ranges. From most
reports, nonedge or mistake-based training works best*.
* Ignoring train-to-exhaustion, which sb_server isn't set up for.
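
As a rough illustration of the nonedge rule, the sketch below trains a
message only when its score falls between the two edges. The function name is
hypothetical and this is not how sb_server implements it; the 0.05 and 0.95
defaults are just the example edges mentioned above, and treating the edges
themselves as excluded is an assumption.

```python
def train_nonedge(score, low_edge=0.05, high_edge=0.95):
    """Return True if a message with this spam probability should be
    trained under a nonedge policy (hypothetical helper)."""
    # Messages scoring at or beyond either edge are already classified
    # confidently, so they are skipped.
    return low_edge < score < high_edge


# Example: pick the messages worth reviewing from a batch of scores.
scores = [0.01, 0.04, 0.30, 0.80, 0.97, 1.00]
to_train = [s for s in scores if train_nonedge(s)]
```

Narrowing the edges trains fewer messages; widening them moves the policy
closer to train-on-everything.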
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.