[Spambayes] Tony Meyer - Training question
rcoe at CambridgeMA.GOV
Thu Sep 22 13:52:35 CEST 2005
But there's still a fundamental problem with that training regimen: if
you use it, your database *will* eventually contain far more spam than
ham. And the developers profess to believe that this is a sub-optimal,
and possibly harmful, situation. Absent a solution that allows the user
to selectively prune the database (or that prunes it automatically), you
may want to recommend that users monitor their ham/spam ratio and retrain
from scratch if it gets too low.

Having said that, I should note that although my ham/spam ratio
currently stands at only 0.24, it doesn't seem to have appreciably
affected the performance of the filter. I'm just taking the developers'
word for it that there are cases where it seriously matters.
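
To make the "monitor the ratio" suggestion concrete, here is a minimal
sketch of the check I have in mind. It is not SpamBayes code: the function
names and the min_ratio cutoff are hypothetical, and the counts are
whatever your own training statistics report for ham and spam.

```python
def ham_spam_ratio(nham, nspam):
    """Return trained ham messages divided by trained spam messages."""
    if nspam == 0:
        # No spam trained yet, so the ratio cannot be "too low".
        return float("inf")
    return nham / nspam


def should_retrain(nham, nspam, min_ratio=0.2):
    """Flag a database whose ham/spam ratio fell below min_ratio.

    min_ratio is an illustrative value only, not a figure recommended
    by the SpamBayes developers.
    """
    return ham_spam_ratio(nham, nspam) < min_ratio


if __name__ == "__main__":
    # Hypothetical counts chosen to give the ratio mentioned above.
    print(ham_spam_ratio(240, 1000))
    print(should_retrain(240, 1000))
```

Where to put the cutoff is exactly the open question: nothing in this
thread says at what ratio the imbalance starts to hurt.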
> -----Original Message-----
> From: spambayes-bounces at python.org
> [mailto:spambayes-bounces at python.org] On Behalf Of Tony Meyer
> Sent: Sunday, September 18, 2005 4:37 AM
> To: sales at technology.cc
> Cc: spambayes at python.org
> Subject: Re: [Spambayes] Tony Meyer - Training question
> > At this time I would like to know if you have changed your opinion on
> > training since then. Here's what you said in a message to me on
> > 10, 2004 after reading my draft chapter.
> Yes, I would stick to what I said then. I would perhaps add that
> after time it is probably worth adjusting the thresholds, so that not
> quite as much ends up in the unsure folder, but maybe that's too
> > I basically distilled your advice down to "do no pre-training at all
> > and train only on the UNSURE folder".
> *And* any mistakes.
> > Where do you stand on training these days, for people who simply will
> > not or cannot follow a complicated set of instructions?
> It still seems that 'fpfnunsure' (all mistakes and unsures) gives
> generally good results with SpamBayes, and it's certainly the easiest
> method to use with the Outlook plug-in. I would (and do) still
> recommend it.
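
For readers who have not met the term, the 'fpfnunsure' rule quoted above
(train on all mistakes and all unsures, and nothing else) can be sketched
roughly as follows. This is an illustration, not the SpamBayes
implementation: the function names are hypothetical, and the 0.20 and 0.90
cutoffs are assumed values that show where the threshold adjustment Tony
mentions would be made.

```python
HAM_CUTOFF = 0.20   # assumed: scores below this are filed as ham
SPAM_CUTOFF = 0.90  # assumed: scores at or above this are filed as spam


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def needs_training(score, actual_is_spam, **cutoffs):
    """Return True if the message should be trained on.

    Trains on every unsure, every false positive and every false
    negative; correctly classified mail is left alone.
    """
    verdict = classify(score, **cutoffs)
    if verdict == "unsure":
        return True
    return (verdict == "spam") != actual_is_spam


if __name__ == "__main__":
    print(needs_training(0.50, True))    # unsure: train
    print(needs_training(0.99, False))   # false positive: train
    print(needs_training(0.99, True))    # correct: skip
```

Narrowing the gap between the two cutoffs sends fewer messages to the
unsure folder, at the cost of more outright mistakes to correct.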