[Spambayes] Upgrading from 1.0a2
Gary Benson
gary at inauspicious.org
Fri Dec 5 09:40:47 EST 2003
Richie Hindle wrote:
> > I see Paul Graham quoting hit rates of 99.7% or whatever and I lust ;)
>
> I don't know what my percentage hit rate is, but it's well over 99%.
> To give you an idea, I receive around 50 hams per day, and around
> 300 spams. I get on average one false negative a day, and yesterday
> I had my first false positive (a confirmation for registering a .NET
> passport, don't get me started on Microsoft's support policies) in
> at least a month.
Mine is nowhere near that good, though hopefully my new spam_cutoff
will help.
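For a rough sense of scale, the figures quoted above can be turned into percentages. This is only back-of-envelope arithmetic: it assumes exactly 50 hams, 300 spams and one false negative per day, and ignores the roughly-monthly false positive. The function name is made up for illustration.

```python
def fraction_correct(total, errors):
    """Fraction of messages classified correctly, given a daily error count."""
    return 1 - errors / total

hams_per_day = 50      # from the quoted message
spams_per_day = 300    # from the quoted message
false_negatives = 1    # "on average one false negative a day"

# Share of all daily mail that ends up in the right place.
overall = fraction_correct(hams_per_day + spams_per_day, false_negatives)
# Share of spam alone that gets caught.
spam_catch = fraction_correct(spams_per_day, false_negatives)

print(f"overall:    {overall:.2%}")     # overall:    99.71%
print(f"spam catch: {spam_catch:.2%}")  # spam catch: 99.67%
```

So on those assumptions the numbers come out in the same region as the 99.7% mentioned earlier.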
> My database is 1292 spams and 646 hams. I initially trained with a couple
> of hundred of each,
Same here.
> and have been mistake-based training ever since (training on spams
> classified as ham or unsure, and on hams classified as spam - though
> they're vanishingly rare, hence the imbalance).
The way I have mine set up is that everything that ends up in my inbox
is classified as ham or spam, and every month I train it on the caught
spams to redress the balance. I may increase this to every week.
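To make the mistake-based scheme described in the quote concrete, here is a minimal sketch. It is not SpamBayes code: `classify` and `train` are hypothetical callables standing in for whatever the filter provides, and the toy classifier at the bottom exists only to exercise the loop.

```python
def train_on_mistakes(messages, classify, train):
    """Train only on misclassified messages.

    messages: iterable of (text, is_spam) pairs with known labels.
    classify(text) returns "ham", "spam" or "unsure" (hypothetical).
    train(text, is_spam) updates the filter (hypothetical).
    Returns the number of messages trained on.
    """
    corrections = 0
    for text, is_spam in messages:
        verdict = classify(text)
        if is_spam and verdict != "spam":
            # Spam classified as ham or unsure.
            train(text, True)
            corrections += 1
        elif not is_spam and verdict == "spam":
            # Ham classified as spam.
            train(text, False)
            corrections += 1
    return corrections


# Toy stand-ins, for illustration only.
seen = []

def toy_classify(text):
    return "spam" if "pills" in text else "ham"

def toy_train(text, is_spam):
    seen.append((text, is_spam))

corrections = train_on_mistakes(
    [
        ("cheap pills", True),                # caught, not trained
        ("you have won", True),               # missed, trained as spam
        ("lunch on friday?", False),          # fine, not trained
        ("did you take your pills?", False),  # false positive, trained as ham
    ],
    toy_classify,
    toy_train,
)
```

Because correctly classified messages are never trained on, the database grows mostly with whichever class the filter gets wrong more often, which would explain the imbalance mentioned in the quote.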
> > I have copies of every email it was trained on, so a retrain would
> > simply be a matter of spending half an hour or so writing a
> > script.
>
> You shouldn't need to write a script - if you can get your messages
> into an mbox file or a Maildir directory then you train either
> through the web interface or by using sb_mboxtrain.py or
> sb_filter.py.
They're in a funny format; the script would be to generate normal
mboxes from them.
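Something along these lines is what I have in mind. It is a sketch only, written against the standard `mailbox` module in current Python, and it assumes (for the sake of example) that each saved message is a separate file of raw RFC 2822 text in one directory; the real format would need its own parsing step. The function name `files_to_mbox` is made up.

```python
import mailbox
from email import message_from_bytes
from pathlib import Path


def files_to_mbox(src_dir, mbox_path):
    """Append every regular file in src_dir to the mbox at mbox_path.

    Assumes one raw message per file and a single writer (no locking).
    Returns the number of messages written.
    """
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist
    count = 0
    try:
        for path in sorted(Path(src_dir).iterdir()):
            if path.is_file():
                box.add(message_from_bytes(path.read_bytes()))
                count += 1
    finally:
        box.close()  # flushes pending changes to disk
    return count
```

Run once over the ham copies and once over the spam copies, this would give two ordinary mbox files to feed to the trainer.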
Gary
[ gary at inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]