[Spambayes] Upgrading from 1.0a2

Gary Benson gary at inauspicious.org
Fri Dec 5 09:40:47 EST 2003

Richie Hindle wrote:
> > I see Paul Graham quoting hitrates of 99.7% or whatever and I lust ;)
> I don't know what my percentage hit rate is, but it's well over 99%.
> To give you an idea, I receive around 50 hams per day, and around
> 300 spams.  I get on average one false negative a day, and yesterday
> I had my first false positive (a confirmation for registering a .NET
> passport, don't get me started on Microsoft's support policies) in
> at least a month.

Mine is nowhere near that good, though hopefully my new spam_cutoff
will help.
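
For scale, here is a rough working of the figures quoted above. It is only
a sketch: the 30-day month is my assumption (Richie said "at least a
month"), and the function name is made up for illustration.

```python
# Figures quoted above: about 50 hams and 300 spams per day,
# roughly one false negative per day, one false positive in a month.
hams_per_day = 50
spams_per_day = 300
false_negatives_per_day = 1
false_positives_per_month = 1
days_per_month = 30  # assumption: "at least a month" taken as 30 days


def quoted_rates():
    """Return (spam catch rate, ham pass rate, overall accuracy)."""
    spam_caught = 1 - false_negatives_per_day / spams_per_day
    ham_kept = 1 - false_positives_per_month / (hams_per_day * days_per_month)
    total = (hams_per_day + spams_per_day) * days_per_month
    errors = false_negatives_per_day * days_per_month + false_positives_per_month
    overall = 1 - errors / total
    return spam_caught, ham_kept, overall


if __name__ == "__main__":
    for name, rate in zip(("spam caught", "ham kept", "overall"), quoted_rates()):
        print("%-12s %.2f%%" % (name, rate * 100))
```

On those assumptions the overall figure comes out at about 99.7%, so
"well over 99%" looks like a fair description.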

> My database is 1292 spams and 646 hams.  I initially trained with a couple
> of hundred of each,

Same here.

> and have been mistake-based training ever since (training on spams
> classified as ham or unsure, and on hams classified as spam - though
> they're vanishingly rare, hence the imbalance).
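
As I read it, the rule for deciding what to train on looks something like
the sketch below. The cutoff values and function names are assumptions of
mine for illustration, not anything taken from the Spambayes source.

```python
# Assumed cutoffs, for illustration only; use whatever your setup has.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def verdict(score):
    """Map a spam probability in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def needs_training(score, is_spam):
    """Mistake-based rule as described above.

    Train on spams that came out as ham or unsure, and on hams that
    came out as spam.  Everything else is left alone.
    """
    result = verdict(score)
    if is_spam:
        return result in ("ham", "unsure")
    return result == "spam"
```

Because hams are only trained on when they are flagged as spam, which
almost never happens, the spam count in the database grows much faster
than the ham count.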

The way I have mine set up is that everything that ends up in my inbox
is classified as ham or spam, and every month I train it on the caught
spams to redress the balance.  I may increase this to every week.

> > I have copies of every email it was trained on, so a retrain would
> > simply be a matter of spending half an hour or so writing a
> > script.
> You shouldn't need to write a script - if you can get your messages
> into an mbox file or a Maildir directory then you train either
> through the web interface or by using sb_mboxtrain.py or
> sb_filter.py.

They're in a funny format; the script would be to generate normal
mboxes from them.
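
Something along these lines is what I have in mind. It is a sketch only:
it assumes the saved messages can first be reduced to one raw RFC 2822
message per file in a directory (which is not quite how mine are stored),
and the function name is made up. It uses Python's standard mailbox and
email modules.

```python
import email
import mailbox
from pathlib import Path


def files_to_mbox(src_dir, mbox_path):
    """Append each regular file in src_dir to an mbox, one message per file.

    Assumes every file holds a single raw message.  Returns the number of
    messages written.  No locking is done, so nothing else should be
    writing to mbox_path at the same time.
    """
    box = mailbox.mbox(mbox_path)  # the file is created if it is missing
    count = 0
    try:
        for path in sorted(Path(src_dir).iterdir()):
            if not path.is_file():
                continue
            msg = email.message_from_bytes(path.read_bytes())
            box.add(msg)
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Run once over the ham copies and once over the spam copies, that would
give two ordinary mbox files to train from with sb_mboxtrain.py or the
web interface.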


