[Spambayes] training WAS: aging information

Mark Hammond mhammond at skippinet.com.au
Wed Feb 19 22:01:08 EST 2003


[Paul]

> From: D. R. Evans [mailto:N7DR at arrisi.com]
> > I saw a comment in the LJ article that one should train on roughly
> > equal numbers of spam and ham. Is this actually true? (This question
> > of course merely demonstrates that I'm too lazy to do the maths
> > myself.)
> 
> That's something I'd be interested in, too - particularly as 
> the ham:spam ratio people get is utterly out of their control. 

Yes, but the number we use to train on isn't.

> I'm also too lazy - or possibly incompetent - to do the maths, 

I'm certainly the latter <wink>, but:
> but IIRC, there were some experiments done at one stage. A pointer to
> the relevant posts (or better still, a summary on the website) would
> be very useful.

AFAIK, this experiment is still ongoing.  In particular, the Outlook default
config file still has:
---
# This will probably go away if testing confirms it's a Good Thing.
experimental_ham_spam_imbalance_adjustment: True
---

I guess it can safely be stated that testing has not proved it a bad thing,
but that isn't what the comment asks <wink>
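
For anyone wondering what an adjustment of this kind might look like, here
is a rough sketch.  It is not lifted from the SpamBayes source: the function
name, the smoothing defaults (s=0.45, x=0.5) and the exact discounting rule
are assumptions made for illustration only.  The idea is that when far more
spam than ham has been trained (or vice versa), evidence from the
over-represented class is trusted less, so a word's score stays closer to
the neutral prior.

```python
def word_spam_prob(spamcount, hamcount, nspam, nham,
                   s=0.45, x=0.5, adjust_imbalance=True):
    """Smoothed estimate that a word indicates spam (illustrative only).

    spamcount/hamcount: trained messages of each class containing the word.
    nspam/nham: total trained messages of each class (both must be > 0).
    s, x: assumed strength and value of the prior for unknown words.
    """
    if spamcount == 0 and hamcount == 0:
        return x  # never seen: fall back to the prior

    # Per-class frequencies, so raw class sizes do not skew the estimate.
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    raw = spamratio / (spamratio + hamratio)

    if adjust_imbalance:
        # Hypothetical rule: discount counts from the larger class.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0

    # Effective amount of evidence behind the raw estimate.
    n = hamcount * spam2ham + spamcount * ham2spam
    return (s * x + n * raw) / (s + n)


# 1000 spam and 10 ham trained; the word appears in 5 spam and no ham.
plain = word_spam_prob(5, 0, 1000, 10, adjust_imbalance=False)
adjusted = word_spam_prob(5, 0, 1000, 10, adjust_imbalance=True)
print(round(plain, 3), round(adjusted, 3))
```

Under those assumptions the unadjusted score comes out at roughly 0.96 and
the adjusted one at 0.55.  With equal numbers of ham and spam trained, the
two agree exactly.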

> Unfortunately for me, my ham:spam ratio is something like 99% *spam*.
> This is because I run a highly filtered setup, with all my mailing
> list traffic getting taken out of the mail stream before spambayes
> gets a look in.

I am approaching that.  My problem is that I delete items from my Inbox, but
never delete them from the Spam folder.  This is mainly for training
purposes, but I guess it could come in handy when I need to make money fast
<wink>.  However, the end result is that my spam:ham ratio is slowly
growing.  

Human perception gets in the way though.  It was not that many months ago
that I considered 20 spam a day bearable (and from what I understand, a .au
address means only 20 makes me lucky!).  Now I find that for any "unsure"
items that are found, I begin to wonder if SpamBayes is no longer doing its
job.  I believe the truth is simply that SpamBayes has lowered my threshold
to the point where whenever I see *any* spam I recoil.

Mark.