[Spambayes] training WAS: aging information
Mark Hammond
mhammond at skippinet.com.au
Wed Feb 19 22:01:08 EST 2003
[Paul]
> From: D. R. Evans [mailto:N7DR at arrisi.com]
> > I saw a comment in the LJ article that one should train on roughly
> > equal numbers of spam and ham. Is this actually true? (This question
> > of course merely demonstrates that I'm too lazy to do the maths
> > myself.)
>
> That's something I'd be interested in, too - particularly as
> the ham:spam ratio people get is utterly out of their control.
Yes, but the number we use to train on isn't.
> I'm also too lazy - or possibly incompetent - to do the maths,
I'm certainly the latter <wink>, but:
> but IIRC, there were some experiments done at one stage. A pointer
> to the relevant posts (or better still, a summary on the website)
> would be very useful.
AFAIK, this experiment is still ongoing. In particular, the Outlook default
config file still has:
---
# This will probably go away if testing confirms it's a Good Thing.
experimental_ham_spam_imbalance_adjustment: True
---
I guess it can safely be stated that testing has not proved it a bad thing,
but that isn't what the comment asks <wink>
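
For anyone who wants to see why the imbalance matters without doing the
maths, here is a minimal sketch of the general idea. It is not the SpamBayes
source: the function name, the parameter names, and the defaults of 0.45 and
0.5 are assumptions chosen for illustration. It estimates a per-token spam
probability from ratios rather than raw counts, then optionally discounts the
evidence from the over-represented class, which is roughly what an imbalance
adjustment of this kind aims to do.

```python
def token_spam_prob(spam_count, ham_count, n_spam, n_ham,
                    strength=0.45, prior=0.5, adjust_imbalance=False):
    """Illustrative per-token spam probability (hypothetical helper).

    spam_count / ham_count: trained messages of each kind containing the token.
    n_spam / n_ham: total trained messages of each kind.
    strength / prior: assumed smoothing weight and assumed probability for a
    token we know nothing about.
    """
    if n_spam <= 0 or n_ham <= 0:
        raise ValueError("need at least one trained spam and one trained ham")

    # Use ratios, not raw counts, so the size of each training set is
    # factored out of the basic estimate.
    spam_ratio = spam_count / n_spam
    ham_ratio = ham_count / n_ham
    if spam_ratio + ham_ratio == 0:
        return prior  # token never seen in training

    prob = spam_ratio / (spam_ratio + ham_ratio)

    # n is how much evidence we claim to have for this token.
    n = spam_count + ham_count
    if adjust_imbalance:
        # Discount sightings from whichever class has more training data.
        spam2ham = min(n_spam / n_ham, 1.0)
        ham2spam = min(n_ham / n_spam, 1.0)
        n = ham_count * spam2ham + spam_count * ham2spam

    # Pull the estimate toward the prior when evidence is thin.
    return (strength * prior + n * prob) / (strength + n)


if __name__ == "__main__":
    # A token seen in 10 of 1000 spam and in 0 of 10 ham.
    print(token_spam_prob(10, 0, 1000, 10))
    print(token_spam_prob(10, 0, 1000, 10, adjust_imbalance=True))
```

With 1000 spam and only 10 ham trained, the unadjusted score for that token
is about 0.98, while the adjusted one is about 0.59: ten sightings in a huge
spam corpus say little when there is almost no ham to compare against. With
equal numbers of each, the adjustment has no effect in this sketch.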
> Unfortunately for me, my ham:spam ratio is something like 99% *spam*.
> This is because I run a highly filtered setup, with all my mailing
> list traffic getting taken out of the mail stream before spambayes
> gets a look in.
I am approaching that. My problem is that I delete items from my Inbox, but
never delete them from the Spam folder. This is mainly for training
purposes, but I guess it could come in handy when I need to make money fast
<wink>. However, the end result is that my spam:ham ratio is slowly
growing.
Human perception gets in the way though. It was not that many months ago
that I considered 20 spam a day bearable (and from what I understand, a .au
address means only 20 makes me lucky!). Now I find that for any "unsure"
items that are found, I begin to wonder if SpamBayes is no longer doing its
job. I believe the truth is simply that SpamBayes has lowered my threshold
to the point where whenever I see *any* spam I recoil.
Mark.