[Spambayes] How low can you go?
gerrit at nl.linux.org
Sat Dec 13 15:23:06 EST 2003
Skip Montanaro wrote:
> Not long ago I dumped it all in favor of a more minimalist approach. In the
I think that's a good idea. After I trained some unsures as ham or as
spam although they were very similar (that is, spambayes was actually
_right_ to classify them as unsure, but I didn't like it), the unsure
ratio of my database got worse and I started getting false negatives.
So now I have restarted my database as well. The drawback is that my
current unsure ratio is 1 ;)
> At the moment I have trained on 14 spams and 20 hams and am quite pleased
> with how its performing so far.
I remember the same from the past.
My father is using non-Bayesian SpamAssassin, and it seems the
SpamAssassin manpage warns that without 'hundreds of messages' Bayesian
spam filtering is unusable. This is obviously incorrect for Spambayes.
Spambayes comes with no knowledge. Does it have a more intelligent
algorithm? Or is the warning in the SpamAssassin manpage incorrect?
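
One possible reason a tiny training set can still give usable scores is
the smoothing that Gary Robinson proposed for per-word spam
probabilities: a word seen only a few times is pulled toward a neutral
prior instead of being trusted outright. The sketch below illustrates
that idea only; it is not code taken from Spambayes. The function name
is made up for illustration, and the constants s=0.45 and x=0.5 are
assumed defaults.

```python
def smoothed_spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Robinson-style smoothed spam probability for a single word.

    spam_count / ham_count: trained spams / hams containing the word.
    nspam / nham: total number of trained spams / hams.
    s: strength given to the prior (assumed value).
    x: probability assigned to a never-seen word (assumed value).
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        # Word never seen in training: fall back to the neutral prior.
        return x
    # Raw estimate from the per-class frequencies alone.
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Blend the raw estimate with the prior; more sightings mean
    # more weight on the raw estimate.
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Database sizes from the quoted message: 14 spams, 20 hams.
    for seen_in_spam in (0, 1, 10):
        print(seen_in_spam, smoothed_spam_prob(seen_in_spam, 0, 14, 20))
```

With these assumed constants, a word seen in a single spam and no ham
scores well below 1.0, and the score moves closer to 1.0 only as the
word turns up in more spams.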
> So, how small is yours? <wink>
Currently 0, with no unsures, but after 4 minutes and with fetchmail
running once every 5 minutes, it's no surprise <w>.
I have the dilemma that a lot of the spam I receive has already been
'handled' by my ISP. It scans mail for viruses, and if a message
contains a virus (2/3 of the spam I receive does), it replaces it with a
notice. The notice is the same each time, so, based on the wording in
that notice, some words which intuitively are not spammy at all are
being treated as spammy words. It doesn't work badly, however. A more
serious problem is how to tell fake bounces from real bounces...
(ah, 1 unsure now, but I'll wait until I have at least 3 hams and 3
spams before I start training)
225. If he perform a serious operation on an ass or ox, and kill it, he
shall pay the owner one-fourth of its value.
-- 1780 BC, Hammurabi, Code of Law
Asperger's Syndrome - a personal approach: