[Spambayes] spambayes fronting a mailing list?
tim.one at comcast.net
Thu Jan 16 11:35:39 EST 2003
[Barry A. Warsaw]
> My idea was to not train the list at all, before turning on
> spambayes. So the first batch of messages will all get held as
> unsure, and you'd use the admindb page to accept and reject messages.
> Accepted messages would train as ham and rejected messages would get
> trained as spam.
Better to start by training on a few spam, and a few copies of the list
introduction msg (a decent intro msg necessarily contains many words and
lexicalisms characteristic of the list's topic).
If you have only ham in the database, the false negative rate will zoom
(every word in the database will be hammish).
If you have only spam in the database, the false positive rate will zoom
(every word in the database will be spammish).
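A toy sketch of why a one-sided database skews everything. This is not
spambayes' actual scoring code: the class, its method names, and the sample
messages are all made up for illustration, and the per-word estimate is a
deliberately simplified ratio of ham and spam frequencies.

```python
from collections import Counter


class ToyClassifier:
    """Toy word-count classifier (hypothetical; NOT the spambayes code)."""

    def __init__(self):
        self.nham = 0
        self.nspam = 0
        self.ham_counts = Counter()
        self.spam_counts = Counter()

    def train(self, text, is_spam):
        # Count each distinct word once per message.
        words = set(text.lower().split())
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(words)
        else:
            self.nham += 1
            self.ham_counts.update(words)

    def word_prob(self, word):
        """Rough estimate that a msg containing `word` is spam."""
        ham_ratio = self.ham_counts[word] / self.nham if self.nham else 0.0
        spam_ratio = self.spam_counts[word] / self.nspam if self.nspam else 0.0
        if ham_ratio + spam_ratio == 0:
            return 0.5  # never-seen word: no evidence either way
        return spam_ratio / (ham_ratio + spam_ratio)


# Only ham in the database: every known word looks maximally hammish.
ham_only = ToyClassifier()
ham_only.train("welcome to the python list", is_spam=False)
print(ham_only.word_prob("python"))  # 0.0

# Seeded with one ham and one spam: words can now pull in both directions.
seeded = ToyClassifier()
seeded.train("welcome to the python list", is_spam=False)
seeded.train("buy cheap pills now", is_spam=True)
print(seeded.word_prob("python"))  # 0.0
print(seeded.word_prob("pills"))   # 1.0
```

With ham_only, no word in the database can ever count as evidence of spam,
so spam has nothing to be caught by. The mirror-image database (only spam)
fails the same way in the other direction.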
> I wonder how long it'll take before spambayes gets pretty good at
> detecting what's appropriate and what's not for your list?
Depends more on list throughput than on time, i.e. it depends more on total
# of msgs trained on. By the time you've got 1 of each kind, it should do
better than chance. By the time you've got 20 of each kind, it should be a
major help. By the time you've got 500 of each, it should be excellent. By
the time you've got 15,000 of each, both error rates in c.l.py tests were
statistically indistinguishable from 0.
I keep hearing that spammers have gotten cleverer since then, but I haven't
seen evidence of it in my own email. The spam that sneaks through seems
much more likely to be due to spammer incompetence (like spam where they
forget to put *anything* in the msg body).