[Spambayes] spambayes fronting a mailing list?
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Thu Jan 16 11:52:15 EST 2003
1/16/2003 10:35:39 AM, Tim Peters <tim.one at comcast.net> wrote:
>[Barry A. Warsaw]
>> My idea was to not train the list at all, before turning on
>> spambayes. So the first batch of messages will all get held as
>> unsure, and you'd use the admindb page to accept and reject messages.
>> Accepted messages would train as ham and rejected messages would get
>> trained as spam.
Something in this thread doesn't make much sense to me. If we always train
as spam whatever has been classified as spam, and always train as ham
whatever has been classified as ham, then we're just reinforcing the obvious
and pushing up the spamminess of the words in that spam. Isn't it more
realistic (and ultimately better) to train on a random sample of messages
rather than on every one? - TimS
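To make the question concrete, here is a minimal sketch of the two policies.
The function name select_for_training and its parameters are hypothetical and
are not part of spambayes or Mailman. A sample_rate of 1.0 is the "always
train" policy, and anything lower trains on a random sample.

```python
import random


def select_for_training(classified, sample_rate=1.0, seed=None):
    """Pick which (message, label) pairs get fed back into training.

    classified  -- list of (message, label) pairs already judged ham or spam
    sample_rate -- chance that any one pair is kept; 1.0 keeps every pair
    seed        -- optional seed so that a run can be repeated
    """
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so a rate of 1.0 keeps everything
    # and a rate of 0.0 keeps nothing.
    return [pair for pair in classified if rng.random() < sample_rate]


held = [("msg %d" % i, "spam") for i in range(100)]
train_all = select_for_training(held, sample_rate=1.0)
train_some = select_for_training(held, sample_rate=0.25, seed=1)
```

Whether the sampled policy actually scores better is the open question here.
The sketch only pins down what the two policies are.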
>Better to start by training on a few spam, and a few copies of the list
>introduction msg (a decent intro msg necessarily contains many words and
>lexicalisms characteristic of the list's topic).
>If you have only ham in the database, the false negative rate will zoom
>(every word in the database will be hammish).
>If you have only spam in the database, the false positive rate will zoom
>(every word in the database will be spammish).
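A toy illustration of that point, as a sketch only: this is not the spambayes
classifier, just a simplified per-token estimate that pulls the raw ratio
toward a prior for rarely seen words. The class TinyClassifier and the two
constants are assumptions made for the example.

```python
from collections import Counter

UNKNOWN_PROB = 0.5       # assumed score for a token never seen in training
UNKNOWN_STRENGTH = 0.45  # assumed weight given to that prior


class TinyClassifier:
    """Hypothetical stand-in for a token database; not the spambayes API."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def train(self, tokens, is_spam):
        unique = set(tokens)  # count each token once per message
        if is_spam:
            self.nspam += 1
            self.spam_counts.update(unique)
        else:
            self.nham += 1
            self.ham_counts.update(unique)

    def token_prob(self, token):
        s = self.spam_counts[token]
        h = self.ham_counts[token]
        n = s + h
        if n == 0:
            return UNKNOWN_PROB
        spam_ratio = s / self.nspam if self.nspam else 0.0
        ham_ratio = h / self.nham if self.nham else 0.0
        raw = spam_ratio / (spam_ratio + ham_ratio)
        # Blend the raw estimate with the prior, weighted by evidence count.
        return (UNKNOWN_STRENGTH * UNKNOWN_PROB + n * raw) / (UNKNOWN_STRENGTH + n)


db = TinyClassifier()
db.train(["python", "list", "free"], is_spam=False)  # ham only so far
ham_only = {t: db.token_prob(t) for t in ["python", "list", "free"]}

db.train(["free", "money"], is_spam=True)             # now one spam too
after_spam = {t: db.token_prob(t) for t in ["free", "money"]}
```

With only ham trained, every known token scores below 0.5, so a spam built
from those words has nothing pulling it toward spam. After one spam is
trained, "money" scores above 0.5 and "free", seen once in each kind, sits at
0.5.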
>> I wonder how long it'll take before spambayes gets pretty good at
>> detecting what's appropriate and what's not for your list?
>Depends more on list throughput than on time, i.e. it depends more on total
># of msgs trained on. By the time you've got 1 of each kind, it should do
>better than chance. By the time you've got 20 of each kind, it should be a
>major help. By the time you've got 500 of each, it should be excellent. By
>the time you've got 15,000 of each, both error rates in c.l.py tests were
>statistically indistinguishable from 0.
>I keep hearing that spammers have gotten cleverer since then, but I haven't
>seen evidence of it in my own email. The spam that sneaks through seems
>much more likely to be due to spammer incompetence (like spam where they
>forget to put *anything* in the msg body).
c'est moi - TimS