[Spambayes] training problem?

Seth Goodman nobody at spamcop.net
Wed Dec 3 14:56:55 EST 2003


Nice Wiki work.  Another difference I see between your approach and my
previous one is that you trained on 30 days' worth of spam.  I was afraid to
do that since I get 140 spam/day, so 30 days' worth is 4,200 messages.  To
get that much ham, I would need to go back almost 6 months.  However, maybe
a spam history that long is what it takes to get good detection.  I'm
amazed at your low unsure rate.

When I originally trained on 650 spam and 650 ham, that amounted to about
five days of spam and 26 days of ham.  Now I'm wondering if the longer time
frame for spam is the key.  Does anyone have any thoughts on this?
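To make the time-window arithmetic concrete, here is a rough sketch.  The
140 spam/day figure and the 650-message counts come from above; the ham rate
of about 25/day is only inferred from "650 ham in about 26 days", and a
"month" is assumed to be 30 days.  The helper name days_needed is
hypothetical and is not part of SpamBayes.

```python
# Hypothetical helper (not part of SpamBayes): days of incoming mail needed
# to collect `target` messages at a given daily arrival rate.
def days_needed(target: int, per_day: float) -> float:
    return target / per_day

SPAM_PER_DAY = 140        # stated in the post
HAM_PER_DAY = 650 / 26    # inferred: 650 ham took about 26 days -> 25/day

# Original training set: 650 messages of each class.
spam_days = days_needed(650, SPAM_PER_DAY)
print(f"650 spam: {spam_days:.1f} days")                     # 650 spam: 4.6 days
print(f"650 ham:  {days_needed(650, HAM_PER_DAY):.1f} days")  # 650 ham:  26.0 days

# Balanced set built from 30 days of spam (assumes 30-day months).
spam_count = 30 * SPAM_PER_DAY                               # 4200 messages
ham_days = days_needed(spam_count, HAM_PER_DAY)
print(f"{spam_count} ham: {ham_days:.0f} days (~{ham_days / 30:.1f} months)")
# 4200 ham: 168 days (~5.6 months)
```

So matching 30 days of spam one-for-one with ham means reaching back roughly
five and a half months at this ham rate.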

One small note:  on the email list, you mentioned using thresholds of 80/20,
but on the Wiki you said 90/10.
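The difference matters because the two cutoffs bound the unsure band.  Below
is a minimal sketch of SpamBayes-style three-way scoring, assuming scores in
[0, 1] with "90/10" meaning a spam cutoff of 0.90 and a ham cutoff of 0.10.
The function and the exact boundary comparisons are illustrative
assumptions, not the actual SpamBayes code.

```python
# Hypothetical three-way classifier: below ham_cutoff -> ham,
# at or above spam_cutoff -> spam, anything in between -> unsure.
def classify(score: float, ham_cutoff: float, spam_cutoff: float) -> str:
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"

scores = [0.05, 0.15, 0.50, 0.85, 0.95]   # made-up example scores
for ham_cut, spam_cut in [(0.20, 0.80), (0.10, 0.90)]:
    labels = [classify(s, ham_cut, spam_cut) for s in scores]
    print(f"{spam_cut:.0%}/{ham_cut:.0%}: {labels}")
```

With these made-up scores, 80/20 leaves one message unsure while 90/10
leaves three, so the two settings can give noticeably different unsure rates.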

Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above

