[Spambayes] training problem?

Seth Goodman nobody at spamcop.net
Wed Dec 3 14:32:33 EST 2003


[Ryan Malayter]
> Have my "spam" folder, with 773 messages in it, all less than a month
> old. I then use Outlook to do a search of all my mail folders *except*
> my spam folder (this is easy in Outlook 2002 and up, because you can
> exclude individual folders from search), for all mail messages newer
> than a month old. I move the "cutoff date" on this search until the
> number of messages returned by the search is very close to the number in
> my spam folder.
>
> I then *copy* all the messages from this search into a temporary Outlook
> folder called "Ham for training". Then, I train on this folder and my
> spam folder, rebuilding the database from scratch. I set my thresholds
> to 20/80, and train appropriately on all spam or ham that falls in the
> middle.
>
> I'll add this to your Wiki...

Thanks for the feedback and the contribution to the Wiki.  This is close to
what I did on my previous run, but the results were not so good.  That run was
similar to yours, except that I used the default thresholds of 90/15
(spam/ham) and an initial training set of around 600 each of spam and ham.
Maybe using the lower spam threshold of 80 and training on all the unsures is
the important difference.  I'll try that on my next run.  OTOH, maybe my spam
stream is just nasty.
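
For anyone following along, here is a rough sketch of the two mechanical
pieces of the procedure quoted above: picking a cutoff date so the ham set is
about the same size as the spam set, and sorting scores into ham/unsure/spam
with 20/80 thresholds so the unsures can be trained on.  This is not the
SpamBayes API.  The function names, the 0-100 score scale, and the handling of
scores that land exactly on a threshold are all assumptions made for
illustration.

```python
from datetime import date

# Hypothetical helpers for illustration only; not part of SpamBayes.
HAM_CUTOFF = 20   # assumed: scores below this count as ham
SPAM_CUTOFF = 80  # assumed: scores at or above this count as spam


def pick_cutoff(ham_dates, n_spam):
    """Return the date of the n_spam-th newest ham message.

    Keeping ham dated on or after this cutoff gives a ham set of roughly
    n_spam messages (fewer if there is not enough ham available).
    """
    if n_spam < 1 or not ham_dates:
        raise ValueError("need at least one spam and one ham message")
    newest_first = sorted(ham_dates, reverse=True)
    return newest_first[min(n_spam, len(newest_first)) - 1]


def bucket(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a 0-100 spam score to 'ham', 'unsure', or 'spam'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= spam_cutoff:
        return "spam"
    if score < ham_cutoff:
        return "ham"
    return "unsure"


def needs_training(scored):
    """Return the names of messages whose score falls in the unsure band.

    `scored` is an iterable of (name, score) pairs.
    """
    return [name for name, score in scored if bucket(score) == "unsure"]
```

With these thresholds, needs_training([("a", 5), ("b", 50), ("c", 95)])
picks out only "b", which is the message you would then train on by hand as
ham or spam.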

--
Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above



