[Spambayes] ageing out database entries

Seth Goodman sethg.delete at GoodmanAssociates.com
Fri Nov 14 19:28:34 EST 2003


Kenny,

Thanks for the corrections.  I obviously have a lot to learn about the
details of the various Bayesian classifiers.  Do you have any comments on
the stuff at the bottom of my previous post (reproduced below)?  This is
what motivated the discussion of K9 training in the first place and I would
value any insight or suggestions that anyone could offer.

> ---------------------------------
>
> Why am I asking these questions?
>
> [Tim Peters]
> >>> I suggest you wait until you have a real problem before trying to
> >>> solve it.
> >
> >What *bothers* you about SpamBayes?  What doesn't work right, or what was
> >too hard to figure out, or what's still too confusing?  What's missing?
>
> Well, what bothers me so far is that, despite training on 620 ham and 1403
> spam, SpamBayes still misses (scores as ham) 5-10 spam messages per day out
> of around 150 scored messages.  Most of these missed spams have an initial
> score very close to zero, so simply lowering the ham threshold would not
> fix it.  After training on them as spam, their spam score often rises
> respectably, but sometimes it stays below 5%, which suggests the same
> message would be missed next time as well.  I don't know whether I just
> need a bigger or more balanced training set, whether some types of tokens
> (such as URLs embedded in HTML spam) are not currently parsed, or whether
> this is just as good as it gets.  Anyway, that's what I would like to see
> improved, and it is the motivation for the above discussion.
>
> Any thoughts from those who've been there already?
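To illustrate why lowering the ham threshold would not help, here is a minimal sketch of a three-way ham/unsure/spam decision. The function name, the cutoff values, and the two scores are hypothetical. The boundary convention (below the ham cutoff is ham, at or above the spam cutoff is spam) is an assumption and may not match SpamBayes exactly.

```python
def classify(score: float, ham_cutoff: float, spam_cutoff: float) -> str:
    # Assumed convention: below ham_cutoff -> ham,
    # at or above spam_cutoff -> spam, anything in between -> unsure.
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# Hypothetical scores for illustration, not taken from real data.
missed_spam = 0.02   # a spam that scores "very close to zero"
typical_ham = 0.05   # an ordinary ham message

for ham_cutoff in (0.20, 0.10, 0.01):
    print(ham_cutoff,
          classify(missed_spam, ham_cutoff, 0.90),
          classify(typical_ham, ham_cutoff, 0.90))
```

The missed spam only leaves the "ham" bucket once the cutoff drops below its own score. By then, ordinary ham scoring slightly higher has been pushed into "unsure" too, so the threshold cannot separate them. The fix has to come from the scores themselves, through training or tokenization.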

--
Seth Goodman

  Humans:   please remove ".delete" to reply

  Spambots: please disregard the above
