[Spambayes] an alternative use of filters

Mark Hammond mhammond at skippinet.com.au
Thu Dec 19 22:41:58 EST 2002


[Tim]

> While experience varies across test sets and care in training, in my
> experience Unsures are, over time, about half spam and half ham.  A
> curious and semi-encouraging thing is that they're overwhelmingly msgs
> *I* can't judge at a glance either, and sometimes it's so hard to tell
> I just throw the msg away as unintelligible.

While we are dropping anecdotes, my experience is similar - except I find
that I have more false negatives than false positives in the unsure range
(very, very few false-anythings outside our standard unsure range).  All
false positives are very spammy ham.  IIRC, this also reflects the common
test results.

I'm starting to get interested in the life-cycles of our corpora, as I am
starting to get "annoyed" at these false-anythings.  I believe simply that
my tolerance level is falling (the better we get at filtering spam, the more
offensive both uncaught spam and missed ham become).  However, it *is*
possible that as my spam:ham training ratio changes, the effectiveness of
the filter also changes subtly.  As the Outlook system keeps all spam, and
as I naturally delete a bit of everything *except* this spam, my spam:ham
ratio slowly but continually increases.
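
To make the drift concrete, here is a rough sketch of how the retained
spam:ham ratio could move over time.  The model is an assumption, not
something measured: all spam is kept forever, and each day a fixed
fraction of the ham still on hand gets deleted.  The function name and
every parameter are hypothetical.

```python
def simulate_spam_ham_ratio(days, spam_per_day, ham_per_day, ham_delete_fraction):
    """Return the retained spam:ham ratio at the end of each day.

    Assumed model (illustrative only): spam is never deleted, while a
    fixed fraction of the ham still on hand is deleted every day before
    the day's new ham arrives.
    """
    spam = 0.0
    ham = 0.0
    ratios = []
    for _ in range(days):
        ham *= 1.0 - ham_delete_fraction  # some existing ham is deleted
        ham += ham_per_day                # today's ham arrives
        spam += spam_per_day              # all of today's spam is kept
        ratios.append(spam / ham)
    return ratios
```

Under these assumptions the ham count levels off while the spam count
keeps growing, so the ratio rises every day even when spam and ham
arrive at the same rate.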

When I get a round tuit, I would like to take some of the existing standard
test code, and twist it into generating some sort of "expiry" based
statistics - not just expiring unused words, but possibly expiring entire
messages (and possibly never expiring "unsure" messages, etc).  I'm starting
to think this is the next natural progression of the Outlook client -
working out how things go once we have forgotten we even installed the
filter, and we have 5 years of spam competing against 1/10th of the ham
should we need to retrain.
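
A minimal sketch of the message-level expiry idea, assuming each trained
message records when it was trained and whether it originally scored as
"unsure".  This is not the existing test code; TrainedMessage,
expire_messages and spam_ham_counts are hypothetical names, and
untraining the expired messages from the classifier is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrainedMessage:
    msg_id: str
    is_spam: bool
    trained_at: datetime
    was_unsure: bool = False  # scored in the unsure range before training


def expire_messages(messages, now, max_age):
    """Split messages into (kept, expired) by training age.

    Messages that were originally "unsure" are never expired, on the
    assumption that they are the most valuable training examples.
    """
    kept, expired = [], []
    for msg in messages:
        if msg.was_unsure or now - msg.trained_at <= max_age:
            kept.append(msg)
        else:
            expired.append(msg)
    return kept, expired


def spam_ham_counts(messages):
    """Return (spam_count, ham_count) for a list of trained messages."""
    nspam = sum(1 for msg in messages if msg.is_spam)
    return nspam, len(messages) - nspam
```

Comparing spam_ham_counts() on the kept list for different max_age
values would give a first look at how an expiry policy shifts the
training ratio, before measuring any effect on filter accuracy.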

Excluding-the-stand-alone-DLL-version-of-the-filter-which-is-getting-oh-so-close-ly,

Mark.



