[Spambayes] Graph results
T. Alexander Popiel
popiel at wolfskeep.com
Fri Feb 28 21:15:52 EST 2003
Well, I tried to post to the list with the first set of graphs
I made, but that's waiting on moderator approval (likely because
of the several .png files attached). The text portion of that
post follows.
This is the first set of really interesting graphs I've made
on the whole training regime thing. These graphs show average
error rates (fp, fn, and unsure) over the prior 7 days of data
for any given point on the graph. This is based on my actual
mail feed for the last 6 months, with nothing left out...
including our discussions of spam (with quoted examples) on
this list, which are a wonderful source of false positives.
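The trailing 7-day averaging described above can be sketched roughly like this. This is a minimal illustration, not the actual plotting code; the `daily_counts` input (one `(fp, fn, unsure, total)` tuple per 24-hour bucket) and the function name are assumptions for the example.

```python
from collections import deque

def rolling_rates(daily_counts, window=7):
    """Trailing `window`-day average fp/fn/unsure rates, one per day.

    daily_counts: list of (fp, fn, unsure, total) tuples, one per
    24-hour report bucket (hypothetical format for this sketch).
    """
    recent = deque(maxlen=window)  # keeps only the last `window` days
    rates = []
    for day in daily_counts:
        recent.append(day)
        fp = sum(d[0] for d in recent)
        fn = sum(d[1] for d in recent)
        unsure = sum(d[2] for d in recent)
        total = sum(d[3] for d in recent)
        rates.append((fp / total, fn / total, unsure / total))
    return rates
```

Each plotted point is then the error rate over the prior week of mail rather than a single noisy day.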
I divided my mail into 24-hour buckets for reporting, and I
also divided all the mail into 5 sets and did 5 runs of each
training regime, each run using 4 of the 5 sets. These multiple
runs are all plotted on the graphs, which is why there's a
multiplicity of lines in each color. (Doing multiple runs
like this points out which things are flukes, much like
a cross-validation, although the mechanics here are slightly
different.)
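The run setup (5 sets, 5 runs, each run on 4 of the 5 sets) amounts to leaving out one set per run, which can be sketched as follows; the set names are placeholders for the example.

```python
from itertools import combinations

# Five hypothetical message sets; each run uses 4 of the 5,
# so every run omits exactly one set.
sets = ["set1", "set2", "set3", "set4", "set5"]
runs = [list(combo) for combo in combinations(sets, 4)]
# -> 5 runs, each a list of 4 set names
```

Plotting all 5 runs per regime is what produces the multiple lines of each color.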
I have a few associated observations:
1. No matter how you train, spambayes gets very good very
quickly... on the order of days to error rates < 5%.
2. Spambayes continues to improve for a couple months,
but I'm starting to see an increase in errors after
about 4-5 months. I don't know why this is; it might
be because spam is mutating, or it might be because
my definition of spam has been mutating.
3. If you do perfect training as soon as messages arrive,
you still get the occasional false positive and a fair
amount of unsures.
4. Training immediately based on the classifier output and
making corrections to perfect at the end of the day is
only marginally worse than immediately perfect training.
5. Training only on fp, fn, and unsures doesn't change the
   fp much, but is significantly worse on fn and unsures
   (roughly double or triple, on the order of 1% to 4%).
6. Training only on fp, fn, and unsures ended up training on
   approximately 90 ham and 1300 spam (compared to 8200
   ham and 15300 spam for perfect training).
Doing these graphs was fun, in a nit-picky sort of way. One
could spend weeks fiddling and coming up with more data to
make pretty pictures with. I will probably spend some more
time building a few more training regimes (and posting this
on my website), but the moral of the story is pretty obvious:
spambayes is very good, and if you're willing to have slightly
higher error (and unsure) rates, then the amount of training
can be cut drastically.
Anyway, the next thing for me to really look at is the effect