On Sat, Mar 01, 2003 at 09:12:46AM -0800, T. Alexander Popiel wrote:
> Those who want to see my pretty graphs without waiting
> for the moderator approval of my .png-laden posting
> can go to http://www.wolfskeep.com/~popiel/spambayes/incremental
> to see all the pretty pictures (along with a bunch of the
> raw and semi-cooked data files).

Looks very nice indeed, and the results seem to be good (fn and fp ~10^-2).
For the other examples on your site, where you vary a parameter to
check its effect on the performance (e.g. the ham:spam ratio, or the
training set size), it would be nice to generate a ROC-curve:

In a ROC-curve (Receiver Operating Characteristic curve), you plot the
correct positive rate (y-axis, i.e. the fraction of spam that is flagged
as spam) against the false positive rate (x-axis, i.e. the fraction of
ham that is flagged as spam). Each point on the curve comes from one
setting of the parameter, e.g. one spam:ham ratio. A ROC-curve doesn't
necessarily provide more information, but it is a rather standard way
to present results in (more or less) binary classification. The term
ROC originates from radar signal detection, AFAIK.
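
A rough sketch of how the points could be computed, in Python. The
dict keys and the counts below are made up for illustration (they are
not taken from your data files), and spam is treated as the positive
class:

```python
def roc_points(runs):
    """Turn per-run counts into sorted (fp_rate, tp_rate) points.

    Each run is a dict with hypothetical keys:
      tp = spam flagged as spam, fn = spam passed as ham,
      fp = ham flagged as spam,  tn = ham passed as ham.
    """
    points = []
    for run in runs:
        tp_rate = run["tp"] / (run["tp"] + run["fn"])  # y-axis
        fp_rate = run["fp"] / (run["fp"] + run["tn"])  # x-axis
        points.append((fp_rate, tp_rate))
    # Sort by false positive rate so the points can be joined into a curve.
    return sorted(points)


# Invented counts, one entry per parameter setting (e.g. per spam:ham ratio).
example_runs = [
    {"tp": 95, "fn": 5, "fp": 4, "tn": 96},
    {"tp": 90, "fn": 10, "fp": 1, "tn": 99},
    {"tp": 98, "fn": 2, "fp": 10, "tn": 90},
]

example_points = roc_points(example_runs)
for fp_rate, tp_rate in example_points:
    print(f"{fp_rate:.3f} {tp_rate:.3f}")
```

The output is a two-column list that any plotting tool can turn into
the curve.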

A problem that needs to be addressed in making ROC-curves for
spambayes is how to handle unsures: disregarding them completely in
the ROC curve seems reasonable, but then one probably also needs a
second curve of correct positive rate vs. unsure rate, so that a good
ROC-curve bought with a large number of unsures doesn't go unnoticed.
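
One possible way to do that, sketched in Python. The function name,
the cutoff values and the scores are all assumptions for illustration;
in particular I am assuming that a score below the ham cutoff counts as
ham, a score at or above the spam cutoff counts as spam, and anything
in between is an unsure:

```python
def rates_without_unsures(spam_scores, ham_scores,
                          ham_cutoff=0.20, spam_cutoff=0.90):
    """Return (fp_rate, tp_rate, unsure_rate).

    fp_rate and tp_rate are computed over decided messages only;
    unsure_rate is computed over all messages.
    Cutoffs here are illustrative assumptions.
    """
    spam_caught = sum(1 for s in spam_scores if s >= spam_cutoff)
    spam_missed = sum(1 for s in spam_scores if s < ham_cutoff)
    ham_flagged = sum(1 for s in ham_scores if s >= spam_cutoff)
    ham_passed = sum(1 for s in ham_scores if s < ham_cutoff)

    total = len(spam_scores) + len(ham_scores)
    decided = spam_caught + spam_missed + ham_flagged + ham_passed
    unsure_rate = (total - decided) / total if total else 0.0

    decided_spam = spam_caught + spam_missed
    decided_ham = ham_flagged + ham_passed
    # Guard against a run in which every message of one class was unsure.
    tp_rate = spam_caught / decided_spam if decided_spam else 0.0
    fp_rate = ham_flagged / decided_ham if decided_ham else 0.0
    return fp_rate, tp_rate, unsure_rate


# Invented scores, just to exercise the function.
example_spam_scores = [0.99, 0.95, 0.50, 0.10]
example_ham_scores = [0.01, 0.05, 0.30, 0.95]

example_rates = rates_without_unsures(example_spam_scores, example_ham_scores)
print("fp=%.3f tp=%.3f unsure=%.3f" % example_rates)
```

The first two values give a point on the ROC-curve, and the last two
give a point on the companion curve.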

