[spambayes-dev] Any updated information for the plans to
make better statistics?
Tony Meyer
tameyer at ihug.co.nz
Thu Oct 14 22:36:16 CEST 2004
> Any updated information for the plans to make better
> statistics (in terms of persistence, presentation, and
> what's provided) especially for the Outlook addin?
I'm planning on adding persistence (as sb_server and sb_imapfilter already
have) in the near future (it would certainly make it into 1.1a1); I've done
some of the coding already. Presentation is trickier. I presume that some
graphs would be nice, but I don't have the skills to draw a dynamic graph in
an Outlook dialog (if anyone provides example code to do this, I'd happily
tie in the SpamBayes stuff).
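As a rough illustration of what "persistent stats" involves, here is a minimal sketch that keeps the counters in a JSON file between sessions. The counter names, the JSON file format, and the functions load_stats, save_stats and record_classification are all hypothetical; this is not the actual SpamBayes or Outlook addin code.

```python
import json
import os
import tempfile

# Hypothetical counter names; the real SpamBayes stats code may track
# different fields under different names.
DEFAULT_STATS = {"num_ham": 0, "num_spam": 0, "num_unsure": 0}


def load_stats(path):
    """Return saved counters, or zeroed counters if no file exists yet."""
    stats = dict(DEFAULT_STATS)
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            stats.update(json.load(f))
    return stats


def save_stats(stats, path):
    """Write to a temp file, then swap it in, so an interrupted write
    leaves the previous stats file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(stats, f)
    os.replace(tmp_path, path)


def record_classification(stats, verdict):
    """Bump the counter for one message classified this session."""
    if verdict not in ("ham", "spam", "unsure"):
        raise ValueError("unknown verdict: %r" % verdict)
    stats["num_" + verdict] += 1
```

Each session would call load_stats at startup and save_stats at shutdown (or after every change), so the totals keep accumulating across restarts instead of resetting like the current per-session figures.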
> Really, all I'm looking for is a persistent stat area
> (much like the current per session one) but includes
> classification accuracy percentage which is calculated
> from the total messages processed and classification
> errors. I'm presuming that the errors would be the
> messages that were manually classified as good or spam
> and/or the false positives and false negatives?
I'm not sure about adding an "accuracy percentage". It's hard to define
(what do you do with unsures?), so it might mislead people, and there
presumably wouldn't be room to explain it in the dialog. You're welcome to
try to convince me otherwise, of course; it would certainly be an easy
addition.
However, once the stats are persistent, it would be simple for you to
calculate any such figure yourself. Would that suffice?
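To show why the definition matters, here is a small sketch with three different treatments of unsures. The function accuracy_percent, its policy names, and the example counts (950 correct, 2 false positives, 8 false negatives, 40 unsures) are made up for illustration; none of this is a SpamBayes API.

```python
def accuracy_percent(correct, false_pos, false_neg, unsure, unsure_policy):
    """Accuracy as a percentage, given an explicit choice for unsures.

    unsure_policy (hypothetical names):
      "exclude"    - leave unsures out of numerator and denominator
      "as_error"   - count every unsure as a mistake
      "as_correct" - count unsures as right, since nothing was misfiled
    """
    errors = false_pos + false_neg
    if unsure_policy == "exclude":
        good, total = correct, correct + errors
    elif unsure_policy == "as_error":
        good, total = correct, correct + errors + unsure
    elif unsure_policy == "as_correct":
        good, total = correct + unsure, correct + errors + unsure
    else:
        raise ValueError("unknown unsure_policy: %r" % unsure_policy)
    if total == 0:
        return 0.0
    return 100.0 * good / total


# Same made-up counts, three different "accuracy" figures.
print(round(accuracy_percent(950, 2, 8, 40, "exclude"), 2))     # 98.96
print(round(accuracy_percent(950, 2, 8, 40, "as_error"), 2))    # 95.0
print(round(accuracy_percent(950, 2, 8, 40, "as_correct"), 2))  # 99.0
```

The same underlying counts give anything from 95% to 99%, which is exactly the kind of difference a single unexplained number in the dialog would hide.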
> This way I can better test this wonderful tool and provide
> much more concrete test results when using the Outlook
> addin. In terms of the spam filtering accuracy when using
> different training methods and/or options for the SpamBayes
> engine.
If you want to do serious testing, then you'd be better off using the
testing scripts that are in the source archive (timcv.py and
incremental.py). These don't rely on you consistently doing the same thing,
and you can easily test multiple options/regimes over the same mail set.
There's a bit of a learning curve involved, but it's not too bad.
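The reason a fixed mail set helps is that every option or training regime is scored against exactly the same messages. timcv.py is, as its name suggests, a cross-validation driver; the sketch below shows only the general idea of k-fold cross-validation with separate ham and spam sets, not the script's actual code. The functions k_folds, cross_validate, toy_train and toy_classify are hypothetical, and the toy classifier is a trivial stand-in for the real SpamBayes classifier.

```python
def k_folds(items, k):
    """Split items into k nearly equal folds, round-robin."""
    if not 2 <= k <= len(items):
        raise ValueError("need 2 <= k <= len(items)")
    return [items[i::k] for i in range(k)]


def cross_validate(ham, spam, k, train, classify):
    """k-fold cross-validation over one fixed, labelled mail set.

    train:    callable(ham_list, spam_list) -> model
    classify: callable(model, text) -> True if judged spam
    Returns (false_positives, false_negatives) summed over all folds.
    """
    ham_folds, spam_folds = k_folds(ham, k), k_folds(spam, k)
    fp = fn = 0
    for i in range(k):
        # Train on every fold except fold i, then test on fold i.
        train_ham = [m for j in range(k) if j != i for m in ham_folds[j]]
        train_spam = [m for j in range(k) if j != i for m in spam_folds[j]]
        model = train(train_ham, train_spam)
        fp += sum(1 for m in ham_folds[i] if classify(model, m))
        fn += sum(1 for m in spam_folds[i] if not classify(model, m))
    return fp, fn


def toy_train(ham, spam):
    """Toy model: words seen in training spam but never in training ham."""
    ham_words = {w for m in ham for w in m.split()}
    return {w for m in spam for w in m.split()} - ham_words


def toy_classify(model, text):
    """Call a message spam if it shares any word with the toy model."""
    return bool(model & set(text.split()))


ham = ["meeting at noon", "lunch at noon"]
spam = ["cheap pills now", "cheap watches now"]
print(cross_validate(ham, spam, 2, toy_train, toy_classify))  # (0, 0)
```

Comparing two regimes is then just a matter of calling cross_validate twice on the same ham and spam lists with different train or classify functions, so any difference in the error counts comes from the regime and not from how you happened to handle your mail that week.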
=Tony Meyer
---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.