[spambayes-dev] Enhanced Outlook statistics display

Kenny Pitt kennypitt at hotmail.com
Thu Dec 9 14:38:27 CET 2004


Tony Meyer wrote:
> I've wondered about putting up the spam "cost" as calculated by the
> various testtools scripts (by default 10*fp+fn+0.2*unsure).  It does
> give a single figure for accuracy that takes into consideration how
> bad fp's are and works with unsures - and you could use it with
> filters that don't have an unsure category.  The weights are
> adjustable via options, although few people would.

That's another possibility, although it would probably be more difficult to
compare against other spam filters (especially if anyone did adjust the
weights).  John's main point in his "batting average" article was that a
single accuracy score makes it difficult to see the difference between
filters that reduce false positives by letting through a lot of spam vs.
filters that kill almost all of the spam at the expense of increased false
positives.  By reporting the scores separately, the user can make the
tradeoff based on what is more important to them.
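
To illustrate the point, here is a minimal sketch in Python. Only the default
weights (10*fp+fn+0.2*unsure) come from the message quoted above; the function
name, the two example filters, and their counts are hypothetical and are not
taken from the testtools scripts.

```python
# Default weights as quoted above: 10*fp + fn + 0.2*unsure.
def spam_cost(fp, fn, unsure, fp_weight=10.0, fn_weight=1.0, unsure_weight=0.2):
    """Collapse the three error counts into one weighted figure."""
    return fp_weight * fp + fn_weight * fn + unsure_weight * unsure

# Two made-up filters with very different behaviour.
cautious = {"fp": 1, "fn": 10, "unsure": 0}    # few false positives, leaks spam
aggressive = {"fp": 2, "fn": 0, "unsure": 0}   # catches all spam, more false positives

for name, counts in (("cautious", cautious), ("aggressive", aggressive)):
    # Print the single score next to the separate counts it hides.
    print(name, spam_cost(**counts), counts)
```

Both filters come out at the same cost, so the single figure cannot tell them
apart, while the separate fp and fn counts can.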

> I'm fine with the stats that we have now (what I would like, and
> might get to at some point, is to centralise the stats code somewhat
> so that we don't have to keep updating both the web interface code
> and Outlook separately).

That would be good, but difficult currently because they take entirely
different approaches.  The Outlook addin totals up the stats as it goes,
while sb_server recalculates them by iterating through the data in the
messageinfo database.  Maybe the changes you made to utilize the same
messageinfo database for Outlook will allow us to calculate the Outlook
stats the same way.  At the very least, though, we could probably create a
function that takes the raw counts as a parameter and returns the formatting
dictionary with the complete set of statistics.  I'll look into that as a
first step.
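
A rough sketch of what such a shared function might look like, assuming the
raw counts are plain integers. The function name, parameter names, and
dictionary keys are all hypothetical and do not reflect the existing Outlook
or sb_server code.

```python
def build_stats_dict(num_ham, num_spam, num_unsure, false_pos, false_neg):
    """Turn raw counts into a dictionary suitable for %-style formatting."""
    total = num_ham + num_spam + num_unsure

    def pct(part, whole):
        # Avoid dividing by zero before any mail has been processed.
        return 100.0 * part / whole if whole else 0.0

    return {
        "total": total,
        "num_ham": num_ham,
        "num_spam": num_spam,
        "num_unsure": num_unsure,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "perc_ham": pct(num_ham, total),
        "perc_spam": pct(num_spam, total),
        "perc_unsure": pct(num_unsure, total),
        "perc_fp": pct(false_pos, total),
        "perc_fn": pct(false_neg, total),
    }

# Either front end could then format the same dictionary its own way.
stats = build_stats_dict(60, 30, 10, 1, 2)
print("%(num_spam)d spam (%(perc_spam).1f%%) of %(total)d messages" % stats)
```

With something like this, only the counting would differ between the two
front ends; the percentages and key names would be shared.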

> What do you think about the stats that are requested in the tracker?

Are you referring to RFE #765924 regarding breaking down the stats by
hour/day/week, etc?  That seems like a lot of work for questionable value,
especially since we would probably have to store a bit more data in
messageinfo to allow it.

> Another thought I had was that we could fit a "Reset Statistics"
> button on the Statistics panel (all it would have to do is delete the
> pickle and reset the session stats).  People might want to collect
> (eg) monthly stats, or stats after an initial training period, and
> that would make it easier for them.  I hate mucking about with the
> dialogs - you want to do this? ;)     

Should be easy enough; I'll take a look.  It would probably be nice to save
the date when the statistics were last reset, as well.  I haven't done much
with pickles.  Is that something that could be easily added to the stats
file?
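
For what it's worth, a pickle can hold any picklable Python object, so a date
can sit alongside the counts. The sketch below assumes the stats file is a
pickled dictionary, which may not match the real file layout; the function
names and keys are hypothetical. It also rewrites the file on reset rather
than deleting it, so that the reset date has somewhere to live.

```python
import pickle
from datetime import datetime

def reset_stats(path, now=None):
    """Write a fresh stats pickle that records when the reset happened."""
    stats = {
        "num_ham": 0,
        "num_spam": 0,
        "num_unsure": 0,
        # Assumed extra field: the time of the last reset.
        "last_reset": now if now is not None else datetime.now(),
    }
    with open(path, "wb") as f:
        pickle.dump(stats, f)
    return stats

def load_stats(path):
    """Read the stats dictionary back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Older stats files written without the extra key could be handled on load with
something like `stats.get("last_reset")`, which returns None when no reset
date was ever stored.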

-- 
Kenny Pitt