[Spambayes] cmp.py with mean and dev comparison
Sun, 22 Sep 2002 15:36:12 -0400
On 22 Sep 2002 at 15:19, Tim Peters wrote:
> [Brad Clements]
> > But then I'd have to change timcv.py
Because I wasn't paying attention when I wrote that ;-)
I've found the goodies and am just waiting for my first test re-run to complete.
Meanwhile, I'm going to move Hist into its own module and allow it to keep all the
numbers from add() and save itself in a pickle.
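
Roughly, that could look like the sketch below. This is not the Hist that lives in
TestDriver today: the class name RecordingHist, the nbuckets default, and the save()/load()
methods are all made up for illustration. It pickles a plain dict rather than the instance,
so the pickle does not depend on which module the class ends up in.

```python
import pickle


class RecordingHist:
    """Hypothetical Hist variant that remembers every value passed to add()."""

    def __init__(self, nbuckets=20):
        self.nbuckets = nbuckets
        self.buckets = [0] * nbuckets
        self.values = []          # raw scores, kept for later plotting

    def add(self, x):
        # x is assumed to be a spam probability in [0.0, 1.0].
        i = min(int(x * self.nbuckets), self.nbuckets - 1)
        self.buckets[i] += 1
        self.values.append(x)

    def save(self, path):
        # Store plain data only, so loading needs no class lookup.
        state = {"nbuckets": self.nbuckets, "values": self.values}
        with open(path, "wb") as f:
            pickle.dump(state, f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        hist = cls(state["nbuckets"])
        for x in state["values"]:
            hist.add(x)           # rebuild the bucket counts from raw values
        return hist
```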
Then, I'm planning on working up a PythonCard app that will plot a density (weighted
moving average) of the values, instead of a bucketed histogram.
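
One way to get such a density from the raw values is sketched here, leaving the
PythonCard plotting out entirely. The triangular weighting, the function name density(),
and the npoints/width defaults are assumptions picked for illustration, not anything
settled on the list.

```python
def density(values, npoints=101, width=0.05):
    """Weighted moving average of scores over [0, 1].

    Each value contributes to grid points within `width` of it, with a
    weight that falls off linearly with distance (a triangular window).
    """
    if not values:
        return [0.0] * npoints
    out = []
    for i in range(npoints):
        x = i / (npoints - 1)             # grid point in [0, 1]
        total = 0.0
        for v in values:
            d = abs(v - x)
            if d < width:
                total += 1.0 - d / width  # closer values weigh more
        # The triangular window has area `width`, so this scaling makes
        # the curve integrate to roughly 1 (less near the edges).
        out.append(total / (len(values) * width))
    return out
```

A larger width gives a smoother curve; a smaller one gets closer to the raw data.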
By loading two pickled Hists (spam/ham), I'll throw in a slider control or two to show
how many fp's, fn's and "grey-n's" fall out for a given spam-prob cutoff. We'd probably
need another constant if we want a middle ground, though overlaying the graphs and playing
with a slider might point out we don't need a middle ground.
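
The counting behind the slider might be as simple as the sketch below. The name tally()
and the second constant ham_cutoff are assumptions. With ham_cutoff left out there is no
middle ground and the grey count is always zero.

```python
def tally(ham_scores, spam_scores, spam_cutoff, ham_cutoff=None):
    """Return (fp, fn, grey) counts for the given cutoffs.

    Scores >= spam_cutoff are called spam, scores < ham_cutoff are called
    ham, and anything in between is "grey" (unsure).
    """
    if ham_cutoff is None:
        ham_cutoff = spam_cutoff          # no middle ground
    fp = sum(1 for s in ham_scores if s >= spam_cutoff)
    fn = sum(1 for s in spam_scores if s < ham_cutoff)
    grey = sum(1 for s in list(ham_scores) + list(spam_scores)
               if ham_cutoff <= s < spam_cutoff)
    return fp, fn, grey
```

Each slider move would just call this again with the new cutoff(s) and redraw.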
Is extending Hist this way acceptable? Using it as a collection point should be illegal, but
it's a good spot to do it without having to change a lot of stuff in TestDriver.
> > But .. since I'll be spelunking the timcv.py code, I think I'll
> > also work on a "end-user emulation" module. Where I can simulate
> > receiving messages and training as I go.
> Such stuff probably belongs in new classes.
New module with a new class.
> > I suspect most users will be diligent about feeding spam into the
> > trainer, but will be lazy about feeding it ham.
> Me too. Gary fiddled the formulas in his approach to try to be robust in
> the face of this. No experiments have been run to test it, though, neither
> under Gary's nor Paul's schemes.
I intend to find out..
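
A rough sketch of the emulation loop for the lazy-about-ham case follows. Everything in
it is hypothetical: the score()/learn() interface, the ToyClassifier stand-in (which is
not the project's classifier, just a token counter so the loop can run), and the two
training-rate parameters.

```python
import random


class ToyClassifier:
    """Stand-in scorer: per-token fraction of spam sightings, averaged."""

    def __init__(self):
        self.spam = {}
        self.ham = {}

    def learn(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        for t in tokens:
            counts[t] = counts.get(t, 0) + 1

    def score(self, tokens):
        probs = []
        for t in tokens:
            s, h = self.spam.get(t, 0), self.ham.get(t, 0)
            probs.append(0.5 if s + h == 0 else s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def emulate_user(stream, classifier, spam_train_rate=1.0,
                 ham_train_rate=0.1, seed=0):
    """Score each message as it arrives, then maybe train on it.

    stream yields (tokens, is_spam) pairs in arrival order.  The rates are
    the chance the user bothers to feed a message of that kind to the
    trainer.  Returns a list of (score, is_spam) pairs.
    """
    rng = random.Random(seed)             # seeded so runs are repeatable
    results = []
    for tokens, is_spam in stream:
        results.append((classifier.score(tokens), is_spam))
        rate = spam_train_rate if is_spam else ham_train_rate
        if rng.random() < rate:
            classifier.learn(tokens, is_spam)
    return results
```

Running the same message stream with different ham_train_rate values would show how
much the scores degrade as the user gets lazier about ham.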
Brad Clements, email@example.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax