[Spambayes] cmp.py with mean and dev comparison

Brad Clements bkc@murkworks.com
Sun, 22 Sep 2002 15:36:12 -0400


On 22 Sep 2002 at 15:19, Tim Peters wrote:

> [Brad Clements]
> > But then I'd have to change timcv.py
> 
> Why?  

Because I wasn't paying attention when I wrote that ;-)

I've found the goodies and am just waiting for my first test re-run to complete.

Meanwhile, I'm going to move Hist into its own module and allow it to keep all the
numbers from add() and save itself in a pickle.
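
As a minimal sketch of the idea (the class name RecordingHist and its
dumps/loads/save/load methods are hypothetical, not the existing Hist in
TestDriver, and values are assumed to lie in [0.0, 1.0]):

```python
import pickle


class RecordingHist:
    """Hypothetical histogram that keeps every raw value given to add()."""

    def __init__(self, nbuckets=20):
        self.nbuckets = nbuckets
        self.values = []  # every number passed to add(), in arrival order

    def add(self, x):
        self.values.append(x)

    def buckets(self):
        # Bucket counts are derived on demand from the raw values.
        # Assumes each value is in [0.0, 1.0]; 1.0 goes in the last bucket.
        counts = [0] * self.nbuckets
        for x in self.values:
            i = min(int(x * self.nbuckets), self.nbuckets - 1)
            counts[i] += 1
        return counts

    def dumps(self):
        # Pickle plain data rather than the instance, so the pickle does
        # not depend on where this class happens to be importable from.
        return pickle.dumps({"nbuckets": self.nbuckets, "values": self.values})

    @classmethod
    def loads(cls, data):
        state = pickle.loads(data)
        hist = cls(state["nbuckets"])
        hist.values = list(state["values"])
        return hist

    def save(self, path):
        with open(path, "wb") as f:
            f.write(self.dumps())

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            return cls.loads(f.read())
```

Keeping the raw values means the bucket count can be changed after a run
without re-running the test.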

Then, I'm planning on working up a PythonCard app that will plot a density (weighted
moving average) of the values, instead of a bucketed histogram.
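
One way such a density could be computed, as a sketch only: the function
name density, the triangular weighting and the width parameter are all
assumptions chosen for illustration, and values are again assumed to lie
in [0.0, 1.0]. The plotting itself is left out.

```python
def density(values, npoints=101, width=0.05):
    """Weighted moving average of values over [0.0, 1.0].

    Each value contributes to grid points closer than `width`, with a
    weight that falls off linearly from 1 (at distance 0) to 0 (at
    distance `width`). The result is scaled so the curve integrates to
    roughly 1. Assumes npoints >= 2.
    """
    xs = [i / (npoints - 1) for i in range(npoints)]
    n = len(values)
    if n == 0:
        return xs, [0.0] * npoints
    ys = []
    for x in xs:
        total = 0.0
        for v in values:
            d = abs(v - x)
            if d < width:
                total += 1.0 - d / width
        # A triangular window of half-width `width` has area `width`.
        ys.append(total / (n * width))
    return xs, ys
```

A larger width gives a smoother curve at the cost of blurring sharp peaks
near 0.0 and 1.0.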

By loading two pickled Hists (spam/ham), I'll throw in a slider control or two to show
how many fps, fns and "grey-ns" fall out based on spam-prob. We'd probably need another
constant if we want a middle ground, though overlaying the graphs and playing with a
slider might show that we don't need a middle ground.

Is extending Hist this way acceptable? Using it as a collection point should be illegal, but
it's a good spot to do it without having to change a lot of stuff in TestDriver.

> > But .. since I'll be spelunking the timcv.py code, I think I'll
> > also work on a "end-user emulation" module. Where I can simulate
> > receiving messages and training as I go.
> 
> Such stuff probably belongs in new classes.

A new module with a new class.

> > I suspect most users will be diligent about feeding spam into the
> > trainer, but will be lazy about feeding it ham.
> 
> Me too.  Gary fiddled the formulas in his approach to try to be robust in
> the face of this.  No experiments have been run to test it, though, neither
> under Gary's nor Paul's schemes.

I intend to find out.
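
A rough sketch of what such an emulation loop could look like. Everything
here is an assumption for illustration: the function name emulate, the
training-rate parameters, and a classifier object offering score(msg) and
learn(msg, is_spam) methods (not the real spambayes interface).

```python
import random


def emulate(stream, classifier, ham_train_rate=0.2, spam_train_rate=1.0,
            cutoff=0.5, seed=0):
    """Score each message as it arrives, then maybe train on it.

    stream yields (msg, is_spam) pairs in arrival order. The simulated
    user trains on a spam with probability spam_train_rate and on a ham
    with probability ham_train_rate, modelling a user who is diligent
    about spam but lazy about ham. Returns (fp, fn).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    fp = fn = 0
    for msg, is_spam in stream:
        called_spam = classifier.score(msg) >= cutoff
        if called_spam and not is_spam:
            fp += 1
        elif is_spam and not called_spam:
            fn += 1
        rate = spam_train_rate if is_spam else ham_train_rate
        if rng.random() < rate:
            classifier.learn(msg, is_spam)
    return fp, fn
```

Running the same stream with several ham_train_rate values would show how
the error counts move as ham training gets lazier.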



Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
AOL-IM: BKClements