[Spambayes] 2nd try at mean/dev cmp.py

Brad Clements bkc@murkworks.com
Sun, 22 Sep 2002 16:36:28 -0400


---------------------- multipart/mixed attachment
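The attached cmp.py patch produces this output. Each row shows the value
from the first run, the value from the second run, and the percent change: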

ham mean                 ham sdev
30.30 29.46 (-2.77%)     7.75 7.32 (-5.55%)
30.04 29.31 (-2.43%)     7.66 7.24 (-5.48%)
30.77 29.90 (-2.83%)     7.60 7.11 (-6.45%)
30.71 29.83 (-2.87%)     7.49 7.13 (-4.81%)
30.26 29.47 (-2.61%)     7.72 7.23 (-6.35%)
30.47 29.62 (-2.79%)     7.44 7.10 (-4.57%)
30.16 29.36 (-2.65%)     7.49 7.10 (-5.21%)
30.07 29.23 (-2.79%)     7.43 6.91 (-7.00%)
29.88 29.13 (-2.51%)     7.70 7.25 (-5.84%)
30.09 29.32 (-2.56%)     7.49 6.97 (-6.94%)

ham mean and sdev for all runs
30.27 29.46 (-2.68%)     7.58 7.14 (-5.80%)

spam mean                spam sdev
79.05 78.63 (-0.53%)     7.90 7.79 (-1.39%)
78.99 78.67 (-0.41%)     8.44 8.25 (-2.25%)
79.15 78.85 (-0.38%)     8.31 8.15 (-1.93%)
79.31 79.00 (-0.39%)     8.00 7.81 (-2.38%)
79.07 78.79 (-0.35%)     8.02 7.77 (-3.12%)
79.36 79.05 (-0.39%)     7.65 7.37 (-3.66%)
78.73 78.52 (-0.27%)     7.87 7.60 (-3.43%)
78.99 78.71 (-0.35%)     8.24 7.99 (-3.03%)
79.33 79.02 (-0.39%)     8.28 8.05 (-2.78%)
78.79 78.46 (-0.42%)     8.47 8.37 (-1.18%)

spam mean and sdev for all runs
79.08 78.77 (-0.39%)     8.12 7.92 (-2.46%)

ham/spam mean difference: 48.81 49.31 +0.50
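
For readers checking the numbers, here is a minimal sketch of the arithmetic behind the percent columns and the final "mean difference" line. It follows the formula in the attached mtag() function, but the helper names pct_change and format_row are made up for this illustration, and it uses print() as a function so it also runs on newer Pythons. The input values are the "all runs" rows from the output above.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100.0 / before

def format_row(run1, run2):
    """Format one (mean, sdev) pair from each run like the table rows above."""
    mean1, sdev1 = run1
    mean2, sdev2 = run2
    return "%2.2f %2.2f (%+2.2f%%)     %2.2f %2.2f (%+2.2f%%)" % (
        mean1, mean2, pct_change(mean1, mean2),
        sdev1, sdev2, pct_change(sdev1, sdev2))

# (mean, sdev) for all runs, copied from the output above.
ham1, ham2 = (30.27, 7.58), (29.46, 7.14)
spam1, spam2 = (79.08, 8.12), (78.77, 7.92)

print(format_row(ham1, ham2))
print(format_row(spam1, spam2))

# Gap between the spam mean and the ham mean in each run.
gap1 = spam1[0] - ham1[0]
gap2 = spam2[0] - ham2[0]
print("ham/spam mean difference: %2.2f %2.2f %+2.2f" % (gap1, gap2, gap2 - gap1))
```

A negative percentage in the sdev column means the second run's distribution is tighter; a positive final number means the ham and spam means moved further apart.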

[bkc@marimba spambayes]$ cvs diff TestDriver.py
Index: TestDriver.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/TestDriver.py,v
retrieving revision 1.6
diff -r1.6 TestDriver.py
92c92
<     print "Ham distribution for", tag
---
>     print "-> <stat> Ham distribution for", tag,
96c96
<     print "Spam distribution for", tag
---
>     print "-> <stat> Spam distribution for", tag,

[bkc@marimba spambayes]$ cvs diff cmp.py
(attached)
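
As a rough illustration of what the cmp.py change does with the tagged lines, the sketch below pulls the mean and sdev out of one "-> <stat>" line using the same split-on-';' approach as the patch. The function name parse_stat_line and the sample line are assumptions for this example only; the exact wording TestDriver.py prints after the tag may differ.

```python
def parse_stat_line(line):
    """Return (kind, scope, (mean, sdev)) for a stat line, else None.

    kind is the third space-separated word ('Ham' or 'Spam');
    scope is 'all' for the all-runs summary and 'run' for a single run.
    """
    if 'sample sdev' not in line:
        return None
    vals = line.split(';')
    mean = float(vals[1].split(' ')[-1])  # last word of the "mean" field
    sdev = float(vals[2].split(' ')[-1])  # last word of the "sdev" field
    kind = vals[0].split(' ')[2]
    if 'for all runs' in line:
        scope = 'all'
    elif 'all in this' in line:
        scope = 'run'
    else:
        return None
    return kind, scope, (mean, sdev)

# Hypothetical sample line; the real text after the tag is an assumption.
sample = "-> <stat> Ham distribution for all runs: 20000 items; mean 30.27; sample sdev 7.58\n"
print(parse_stat_line(sample))
```

Putting the tag and the statistics on one line (the trailing comma in the TestDriver.py change) is what lets a single split(';') recover both the message type and the numbers.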

Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
AOL-IM: BKClements



---------------------- multipart/mixed attachment
Index: cmp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/cmp.py,v
retrieving revision 1.9
diff -r1.9 cmp.py
24c24,27
<     fps = []
---
>     fps = []
>     hamdev = []
>     spamdev = []
>     
28c31
<         if line.startswith('-> <stat> tested'):
---
>         if line.startswith('-> <stat> tested'):
29a33,49
>         if line.find('sample sdev') != -1:
>             vals = line.split(';')
>             mean = float(vals[1].split(' ')[-1])
>             sdev = float(vals[2].split(' ')[-1])
>             val = (mean,sdev)
>             typ = vals[0].split(' ')[2]
>             if line.find('for all runs') != -1:
>                 if typ == 'Ham':
>                     hamdevall = val
>                 else:
>                     spamdevall = val
>             elif line.find('all in this') != -1:
>                 if typ == 'Ham':
>                     hamdev.append(val)
>                 else:
>                     spamdev.append(val)
>             continue
33c53
<             break
---
>             break
47c67
<     return fps, fns, fptot, fntot, fpmean, fnmean
---
>     return fps, fns, fptot, fntot, fpmean, fnmean, hamdev, spamdev,hamdevall,spamdevall
51c71
<             t = "tied"
---
>             t = "tied          "
60c80,91
< 
---
> 
> def mtag(m1,m2):
>     mean1,dev1 = m1
>     mean2,dev2 = m2
>     mp = (mean2 - mean1) * 100.0 / mean1
>     dp = (dev2 - dev1) * 100.0 / dev1
> 
>     return "%2.2f %2.2f (%+2.2f%%)     %2.2f %2.2f (%+2.2f%%)" %  (
>             mean1,mean2,mp,
>             dev1,dev2,dp
>         )
>     
70a102,105
> 
> def dumpdev(meandev1,meandev2):
>     for m1,m2 in zip(meandev1,meandev2):
>         print mtag(m1, m2)
85,86c120,121
< fp1, fn1, fptot1, fntot1, fpmean1, fnmean1 = suck(file(f1n))
< fp2, fn2, fptot2, fntot2, fpmean2, fnmean2 = suck(file(f2n))
---
> fp1, fn1, fptot1, fntot1, fpmean1, fnmean1,hamdev1,spamdev1,hamdevall1,spamdevall1 = suck(file(f1n))
> fp2, fn2, fptot2, fntot2, fpmean2, fnmean2,hamdev2,spamdev2,hamdevall2,spamdevall2 = suck(file(f2n))
98a134,151
> 
> print
> print "ham mean                 ham sdev"
> dumpdev(hamdev1,hamdev2)
> print
> print "ham mean and sdev for all runs"
> dumpdev([hamdevall1],[hamdevall2])
> 
> print
> print "spam mean                spam sdev"
> dumpdev(spamdev1,spamdev2)
> print
> print "spam mean and sdev for all runs"
> dumpdev([spamdevall1],[spamdevall2])
> print
> diff1 = spamdevall1[0] - hamdevall1[0]
> diff2 = spamdevall2[0] - hamdevall2[0]
> print "ham/spam mean difference: %2.2f %2.2f %+2.2f" % (diff1,diff2,(diff2-diff1))

---------------------- multipart/mixed attachment--