[Spambayes] 2nd try at mean/dev cmp.py
Brad Clements
bkc@murkworks.com
Sun, 22 Sep 2002 16:36:28 -0400
Produces this output:
ham mean                ham sdev
30.30 29.46 (-2.77%)    7.75 7.32 (-5.55%)
30.04 29.31 (-2.43%)    7.66 7.24 (-5.48%)
30.77 29.90 (-2.83%)    7.60 7.11 (-6.45%)
30.71 29.83 (-2.87%)    7.49 7.13 (-4.81%)
30.26 29.47 (-2.61%)    7.72 7.23 (-6.35%)
30.47 29.62 (-2.79%)    7.44 7.10 (-4.57%)
30.16 29.36 (-2.65%)    7.49 7.10 (-5.21%)
30.07 29.23 (-2.79%)    7.43 6.91 (-7.00%)
29.88 29.13 (-2.51%)    7.70 7.25 (-5.84%)
30.09 29.32 (-2.56%)    7.49 6.97 (-6.94%)

ham mean and sdev for all runs
30.27 29.46 (-2.68%)    7.58 7.14 (-5.80%)

spam mean               spam sdev
79.05 78.63 (-0.53%)    7.90 7.79 (-1.39%)
78.99 78.67 (-0.41%)    8.44 8.25 (-2.25%)
79.15 78.85 (-0.38%)    8.31 8.15 (-1.93%)
79.31 79.00 (-0.39%)    8.00 7.81 (-2.38%)
79.07 78.79 (-0.35%)    8.02 7.77 (-3.12%)
79.36 79.05 (-0.39%)    7.65 7.37 (-3.66%)
78.73 78.52 (-0.27%)    7.87 7.60 (-3.43%)
78.99 78.71 (-0.35%)    8.24 7.99 (-3.03%)
79.33 79.02 (-0.39%)    8.28 8.05 (-2.78%)
78.79 78.46 (-0.42%)    8.47 8.37 (-1.18%)

spam mean and sdev for all runs
79.08 78.77 (-0.39%)    8.12 7.92 (-2.46%)

ham/spam mean difference: 48.81 49.31 +0.50
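
Each row in the output is: old value, new value, percent change relative to the old value, first for the mean and then for the sdev. A minimal standalone sketch of that arithmetic follows. It mirrors mtag() from the attached cmp.py patch but is written in Python 3 syntax, and the name format_row is made up for illustration, not something in the patch.

```python
def format_row(before, after):
    """Format two (mean, sdev) pairs as 'old new (pct%) old new (pct%)'."""
    mean1, dev1 = before
    mean2, dev2 = after
    # Percent change is measured against the first (baseline) run.
    mean_pct = (mean2 - mean1) * 100.0 / mean1
    dev_pct = (dev2 - dev1) * 100.0 / dev1
    return "%2.2f %2.2f (%+2.2f%%) %2.2f %2.2f (%+2.2f%%)" % (
        mean1, mean2, mean_pct, dev1, dev2, dev_pct)


# The "ham mean and sdev for all runs" figures from the output above.
print(format_row((30.27, 7.58), (29.46, 7.14)))
```

A negative percentage in the sdev column means the second run's scores are more tightly clustered than the first's.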
[bkc@marimba spambayes]$ cvs diff TestDriver.py
Index: TestDriver.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/TestDriver.py,v
retrieving revision 1.6
diff -r1.6 TestDriver.py
92c92
< print "Ham distribution for", tag
---
> print "-> <stat> Ham distribution for", tag,
96c96
< print "Spam distribution for", tag
---
> print "-> <stat> Spam distribution for", tag,
[bkc@marimba spambayes]$ cvs diff cmp.py
(attached)
Brad Clements, bkc@murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
AOL-IM: BKClements
---------------------- multipart/mixed attachment
Index: cmp.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/cmp.py,v
retrieving revision 1.9
diff -r1.9 cmp.py
24c24,27
< fps = []
---
> fps = []
> hamdev = []
> spamdev = []
>
28c31
< if line.startswith('-> <stat> tested'):
---
> if line.startswith('-> <stat> tested'):
29a33,49
>         if line.find('sample sdev') != -1:
>             vals = line.split(';')
>             mean = float(vals[1].split(' ')[-1])
>             sdev = float(vals[2].split(' ')[-1])
>             val = (mean,sdev)
>             typ = vals[0].split(' ')[2]
>             if line.find('for all runs') != -1:
>                 if typ == 'Ham':
>                     hamdevall = val
>                 else:
>                     spamdevall = val
>             elif line.find('all in this') != -1:
>                 if typ == 'Ham':
>                     hamdev.append(val)
>                 else:
>                     spamdev.append(val)
>             continue
33c53
< break
---
> break
47c67
< return fps, fns, fptot, fntot, fpmean, fnmean
---
> return fps, fns, fptot, fntot, fpmean, fnmean, hamdev, spamdev,hamdevall,spamdevall
51c71
< t = "tied"
---
> t = "tied "
60c80,91
<
---
>
> def mtag(m1,m2):
>     mean1,dev1 = m1
>     mean2,dev2 = m2
>     mp = (mean2 - mean1) * 100.0 / mean1
>     dp = (dev2 - dev1) * 100.0 / dev1
>
>     return "%2.2f %2.2f (%+2.2f%%) %2.2f %2.2f (%+2.2f%%)" % (
>            mean1,mean2,mp,
>            dev1,dev2,dp
>            )
>
70a102,105
>
> def dumpdev(meandev1,meandev2):
>     for m1,m2 in zip(meandev1,meandev2):
>         print mtag(m1, m2)
85,86c120,121
< fp1, fn1, fptot1, fntot1, fpmean1, fnmean1 = suck(file(f1n))
< fp2, fn2, fptot2, fntot2, fpmean2, fnmean2 = suck(file(f2n))
---
> fp1, fn1, fptot1, fntot1, fpmean1, fnmean1,hamdev1,spamdev1,hamdevall1,spamdevall1 = suck(file(f1n))
> fp2, fn2, fptot2, fntot2, fpmean2, fnmean2,hamdev2,spamdev2,hamdevall2,spamdevall2 = suck(file(f2n))
98a134,151
>
> print
> print "ham mean ham sdev"
> dumpdev(hamdev1,hamdev2)
> print
> print "ham mean and sdev for all runs"
> dumpdev([hamdevall1],[hamdevall2])
>
> print
> print "spam mean spam sdev"
> dumpdev(spamdev1,spamdev2)
> print
> print "spam mean and sdev for all runs"
> dumpdev([spamdevall1],[spamdevall2])
> print
> diff1 = spamdevall1[0] - hamdevall1[0]
> diff2 = spamdevall2[0] - hamdevall2[0]
> print "ham/spam mean difference: %2.2f %2.2f %+2.2f" % (diff1,diff2,(diff2-diff1))
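
The parsing that the patch adds to suck() can be read in isolation. Below is a minimal sketch in Python 3 syntax; it is not part of the patch. parse_stat_line is a made-up name, and the sample line (including the 20000 item count) is an assumed shape inferred from the split() indices in the patch, not copied from real TestDriver output.

```python
def parse_stat_line(line):
    """Return (kind, scope, mean, sdev) for a distribution line, else None."""
    if 'sample sdev' not in line:
        return None
    fields = line.split(';')
    # The number is the last space-separated token of each field.
    mean = float(fields[1].split(' ')[-1])
    sdev = float(fields[2].split(' ')[-1])
    # Third token of '-> <stat> Ham distribution for ...' is 'Ham' or 'Spam'.
    kind = fields[0].split(' ')[2]
    if 'for all runs' in line:
        scope = 'all'    # summary over every run
    elif 'all in this' in line:
        scope = 'run'    # one training/test run
    else:
        scope = None     # some other distribution line; the patch ignores these
    return kind, scope, mean, sdev


# Assumed line shape, for illustration only.
SAMPLE = "-> <stat> Ham distribution for all runs: 20000 items; mean 30.27; sample sdev 7.58\n"
print(parse_stat_line(SAMPLE))
```

As far as the diff shows, hamdevall and spamdevall are bound only when a 'for all runs' line is seen, so a results file without one would make the return statement in suck() fail with an unbound-name error.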