[Spambayes-checkins] spambayes rates.py,NONE,1.1 README.txt,1.1,1.2
Tim Peters
tim_one@users.sourceforge.net
Thu, 05 Sep 2002 16:34:43 -0700
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv8648
Modified Files:
README.txt
Added Files:
rates.py
Log Message:
Checking in one of the helper scripts I use to analyze test output.
--- NEW FILE: rates.py ---
"""
rates.py basename
Assuming that file
basename + '.txt'
contains output from timtest.py, scans that file for summary statistics,
displays them to stdout, and also writes them to file
basename + 's.txt'
(where the 's' means 'summary'). This doesn't need a full output file, and
will display stuff for as far as the output file has gotten so far.
Two of these summary files can later be fed to cmp.py.
"""
import re
import sys
"""
Training on Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams
testing against Data/Ham/Set2 & Data/Spam/Set2 ... 4000 hams & 2750 spams
false positive: 0.025
false negative: 1.34545454545
new false positives: ['Data/Ham/Set2/66645.txt']
"""
pat1 = re.compile(r'\s*Training on Data/').match
pat2 = re.compile(r'\s+false (positive|negative): (.*)').match
pat3 = re.compile(r"\s+new false (positives|negatives): \[(.+)\]").match
def doit(basename):
ifile = file(basename + '.txt')
oname = basename + 's.txt'
ofile = file(oname, 'w')
print basename, '->', oname
def dump(*stuff):
msg = ' '.join(map(str, stuff))
print msg
print >> ofile, msg
nfn = nfp = 0
ntrainedham = ntrainedspam = 0
for line in ifile:
"Training on Data/Ham/Set1 & Data/Spam/Set1 ... 4000 hams & 2750 spams"
m = pat1(line)
if m:
dump(line[:-1])
fields = line.split()
ntrainedham += int(fields[-5])
ntrainedspam += int(fields[-2])
continue
"false positive: 0.025"
"false negative: 1.34545454545"
m = pat2(line)
if m:
kind, guts = m.groups()
guts = float(guts)
if kind == 'positive':
lastval = guts
else:
dump(' %7.3f %7.3f' % (lastval, guts))
continue
"new false positives: ['Data/Ham/Set2/66645.txt']"
m = pat3(line)
if m: # note that it doesn't match at all if the list is "[]"
kind, guts = m.groups()
n = len(guts.split())
if kind == 'positives':
nfp += n
else:
nfn += n
dump('total false pos', nfp, nfp * 1e2 / ntrainedham)
dump('total false neg', nfn, nfn * 1e2 / ntrainedspam)
for name in sys.argv[1:]:
doit(name)
Index: README.txt
===================================================================
RCS file: /cvsroot/spambayes/spambayes/README.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** README.txt 5 Sep 2002 20:17:31 -0000 1.1
--- README.txt 5 Sep 2002 23:34:41 -0000 1.2
***************
*** 38,41 ****
--- 38,48 ----
+ Test Utilities
+ ==============
+ rates.py
+ Scans the output (so far) from timtest.py, and captures summary
+ statistics.
+
+
Test Data Utilities
===================