[Spambayes-checkins] spambayes/contrib nway.py,NONE,1.1
Skip Montanaro
montanaro at users.sourceforge.net
Tue Aug 12 14:15:24 EDT 2003
Update of /cvsroot/spambayes/spambayes/contrib
In directory sc8-pr-cvs1:/tmp/cvs-serv11977
Added Files:
nway.py
Log Message:
simple n-way classifier
--- NEW FILE: nway.py ---
#!/usr/bin/env python
"""
Demonstration of n-way classification possibilities.

Usage: %(prog)s [ -h ] tag=db ...

-h - print this message and exit.

All args are of the form 'tag=db' where 'tag' is the tag to be given in the
X-Spambayes-Classification: header. A single message is read from stdin and
a modified message sent to stdout. The message is compared against each
database in turn. If its score exceeds the spam threshold when scored
against a particular database, an X-Spambayes-Classification header is added
and the modified message is written to stdout. If none of the comparisons
yields a definite classification, the message is written with an
'X-Spambayes-Classification: unsure' header.

Training is left up to the user. In general, you want to train so that a
message in a particular category will score as spam when checked against
that category's training database. For example, suppose you have the
following mbox formatted files: python, music, family, cars. If you wanted
to create a training database for each of them you could execute this
series of mboxtrain.py commands:

    mboxtrain.py -d python.db -s python -g music -g family -g cars
    mboxtrain.py -d music.db -g python -s music -g family -g cars
    mboxtrain.py -d family.db -g python -g music -s family -g cars
    mboxtrain.py -d cars.db -g python -g music -g family -s cars

You'd then compare messages using a %(prog)s command like this:

    %(prog)s python=python.db music=music.db family=family.db cars=cars.db
"""

import getopt
import sys
import os

from spambayes import hammie, mboxutils, Options

prog = os.path.basename(sys.argv[0])

def help():
    print >> sys.stderr, __doc__ % globals()

def main(args):
    opts, args = getopt.getopt(args, "h")

    for opt, arg in opts:
        if opt == '-h':
            help()
            return 0

    tagdb_list = []
    msg = mboxutils.get_message(sys.stdin)
    try:
        del msg["X-Spambayes-Classification"]
    except KeyError:
        pass
    for pair in args:
        tag, db = pair.split('=', 1)
        h = hammie.open(db, True, 'r')
        score = h.score(msg)
        if score >= Options.options.spam_cutoff:
            msg["X-Spambayes-Classification"] = "%s; %.2f" % (tag, score)
            break
    else:
        msg["X-Spambayes-Classification"] = "unsure"

    sys.stdout.write(msg.as_string(unixfrom=(msg.get_unixfrom()
                                             is not None)))
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
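
--- Illustration (not part of nway.py) ---

The decision rule in main() can be tried without any SpamBayes databases.
The sketch below is only an approximation of that loop: parse_pairs(),
classify(), score_for and the fake_scores table are hypothetical names
invented here, the scores are made up, and the 0.90 cutoff is an assumed
value standing in for Options.options.spam_cutoff.

```python
def parse_pairs(args):
    """Split each 'tag=db' argument on the first '=' only."""
    pairs = []
    for pair in args:
        tag, db = pair.split('=', 1)
        pairs.append((tag, db))
    return pairs


def classify(pairs, score_for, cutoff):
    """Return the header value from the first database that claims the message.

    score_for is a hypothetical callable mapping a database name to a score;
    it stands in for hammie.open(db, True, 'r').score(msg) in nway.py.
    """
    for tag, db in pairs:
        score = score_for(db)
        if score >= cutoff:
            # First match wins, mirroring the 'break' in main().
            return "%s; %.2f" % (tag, score)
    # No database reached the cutoff, mirroring the for/else branch.
    return "unsure"


# Made-up scores for one message against each per-category database.
fake_scores = {"python.db": 0.12, "music.db": 0.97, "family.db": 0.95}
pairs = parse_pairs(["python=python.db", "music=music.db", "family=family.db"])
header = classify(pairs, fake_scores.get, 0.90)  # assumed cutoff
```

With these made-up scores the header value is "music; 0.97", even though
family.db also reaches the cutoff: the first database on the command line
that scores at or above the cutoff decides the tag, so argument order
matters.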