[Spambayes-checkins] spambayes pop3proxy.py,1.14,1.15
Richie Hindle
richiehindle@users.sourceforge.net
Wed Nov 13 18:19:48 2002
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv20474
Modified Files:
pop3proxy.py
Log Message:
o All command line switches and options now default to values from
bayescustomize.ini. Thanks to Francois Granger for the idea.
o Instead of there being two radio buttons (ham, spam) on the training
form, there are now two buttons: "Train as Ham" and "Train as Spam".
Thanks to Just van Rossum for the suggestion.
o New "Classify a message" form: paste or upload a message to classify
it, and see its spam probability and the clues behind it.
o It now prints a clear error message if the POP3 server is unreachable.
o The "Bad file descriptor" / last-response-is-logged-three-times bug
is (hopefully) fixed.
o The bug whereby socket errors could cause the "Active POP3
conversations" count to go negative is fixed.
o After doing a word query, it now prepopulates the query field with
your word - handy if you mistyped it or you want to try a variant.
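The first change (command line switches falling back to bayescustomize.ini) follows a common pattern: read the configured values first, then let getopt switches and positional arguments override them. Below is a minimal sketch in modern Python. It assumes the ini key names shown, which are borrowed from the option names used in the diff, and it reads them with the standard `configparser` rather than spambayes' own Options module. Only the `-u` switch is modelled, and `load_settings` and `EXAMPLE_INI` are hypothetical names.

```python
import configparser
import getopt

# Hypothetical bayescustomize.ini contents. The key names mirror the
# option names used in the diff; the values are made up.
EXAMPLE_INI = """
[pop3proxy]
pop3proxy_server_name = pop.example.com
pop3proxy_server_port = 110
pop3proxy_port = 110

[html_ui]
html_ui_port = 8880
"""


def load_settings(argv, ini_text=EXAMPLE_INI):
    """Return proxy settings: ini values first, command line second."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    settings = {
        "serverName": config.get("pop3proxy", "pop3proxy_server_name"),
        "serverPort": config.getint("pop3proxy", "pop3proxy_server_port"),
        "proxyPort": config.getint("pop3proxy", "pop3proxy_port"),
        "uiPort": config.getint("html_ui", "html_ui_port"),
    }
    # Only -u (UI port) is modelled here; it overrides the ini value.
    opts, args = getopt.getopt(argv, "u:")
    for opt, arg in opts:
        if opt == "-u":
            settings["uiPort"] = int(arg)
    # Usage is: pop3proxy.py [options] [<server> [<server port>]]
    if len(args) > 2:
        raise ValueError("too many arguments")
    if len(args) >= 1:
        settings["serverName"] = args[0]
    if len(args) >= 2:
        settings["serverPort"] = int(args[1])
    return settings
```

With no arguments, every value comes from the ini text. `load_settings(["-u", "9000", "mail.example.org"])` keeps the ini server port (110) but takes the UI port and server name from the command line.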
Index: pop3proxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** pop3proxy.py 10 Nov 2002 19:59:22 -0000 1.14
--- pop3proxy.py 13 Nov 2002 18:19:45 -0000 1.15
***************
*** 7,11 ****
header. Usage:
! pop3proxy.py [options] <server> [<server port>]
<server> is the name of your real POP3 server
<port> is the port number of your real POP3 server, which
--- 7,11 ----
header. Usage:
! pop3proxy.py [options] [<server> [<server port>]]
<server> is the name of your real POP3 server
<port> is the port number of your real POP3 server, which
***************
*** 13,16 ****
--- 13,20 ----
options:
+ -z : Runs a self-test and exits.
+ -t : Runs a test POP3 server on port 8110 (for testing).
+ -h : Displays this help message.
+
-p FILE : use the named data file
-d : the file is a DBM file rather than a pickle
***************
*** 20,28 ****
-b : Launch a web browser showing the user interface.
! pop3proxy -t
! Runs a test POP3 server on port 8110; useful for testing.
!
! pop3proxy -h
! Displays this help message.
For safety, and to help debugging, the whole POP3 conversation is
--- 24,30 ----
-b : Launch a web browser showing the user interface.
! All command line arguments and switches take their default
! values from the [Hammie], [pop3proxy] and [html_ui] sections
! of bayescustomize.ini.
For safety, and to help debugging, the whole POP3 conversation is
***************
*** 48,72 ****
todo = """
! o (Re)training interface - one message per line, quick-rendering table.
! o Slightly-wordy index page; intro paragraph for each page.
o Once the training stuff is on a separate page, make the paste box
bigger.
- o "Links" section (on homepage?) to project homepage, mailing list,
- etc.
- o "Home" link (with helmet!) at the end of each page.
- o "Classify this" - just like Train.
- o "Send me an email every [...] to remind me to train on new
- messages."
- o "Send me a status email every [...] telling how many mails have been
- classified, etc."
o Deployment: Windows executable? atlaxwin and ctypes? Or just
webbrowser?
- o Possibly integrate Tim Stone's SMTP code - make it use async, make
- the training code update (rather than replace!) the database.
o Can it cleanly dynamically update its status display while having a
POP3 converation? Hammering reload sucks.
o Add a command to save the database without shutting down, and one to
reload the database.
! o Leave the word in the input field after a Word query.
"""
--- 50,103 ----
todo = """
!
! User interface improvements:
!
o Once the training stuff is on a separate page, make the paste box
bigger.
o Deployment: Windows executable? atlaxwin and ctypes? Or just
webbrowser?
o Can it cleanly dynamically update its status display while having a
POP3 converation? Hammering reload sucks.
o Add a command to save the database without shutting down, and one to
reload the database.
! o Save the Status (num classified, etc.) between sessions.
!
!
! New features:
!
! o (Re)training interface - one message per line, quick-rendering table.
! o "Send me an email every [...] to remind me to train on new
! messages."
! o "Send me a status email every [...] telling how many mails have been
! classified, etc."
! o Possibly integrate Tim Stone's SMTP code - make it use async, make
! the training code update (rather than replace!) the database.
! o Option to keep trained messages and view potential FPs and FNs to
! correct them.
! o Allow use of the UI without the POP3 proxy.
!
!
! Code quality:
!
! o Move the UI into its own module.
! o Eventually, pull the common HTTP code from pop3proxy.py and Entrian
! Debugger into a library.
!
!
! Info:
!
! o Slightly-wordy index page; intro paragraph for each page.
! o In both stats and training results, report nham and nspam - warn if
! they're very different (for some value of 'very').
! o "Links" section (on homepage?) to project homepage, mailing list,
! etc.
!
!
! Gimmicks:
!
! o Classify a web page given a URL.
! o Graphs. Of something. Who cares what?
! o Zoe...!
!
"""
***************
*** 147,151 ****
self.set_terminator('\r\n')
self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
! self.connect((serverName, serverPort))
def collect_incoming_data(self, data):
--- 178,188 ----
self.set_terminator('\r\n')
self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
! try:
! self.connect((serverName, serverPort))
! except socket.error, e:
! print >>sys.stderr, "Can't connect to %s:%d: %s" % \
! (serverName, serverPort, e)
! self.close()
! self.lineCallback('') # "The socket's been closed."
def collect_incoming_data(self, data):
***************
*** 199,203 ****
self.response = self.response + line
! # Is this line that terminates a set of headers?
self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n']
--- 236,240 ----
self.response = self.response + line
! # Is this the line that terminates a set of headers?
self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n']
***************
*** 237,241 ****
else:
# Assume that an unknown command will get a single-line
! # response. This should work for errors and for POP-AUTH.
return False
--- 274,281 ----
else:
# Assume that an unknown command will get a single-line
! # response. This should work for errors and for POP-AUTH,
! # and is harmless even for multiline responses - the first
! # line will be passed to onTransaction and ignored, then the
! # rest will be proxied straight through.
return False
***************
*** 246,257 ****
def found_terminator(self):
"""Asynchat override."""
! if self.request.strip().upper() == 'KILL':
! self.serverSocket.sendall('QUIT\r\n')
! self.send("+OK, dying.\r\n")
! self.serverSocket.shutdown(2)
! self.serverSocket.close()
self.shutdown(2)
self.close()
raise SystemExit
self.serverSocket.push(self.request + '\r\n')
--- 286,298 ----
def found_terminator(self):
"""Asynchat override."""
! verb = self.request.strip().upper()
! if verb == 'KILL':
self.shutdown(2)
self.close()
raise SystemExit
+ elif verb == 'CRASH':
+ # For testing
+ x = 0
+ y = 1/x
self.serverSocket.push(self.request + '\r\n')
***************
*** 271,276 ****
# Pass the request and the raw response to the subclass and
# send back the cooked response.
! cooked = self.onTransaction(self.command, self.args, self.response)
! self.push(cooked)
# If onServerLine() decided that the server has closed its
--- 312,318 ----
# Pass the request and the raw response to the subclass and
# send back the cooked response.
! if self.response:
! cooked = self.onTransaction(self.command, self.args, self.response)
! self.push(cooked)
# If onServerLine() decided that the server has closed its
***************
*** 334,337 ****
--- 376,380 ----
status.totalSessions += 1
status.activeSessions += 1
+ self.isClosed = False
def send(self, data):
***************
*** 339,343 ****
self.logFile.write(data)
self.logFile.flush()
! return POP3ProxyBase.send(self, data)
def recv(self, size):
--- 382,392 ----
self.logFile.write(data)
self.logFile.flush()
! try:
! return POP3ProxyBase.send(self, data)
! except socket.error:
! # The email client has closed the connection - 40tude Dialog
! # does this immediately after issuing a QUIT command,
! # without waiting for the response.
! self.close()
def recv(self, size):
***************
*** 349,354 ****
def close(self):
! status.activeSessions -= 1
! POP3ProxyBase.close(self)
def onTransaction(self, command, args, response):
--- 398,406 ----
def close(self):
! # This can be called multiple times by async.
! if not self.isClosed:
! self.isClosed = True
! status.activeSessions -= 1
! POP3ProxyBase.close(self)
def onTransaction(self, command, args, response):
***************
*** 442,448 ****
UserInterface objects to serve them."""
! def __init__(self, uiPort, bayes):
uiArgs = (bayes,)
! Listener.__init__(self, uiPort, UserInterface, uiArgs)
--- 494,500 ----
UserInterface objects to serve them."""
! def __init__(self, uiPort, bayes, socketMap=asyncore.socket_map):
uiArgs = (bayes,)
! Listener.__init__(self, uiPort, UserInterface, uiArgs, socketMap=socketMap)
***************
*** 479,485 ****
"""Serves the HTML user interface of the proxy."""
header = """<html><head><title>Spambayes proxy: %s</title>
<style>
! body { font: 90%% arial, swiss, helvetica }
table { font: 90%% arial, swiss, helvetica }
form { margin: 0 }
--- 531,544 ----
"""Serves the HTML user interface of the proxy."""
+ # A couple of notes about the HTML here:
+ # o I've tried to keep content and presentation separate using
+ # one main stylesheet - no <font> tags, and no inline stylesheets
+ # o Form fields must specify their name and value attributes like
+ # this: "... name='n' value='v' ..." even if there is no default
+ # value. This is so that setFieldValue can set the value.
+
header = """<html><head><title>Spambayes proxy: %s</title>
<style>
! body { font: 90%% arial, swiss, helvetica; margin: 0 }
table { font: 90%% arial, swiss, helvetica }
form { margin: 0 }
***************
*** 497,501 ****
</head>\n"""
! bodyStart = """<body style='margin: 0'>
<div class='banner'>
<img src='/helmet.gif' align='absmiddle'>
--- 556,560 ----
</head>\n"""
! bodyStart = """<body>
<div class='banner'>
<img src='/helmet.gif' align='absmiddle'>
***************
*** 504,514 ****
footer = """</div>
! <form action='/shutdown'>
<table width='100%%' cellspacing='0'>
! <tr><td class='banner'> Spambayes Proxy, %s.
<a href='http://www.spambayes.org/'>Spambayes.org</a></td>
<td align='right' class='banner'>
%s
! </td></tr></table></form>\n"""
shutdownDB = """<input type='submit' name='how' value='Shutdown'>"""
--- 563,575 ----
footer = """</div>
! <form action='/shutdown' method='POST'>
<table width='100%%' cellspacing='0'>
! <tr><td class='banner'> <a href='/'>Spambayes Proxy</a>,
! %s.
<a href='http://www.spambayes.org/'>Spambayes.org</a></td>
<td align='right' class='banner'>
%s
! </td></tr></table></form>
! </body></html>\n"""
shutdownDB = """<input type='submit' name='how' value='Shutdown'>"""
***************
*** 531,552 ****
wordQuery = """<form action='/wordquery'>
! <input name='word' type='text' size='30'>
<input type='submit' value='Tell me about this word'>
</form>"""
! train = """<form action='/upload' method='POST'
enctype='multipart/form-data'>
! Either upload a message file: <input type='file' name='file'><br>
! Or paste the whole message (incuding headers) here:<br>
! <textarea name='text' rows='3' cols='60'></textarea><br>
! Is this message
! <input type='radio' name='which' value='ham'>Ham</input> or
! <input type='radio'
! name='which' value='spam' checked>Spam</input>?<br>
! <input type='submit' value='Train on this message'>
! </form>"""
! def __init__(self, clientSocket, bayes):
! BrighterAsyncChat.__init__(self, clientSocket)
self.bayes = bayes
self.request = ''
--- 592,621 ----
wordQuery = """<form action='/wordquery'>
! <input name='word' value='' type='text' size='30'>
<input type='submit' value='Tell me about this word'>
</form>"""
! upload = """<form action='/%s' method='POST'
enctype='multipart/form-data'>
! Either upload a message file:
! <input type='file' name='file' value=''><br>
! Or paste the whole message (incuding headers) here:<br>
! <textarea name='text' rows='3' cols='60'></textarea><br>
! %s
! </form>"""
! uploadSumbit = """<input type='submit' name='which' value='%s'>"""
!
! train = upload % ('train',
! (uploadSumbit % "Train as Spam") + " " + \
! (uploadSumbit % "Train as Ham"))
!
! classify = upload % ('classify', uploadSumbit % "Classify")
!
! def __init__(self, clientSocket, bayes, socketMap=asyncore.socket_map):
! # Grumble: asynchat.__init__ doesn't take a 'map' argument,
! # hence the two-stage construction.
! BrighterAsyncChat.__init__(self)
! BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
self.bayes = bayes
self.request = ''
***************
*** 654,662 ****
self.push(self.bodyStart % homeLink)
def onHome(self, params):
"""Serve up the homepage."""
body = (self.pageSection % ('Status', self.summary % status.__dict__)+
! self.pageSection % ('Word query', self.wordQuery)+
! self.pageSection % ('Train', self.train))
self.push(body)
--- 723,745 ----
self.push(self.bodyStart % homeLink)
+ def setFieldValue(self, form, name, value):
+ """Sets the default value of a field in a form. See the comment
+ at the top of this class for how to specify HTML that works with
+ this function. (This is exactly what Entrian PyMeld is for, but
+ that ships under the Sleepycat License.)"""
+ match = re.search(r"\s+name='%s'\s+value='([^']*)'" % name, form)
+ if match:
+ quotedValue = re.sub("'", "&#%d;" % ord("'"), value)
+ return form[:match.start(1)] + quotedValue + form[match.end(1):]
+ else:
+ print >>sys.stderr, "Warning: setFieldValue('%s') failed" % name
+ return form
+
def onHome(self, params):
"""Serve up the homepage."""
body = (self.pageSection % ('Status', self.summary % status.__dict__)+
! self.pageSection % ('Train', self.train)+
! self.pageSection % ('Classify a message', self.classify)+
! self.pageSection % ('Word query', self.wordQuery))
self.push(body)
***************
*** 676,684 ****
raise SystemExit
! def onUpload(self, params):
"""Train on an uploaded or pasted message."""
# Upload or paste? Spam or ham?
message = params.get('file') or params.get('text')
! isSpam = (params['which'] == 'spam')
# Append the message to a file, to make it easier to rebuild
--- 759,767 ----
raise SystemExit
! def onTrain(self, params):
"""Train on an uploaded or pasted message."""
# Upload or paste? Spam or ham?
message = params.get('file') or params.get('text')
! isSpam = (params['which'] == 'Train as Spam')
# Append the message to a file, to make it easier to rebuild
***************
*** 698,705 ****
# Train on the message.
! self.bayes.learn(tokenizer.tokenize(message), isSpam, True)
self.push("<p>OK. Return <a href='/'>Home</a> or train another:</p>")
self.push(self.pageSection % ('Train another', self.train))
def onWordquery(self, params):
word = params['word']
--- 781,803 ----
# Train on the message.
! tokens = tokenizer.tokenize(message)
! self.bayes.learn(tokens, isSpam, True)
self.push("<p>OK. Return <a href='/'>Home</a> or train another:</p>")
self.push(self.pageSection % ('Train another', self.train))
+ def onClassify(self, params):
+ """Classify an uploaded or pasted message."""
+ message = params.get('file') or params.get('text')
+ tokens = tokenizer.tokenize(message)
+ prob, clues = self.bayes.spamprob(tokens, evidence=True)
+ self.push("<p>Spam probability: <b>%.8f</b></p>" % prob)
+ self.push("<table class='sectiontable' cellspacing='0'>")
+ self.push("<tr><td class='sectionheading'>Clues:</td></tr>\n")
+ self.push("<tr><td class='sectionbody'><table>")
+ for w, p in clues:
+ self.push("<tr><td>%s</td><td>%.8f</td></tr>\n" % (w, p))
+ self.push("</table></td></tr></table>")
+ self.push("<p>Return <a href='/'>Home</a> or classify another:</p>")
+ self.push(self.pageSection % ('Classify another', self.classify))
def onWordquery(self, params):
word = params['word']
***************
*** 717,727 ****
Last used: <b>%(atime)s</b>.<br>""" % members
except KeyError:
! info = "'%s' does not appear in the database." % word
! body = (self.pageSection % ("Statistics for '%s'" % word, info) +
! self.pageSection % ('Word query', self.wordQuery))
self.push(body)
def main(serverName, serverPort, proxyPort,
uiPort, launchUI, pickleName, useDB):
--- 815,845 ----
Last used: <b>%(atime)s</b>.<br>""" % members
except KeyError:
! info = "%r does not appear in the database." % word
! query = self.setFieldValue(self.wordQuery, 'word', params['word'])
! body = (self.pageSection % ("Statistics for %r" % word, info) +
! self.pageSection % ('Word query', query))
self.push(body)
+ def initStatus():
+ status.proxyPort = options.pop3proxy_port
+ status.serverName = options.pop3proxy_server_name
+ status.serverPort = options.pop3proxy_server_port
+ status.pickleName = options.persistent_storage_file
+ status.useDB = options.persistent_use_database
+ status.uiPort = options.html_ui_port
+ status.launchUI = options.html_ui_launch_browser
+ status.gzipCache = options.pop3proxy_cache_use_gzip
+ status.cacheExpiryDays = options.pop3proxy_cache_expiry_days
+ status.runTestServer = False
+ status.totalSessions = 0
+ status.activeSessions = 0
+ status.numEmails = 0
+ status.numSpams = 0
+ status.numHams = 0
+ status.numUnsure = 0
+
+
def main(serverName, serverPort, proxyPort,
uiPort, launchUI, pickleName, useDB):
***************
*** 891,895 ****
def onUnknown(self, command, args):
"""Unknown POP3 command."""
! return "-ERR Unknown command: '%s'\r\n" % command
--- 1009,1013 ----
def onUnknown(self, command, args):
"""Unknown POP3 command."""
! return "-ERR Unknown command: %s\r\n" % repr(command)
***************
*** 901,904 ****
--- 1019,1023 ----
# asyncore environments.
import threading
+ initStatus()
testServerReady = threading.Event()
def runTestServer():
***************
*** 912,915 ****
--- 1031,1035 ----
# Name the database in case it ever gets auto-flushed to disk.
bayes = hammie.createbayes('_pop3proxy.db')
+ UserInterfaceListener(8881, bayes)
BayesProxyListener('localhost', 8110, 8111, bayes)
bayes.learn(tokenizer.tokenize(spam1), True)
***************
*** 944,952 ****
assert response.find(options.hammie_header_name) >= 0
# Kill the proxy and the test server.
proxy.sendall("kill\r\n")
! server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
! server.connect(('localhost', 8110))
! server.sendall("kill\r\n")
--- 1064,1085 ----
assert response.find(options.hammie_header_name) >= 0
+ # Smoke-test the HTML UI.
+ httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+ httpServer.connect(('localhost', 8881))
+ httpServer.sendall("get / HTTP/1.0\r\n\r\n")
+ response = ''
+ while 1:
+ packet = httpServer.recv(1000)
+ if not packet: break
+ response += packet
+ assert re.search(r"(?s)<html>.*Spambayes proxy.*</html>", response)
+
# Kill the proxy and the test server.
proxy.sendall("kill\r\n")
! proxy.recv(100)
! pop3Server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
! pop3Server.connect(('localhost', 8110))
! pop3Server.sendall("kill\r\n")
! pop3Server.recv(100)
***************
*** 958,979 ****
# Read the arguments.
try:
! opts, args = getopt.getopt(sys.argv[1:], 'htdbp:l:u:')
except getopt.error, msg:
print >>sys.stderr, str(msg) + '\n\n' + __doc__
sys.exit()
! status.pickleName = hammie.DEFAULTDB
! status.proxyPort = 110
! status.uiPort = 8880
! status.serverPort = 110
! status.useDB = False
! status.runTestServer = False
! status.launchUI = False
! status.totalSessions = 0
! status.activeSessions = 0
! status.numEmails = 0
! status.numSpams = 0
! status.numHams = 0
! status.numUnsure = 0
for opt, arg in opts:
if opt == '-h':
--- 1091,1101 ----
# Read the arguments.
try:
! opts, args = getopt.getopt(sys.argv[1:], 'htdbzp:l:u:')
except getopt.error, msg:
print >>sys.stderr, str(msg) + '\n\n' + __doc__
sys.exit()
! initStatus()
! runSelfTest = False
for opt, arg in opts:
if opt == '-h':
***************
*** 992,999 ****
elif opt == '-u':
status.uiPort = int(arg)
# Do whatever we've been asked to do...
! if not opts and not args:
! print "Running a self-test (use 'pop3proxy -h' for help)"
test()
print "Self-test passed." # ...else it would have asserted.
--- 1114,1123 ----
elif opt == '-u':
status.uiPort = int(arg)
+ elif opt == '-z':
+ runSelfTest = True
# Do whatever we've been asked to do...
! if runSelfTest:
! print "\nRunning self-test...\n"
test()
print "Self-test passed." # ...else it would have asserted.
***************
*** 1004,1014 ****
asyncore.loop()
! elif 1 <= len(args) <= 2:
! # Normal usage, with optional server port number.
! status.serverName = args[0]
! if len(args) == 2:
status.serverPort = int(args[1])
! main(status.serverName, status.serverPort, status.proxyPort,
! status.uiPort, status.launchUI, status.pickleName, status.useDB)
else:
--- 1128,1147 ----
asyncore.loop()
! elif 0 <= len(args) <= 2:
! # Normal usage, with optional server name and port number.
! if len(args) >= 1:
! status.serverName = args[0]
! if len(args) >= 2:
status.serverPort = int(args[1])
!
! if not status.serverName:
! print >>sys.stderr, \
! ("Error: You must give a POP3 server name, either in\n"
! "bayescustomize.ini as pop3proxy_server_name or on the\n"
! "command line. pop3server.py -h prints a usage message.")
! else:
! main(status.serverName, status.serverPort, status.proxyPort,
! status.uiPort, status.launchUI, status.pickleName,
! status.useDB)
else:
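The `close()` change in the diff above fixes the negative "Active POP3 conversations" count. It works because asyncore can call `close()` more than once on the same channel, for example once when `send()` hits a socket error after the client hangs up and again during normal teardown. The following is a stripped-down sketch of that guard. It uses hypothetical class names, with plain Python objects standing in for the asyncore channel and the global `status` object.

```python
class Status:
    """Hypothetical stand-in for the proxy's global status object."""
    def __init__(self):
        self.activeSessions = 0


class Session:
    """Hypothetical stand-in for a proxy connection."""
    def __init__(self, status):
        self.status = status
        self.status.activeSessions += 1
        self.isClosed = False

    def send(self, sock, data):
        try:
            return sock.send(data)
        except OSError:
            # The client hung up (e.g. straight after QUIT), so close.
            self.close()
            return 0

    def close(self):
        # close() may be called repeatedly; only decrement the count once.
        if not self.isClosed:
            self.isClosed = True
            self.status.activeSessions -= 1


class HungUpSocket:
    """Test stub: a socket whose peer has already closed the connection."""
    def send(self, data):
        raise BrokenPipeError("client closed the connection")
```

A failed `send()` followed by an explicit `close()` leaves the counter at zero rather than taking it to -1, because the second call is a no-op.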