[Spambayes-checkins] spambayes pop3proxy.py,1.14,1.15

Richie Hindle richiehindle@users.sourceforge.net
Wed Nov 13 18:19:48 2002


Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv20474

Modified Files:
	pop3proxy.py 
Log Message:
 o All command line switches and options now default to values from
   bayescustomize.ini.  Thanks to Francois Granger for the idea.
 o Instead of there being two radio buttons (ham, spam) on the training
   form, there are now two buttons: "Train as Ham" and "Train as Spam".
   Thanks to Just van Rossum for the suggestion.
 o New "Classify message" form: paste or upload a message and it
   reports the spam probability and the clues.
 o It now prints a clear error message if the POP3 server is unreachable.
 o The "Bad file descriptor" / last-response-is-logged-three-times bug
   is (hopefully) fixed.
 o The bug whereby socket errors could cause the "Active POP3
   conversations" count to go negative is fixed.
 o After doing a word query, it now prepopulates the query field with
   your word - handy if you mistyped it or you want to try a variant.
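
The word-query prepopulation in the last item can be sketched as follows.
This is a minimal Python 3 adaptation of the setFieldValue method added in
this revision, not the committed code. It is written as a standalone
function (set_field_value is a hypothetical name), escapes the field name
with re.escape, and returns the form unchanged on failure instead of
printing a warning. It assumes, as the comment in the patch requires, that
each field is written as name='n' value='v', in that order and with single
quotes.

```python
import re


def set_field_value(form, name, value):
    """Return `form` with the value attribute of field `name` set to `value`."""
    pattern = r"\s+name='%s'\s+value='([^']*)'" % re.escape(name)
    match = re.search(pattern, form)
    if match is None:
        # Field not found (or not written in the expected attribute order).
        return form
    # Single quotes delimit the attribute, so encode them as a character
    # reference before splicing the value in.
    quoted = value.replace("'", "&#%d;" % ord("'"))
    return form[:match.start(1)] + quoted + form[match.end(1):]


# The word-query form from the patch, with its indentation removed.
WORD_QUERY = """<form action='/wordquery'>
<input name='word' value='' type='text' size='30'>
<input type='submit' value='Tell me about this word'>
</form>"""

print(set_field_value(WORD_QUERY, "word", "viagra"))
```

Like the method in the patch, this only encodes single quotes. A version
exposed to untrusted input would probably also want to escape &, < and >.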


Index: pop3proxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
retrieving revision 1.14
retrieving revision 1.15
diff -C2 -d -r1.14 -r1.15
*** pop3proxy.py	10 Nov 2002 19:59:22 -0000	1.14
--- pop3proxy.py	13 Nov 2002 18:19:45 -0000	1.15
***************
*** 7,11 ****
  header.  Usage:
  
!     pop3proxy.py [options] <server> [<server port>]
          <server> is the name of your real POP3 server
          <port>   is the port number of your real POP3 server, which
--- 7,11 ----
  header.  Usage:
  
!     pop3proxy.py [options] [<server> [<server port>]]
          <server> is the name of your real POP3 server
          <port>   is the port number of your real POP3 server, which
***************
*** 13,16 ****
--- 13,20 ----
  
          options:
+             -z      : Runs a self-test and exits.
+             -t      : Runs a test POP3 server on port 8110 (for testing).
+             -h      : Displays this help message.
+ 
              -p FILE : use the named data file
              -d      : the file is a DBM file rather than a pickle
***************
*** 20,28 ****
              -b      : Launch a web browser showing the user interface.
  
!     pop3proxy -t
!         Runs a test POP3 server on port 8110; useful for testing.
! 
!     pop3proxy -h
!         Displays this help message.
  
  For safety, and to help debugging, the whole POP3 conversation is
--- 24,30 ----
              -b      : Launch a web browser showing the user interface.
  
!         All command line arguments and switches take their default
!         values from the [Hammie], [pop3proxy] and [html_ui] sections
!         of bayescustomize.ini.
  
  For safety, and to help debugging, the whole POP3 conversation is
***************
*** 48,72 ****
  
  todo = """
!  o (Re)training interface - one message per line, quick-rendering table.
!  o Slightly-wordy index page; intro paragraph for each page.
   o Once the training stuff is on a separate page, make the paste box
     bigger.
-  o "Links" section (on homepage?) to project homepage, mailing list,
-    etc.
-  o "Home" link (with helmet!) at the end of each page.
-  o "Classify this" - just like Train.
-  o "Send me an email every [...] to remind me to train on new
-    messages."
-  o "Send me a status email every [...] telling how many mails have been
-    classified, etc."
   o Deployment: Windows executable?  atlaxwin and ctypes?  Or just
     webbrowser?
-  o Possibly integrate Tim Stone's SMTP code - make it use async, make
-    the training code update (rather than replace!) the database.
   o Can it cleanly dynamically update its status display while having a
     POP3 converation?  Hammering reload sucks.
   o Add a command to save the database without shutting down, and one to
     reload the database.
!  o Leave the word in the input field after a Word query.
  """
  
--- 50,103 ----
  
  todo = """
! 
! User interface improvements:
! 
   o Once the training stuff is on a separate page, make the paste box
     bigger.
   o Deployment: Windows executable?  atlaxwin and ctypes?  Or just
     webbrowser?
   o Can it cleanly dynamically update its status display while having a
     POP3 converation?  Hammering reload sucks.
   o Add a command to save the database without shutting down, and one to
     reload the database.
!  o Save the Status (num classified, etc.) between sessions.
! 
! 
! New features:
! 
!  o (Re)training interface - one message per line, quick-rendering table.
!  o "Send me an email every [...] to remind me to train on new
!    messages."
!  o "Send me a status email every [...] telling how many mails have been
!    classified, etc."
!  o Possibly integrate Tim Stone's SMTP code - make it use async, make
!    the training code update (rather than replace!) the database.
!  o Option to keep trained messages and view potential FPs and FNs to
!    correct them.
!  o Allow use of the UI without the POP3 proxy.
! 
! 
! Code quality:
! 
!  o Move the UI into its own module.
!  o Eventually, pull the common HTTP code from pop3proxy.py and Entrian
!    Debugger into a library.
! 
! 
! Info:
! 
!  o Slightly-wordy index page; intro paragraph for each page.
!  o In both stats and training results, report nham and nspam - warn if
!    they're very different (for some value of 'very').
!  o "Links" section (on homepage?) to project homepage, mailing list,
!    etc.
! 
! 
! Gimmicks:
! 
!  o Classify a web page given a URL.
!  o Graphs.  Of something.  Who cares what?
!  o Zoe...!
! 
  """
  
***************
*** 147,151 ****
          self.set_terminator('\r\n')
          self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
!         self.connect((serverName, serverPort))
  
      def collect_incoming_data(self, data):
--- 178,188 ----
          self.set_terminator('\r\n')
          self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
!         try:
!             self.connect((serverName, serverPort))
!         except socket.error, e:
!             print >>sys.stderr, "Can't connect to %s:%d: %s" % \
!                                 (serverName, serverPort, e)
!             self.close()
!             self.lineCallback('')   # "The socket's been closed."
  
      def collect_incoming_data(self, data):
***************
*** 199,203 ****
          self.response = self.response + line
  
!         # Is this line that terminates a set of headers?
          self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n']
  
--- 236,240 ----
          self.response = self.response + line
  
!         # Is this the line that terminates a set of headers?
          self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n']
  
***************
*** 237,241 ****
          else:
              # Assume that an unknown command will get a single-line
!             # response.  This should work for errors and for POP-AUTH.
              return False
  
--- 274,281 ----
          else:
              # Assume that an unknown command will get a single-line
!             # response.  This should work for errors and for POP-AUTH,
!             # and is harmless even for multiline responses - the first
!             # line will be passed to onTransaction and ignored, then the
!             # rest will be proxied straight through.
              return False
  
***************
*** 246,257 ****
      def found_terminator(self):
          """Asynchat override."""
!         if self.request.strip().upper() == 'KILL':
!             self.serverSocket.sendall('QUIT\r\n')
!             self.send("+OK, dying.\r\n")
!             self.serverSocket.shutdown(2)
!             self.serverSocket.close()
              self.shutdown(2)
              self.close()
              raise SystemExit
  
          self.serverSocket.push(self.request + '\r\n')
--- 286,298 ----
      def found_terminator(self):
          """Asynchat override."""
!         verb = self.request.strip().upper()
!         if verb == 'KILL':
              self.shutdown(2)
              self.close()
              raise SystemExit
+         elif verb == 'CRASH':
+             # For testing
+             x = 0
+             y = 1/x
  
          self.serverSocket.push(self.request + '\r\n')
***************
*** 271,276 ****
          # Pass the request and the raw response to the subclass and
          # send back the cooked response.
!         cooked = self.onTransaction(self.command, self.args, self.response)
!         self.push(cooked)
  
          # If onServerLine() decided that the server has closed its
--- 312,318 ----
          # Pass the request and the raw response to the subclass and
          # send back the cooked response.
!         if self.response:
!             cooked = self.onTransaction(self.command, self.args, self.response)
!             self.push(cooked)
  
          # If onServerLine() decided that the server has closed its
***************
*** 334,337 ****
--- 376,380 ----
          status.totalSessions += 1
          status.activeSessions += 1
+         self.isClosed = False
  
      def send(self, data):
***************
*** 339,343 ****
          self.logFile.write(data)
          self.logFile.flush()
!         return POP3ProxyBase.send(self, data)
  
      def recv(self, size):
--- 382,392 ----
          self.logFile.write(data)
          self.logFile.flush()
!         try:
!             return POP3ProxyBase.send(self, data)
!         except socket.error:
!             # The email client has closed the connection - 40tude Dialog
!             # does this immediately after issuing a QUIT command,
!             # without waiting for the response.
!             self.close()
  
      def recv(self, size):
***************
*** 349,354 ****
  
      def close(self):
!         status.activeSessions -= 1
!         POP3ProxyBase.close(self)
  
      def onTransaction(self, command, args, response):
--- 398,406 ----
  
      def close(self):
!         # This can be called multiple times by async.
!         if not self.isClosed:
!             self.isClosed = True
!             status.activeSessions -= 1
!             POP3ProxyBase.close(self)
  
      def onTransaction(self, command, args, response):
***************
*** 442,448 ****
      UserInterface objects to serve them."""
  
!     def __init__(self, uiPort, bayes):
          uiArgs = (bayes,)
!         Listener.__init__(self, uiPort, UserInterface, uiArgs)
  
  
--- 494,500 ----
      UserInterface objects to serve them."""
  
!     def __init__(self, uiPort, bayes, socketMap=asyncore.socket_map):
          uiArgs = (bayes,)
!         Listener.__init__(self, uiPort, UserInterface, uiArgs, socketMap=socketMap)
  
  
***************
*** 479,485 ****
      """Serves the HTML user interface of the proxy."""
  
      header = """<html><head><title>Spambayes proxy: %s</title>
               <style>
!              body { font: 90%% arial, swiss, helvetica }
               table { font: 90%% arial, swiss, helvetica }
               form { margin: 0 }
--- 531,544 ----
      """Serves the HTML user interface of the proxy."""
  
+     # A couple of notes about the HTML here:
+     #  o I've tried to keep content and presentation separate using
+     #    one main stylesheet - no <font> tags, and no inline stylesheets
+     #  o Form fields must specify their name and value attributes like
+     #    this: "... name='n' value='v' ..." even if there is no default
+     #    value.  This is so that setFieldValue can set the value.
+ 
      header = """<html><head><title>Spambayes proxy: %s</title>
               <style>
!              body { font: 90%% arial, swiss, helvetica; margin: 0 }
               table { font: 90%% arial, swiss, helvetica }
               form { margin: 0 }
***************
*** 497,501 ****
               </head>\n"""
  
!     bodyStart = """<body style='margin: 0'>
                  <div class='banner'>
                  <img src='/helmet.gif' align='absmiddle'>
--- 556,560 ----
               </head>\n"""
  
!     bodyStart = """<body>
                  <div class='banner'>
                  <img src='/helmet.gif' align='absmiddle'>
***************
*** 504,514 ****
  
      footer = """</div>
!              <form action='/shutdown'>
               <table width='100%%' cellspacing='0'>
!              <tr><td class='banner'>&nbsp;Spambayes Proxy, %s.
               <a href='http://www.spambayes.org/'>Spambayes.org</a></td>
               <td align='right' class='banner'>
               %s
!              </td></tr></table></form>\n"""
  
      shutdownDB = """<input type='submit' name='how' value='Shutdown'>"""
--- 563,575 ----
  
      footer = """</div>
!              <form action='/shutdown' method='POST'>
               <table width='100%%' cellspacing='0'>
!              <tr><td class='banner'>&nbsp;<a href='/'>Spambayes Proxy</a>,
!              %s.
               <a href='http://www.spambayes.org/'>Spambayes.org</a></td>
               <td align='right' class='banner'>
               %s
!              </td></tr></table></form>
!              </body></html>\n"""
  
      shutdownDB = """<input type='submit' name='how' value='Shutdown'>"""
***************
*** 531,552 ****
  
      wordQuery = """<form action='/wordquery'>
!                 <input name='word' type='text' size='30'>
                  <input type='submit' value='Tell me about this word'>
                  </form>"""
  
!     train = """<form action='/upload' method='POST'
                  enctype='multipart/form-data'>
!             Either upload a message file: <input type='file' name='file'><br>
!             Or paste the whole message (incuding headers) here:<br>
!             <textarea name='text' rows='3' cols='60'></textarea><br>
!             Is this message
!             <input type='radio' name='which' value='ham'>Ham</input> or
!             <input type='radio'
!                    name='which' value='spam' checked>Spam</input>?<br>
!             <input type='submit' value='Train on this message'>
!             </form>"""
  
!     def __init__(self, clientSocket, bayes):
!         BrighterAsyncChat.__init__(self, clientSocket)
          self.bayes = bayes
          self.request = ''
--- 592,621 ----
  
      wordQuery = """<form action='/wordquery'>
!                 <input name='word' value='' type='text' size='30'>
                  <input type='submit' value='Tell me about this word'>
                  </form>"""
  
!     upload = """<form action='/%s' method='POST'
                  enctype='multipart/form-data'>
!              Either upload a message file:
!              <input type='file' name='file' value=''><br>
!              Or paste the whole message (incuding headers) here:<br>
!              <textarea name='text' rows='3' cols='60'></textarea><br>
!              %s
!              </form>"""
  
!     uploadSumbit = """<input type='submit' name='which' value='%s'>"""
! 
!     train = upload % ('train',
!                       (uploadSumbit % "Train as Spam") + "&nbsp;" + \
!                       (uploadSumbit % "Train as Ham"))
! 
!     classify = upload % ('classify', uploadSumbit % "Classify")
! 
!     def __init__(self, clientSocket, bayes, socketMap=asyncore.socket_map):
!         # Grumble: asynchat.__init__ doesn't take a 'map' argument,
!         # hence the two-stage construction.
!         BrighterAsyncChat.__init__(self)
!         BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
          self.bayes = bayes
          self.request = ''
***************
*** 654,662 ****
          self.push(self.bodyStart % homeLink)
  
      def onHome(self, params):
          """Serve up the homepage."""
          body = (self.pageSection % ('Status', self.summary % status.__dict__)+
!                 self.pageSection % ('Word query', self.wordQuery)+
!                 self.pageSection % ('Train', self.train))
          self.push(body)
  
--- 723,745 ----
          self.push(self.bodyStart % homeLink)
  
+     def setFieldValue(self, form, name, value):
+         """Sets the default value of a field in a form.  See the comment
+         at the top of this class for how to specify HTML that works with
+         this function.  (This is exactly what Entrian PyMeld is for, but
+         that ships under the Sleepycat License.)"""
+         match = re.search(r"\s+name='%s'\s+value='([^']*)'" % name, form)
+         if match:
+             quotedValue = re.sub("'", "&#%d;" % ord("'"), value)
+             return form[:match.start(1)] + quotedValue + form[match.end(1):]
+         else:
+             print >>sys.stderr, "Warning: setFieldValue('%s') failed" % name
+             return form
+ 
      def onHome(self, params):
          """Serve up the homepage."""
          body = (self.pageSection % ('Status', self.summary % status.__dict__)+
!                 self.pageSection % ('Train', self.train)+
!                 self.pageSection % ('Classify a message', self.classify)+
!                 self.pageSection % ('Word query', self.wordQuery))
          self.push(body)
  
***************
*** 676,684 ****
          raise SystemExit
  
!     def onUpload(self, params):
          """Train on an uploaded or pasted message."""
          # Upload or paste?  Spam or ham?
          message = params.get('file') or params.get('text')
!         isSpam = (params['which'] == 'spam')
  
          # Append the message to a file, to make it easier to rebuild
--- 759,767 ----
          raise SystemExit
  
!     def onTrain(self, params):
          """Train on an uploaded or pasted message."""
          # Upload or paste?  Spam or ham?
          message = params.get('file') or params.get('text')
!         isSpam = (params['which'] == 'Train as Spam')
  
          # Append the message to a file, to make it easier to rebuild
***************
*** 698,705 ****
  
          # Train on the message.
!         self.bayes.learn(tokenizer.tokenize(message), isSpam, True)
          self.push("<p>OK. Return <a href='/'>Home</a> or train another:</p>")
          self.push(self.pageSection % ('Train another', self.train))
  
      def onWordquery(self, params):
          word = params['word']
--- 781,803 ----
  
          # Train on the message.
!         tokens = tokenizer.tokenize(message)
!         self.bayes.learn(tokens, isSpam, True)
          self.push("<p>OK. Return <a href='/'>Home</a> or train another:</p>")
          self.push(self.pageSection % ('Train another', self.train))
  
+     def onClassify(self, params):
+         """Classify an uploaded or pasted message."""
+         message = params.get('file') or params.get('text')
+         tokens = tokenizer.tokenize(message)
+         prob, clues = self.bayes.spamprob(tokens, evidence=True)
+         self.push("<p>Spam probability: <b>%.8f</b></p>" % prob)
+         self.push("<table class='sectiontable' cellspacing='0'>")
+         self.push("<tr><td class='sectionheading'>Clues:</td></tr>\n")
+         self.push("<tr><td class='sectionbody'><table>")
+         for w, p in clues:
+             self.push("<tr><td>%s</td><td>%.8f</td></tr>\n" % (w, p))
+         self.push("</table></td></tr></table>")
+         self.push("<p>Return <a href='/'>Home</a> or classify another:</p>")
+         self.push(self.pageSection % ('Classify another', self.classify))
      def onWordquery(self, params):
          word = params['word']
***************
*** 717,727 ****
                     Last used: <b>%(atime)s</b>.<br>""" % members
          except KeyError:
!             info = "'%s' does not appear in the database." % word
  
!         body = (self.pageSection % ("Statistics for '%s'" % word, info) +
!                 self.pageSection % ('Word query', self.wordQuery))
          self.push(body)
  
  
  def main(serverName, serverPort, proxyPort,
           uiPort, launchUI, pickleName, useDB):
--- 815,845 ----
                     Last used: <b>%(atime)s</b>.<br>""" % members
          except KeyError:
!             info = "%r does not appear in the database." % word
  
!         query = self.setFieldValue(self.wordQuery, 'word', params['word'])
!         body = (self.pageSection % ("Statistics for %r" % word, info) +
!                 self.pageSection % ('Word query', query))
          self.push(body)
  
  
+ def initStatus():
+     status.proxyPort = options.pop3proxy_port
+     status.serverName = options.pop3proxy_server_name
+     status.serverPort = options.pop3proxy_server_port
+     status.pickleName = options.persistent_storage_file
+     status.useDB = options.persistent_use_database
+     status.uiPort = options.html_ui_port
+     status.launchUI = options.html_ui_launch_browser
+     status.gzipCache = options.pop3proxy_cache_use_gzip
+     status.cacheExpiryDays = options.pop3proxy_cache_expiry_days
+     status.runTestServer = False
+     status.totalSessions = 0
+     status.activeSessions = 0
+     status.numEmails = 0
+     status.numSpams = 0
+     status.numHams = 0
+     status.numUnsure = 0
+ 
+ 
  def main(serverName, serverPort, proxyPort,
           uiPort, launchUI, pickleName, useDB):
***************
*** 891,895 ****
      def onUnknown(self, command, args):
          """Unknown POP3 command."""
!         return "-ERR Unknown command: '%s'\r\n" % command
  
  
--- 1009,1013 ----
      def onUnknown(self, command, args):
          """Unknown POP3 command."""
!         return "-ERR Unknown command: %s\r\n" % repr(command)
  
  
***************
*** 901,904 ****
--- 1019,1023 ----
      # asyncore environments.
      import threading
+     initStatus()
      testServerReady = threading.Event()
      def runTestServer():
***************
*** 912,915 ****
--- 1031,1035 ----
          # Name the database in case it ever gets auto-flushed to disk.
          bayes = hammie.createbayes('_pop3proxy.db')
+         UserInterfaceListener(8881, bayes)
          BayesProxyListener('localhost', 8110, 8111, bayes)
          bayes.learn(tokenizer.tokenize(spam1), True)
***************
*** 944,952 ****
          assert response.find(options.hammie_header_name) >= 0
  
      # Kill the proxy and the test server.
      proxy.sendall("kill\r\n")
!     server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
!     server.connect(('localhost', 8110))
!     server.sendall("kill\r\n")
  
  
--- 1064,1085 ----
          assert response.find(options.hammie_header_name) >= 0
  
+     # Smoke-test the HTML UI.
+     httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+     httpServer.connect(('localhost', 8881))
+     httpServer.sendall("get / HTTP/1.0\r\n\r\n")
+     response = ''
+     while 1:
+         packet = httpServer.recv(1000)
+         if not packet: break
+         response += packet
+     assert re.search(r"(?s)<html>.*Spambayes proxy.*</html>", response)
+ 
      # Kill the proxy and the test server.
      proxy.sendall("kill\r\n")
!     proxy.recv(100)
!     pop3Server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
!     pop3Server.connect(('localhost', 8110))
!     pop3Server.sendall("kill\r\n")
!     pop3Server.recv(100)
  
  
***************
*** 958,979 ****
      # Read the arguments.
      try:
!         opts, args = getopt.getopt(sys.argv[1:], 'htdbp:l:u:')
      except getopt.error, msg:
          print >>sys.stderr, str(msg) + '\n\n' + __doc__
          sys.exit()
  
!     status.pickleName = hammie.DEFAULTDB
!     status.proxyPort = 110
!     status.uiPort = 8880
!     status.serverPort = 110
!     status.useDB = False
!     status.runTestServer = False
!     status.launchUI = False
!     status.totalSessions = 0
!     status.activeSessions = 0
!     status.numEmails = 0
!     status.numSpams = 0
!     status.numHams = 0
!     status.numUnsure = 0
      for opt, arg in opts:
          if opt == '-h':
--- 1091,1101 ----
      # Read the arguments.
      try:
!         opts, args = getopt.getopt(sys.argv[1:], 'htdbzp:l:u:')
      except getopt.error, msg:
          print >>sys.stderr, str(msg) + '\n\n' + __doc__
          sys.exit()
  
!     initStatus()
!     runSelfTest = False
      for opt, arg in opts:
          if opt == '-h':
***************
*** 992,999 ****
          elif opt == '-u':
              status.uiPort = int(arg)
  
      # Do whatever we've been asked to do...
!     if not opts and not args:
!         print "Running a self-test (use 'pop3proxy -h' for help)"
          test()
          print "Self-test passed."   # ...else it would have asserted.
--- 1114,1123 ----
          elif opt == '-u':
              status.uiPort = int(arg)
+         elif opt == '-z':
+             runSelfTest = True
  
      # Do whatever we've been asked to do...
!     if runSelfTest:
!         print "\nRunning self-test...\n"
          test()
          print "Self-test passed."   # ...else it would have asserted.
***************
*** 1004,1014 ****
          asyncore.loop()
  
!     elif 1 <= len(args) <= 2:
!         # Normal usage, with optional server port number.
!         status.serverName = args[0]
!         if len(args) == 2:
              status.serverPort = int(args[1])
!         main(status.serverName, status.serverPort, status.proxyPort,
!              status.uiPort, status.launchUI, status.pickleName, status.useDB)
  
      else:
--- 1128,1147 ----
          asyncore.loop()
  
!     elif 0 <= len(args) <= 2:
!         # Normal usage, with optional server name and port number.
!         if len(args) >= 1:
!             status.serverName = args[0]
!         if len(args) >= 2:
              status.serverPort = int(args[1])
! 
!         if not status.serverName:
!             print >>sys.stderr, \
!                   ("Error: You must give a POP3 server name, either in\n"
!                    "bayescustomize.ini as pop3proxy_server_name or on the\n"
!                    "command line.  pop3proxy.py -h prints a usage message.")
!         else:
!             main(status.serverName, status.serverPort, status.proxyPort,
!                  status.uiPort, status.launchUI, status.pickleName,
!                  status.useDB)
  
      else:
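
The precedence that the last hunk implements (defaults from
bayescustomize.ini, overridden by any command-line arguments, with an error
if no server name ends up set) can be sketched as below. This is a Python 3
illustration under assumptions, not the project's Options code: the
resolve_server function, the use of configparser, the example host names
and the fallback port of 110 are all hypothetical. Only the [pop3proxy]
section name and the pop3proxy_server_name / pop3proxy_server_port option
names come from the patch.

```python
import configparser


def resolve_server(ini_text, args):
    """Return (server_name, server_port), preferring command-line args."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Defaults come from the ini file (port 110 is an assumed fallback).
    name = parser.get("pop3proxy", "pop3proxy_server_name", fallback="")
    port = parser.getint("pop3proxy", "pop3proxy_server_port", fallback=110)
    # Positional arguments, when present, override the ini values.
    if len(args) >= 1:
        name = args[0]
    if len(args) >= 2:
        port = int(args[1])
    if not name:
        raise ValueError(
            "no POP3 server name in the ini file or on the command line")
    return name, port


INI = """
[pop3proxy]
pop3proxy_server_name = pop3.example.com
pop3proxy_server_port = 110
"""

print(resolve_server(INI, []))
print(resolve_server(INI, ["mail.example.org", "995"]))
```

Raising an exception keeps the sketch testable. The patch itself prints the
error to stderr and returns without starting the proxy.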




