[Spambayes-checkins] spambayes/Outlook2000 tester.py,1.15,1.16

Mark Hammond mhammond at users.sourceforge.net
Sun Aug 31 23:36:01 EDT 2003


Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1:/tmp/cvs-serv6159

Modified Files:
	tester.py 
Log Message:
Print progress messages as we scan, and when sanity checking how many
messages don't have HTML, only count multi-part ones.  My new code still
leaves me with 1000 out of 5000 with no HTML.


Index: tester.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/tester.py,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** tester.py	31 Aug 2003 05:38:52 -0000	1.15
--- tester.py	1 Sep 2003 05:35:58 -0000	1.16
***************
*** 14,17 ****
--- 14,19 ----
  from time import sleep
  import copy
+ import rfc822
+ import cStringIO
  
  HAM="ham"
***************
*** 423,426 ****
--- 425,429 ----
                      # reported.
                      num_looked += 1
+                     if num_looked % 500 == 0: print " (scanned", num_looked, "messages...)"
                      if not message.IsFilterCandidate() and \
                          message.msgclass.lower().startswith("ipm.note"):
***************
*** 434,438 ****
                      if not headers: num_without_headers += 1
                      if not body: num_without_body += 1
!                     if not html_body: num_without_html_body += 1
  
          print "Checked %d items, %d non-filterable items found" % (num_looked, num_found)
--- 437,445 ----
                      if not headers: num_without_headers += 1
                      if not body: num_without_body += 1
!                     # for HTML, we only check multi-part
!                     temp_obj = rfc822.Message(cStringIO.StringIO(headers+"\n\n"))
!                     content_type = temp_obj.get("content-type", '')
!                     if content_type.lower().startswith("multipart"):
!                         if not html_body: num_without_html_body += 1
  
          print "Checked %d items, %d non-filterable items found" % (num_looked, num_found)
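
For readers who want to try the multipart check outside of tester.py, here is a minimal standalone sketch of the same idea. It is not the code from this commit: the diff above uses the Python 2 modules rfc822 and cStringIO, which no longer exist in Python 3, so this version swaps in the standard email package. The names is_multipart and count_missing_html, and the (headers, html_body) pair layout, are hypothetical and chosen only for illustration.

```python
from email import message_from_string


def is_multipart(headers):
    """Return True if the raw header block declares a multipart content type."""
    # Parse the headers only; the trailing blank line marks an empty body,
    # mirroring the headers+"\n\n" trick used in the diff.
    msg = message_from_string(headers + "\n\n")
    # get_content_maintype() returns a lowercased value and falls back to
    # "text" when the Content-Type header is missing or malformed.
    return msg.get_content_maintype() == "multipart"


def count_missing_html(messages):
    """Count multipart messages that have no HTML body.

    `messages` is a hypothetical iterable of (headers, html_body) pairs;
    the real tester reads these values from Outlook message objects.
    """
    missing = 0
    for headers, html_body in messages:
        # Plain single-part messages are never expected to carry HTML,
        # so only multipart ones count toward the sanity check.
        if is_multipart(headers) and not html_body:
            missing += 1
    return missing
```

With this sketch, a text/plain message without an HTML body is ignored, while a multipart/alternative message without one is counted, which is the behaviour the log message describes.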




