[Spambayes-checkins] spambayes/Outlook2000 tester.py,1.15,1.16
Mark Hammond
mhammond at users.sourceforge.net
Sun Aug 31 23:36:01 EDT 2003
Update of /cvsroot/spambayes/spambayes/Outlook2000
In directory sc8-pr-cvs1:/tmp/cvs-serv6159
Modified Files:
tester.py
Log Message:
Print progress messages as we scan, and when sanity checking how many
messages don't have HTML, only count multi-part ones. My new code still
leaves me with 1000 out of 5000 with no HTML.
Index: tester.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/tester.py,v
retrieving revision 1.15
retrieving revision 1.16
diff -C2 -d -r1.15 -r1.16
*** tester.py 31 Aug 2003 05:38:52 -0000 1.15
--- tester.py 1 Sep 2003 05:35:58 -0000 1.16
***************
*** 14,17 ****
--- 14,19 ----
from time import sleep
import copy
+ import rfc822
+ import cStringIO
HAM="ham"
***************
*** 423,426 ****
--- 425,429 ----
# reported.
num_looked += 1
+ if num_looked % 500 == 0: print " (scanned", num_looked, "messages...)"
if not message.IsFilterCandidate() and \
message.msgclass.lower().startswith("ipm.note"):
***************
*** 434,438 ****
if not headers: num_without_headers += 1
if not body: num_without_body += 1
! if not html_body: num_without_html_body += 1
print "Checked %d items, %d non-filterable items found" % (num_looked, num_found)
--- 437,445 ----
if not headers: num_without_headers += 1
if not body: num_without_body += 1
! # for HTML, we only check multi-part
! temp_obj = rfc822.Message(cStringIO.StringIO(headers+"\n\n"))
! content_type = temp_obj.get("content-type", '')
! if content_type.lower().startswith("multipart"):
! if not html_body: num_without_html_body += 1
print "Checked %d items, %d non-filterable items found" % (num_looked, num_found)
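
For readers on a current Python, where the rfc822 and cStringIO modules used
in this revision no longer exist, the multipart-only check can be sketched
with the standard email package. This is a minimal illustration, not the
code in tester.py: the function names, the `items` iterable of
(headers, html_body) pairs, and the `progress_every` parameter are all
hypothetical, and it tests the parsed main type instead of doing a
startswith("multipart") comparison on the raw header value.

```python
from email import message_from_string


def is_multipart(headers: str) -> bool:
    """Return True if a raw header block declares a multipart Content-Type."""
    # Parse the headers alone; the trailing blank line stands in for an empty body.
    msg = message_from_string(headers.rstrip("\r\n") + "\n\n")
    return msg.get_content_maintype() == "multipart"


def count_missing_html(items, progress_every=500):
    """Count multipart items that have no HTML body.

    `items` is a hypothetical iterable of (headers, html_body) string pairs.
    """
    missing = 0
    for scanned, (headers, html_body) in enumerate(items, start=1):
        # Progress message in the same spirit as the one added in this revision.
        if scanned % progress_every == 0:
            print(" (scanned", scanned, "messages...)")
        # Only multipart messages are expected to carry an HTML part.
        if headers and is_multipart(headers) and not html_body:
            missing += 1
    return missing
```

Plain-text messages and items with no headers are skipped by the count, so
only multipart messages that lack an HTML body are reported.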