[Spambayes] I thought bogus message structure problem was solved...

Skip Montanaro skip@pobox.com
Tue Nov 5 18:11:53 2002


I just saw a message with no hammie header.  Looking at my procmail log
file I saw this traceback info:

    Tue Nov  5 11:49:47 2002
    Traceback (most recent call last):
      File "/Users/skip/local/bin/hammie.py", line 488, in ?
        main()
      File "/Users/skip/local/bin/hammie.py", line 472, in main
        filtered = h.filter(msg)
      File "/Users/skip/local/bin/hammie.py", line 269, in filter
        msg = email.message_from_string(msg)
      File "/Users/skip/local/lib/python2.3/email/__init__.py", line 52, in message_from_string
        return Parser(_class, strict=strict).parsestr(s)
      File "/Users/skip/local/lib/python2.3/email/Parser.py", line 75, in parsestr
        return self.parse(StringIO(text), headersonly=headersonly)
      File "/Users/skip/local/lib/python2.3/email/Parser.py", line 64, in parse
        self._parsebody(root, fp)
      File "/Users/skip/local/lib/python2.3/email/Parser.py", line 228, in _parsebody
        msgobj = self.parsestr(part)
      File "/Users/skip/local/lib/python2.3/email/Parser.py", line 75, in parsestr
        return self.parse(StringIO(text), headersonly=headersonly)
      File "/Users/skip/local/lib/python2.3/email/Parser.py", line 62, in parse
        self._parseheaders(root, fp)
      File "/Users/skip/local/lib/python2.3/email/Parser.py", line 128, in _parseheaders
        raise Errors.HeaderParseError(
    email.Errors.HeaderParseError: Not a header, not a continuation: ``It’s Easier to Shop Online!''
    procmail: Program failure (1) of "/Users/skip/local/bin/hammie.py"
    procmail: Rescue of unfiltered data succeeded

The message structure is clearly bogus (attached for completeness).  I
thought someone had fixed this problem, but it appears it was only in
other contexts.  Looking around for ParseError I see that in a couple
instances MessageParseError (base for HeaderParseError) is trapped, as
in this snippet from mboxutils.py:

    def _factory(fp):
        # Helper for getmbox
        try:
            return email.message_from_file(fp)
        except email.Errors.MessageParseError:
            return ''

However, it seems like we ought to be able to come up with a better
fallback action than returning an empty string when classifying
messages.  Is there a way to simply treat the entire body as plain text
even though the Content-Type header says otherwise?
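
One possible shape for such a fallback, as a rough sketch only: try the
normal parse, and if it raises MessageParseError, re-parse just the
top-level headers and keep everything after them as a single text/plain
payload.  The function name and the injectable `parse` argument are
hypothetical, not anything in hammie.py or mboxutils.py, and the sketch
uses the current email package spelling (email.errors, email.parser)
rather than the Python 2.3 one.  It also assumes the top-level headers
themselves are parseable, as they were for the message above, where the
error came from a sub-part.

```python
import email
import email.errors
from email.parser import Parser


def parse_with_plain_text_fallback(raw, parse=email.message_from_string):
    """Parse raw message text; on a parse error, keep the body as plain text.

    `parse` is a hypothetical hook so a stricter parser can be swapped in.
    """
    try:
        return parse(raw)
    except email.errors.MessageParseError:
        # Parse only the top-level headers.  Everything after the first
        # blank line is stored as one string payload, so the bogus MIME
        # structure inside the body is never examined.
        msg = Parser().parsestr(raw, headersonly=True)
        # Override whatever the sender claimed about the content type.
        del msg["Content-Type"]
        msg["Content-Type"] = "text/plain"
        return msg
```

With something like this the tokenizer would still see the headers and
the raw body text instead of an empty string, at the cost of tokenizing
boundary lines and any undecoded parts as ordinary text.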

Skip


An embedded message was scrubbed...
From: OfficeManager <ink@gonetdeals.com>
Subject: F_R_E_E Shipping! Printer Ink Sale! Details Inside!
Date: Tue, 5 Nov 2002 12:44:16 -0500
Size: 225
Url: http://mail.python.org/pipermail/spambayes/attachments/20021105/a129a481/attachment.txt
