Third result ... RE: [Spambayes] First result from Gary
Thu, 19 Sep 2002 16:59:25 -0400
> I (or someone else -- please <wink>?) should probably change the
> tokenizer to back off to the raw message body when the email parser
> gives up.
> I'm still getting my email.Message legs. How does this look as a first
Thanks! Some comments:
> --- tokenizer.py 17 Sep 2002 17:57:39 -0000 1.23
> +++ tokenizer.py 19 Sep 2002 18:57:42 -0000
> @@ -3,6 +3,7 @@
> import email
> import re
> from sets import Set
> +from email.MIMEText import MIMEText
We imported email just a few lines above. MIMEText isn't going to be
referenced enough to justify giving it an abbreviated name.
> from Options import options
> @@ -839,18 +840,16 @@
>          # Create an email Message object.
> -            if hasattr(obj, "readline"):
> -                return email.message_from_file(obj)
> -            else:
> -                return email.message_from_string(obj)
> +            if hasattr(obj, "read"):
> +                obj = obj.read()
> +            return email.message_from_string(obj)
>          except email.Errors.MessageParseError:
> -            return None
> +            return MIMEText(obj)
Barry suggested doing (and he wrote the email package, so this is a rare
case where we should listen to him <wink>):
    msg = email.Message.Message()
instead. The difference is that MIMEText() makes up some headers out of
thin air (relative to the original malformed message), but a raw Message
doesn't.
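To make the header difference concrete, here is a minimal sketch. It assumes a current Python 3, where the classes live in email.message and email.mime.text rather than under the email.Message and email.MIMEText names used in the patch. The sample text is made up for illustration.

```python
from email.message import Message
from email.mime.text import MIMEText

# Hypothetical stand-in for a message body the parser could not handle.
raw = "this is not a well-formed message"

# MIMEText builds a complete MIME text part, so it adds headers of its own
# that were never present in the original input.
synthesized = MIMEText(raw)

# A bare Message carries only what we explicitly put into it.
bare = Message()
bare.set_payload(raw)

print("MIMEText headers:", synthesized.keys())
print("Message headers: ", bare.keys())
print("payload preserved:", bare.get_payload() == raw)
```

For a tokenizer that generates tokens from headers, the invented headers would show up as evidence that the original message never contained, which is presumably why the bare Message is the safer fallback.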
>      def tokenize(self, obj):
>          msg = self.get_message(obj)
>          if msg is None:
>              yield 'control: MessageParseError'
> -            # XXX Fall back to the raw body text?
There won't be a way for get_message to return None anymore, so also nuke
the code checking for that. Replacing it with
    assert msg is not None
would be OK, but we're hosed in any case then, and even without the assert
the code will raise a None-related exception soon anyway.
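Putting the pieces together, get_message might end up looking roughly like the sketch below. This is an illustration under assumptions, not the code that was checked in: it is a free function rather than a tokenizer method, it uses the modern Python 3 module names (email.errors, email.message), and the parse parameter is a hypothetical hook added only so the fallback path can be exercised, since the current parser rarely raises MessageParseError.

```python
import email
import email.errors
import email.message
import io


def get_message(obj, parse=email.message_from_string):
    """Return an email Message for obj; never returns None.

    obj may be a string or a file-like object with a read() method.
    The parse argument is a hypothetical hook, not part of the patch.
    """
    if hasattr(obj, "read"):
        obj = obj.read()
    try:
        return parse(obj)
    except email.errors.MessageParseError:
        # Fall back to a header-less Message holding the raw text,
        # so nothing is invented on the message's behalf.
        msg = email.message.Message()
        msg.set_payload(obj)
        return msg


def always_fails(text):
    # Stand-in parser that simulates the email package giving up.
    raise email.errors.MessageParseError("simulated parse failure")


good = get_message(io.StringIO("Subject: hi\n\nbody\n"))
fallback = get_message("garbage in", parse=always_fails)

print(good["Subject"], repr(good.get_payload()))
print(fallback.keys(), repr(fallback.get_payload()))
```

Because both paths return a Message, a caller such as tokenize has no None case left to check.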
Piece o' cake, eh?
piece-o'-python-ly y'rs - tim