Third result ... RE: [Spambayes] First result from Gary Robinson's ideas

Tim Peters tim.one@comcast.net
Thu, 19 Sep 2002 16:59:25 -0400


[Tim]
> I (or someone else -- please <wink>?) should probably change the
> tokenizer to back off to the raw message body when the email parser
> gives up.

[Neale Pickett]
> I'm still getting my email.Message legs.  How does this look as a first
> cut?

Thanks!  Some comments:

> --- tokenizer.py        17 Sep 2002 17:57:39 -0000      1.23
> +++ tokenizer.py        19 Sep 2002 18:57:42 -0000
> @@ -3,6 +3,7 @@
>  import email
>  import re
>  from sets import Set
> +from email.MIMEText import MIMEText

We imported email just a few lines above.  MIMEText isn't going to be
referenced enough to justify giving it an abbreviated name.

>  from Options import options
>
> @@ -839,18 +840,16 @@
>          else:
>              # Create an email Message object.
>              try:
> -                if hasattr(obj, "readline"):
> -                    return email.message_from_file(obj)
> -                else:
> -                    return email.message_from_string(obj)
> +                if hasattr(obj, "read"):
> +                    obj = obj.read()
> +                return email.message_from_string(obj)
>              except email.Errors.MessageParseError:
> -                return None
> +                return MIMEText(obj)

Barry suggested doing (and he wrote the email package, so this is a rare
case where we should listen to him <wink>):

    msg = email.Message.Message()
    msg.set_payload(obj)
    return msg

instead.  The difference is that MIMEText() makes up some headers out of
thin air (relative to the original malformed message), but a raw Message
object doesn't.
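
For anyone reading this later, here is a minimal sketch of that fallback. It is an illustration, not the checked-in code. It assumes the current Python 3 spellings of the email package (email.message.Message, email.errors.MessageParseError, email.mime.text.MIMEText), and the `parse` argument is a hypothetical hook added only so the failure path can be exercised. Recent versions of the parser tend to record defects rather than raise, so the except branch may rarely fire in practice.

```python
import email
import email.errors
import email.message
from email.mime.text import MIMEText


def get_message(obj, parse=email.message_from_string):
    """Return a Message for obj, falling back to a bare Message if parsing fails.

    `parse` is a hypothetical hook (not in the original patch) so that a
    failing parser can be substituted when trying this out.
    """
    # Accept file-like objects as well as strings, as in the patch.
    if hasattr(obj, "read"):
        obj = obj.read()
    try:
        return parse(obj)
    except email.errors.MessageParseError:
        # Wrap the raw text as-is; a plain Message invents no headers.
        msg = email.message.Message()
        msg.set_payload(obj)
        return msg
```

With a parser that always raises, the result keeps the raw text as its payload and has no headers at all, whereas MIMEText("some text") already carries a Content-Type header that the original message never had.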

>      def tokenize(self, obj):
>          msg = self.get_message(obj)
>          if msg is None:
>              yield 'control: MessageParseError'
> -            # XXX Fall back to the raw body text?
>              return

There won't be a way for get_message to return None anymore, so also nuke
the code checking for that.  Replacing it with

    assert msg is not None

would be OK, but we're hosed in any case then, and even without the assert
the code will raise a None-related exception soon anyway.
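
A small sketch of why the assert is optional. The function `iter_text_parts` is a hypothetical stand-in for the first few lines of tokenize(), not the real tokenizer, and it uses the Python 3 module name email.message. The point is only that the first attribute access on None blows up by itself.

```python
import email.message


def iter_text_parts(msg):
    """Hypothetical stand-in for the start of tokenize(): walk the parts."""
    # Optional: documents that get_message() no longer returns None.
    assert msg is not None
    for part in msg.walk():
        payload = part.get_payload()
        # Multipart containers hold a list of sub-parts; skip those.
        if isinstance(payload, str):
            yield payload
```

Passing None fails at the assert when assertions are enabled; with the assert removed (or under -O), msg.walk() raises AttributeError instead, so the bug surfaces either way.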

Piece o' cake, eh?

piece-o'-python-ly y'rs  - tim