[Spambayes] Outlook plugin errors with Exchange

Tim Peters tim.one@comcast.net
Sat Nov 2 04:36:23 2002


[Piers Haken]

Thank you for the excellent report!

> ...
> I realize that this config may well be untested/unsupported, especially
> the fact that my inbox message store is on an Exchange server, but
> hopefully this info can be of some use to someone...

There's no intention *not* to support Exchange server, but I've never been
near one and I'm not sure anyone else here is near one either.  Someone with
access to that will have to deal with it.  You're elected.

> Also, this is my first time using python, so I'm sorry if I'm missing
> something really simple here.

No, you did a great job of faking it <wink>.

> ...
> 3) messages sent from one exchange account to another (ie, never going
> over SMTP) have no headers. This may be a problem since the parser can
> never infer the sender or any other metadata about the message. It might
> be useful to have a special tag that says that the message has no
> headers, since such email is very probably ham. Alternatively, some SMTP
> headers could be faked up from the various MAPI properties.

By default, the tokenizer code ignores most header fields.  It would be good
to simulate a few, especially Subject and From.  Sticking something like
NOHEADERS in the synthesized Subject header would suffice to teach the
classifier that NOHEADERS-in-a-Subject-header is a strong ham clue, and
there's really no need to get fancier than that.
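A minimal sketch of that idea, assuming the sender and subject have already been pulled out of the MAPI properties by some other code. The function name synthesize_headers and its parameters are hypothetical, not part of the plugin:

```python
def synthesize_headers(subject, sender):
    """Build a tiny header block for a message that arrived with none.

    Hypothetical helper: 'subject' and 'sender' are assumed to come
    from MAPI properties fetched elsewhere.
    """
    def clean(value):
        # Collapse all runs of whitespace (including line breaks) to single
        # spaces so a value cannot smuggle in an extra header line.
        return " ".join(value.split())

    lines = [
        "From: " + clean(sender),
        # The NOHEADERS marker becomes a token the classifier can learn from.
        "Subject: NOHEADERS " + clean(subject),
    ]
    # A blank line terminates the header block.
    return "\n".join(lines) + "\n\n"
```

Prepending the result to the message body would give the tokenizer a Subject and From to chew on.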

> 4) for some reason, my outlook is prefixing the headers of SMTP mail
> with the string "Microsoft Mail Internet Headers Version 2.0\r\n", and
> this is causing every SMTP message to throw an exception during parsing
> (for example, when doing a 'show clues'):
...
>   File "C:\Python22\spam\spambayes\email\Parser.py", line 107, in
> _parseheaders
>     raise Errors.HeaderParseError(
> email.Errors.HeaderParseError: Not a header, not a continuation:
> ``Microsoft Mail Internet Headers Version 2.0''

That would be an error!  The format of header lines is specified by a public
standard (RFC 2822), and as the error msg said, that specific line is neither
a valid header line nor a valid continuation of a preceding header line.
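To make the rule concrete, here is a rough sketch of the test a parser applies to each line.  This is an illustration under RFC 2822's basic syntax (a field name is printable ASCII with no space or colon, followed by a colon), not the email pkg's actual code; classify_line is a made-up name:

```python
import re

# Field name: one or more printable US-ASCII chars (33-126) except ':',
# immediately followed by a colon.
_HEADER_RE = re.compile(r'^[\x21-\x39\x3b-\x7e]+:')

def classify_line(line, have_previous_header):
    """Return 'header', 'continuation', or 'invalid' for one header-block line."""
    if _HEADER_RE.match(line):
        return "header"
    # A continuation starts with a space or tab and needs a header to continue.
    if have_previous_header and line[:1] in (" ", "\t"):
        return "continuation"
    return "invalid"
```

The offending line fails the first test because of the space after "Microsoft", and fails the second because it doesn't start with whitespace (and nothing precedes it anyway).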

> This string is also shown in the 'options' dialog for the message (on
> both OutlookXP and Outlook2K) so I think it's something that exchange
> server adds to the message, ugh.

Sounds very likely; I haven't seen this.

> Here's a patch that fixes this for me and at least allows me to
> train on a full set of messages:
>
> Index: email/Parser.py

The email pkg is a part of standard Python, and we (speaking as a Python
developer here) won't warp it to accept non-standard headers.  If it's
necessary to worm around this in the Outlook client, it should be easy to do
so by fiddling Outlook2000\msgstore.py's _GetMessageText().  For example,
this is untested but almost certainly close to working:

    if headers.startswith("Microsoft Mail"):
        headers = "X-MS-Mail-Gibberish: " + headers

It's enough just to check for the "Microsoft Mail" prefix, as the embedded
space alone makes it an invalid header line.  Stuffing a legitimate header
tag at the front should be enough to make the email pkg's parser happy
again.
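Here is the same idea as a standalone sketch that can be tried outside Outlook.  The helper name repair_headers and the sample message are invented for illustration; how the header text is actually obtained inside _GetMessageText() isn't shown here:

```python
import email

PREFIX = "Microsoft Mail"
TAG = "X-MS-Mail-Gibberish: "   # any legitimate header tag would do

def repair_headers(headers):
    """Turn the bogus leading line into a syntactically valid header."""
    if headers.startswith(PREFIX):
        return TAG + headers
    return headers

# Made-up sample resembling what the report describes.
raw = ("Microsoft Mail Internet Headers Version 2.0\r\n"
       "From: someone@example.com\r\n"
       "Subject: hello\r\n"
       "\r\n"
       "body text\r\n")

msg = email.message_from_string(repair_headers(raw))
```

After the repair, the parser sees the bogus line as just another header, and the real ones (msg["Subject"] and friends) come through normally.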