[Spambayes] Re: Outlook plugin plus Exchange

Piers Haken piersh@friskit.com
Tue Nov 12 07:02:51 2002


> -----Original Message-----
> From: Tim Peters [mailto:tim.one@comcast.net] 
> Sent: Monday, November 11, 2002 10:25 PM
> To: Piers Haken
> Cc: David Leftley; spambayes@python.org
> Subject: RE: [Spambayes] Re: Outlook plugin plus Exchange
> 
> 
> [Piers Haken]
> > Yup, Outlook displays it properly.
> 
> Meaning it shows you the HTML part, as rendered HTML, I bet.

Yup.

> > I have a feeling that it's Oracle's mess,
> 
> Not from what you showed below.  It's not hard to find the 
> end of the headers!  The first blank line ends them.  That 
> Outlook is showing you stuff beyond that in its view of the 
> headers says it didn't suck out the headers properly to begin with.

I'm not sure that's the case. Outlook _always_ shows the MIME headers
below the SMTP headers in its 'internet headers' UI. For example, here are
the 'headers' from another message that renders correctly and that
spambayes parses correctly:

<example>
Microsoft Mail Internet Headers Version 2.0
Received: from sccrmhc02.attbi.com ([204.127.202.62]) by
zeus.sfhq.friskit.com with Microsoft SMTPSVC(5.0.2195.5329);
	 Mon, 11 Nov 2002 11:22:27 -0800
Received: from Computer ([12.236.244.49]) by sccrmhc02.attbi.com
          (InterMail vM.4.01.03.27 201-229-121-127-20010626) with SMTP
          id <20021111191007.KEOD5251.sccrmhc02.attbi.com@Computer>;
          Mon, 11 Nov 2002 19:10:07 +0000
From: "Rebecca Whitworth" <lesanctuaire@earthlink.net>
To: "Piers Haken" <piersh@friskit.com>
Cc: "Traci and Stephen Green" <tracigreen50@yahoo.com>
Subject: the green's car
Date: Mon, 11 Nov 2002 11:15:54 -0800
Message-ID: <LMBBIHONPNPJLKCOKALNIEDIDHAA.lesanctuaire@earthlink.net>
MIME-Version: 1.0
Content-Type: multipart/related;
	boundary="----=_NextPart_000_002F_01C28973.B386B770"
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106
Return-Path: lesanctuaire@earthlink.net
X-OriginalArrivalTime: 11 Nov 2002 19:22:27.0328 (UTC)
FILETIME=[ABBFA800:01C289B7]

------=_NextPart_000_002F_01C28973.B386B770
Content-Type: multipart/alternative;
	boundary="----=_NextPart_001_0030_01C28973.B386B770"

------=_NextPart_001_0030_01C28973.B386B770
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 8bit

------=_NextPart_001_0030_01C28973.B386B770
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable


------=_NextPart_001_0030_01C28973.B386B770--
------=_NextPart_000_002F_01C28973.B386B770
Content-Type: image/jpeg;
	name="image001.jpg"
Content-Transfer-Encoding: base64
Content-ID: <image001.jpg@01C28973.B30C7E60>


------=_NextPart_000_002F_01C28973.B386B770--
</example>

As you can see, it's showing everything except the contents of the MIME
parts. I don't think there's any suggestion that these are _just_ the
SMTP headers, but the Outlook plugin is treating them as such. Maybe the
Outlook plugin should trim the non-SMTP parts from these 'headers'
before passing them to the classifier?
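
To make the suggestion concrete, here is a rough sketch of what such
trimming could look like. It is not the plugin's actual code: the function
name is made up, and it assumes the text handed over looks like the example
above, i.e. an optional "Microsoft Mail Internet Headers" banner line, then
the top-level headers, then a blank line followed by the MIME boundaries and
part headers. It keeps only what comes before the first blank line:

```python
def trim_to_top_level_headers(header_blob):
    """Keep only the top-level header block of Outlook's header text.

    Hypothetical helper: assumes the first blank line ends the real
    headers and everything after it is MIME boundaries/part headers.
    """
    # Normalize line endings so blank-line detection is uniform.
    lines = header_blob.replace("\r\n", "\n").split("\n")

    # Drop the banner line shown by the 'internet headers' UI, if present.
    # (Assumption: it always starts with this exact text.)
    if lines and lines[0].startswith("Microsoft Mail Internet Headers"):
        lines = lines[1:]

    # Skip any blank lines before the first real header.
    while lines and not lines[0].strip():
        lines = lines[1:]

    kept = []
    for line in lines:
        if not line.strip():
            break  # first blank line: end of the top-level headers
        kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""
```

Folded continuation lines (such as the boundary= line under Content-Type)
survive because they are not blank. Whether the classifier should then see
the top-level Content-Type at all is a separate question.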


> > but that outlook just ignores the invalid MIME-part headers
> 
> By this point Outlook isn't looking *at all* at the part 
> that's damaged (and probably by it).  It's just sucking out 
> the PR_BODY_HTML property from the msg and rendering it, and 
> the value of that property contains no MIME armor at all, 
> just HTML stuff.
> 
> > -- maybe spambayes can do the same.
> 
> I keep telling people never to call 
> email.message_from_string() directly, but they don't listen 
> <wink>.  The tokenizer's way of getting an email message from 
> a string would have at least recovered the message body in 
> this case, but would have lost the headers entirely (they're 
> crap -- what can you do?).
> 
> > The problem is multiplied by the fact that outlook includes the
> > MIME-part headers and boundaries with the regular headers,
> 
> The Outlook client actually deletes those from the headers, because:
> 
> > but separates the body parts and attachments. I don't think there's 
> > any way to get the original, unseparated message from the API.
> 
> That's right, there isn't.  Outlook's basic structure appears 
> to predate MIME catching on, and the MIME support very much 
> appears hacked in after it was too late for a change in 
> worldview.  It's a mess that way, if you want to (as we do) 
> get MIME back out.  The Outlook client right now "loses" all 
> attachments, and even loses the msg body if the msg has been 
> digitally signed (because it turns out Outlook does Yet 
> Another Entirely Different Thing for signed msgs, leaving the 
> two "normal" body properties empty and stuffing the body 
> *plus* the signature into Yet Another property).

Yeah, it's a mess, but I don't think that the classifier should assume
that the message has SMTP headers at all, since many other MTAs exist
(Exchange, Notes, etc.). Outlook wasn't designed with MIME in mind,
since Exchange doesn't use MIME.

> > The Outlook UI shows the headers as:
> 
> By this do you mean View -> Options -> Internet headers?

Yup.

<snip/>

