[Spambayes] how spambayes handles image-only spams
wsy at merl.com
Sun Sep 7 21:12:55 EDT 2003
From: "Robert K. Coe" <bob at 1776.com>
> I wonder if you may be overlooking something that could skew your
> statistics. My experience has been that when I create an HTML
> message, Outlook actually sends it as a multi-part MIME construct
> incorporating both HTML and plain-text forms of the message. If the
> recipient reads the message with an HTML-capable email reader,
> he'll see the HTML form of the message; otherwise he'll see the
> plain-text form. If you're collecting your statistics with a
> plain-text mail reader, or if you're looking only at the plain-text
> version in a multi-part message, you may be understating the actual
> use of HTML in messages sent to you.
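
One way to check the point above when collecting statistics is to look at every MIME part of the raw message rather than only the part a mail reader chooses to display. The following is a minimal sketch using Python's standard email package; the function names content_types and uses_html are hypothetical, and this is not taken from the Spambayes code.

```python
from email import message_from_string
from email.message import EmailMessage


def content_types(raw_message):
    """Return the content type of every non-container MIME part."""
    msg = message_from_string(raw_message)
    return [part.get_content_type()
            for part in msg.walk()
            if not part.is_multipart()]


def uses_html(raw_message):
    """True if any part of the message is text/html."""
    return "text/html" in content_types(raw_message)


# Build a multipart/alternative message of the kind described above:
# a plain-text form and an HTML form of the same body.
both = EmailMessage()
both["Subject"] = "demo"
both.set_content("plain body")
both.add_alternative("<html><body><p>html body</p></body></html>",
                     subtype="html")
raw_both = both.as_string()

# A message that carries only plain text.
plain = EmailMessage()
plain["Subject"] = "demo"
plain.set_content("plain body")
raw_plain = plain.as_string()

print(content_types(raw_both))
print(uses_html(raw_both), uses_html(raw_plain))
```

Counting a message as HTML whenever uses_html is true avoids the undercount that comes from looking only at the plain-text alternative.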
I read email with Emacs, and I get the _WHOLE_ text (headers and
everything, multipart mimes, all that) just as it was received on the
SMTP port. Additionally, I get a postprocessed section (not an
attachment) with all of the base64's expanded, <--interupptus-->
comments removed, etc.
That's what I feed back into the learning cycle; so I even get
things you probably don't get, like KOI-8 Russian text and text
that had been so sliced up by spammus interruptus that you can't
Do you do any reassembly, or is there any chance that you are
not getting the ASCII text if there is any?
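
The postprocessing mentioned above (expanding base64 and removing HTML comments that slice words apart) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's standard email and re modules, the name postprocess is hypothetical, and it is not the actual filter used by either correspondent.

```python
import re
from email import message_from_string
from email.message import EmailMessage

# Matches HTML comments, including ones that span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def postprocess(raw_message):
    """Decode every text part and strip HTML comments from it."""
    msg = message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and non-text parts
        # decode=True undoes base64 or quoted-printable transfer encoding
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # unknown charset label: fall back to a byte-preserving decode
            text = payload.decode("latin-1")
        chunks.append(HTML_COMMENT.sub("", text))
    return "\n".join(chunks)


# A base64-encoded HTML body with comments inserted inside the words.
spam = EmailMessage()
spam["Subject"] = "demo"
spam.set_content("<p>Fr<!--xyz-->ee mon<!-- q -->ey</p>",
                 subtype="html", cte="base64")
raw_spam = spam.as_string()

cleaned = postprocess(raw_spam)
print(cleaned)
```

After this step the words that were split by comments are contiguous again, so a tokenizer sees them as ordinary tokens.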
> In fact, if someone knows how to get Outlook to stop sending a
> plain-text version of HTML messages, I'd like to hear about it. Now
> that almost everybody can read HTML messages, I think the
> plain-text version is superfluous.
No, you have it the other way 'round.
The HTML version is superfluous, the plain text is all you need. :-)