[spambayes-dev] Proposed fourth X-Spambayes-Classification
tim.one at comcast.net
Sat May 31 00:06:49 EDT 2003
[T. Alexander Popiel]
> Personally, I'd like to see us have a simpler parser which just
> understood headers vs. body, and didn't try to decode the individual
> headers (for charset, or anything like that). Ideally, we'd give
> this simple parser the message (as a string) and a list of headers
> to remove from the message, and it would return a modified message
> (again as a string). We could use this simpler parser both for
> blowing away the MIME headers (as alluded to above for dealing with
> malformed messages) and for annotating the message with the
> classification results (blow away all the classification headers,
> then prepend the new ones (properly formatted) to the message).
> Of course, that would take about two hours of work, and I'm lucky
> to get two consecutive minutes right now...
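
For reference, the header-stripping parser proposed above could look roughly like the sketch below. It is only an illustration: the names strip_headers and annotate are made up here, and it assumes headers end at the first blank line and that folded continuation lines start with a space or tab. It never decodes anything.

```python
def strip_headers(message, names):
    """Return message (a str) with every header listed in names removed."""
    drop = {name.lower() for name in names}
    lines = message.splitlines(keepends=True)
    out = []
    skipping = False
    for i, line in enumerate(lines):
        if line in ("\n", "\r\n"):      # first blank line ends the headers
            break
        if line[0] in " \t":            # folded continuation of previous header
            if not skipping:
                out.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        skipping = name in drop
        if not skipping:
            out.append(line)
    else:
        i = len(lines)                  # no blank line: the text was all headers
    out.extend(lines[i:])               # blank line and body pass through untouched
    return "".join(out)


def annotate(message, names, new_headers):
    """Drop the old headers in names, then prepend new (name, value) pairs."""
    prefix = "".join(f"{name}: {value}\n" for name, value in new_headers)
    return prefix + strip_headers(message, names)
```

Calling annotate(msg, ["X-Spambayes-Classification"], [("X-Spambayes-Classification", "ham")]) would replace any existing classification header with a fresh one at the top of the message.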
I don't expect this would help. Decoding base64 and quoted-printable is
important, but base64 only when it's a text section, and identifying
that requires decoding the MIME structure too. Decoding
charsets probably isn't important for *me*, because virtually all my ham is
in 7-bit ASCII English, but for non-English users I can easily believe it's
vital. Etc -- the email package does a lot of stuff, and it's valuable.
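
To make the point concrete, here is a minimal sketch of what the email package does for a tokenizer: walk the MIME structure, undo base64 or quoted-printable on text sections only, and apply the declared charset. The function name text_parts is hypothetical, and the fallbacks (ASCII when no charset is declared, latin-1 when the charset is unknown) are assumptions made for the example, not what spambayes does.

```python
import email


def text_parts(raw):
    """Yield the decoded text of every text/* section in a raw message."""
    msg = email.message_from_string(raw)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue                    # skip containers, images, attachments
        # decode=True undoes the Content-Transfer-Encoding (base64 or
        # quoted-printable) and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:             # charset name Python does not know
            yield payload.decode("latin-1")
```

A base64 image section is never decoded here, because deciding that it is not text already required reading the MIME structure.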
As to fiddling damaged msgs to get them thru the parser, the next time just
try it. I've had easy success with this every time I've seen it pop up in
the Outlook client. Appending a newline is sometimes all it takes. In one
case, it required falling back to a different base64 decoder, because the
email pkg's decoder is too(!) forgiving.
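
The post does not say which decoder was used as the fallback. As one possible reading, a stricter decoder can be built on the standard base64 module, which rejects characters outside the base64 alphabet when validate=True is passed. The name strict_b64decode and the choice to return None on failure are assumptions for this sketch.

```python
import base64
import binascii


def strict_b64decode(text):
    """Decode base64 text, returning None if it contains invalid data."""
    # Line breaks are legal in a MIME body, so drop whitespace first.
    compact = "".join(text.split())
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error:
        return None
```

A caller could treat None as "this section is not really base64" and tokenize the raw text instead.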
The reason this crap keeps popping up has been covered before: we don't
have a chokepoint now for asking the email pkg to parse stuff, so
workarounds are spread around the codebase. Of course this won't get fixed
until someone who actually likes the email package makes time to make it fly.
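
A chokepoint of the kind described above might be sketched as follows: one function that every caller uses to hand text to the email package, so workarounds such as the appended newline live in a single place. The name parse_message is hypothetical. Current versions of the email package mostly record problems in each part's defects list rather than raising, so the sketch checks both.

```python
import email
from email.errors import MessageError


def parse_message(raw):
    """Hypothetical chokepoint: the only place that calls the email parser."""
    fallback = None
    # Try the text as given, then again with a newline appended.
    for candidate in (raw, raw + "\n"):
        try:
            msg = email.message_from_string(candidate)
        except MessageError:
            continue
        if not any(part.defects for part in msg.walk()):
            return msg                  # clean parse, use it
        if fallback is None:
            fallback = msg              # remember the first defective parse
    if fallback is None:
        raise ValueError("message could not be parsed")
    return fallback
```

Further workarounds would be added as extra candidates in the loop instead of being scattered through the codebase.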