[spambayes-dev] Missed spam - Spam Clues: bechtel

Tim Peters tim.one at comcast.net
Sat Aug 2 00:56:41 EDT 2003

[Mark Hammond]
> OK, it gets weirder!!  If I save the copy I received via the
> SpamBayes list, I do indeed get the exact same results as you.
> However, re-investigating the original still shows the same clues I
> posted.
> ...
> The mysteries of Outlook get stranger and stranger.  I wonder if it
> was "invalid" HTML that someone on the way fixed?

Then I have to guess this:  the stream spambayes tokenizes is *our* attempt
to reconstruct the original msg from Outlook's funky many-headed message
store.  But the message attached to a "Spam Clues" email is *Outlook's*
attempt to reconstruct the original.  I bet the *original* original
<wink/sigh/snarl> had something funny going on in its structure, and that
Outlook repaired that when it reconstructed the message.

But what?  URL extraction occurs very early in the tokenizer, and in
particular long before we try to throw out HTML.  We'd have to look at the
actual bytestream spambayes synthesized (on your end) to figure this out.  A
curious possibility is that this spam was constructed to fool spambayes.
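
To make the ordering point concrete, here is a minimal sketch of "URL
extraction first, HTML removal second".  It is not the spambayes tokenizer:
the regexes, the function name sketch_tokenize, and the "proto:"/"url:" token
prefixes are assumptions made up for illustration.  It only shows why a URL
buried inside a tag can still produce clues, provided the tag survives into
the stream being tokenized.

```python
import re

# Hypothetical patterns for illustration; not the real spambayes regexes.
URL_RE = re.compile(r"(https?|ftp)://([^\s<>\"']+)", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]*>")


def sketch_tokenize(text):
    """Return URL clues first, then word tokens from the tag-stripped text."""
    tokens = []

    # Step 1: pull URLs out of the raw text, HTML and all.
    for scheme, rest in URL_RE.findall(text):
        tokens.append("proto:" + scheme.lower())
        host = rest.split("/", 1)[0]
        for piece in host.lower().split("."):
            tokens.append("url:" + piece)

    # Step 2: only now discard the HTML tags and split what is left.
    stripped = TAG_RE.sub(" ", text)
    tokens.extend(word.lower() for word in stripped.split())
    return tokens


print(sketch_tokenize('<a href="http://www.bechtel.example/x">click</a>'))
```

In this sketch, if whatever rebuilt the message changed the markup around
the URL, step 1 would see different input and the URL clues would differ,
even though the visible text after step 2 looked the same.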

