[spambayes-dev] Tricky false positive: US states

Tim Peters tim.one at comcast.net
Fri Oct 3 22:37:34 EDT 2003

[T. Alexander Popiel]
> Heh.  Along similar lines, the thing that I'd find most useful at the
> moment is marking as spam any message that is multipart/alternative,
> and where 80% or more of the words from the plaintext portion do not
> appear in the HTML version.  (As a less draconian/better version of
> this, under the same circumstances, ignore the plaintext part entirely
> for both scoring and training.)

It would be quite spambayesian to ignore the plaintext portion entirely
regardless of how much overlap there is with the HTML version -- we try to
score what the end user sees, and unless the HTML or client is broken in
this case, they're only gonna see the HTML part.
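
For concreteness, here is a minimal sketch of the overlap check being
discussed, using only the Python standard library. It is not SpamBayes
code: the function names, the word-splitting regex, and the 0.8 default
threshold are assumptions made for illustration, and a real tokenizer
would handle charsets, nested multiparts, and HTML entities with more care.

```python
import re
from email import message_from_string
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def _words(text):
    # Hypothetical, deliberately crude word splitter.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def _part_text(part):
    # Decode the part's payload, falling back to ASCII if no charset is given.
    payload = part.get_payload(decode=True) or b""
    return payload.decode(part.get_content_charset() or "ascii", errors="replace")


def plaintext_mismatch(msg):
    """Fraction of distinct text/plain words that are absent from text/html.

    Returns None if the message is not multipart/alternative, lacks either
    part, or has no words in its plaintext part.
    """
    if msg.get_content_type() != "multipart/alternative":
        return None
    plain = html = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = _part_text(part)
        elif ctype == "text/html" and html is None:
            html = _part_text(part)
    if plain is None or html is None:
        return None
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    plain_words = _words(plain)
    if not plain_words:
        return None
    html_words = _words(" ".join(extractor.chunks))
    return len(plain_words - html_words) / len(plain_words)


def should_ignore_plaintext(raw_message, threshold=0.8):
    """True if at least `threshold` of the plaintext words are missing from the HTML."""
    ratio = plaintext_mismatch(message_from_string(raw_message))
    return ratio is not None and ratio >= threshold
```

The stricter variant, ignoring text/plain whenever a text/html alternative
exists, amounts to dropping the threshold test and tokenizing only the
HTML part.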

OTOH, I got an identity-theft scam spam yesterday, pretending to be from
eBay, that *would* have scored as ham if it weren't for this bizarre piece
of text/plain:

     sure, we are the rusian scamers and this  for idiots only...
     if you read this ... sure you are not idiot... anyway sorry for

So while I'm in favor of the change "on principle", I'm wary enough to want
to test it first.

