[spambayes-dev] Clever avoidance technique

Sun Nov 30 17:09:43 EST 2003

[Greg Ward]
> Here's a nifty variation on the invisible-text-in-HTML tactic: make
> the invisible text vaguely relevant to the recipient of the spam.

Jeremy got exactly the same white-on-white text as you got below, about two
weeks ago, although the container spam had different content (albeit the
same thrust).  I think the CNRI connection is coincidence -- lots of spam
contains color-on-close-color decoy text, but you never notice it except on
those rare occasions it ends up being hammy to you.

> I just got one this morning that's immediately, obviously spam from
> these headers:

Why, are you married now, or you just don't get ham on Saturdays anymore
<wink>?

>   From: "Inconvenience O. Imprecision" <esteves at belice.com>
>   To: Gward <gward at python.net>
>   Subject: Gward, meet singles in your area
> U7n2QHvxKLmBOhTROl57D5Q7crCNQzbL
>   Date: Sat, 29 Nov 2003 16:42:22 -0500
>
> but if I look in the HTML body, I see this:
>
>   <p><font color=3d"#FFFFFF">The Defense Technical Information Center
>   (DTIC= =ae) is the central facility for the collection and
>   dissemination of scie= ntific and technical information for the
>   Department of Defense (DoD)=2e M= uch of this information is made
>   available by DTIC in the form of technica= l reports about
>   completed research, and research summaries of ongoing res= earch=2e
> u62Mb6TFJNptB0duTKrhqDiJDdBNRazm</font></p>
>
> which isn't terribly relevant to me...

Jeremy thought it was, as DTIC worked closely with CNRI, and even hosted a
symposium on CNRI's handle system (the topic of the next blurb below).

> but a little farther on (after the actual spam payload, encoded of
> course), we see this:
>
>   <p><font color=3d"#FFFFFF">The Handle System allows handles to be
>   both cr= eated and resolved in a distributed fashion (see the
>   diagram on this page= for an overview of the Handle System
>   architecture)=2e Both creation and = resolution can be accomplished
>   using dedicated clients, common clients su= ch as web browsers
>   using special extensions or plug-ins, or unextended cl= ients going
>   through various proxies=2e In all cases, communication with t= he
>   Handle System is carried out using the Handle System protocol which
>   ha= s a formal specification and some specific implementations, all
>   freely av= ailable from CNRI=2e The protocol has a corresponding
>   client library avai= lable in C and Java=2e The C client library
>   has been used by CNRI in the = creation of a handle-aware extension
>   to the Netscape and Microsoft web br= owsers=2e The Java client
>   library has been used to create an http-to-hand= [...]
>
> Interesting!  This would probably count as ham for any computer geek.
> However, the above blurb describes software produced by my former
> employer, and you can probably get to it with 3 or 4 clicks from my
> home page.

Ya, and Johnny Carrero's 1998 Folsom Fitness Extravaganza is only two clicks
from your home page, the McConnell Brain Imaging Centre only one.  If they
were targeting you specifically, they hit stuff relatively *hard* to find.

> And, knowing CNRI, the first blurb is probably vaguely
> related -- most of their money comes from the US
> military-industrial-entertainment complex, after all.
>
> This feels very much like it's targeted at Bayesian filters -- e.g., I
> suspect SpamAssassin pre-2.6 would have had a better chance at calling
> this one spam than Spambayes (which scored it 0.198, just barely ham
> for my thresholds).

Jeremy and Guido both got spam a while back with a sure way to beat
SpamBayes:  the spam was added to replies to mailing list postings of
theirs, with their original subject lines and the quoted text of their
original messages.  That trick is all but guaranteed to find lots of tokens
hammy to you, and seems a lot cheaper & simpler than crawling over web pages
looking for "related interests".  But after a couple of those, we never saw
that trick again.  It still costs more than spraying the same spam content
at every address you can find, and I expect the response rate from targeting
tech mailing-list posters was low enough to make it a net monetary loss.

It would be nice to "do something" about the color-on-close-color trick, but
I don't yet see it *working* often enough to be worth the expense and
bother.
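
For anyone who wants to experiment anyway, here's a minimal sketch of one
way to spot color-on-close-color text.  This is not SpamBayes code.  The
DecoyTextFinder class, the assumed white background, and the threshold of 32
are all made up for illustration, and it only looks at <font color="#RRGGBB">
tags in already-decoded HTML (no CSS, no named colors, no bgcolor tracking).

```python
import re
from html.parser import HTMLParser

HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")


def parse_hex_color(value):
    """Return (r, g, b) for a 'RRGGBB' or '#RRGGBB' string, else None."""
    match = HEX_COLOR.fullmatch(value.strip())
    if match is None:
        return None
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(a, b):
    """Largest per-channel difference between two RGB triples."""
    return max(abs(x - y) for x, y in zip(a, b))


class DecoyTextFinder(HTMLParser):
    """Hypothetical helper: split text into near-invisible and visible runs.

    Assumption: the page background is `background` (white by default), and
    a font color within `threshold` of it on every channel is "invisible".
    """

    def __init__(self, background=(255, 255, 255), threshold=32):
        super().__init__()
        self.background = background
        self.threshold = threshold
        self.stack = []    # one bool per open <font>: is its text hidden?
        self.decoy = []    # text rendered in a color close to the background
        self.visible = []  # everything else

    def handle_starttag(self, tag, attrs):
        if tag != "font":
            return
        color = parse_hex_color(dict(attrs).get("color") or "")
        if color is None:
            # No usable color attribute: inherit from the enclosing <font>.
            hidden = bool(self.stack and self.stack[-1])
        else:
            hidden = color_distance(color, self.background) <= self.threshold
        self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "font" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        hidden = bool(self.stack and self.stack[-1])
        (self.decoy if hidden else self.visible).append(text)


sample = (
    '<p><font color="#FFFFFF">The Handle System allows handles</font></p>'
    '<p><font color="#000000">meet singles in your area</font></p>'
)
finder = DecoyTextFinder()
finder.feed(sample)
finder.close()
print("decoy:  ", finder.decoy)
print("visible:", finder.visible)
```

What to do with the decoy text once it's found (drop it, or tag its tokens
so the classifier can learn that hidden text is itself a clue) is a separate
question, and whether any of it is worth the expense is the one raised
above.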