[Spambayes] Exceptionally well-done identity-theft spam

Tim Peters tim.one at comcast.net
Mon Dec 29 19:34:59 EST 2003


[Avi Jacobson]
> I wonder whether this is not the face of things to come --
> reliable-looking links to reliable-looking websites, where the HREF
> actually points elsewhere.

For identity-theft scam spam, almost certainly -- they have to trick you
into revealing personal info you wouldn't normally pass out.  But if what
you got after clicking on the link was, e.g., an offer to cut your mortgage
rate, or to enlarge part of your anatomy, I expect the response rate would
be too low to repay the costs.  After all, the initial sales msg flat-out
lied to you then, and the percentage of people eager to get fleeced a second
time has got to approach 0.

> Note in the source code that the incriminating part of the URL in
> the HREF (and in the browser window that opens) is coded in Hex
> values rather than characters.
>
> My guess is that if you dump enough of these messages into your Junk
> folder, Spambayes will be smart enough to identify this kind of URL
> as a high-probability token. Spambayes developers, am I right? Will
> too many % signs in a URL raise the spam probability?

Not now, unless you save away an enormous number of these things.  We break
URLs into pieces based on the official separator characters now, but that's
it.  The specific scam in question generated these distinct URL-related
tokens:

'proto:http'
'proto:https'
'url:'
'url:%31%36%32'
'url:%32%31%31'
'url:%36%33'
'url:%37%33%30%31'
'url:%39%33'
'url:%68%74%6d'
'url:%70%61%79%70%61%6c'
'url:_login-run'
'url:cgi-bin'
'url:cmd'
'url:com'
'url:com%65%6b%6a%68%61%73%6b%6a%71%70%77%6f%70%77%6f'
'url:config'
'url:dot_row'
'url:email_logo'
'url:gif'
'url:images'
'url:login'
'url:mail'
'url:paypal'
'url:pixel'
'url:webscr'
'url:www'
'url:yahoo'
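
As a rough illustration of how tokens like these arise, here is a minimal Python sketch of splitting a URL at separator characters. It is not the actual Spambayes tokenizer: the function name url_tokens, the exact separator set, and the sample URL are all assumptions, chosen only so that the output resembles the list above.

```python
import re

# Assumed separator set (not taken from the Spambayes source). Note that
# '%', '-' and '_' are NOT separators, so escaped runs stay in one piece.
URL_SEP_RE = re.compile(r"[;?:@&=+,$./]")

def url_tokens(url):
    """Return 'proto:' and 'url:' tokens for one URL (illustrative only)."""
    proto, sep, rest = url.partition("://")
    tokens = []
    if sep:
        tokens.append("proto:" + proto.lower())
    else:
        # No scheme present: treat the whole string as the URL body.
        rest = url
    for piece in URL_SEP_RE.split(rest):
        # Empty pieces are kept, which is how a bare 'url:' token can appear.
        tokens.append("url:" + piece)
    return tokens

# Hypothetical URL assembled from pieces seen in the token list.
print(url_tokens("http://www.paypal.com/cgi-bin/webscr?cmd=_login-run"))
```

With this separator set, the sample URL yields 'proto:http', 'url:www', 'url:paypal', 'url:com', 'url:cgi-bin', 'url:webscr', 'url:cmd' and 'url:_login-run'. A hex-escaped run contains no separator, so it survives as a single opaque token.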

It would *probably* work well to, in addition, generate

'url:%nn'

for each instance of a % escape.  That needs testing, though, as "pure wins"
are almost non-existent in this game, and for all I know some church in Iowa
generates HTML parish newsletters in which all URLs are encoded just because
someone didn't understand an option in their HTML-generating software.
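
The proposed extra token could be sketched in Python as follows. This is a hypothetical illustration under stated assumptions, not tested Spambayes code: the name escape_tokens is invented here, and the sketch reads "for each instance" literally, emitting the fixed string 'url:%nn' once per escape. The standard library's urllib.parse.unquote is used only to show what the escapes hide.

```python
import re
from urllib.parse import unquote

# A % escape is '%' followed by exactly two hex digits.
ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}")

def escape_tokens(url):
    """Return one literal 'url:%nn' token per % escape in url (a sketch)."""
    return ["url:%nn"] * len(ESCAPE_RE.findall(url))

# One of the escaped pieces from the token list above.
piece = "%70%61%79%70%61%6c"
print(escape_tokens(piece))   # six copies of 'url:%nn'
print(unquote(piece))         # paypal
```

If a classifier counts each distinct token only once per message, the repeats collapse into a single 'url:%nn' clue, so the token would then signal only that at least one escape is present, not how many.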



