FW: [spambayes-dev] Results for DNS lookup in tokenizer
Phillip J. Eby
pje at telecommunity.com
Sun Apr 11 12:55:05 EDT 2004
At 08:25 AM 4/11/04 -0400, spambayes-dev-request at python.org wrote:
>I'll restate my question. What does Matt's proposal do that
>mine_received_headers doesn't do already?
It looks at URLs embedded in the message *body*. As a simple contrast, if
I link here to:
http://enlarge-my-spam.com?id=123456
That will produce a very *different* set of IP tokens than the Received:
headers of this message. And, if the same spam is sent from a thousand
compromised PCs, they will all still have the same URL IP cues, despite
lacking any Received: headers in common. Yes, they'll also have tokens
representing parts of the domain name, but spammers can cheaply change
their domain names to avoid being recognized.
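For illustration only, here is a minimal sketch of the idea: pull hostnames out of URLs in the body, resolve them, and emit one token per resolved address. It assumes plain-text bodies and IPv4 only. The names `url_ip_tokens` and `default_resolver` and the `url-ip:` token prefix are hypothetical, not SpamBayes's actual tokenizer API. The resolver is passed in as a parameter so the DNS lookup can be cached, rate-limited, or stubbed out.

```python
import re
import socket
from typing import Callable, Iterator, List

# Rough pattern for http/https URLs in plain-text bodies (assumption: a real
# tokenizer would also have to handle HTML, encodings and obfuscated links).
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)

def default_resolver(host: str) -> List[str]:
    """Resolve host to its IPv4 addresses via the system resolver."""
    try:
        _, _, addrs = socket.gethostbyname_ex(host)
    except (OSError, UnicodeError):
        # Unresolvable or malformed names simply contribute no tokens.
        return []
    return addrs

def url_ip_tokens(body: str,
                  resolver: Callable[[str], List[str]] = default_resolver
                  ) -> Iterator[str]:
    """Yield a hypothetical 'url-ip:' token for each address a body URL resolves to."""
    seen = set()
    for match in URL_RE.finditer(body):
        host = match.group(1).lower().rstrip(".")
        if not host or host in seen:
            continue  # skip empty names and hosts already looked up
        seen.add(host)
        for ip in resolver(host):
            yield "url-ip:" + ip
```

Because the tokens depend on where the link points rather than on the relay path, a thousand copies sent from different machines would still yield the same `url-ip:` tokens.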
Their website IP addresses are not only harder to change; tokens derived
from them also exploit the fact that so-called "bulletproof hosting"
providers are a "bad neighborhood" for links. So, if you train on these tokens, then you
could potentially nail entirely unrelated spammers who simply host with the
same ISP.
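One hypothetical way to capture the "bad neighborhood" effect is to emit tokens for each dotted prefix of an address as well as the full address, so that two unrelated spam sites hosted in the same block share the prefix tokens. This is a sketch under assumptions: the function name, the `url-ip` prefix, and the choice of octet boundaries (/24, /16, /8) are illustrative rather than anything SpamBayes is known to do for URLs.

```python
import ipaddress
from typing import List

def ip_neighborhood_tokens(ip: str, prefix: str = "url-ip") -> List[str]:
    """Return tokens for an IPv4 address plus its /24, /16 and /8 dotted prefixes."""
    try:
        addr = ipaddress.IPv4Address(ip)
    except ValueError:
        return []  # not a valid IPv4 address: no tokens
    octets = str(addr).split(".")
    # Full address first, then progressively wider "neighborhoods".
    return [prefix + ":" + ".".join(octets[:n]) for n in range(4, 0, -1)]
```

For example, `192.0.2.10` and `192.0.2.77` would share `url-ip:192.0.2`, `url-ip:192.0` and `url-ip:192`. After training, those shared tokens could carry a spammy score for a site the filter has never seen before.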
Of course, the spammers' next move would likely be to use redirects from
non-"bulletproof" hosts, but everything we can do to make it more difficult
and more costly for them is a good thing.