[Spambayes] Exceptionally well-done identity-theft spam
skip at pobox.com
Mon Dec 29 17:04:29 EST 2003
>> Yeah, this is a stinker. I get them all the time. Interestingly
>> enough, your message scored 0.69 for me. It probably would have
>> scored as spam except it came from you. ;-)
Tim> Have you trained on any real msgs from PayPal as ham?
Nope, but I've gotten enough of these that I've trained on several as spam.
PayPal would never send you such a "come login to my world" message, so even
though mail you get from them is a bit spammy in content, I suspect it has
enough clues to distinguish it from the actual scams. PayPal doesn't send
me much mail at all. I'm not sure I've trained on any mail from them in my
>> This suggests some more possible things to try:
>> * URLs which have usernames in them
>> * URLs which refer to non-standard ports
>> * URLs with IP addresses instead of hostnames (in addition to
>> specific hosts or networks)
>> I haven't looked to see if any of these are already recognized, but
>> all three techniques seem to be prevalent or required by such scams.
Tim> The pieces of the URL get broken out and tagged as such (with a
Tim> "url:" prefix), but there's no semantic analysis.
I suspect we might want to start doing a little. Oddball ports and
usernames in URLs are rare beasts. I'll bet most URLs containing them
(especially those embedded in unsolicited emails) would be strong spam
clues.
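
To make the three ideas above concrete, here is a minimal sketch of what
such checks might look like. It is not taken from the Spambayes tokenizer:
the function name url_clue_tokens, the DEFAULT_PORTS table, and the token
strings are all hypothetical, chosen only to match the existing "url:"
prefix convention. It assumes Python 3's standard urllib.parse and
ipaddress modules.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed table of "standard" ports per scheme (illustrative, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def url_clue_tokens(url):
    """Return extra tokens describing suspicious URL structure.

    The token names are hypothetical; they are not ones Spambayes emits.
    """
    tokens = []
    parts = urlsplit(url)

    # 1. A username embedded in the URL, e.g. http://www.paypal.com@1.2.3.4/
    if parts.username is not None:
        tokens.append("url:has-username")

    # 2. An explicit port that differs from the scheme's usual one.
    try:
        port = parts.port
    except ValueError:
        # Port field is present but not a valid number; treat as absent.
        port = None
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        tokens.append("url:non-standard-port")

    # 3. A literal IP address where a hostname would normally appear.
    host = parts.hostname
    if host:
        try:
            ipaddress.ip_address(host)
        except ValueError:
            pass  # an ordinary hostname
        else:
            tokens.append("url:ip-address-host")

    return tokens
```

Tokens like these would simply be fed to the classifier alongside the
existing "url:" pieces, so training decides how much weight each one
deserves rather than any hard-coded rule.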
Tim> I don't think a statistical word analyzer (like ours) is going to
Tim> do much good against well-done identity-theft scam, and *some* of
Tim> those have been getting much better over the last year. This one
Tim> was also remarkable for its good spelling and grammar (still rare
Tim> in "the typical" scam of this sort).
I think the current state-of-the-art can be improved. I added a section to
the NEWTRICKS file.