[Spambayes] Latest spammer trick stymied

Tue Apr 1 10:24:36 EST 2003

[Tim S]
> That's right.  We really should try to solve this problem with
> tokenization.

I'm not sure how many tricks we can pull with tokenization - in the sample
mail, there simply aren't enough tokens in the message.  I see lots of
these, and another trick they use is simple misspellings of words that
would otherwise be clues - eg "fatherr and daugter".  Like Richard, I
assume these are designed to provide minimal clues.

The problem seems to simply be the "unsure" nature of these messages.  As
Richard says, a trivial URL message in ham will *generally* have enough good
clues to push it over the edge.  It sounds like we are asking for a tweaking
of the math and/or configuration options to push unsure messages towards
"spam" - ie, an "in the absence of any clues, assume spam" policy rather
than the current "assume unsure".

The only problem I see with this is that, by definition, unsure messages do
not have enough clues.  A distinction seems to be that in one case we have
lots of unsure clues, whereas in this case we have very few unsure clues.  I'm
not sure we want a token for the length of the message - the number of clues
is the issue.
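
To make the "number of clues is the issue" idea concrete, here is a minimal
sketch of such a policy.  Everything in it is an assumption for illustration:
the function name, the cutoffs, the strength threshold and the minimum clue
count are made up here, and this is not how the classifier currently
combines probabilities.  It only shows a message that lands in the unsure
range being pushed to "spam" when it has too few strong clues.

```python
def classify(score, clue_probs, ham_cutoff=0.20, spam_cutoff=0.90,
             min_strength=0.1, min_clues=5):
    """Return 'ham', 'spam' or 'unsure' for a scored message.

    score      -- combined spam score in [0, 1]
    clue_probs -- per-token spam probabilities used to compute the score
    All thresholds are illustrative assumptions, not project settings.
    """
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"

    # The message is in the unsure range.  Count only the clues that
    # lean noticeably one way or the other.
    strong = [p for p in clue_probs if abs(p - 0.5) >= min_strength]

    # Very few strong clues: "in the absence of any clues, assume spam".
    if len(strong) < min_clues:
        return "spam"

    # Plenty of clues that simply disagree: leave it as unsure.
    return "unsure"
```

The point of counting clues rather than message length is that a long
message made of unknown words is just as clue-starved as a short one.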

[Alex]
> Spammers might be simple folk, but serious crackers (not the script
> kiddies) certainly are not.  If there comes to be a widely deployed

As Richard says, this may be a stretch.  Such a DoS attack would require
sending a crafted spam to each of these addresses known to run such a filter
(or a blind spam hoping to hit them).  This spam would cause a single hit on
the web server.  Re-sending the same spam would not re-fetch the URL, as now
we have spam clues, and can score the message without the URL fetch (this is
assuming we auto train after the first fetch).  We would obviously only
fetch HTML text from the server.
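
The "single hit" argument can be sketched roughly as below.  This is a
hypothetical outline, not existing code: make_filter, fetch_html and
score_text are names invented here, the set of message bodies is a crude
stand-in for auto-training, and the caller is assumed to supply a fetcher
that retrieves HTML text only.

```python
def make_filter(fetch_html, score_text):
    """Build a handler that fetches a URL only for unsure messages.

    fetch_html -- hypothetical callable: url -> HTML text of the page
    score_text -- hypothetical callable: text -> 'ham', 'spam' or 'unsure'
    Returns (handle, fetches); fetches records every URL actually fetched.
    """
    trained_spam = set()   # stand-in for auto-training after a fetch
    fetches = []

    def handle(body, url, verdict):
        # A re-sent copy already carries spam clues, so no fetch is needed.
        if body in trained_spam:
            return "spam"
        # Messages the classifier is confident about are never fetched.
        if verdict != "unsure":
            return verdict
        # Unsure: fetch the page once and score its text instead.
        fetches.append(url)
        page_verdict = score_text(fetch_html(url))
        if page_verdict == "spam":
            trained_spam.add(body)
        return page_verdict

    return handle, fetches
```

With this shape, sending the same crafted message twice costs the target
web server one request, not two.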

Could you not do the same thing today, by sending out an HTML email
referencing some images from the server you want to attack?  Given the
number of mail clients out there that will fetch these images (using their
mailer's default settings), I would expect this to remain a far more
effective attack than the one you propose.

[Tim S again]
> EXCELLENT point, Alex.  Case closed.

I'm not sure who you are speaking for here <wink>.  But yeah, fetching the
URL does seem the wrong long-term approach.  I'm very impressed with the
creativity of the idea though - I see lots of these spams and did wonder WTF
we could do about it.

Mark.