[spambayes-dev] Another trick?
Tim Peters
tim.one at comcast.net
Mon Oct 27 20:23:28 EST 2003
[Skip]
> Got a message (attached including debug evidence) today which scored
> 0.15 for me. It consisted of a multipart MIME message (text/plain
> followed by text/html). The text/plain stuff was innocuous.
It also used the white-on-white trick.
> The text/html part had a small amount of text and a doubly-encoded URL.
> It started out like:
>
> http://uh%65rn%61ndez%38@butin%66a%63t.co%6D/%63/i%72%6F%6E.%68t%6Dl?roundup=3D5-KgE
>
> After replacing the numeric entities
Note that spambayes already replaces numeric entities.
> I got:
>
> http://uh%65rn%61ndez%38@butin%66a%63t.co%6D/%63/i%72%6F%6E.%68t%6Dl?roundup=3D5-KgE
That's one of the reasons it scored low for you: "url:roundup" is a strong
ham token for you, thanks to Ka-Ping Yee <wink>.
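For readers unfamiliar with the first decoding layer, here is a minimal sketch of numeric character entity replacement using the standard library of current Python. It is not the spambayes code. The archived message no longer shows which characters were entity-encoded, so the input fragment is made up and assumes the percent sign was hidden as an entity:

```python
import html

# Made-up fragment (an assumption, not taken from the message):
# the "%" is written once as a decimal entity and once as a hex entity.
decimal_form = "co&#37;6D"
hex_form = "co&#x25;6D"

# html.unescape replaces numeric (and named) character references.
print(html.unescape(decimal_form))
print(html.unescape(hex_form))
```

Both forms leave a percent-escape behind, so a second decoding step is still needed to reveal the plain text.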
> After replacing the HTML encoded characters, I was left with:
>
> http://uhernandez8@butinfact.com/c/iron.html?roundup=3D5-KgE
>
> As you might imagine, this did a fairly good job of obscuring clues
> in the URL. It might make sense to do a reasonable amount of
> decoding of HTML before splitting into tokens.
Hard to say -- this one might have gone better for you if spambayes *didn't*
decode numeric character entities. Then you wouldn't have seen
"url:roundup". The only stronger ham token for you was "scotland", which
was hiding in the white-on-white text. OTOH, the strongest spam clue for
you was
'url:co%6d': 0.95
which shows that your spambayes had already learned that trying to disguise
".com" this way is strongly spammy.
Mixed bag!
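To make the trade-off concrete, here is a rough sketch of URL tokenizing with and without percent-decoding. It is only an approximation for illustration, not the actual spambayes tokenizer: the function name url_tokens, the separator set, and the scheme stripping are all assumptions.

```python
import re
from urllib.parse import unquote

# Hypothetical separator set; a real tokenizer may split differently.
_SEP = re.compile(r"[/;?:@&=+,$.]")

def url_tokens(url, decode=False):
    """Return 'url:'-prefixed tokens, optionally percent-decoding first."""
    text = url.lower()
    if decode:
        text = unquote(text)  # turns %6d into m, %65 into e, and so on
    # Drop the scheme so it does not show up as a token.
    text = re.sub(r"^[a-z]+://", "", text)
    return ["url:" + piece for piece in _SEP.split(text) if piece]

URL = ("http://uh%65rn%61ndez%38@butin%66a%63t.co%6D"
       "/%63/i%72%6F%6E.%68t%6Dl?roundup=3D5-KgE")

raw = url_tokens(URL)                   # keeps the disguise as a clue
decoded = url_tokens(URL, decode=True)  # reveals the plain words

print(raw)
print(decoded)
```

Without decoding, the token list contains 'url:co%6d', the strong spam clue noted above. With decoding, that clue turns into a plain 'url:com', while 'url:roundup' survives either way. One option, under these assumptions, is to generate both sets of tokens so the disguise itself stays available as evidence.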