[Spambayes] Cunning use of quoted-printable
Greg Ward
gward@python.net
Tue, 1 Oct 2002 11:41:24 -0400
On 01 October 2002, Richie Hindle said:
[... message with lots of quoted-printable in it ...]
> Looks like an attempt to fox systems like spambayes. It doesn't make much
> difference, because the tokenizer decodes the quoted-printable, but it
> could trigger a clue token.
SpamAssassin has a test for this -- MIME_EXCESSIVE_QP:
    rawbody   MIME_EXCESSIVE_QP   eval:check_for_mime_excessive_qp()
    describe  MIME_EXCESSIVE_QP   Excessive quoted-printable encoding in body
    score     MIME_EXCESSIVE_QP   2.070
The implementation is pretty simple:
    sub check_for_mime_excessive_qp {
      my ($self) = @_;

      # Note: We don't use rawbody because it removes MIME parts.  Instead,
      # we get the raw unfiltered body.  We must not change any lines.
      my $body = join('', @{$self->{msg}->get_body()});

      my $length = length($body);
      my $qp = $body =~ s/\=([0-9A-Fa-f]{2,2})/$1/g;

      # this seems like a decent cutoff
      return ($length != 0 && ($qp > ($length / 20)));
    }
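
For anyone more at home in Python, the same heuristic might be sketched
roughly as below. This is only an illustration: the function name
excessive_qp and the ratio parameter are made up here and are not part of
SpamAssassin or spambayes. The 1-in-20 cutoff is taken from the Perl above.

```python
import re

# Matches a single quoted-printable escape such as "=20" or "=3D".
QP_ESCAPE = re.compile(r"=[0-9A-Fa-f]{2}")


def excessive_qp(raw_body, ratio=20):
    """Return True if the body has more than one QP escape per `ratio` characters.

    Hypothetical helper mirroring the Perl check: count the "=XX" escapes
    in the raw, undecoded body and compare against length / ratio.
    """
    length = len(raw_body)
    if length == 0:
        return False
    qp_count = len(QP_ESCAPE.findall(raw_body))
    return qp_count > length / ratio
```

Like the Perl version, this has to be run on the raw body before any
quoted-printable decoding, or there will be nothing left to count.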
(Hey, now that Matt Sergeant is on the list, I can stop being the local
SpamAssassin expert! *phew*!)
I guess there are a couple of ways to translate this to a
stream-of-tokens approach:
  * do a tokenizing pass over the raw message body, and spit out
    a whole lot of "=20" tokens
  * examine the raw body in a non-tokenizing way, and just emit
    a "lots of quoted-printable" token
  * ...?
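
The first two options could look something like the sketch below. The
function names and the "qp:excessive" token spelling are invented for
illustration and are not taken from the spambayes tokenizer; the 1-in-20
threshold is simply borrowed from the SpamAssassin test.

```python
import re

# Matches a single quoted-printable escape such as "=20" or "=3D".
QP_ESCAPE = re.compile(r"=[0-9A-Fa-f]{2}")


def qp_escape_tokens(raw_body):
    """Option 1: yield one token per escape found in the raw body."""
    for match in QP_ESCAPE.finditer(raw_body):
        yield match.group(0)


def qp_summary_tokens(raw_body, ratio=20):
    """Option 2: yield a single clue token when escapes are dense.

    The token name "qp:excessive" is hypothetical.
    """
    length = len(raw_body)
    if length and len(QP_ESCAPE.findall(raw_body)) > length / ratio:
        yield "qp:excessive"
```

The first floods the token stream with "=20"-style tokens; the second adds
at most one token per message, so it stays a single clue rather than
crowding out everything else.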
Greg
--
Greg Ward <gward@python.net> http://www.gerg.ca/
Did YOU find a DIGITAL WATCH in YOUR box of VELVEETA?