[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8

Guido van Rossum guido@python.org
Fri, 23 Aug 2002 09:28:52 -0400


> Well, consider.  If your friend were to send you base64 mail, it
> probably would *not* come from one of the spamhaus addresses in
> bogofilter's wordlists.

Yeah, but not every spammer sends from a well-known spammer's address.

> The presence of base64 content is neutral.  That means that about the only
> way not decoding it could lead to a false positive is if the headers 
> contained spam-correlated tokens which decoding the body would have 
> countered with words having a higher non-spam loading.

Graham mentions the possibility that spammers can develop ways to make
their headers look neutral.  When I receive a base64-encoded HTML
message from Korea whose subject is "Hi", it could be from a Korean
Python hacker (there were 700 of those at a conference Christian
Tismer attended in Korea last year, so this is a realistic example),
or it could be Korean spam.  Decoding the base64 would make it
obvious.  The headers usually give some clues, but judging by what
makes it through SpamAssassin (which we've been running on all
python.org mail since February or so), base64-encoded messages rank
high among its false negatives.
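
To make "decoding the base64" concrete, here is a minimal sketch of
decoding text parts before tokenizing them.  It is not the code in
GBayes.py; the function name body_tokens, the token regex, and the
latin-1 fallback are assumptions made up for illustration.  It relies
only on the standard email package, whose get_payload(decode=True)
undoes the Content-Transfer-Encoding (base64 or quoted-printable).

```python
import base64
import email
import re

# Hypothetical token pattern, chosen only for this illustration.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def body_tokens(raw_message):
    """Return lowercase tokens from the decoded text parts of a message."""
    msg = email.message_from_string(raw_message)
    tokens = []
    for part in msg.walk():
        # Skip multipart containers and non-text attachments.
        if part.get_content_maintype() != "text":
            continue
        # decode=True reverses base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "latin-1"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to latin-1 (an assumption).
            text = payload.decode("latin-1")
        tokens.extend(TOKEN_RE.findall(text.lower()))
    return tokens


# A made-up message with neutral headers and a base64-encoded HTML body.
sample_raw = (
    "Subject: Hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(b"<html>Buy cheap pills now</html>").decode("ascii")
    + "\n"
)
sample_tokens = body_tokens(sample_raw)
```

Without the decode step a tokenizer would see only the opaque base64
text; with it, the body's words are available to be scored alongside
the headers.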

--Guido van Rossum (home page: http://www.python.org/~guido/)