[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8

Eric S. Raymond esr@thyrsus.com
Thu, 22 Aug 2002 18:46:56 -0400


Guido van Rossum <guido@python.org>:
> > And not necessary.  Base64 spam invariably has telltales that Bayesian
> > analysis will pick up in the headers and MIME cruft.  A rather large
> > percentage of it is either big5 or images.
> 
> I'd be curious to know if that will continue to be true in the future.
> At least one of my non-tech friends sends email that's exclusively
> HTML (even though the content is very lightly marked-up plain text),
> from a hotmail account.  Spam could easily have the same origin, but
> the HTML contents would be very different.

Well, consider.  If your friend were to send you base64 mail, it
probably would *not* come from one of the spamhaus addresses in
bogofilter's wordlists.

The presence of base64 content is neutral.  That means that about the only
way not decoding it could lead to a false positive is if the headers 
contained spam-correlated tokens which decoding the body would have 
countered with words having a higher non-spam loading.
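
To make that failure case concrete, here is a minimal sketch, assuming a
Graham-style naive Bayes combination of per-token spam probabilities.  The
token names and probability values are hypothetical illustrations, not
entries from bogofilter's or GBayes.py's actual wordlists, and combine() is
not claimed to be either tool's exact formula.

```python
from math import prod  # Python 3.8+

def combine(probs):
    """Combine per-token spam probabilities, naive-Bayes style."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

# Hypothetical per-token spam probabilities (illustrative values only).
header_tokens = {"received:spamhaus.example": 0.99, "x-mailer:bulk": 0.90}
body_tokens = {"python": 0.05, "patch": 0.05, "checkin": 0.10}

# An undecoded base64 body contributes only a neutral marker token.
BASE64_MARKER = 0.5

# Score with the body left undecoded: headers plus the neutral marker.
undecoded = combine(list(header_tokens.values()) + [BASE64_MARKER])

# Score with the body decoded: headers plus the non-spam body words.
decoded = combine(list(header_tokens.values()) + list(body_tokens.values()))

print(f"undecoded: {undecoded:.3f}")
print(f"decoded:   {decoded:.3f}")
```

With these made-up numbers the undecoded message scores as spam on the
headers alone, while the decoded body words pull the score back below 0.5.
A neutral token (0.5) leaves the combined score unchanged, so the risk
exists only when the headers are already spam-correlated.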
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr/>