[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
Eric S. Raymond
esr@thyrsus.com
Thu, 22 Aug 2002 18:46:56 -0400
Guido van Rossum <guido@python.org>:
> > And not necessary. Base64 spam invariably has telltales that Bayesian
> > analysis will pick up in the headers and MIME cruft. A rather large
> > percentage of it is either big5 or images.
>
> I'd be curious to know if that will continue to be true in the future.
> At least one of my non-tech friends sends email that's exclusively
> HTML (even though the content is very lightly marked-up plain text),
> from a hotmail account. Spam could easily have the same origin, but
> the HTML contents would be very different.
Well, consider. If your friend were to send you base64 mail, it
probably would *not* come from one of the spamhaus addresses in
bogofilter's wordlists.
The presence of base64 content is, by itself, neutral. That means that about
the only way not decoding it could lead to a false positive is if the headers
contained spam-correlated tokens that decoding the body would have countered
with words having a higher non-spam loading.
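To make that argument concrete, here is a minimal sketch. It assumes a simplified version of the combining rule from Paul Graham's "A Plan for Spam": the spam probability is P = prod(p) / (prod(p) + prod(1 - p)), with no token selection or clamping. Every token name and probability below is hypothetical and illustrative, not taken from bogofilter's or GBayes' wordlists. A token with p = 0.5 multiplies the spam and non-spam products by the same factor, so it drops out. Only decoded body words with a strong non-spam loading can pull a header-driven score back down.

```python
def combine(probs):
    """Simplified Graham-style combining: prod(p) / (prod(p) + prod(1 - p))."""
    spam = 1.0
    ham = 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)

# Hypothetical per-token spam probabilities (illustrative only).
header_tokens = [0.9, 0.8]        # spam-correlated header tokens
base64_marker = [0.5]             # "base64 present": a neutral token
decoded_body = [0.1, 0.05, 0.2]   # words with a high non-spam loading

# The neutral marker does not move the score at all.
headers_only = combine(header_tokens)
undecoded = combine(header_tokens + base64_marker)

# Decoding the body adds hammy words that counter the headers.
decoded = combine(header_tokens + base64_marker + decoded_body)

print(f"headers only: {headers_only:.3f}")
print(f"undecoded:    {undecoded:.3f}")
print(f"decoded:      {decoded:.3f}")
```

With these made-up numbers, the undecoded message scores about 0.97 whether or not the neutral base64 token is counted. The decoded one drops to about 0.05. That is the single false-positive scenario described above.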
--
Eric S. Raymond <http://www.tuxedo.org/~esr/>