[Python-Dev] The first trustworthy <wink> GBayes results

Eric S. Raymond esr@thyrsus.com
Thu, 29 Aug 2002 13:13:07 -0400


Tim Peters <tim.one@comcast.net>:
> Spammers often generate random "word-like" gibberish at the ends of msgs,
> and "rd" is one of the random two-letter combos that appears in the spam
> corpus.  Perhaps it would be good to ignore "words" with fewer than W
> characters (to be determined by experiment).

Bogofilter throws out words of length one and two.
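
A minimal sketch of that kind of length cutoff, in Python.  This is
not bogofilter's actual lexer: the names tokenize and MIN_WORD_LEN and
the word pattern are assumptions made up for illustration.  Only the
cutoff itself (drop one- and two-character words) comes from the
statement above.

```python
import re

# Hypothetical cutoff: keep only words of three or more characters,
# i.e. discard words of length one and two.
MIN_WORD_LEN = 3

# Assumed notion of a "word"; a real lexer may split text differently.
_WORD_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text, min_len=MIN_WORD_LEN):
    """Yield lowercased word-like tokens, skipping those shorter than min_len."""
    for match in _WORD_RE.finditer(text):
        word = match.group().lower()
        if len(word) >= min_len:
            yield word
```

With this cutoff a random two-letter combo such as "rd" never reaches
the classifier.  Making min_len a parameter leaves the threshold open
to experiment.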

> I expect that including the headers would have given these much better
> chances of getting through, given Robin and Alex's posting histories.
> Still, the idea of counting words multiple times is open to question, and
> experiments both ways are in order.

And bogofilter includes the headers.  This is important, since
otherwise you don't rate things like spamhaus addresses and sender
names.
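
A rough sketch of feeding headers to the tokenizer along with the
body, using Python's standard email package.  The function name
tokens_with_headers, the word pattern, and the choice to prefix header
tokens with the header name are all assumptions for illustration, not
a description of how bogofilter does it.

```python
import email
import re

# Assumed word pattern; it keeps "@" and "." so addresses stay in one piece.
_WORD_RE = re.compile(r"[A-Za-z0-9.@_-]+")


def tokens_with_headers(raw_message):
    """Return tokens from both the headers and the body of a raw message."""
    msg = email.message_from_string(raw_message)
    tokens = []
    # Header tokens are tagged with the header name (an illustrative choice)
    # so that a sender name is rated separately from the same word in the body.
    for name, value in msg.items():
        for word in _WORD_RE.findall(str(value)):
            tokens.append(f"{name.lower()}:{word.lower()}")
    # Only plain, non-multipart bodies are handled in this sketch.
    body = msg.get_payload()
    if isinstance(body, str):
        tokens.extend(word.lower() for word in _WORD_RE.findall(body))
    return tokens
```

With tokens like these, a sender address or name gets its own spam
rating instead of being dropped with the rest of the header block.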
-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>