Tim Peters email@example.com:
Spammers often generate random "word-like" gibberish at the ends of msgs, and "rd" is one of the random two-letter combos that appears in the spam corpus. Perhaps it would be good to ignore "words" with fewer than W characters (with W to be determined by experiment).
Bogofilter throws out words of length one and two.
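To make the length cutoff concrete, here is a minimal sketch of a tokenizer that drops short words. This is not bogofilter's actual code: the function name `tokenize`, the regular expression, and the default `min_len=3` are all assumptions chosen to mirror "throws out words of length one and two".

```python
import re

# Hypothetical cutoff: keep only words of at least this many characters.
# With the default of 3, words of length one and two are discarded.
MIN_WORD_LEN = 3

# Assumed notion of a "word": a run of letters, digits, apostrophes,
# dollar signs, or hyphens. A real filter may split differently.
WORD_RE = re.compile(r"[A-Za-z0-9'$-]+")

def tokenize(text, min_len=MIN_WORD_LEN):
    """Return lowercased words from text, skipping any shorter than min_len."""
    return [w.lower() for w in WORD_RE.findall(text) if len(w) >= min_len]
```

With this cutoff, `tokenize("rd is a random combo")` keeps only "random" and "combo". Passing a different `min_len` is an easy way to run the experiment on W.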
I expect that including the headers would have given these much better chances of getting through, given Robin and Alex's posting histories. Still, the idea of counting words multiple times is open to question, and experiments both ways are in order.
And bogofilter includes the headers. This is important, since otherwise you don't rate things like spamhaus addresses and sender names.
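As a rough illustration of what rating the headers could look like, the sketch below pulls tokens from selected headers as well as the body, using Python's standard `email` package. It is only a sketch under stated assumptions and not how bogofilter does it: the function name `tokenize_with_headers`, the choice of headers, and the `header:word` tagging scheme are all hypothetical.

```python
from email import message_from_string

def tokenize_with_headers(raw_message,
                          header_names=("From", "Subject", "Received")):
    """Return tokens from the chosen headers plus the body of a message.

    Header words are tagged with the lowercased header name (an assumed
    convention) so that a sender address is counted separately from the
    same string appearing in the body.
    """
    msg = message_from_string(raw_message)
    tokens = []
    for name in header_names:
        # get_all returns every occurrence of the header, or [] if absent.
        for value in msg.get_all(name, []):
            for word in value.split():
                tokens.append(f"{name.lower()}:{word.lower()}")
    body = msg.get_payload()
    # Only handle simple single-part text bodies in this sketch;
    # multipart messages return a list and are skipped here.
    if isinstance(body, str):
        tokens.extend(word.lower() for word in body.split())
    return tokens
```

Tagging the header tokens means an address in the From line gets its own spam probability, which is the point of including headers at all.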