Kaitlin Duck Sherwood ducky at webfoot.com
Mon Feb 17 19:58:42 EST 2003

At 11:20 AM +0100 2/16/03, Rob Hooft wrote:
>  Just received this via a python.org mailinglist; spam is evolving 
> strongly to avoid automatic detection by bayesian techniques.

If the spammers ever get too clever for a purely word-based approach, 
then it would be easy to toss in the ratios of
	non-word characters (Perl \W) : word characters (Perl \w)
	characters inside HTML tags : characters outside HTML
	number of spaces : total length of message
as features.
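A minimal Python sketch of those three ratios, assuming the raw message body is already available as a string. The function name `eye_space_features`, the feature names, and the naive "anything between < and >" tag regex are hypothetical illustrations, not part of any existing filter. Note that Python's `\w` on `str` patterns is Unicode-aware, so accented letters count as word characters.

```python
import re

# Mirrors Perl's \w and \W: \w matches letters, digits and underscore.
WORD_RE = re.compile(r"\w")
NONWORD_RE = re.compile(r"\W")
# Hypothetical, naive HTML tag matcher: "<" up to the next ">".
# It ignores comments, CDATA and ">" inside quoted attribute values.
TAG_RE = re.compile(r"<[^>]*>")


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def eye_space_features(message: str) -> dict[str, float]:
    """Compute the three ratio features for a raw (undecoded) message body."""
    # Non-word characters : word characters.
    nonword = len(NONWORD_RE.findall(message))
    word = len(WORD_RE.findall(message))

    # Characters inside HTML tags : characters outside HTML tags.
    inside_tags = sum(len(tag) for tag in TAG_RE.findall(message))
    outside_tags = len(message) - inside_tags

    # Number of spaces : total length of the message.
    spaces = message.count(" ")

    return {
        "nonword_to_word": _ratio(nonword, word),
        "tag_to_text": _ratio(inside_tags, outside_tags),
        "spaces_to_length": _ratio(spaces, len(message)),
    }
```

For example, `"Buy cheap meds now"` contains no markup, so its `tag_to_text` ratio is 0.0, while `"B<font size=0>x</font>uy che<b></b>ap"` scores 2.7 because 27 of its 37 characters sit inside tags that a reader never sees.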

I believe that those ratios will do a good job of spotting messages 
whose "eye space" (what a reader sees once the message is rendered) 
differs wildly from their "ASCII space" (the raw text the filter sees).

