[Spambayes] Thank you

Anthony Baxter anthony at interlink.com.au
Thu Jul 31 15:45:21 EDT 2003


>>> "Richard Jowsey" wrote
> Interestingly, the *only* spam clue in my top-ten tokens was the word 
> "skeptical", which came in with a raw probability of 0.98 (everything 
> else was under 0.10). But that was just enough, apparently, to nudge 
> it over into the dark side.

It showed up as solidly 'ham' for me, '*H*': 0.99; '*S*': 0.03, 
but there were a bunch of solid spam clues:

	'income': 0.89; 'magazines,': 0.90; 'list-id:Discussion': 0.91;
	'now!': 0.92; 'you!': 0.92; 'subject:you': 0.94; 'income.': 0.94;
	'offers,': 0.95; '"remove"': 0.95; 'money!': 0.97

The problem was that Mailman adds a hell of a lot of headers and 
suchlike, and nearly all of them come out as ham clues:

	'everywhere,': 0.04; 'url:mailman': 0.06;
	'errors-to:python.org': 0.06; 'list-archive:skip:m 10': 0.06;
	'list-help:python.org': 0.06; 'list-post:python.org': 0.06;
	'list-subscribe:python.org': 0.06;
	'list-subscribe:skip:m 10': 0.06;
	'list-unsubscribe:python.org': 0.06;
	'list-unsubscribe:skip:m 10': 0.06;
	'return-path:python.org': 0.06; 'sender:python.org': 0.06;
	'url:python': 0.07; 'list-subscribe:mailman': 0.08;
	'list-unsubscribe:mailman': 0.08; 'email addr:python.org': 0.08;
	'list-archive:pipermail': 0.08; 'url:listinfo': 0.08;
	'list-subscribe:http': 0.08; 'list-unsubscribe:http': 0.08;
	'broke': 0.08; 'header:Errors-To:1': 0.09; 'to:python.org': 0.09;
	'skip:_ 40': 0.09; 'spambayes': 0.09; 'subject:Spambayes': 0.09;
	'list-id:list': 0.09; 'subject:] ': 0.10; 'list-id:for': 0.10;
	'list-subscribe:mailto': 0.11; 'list-help:help': 0.11;
	'list-help:mailto': 0.11; 'list-help:subject': 0.11;
	'list-subscribe:subject': 0.11; 'list-subscribe:subscribe': 0.11;
	'list-unsubscribe:subject': 0.11; 'list-help:request': 0.11;
	'list-post:mailto': 0.11; 'list-subscribe:listinfo': 0.11;
	'list-subscribe:request': 0.11;
	'list-unsubscribe:listinfo': 0.11;
	'list-unsubscribe:request': 0.11; 'list-archive:http': 0.11;
	'that.': 0.12; 'list-unsubscribe:unsubscribe': 0.12;
	'list-unsubscribe:mailto': 0.13; 'looked': 0.13; 'tie': 0.14;
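
To illustrate how a long run of low-probability clues can outweigh ten
strong spam clues, here is a minimal sketch of chi-squared combining.
It is written from memory in the general style of the Spambayes
classifier, not copied from it. The names chi2q and combine are
hypothetical, and the ham_clues list is a rounded stand-in for the
header clues above, not the exact clue set.

```python
import math

def chi2q(x2, dof):
    """P(chi-squared with `dof` degrees of freedom >= x2), for even dof."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Return (score, H, S) for a list of per-clue spam probabilities."""
    n = len(probs)
    if n == 0:
        return 0.5, 0.0, 0.0
    ln_spam = sum(math.log(p) for p in probs)        # small if many ham clues
    ln_ham = sum(math.log(1.0 - p) for p in probs)   # small if many spam clues
    S = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    H = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    return (S - H + 1.0) / 2.0, H, S

# The ten spam clue probabilities quoted in this message.
spam_clues = [0.89, 0.90, 0.91, 0.92, 0.92, 0.94, 0.94, 0.95, 0.95, 0.97]
# Assumption: a rounded stand-in for the Mailman header clues.
ham_clues = [0.06] * 11 + [0.08] * 9 + [0.11] * 14

print("spam clues only:     score=%.3f H=%.3f S=%.3f" % combine(spam_clues))
print("with header clues:   score=%.3f H=%.3f S=%.3f"
      % combine(spam_clues + ham_clues))
```

Every extra header token counts as independent evidence here, even
though the List-* tokens all come from the same piece of list
machinery.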

This suggests we could probably be smarter about parsing Mailman's 
headers, to reduce the number of highly correlated clues.
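
One possible shape for that, as a rough sketch only: collapse all the
tokens generated from the Mailman List-* headers into a single marker
token before scoring. The function name, the marker string and the
prefix list are all assumptions made for illustration. This is not
part of the Spambayes tokenizer, and whether it actually improves
results would need testing.

```python
# Assumption: token prefixes taken from the clue list in this message.
MAILMAN_PREFIXES = (
    "list-help:",
    "list-post:",
    "list-subscribe:",
    "list-unsubscribe:",
    "list-archive:",
)

def collapse_mailman_tokens(tokens, marker="list-headers:present"):
    """Yield tokens, replacing all List-* header tokens with one marker."""
    seen_marker = False
    for tok in tokens:
        if tok.startswith(MAILMAN_PREFIXES):
            # Emit the marker once, then drop further List-* tokens.
            if not seen_marker:
                seen_marker = True
                yield marker
        else:
            yield tok

sample = [
    "income",
    "list-help:python.org",
    "list-post:python.org",
    "list-subscribe:mailman",
    "money!",
]
print(list(collapse_mailman_tokens(sample)))
```

This keeps the fact that the message came through a list as one clue,
instead of letting a few dozen near-identical tokens each vote.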



