There's been a ton of press about applying Bayesian classifiers to spam detection lately, spurred on by Paul Graham's recent paper "A Plan for Spam".
Tim Peters has done an incredible amount of work on our Python implementation of this idea. Part of why I think Tim's work is so cool is that he's brought along his deep knowledge of related issues from speech recognition, along with his obsessive devotion to reducing the amount of spam I ultimately have to delete <wink>.
To encourage more participation from the wider open source community, we've moved the code from a backwater of the Python cvs tree to its own project on SourceForge. The hope is that more people will be able to contribute ideas, testing, and integration of the basic algorithms with other systems such as mail daemons, mailing list managers, and mail clients.
The project is called "spambayes" (for lack of creativity on our part :) and is hosted here:
If you're interested in becoming a developer on the project, let me know. Otherwise you can of course get anonymous checkouts of the code.
There are also two mailing lists related to the spambayes project. The first is a general discussion list:
and the other is a list for cvs checkin message notices:
Feel free to join those lists (and help be a guinea pig for Mailman 2.1 :).
PS to Python-devers: the code has been removed from nondist/sandbox/spambayes, so you won't be able to hack on it there. Also, please move discussion about this from email@example.com to firstname.lastname@example.org