I'm interested in contributing to GBayes.
I'm thinking of trying word stemming and adding other types of token indicators. How can I contribute?
Btw, I've been saving my spam for a year or so, and I now have about 31,238 messages. I classified each one as spam from its subject line, checking the body when in doubt. Probably around 10% of the corpus is duplicates, and some messages carry viruses, most likely Klez.
I'd like to replicate Tim's test rig so I can compare my results with the existing ones. My spam isn't in mbox format, but I can convert it.
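A minimal sketch of that conversion, assuming each saved spam sits in its own file as a raw RFC 822 message (the storage layout isn't stated above). It uses only Python's standard `mailbox` module, and it also drops exact byte-for-byte duplicates; that catches only some of the estimated 10% dups, since near-duplicates with different headers will slip through. The function name `convert_to_mbox` is hypothetical.

```python
import hashlib
import mailbox
from pathlib import Path


def convert_to_mbox(src_dir, mbox_path, skip_dups=True):
    """Append every file in src_dir (assumed: one raw message per file) to an mbox.

    Returns (written, skipped). When skip_dups is True, files whose raw bytes
    hash identically to an earlier file are skipped.
    """
    seen = set()
    written = skipped = 0
    box = mailbox.mbox(mbox_path)  # creates the mbox file if it doesn't exist
    try:
        for path in sorted(Path(src_dir).iterdir()):
            if not path.is_file():
                continue
            raw = path.read_bytes()
            digest = hashlib.sha1(raw).hexdigest()
            if skip_dups and digest in seen:
                skipped += 1
                continue
            seen.add(digest)
            # mailbox adds a "From " separator line if the message lacks one
            box.add(raw)
            written += 1
    finally:
        box.close()  # close() flushes pending additions to disk
    return written, skipped
```

Hashing the raw bytes keeps the check cheap for tens of thousands of files; a fuzzier dedup (say, hashing only the body) would be a separate design choice.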
I'm particularly interested in how to let HTML-only messages through (to reduce false positives). Unfortunately, I'm getting a lot of personal mail in that format.
Brad Clements, firstname.lastname@example.org (315)268-1000 http://www.murkworks.com (315)268-9812 Fax AOL-IM: BKClements