[Python-Dev] Getting started with GBayes testing
Wed, 4 Sep 2002 17:48:07 -0700
I would like to be in on that project too, please.
Seattle, WA USA
> -----Original Message-----
> From: email@example.com [mailto:firstname.lastname@example.org] On
> Behalf Of Guido van Rossum
> Sent: Wednesday, September 04, 2002 17:24
> To: email@example.com
> Cc: firstname.lastname@example.org
> Subject: Re: [Python-Dev] Getting started with GBayes testing
> > I'm interested in contributing to GBayes.
> > I'm thinking of trying word stemming and adding other types of token
> > indicators. How can I contribute?
> Pretty soon, a SF project will be created (Barry has already gotten
> the request in). We'll gladly add you to the list of developers.
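A rough illustration of the stemming and extra-indicator idea, as a minimal Python sketch. It is not GBayes's actual tokenizer: the suffix list, the `subject:` prefix and the `subject:ALLCAPS` / `has:url` indicator tokens are hypothetical choices made only for this example, and the suffix stripping is a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately naive suffix list; a real stemmer is far more careful.
SUFFIXES = ("ing", "ed", "ly", "es", "s")

# Word characters for this sketch: letters, digits, '$', apostrophe, hyphen.
WORD_RE = re.compile(r"[a-z0-9$'-]+")


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(subject, body):
    """Yield stemmed word tokens plus a few extra 'indicator' tokens."""
    # Prefix subject words so they count separately from body words.
    for word in WORD_RE.findall(subject.lower()):
        yield "subject:" + stem(word)
    for word in WORD_RE.findall(body.lower()):
        yield stem(word)
    # Hypothetical indicator tokens: an all-caps subject and a URL in the body.
    if subject.isupper():
        yield "subject:ALLCAPS"
    if "http://" in body.lower():
        yield "has:url"
```

For example, `list(tokenize("FREE MONEY", "Click the links"))` gives `['subject:free', 'subject:money', 'click', 'the', 'link', 'subject:ALLCAPS']`.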
> > Btw, I have been saving up my spam for a year or so. I have about
> > 31,238 spam messages saved up now. These are categorized as spam
> > based on my reading of the subject, or examining the body when in
> > doubt. There are probably 10% dups in the corpus. Some of them have
> > viruses, likely klez.
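One way to drop the duplicates before training is to fingerprint each message body and keep only the first copy. A minimal sketch, assuming each message is available as raw bytes; ignoring the headers (resent copies usually differ in Received and Message-ID lines) and collapsing whitespace are assumptions of this example, and it will not catch near-duplicates such as spam with randomized filler text.

```python
import hashlib


def body_fingerprint(raw):
    """Hash the body of a raw RFC 822 message, ignoring its headers."""
    raw = raw.replace(b"\r\n", b"\n")
    # Headers end at the first blank line; everything after it is the body.
    head, sep, body = raw.partition(b"\n\n")
    if not sep:
        body = head  # no blank line: fall back to hashing the whole message
    # Collapse runs of whitespace so trivial reformatting still matches.
    normalized = b" ".join(body.split())
    return hashlib.sha1(normalized).hexdigest()


def dedupe(raw_messages):
    """Return messages in their original order, keeping the first copy of each body."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = body_fingerprint(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```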
> > I'd like to replicate Tim's test rig so I can compare my results
> > with existing ones. My spam isn't in mbox format, but I can convert
> > it.
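If each saved spam lives in its own file, Python's standard `mailbox` module can do the conversion. A minimal sketch, assuming one raw RFC 822 message per file in a single directory (the function name and directory layout are hypothetical); `mailbox.mbox.add()` writes the `From ` separator line for each message itself.

```python
import mailbox
import pathlib


def directory_to_mbox(src_dir, mbox_path):
    """Append every file in src_dir (one raw message per file) to an mbox file."""
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    box.lock()
    try:
        count = 0
        for path in sorted(pathlib.Path(src_dir).iterdir()):
            if path.is_file():
                # add() accepts raw bytes and inserts the "From " separator line.
                box.add(path.read_bytes())
                count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count
```

For example, `directory_to_mbox("saved-spam", "spam.mbox")` would append every file in `saved-spam/` and return the number of messages written.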
> If you can't wait for the SF project, you can find all the code in the
> Python CVS tree:
> > I'm particularly interested in how to let HTML-only messages through
> > (to reduce false positives). I'm getting a lot of personal mail in
> > that format, unfortunately.
> You train it with an equal number of spam and non-spam ("ham") that
> you received. Just make sure the ham training messages contain enough
> representatives of the html-only mail.
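The training advice above can be sketched in Python: draw the same number of spam and ham messages, then check what share of the ham sample is HTML-only. A minimal sketch under stated assumptions: messages are parsed with the standard `email` package, "HTML-only" is taken to mean a `text/html` part with no `text/plain` alternative, and all helper names are hypothetical.

```python
import email
import random
from email import policy


def parse(raw):
    """Parse raw message bytes into an EmailMessage."""
    return email.message_from_bytes(raw, policy=policy.default)


def is_html_only(msg):
    """True if the message has a text/html part but no text/plain part."""
    # walk() yields the message itself plus every subpart, if any.
    types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in types and "text/plain" not in types


def balanced_training_set(spam, ham, seed=0):
    """Randomly sample equal numbers of spam and ham messages."""
    n = min(len(spam), len(ham))
    rng = random.Random(seed)
    return rng.sample(spam, n), rng.sample(ham, n)


def html_only_fraction(messages):
    """Fraction of parsed messages that are HTML-only."""
    if not messages:
        return 0.0
    return sum(is_html_only(m) for m in messages) / len(messages)
```

If `html_only_fraction(ham_sample)` comes out far below the share of HTML-only mail you actually receive, the ham training set probably underrepresents it.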
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> Python-Dev mailing list