[Python-Dev] Getting started with GBayes testing

David LeBlanc whisper@oz.net
Wed, 4 Sep 2002 17:48:07 -0700


I would like to be in on that project too please.

David LeBlanc
Seattle, WA USA 

> -----Original Message-----
> From: python-dev-admin@python.org [mailto:python-dev-admin@python.org]On
> Behalf Of Guido van Rossum
> Sent: Wednesday, September 04, 2002 17:24
> To: bkc@murkworks.com
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] Getting started with GBayes testing
> 
> 
> > I'm interested in contributing to GBayes ..
> > 
> > I'm thinking of trying word stemming and adding other types of token
> > indicators. How can I contribute?
> 
> Pretty soon, a SF project will be created (Barry has already gotten
> the request in).  We'll gladly add you to the list of developers.
> 
> > Btw, I have been saving up my spam for a year or so.. I have about
> > 31,238 spam messages saved up now. These are categorized as spam
> > based on my reading of the subject, or examining the body when in
> > doubt. There are probably 10% dups in the corpus. Some of them have
> > viruses, likely klez.
> 
> Cool.
> 
> > I'd like to replicate Tim's test rig so I can compare my results
> > with existing ones. My spam isn't in mbox format, but I can convert
> > it..
> 
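A minimal sketch of such a conversion, assuming the saved spam sits in a
directory with one raw message per file (the thread does not say how it is
actually stored). The function name dir_to_mbox and both paths are
hypothetical. It uses the standard-library mailbox module from current
Python, not anything from the spambayes sandbox, and passes raw bytes
through so that malformed or virus-laden messages are not re-parsed.

```python
import mailbox
from pathlib import Path


def dir_to_mbox(src_dir, mbox_path):
    """Append each regular file in src_dir to an mbox file as one message.

    Assumption: every file holds exactly one raw RFC 822 message.
    Returns the number of messages written.
    """
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist
    count = 0
    try:
        for path in sorted(Path(src_dir).iterdir()):
            if not path.is_file():
                continue
            # Raw bytes are stored as-is; mailbox adds the "From " separator
            # line and escapes body lines that start with "From ".
            box.add(path.read_bytes())
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

Something like dir_to_mbox("spam_dir", "spam.mbox") would then produce a
single file. Whether Tim's test rig expects one big mbox or some other
layout should be checked against the sandbox code.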
> If you can't wait for the SF project, you can find all the code in the
> Python CVS tree:
> 
>   http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/nondist/sandbox/spambayes/
> 
> > I'm particularly interested in how to allow HTML-only messages
> > (reduce false positives).  I'm getting a lot of personal mail in
> > that format, unfortunately.
> 
> You train it with an equal number of spam and non-spam ("ham") that
> you received.  Just make sure the ham training messages contain enough
> representatives of the html-only mail.
> 
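One way to read the advice above, as a rough sketch only: draw equal-sized
spam and ham samples, and reserve a minimum number of ham slots for
HTML-only messages. The helper names, the min_html_ham knob, and the
"has a text/html part but no text/plain part" test are all assumptions for
illustration. The thread does not say how many HTML-only hams count as
"enough", and none of this is taken from the spambayes code.

```python
import random
from email import message_from_bytes


def is_html_only(raw):
    """True if the message has a text/html part and no text/plain part."""
    msg = message_from_bytes(raw)
    types = {part.get_content_type() for part in msg.walk()}
    return "text/html" in types and "text/plain" not in types


def balanced_training_set(spam, ham, min_html_ham=0, seed=0):
    """Return (spam_sample, ham_sample) of equal length.

    spam and ham are lists of raw message bytes. At least min_html_ham
    HTML-only hams are kept, as far as the corpus and sample size allow.
    """
    rng = random.Random(seed)
    n = min(len(spam), len(ham))
    html_ham = [m for m in ham if is_html_only(m)]
    other_ham = [m for m in ham if not is_html_only(m)]
    keep_html = min(len(html_ham), min_html_ham, n)
    rng.shuffle(html_ham)
    picked = html_ham[:keep_html]          # reserved HTML-only hams
    pool = html_ham[keep_html:] + other_ham
    picked += rng.sample(pool, n - keep_html)
    return rng.sample(spam, n), picked
```

The two returned lists could then be fed to whatever training step the
classifier provides.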
> --Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev