[Spambayes] using SpamBayes for Wiki filtering

Tony Meyer tameyer at ihug.co.nz
Sun Nov 27 00:16:40 CET 2005


> I'm looking into spam prevention techniques for the Trac wiki/issue
> tracking system ( http://projects.edgewall.com/trac ) and was wondering
> whether SpamBayes could be used as a library for this type of
> application, or if it was quite specific to email.

The tokenizer is designed to tokenize email, but you could certainly  
write your own tokenizer (or subclass the existing one) designed to  
tokenize wiki pages.  Once you've got tokens, you can use the  
existing classifier and storage classes (classifier.py and storage.py).

However, you might find that the email tokenizer does reasonably well  
on wiki pages; email and web text are not particularly different.  It  
would be worth trying that first.
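
If the email tokenizer turns out not to suit wiki text, a custom one can be quite small. The following is only a rough sketch, not anything that ships with SpamBayes: the name tokenize_wiki, the regular expressions, the length limits and the "url:" / "skip:" token prefixes are all assumptions made for illustration. It relies only on the classifier accepting an iterable of string tokens.

```python
import re

# Hypothetical helpers for illustration; none of these names come from SpamBayes.
URL_RE = re.compile(r"https?://([^/\s]+)(/\S*)?", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z0-9'$_-]+")


def tokenize_wiki(text, min_len=3, max_len=12):
    """Yield string tokens from wiki text: URL hosts first, then words."""
    # Link spam is common on wikis, so emit each URL host as its own clue.
    for match in URL_RE.finditer(text):
        yield "url:" + match.group(1).lower()

    # Blank out the URLs so their pieces are not counted again as words.
    remainder = URL_RE.sub(" ", text)
    for word in WORD_RE.findall(remainder):
        word = word.lower()
        if min_len <= len(word) <= max_len:
            yield word
        elif len(word) > max_len:
            # Summarise over-long words by first character and rough length.
            yield "skip:%s %d" % (word[0], len(word) // 10 * 10)
        # Words shorter than min_len are dropped.
```

Something like c.learn(tokenize_wiki(page_text), True) would then take the place of the email tokenize() call, where page_text is whatever text the wiki hands you.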

BTW, you want to do something like:

 >>> from spambayes.tokenizer import tokenize
 >>> from spambayes.storage import ZODBClassifier
 >>> c = ZODBClassifier("/Users/tameyer/hammie.db")
 >>> # Score some text; True asks for the evidence (clues) as well.
 >>> c.spamprob(tokenize("query text"), True)
(0.5, [('*H*', 0.0), ('*S*', 0.0)])
 >>> # Train: True marks the text as spam, False as ham.
 >>> c.learn(tokenize("spam text"), True)
 >>> c.learn(tokenize("ham text"), False)
 >>> # Save the database, then close it.
 >>> c.store()
 >>> c.close()
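
The score from spamprob() is a number between 0 and 1, so a wiki would still need to decide what to do with it. A minimal sketch of one way to do that follows. The function name classify_score is made up, and the cutoffs of 0.20 and 0.90 are assumed values that you would want to tune for your own data (they match, as far as I recall, what SpamBayes uses for mail by default).

```python
def classify_score(prob, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability to 'ham', 'unsure' or 'spam'.

    The cutoff values are assumptions for illustration; tune them.
    """
    if prob < ham_cutoff:
        return "ham"
    if prob < spam_cutoff:
        return "unsure"
    return "spam"
```

An untrained classifier returns 0.5 for everything, as in the session above, which lands in "unsure". A wiki might accept "ham" edits, reject "spam" ones, and hold "unsure" ones for moderation.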

> Are there any other projects using SpamBayes like this that I can use as
> an example?

There's a plug-in for a web proxy to use SpamBayes for web filtering  
in the contrib/ directory.  If you google through the archives of  
this list (or maybe spambayes-dev?) there's an example of Skip using  
SpamBayes for music classification, IIRC.  I've used the classifier &  
storage for classification of lines of dialogue in a scripted  
performance.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.



