[Spambayes] using SpamBayes for Wiki filtering
tameyer at ihug.co.nz
Sun Nov 27 00:16:40 CET 2005
> I'm looking into spam prevention techniques for the Trac wiki/issue
> tracking system ( http://projects.edgewall.com/trac ) and was
> wondering whether SpamBayes could be used as a library for this type
> of application, or if it was quite specific to email.
The tokenizer is designed to tokenize email, but you could certainly
write your own tokenizer (or subclass the existing one) designed to
tokenize wiki pages. Once you've got tokens, you can use the
existing classifier and storage classes (classifier.py and storage.py).
However, you might find that the email tokenizer does reasonably well
on wiki pages; email and web text are not particularly different. It
would be worth trying that first.
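If the email tokenizer turns out not to fit, a replacement can be quite small. The following is only a rough sketch of what a wiki-oriented tokenizer might look like; the name tokenize_wiki, the "url:" token prefix and the minimum word length are all assumptions made for illustration, not part of SpamBayes. It does not import SpamBayes at all, it just yields strings.

```python
import re

# Hypothetical patterns, chosen for illustration only.
URL_RE = re.compile(r"https?://([^/\s]+)\S*", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize_wiki(text, min_len=3):
    """Yield tokens for a wiki page or ticket comment.

    Link hosts are emitted first as "url:<host>" tokens, since wiki
    spam is often little more than a list of links; the remaining
    text is lower-cased and split into words.
    """
    for host in URL_RE.findall(text):
        yield "url:" + host.lower()
    # Remove the URLs so their pieces are not counted again as words.
    remainder = URL_RE.sub(" ", text)
    for word in WORD_RE.findall(remainder):
        if len(word) >= min_len:
            yield word.lower()
```

Since the classifier only needs an iterable of tokens, the result of tokenize_wiki(page_text) could presumably be passed to learn() and spamprob() wherever the output of the stock tokenize() would otherwise go.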
BTW, you want to do something like:
>>> from spambayes.tokenizer import tokenize
>>> from spambayes.storage import ZODBClassifier
>>> c = ZODBClassifier("/Users/tameyer/hammie.db")
>>> # Score some text; True asks for the clues as well as the probability.
>>> c.spamprob(tokenize("query text"), True)
(0.5, [('*H*', 0.0), ('*S*', 0.0)])
>>> # Train: the second argument is True for spam, False for ham.
>>> c.learn(tokenize("spam text"), True)
>>> c.learn(tokenize("ham text"), False)
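For a wiki, the score returned by spamprob() still has to be turned into a decision about the edit. One possible way to do that is sketched below; the function name verdict and the two cutoff values are assumptions for illustration, not something taken from this thread, and they would need tuning against real wiki data.

```python
def verdict(prob, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability onto 'ham', 'unsure' or 'spam'.

    The cutoffs are illustrative assumptions; scores that fall
    between them are reported as 'unsure' rather than forced
    into one class.
    """
    if prob >= spam_cutoff:
        return "spam"
    if prob <= ham_cutoff:
        return "ham"
    return "unsure"
```

With an untrained database, as in the session above, every score comes back as 0.5, which this sketch reports as "unsure"; a wiki might hold such edits for moderation rather than rejecting them outright.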
> Are there any other projects using SpamBayes like this that I can
> use as an example?
There's a plug-in for a web proxy to use SpamBayes for web filtering
in the contrib/ directory. If you google through the archives of
this list (or maybe spambayes-dev?) there's an example of Skip using
SpamBayes for music classification, IIRC. I've used the classifier &
storage for classification of lines of dialogue in a scripted
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.