Spambayes + HTTP proxy server
hamonlypaulpaterson at houston.rr.com
Sun Feb 2 07:24:38 CET 2003
jerf at compy.attbi.com wrote:
> On Sat, 01 Feb 2003 22:47:06 +0000, Paul Paterson wrote:
> >Does anyone have any experience in this area to say whether this
> >approach is workable?
> Perfectly workable, though it would probably require some tweaks to the
> tokenizer to work as well as possible.
> It would not take long to set up at least a prototype of this.
The prototype turned out to be shorter than my original post:
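
# mod_spambayesfilter.py - used by proxy3
# (register_filter and any filter base class come from proxy3;
# those imports are not shown in this excerpt)
from spambayes import tokenizer, classifier

tok = tokenizer.Tokenizer()
checker = classifier.Classifier()

class SpamBayesFilter:
    BUFFER_LEN = 128
    LOWER_BOUND = 0.5

    def filter(self, s):
        # Block the page if its spam probability exceeds the threshold
        if checker.chi2_spamprob(tok.tokenize(s)) > self.LOWER_BOUND:
            return "Not authorized"
        # Otherwise pass the content through unchanged (assumed behaviour)
        return s

register_filter('*/*', 'text/html', SpamBayesFilter)
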
Am I right in thinking that the spambayes tokenizer will just revert to
splitting up words if it doesn't think it is looking at an email?
Perhaps this might be sufficient for webpage filtering since web pages
probably won't be using the same kinds of subterfuge that spammers resort to.
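
To make the question concrete, here is a rough, self-contained sketch of what "just splitting up words" could look like for a web page, followed by the same threshold test as the prototype. Everything in it is a stand-in: split_words, toy_score and filter_page are hypothetical names, the regular expressions are a guess at a reasonable word splitter, and the score is a toy fraction rather than the chi-squared combining that spambayes uses. It needs neither spambayes nor proxy3.

```python
import re

LOWER_BOUND = 0.5  # same threshold value as the prototype

TAG_RE = re.compile(r"<[^>]*>")          # crude HTML tag matcher
WORD_RE = re.compile(r"[a-z0-9'$!-]+")   # runs of word-like characters

def split_words(html):
    """Hypothetical stand-in tokenizer: drop tags, lowercase, split into words."""
    text = TAG_RE.sub(" ", html)
    return WORD_RE.findall(text.lower())

def toy_score(tokens, spam_words):
    """Toy score (not spambayes): fraction of distinct tokens in spam_words."""
    distinct = set(tokens)
    if not distinct:
        return 0.0
    return len(distinct & spam_words) / len(distinct)

def filter_page(html, spam_words):
    """Mirror the prototype's decision: block the page above the threshold."""
    if toy_score(split_words(html), spam_words) > LOWER_BOUND:
        return "Not authorized"
    return html
```

A page is blocked only when more than half of its distinct words appear in the hypothetical spam_words set; a real setup would replace toy_score with a trained classifier.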