Spambayes + HTTP proxy server
Paul Paterson
hamonlypaulpaterson at houston.rr.com
Sun Feb 2 01:24:38 EST 2003
jerf at compy.attbi.com wrote:
> On Sat, 01 Feb 2003 22:47:06 +0000, Paul Paterson wrote:
>
> >Does anyone have any experience in this area to say whether this
> >approach is workable?
>
>
> Perfectly workable, though it would probably require some tweaks to the
> tokenizer to work as well as possible.
>
> It would not take long to set up at least a prototype of this.
>
The prototype turned out to be shorter than my original post:
#
# mod_spambayesfilter.py - used by proxy3
#
from spambayes import tokenizer, classifier

# BufferSomeFilter and register_filter are presumably supplied by
# proxy3's filter framework.
class SpamBayesFilter(BufferSomeFilter):
    BUFFER_LEN = 128
    LOWER_BOUND = 0.5
    tok = tokenizer.Tokenizer()
    checker = classifier.Classifier()

    def filter(self, s):
        # Block the page if its spam probability exceeds the threshold.
        if self.checker.chi2_spamprob(self.tok.tokenize(s)) > self.LOWER_BOUND:
            return "Not authorized"
        else:
            return s

register_filter('*/*', 'text/html', SpamBayesFilter)
Am I right in thinking that the spambayes tokenizer will just revert to
splitting up words if it doesn't think it is looking at an email?
Perhaps this might be sufficient for webpage filtering, since web pages
probably won't be using the same kinds of subterfuge that spammers resort to.
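
To make "just splitting up words" concrete, here is a minimal sketch of what that might look like for a web page. It is only an illustration: naive_tokens and TAG_RE are hypothetical names, not part of spambayes, and the real tokenizer may well do more than this.

```python
import re

# Hypothetical helper, not the spambayes tokenizer: strip anything that
# looks like an HTML tag, then split the remaining text on whitespace.
TAG_RE = re.compile(r"<[^>]+>")

def naive_tokens(html):
    """Yield lowercased whitespace-separated words from an HTML string."""
    text = TAG_RE.sub(" ", html)
    for word in text.split():
        yield word.lower()

page = "<p>Buy <b>cheap</b> pills now</p>"
tokens = list(naive_tokens(page))
```

If something this simple turns out to be enough for web pages, the token stream could be handed to the classifier in place of the email-oriented one.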