Spambayes + HTTP proxy server
Skip Montanaro
skip at pobox.com
Sat Feb 1 21:26:52 EST 2003
Paul> It occurred to me that it would be possible to utilize this proxy
Paul> server idea with the Spambayes classifier to come up with an
Paul> all-Python web filter (suitable for use as a parental control or
Paul> company internet monitor)
Ignoring training, parameterization via the config file and the notion that
classifying web pages will be slightly different than classifying email,
maybe this will get you started:

    from proxy3_filter import *
    import proxy3_options

    from spambayes import hammie, Options, mboxutils

    DB = "/Users/skip/hammie.db"

    HTML_ERROR = '''
    <HTML>
    <HEAD>
    <TITLE>Forbidden</TITLE>
    <STYLE TYPE="text/css"><!--
    SPAN.a { font-family: georgia, times; font-size: 36pt; }
    SPAN.b, P.retry { font-family: tahoma, verdana, arial; font-size:200%%; }
    P.retry { text-align: left; }
    --></STYLE>
    </HEAD>
    <BODY BGCOLOR=#800000 TEXT=white LINK=#ff8888 VLINK=#ff88aa>
    <P><SPAN CLASS=b>Forbidden to connect: prob = %(prob)s</SPAN></P>
    </BODY>
    </HTML>
    '''

    class SpambayesFilter(BufferSomeFilter):
        def __init__(self, *args):
            BufferSomeFilter.__init__(self, *args)
            self.hammie = hammie.open(DB, 1, 'r')

        def filter(self, s):
            prob, clues = self.hammie.score(s)
            print "prob:", prob
            if prob >= Options.options.spam_cutoff:
                return HTML_ERROR % locals()
            return s

    from proxy3_util import *
    register_filter('*', 'text/html', SpambayesFilter)

I called it mod_spambayes.py and ran the proxy as
    python proxy3.py stdio spambayes
Whenever I tried to connect to a web server I got this traceback:
    Traceback (most recent call last):
      File "proxy3.py", line 540, in proxy
        connections = connections + ready.process()
      File "/Users/skip/src/proxy/proxy3_web.py", line 137, in process
        stream = self.create_stream(length)
      File "/Users/skip/src/proxy/proxy3_web.py", line 167, in create_stream
        self.serverheaders)
      File "/Users/skip/src/proxy/proxy3_filter.py", line 38, in get_filter
        serverheaders=serverheaders)
    TypeError: __init__() got an unexpected keyword argument 'clientheaders'
which didn't look related to this module, but apparently was, because when I
started the proxy as
    python proxy3.py stdio
everything worked fine.
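
One plausible reading of the traceback, offered as a guess rather than a diagnosis: get_filter appears to construct the filter with keyword arguments (serverheaders and clientheaders both show up in the traceback), while an __init__ declared as (self, *args) accepts positional arguments only. The sketch below reproduces that mismatch without the proxy. BaseFilter is a hypothetical stand-in for BufferSomeFilter, whose real signature is not shown here, and the other class and function names are made up for illustration.

```python
# Hypothetical stand-in for the proxy's base filter class. The keyword
# names come from the traceback; everything else is an assumption.
class BaseFilter(object):
    def __init__(self, serverheaders=None, clientheaders=None):
        self.serverheaders = serverheaders
        self.clientheaders = clientheaders


class PositionalOnlyFilter(BaseFilter):
    # Same shape as SpambayesFilter.__init__: positional arguments only.
    def __init__(self, *args):
        BaseFilter.__init__(self, *args)


class KeywordFriendlyFilter(BaseFilter):
    # Accepts keyword arguments too and forwards them to the base class.
    def __init__(self, *args, **kwargs):
        BaseFilter.__init__(self, *args, **kwargs)


def try_create(cls):
    """Instantiate cls with keywords, as get_filter seems to do."""
    try:
        cls(serverheaders={}, clientheaders={})
    except TypeError as exc:
        return "TypeError: %s" % exc
    return "ok"


results = [try_create(PositionalOnlyFilter),
           try_create(KeywordFriendlyFilter)]
```

If that guess holds, declaring SpambayesFilter.__init__ as (self, *args, **kwargs) and forwarding both to BufferSomeFilter.__init__ might be enough to get past the TypeError. That has not been tried against the actual proxy.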
I've never used Patel's proxy, but it looks like it should be a relative
no-brainer to integrate with Spambayes. It's just that my brain is
apparently now disengaged, it being Saturday evening. I'll let someone else
fiddle with this.
Skip