Spambayes + HTTP proxy server

Skip Montanaro skip at pobox.com
Sat Feb 1 21:26:52 EST 2003


    Paul> It occurred to me that it would be possible to utilize this proxy
    Paul> server idea with the Spambayes classifier to come up with an
    Paul> all-Python web filter (suitable for use as a parental control or
    Paul> company internet monitor).

Setting aside training, parameterization via the config file, and the fact
that classifying web pages will differ somewhat from classifying email, this
might get you started:

    from proxy3_filter import *
    import proxy3_options

    from spambayes import hammie, Options, mboxutils

    DB = "/Users/skip/hammie.db"

    HTML_ERROR = '''
    <HTML>
      <HEAD>
        <TITLE>Forbidden</TITLE>
        <STYLE TYPE="text/css"><!--
          SPAN.a { font-family: georgia, times; font-size: 36pt; }
          SPAN.b, P.retry { font-family: tahoma, verdana, arial; font-size:200%%; }
          P.retry { text-align: left; }
        --></STYLE>
      </HEAD>
      <BODY BGCOLOR=#800000 TEXT=white LINK=#ff8888 VLINK=#ff88aa>
      <P><SPAN CLASS=b>Forbidden to connect: prob = %(prob)s</SPAN></P>
      </BODY>
    </HTML>
    '''

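    class SpambayesFilter(BufferSomeFilter):
        def __init__(self, *args):
            BufferSomeFilter.__init__(self, *args)
            # Open the trained Spambayes database read-only
            # (1 = use a database file rather than a pickle).
            self.hammie = hammie.open(DB, 1, 'r')

        def filter(self, s):
            # Pass True (evidence) so score() returns a (prob, clues)
            # tuple rather than a bare probability.
            prob, clues = self.hammie.score(s, True)
            print "prob:", prob
            # Replace pages at or above the spam cutoff with the
            # "Forbidden" page; HTML_ERROR takes prob from locals().
            if prob >= Options.options.spam_cutoff:
                return HTML_ERROR % locals()
            return s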

    from proxy3_util import *

    register_filter('*', 'text/html', SpambayesFilter)
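The block-or-pass decision itself doesn't depend on proxy3, so it can be tried in isolation. Here is a minimal sketch: filter_page, fake_score and the 0.90 cutoff are hypothetical stand-ins for the filter method, hammie's scorer and Options.options.spam_cutoff. It also shows why HTML_ERROR writes `200%%`. With %-formatting, `%%` becomes a literal `%`, while `%(prob)s` is looked up by name in the mapping.

```python
# Hypothetical, proxy-independent version of the filter's decision.
# score_fn stands in for the Spambayes scorer and cutoff for
# Options.options.spam_cutoff; both are assumptions for illustration.

ERROR_TEMPLATE = ('<P STYLE="font-size:200%%">'
                  'Forbidden to connect: prob = %(prob)s</P>')


def filter_page(page, score_fn, cutoff=0.90):
    """Return page unchanged, or an error page if it scores as spam."""
    prob = score_fn(page)
    if prob >= cutoff:
        # %% turns into a literal %, and %(prob)s is filled by name.
        return ERROR_TEMPLATE % {'prob': prob}
    return page


def fake_score(page):
    # Toy scorer: anything mentioning "casino" looks like spam.
    if 'casino' in page.lower():
        return 0.99
    return 0.01


print(filter_page('<p>Python docs</p>', fake_score))
print(filter_page('<p>Online CASINO</p>', fake_score))
```

The first call returns the page untouched. The second returns the error markup with `font-size:200%` and `prob = 0.99`.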

I called it mod_spambayes.py and ran the proxy as

    python proxy3.py stdio spambayes

Whenever I tried to connect to a web server I got this traceback:

    Traceback (most recent call last):
      File "proxy3.py", line 540, in proxy
        connections = connections + ready.process()
      File "/Users/skip/src/proxy/proxy3_web.py", line 137, in process
        stream = self.create_stream(length)
      File "/Users/skip/src/proxy/proxy3_web.py", line 167, in create_stream
        self.serverheaders)
      File "/Users/skip/src/proxy/proxy3_filter.py", line 38, in get_filter
        serverheaders=serverheaders)
    TypeError: __init__() got an unexpected keyword argument 'clientheaders'

which didn't look related to this module, but apparently was, because when I
started the proxy as

    python proxy3.py stdio

everything worked fine.
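The last frame of the traceback hints at the cause. proxy3_filter's get_filter appears to construct the filter with clientheaders= and serverheaders= keyword arguments. SpambayesFilter.__init__ accepts only *args, so Python rejects those keywords before BufferSomeFilter ever sees them. A likely fix, assuming BufferSomeFilter.__init__ itself accepts those keywords, is to forward **kwargs as well. The sketch below reproduces the failure with a hypothetical stand-in base class; the real constructor's parameters are an assumption.

```python
# Hypothetical stand-in for proxy3_filter.BufferSomeFilter; its real
# parameters are assumed from the keywords named in the traceback.
class BufferSomeFilter(object):
    def __init__(self, url, clientheaders=None, serverheaders=None):
        self.url = url
        self.clientheaders = clientheaders
        self.serverheaders = serverheaders


class PositionalOnlyFilter(BufferSomeFilter):
    # Mirrors the original SpambayesFilter.__init__: no **kwargs.
    def __init__(self, *args):
        BufferSomeFilter.__init__(self, *args)


class ForwardingFilter(BufferSomeFilter):
    # Forwards keyword arguments to the base class as well.
    def __init__(self, *args, **kwargs):
        BufferSomeFilter.__init__(self, *args, **kwargs)


def make_filter(cls):
    # Roughly how the traceback suggests get_filter calls the class.
    return cls('http://example.com/', clientheaders={}, serverheaders={})


try:
    make_filter(PositionalOnlyFilter)
except TypeError as exc:
    print('TypeError: %s' % exc)

f = make_filter(ForwardingFilter)
print('serverheaders: %r' % (f.serverheaders,))
```

In mod_spambayes.py this would presumably mean declaring `def __init__(self, *args, **kwargs):` and calling `BufferSomeFilter.__init__(self, *args, **kwargs)`.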

I've never used Patel's proxy, but it looks like it should be a relative
no-brainer to integrate with Spambayes.  It's just that my brain is
apparently now disengaged, it being Saturday evening.  I'll let someone else
fiddle with this.

Skip





More information about the Python-list mailing list