Spambayes + HTTP proxy server

Skip Montanaro skip at
Sat Feb 1 21:26:52 EST 2003

    Paul> It occurred to me that it would be possible to utilize this proxy
    Paul> server idea with the Spambayes classifier to come up with an
    Paul> all-Python web filter (suitable for use as a parental control or
    Paul> company internet monitor)

Ignoring training, parameterization via the config file and the notion that
classifying web pages will be slightly different than classifying email,
maybe this will get you started:

    from proxy3_filter import *
    import proxy3_options

    from spambayes import hammie, Options, mboxutils

    DB = "/Users/skip/hammie.db"

    HTML_ERROR = '''
        <STYLE TYPE="text/css"><!--
          SPAN.a { font-family: georgia, times; font-size: 36pt; }
          SPAN.b, P.retry { font-family: tahoma, verdana, arial; font-size:200%%; }
          P.retry { text-align: left; }
        --></STYLE>
        <BODY BGCOLOR=#800000 TEXT=white LINK=#ff8888 VLINK=#ff88aa>
        <P><SPAN CLASS=b>Forbidden to connect: prob = %(prob)s</SPAN></P>
        </BODY>
    '''

    class SpambayesFilter(BufferSomeFilter):
        def __init__(self, *args):
            BufferSomeFilter.__init__(self, *args)
            self.hammie = hammie.open(DB, 1, 'r')

        def filter(self, s):
            prob, clues = self.hammie.score(s)
            print "prob:", prob
            if prob >= Options.options.spam_cutoff:
                return HTML_ERROR % locals()
            return s

    from proxy3_util import *

    register_filter('*', 'text/html', SpambayesFilter)

I called it and ran the proxy as

    python stdio spambayes

Whenever I tried to connect to a web server I got this traceback:

    Traceback (most recent call last):
      File "", line 540, in proxy
        connections = connections + ready.process()
      File "/Users/skip/src/proxy/", line 137, in process
        stream = self.create_stream(length)
      File "/Users/skip/src/proxy/", line 167, in create_stream
      File "/Users/skip/src/proxy/", line 38, in get_filter
    TypeError: __init__() got an unexpected keyword argument 'clientheaders'

which didn't look related to this module, but apparently was, because when I
started the proxy as

    python stdio

everything worked fine.
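
One plausible reading of that traceback, offered as a guess rather than a diagnosis: the proxy's get_filter constructs each filter with a clientheaders keyword argument, and an __init__ declared as (self, *args) cannot accept keywords. The sketch below reproduces that failure in isolation. BaseFilter and build are hypothetical stand-ins, not the proxy's real classes or call site, and the actual BufferSomeFilter signature is an assumption.

```python
# Stand-in for the proxy's filter base class. The real BufferSomeFilter
# lives in proxy3_filter; its signature is assumed, not known.
class BaseFilter(object):
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


class PositionalOnlyFilter(BaseFilter):
    # Same shape as SpambayesFilter.__init__ above: positional args only.
    def __init__(self, *args):
        BaseFilter.__init__(self, *args)


class KeywordFriendlyFilter(BaseFilter):
    # Accepts keyword arguments too and forwards them to the base class.
    def __init__(self, *args, **kwargs):
        BaseFilter.__init__(self, *args, **kwargs)


def build(filter_class):
    # Hypothetical call site: passes clientheaders by keyword, which is
    # what the traceback suggests get_filter does.
    return filter_class("text/html", clientheaders={"Host": "example.com"})
```

Calling build(PositionalOnlyFilter) raises a TypeError naming 'clientheaders', while build(KeywordFriendlyFilter) succeeds. If that is the cause, letting SpambayesFilter.__init__ take **kwargs and pass them along would be the first thing to try; this is untested against the proxy itself.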

I've never used Patel's proxy, but it looks like it should be a relative
no-brainer to integrate with Spambayes.  It's just that my brain is
apparently now disengaged, it being Saturday evening.  I'll let someone else
fiddle with this.

