Spambayes + HTTP proxy server

Skip Montanaro skip at
Sun Feb 2 19:27:05 CET 2003

Sorry for the too-quick post.  In rearranging things I lost the spam return.
Just to be sure it was actually filtering something, I searched for "sex" at
Google.  It let that page in, allowed the safersex and SEX.ETC pages
through, but blocked HBO's Sex and the City and janesguide.  Note that this
is using my current hammie.db file, which has only been trained on my ham
and spam email collections.  I don't expect it to necessarily do a very good
job with web pages given no training.


    import os

    from proxy3_filter import *
    import proxy3_options

    from spambayes import hammie, Options, mboxutils
    dbf = os.path.expanduser(Options.options.hammiefilter_persistent_storage_file)

    class SpambayesFilter(BufferAllFilter):
        hammie = hammie.open(dbf, 1, 'r')

        def filter(self, s):
            if self.reply.split()[1] == '200':
                prob = self.hammie.score("%s\r\n%s" % (self.serverheaders, s))
                print "|  prob: %.5f" % prob
                if prob >= Options.options.spam_cutoff:
                    print self.serverheaders
                    print "text:", s[0:40], "...", s[-40:]
                    return "not authorized"
            return s

    from proxy3_util import *

    register_filter('*/*', 'text/html', SpambayesFilter)
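
For anyone who wants to poke at the decision logic without proxy3 or a trained
database, here is a minimal standalone sketch of what filter() does.  The
names toy_score, filter_body and SPAM_CUTOFF are made up for illustration, and
the 0.90 cutoff is an assumed value.  The real filter calls hammie.score() and
compares against Options.options.spam_cutoff.

```python
# Standalone sketch of the pass/block decision in SpambayesFilter.filter().
# toy_score, filter_body and SPAM_CUTOFF are hypothetical stand-ins; the
# real filter uses hammie.score() and Options.options.spam_cutoff.

SPAM_CUTOFF = 0.90  # assumed value, for illustration only


def toy_score(text):
    """Fake 'spam probability': the fraction of words found in a tiny blocklist."""
    blocked = {"casino", "viagra"}
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in blocked)
    return float(hits) / len(words)


def filter_body(reply, serverheaders, body, score=toy_score, cutoff=SPAM_CUTOFF):
    """Return body unchanged, or a placeholder if it scores at or above cutoff.

    Only replies with status 200 are scored; everything else passes through.
    """
    parts = reply.split()
    if len(parts) > 1 and parts[1] == "200":
        # Score headers and body together, as the proxy filter does.
        prob = score("%s\r\n%s" % (serverheaders, body))
        if prob >= cutoff:
            return "not authorized"
    return body
```

Passing a different callable as score (for example a bound hammie.score method)
should give the same behaviour as the class above, minus the debugging prints.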

