Whitelist/verification spam filters
-$P-W$- at verence.demon.co.uk
Thu Aug 29 21:11:44 CEST 2002
In article <mailman.1030555727.11821.python-list at python.org>,
David Mertz, Ph.D. <mertz at gnosis.cx> wrote:
>-$P-W$- at verence.demon.co.uk (Paul Wright) wrote:
>|Are you aware of the Distributed Checksum Clearinghouse (DCC)? That
>|seems to be a good way of dealing with spam, to my mind.
>I sent off a draft, but did not reference DCC. Perhaps I'll try to add
>that before publication. But I talked about Pyzor/Razor, and the
>general principle of distributed blacklists. Pyzor/Razor, btw, use a
>statistical fuzzy digest in cataloging messages. I guess an individual
>message is diagnosed probabilistically as matching any cataloged spam.
>I didn't look at the underlying algorithmic details, but I trust them
>here. I found zero false positives with Pyzor... but I got a very high
>rate of false negatives on my spam corpus.
The thing I like about the DCC as opposed to Pyzor/Razor is that it does
not rely on humans reporting spam. Since it stores hashes of all
non-local mail passing through a server, any hash with a sufficiently
high count is either a mailing list or spam (hence my comment about
needing to whitelist legitimate bulk email). I hear this works quite
well, although my own spam load is small enough that I haven't bothered
to set it up here.
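
To make the counting idea concrete, here is a minimal sketch. It is not the real DCC protocol or client: as far as I know the DCC uses its own fuzzy checksums and a network of servers, whereas this toy uses an exact SHA-256 of the normalised body and an in-memory counter. The names BulkCounter, threshold and whitelist are made up for illustration.

```python
import hashlib
from collections import Counter


class BulkCounter:
    """Toy stand-in for a DCC-style checksum counter (assumption: not the real DCC)."""

    def __init__(self, threshold=3, whitelist=()):
        self.threshold = threshold        # count at which a message is treated as bulk
        self.whitelist = set(whitelist)   # senders of legitimate bulk mail, e.g. lists
        self.counts = Counter()           # checksum -> number of times seen

    @staticmethod
    def checksum(body):
        # Exact hash of the lower-cased, whitespace-collapsed body.
        # A real system would use a fuzzy digest that survives small edits.
        normalised = " ".join(body.lower().split())
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def report(self, body):
        # Every message is counted; no human has to label anything as spam.
        digest = self.checksum(body)
        self.counts[digest] += 1
        return self.counts[digest]

    def is_unwanted_bulk(self, sender, body):
        count = self.report(body)
        if sender in self.whitelist:
            return False                  # bulk, but bulk we asked for
        return count >= self.threshold


counter = BulkCounter(threshold=3, whitelist={"python-list@python.org"})
flags = [counter.is_unwanted_bulk("someone@example.com", "Buy now!!!") for _ in range(3)]
print(flags)
```

The first two copies pass and the third trips the threshold, while mail from a whitelisted list address is counted but never flagged, however many copies go by.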
Paul Wright | http://pobox.com/~pw201 |