Whitelist/verification spam filters

David Mertz, Ph.D. mertz at gnosis.cx
Wed Aug 28 19:19:06 CEST 2002


-$P-W$- at verence.demon.co.uk (Paul Wright) wrote:
|Indeed. One other thing which I've not seen mentioned yet is what
|happens when two people using such systems email each other for the
|first time.

My understanding is that this is not generally a problem.  When you send
an outgoing message, the recipient is automatically whitelisted.  So
normally, a response at that point (even an automated challenge) is
already passed through.  It might be possible to construct a problem
scenario with mail forwarding, aliases, multiple addresses, etc., but I
believe users report that this issue is avoided in practice.  (I believe
there is also an effort to special-case messages that look like
confirmation challenges... although I wonder if spammers could sneak
something through that way.)
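To make the first-contact case concrete, here is a minimal sketch of a whitelist/challenge filter, assuming each user's filter keeps a set of bare addresses and challenges everything else.  The class `WhitelistFilter` and its method names are hypothetical, not taken from any particular product.  The sketch also assumes the challenge comes from the correspondent's own address; if it comes from some other address (a forwarding alias or a separate challenge robot), it gets challenged in turn, which is the sort of problem scenario mentioned above.

```python
from email.utils import parseaddr


class WhitelistFilter:
    """Hypothetical per-user whitelist/verification filter (illustrative only)."""

    def __init__(self):
        self.whitelist = set()

    @staticmethod
    def _normalize(address):
        # Reduce "Name <user@host>" to a lowercase bare address.
        return parseaddr(address)[1].lower()

    def record_outgoing(self, recipient):
        # Sending mail to someone whitelists them automatically.
        self.whitelist.add(self._normalize(recipient))

    def classify_incoming(self, sender):
        # Known senders are delivered; anyone else is sent a challenge.
        if self._normalize(sender) in self.whitelist:
            return "deliver"
        return "challenge"

    def confirm(self, sender):
        # The sender answered our challenge, so remember them.
        self.whitelist.add(self._normalize(sender))


if __name__ == "__main__":
    alice, bob = WhitelistFilter(), WhitelistFilter()
    # Alice writes to Bob first; her filter whitelists Bob on the way out.
    alice.record_outgoing("Bob <bob@example.org>")
    # Bob's filter has never seen Alice, so it challenges her...
    print(bob.classify_incoming("alice@example.com"))            # challenge
    # ...and Bob's challenge reaches Alice because Bob is already whitelisted.
    print(alice.classify_incoming("bob@example.org"))            # deliver
    # A challenge sent from a different address would itself be challenged.
    print(alice.classify_incoming("verify@bounce.example.org"))  # challenge
```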

|>I am writing an article comparing spam filtering techniques for IBM
|>developerWorks, as it happens.

|Are you aware of the Distributed Checksum Clearinghouse (DCC)? That
|seems to be a good way of dealing with spam, to my mind.

I sent off a draft, but did not reference DCC.  Perhaps I'll try to add
that before publication.  But I did talk about Pyzor/Razor, and the
general principle of distributed blacklists.  Pyzor/Razor, btw, use a
statistical fuzzy digest in cataloging messages, so that near-identical
copies of a spam produce the same digest.  I guess an individual message
is then diagnosed probabilistically as matching some cataloged spam.
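Here is a rough idea of what a fuzzy digest does, sketched under my own assumptions rather than from the Pyzor or Razor sources: normalize away the parts a spammer varies per recipient (addresses, URLs, numbers, whitespace), hash what remains, and look the hash up in a catalog of reported spam.  The function names and normalization rules are hypothetical, and this version does exact matching on the normalized hash rather than anything probabilistic.

```python
import hashlib
import re


def fuzzy_digest(body):
    """Illustrative normalized digest; NOT the actual Pyzor or Razor algorithm."""
    text = body.lower()
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"\S+@\S+", "", text)       # drop email addresses
    text = re.sub(r"\d+", "", text)           # drop numbers (order IDs, etc.)
    text = re.sub(r"\s+", "", text)           # drop all whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def matches_catalog(body, catalog):
    # catalog is a set of digests of previously reported spam.
    return fuzzy_digest(body) in catalog


if __name__ == "__main__":
    spam1 = "Dear bob@example.org,\nBuy now! Order #12345 at http://spam.example/a"
    spam2 = "Dear  alice@example.com,\nBuy now!   Order #99 at http://spam.example/b"
    catalog = {fuzzy_digest(spam1)}
    print(matches_catalog(spam2, catalog))              # True
    print(matches_catalog("Lunch tomorrow?", catalog))  # False
```

The two spam copies differ in recipient, order number, URL and spacing, yet normalize to the same text, so the second one is caught once the first has been reported.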

I didn't look at the underlying algorithmic details, but I'll take them
on trust here.  I found zero false positives with Pyzor... but I got a
very high rate of false negatives on my spam corpus.
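For anyone repeating this kind of test, note that the two error rates use different denominators: false positives are counted over legitimate mail, false negatives over spam.  A minimal sketch, assuming each message in a hand-labeled corpus yields a pair `(is_spam, flagged)`; the function name `error_rates` and the toy data are made up for illustration and are not my actual Pyzor results.

```python
def error_rates(results):
    """Return (false_positive_rate, false_negative_rate).

    results is an iterable of (is_spam, flagged) booleans, one per message
    in a hand-labeled corpus.
    """
    false_pos = false_neg = ham = spam = 0
    for is_spam, flagged in results:
        if is_spam:
            spam += 1
            if not flagged:
                false_neg += 1  # spam that slipped through
        else:
            ham += 1
            if flagged:
                false_pos += 1  # legitimate mail wrongly flagged
    fp_rate = false_pos / ham if ham else 0.0
    fn_rate = false_neg / spam if spam else 0.0
    return fp_rate, fn_rate


if __name__ == "__main__":
    # Toy data: 4 legitimate messages, none flagged; 5 spams, only 1 flagged.
    toy = [(False, False)] * 4 + [(True, True)] + [(True, False)] * 4
    print(error_rates(toy))  # (0.0, 0.8)
```

A filter can score perfectly on one rate while doing badly on the other, which is why both numbers are worth reporting.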

Yours, David...

--
---[ to our friends at TLAs (spread the word) ]--------------------------
Echelon North Korea Nazi cracking spy smuggle Columbia fissionable Stego
White Water strategic Clinton Delta Force militia TEMPEST Libya Mossad
---[ Postmodern Enterprises <mertz at gnosis.cx> ]--------------------------

More information about the Python-list mailing list