Whitelist/verification spam filters
marklists at mceahern.com
Tue Aug 27 21:00:10 CEST 2002
[David Mertz, Ph.D.]
> While the characterization as "evil" is just plain silly, I agree with
> some of the criticism of this style of spam filtering. Unlike McEahern,
> I have a quite large set of people who contact me. A lot of them are
> not "regular", but are still quite legitimate--certainly at least
> hundreds of such people in the last year, say (people write me about my
> articles and my software, but perhaps only a few times close together
> for a brief conversation).
I'm tempted to say: then keep separate public and private email addresses.
Of course, that defeats the purpose. And what exactly is that purpose? To
give out my one true email address freely, without concern for spam.
The appeal, to me, of the whitelist technique is that for the addresses in
my whitelist, there will be no false positives. Of course, that statement
relies on distinguishing between addresses and people:
mom at yahoo.com <> mom_forgot_her_password_and_setup_a_new_one at yahoo.com.
> A lot of my correspondents have flakey email systems, and might miss the
> confirmation requests. [...]
I appreciate the thoroughness of your list of ways someone could fail to
confirm. Is it not sufficient that these messages aren't simply discarded,
but sit somewhere you can review them and manually whitelist their
senders? Of course, if the addresses are transient, there's no point in
whitelisting them.
> I am quite certain that using a whitelist/verification
> system would wind up excluding a significant number of messages that I
> would otherwise wish to receive.
If the whitelist system (such as TMDA) allows you to view the messages that
have not been confirmed, then it's not excluding them, it's merely filtering
them. So, in that sense, it's better than nothing (unless you view the
confirmation request as a distinctly negative thing).
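To make the filtering-versus-excluding distinction concrete, here is a
minimal sketch in Python. It is not TMDA's actual implementation; the
function names triage and approve, and the (sender, body) message
representation, are made up for illustration. The point is only that
unconfirmed mail lands in a pending list you can review, and nothing is
thrown away.

```python
def triage(messages, whitelist):
    """Split (sender, body) pairs into inbox and pending lists.

    Hypothetical helper, not TMDA's API. Nothing is discarded:
    mail from unknown senders is held in `pending` for review.
    """
    inbox, pending = [], []
    for sender, body in messages:
        if sender in whitelist:
            inbox.append((sender, body))
        else:
            pending.append((sender, body))
    return inbox, pending


def approve(sender, whitelist, inbox, pending):
    """Manually whitelist a sender and release their held messages."""
    whitelist.add(sender)
    released = [m for m in pending if m[0] == sender]
    # Keep only the messages from other, still-unconfirmed senders.
    pending[:] = [m for m in pending if m[0] != sender]
    inbox.extend(released)
    return released
```

Calling approve() on a sender found while skimming the pending list moves
their held messages into the inbox, which is the manual review step
described above.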
> I am writing an article comparing spam filtering techniques for IBM
> developerWorks, as it happens. I will discuss a number of distinct
> techniques, including the whitelist/verification approach. Part of my
> article is quantitative testing of false positive and false negative
> categorization of large corpora I developed (i.e. selected from my email
> archives). I don't really know any way to include the
> whitelist/verification approach in the quantitative data,
> unfortunately--it can't be used against my saved collections of
> messages, of course.
Well, I may not be understanding you here. But let's say you have a body of
messages: M. Take the subset of them up to a certain point in time: M1.
Take all the From addresses in M1 and add them to the whitelist. Then see
how many messages in M-M1 are from addresses not harvested from M1. Those
are the messages that would have been held for confirmation.
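The M / M1 procedure could be sketched in Python roughly as follows. This
is only an illustration under some assumptions: the corpus is a list of
(timestamp, From header) pairs, the name whitelist_misses is made up, and
addresses are compared case-insensitively (a simplification, since the
local part of an address can technically be case-sensitive).

```python
from email.utils import parseaddr


def sender(from_header):
    """Return the bare, lower-cased address from a From header."""
    return parseaddr(from_header)[1].lower()


def whitelist_misses(messages, cutoff):
    """Count later messages whose sender was not seen up to `cutoff`.

    `messages` is a list of (timestamp, from_header) pairs (an assumed
    representation). Returns (misses, total), where total is the number
    of messages in M - M1.
    """
    # M1: harvest every From address seen up to the cutoff.
    whitelist = {sender(frm) for ts, frm in messages if ts <= cutoff}
    # M - M1: everything after the cutoff.
    later = [sender(frm) for ts, frm in messages if ts > cutoff]
    misses = sum(1 for addr in later if addr not in whitelist)
    return misses, len(later)


corpus = [
    (1, "Mom <mom@example.com>"),
    (2, "bob@example.org"),
    (3, "MOM@example.com"),
    (4, "New Reader <reader@example.net>"),
    (5, "bob@example.org"),
]
print(whitelist_misses(corpus, cutoff=2))
```

With this toy corpus, the only post-cutoff message from an unharvested
address is the one from reader@example.net. Run separately over a spam
corpus and a legitimate corpus, the same count would give rough figures for
how much of each a whitelist alone lets through or holds back.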
I look forward to your article.
More information about the Python-list