OT: spam filtering idea

Mon Jan 13 19:39:12 EST 2003

On Mon, 13 Jan 2003 15:42:35 -0500, Tim Peters wrote:
> He's probably right that the way to beat this generation of filters is to
> create spam statistically indistinguishable from ham.  The unknown not
> addressed there is that all forms of advertising are a percentage game, and
> current spam uses (e.g.) ALL CAPS and huge fonts and bright colors because
> those tricks increase response rate.  Spam so bland that it looks like it
> came from your grandmother may not draw a response rate large enough to
> repay the costs of spamming (which, while tiny on a per-msg basis, aren't
> zero).

I couldn't think of a reasonable way to predict the results of that,
because, as I think I mentioned in another posting, there are two big
unknowns: The nature of the people responding to the spams (have you ever
really thought about it? who the hell is keeping these things afloat? In
all seriousness, my current theory is that we're talking people of reduced
intelligence, but I don't *know*.), and how close the spam industry may be
to economic collapse, such that Bayes-type filters (which *are*
legitimately better than previous approaches) may be enough to tip them
over the edge. Without more data about those two things it's hard to
predict what will happen if spam tones down.

Somewhat back on the Python topic, once SpamBayes is done I intend to see
if I can implement what I talked about. It's just not worth picking up an
implementation in another language when it'd probably be a small handful
of hours' work in Python...