[Python-Dev] mail.python.org black listed ?!

Skip Montanaro skip@pobox.com
Mon, 23 Jul 2001 17:11:22 -0500

    BAW> - The filter marks the message with a % confidence of being spam
    BAW>   (e.g. X-Spam: 75%)

    BAW> - Each Mailman recipient could specify the threshold above which
    BAW>   they do not want to receive the message (e.g. don't send me
    BAW>   anything that's spam with a more than 70% confidence level).
    BAW>   This only works for regular delivery.
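As a rough illustration of the quoted proposal, here is a minimal Python sketch of per-recipient threshold filtering. It assumes the header looks exactly like the `X-Spam: 75%` example; the recipient addresses and the `spam_confidence` and `should_deliver` helpers are hypothetical and not part of Mailman.

```python
import email
from email.message import Message
from typing import Optional


def spam_confidence(msg: Message) -> Optional[int]:
    """Return the percentage from an 'X-Spam: NN%' header (assumed format), or None."""
    value = msg.get("X-Spam")
    if value is None:
        return None
    try:
        return int(value.strip().rstrip("%"))
    except ValueError:
        return None  # unparseable header: treat as unrated


def should_deliver(msg: Message, threshold: int) -> bool:
    """Deliver unless the spam confidence is above the recipient's threshold."""
    confidence = spam_confidence(msg)
    return confidence is None or confidence <= threshold


# Hypothetical per-recipient thresholds, in percent.
thresholds = {"alice@example.com": 70, "bob@example.com": 90}

raw = "From: someone@example.org\nX-Spam: 75%\nSubject: hi\n\nbody\n"
msg = email.message_from_string(raw)
recipients = [addr for addr, limit in thresholds.items() if should_deliver(msg, limit)]
print(recipients)  # ['bob@example.com']
```

Unrated messages are delivered to everyone here; that default is a design choice of this sketch, not part of the proposal.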

One thing to consider is that many mail filters probably have only crude
numeric comparison capability.  In procmail I have to filter using regular
expressions.  (Most of) my mail comes through pobox.com, which modifies the
Subject header to something like

    Subject: [spam score 9.00/10.0 -pobox] remove me

While I'm sure I could create a regular expression that would allow me to
classify pobox.com's spam score numerically (or call out to a Python script
to do it for me), I'm lazy enough that I simply toss everything with
spam.*-pobox in the Subject into the spam-hole.  (I think 5.0/10.0 is their
minimum score for subject mangling.)  I assume other mail software systems'
filtering capabilities are similarly limited.
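For comparison, the "call out to a Python script" option might look roughly like the sketch below, next to the lazy `spam.*-pobox` rule. The tag pattern is inferred from the single Subject example above, so other pobox.com tag variants are assumed not to occur; `pobox_score` and `is_spam_lazy` are hypothetical names.

```python
import re
from typing import Optional

# Assumed tag format, based only on "[spam score 9.00/10.0 -pobox]".
POBOX_TAG = re.compile(r"\[spam score (\d+(?:\.\d+)?)/(\d+(?:\.\d+)?) -pobox\]")


def pobox_score(subject: str) -> Optional[float]:
    """Return pobox.com's score as a fraction of its maximum, or None if untagged."""
    m = POBOX_TAG.search(subject)
    if m is None:
        return None
    score, maximum = float(m.group(1)), float(m.group(2))
    return score / maximum


def is_spam_lazy(subject: str) -> bool:
    """The crude rule: anything matching spam.*-pobox goes to the spam-hole."""
    return re.search(r"spam.*-pobox", subject) is not None


subject = "[spam score 9.00/10.0 -pobox] remove me"
print(pobox_score(subject))        # 0.9
print(is_spam_lazy(subject))       # True
print(pobox_score("Re: meeting"))  # None
```

The numeric version lets you pick your own cutoff above pobox.com's minimum; the lazy rule just accepts whatever pobox.com decided was worth mangling.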

I would therefore suggest that the X-Spam header be simply a three-digit
number in the range 000 to 100.  (No percent sign, always with any necessary
leading zeroes.)  It might be even better to create an X-Spam-Value header
in unary ("one-bit") arithmetic: use a slightly smaller range (say 0 to 50)
and include a header like:

    X-Spam-Value: sssssssssssssssssssssssssssssssssss

to indicate a 70% likelihood (35 of a possible 50 "s"s).  You could then match it with

    X-Spam-Value: s{25,50}

in procmail to spam-categorize anything with a probability of spamhood >=
50% (25 or more "s"s).  You could also include a human-readable X-Spam
header like:

    X-Spam: rated 75% probability of being spam by "Spam Pie v. 0.1"
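To show why both encodings suit regex-only filters, here is a minimal Python sketch that produces the zero-padded three-digit value and the unary X-Spam-Value, and builds threshold patterns for each. The function names are hypothetical, and mapping one "s" to 2% (rounding odd percentages down) is an assumption based on the 0 to 50 range above. The fixed-width pattern uses only grouping, alternation, and character classes; for the unary form, a literal run of the minimum number of "s"s is enough even in a dialect without interval syntax like `{25,50}`, since any longer run contains it.

```python
import re


def fixed_width_value(pct: int) -> str:
    """X-Spam as a zero-padded three-digit number, 000 to 100."""
    if not 0 <= pct <= 100:
        raise ValueError("percentage must be in 0..100")
    return f"{pct:03d}"


def unary_value(pct: int) -> str:
    """X-Spam-Value as a run of 's' on a 0..50 scale (assumed: one 's' per 2%, rounded down)."""
    if not 0 <= pct <= 100:
        raise ValueError("percentage must be in 0..100")
    return "s" * (pct // 2)


def fixed_width_pattern(min_pct: int) -> str:
    """Regex matching three-digit values >= min_pct (use with a full-string match)."""
    if min_pct >= 100:
        return "100"
    tens, units = divmod(min_pct, 10)
    parts = [f"0{tens}[{units}-9]"]  # same tens digit, units >= min
    if tens < 9:
        parts.append(f"0[{tens + 1}-9][0-9]")  # any higher tens digit
    parts.append("100")
    return "(" + "|".join(parts) + ")"


def unary_pattern(min_pct: int) -> str:
    """Literal run of 's' for a unary threshold; any longer run also matches."""
    return "s" * ((min_pct + 1) // 2)  # round the threshold up to whole 's' characters


print(fixed_width_value(70))       # 070
print(len(unary_value(70)))        # 35
print(fixed_width_pattern(50))     # (05[0-9]|0[6-9][0-9]|100)
print(bool(re.fullmatch(fixed_width_pattern(50), "070")))      # True
print(bool(re.search(unary_pattern(50), unary_value(70))))     # True
print(bool(re.search(unary_pattern(80), unary_value(70))))     # False
```

Zero-padding is what makes the fixed-width pattern work: without it, "7" and "70" would need separately anchored patterns. The unary form trades header length for a threshold test that is a plain substring match.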