[Spambayes] Slice o' life
Wed Oct 16 05:33:01 2002
> This has been my first chance to play with mining the headers for real:
> mine_received_headers: True
> basic_header_tokenize: True
> use_chi_squared_combining: True
And now I note the first systematic weakness: I scored my own "spam"
folder, and discovered 5 spam with scores of 0.0. They all have one thing
in common: they're spam that SpamAssassin didn't catch, and came to me via
a python.org mailing list.
It turns out that python.org, Mailman, and SpamAssassin put sooooooooo many
unique "Hey, I had my fingers on this!" clues in the headers that virtually any
message coming thru python.org has a relatively huge collection of
killer-strong ham clues (just listing headers containing such clues):
Received: from mail.python.org (mail.python.org [188.8.131.52]) ...
Received: from localhost.localdomain ([127.0.0.1] helo=mail.python.org)
by mail.python.org with esmtp (Exim 4.05) ...
Received: from [184.108.40.206] (helo=wvwrbn) by mail.python.org ...
Subject: [Python-Help] Mp3sa hwnf
X-warning: 220.127.116.11 in blacklist at list.dsbl.org
X-Spam-Status: No, hits=3.8 required=5.0
X-Mailman-Version: 2.0.13 (101270)
List-Id: Expert volunteers answer Python-related questions
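To see why a pile of clues like these can drag a score all the way to 0, here is a rough sketch of Fisher-style chi-squared combining. It assumes the combiner works from per-token spam probabilities in this way; the function names and the probabilities (0.01 for a strong ham clue, 0.99 for a strong spam clue) are made up for illustration and are not taken from a real run.

```python
import math

def chi2q(x2, dof):
    """Chance that a chi-squared variable with `dof` (even) degrees of
    freedom exceeds x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into a single 0..1 score."""
    n = len(probs)
    # S is the spam indicator, H is the ham indicator.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

# Illustrative numbers only: three strong spam clues from the body,
# then the same three swamped by twenty strong ham clues from headers.
body_only = combine([0.99] * 3)
via_list = combine([0.99] * 3 + [0.01] * 20)
```

With these made-up inputs, body_only lands near the spam end and via_list lands near the ham end, which is the same pattern as the scores described in this message.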
This was an HTML msg that appeared to be pushing a Turkish MP3 site. It's
not a dead-easy msg to score, but I also got a copy from another email
account, and it scored 0.64 there (instead of 0 via python.org). I guess I'll
go back to ignoring various header lines again ...
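One way "ignoring various header lines" might look, as a minimal sketch using only the standard email package: drop the headers the list machinery adds and strip the list tag from the Subject before the message reaches the tokenizer. The ignore list and the strip_list_clues name are hypothetical, and the Received lines are left alone here since deciding which of those to keep needs more care.

```python
import re
from email import message_from_string

# Hypothetical ignore list: headers added by the list server or its
# spam filter, taken from the examples above.
LIST_ADDED_HEADERS = ("X-Spam-Status", "X-Mailman-Version",
                      "List-Id", "X-warning")

# Matches a leading "[Listname] " tag such as "[Python-Help] ".
LIST_TAG = re.compile(r"^\[[^\]]+\]\s*")

def strip_list_clues(raw):
    """Return a parsed message with list-added header clues removed."""
    msg = message_from_string(raw)
    for name in LIST_ADDED_HEADERS:
        # Deleting is case-insensitive, removes every occurrence,
        # and is a no-op when the header is absent.
        del msg[name]
    subject = msg["Subject"]
    if subject is not None:
        msg.replace_header("Subject", LIST_TAG.sub("", subject))
    return msg
```

The stripped message can then be handed to whatever tokenizer is in use, so the only header clues left are ones the sender (or the mail path before the list) supplied.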