[Spambayes] Important: new feature!
h.beijeman at phys.uu.nl
Mon Jul 5 01:01:16 CEST 2004
I understand what you're saying, and I've re-trained spambayes on equal
numbers of HAM and SPAM, 300 each. Yet it still tends to classify some
HAM as spam: roughly 25% of my regular announcement mail gets flagged
as spam...
Perhaps I have a better solution: I get a lot of mail traffic from the
local university network, and a lot of correspondence, announcements etc.
arrive from addresses with the "phys.uu.nl" suffix. Surely it must be
possible to exclude messages whose return address contains that string (or
other certified addresses) from further processing by spambayes?
Thank you very much for your time!!
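
To make the request concrete, here is a rough sketch of the kind of check I have in mind. It is not part of spambayes; the names TRUSTED_SUFFIXES and is_trusted_sender are made up for illustration, and it only uses the standard Python email module. Note that From and Return-Path headers are easy to forge, so a check like this is only as trustworthy as those headers are.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical allow-list of domain suffixes; not a spambayes option.
TRUSTED_SUFFIXES = ("phys.uu.nl",)


def is_trusted_sender(raw_message, trusted_suffixes=TRUSTED_SUFFIXES):
    """Return True if a From/Return-Path address belongs to a trusted domain."""
    msg = message_from_string(raw_message)
    headers = msg.get_all("From", []) + msg.get_all("Return-Path", [])
    for _name, addr in getaddresses(headers):
        if "@" not in addr:
            continue  # skip empty or malformed addresses
        domain = addr.rpartition("@")[2].lower()
        for suffix in trusted_suffixes:
            # Match the domain itself or any subdomain of it, so that
            # "notphys.uu.nl" does not slip through on a plain substring test.
            if domain == suffix or domain.endswith("." + suffix):
                return True
    return False
```

A message for which is_trusted_sender() returns True would then be delivered directly and never handed to the classifier.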
From: Tony Meyer [mailto:tameyer at ihug.co.nz]
Sent: Thursday, 1 July 2004 1:30
To: 'beijeman'; spambayes at python.org
Subject: RE: [Spambayes] Important: new feature!
> I checked the "Show Spam Clues" on several of these
> cases and it appears that only some specific tokens
> are responsible. It would be very convenient to have
> direct access to the token-database and delete these
> troublesome cases.
This is not nearly as simple as it sounds:
1. There's no guarantee that this would work. Maybe eliminating those
tokens would simply make room for others that would have the same
effect.
2. Tokens don't stand alone - they arrive and leave in 'bags' (one
message's worth). If you later untrained on an email that contained one
of these tokens, things would break.
3. Ongoing training would mean that these tokens simply reappeared
anyway, so you'd have to be continually editing the database.
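
Point 2 can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how SpamBayes actually stores its data: the ToyTokenStore class and its methods are hypothetical, and it tracks spam counts only.

```python
from collections import Counter


class ToyTokenStore:
    """Toy per-token message counts; NOT the real SpamBayes database."""

    def __init__(self):
        self.spam_counts = Counter()
        self.nspam = 0

    def train_spam(self, tokens):
        # A message adds its whole bag of distinct tokens at once.
        self.spam_counts.update(set(tokens))
        self.nspam += 1

    def untrain_spam(self, tokens):
        # Untraining removes the same bag, so every token must still be there.
        bag = set(tokens)
        missing = sorted(tok for tok in bag if self.spam_counts[tok] <= 0)
        if missing:
            raise ValueError("no count left to remove for: " + ", ".join(missing))
        self.spam_counts.subtract(bag)
        self.nspam -= 1


store = ToyTokenStore()
message = ["meeting", "announcement", "free"]
store.train_spam(message)

# Hand-editing the database: delete one "troublesome" token.
del store.spam_counts["announcement"]

try:
    store.untrain_spam(message)
    error = None
except ValueError as exc:
    error = str(exc)  # the bag no longer matches what was trained
```

In this toy version the mismatch is detected and reported; a store without such a check would instead end up with counts that no longer agree with the number of trained messages.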
Can you perhaps identify the messages that these problematic tokens are
coming from? Maybe things would be better if you simply didn't train on
those messages? What is a "significant percentage", anyway?
Maybe you've mistrained a message at some point? That would also explain
the problem. You could try retraining from scratch.
> In addition, it would be nice to have some exclude list.
Please see FAQ 6.6:
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. That
way, you get everyone's help, and avoid a lack of replies when I'm busy.