KNOWN_SPAMMERS + SpamAssassin discarding lots of mail

I have this in my mm_cfg.py:
KNOWN_SPAMMERS = [('X-Spam-Status', 'Yes')]
But for some reason this is discarding lots of mail. Is there some way I can turn up the debugging so I can see why these messages are being discarded? Where do these discarded messages go?
Thanks
Joel Heenan

Joel Heenan wrote:
The messages are being discarded because they have an
X-Spam-Status: Yes
(case insensitive) header. The Message-IDs are logged in the vette log with the message 'Message discarded, msgid: %s' where %s is replaced by the message-id or 'n/a' if there isn't one. There is no debugging knob other than modifying the code. The discarded messages evaporate without a trace.
If you really think this is resulting in messages being discarded that shouldn't be, I suggest you remove the KNOWN_SPAMMERS entry from mm_cfg.py and instead, put
^X-Spam-Status:\s*Yes
in header_filter_rules for some of the problem lists with action Hold. That way you'll get to see the messages.
-- Mark Sapiro <msapiro@value.net>, San Francisco Bay Area, California. The highway is for gamblers, better use your sense - B. Dylan

On 2/28/06, Mark Sapiro <msapiro@value.net> wrote:
I just got done configuring something very similar, but I was having problems. I was originally matching against '^X-Spam-Status: Yes', which never matched. Removing the '^' did the trick. I'm not sure why this is.
--
- Patrick Bogen
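
One possible explanation for the '^' behaviour, offered only as a sketch of how Python's re module treats anchors and not as a statement about what Mailman's own code does: without re.MULTILINE, '^' matches only at the very start of the searched string, so an anchored pattern cannot match a header that is not the first line of a multi-line block. The header text below is made up for illustration.

```python
import re

# Hypothetical block of headers, searched as one string (an assumption
# for illustration; Mailman's actual matching code is not shown here).
headers = (
    "Received: from mail.example.com\n"
    "X-Spam-Status: Yes, score=12.3 required=5.0\n"
    "Subject: hello\n"
)

anchored = r"^X-Spam-Status: Yes"
unanchored = r"X-Spam-Status: Yes"

# Without re.MULTILINE, '^' only matches at position 0 of the whole string.
no_flag_anchored = re.search(anchored, headers, re.IGNORECASE) is not None
no_flag_unanchored = re.search(unanchored, headers, re.IGNORECASE) is not None

# With re.MULTILINE, '^' also matches right after every newline.
multiline_anchored = (
    re.search(anchored, headers, re.IGNORECASE | re.MULTILINE) is not None
)

print(no_flag_anchored, no_flag_unanchored, multiline_anchored)
```

If the anchored rule starts matching once re.MULTILINE is in effect, the missing flag is the likely cause; if not, something else is going on.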

Patrick Bogen wrote:
I'll do some research if I get time today, but I'm fairly sure something is borked with this spam filtering. Looking through the code I can see that if a header is not found it's not supposed to return a spam match. The other thing I've noticed is that owners don't seem to get spam scanned. Still, our behaviour was that
KNOWN_SPAMMERS = [('X-Spam-Status','Yes')]
caused all emails from non-owners to be silently discarded.
At this stage I'm following the advice by not using KNOWN_SPAMMERS and concentrating on per list measures. Thanks for your help.
Joel Heenan Sensory Networks Ph: +61 2 8322 2744 Fax: +61 2 9475 0316

Mark Sapiro wrote:
The messages are being discarded because they have an
X-Spam-Status: Yes
Mark, this isn't strictly correct, I think. cre.search() is going to look for any place in the string where the regex matches, so they're *actually* being discarded because they have a header: X-Spam-Status: .*Yes.*
(Assuming the leading space is stripped when the header value is stored in a message- this seems like reasonable behaviour to me, but I'm not sure what the protocol says about spaces there.)
Now, here's the problem with this.
X-Spam-Status for a non-spam message may look like:
X-Spam-Status: No, score=-5.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 ...
(and keeps going for a while.)
As should be pretty obvious, 'Yes' matched case-insensitively is found in 'BAYES'. This won't occur with the header_filter_rules, because they match the header as a whole line rather than treating the value separately. So, as an alternative, it should be possible to use a KNOWN_SPAMMERS entry of ('X-Spam-Status', '^Yes').
On 3/1/06, Joel Heenan <joel.heenan@sensorynetworks.com> wrote:
- Patrick Bogen
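
The false positive described above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the check boils down to a case-insensitive re.search() on the header value, as described in this thread; the matches() helper and the spam sample value are made up for illustration and are not Mailman code.

```python
import re

# Header value of a non-spam message, taken from the example in this thread.
ham_value = "No, score=-5.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00"
# Made-up header value for a message SpamAssassin flagged as spam.
spam_value = "Yes, score=12.3 required=5.0 tests=BAYES_99"


def matches(pattern, value):
    """Hypothetical helper: case-insensitive search anywhere in the value."""
    return re.search(pattern, value, re.IGNORECASE) is not None


# Unanchored 'Yes' hits the 'YES' inside 'BAYES', so ham is flagged too.
print(matches("Yes", ham_value), matches("Yes", spam_value))

# Anchoring at the start of the value separates the two cases.
print(matches("^Yes", ham_value), matches("^Yes", spam_value))
```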

Patrick Bogen wrote:
Or, you should use: ('x-spam-flag', 'yes'),
Cheers,
Tokio Kikuchi, tkikuchi@ is.kochi-u.ac.jp http://weather.is.kochi-u.ac.jp/

Patrick Bogen wrote:
You are correct. I was thinking of SpamAssassin's X-Spam-Flag: header, which is either "X-Spam-Flag: YES" or absent, and which is much safer to test than X-Spam-Status: for exactly these reasons.
SpamDetect, when checking KNOWN_SPAMMERS, uses the get_all() message method to get the contents of all the headers of a type, in this case a list of the contents of all the X-Spam-Status: headers in the message. Leading spaces between "X-Spam-Status:" and the first non-blank of the rest of the header are stripped, but trailing spaces, if any, are not.
Correct, but I'd still use X-Spam-Flag:.
I think that's correct and may well explain why they were all deleted.
-- Mark Sapiro <msapiro@value.net>, San Francisco Bay Area, California. The highway is for gamblers, better use your sense - B. Dylan
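
The get_all() behaviour described above, and the suggested X-Spam-Flag test, can be tried out with the standard email package. This is a rough sketch under stated assumptions: it uses the current Python 3 standard library rather than the Python version Mailman ran on at the time, the sample message is made up, and header_matches() is a hypothetical stand-in for the KNOWN_SPAMMERS check, not SpamDetect's actual code.

```python
import email
import re

# Made-up message; note the extra spaces before and after the status value.
raw = (
    "From: someone@example.com\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status:    Yes, score=12.3  \n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(raw)

# Header names are looked up case-insensitively. Leading whitespace in the
# value is stripped when the message is parsed; trailing spaces are kept.
status_values = msg.get_all("x-spam-status", [])
flag_values = msg.get_all("X-Spam-Flag", [])
print(repr(status_values), repr(flag_values))


def header_matches(message, header, pattern):
    """Hypothetical check: True if any value of `header` matches `pattern`."""
    values = message.get_all(header, [])
    return any(re.search(pattern, v, re.IGNORECASE) for v in values)


# A message SpamAssassin did not flag has no X-Spam-Flag header at all.
clean = email.message_from_string("Subject: hi\n\nbody\n")
print(header_matches(msg, "x-spam-flag", "yes"),
      header_matches(clean, "x-spam-flag", "yes"))
```

Because X-Spam-Flag is absent on unflagged mail, a pattern like 'yes' has no ham value to match by accident.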

participants (4): Joel Heenan, Mark Sapiro, Patrick Bogen, Tokio Kikuchi