[spambayes-dev] Cool Outlook mystery
Tim Peters
tim.one at comcast.net
Thu Aug 7 20:20:44 EDT 2003
Our bug 782709 is pretty interesting! Tony just added a good clue to it.
I'll partly confirm it here, and add another bit of evidence.
After retraining and rescoring from scratch, there's a particular msg in my
Ham folder showing a spam score of 3% in my Spam column. "show spam clues"
rates it much higher:
Spam Score: 0.180576
word                        spamprob    #ham  #spam
'*H*'                       0.722595    -     -
'*S*'                       0.083747    -     -
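As a cross-check, the headline figure in that listing agrees with the two indicator values under it, assuming the chi-squared combining scheme finishes by taking (S - H + 1) / 2. The sketch below is only that arithmetic; the function name is made up for illustration and is not taken from the SpamBayes source.

```python
def combine_indicators(h, s):
    """Fold the ham indicator H and spam indicator S into one score.

    Assumption: the final step of chi-squared combining is (S - H + 1) / 2,
    which maps "all ham evidence" to 0.0 and "all spam evidence" to 1.0.
    """
    return (s - h + 1.0) / 2.0


# Values copied from the '*H*' and '*S*' rows of the clue listing.
h_indicator = 0.722595
s_indicator = 0.083747

score = combine_indicators(h_indicator, s_indicator)
print(round(score, 6))
```

So the 18% shown by "show spam clues" is at least internally consistent with the H and S it reports; the disagreement is between that listing and the 3% in the Spam column.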
Some of the token scores are amazing:
'to:no real name:2**0'      0.342745    7     7
'header:To:1'               0.398161    7     9
'to:2**0'                   0.398161    7     9
'header:Date:1'             0.64742     1     4
'header:Message-Id:1'       0.764668    0     1
'subject:.'                 0.764668    0     1
'subject: '                 0.846122    0     2
'header:From:1'             0.871695    1     16
Notice I said this was a ham message, and I trained on it as ham. Therefore
it shouldn't be possible that I see *any* token (let alone 3) in this
message with a ham-count of 0. I've certainly got, e.g., way more than
1+4=5 training messages with a Date header too, and way more than 16 with a
"To" header, etc.
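The invariant being violated can be written down as a small check: every token in a message that was trained as ham must have a ham count of at least 1, because that message itself contributed to the count. The sketch below is illustrative only. The rows are copied by hand from the clue listing above, and the helper name is hypothetical rather than anything in SpamBayes.

```python
# (token, spamprob, ham_count, spam_count) rows copied from the clue listing.
clues = [
    ("to:no real name:2**0", 0.342745, 7, 7),
    ("header:To:1", 0.398161, 7, 9),
    ("to:2**0", 0.398161, 7, 9),
    ("header:Date:1", 0.64742, 1, 4),
    ("header:Message-Id:1", 0.764668, 0, 1),
    ("subject:.", 0.764668, 0, 1),
    ("subject: ", 0.846122, 0, 2),
    ("header:From:1", 0.871695, 1, 16),
]


def tokens_missing_ham_count(rows):
    """Return tokens whose ham count is 0.

    For a message that was trained as ham, this list should be empty:
    training on the message increments the ham count of each of its tokens.
    """
    return [token for token, _prob, ham_count, _spam_count in rows
            if ham_count == 0]


suspects = tokens_missing_ham_count(clues)
print(suspects)
```

On these rows the check flags the three tokens with a ham count of 0, which are the ones that should be impossible here.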
In my professional opinion, something is royally hosed <wink>. My
observations so far match Tony's that it's confined to tokens in headers, so
it's probably not a database bug.