[spambayes-bugs] [ spambayes-Bugs-782709 ] not match between actual score and what's shown in outlook

SourceForge.net noreply at sourceforge.net
Thu Aug 7 16:57:15 EDT 2003


Bugs item #782709, was opened at 2003-08-04 21:35
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=782709&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: Invalid
Priority: 5
Submitted By: Fredrik Rodland (fmmr)
Assigned to: Mark Hammond (mhammond)
Summary: not match between actual score and what's shown in outlook

Initial Comment:
I noticed this for the first time today, running the 
latest CVS.  I'm not sure how long this has been the case.

I've got folders with hams and spams, and looked at the 
spam score shown in Outlook's Spam field.  These scores do 
NOT match the actual score of a message, as shown by 
the button 'show spam clues for current message'.  This 
is still the case after scoring/filtering the messages.

Filtering the messages (with 'score, but don't perform 
action') does modify the values of the Spam field (I 
saw it during filtering), but these values are NOT equal 
to the actual values shown using the method above.

I use the Spam field as a sort criterion to track hams with 
high scores and spams with low scores, but the values are 
no good anymore.

There is no traceback in the logs during filtering or when 
viewing the manual spam score.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2003-08-08 10:57

Message:
Logged In: YES 
user_id=552329

I have to go now, but I've narrowed it down a little.  When 
filtering, for some reason the email only has a few headers 
(even doing a Filter Now, not just when it arrives).  When 
showing clues, the email has them all.  (by email, I mean the 
result of msg.GetEmailPackageObject()).

For example (the first is from filtering, the second is the same 
message with show clues).  Bodies are identical and left out.

"""
>From nobody Fri Aug 08 10:51:09 2003
Return-Path: <mrs.victoria at breathe.com>
Delivered-To: ta-meyer at backend.pop.ihug.co.nz
Received: (qmail 11735 invoked from network); 26 Feb 2003 
02:54:26 -0000
Received: from grunt3.ihug.co.nz (203.109.254.43)
	by baldrick.ihug.co.nz with SMTP; 26 Feb
From: Mrs. Victoria Rinma.
"""

"""
>From nobody Fri Aug 08 10:53:24 2003
Return-Path: <mrs.victoria at breathe.com>
Delivered-To: ta-meyer at backend.pop.ihug.co.nz
Received: (qmail 11735 invoked from network); 26 Feb 2003 
02:54:26 -0000
Received: from grunt3.ihug.co.nz (203.109.254.43)
	by baldrick.ihug.co.nz with SMTP; 26 Feb 2003 
02:54:26 -0000
Received: from wibble.net [210.55.12.113] 
	by grunt3.ihug.co.nz with esmtp (Exim 3.35 #1 
(Debian))
	id 18nriD-0006El-00; Wed, 26 Feb 2003 15:54:26 
+1300
X-Wibble-Envelope-To: <tonym at madsods.gen.nz>
Received: from orson.icl.net (orson.vip.uk.com 
[194.176.218.10])
	by wibble.net (8.9.3/8.9.3/Debian 8.9.3-21) with 
ESMTP id PAA19831
	for <tonym at madsods.gen.nz>; Wed, 26 Feb 2003 
15:54:23 +1300
From: mrs.victoria at breathe.com
X-Authentication-Warning: wibble.net: Host orson.vip.uk.com 
[194.176.218.10]
	claimed to be orson.icl.net
Received: from localhost ([127.0.0.1] helo=orson.vip.uk.com)
	by orson.icl.net with esmtp (Exim 3.16 #1)
	id 18nrBv-0000O7-00; Wed, 26 Feb 2003 02:21:03 
+0000
Content-Disposition: inline
To: mrs.victoria at breathe.com
X-Originating-Ip: 216.139.169.9
MIME-Version: 1.0
Reply-To: mrs.victoria at caramail.com
Date: Wed, 26 Feb 2003 2:21:02 GMT
X-Mailer: EMUmail 4.5
Subject: ! message !
X-Webmail-User: 
mrs.victoria*breathe.com at pophost1.breathe.com
Message-Id: <E18nrBv-0000O7-00 at orson.icl.net>
X-MIME-Autoconverted: from 8bit to base64 by wibble.net id 
PAA19831
"""

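One quick way to see exactly which headers went missing is to parse 
both dumps and compare the header names.  The following is a minimal 
sketch using Python's standard email package; the two strings are 
shortened stand-ins for the dumps above (not the full messages), and 
missing_headers is a hypothetical helper, not part of SpamBayes.

```python
from email import message_from_string


def missing_headers(partial_text, full_text):
    """Return header names present in full_text but absent from partial_text."""
    partial = message_from_string(partial_text)
    full = message_from_string(full_text)
    seen = {name.lower() for name in partial.keys()}
    # Keep the order of the full message and list each name only once.
    missing = []
    for name in full.keys():
        if name.lower() not in seen and name not in missing:
            missing.append(name)
    return missing


# Shortened stand-ins for the two dumps (assumed, for illustration only).
filtered = (
    "Return-Path: <mrs.victoria at breathe.com>\n"
    "Received: (qmail 11735 invoked from network); 26 Feb 2003 02:54:26 -0000\n"
    "From: Mrs. Victoria Rinma.\n"
    "\n"
)
shown = (
    "Return-Path: <mrs.victoria at breathe.com>\n"
    "Received: (qmail 11735 invoked from network); 26 Feb 2003 02:54:26 -0000\n"
    "From: mrs.victoria at breathe.com\n"
    "To: mrs.victoria at breathe.com\n"
    "Subject: ! message !\n"
    "X-Mailer: EMUmail 4.5\n"
    "\n"
)

print(missing_headers(filtered, shown))
```

With these stand-ins the helper reports To, Subject and X-Mailer as 
missing from the filtering-time copy.  Note that it compares header 
names only; a header whose value differs (such as From above) would 
need a separate check.
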
----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-08-08 10:10

Message:
Logged In: YES 
user_id=31435

I can confirm this:  I noticed today that the scores in my 
Spam columns looked wrong, and indeed the scores reported 
by "show spam clues" are correct, given the *H* and *S* 
scores reported there.  I did the full bit, asking to retrain the 
database from scratch, and to rescore all the messages 
trained on afterwards.  Don't know why yet, and (alas) no 
time to dig now.  Sometimes the spam score Outlook displays 
is larger than the correct value, sometimes smaller.
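
The consistency check can be reproduced by hand.  This sketch assumes 
SpamBayes' chi-squared combining, where the final score is 
(S - H + 1) / 2; the *H* and *S* values are the ones from the clue 
report quoted elsewhere in this thread, and combined_score is a name 
chosen for illustration.

```python
def combined_score(h, s):
    """Combine ham evidence h and spam evidence s into a score in [0, 1]."""
    return (s - h + 1.0) / 2.0


# *H* and *S* as printed in the "show spam clues" report in this thread.
score = combined_score(h=0.974866, s=0.00274039)
print(round(score, 6))
```

The result agrees with the reported Spam Score of 0.0139373 to within 
the rounding of the printed *H* and *S* values.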

----------------------------------------------------------------------

Comment By: Fredrik Rodland (fmmr)
Date: 2003-08-07 21:04

Message:
Logged In: YES 
user_id=724871

I went back to the latest CVS (updating all files to HEAD), 
installed Python 2.3 (and win32 ext 1.5.7), and removed 
Python 2.2 from my PC.

I then ran "filter now" on my HAM folder.  After it finished, the 
message shown was: "Found 1 spam, 11 unsure and 238 
good messages".  Previously (with the old version of 
SpamBayes) all mails in this folder were HAM.  The value of 
the Spam field in Outlook actually changed, so something is 
definitely going on.

OK - looking at one of the messages:

I inserted the print statement in the code as you instructed 
me to:

>>>>  Message  grattis med hus!  has score:  
0.92635444709 , perc:  92.635444709

However, looking at this message with "show spam clues for 
current message" returns a Spam Score of 0.0139373, i.e. 
1.4%.

The latter 1.4% is correct, i.e. the same as the old version of 
SpamBayes returned when filtering.

It seems to me as if the filtering process is using a different 
database from "show spam clues".  
Contents of C:\Documents and 
Settings\Fredrik\Programdata\SpamBayes:
	default_bayes_customize.ini
	default_bayes_database.db
	default_message_database.db
	Microsoft Outlook Internet Settings.ini



SHOW SPAM CLUES FOR CURRENT MESSAGE
===================================

Spam Score: 0.0139373


word                                spamprob         #ham  #spam
'*H*'                               0.974866            -      -
'*S*'                               0.00274039          -      -
'to:addr:fredrik'                   0.0918367           2      0
'to:addr:rodland.no'                0.0990722          45      7
'from:addr:intecengineering.com'    0.155172            1      0
'from:addr:knut.dohlen'             0.155172            1      0
'from:name:knut dohlen'             0.155172            1      0
'message-id:@exchangedelft.intec-hou.com' 0.155172      1      0
'subject:grattis'                   0.155172            1      0
'subject:hus'                       0.155172            1      0
'subject:med'                       0.155172            1      0
'subject:!'                         0.714406            8     30
'header:Received:1'                 0.739915            2      9
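
The per-token probabilities in this table can be checked for the 
tokens that were never seen in spam.  The sketch below assumes 
SpamBayes' default Bayesian adjustment (unknown_word_strength 0.45, 
unknown_word_prob 0.5).  For a token with a spam count of zero the raw 
probability is 0 whatever the corpus totals are, so only the ham count 
matters.  adjusted_prob is an illustrative name.

```python
S = 0.45  # assumed default unknown_word_strength
X = 0.5   # assumed default unknown_word_prob


def adjusted_prob(raw_prob, n):
    """Shrink raw_prob toward X; n is the token's total count (ham + spam)."""
    return (S * X + n * raw_prob) / (S + n)


# Tokens with #spam == 0 have a raw probability of 0.0.
one_ham = adjusted_prob(0.0, 1)   # e.g. 'subject:hus' (1 ham, 0 spam)
two_ham = adjusted_prob(0.0, 2)   # e.g. 'to:addr:fredrik' (2 ham, 0 spam)
print(round(one_ham, 6), round(two_ham, 7))
```

Both values agree with the spamprob column above (0.155172 and 
0.0918367).  Rows with a non-zero spam count cannot be checked this 
way, because the total ham and spam message counts are not shown.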

Message Stream:


Return-Path: <Knut.Dohlen at intecengineering.com>
Received: from exchangedelft.intec-hou.com 
([195.64.83.132])
	by ally.servicenett.no (8.12.8/8.12.8) with ESMTP id 
h4F9t44f024912
	for <fredrik at rodland.no>; Thu, 15 May 2003 
11:55:09 +0200
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Subject: grattis med hus!
X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0
Date: Thu, 15 May 2003 11:55:00 +0200
Message-ID: 
<AE4AEBADBE3523498B45FB184E32921B101EDA at exchangedelf
t.intec-hou.com>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: grattis med hus!
Thread-Index: AcMayAzVrfaY8ftMT6+n6YK7bFSqgA==
From: "Knut Dohlen" <Knut.Dohlen at intecengineering.com>
To: <fredrik at rodland.no>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by 
ally.servicenett.no id
	h4F9t44f024912
Status: 



Message Tokens:

26 unique tokens

'cc:none'
'content-type:text/plain'
'from:addr:intecengineering.com'
'from:addr:knut.dohlen'
'from:name:knut dohlen'
'header:Date:1'
'header:From:1'
'header:MIME-Version:1'
'header:Message-ID:1'
'header:Received:1'
'header:Return-Path:1'
'header:Subject:1'
'header:To:1'
'message-id:@exchangedelft.intec-hou.com'
'reply-to:none'
'sender:none'
'subject: '
'subject:!'
'subject:grattis'
'subject:hus'
'subject:med'
'to:2**0'
'to:addr:fredrik'
'to:addr:rodland.no'
'to:no real name:2**0'
'x-mailer:none'


RESULT OF dump_props.py FOR SAME MESSAGE:
=========================================
0x1035001e          : '<AE4AEBADBE3523498B45FB184E32921B1
01EDA at exchangedelft.intec-hou.com>'
0x1046001e          : 'Knut.Dohlen at intecengineering.com'
0x3fde0003          : 28591
0x80000003          : 2
0x8084001e          : 'FMR at RODLAND'
0x8085001e          : '00000008\x01frodland at aston.no'
PR_ACCESS           : 3
PR_ACCESS_LEVEL     : 1
PR_CLIENT_SUBMIT_TIME: <PyTime:15.05.2003 09:55:00>
PR_CONVERSATION_TOPIC_A: 'grattis med hus!'
PR_CREATION_TIME    : <PyTime:15.05.2003 10:01:04>
PR_DISPLAY_BCC_A    : ''
PR_DISPLAY_CC_A     : ''
PR_DISPLAY_TO_A     : 'fredrik at rodland.no'
PR_ENTRYID          : '\x00\x00\x00\x00\xee.-\x06\x1a!
\xebJ\x91\xb6X\x19>\xb1r\xac\xa4UU\x00'
PR_HASATTACH        : False
PR_IMPORTANCE       : 1
PR_LAST_MODIFICATION_TIME: <PyTime:07.08.2003 
08:46:25>
PR_MAPPING_SIGNATURE: '\xee.-\x06\x1a!\xebJ\x91
\xb6X\x19>\xb1r\xac'
PR_MESSAGE_ATTACHMENTS: 1
PR_MESSAGE_CLASS_A  : 'IPM.Note'
PR_MESSAGE_DELIVERY_TIME: <PyTime:15.05.2003 09:55:09>
PR_MESSAGE_FLAGS    : 1
PR_MESSAGE_RECIPIENTS: 1
PR_MESSAGE_SIZE     : 2405
PR_NORMALIZED_SUBJECT_A: 'grattis med hus!'
PR_OBJECT_TYPE      : 5
PR_PARENT_ENTRYID   : '\x00\x00\x00\x00\xee.-\x06\x1a!
\xebJ\x91\xb6X\x19>\xb1r\xac\x02\x8d\x00\x00'
PR_PRIORITY         : 0
PR_RCVD_REPRESENTING_ADDRTYPE_A: 'SMTP'
PR_RCVD_REPRESENTING_EMAIL_ADDRESS_A: 'frodland at aston
.no'
PR_RCVD_REPRESENTING_ENTRYID: '\x00\x00\x00\x00
\x81+\x1f\xa4\xbe\xa3\x10\x19\x9dn\x00\xdd\x01\x0fT\x02
\x00\x00\x00\x00Fredrik 
Rodland\x00SMTP\x00frodland at aston.no\x00'
PR_RCVD_REPRESENTING_NAME_A: 'Fredrik Rodland'
PR_RCVD_REPRESENTING_SEARCH_KEY: 'SMTP:FRODLAND at AS
TON.NO\x00'
PR_RECEIVED_BY_ADDRTYPE_A: 'SMTP'
PR_RECEIVED_BY_EMAIL_ADDRESS_A: 'frodland at aston.no'
PR_RECEIVED_BY_ENTRYID: '\x00\x00\x00\x00\x81+\x1f\xa4
\xbe\xa3\x10\x19\x9dn\x00\xdd\x01\x0fT\x02\x00\x00\x00
\x00Fredrik Rodland\x00SMTP\x00frodland at aston.no\x00'
PR_RECEIVED_BY_NAME_A: 'Fredrik Rodland'
PR_RECEIVED_BY_SEARCH_KEY: 'SMTP:FRODLAND at ASTON.NO
\x00'
PR_RECORD_KEY       : '\xa4UU\x00'
PR_SEARCH_KEY       : '@WT\x17\x13\xd1\xeeK\x85\x06
\x8e^\xa9\xe0\x16d'
PR_SENDER_ADDRTYPE_A: 'SMTP'
PR_SENDER_EMAIL_ADDRESS_A: 'Knut.Dohlen at intecengineerin
g.com'
PR_SENDER_ENTRYID   : '\x00\x00\x00\x00\x81+\x1f\xa4
\xbe\xa3\x10\x19\x9dn\x00\xdd\x01\x0fT\x02\x00\x00\x00
\x00Knut 
Dohlen\x00SMTP\x00Knut.Dohlen at intecengineering.com\x00'
PR_SENDER_NAME_A    : 'Knut Dohlen'
PR_SENDER_SEARCH_KEY: 'SMTP:KNUT.DOHLEN at INTECENGINE
ERING.COM\x00'
PR_SENSITIVITY      : 0
PR_SENT_REPRESENTING_ADDRTYPE_A: 'SMTP'
PR_SENT_REPRESENTING_EMAIL_ADDRESS_A: 'Knut.Dohlen at in
tecengineering.com'
PR_SENT_REPRESENTING_ENTRYID: '\x00\x00\x00\x00
\x81+\x1f\xa4\xbe\xa3\x10\x19\x9dn\x00\xdd\x01\x0fT\x02
\x00\x00\x00\x00Knut 
Dohlen\x00SMTP\x00Knut.Dohlen at intecengineering.com\x00'
PR_SENT_REPRESENTING_NAME_A: 'Knut Dohlen'
PR_SENT_REPRESENTING_SEARCH_KEY: 'SMTP:KNUT.DOHLEN@
INTECENGINEERING.COM\x00'
PR_STORE_ENTRYID    : '\x00\x00\x00\x008\xa1\xbb\x10\x05
\xe5\x10\x1a\xa1\xbb\x08\x00+*V\xc2\x00
\x00PSTPRX.DLL\x00\x00\x00\x00\x00\x00\x00\x00NITA\xf9
\xbf\xb8\x01\x00\xaa\x007\xd9n\x00\x00\x00C:\Documents 
and Settings\Fredrik\Lokale 
innstillinger\Programdata\Microsoft\Outlook\outlook.pst\x00
'
PR_STORE_RECORD_KEY : '\xee.-\x06\x1a!\xebJ\x91
\xb6X\x19>\xb1r\xac'
PR_STORE_SUPPORT_MASK: 79869
PR_SUBJECT_A        : 'grattis med hus!'
PR_SUBJECT_PREFIX_A : ''
PR_TNEF_CORRELATION_KEY: '\x00'
PR_TRANSPORT_MESSAGE_HEADERS_A: 'Return-Path: 
<Knut.Dohlen at intecengineering.com>\r\nReceived: from 
exchangedelft.intec-hou.com ([195.64.83.132])\r\n\tby 
ally.servicenett.no (8.12.8/8.12.8) with ESMTP id 
h4F9t44f024912\r\n\tfor <fredrik at rodland.no>; Thu, 15 May 
2003 11:55:09 +0200\r\ncontent-class: urn:content-
classes:message\r\nMIME-Version: 1.0\r\nContent-Type: 
text/plain;\r\n\tcharset="iso-8859-1"\r\nSubject: grattis med 
hus!\r\nX-MimeOLE: Produced By Microsoft Exchange 
V6.0.6249.0\r\nDate: Thu, 15 May 2003 11:55:00 +0200
\r\nMessage-ID: 
<AE4AEBADBE3523498B45FB184E32921B101EDA at exchangedelf
t.intec-hou.com>\r\nX-MS-Has-Attach: \r\nX-MS-TNEF-
Correlator: \r\nThread-Topic: grattis med hus!\r\nThread-
Index: AcMayAzVrfaY8ftMT6+n6YK7bFSqgA==\r\nFrom: "Knut 
Dohlen" <Knut.Dohlen at intecengineering.com>\r\nTo: 
<fredrik at rodland.no>\r\nContent-Transfer-Encoding: 
8bit\r\nX-MIME-Autoconverted: from quoted-printable to 8bit 
by ally.servicenett.no id h4F9t44f024912\r\nStatus:   \r\n\r\n'
Spam                : 0.92635444708974979
SpamBayesOriginalFolderID: '\x00\x00\x00\x00\xee.-\x06\x1a!
\xebJ\x91\xb6X\x19>\xb1r\xac\x82\x80\x00\x00'
SpamBayesOriginalFolderStoreID: '\x00\x00\x00\x008\xa1
\xbb\x10\x05\xe5\x10\x1a\xa1\xbb\x08\x00+*V\xc2\x00
\x00PSTPRX.DLL\x00\x00\x00\x00\x00\x00\x00\x00NITA\xf9
\xbf\xb8\x01\x00\xaa\x007\xd9n\x00\x00\x00C:\Documents 
and Settings\Fredrik\Lokale 
innstillinger\Programdata\Microsoft\Outlook\outlook.pst\x00
'
X-MIME-Autoconverted: 'from quoted-printable to 8bit by 
ally.servicenett.no id h4F9t44f024912'
X-MS-Has-Attach     : ''
X-MS-TNEF-Correlator: ''
X-MimeOLE           : 'Produced By Microsoft Exchange 
V6.0.6249.0'



----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-08-06 18:09

Message:
Logged In: YES 
user_id=14198

Seeing as you are running from source, can you add a couple
of print statements to help diagnose?

At the top of filter.py you will find:

def filter_message(msg, mgr, all_actions=True):
    config = mgr.config.filter
    prob = mgr.score(msg)

Add, say, 
    print "Message", msg.subject, "has score", prob

And see if that helps track it down.

Another possibility would be to run "sandbox/dump_props.py"
over one of the messages, and make sure we are actually
saving the value in the correct field (as, e.g., the ini file
allows you to change the field name we use).
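
Once both numbers are available, a tolerance check makes a mismatch 
explicit.  This is only a minimal sketch: scores_match and its 
tolerance are hypothetical, and the two values are the ones reported 
in the follow-up comment in this thread (the stored Spam field versus 
the score from "show spam clues").

```python
import math


def scores_match(stored, fresh, tol=1e-6):
    """True if the stored field value and a fresh score agree within tol."""
    return math.isclose(stored, fresh, abs_tol=tol)


stored_field = 0.92635444708974979  # Spam field as listed by dump_props.py
fresh_score = 0.0139373             # Spam Score from "show spam clues"
print(scores_match(stored_field, fresh_score))
```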

----------------------------------------------------------------------

Comment By: Fredrik Rodland (fmmr)
Date: 2003-08-05 18:58

Message:
Logged In: YES 
user_id=724871

Backing up my last comment, I just reinstalled a CVS version 
I had lying around (from about 030701; addin beta 1, 
version 0.3, July 2003), which proves my point.  After a train 
(and selecting "score messages after training"), the Spam field 
in Outlook is updated to the correct value.

----------------------------------------------------------------------

Comment By: Fredrik Rodland (fmmr)
Date: 2003-08-05 18:51

Message:
Logged In: YES 
user_id=724871

Tony's comment is just not right.  As I describe in my original 
post, the Spam field actually changes after a retrain now, but 
the value is just not the correct one.

This used to work just as I described: after a retrain the 
Spam field shows the LAST (not first) score.

Reopened bug.


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-08-05 10:45

Message:
Logged In: YES 
user_id=552329

The value in Outlook's spam field is the score of the message 
at the time it was first processed.  The score in the "show 
spam clues" is what the message would score if it arrived at 
that point in time.  If you've done any training at all between 
the two events, the scores will be different.

Mark did consider including a "This message scored X when 
first processed.  The following information relates to scoring 
the message with current training" type message in the "show 
clues" info, but has been busy with other things (or maybe 
decided it wasn't worth it).  If this would be helpful, then feel 
free to open a feature request for it.

----------------------------------------------------------------------
