[spambayes-dev] Dibbler.py error in training

sean darcy seandarcy at hotmail.com
Tue Apr 6 14:22:33 EDT 2004




----Original Message Follows----
From: "Kenny Pitt" <kennypitt at hotmail.com>
To: "'sean darcy'" <seandarcy at hotmail.com>,<skip at pobox.com>
CC: <spambayes-dev at python.org>
Subject: RE: [spambayes-dev] Dibbler.py error in training
Date: Tue, 6 Apr 2004 09:13:39 -0400
...................................
>Oops, looks like I misread the original error message.  The fix I put in
>is probably a useful safeguard, but not the one that was causing the
>problem.
>
>In looking more closely, though, something seems a little odd here.  The
>offending object that is coming back None appears to be the msg[header]
>reference.  If I'm not mistaken, that means that either the Subject: or
>To: header is missing entirely from the message, which is very unusual.

It's not that unusual for the Subject header to be missing. Looking over
past emails, I've found some "ham" posts that had no Subject. In any event,
some of the posts to be trained have no Subject, and all of those are spam.
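
As a small illustration of the behaviour Kenny describes, here is a minimal sketch using only the standard email package (Python 3 syntax). It is not the Dibbler or SpamBayes code; the helper name header_or_empty and the sample message are made up for the example. It shows that indexing a message by a header it does not have gives back None, and one possible way to guard against that:

```python
import email

# Hypothetical message with no Subject: and no To: header.
raw = (
    "From: someone@example.com\n"
    "Date: Mon, 29 Mar 2004 08:34:03 -0500\n"
    "\n"
    "body text\n"
)
msg = email.message_from_string(raw)


def header_or_empty(msg, name):
    """Return the header value, or "" when the header is absent."""
    value = msg[name]  # Message.__getitem__ returns None for a missing header
    return "" if value is None else value


print(repr(msg["Subject"]))                    # None
print(repr(header_or_empty(msg, "Subject")))   # ''
```

Any code that calls string methods on msg[header] directly would fail on such a message, which is consistent with the error in this thread, though I can't say that is exactly what the training code does.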

Here's an example from "tokens" on the untrained message page:

Tokens for: (none) (15)

Word 	Probability 	Times in ham 	Times in spam
content-type:text/plain 	0.288326 	1576 	556
from:addr:qziwpklwit 	- 	0 	0
from:addr:musician.org 	0.844828 	0 	1
from:no real name:2**0 	0.186886 	825 	165
to:none 	0.878691 	2 	14
cc:none 	0.351951 	979 	463
sender:none 	0.410456 	978 	593
reply-to:none 	0.271479 	746 	242
x-mailer:none 	0.417812 	832 	520
message-id:@mta13.srv.hcvlny.cv.net 	0.844828 	0 	1
header:Date:1 	0.500287 	1742 	1519
header:Received:3 	0.77726 	215 	654
header:Message-id:1 	0.907877 	144 	1238
header:From:1 	0.500718 	1739 	1519
header:Return-path:1 	0.940104 	95 	1302

Here's the message source:

Return-path: <qziwpklwit at musician.org>
Received: from mta13.srv.hcvlny.cv.net (mta13.srv.hcvlny.cv.net 
[167.206.5.82])
	by mstr9.srv.hcvlny.cv.net
	(iPlanet Messaging Server 5.2 HotFix 1.16 (built May 14 2003))
	with ESMTP id <0HVC00G0PB4QME at mstr9.srv.hcvlny.cv.net>; Mon,
	29 Mar 2004 08:36:26 -0500 (EST)
Received: from f94006.upc-f.chello.nl (f94006.upc-f.chello.nl [80.56.94.6])
	by mta13.srv.hcvlny.cv.net
	(iPlanet Messaging Server 5.2 HotFix 1.16 (built May 14 2003))
	with SMTP id <0HVC00ISEAU5TL at mta13.srv.hcvlny.cv.net>; Mon,
	29 Mar 2004 08:34:03 -0500 (EST)
Received: from 123.224.24.65 by 80.56.94.6 with qdtrhun [1
Date: Mon, 29 Mar 2004 08:34:03 -0500 (EST)
Date-warning: Date header was inserted by mta13.srv.hcvlny.cv.net
From: qziwpklwit at musician.org
Message-id: <0HVC00IM1B0CTL at mta13.srv.hcvlny.cv.net>
Content-transfer-encoding: 7BIT
X-Spambayes-Classification: unsure
X-Spambayes-Spam-Probability: 0.84
X-Spambayes-Level: ********
X-Spambayes-MailId: 1080858684-6

>Could you, by chance, attach a copy of the message that is causing the
>error?

The untrained message page has about 60 messages. How do I know which one is 
the problem?

>A copy of it should appear as a file in one of the cache
>directories below the directory containing your training database, or
>you could just view the message source from Review Messages and
>copy-and-paste it.

You've lost me. Here's my spambayes data directory:

ls
bayescustomize.ini      _pop3proxy.log            pop3proxy-spam-cache
bayescustomize.ini~     pop3proxy.log-1           pop3proxy-unknown-cache
bayescustomize.ini.bak  pop3proxy.log-evolution   spambayes.messageinfo.db
hammie.db               pop3proxy.log-evolution~  start.info
pop3proxy-ham-cache     pop3proxy.log-mozilla


When I grep for the odd "From" name I get nothing:
grep -R qziwpklwit  *

I'm looking for spam in all the wrong places.
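
One way to narrow down which cached message lacks a header is to parse every file under the cache directories and report the ones with no Subject: or To:. The sketch below (Python 3, standard library only) does that. It assumes the cache files are raw messages that may or may not be gzip-compressed; I have not verified how this particular setup stores them, but compression would be one possible reason a plain grep finds nothing. The function names are hypothetical, and the directory names are the ones from the listing above.

```python
import email
import gzip
import os


def read_message_bytes(path):
    """Read a cached message file, decompressing it if it looks like gzip."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    return data


def find_missing_headers(root, headers=("Subject", "To")):
    """Yield (path, missing_header_names) for each file lacking a header."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            msg = email.message_from_bytes(read_message_bytes(path))
            missing = [h for h in headers if msg[h] is None]
            if missing:
                yield path, missing


if __name__ == "__main__":
    # Directory names taken from the ls output; run from the data directory.
    for cache in ("pop3proxy-unknown-cache",
                  "pop3proxy-spam-cache",
                  "pop3proxy-ham-cache"):
        for path, missing in find_missing_headers(cache):
            print(path, "missing:", ", ".join(missing))
```

Run from the spambayes data directory, this prints one line per cached message that is missing either header; a directory that does not exist is silently skipped.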

>--
>Kenny Pitt


sean




