[spambayes-dev] Dibbler.py error in training
Kenny Pitt
kennypitt at hotmail.com
Tue Apr 6 15:55:46 EDT 2004
sean darcy wrote:
>> In looking more closely, though, something seems a little odd here.
>> The offending object that is coming back None appears to be the
>> msg[header] reference. If I'm not mistaken, that means that either
>> the Subject: or To: header is missing entirely from the message,
>> which is very unusual.
>
> It's not that unusual for the Subject header to be missing. Looking
> over past emails, I've found some "ham" posts that had no subject. In
> any event, some of the posts to be trained do have no Subject - all
> spam.
Well, it's certainly not unusual for the Subject: header to be empty, but
I didn't realize that it was legal to leave out the header entirely.
Guess I'll have to go back and re-read the spec! <wink>
Anyway, I checked in a new fix (Corpus.py 1.19) to guard against missing
headers, so give that a try when it comes through and let us know the
results.
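For anyone curious what "guard against missing headers" means in
practice, here is a minimal sketch. It is not the actual Corpus.py
change; the helper name header_or_default is made up for illustration.
It relies only on the standard email package, where msg[name] returns
None (rather than raising) when the header is absent.

```python
import email


def header_or_default(msg, name, default=""):
    """Return the named header, or `default` if the header is absent.

    Hypothetical helper for illustration; not the code in Corpus.py.
    """
    value = msg[name]  # None when the header is missing entirely
    return default if value is None else value


# A message with no Subject: header at all (not merely an empty one).
raw = "From: a@example.com\nTo: b@example.com\n\nbody text\n"
msg = email.message_from_string(raw)

subject = header_or_default(msg, "Subject")
to = header_or_default(msg, "To")
```

With a guard like this, later string operations on the subject see an
empty string instead of None.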
>> Could you, by chance, attach a copy of the message that is causing
>> the error?
>
> The untrained message page has about 60 messages. How do I know which
> one is the problem?
Click the "Defer" heading to make sure that is the default for all
messages, then select a classification for only one message at a time to
see which one dies. You can then go back to Review Messages and click
the subject of that message to display the message source.
>> A copy of it should appear as a file in one of the cache
>> directories below the directory containing your training database, or
>> you could just view the message source from Review Messages and
>> copy-and-paste it.
>
> You've lost me. Here's my spambayes data directory:
>
> ls
> bayescustomize.ini      _pop3proxy.log            pop3proxy-spam-cache
> bayescustomize.ini~     pop3proxy.log-1           pop3proxy-unknown-cache
> bayescustomize.ini.bak  pop3proxy.log-evolution   spambayes.messageinfo.db
> hammie.db               pop3proxy.log-evolution~  start.info
> pop3proxy-ham-cache     pop3proxy.log-mozilla
The pop3proxy-unknown-cache subdirectory contains copies of e-mails that
haven't been trained yet, up to the expiration age, which I believe
defaults to 7 days. No worries, though. The message source you
included in your message was what I was interested in.
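If you ever need to hunt down such a message without clicking through
the web interface, something along these lines could scan the cache
directory for files lacking a Subject: or To: header. This is only a
sketch: the function name is made up, and it assumes each file in the
cache directory is a single uncompressed message in plain RFC 2822
form, which may not match your configuration.

```python
import email
import os


def find_messages_missing_headers(cache_dir, required=("Subject", "To")):
    """List (filename, missing_headers) for messages lacking any required header.

    Hypothetical helper; assumes one uncompressed message per file.
    """
    hits = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, "rb") as fh:
            msg = email.message_from_binary_file(fh)
        # msg[h] is None only when the header is absent, not when it is empty
        missing = [h for h in required if msg[h] is None]
        if missing:
            hits.append((name, missing))
    return hits
```

Calling find_messages_missing_headers("pop3proxy-unknown-cache") from
the data directory would return the names of the candidate files.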
--
Kenny Pitt