[spambayes-dev] Re: Pickle vs DB inconsistencies

Greg Ward gward at python.net
Wed Jun 25 23:19:21 EDT 2003


On 25 June 2003, Meyer, Tony said:
>   * Unless dbexpimp.py is broken, the 10.pkl and 10.db you supplied were
> not identical.  There are two tokens that have different counts:
> 'header:MIME-Version:1' and 'header:Mime-Version:1' (2,5 vs 4,3 and 4,3 vs
> 2,3 respectively).  I'm not sure what this means!

Oops -- I must have screwed up.  I just retrained, and diff'ing the
dbExpImp output shows that they're the same.
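
For illustration, here is a minimal sketch of that kind of comparison. It
assumes each store has already been loaded into a plain dict mapping a
token to a pair of counts. The function name diff_counts and the sample
data are hypothetical, not part of dbExpImp or SpamBayes, and which count
in each pair is spam and which is ham is not specified here.

```python
def diff_counts(pickle_counts, db_counts):
    """Return {token: (pickle_value, db_value)} for every token that differs.

    A token missing from one side is reported with None on that side.
    """
    diffs = {}
    for token in sorted(set(pickle_counts) | set(db_counts)):
        p = pickle_counts.get(token)
        d = db_counts.get(token)
        if p != d:
            diffs[token] = (p, d)
    return diffs


# Hypothetical sample data shaped like the mismatch reported above.
pickle_side = {
    "header:MIME-Version:1": (2, 5),
    "header:Mime-Version:1": (4, 3),
    "subject:hello": (1, 1),
}
db_side = {
    "header:MIME-Version:1": (4, 3),
    "header:Mime-Version:1": (2, 3),
    "subject:hello": (1, 1),
}

for token, (p, d) in diff_counts(pickle_side, db_side).items():
    print(f"{token}: pickle={p} db={d}")
```

An empty result means the two stores agree on every token.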

>   * I couldn't use the same test messages as you because the filenames
> weren't valid for a win32 system and I couldn't unpack them.  I grabbed
> a random message of my own to use as a test and changed the simplescore
> script, adding an initial learn (since I can't unlearn one that's not in
> the db).

Arggh, bloody Maildir.  Almost but not *quite* the perfect mail folder
format.  Well, you seem to have figured it out anyways.

>   * Is the message that you give to simplescore one of the ones that was
> trained?  It should be, because you can't untrain a message that hasn't
> been trained (you might get negative counts).

Yes, I made that mistake a couple of times, and now I'm super-careful
that the message being unlearned is indeed in the training corpus.
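
A minimal sketch of that kind of guard, assuming a toy token-count store
rather than the real SpamBayes classifier. The class TinyCorpus, its
methods, and the message ids are all hypothetical. It refuses to unlearn
a message it has no record of training, which is what keeps the counts
from going negative.

```python
from collections import Counter


class TinyCorpus:
    """Toy token-count store that tracks which messages were trained."""

    def __init__(self):
        self.counts = Counter()    # token -> number of trained messages containing it
        self.trained_ids = set()   # ids of messages currently in the corpus

    def learn(self, msg_id, tokens):
        if msg_id in self.trained_ids:
            raise ValueError(f"{msg_id!r} is already trained")
        # Count each distinct token once per message.
        self.counts.update(set(tokens))
        self.trained_ids.add(msg_id)

    def unlearn(self, msg_id, tokens):
        if msg_id not in self.trained_ids:
            # Without this check the subtraction below could push counts negative.
            raise ValueError(f"{msg_id!r} was never trained")
        self.counts.subtract(set(tokens))
        self.trained_ids.remove(msg_id)


corpus = TinyCorpus()
corpus.learn("msg-1", ["header:Mime-Version:1", "subject:hello"])
corpus.unlearn("msg-1", ["header:Mime-Version:1", "subject:hello"])

try:
    corpus.unlearn("msg-2", ["subject:hello"])
except ValueError as exc:
    print("refused:", exc)
```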

        Greg
-- 
Greg Ward <gward at python.net>                         http://www.gerg.ca/
"Very funny, Scotty.  Now beam my *clothes* down."


