[spambayes-dev] Re: Pickle vs DB inconsistencies
gward at python.net
Wed Jun 25 23:19:21 EDT 2003
On 25 June 2003, Meyer, Tony said:
> * Unless dbexpimp.py is broken, the 10.pkl and 10.db you supplied were
> not identical. There are two tokens that have different counts:
> 'header:MIME-Version:1' and 'header:Mime-Version:1' (2,5 vs 4,3 and 4,3 vs
> 2,3 respectively). I'm not sure what this means!
Oops -- I must have screwed up. I just retrained, and diff'ing the
dbExpImp output shows that they're the same.
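For anyone repeating this check, the comparison can be sketched in a few lines of Python. This is only an illustration, assuming each export has already been loaded into a dict mapping a token to a pair of counts; the name `diff_counts` and the sample data are hypothetical, and the sketch does not parse the actual dbExpImp file format.

```python
def diff_counts(a, b):
    """Return {token: (counts_in_a, counts_in_b)} for every token whose
    counts differ. A token missing from one side is reported as None."""
    diffs = {}
    for token in sorted(set(a) | set(b)):
        counts_a, counts_b = a.get(token), b.get(token)
        if counts_a != counts_b:
            diffs[token] = (counts_a, counts_b)
    return diffs


# Hypothetical data echoing the mismatch quoted above; the third token
# is made up and agrees in both stores.
pkl = {
    "header:MIME-Version:1": (2, 5),
    "header:Mime-Version:1": (4, 3),
    "subject:hello": (1, 0),
}
db = {
    "header:MIME-Version:1": (4, 3),
    "header:Mime-Version:1": (2, 3),
    "subject:hello": (1, 0),
}

for token, (in_pkl, in_db) in diff_counts(pkl, db).items():
    print(token, in_pkl, in_db)
```

An empty result means the two stores agree on every token, which is the outcome reported after retraining.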
> * I couldn't use the same test messages as you because the filenames
> weren't valid for a win32 system and I couldn't unpack them. I grabbed
> a random message of my own to use as a test and changed the simplescore
> script, adding an initial learn (since I can't unlearn one that's not in
> the db).
Arggh, bloody Maildir. Almost but not *quite* the perfect mail folder
format. Well, you seem to have figured it out anyways.
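The thread does not say which characters caused the trouble, but Maildir names for delivered messages carry a colon before the flags, and a colon is not allowed in win32 filenames. As a rough sketch under that assumption, the names could be rewritten before packing a test corpus; `win32_safe_name` and the sample filename are hypothetical.

```python
import re

# Characters that Windows does not allow in a filename.
_WIN32_INVALID = re.compile(r'[<>:"/\\|?*]')


def win32_safe_name(name, replacement="_"):
    """Replace each character that is invalid on win32 with `replacement`."""
    return _WIN32_INVALID.sub(replacement, name)


print(win32_safe_name("1056597561.12345_0.example:2,S"))
```

Note that two different Maildir names could map to the same safe name, so a real script would want to check for collisions before renaming.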
> * Is the message that you give to simplescore one of the ones that was
> trained? It should be, because you can't untrain a message that hasn't
> been trained (you might get negative counts).
Yes, I made that mistake a couple of times, and now I'm super-careful
that the message being unlearned is indeed in the training corpus.
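The same care can be enforced in code rather than by hand. The following is a toy sketch, not the SpamBayes classifier: `ToyStore`, its methods, and the message ids are all hypothetical, and it assumes each message has a stable id that can be remembered at training time.

```python
from collections import Counter


class ToyStore:
    """Toy token-count store that refuses to unlearn an untrained message."""

    def __init__(self):
        self.counts = Counter()
        self.trained_ids = set()

    def learn(self, msg_id, tokens):
        if msg_id in self.trained_ids:
            raise ValueError(f"{msg_id!r} is already trained")
        # Count each distinct token once per message.
        self.counts.update(set(tokens))
        self.trained_ids.add(msg_id)

    def unlearn(self, msg_id, tokens):
        # Without this guard the decrements below could go negative.
        if msg_id not in self.trained_ids:
            raise KeyError(f"{msg_id!r} was never trained")
        for token in set(tokens):
            self.counts[token] -= 1
            if self.counts[token] == 0:
                del self.counts[token]
        self.trained_ids.remove(msg_id)


store = ToyStore()
store.learn("msg-1", ["header:MIME-Version:1", "subject:hello"])
store.unlearn("msg-1", ["header:MIME-Version:1", "subject:hello"])
print(dict(store.counts))
```

Learning and then unlearning the same message leaves the counts exactly where they started, which is the property a score-stability test relies on.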
Greg Ward <gward at python.net> http://www.gerg.ca/
"Very funny, Scotty. Now beam my *clothes* down."