[Spambayes] How are individual values stored in the database?
Skip Montanaro
skip at pobox.com
Wed Dec 4 04:49:10 2002
I thought the values associated with keys in DBDict were stored as
little pickles. Scanning the code in dbdict.py suggests that's the case,
but I can't deserialize items with either cPickle or marshal:
Python 2.3a0 (#6, Nov 13 2002, 19:57:35)
[GCC 3.1 20020420 (prerelease)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dbhash
>>> db = dbhash.open("hammie.db")
>>> db["pfxlen:5"]
'W(GA\xce\xf6\xbf-\xc0$\x89K\x00K\x07K\x00G?\x9e\xed\x19\xc5\x95y\xfdtq\x01.'
>>> import cPickle as pickle
>>> pickle.loads(db["pfxlen:5"])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
cPickle.UnpicklingError: invalid load key, 'W'.
>>> import marshal
>>> marshal.loads(db["pfxlen:5"])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: bad marshal data
I used to be able to do this (I can still do it with the hammie.db file I
generated in mid-November). The file in question was created by hammie.py
invocations like
BAYESCUSTOMIZE=pfx.ini python ./hammie.py -g ham.mbox -p ./hammie.db -d
where pfx.ini has these lines:
[Tokenizer]
address_headers: to cc
summarize_prefixes: True
(I'm trying to evaluate a new tokenizer change and want to examine raw
counts for the generated tokens.)
I realize WordInfo objects aren't being pickled any longer, but I thought
tuples were. What have I missed?
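
One way to narrow this down: the leading 'W' is not a pickle opcode, which is exactly what cPickle complains about, but the bytes after it start with '(' and end with '.', the way a binary pickle of a tuple would. The sketch below tests the guess that the first byte is a one-character tag in front of an ordinary pickle. That guess is an assumption based only on the bytes shown above, not on anything confirmed in dbdict.py. The helper name split_prefixed_value is made up for illustration, and the code uses a modern Python bytes literal rather than the 2.3 session above.

```python
import pickle

# Value of db["pfxlen:5"] copied from the session above, as a bytes literal.
RAW = b'W(GA\xce\xf6\xbf-\xc0$\x89K\x00K\x07K\x00G?\x9e\xed\x19\xc5\x95y\xfdtq\x01.'


def split_prefixed_value(raw):
    """Hypothetical helper: treat the first byte as a type tag (an assumption)
    and unpickle whatever follows it."""
    return raw[:1], pickle.loads(raw[1:])


# Loading the whole value should fail, as it did in the session above.
try:
    pickle.loads(RAW)
    whole_value_loads = True
except pickle.UnpicklingError:
    whole_value_loads = False

# Loading everything after the first byte is the actual experiment.
tag, decoded = split_prefixed_value(RAW)
print(whole_value_loads, tag, decoded)
```

If the tail unpickles into a tuple, that would be consistent with a tagged record rather than a bare pickled tuple. What the tag and the individual fields mean would still have to come from the code that wrote the database.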
Thx,
Skip