[Spambayes] How are individual values stored in the database?

Skip Montanaro skip at pobox.com
Wed Dec 4 04:49:10 2002


I thought values associated with keys in the DBDict thing were stored as
little pickles.  Scanning the code in dbdict.py suggests that's the case,
but I'm unable to deserialize items using either cPickle or marshal:

    Python 2.3a0 (#6, Nov 13 2002, 19:57:35) 
    [GCC 3.1 20020420 (prerelease)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import dbhash
    >>> db = dbhash.open("hammie.db")
    >>> db["pfxlen:5"]
    'W(GA\xce\xf6\xbf-\xc0$\x89K\x00K\x07K\x00G?\x9e\xed\x19\xc5\x95y\xfdtq\x01.'
    >>> import cPickle as pickle
    >>> pickle.loads(db["pfxlen:5"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    cPickle.UnpicklingError: invalid load key, 'W'.
    >>> import marshal
    >>> marshal.loads(db["pfxlen:5"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: bad marshal data
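
One observation from the bytes themselves, offered as a guess rather than as anything confirmed from dbdict.py: everything after the leading 'W' looks like a binary pickle (it opens with the '(' mark opcode and ends with the '.' stop opcode). A minimal sketch, assuming the first byte is a one-character type tag and the remainder is the pickled value; the names `raw`, `tag`, `payload` and `values` are only illustrative, and the snippet is written for a current Python rather than 2.3:

```python
import pickle

# The value printed above for db["pfxlen:5"], copied as a bytes literal.
raw = (b'W(GA\xce\xf6\xbf-\xc0$\x89K\x00K\x07K\x00'
       b'G?\x9e\xed\x19\xc5\x95y\xfdtq\x01.')

# Assumption (not verified against dbdict.py): byte 0 is a type tag,
# and the bytes that follow are an ordinary binary pickle.
tag, payload = raw[:1], raw[1:]
values = pickle.loads(payload)

print(tag, values)
```

With these particular bytes the payload unpickles to a five-item tuple: a float, the integers 0, 7 and 0, and a second float. What each position means is not something the bytes can say; that would have to be checked against the code that writes the record.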

I used to be able to do this (I can still do it with the hammie.db file I
generated in mid-November).  The file in question was created by hammie.py
invocations like

    BAYESCUSTOMIZE=pfx.ini python ./hammie.py -g ham.mbox -p ./hammie.db -d

where pfx.ini has these lines:

    [Tokenizer]
    address_headers: to cc
    summarize_prefixes: True

(I'm trying to evaluate a new tokenizer change and want to examine raw
counts for the generated tokens.)

I realize WordInfo objects aren't being pickled any longer, but I thought
tuples were.  What have I missed?

Thx,

Skip


