Size of dictionary
Pete Goodeve
pete at jwgibbs.cchem.Berkeley.EDU
Thu May 29 15:43:48 EDT 2003
I'm seeing some rather surprising (to me) behaviour when I save a
dictionary to a file with cPickle. When I remove a fraction of the
existing entries in the dictionary, the size of the saved file can
*increase* by something like 50%!
(To be specific, I have a Bayesian spam filter that maintains a table
of words and probabilities, but when the file starts to get too large
I want to reduce its size by dumping lower probability words. Having
it become *larger* doesn't help!)
In fact size seems to be surprisingly unrelated to content. I first
noticed it when I tried to trim 500 words out of 5500; the original
file was 80K -- it became 120K after 'trimming'! Working with a test
file of 3963 bytes, I added one word ('snork', to be precise...) and
the result was *3910* bytes. I removed it again, and the file dropped
in size *again*, to 3895...
I realize this is due to the hashing used, but it still seems a bit
odd. My question really is: is there any way to minimize the size
of the file (I assume it corresponds to the dictionary itself)?
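
One way to investigate is to measure the pickled size directly, before
and after trimming, under more than one pickle protocol. The following
is only a minimal sketch, not a diagnosis of the behaviour described
above. It assumes a modern Python where the `pickle` module stands in
for cPickle; the table contents, the 0.10 threshold, and the helper
names `pickled_size` and `trim` are all hypothetical.

```python
import pickle


def pickled_size(table, protocol):
    """Return the number of bytes pickle produces for `table`."""
    return len(pickle.dumps(table, protocol))


def trim(table, threshold):
    """Return a new dict keeping only entries at or above `threshold`."""
    return {word: p for word, p in table.items() if p >= threshold}


# Hypothetical stand-in for the word/probability table (5500 entries).
table = {"word%d" % i: (i % 100) / 100.0 for i in range(5500)}
trimmed = trim(table, 0.10)

# Compare the text protocol (0) with the newest binary protocol.
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    before = pickled_size(table, protocol)
    after = pickled_size(trimmed, protocol)
    print("protocol %d: %d bytes before, %d bytes after" % (protocol, before, after))
```

Building a fresh dictionary in `trim`, rather than deleting keys in
place, is a deliberate choice here: it rules out any leftover state in
the old dictionary as a factor when comparing sizes.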
Ta,
-- Pete --