performance of pickling & large lists :-(
Jeff Epler
jepler at unpythonic.net
Tue Aug 6 14:41:16 EDT 2002
Python's "pickle" keeps a "memo" dict with approximately one entry per
pickled object, roughly of the form
    { id(obj) : obj }
The memo is what makes sure that a structure like 'y' in
    y = [[]]*2
is pickled correctly, and that when reloaded, 'y[0] is y[1]'.
The memo accounts for most, if not all, of pickle's memory usage.
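In modern Python the same mechanism is still there; a minimal demonstration
of the identity-preserving behavior the memo buys you:

```python
import pickle

# Two references to the same inner list
y = [[]] * 2
assert y[0] is y[1]

# The memo records the inner list the first time it is pickled, so the
# second reference is written as a back-reference, not a second copy.
z = pickle.loads(pickle.dumps(y))
assert z[0] is z[1]  # shared identity survives the round trip
```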
If you are serializing a data structure that is a tree, rather than a
fully general graph, you could use something like pickle but without the
"memo" data member. Something like the following might work:
import pickle

class emptydict:
    # A dict stand-in that remembers nothing, so no memo is ever kept.
    def __setitem__(self, k, v): pass            # discard every store
    def __getitem__(self, k): raise KeyError(k)  # never report a memo hit
    def has_key(self, k): return 0
    def get(self, k, d=None): return d
    def __len__(self): return 0

class AmnesiacPickler(pickle.Pickler):
    def __init__(self, file, bin=0):
        pickle.Pickler.__init__(self, file, bin)
        self.memo = emptydict()                  # replace the real memo
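In current CPython the same effect is available without subclassing: the
Pickler's deprecated (but, as of recent 3.x releases, still present) `fast`
attribute disables the memo. A sketch of the trade-off, with
`amnesiac_dumps` as a hypothetical helper name:

```python
import io
import pickle

def amnesiac_dumps(obj):
    # Pickle without a memo: fast mode skips memoization entirely.
    buf = io.BytesIO()
    p = pickle.Pickler(buf)
    p.fast = True  # deprecated flag, but still supported in CPython
    p.dump(obj)
    return buf.getvalue()

# A pure tree round-trips by value just fine.
tree = [1, [2, 3], [4, [5, 6]]]
assert pickle.loads(amnesiac_dumps(tree)) == tree

# Shared substructure round-trips by value, but identity is lost:
shared = [[]] * 2
out = pickle.loads(amnesiac_dumps(shared))
assert out == shared
assert out[0] is not out[1]  # two copies, no back-reference
```

The memory saving comes precisely from not tracking already-seen objects,
which is why only tree-shaped data is safe.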
I tested it minimally (on python1.5), and it seemed to work. Items like
the following unpickled properly:
    "hi there"
    [1, 2, 3], [4, 5, 6]
and the following unpickled with the same value, but with the shared
identity lost:
    [[]] * 2
This object 'r' caused fits:
    r = []
    r.append(r)
It died when it ran out of stack space, because without the memo the
pickler could not recognize that it was recursively pickling the same
object.
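For contrast, a modern-Python sketch (using the deprecated `Pickler.fast`
flag to stand in for the memo-less pickler) shows both sides: the memo
handles the cycle, and without it the dump fails rather than terminating.
CPython's fast mode detects the runaway recursion and raises; the
except clause below also allows RecursionError to hedge across
implementations:

```python
import io
import pickle

r = []
r.append(r)  # self-referential list

# With the memo, the self-reference becomes a back-reference in the
# stream, so the cycle round-trips correctly:
s = pickle.loads(pickle.dumps(r))
assert s[0] is s

# Without the memo, the pickler never notices it is revisiting r.
buf = io.BytesIO()
p = pickle.Pickler(buf)
p.fast = True
try:
    p.dump(r)
    survived = True
except (ValueError, RecursionError):
    survived = False
assert not survived  # cyclic data cannot be pickled memo-free
```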
Jeff