Re: [Python-Dev] Re: [Python-checkins] python/nondist/sandbox/datetime picklesize.py,NONE,1.1
Tim Peters <tim.one@comcast.net> writes:
[Michael Hudson]
Presumably there's a possibility of an optimization for pickling homogeneous (i.e. all the same type) lists (in pickle.py, not here).
Hard to say whether it would be worth it, though.
I don't really care about lists of date objects. The intent was just to see how much of the administrative pickle bloat could be saved by pickle's internal memo facility when multiple date objects appear in a structure for *whatever* reason (be it a list or tuple of dates, or a dict keyed by dates, or an object with multiple date-valued attributes, or ...).
OK.
The administrative pickle bloat for a single date instance is severe:
pickling 2000-12-13 via Python -- pickle length 80
pickling 2000-12-13 via C -- pickle length 43
Yow.
The internal pickle memo saves a lot if more than one date instance appears, via "remembering" parts of the overhead scaffolding. But in the end a date object has a 4-byte state string, and there's still a lot more overhead than state stored in the pickles.
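A rough way to reproduce this kind of measurement, as a sketch only: it assumes a current Python 3, where the standard library datetime.date and the default pickle protocol are used (the thread itself used the sandbox picklesize.py script under CVS Python, with pickle and cPickle as separate modules). The helper name pickle_size is made up for illustration, and the byte counts it prints will not match the numbers quoted in this thread.

```python
import pickle
from datetime import date, timedelta


def pickle_size(obj):
    """Return the length in bytes of obj's pickle (default protocol)."""
    return len(pickle.dumps(obj))


start = date(2000, 12, 13)
single = pickle_size(start)

# 100 distinct dates: the memo can share the scaffolding (the reference
# to the date class) but not the per-date state bytes.
dates = [start + timedelta(days=i) for i in range(100)]
total = pickle_size(dates)

print(f"single date -- pickle length {single}")
print(f"list of 100 dates -- {total} bytes, {total / 100:.2f} bytes/obj")
```

If the memo is doing its job, the per-object figure for the list should come out well below the single-date length, yet still above the 4 bytes of actual state.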
Seems so.
... Here's a fairly simple-minded patch to the pickling side of pickle.py: it seems to save about 6 bytes per object in the good cases.
with: list of 100 dates via C -- 1236 bytes, 12.36 bytes/obj
without: list of 100 dates via C -- 1871 bytes, 18.71 bytes/obj
The "without" number is most curious. When I run the checked-in code, I get
list of 100 dates via C -- 1533 bytes, 15.33 bytes/obj
This was w/ CVS Python, Win2K and MSVC6. Ah! You must have fiddled the script to use pickle instead of cPickle.
Yep.
I get 1871 bytes then. I'm surprised they're so different.
I know cPickle does some tricks if refcounts are 1 that aren't duplicated in pickle.py.
I'm not going to pursue this further unless someone thinks it's a worthwhile move.
I don't personally have large pickles of homogeneous lists, so hard to say.
Me neither.
The C implementation of pickle would also need to be fiddled.
That was part of "pursuing this further" (as is, e.g., unpickling support).
Cutting the pickle overheads for instances of new-style classes would more clearly be worthwhile.
Hard to see how to do that. You could reduce the overhead by giving _datetime_unpickler a shorter name :-)
In general, though, any improvements we can add here are likely to be blown away by ramming the pickle through zlib, so I'm not sure this is worth worrying about too much.
Cheers,
M.
--
And then the character-only displays went away (leading to increasingly silly graphical effects and finally to ads on web pages).
    -- John W. Baxter, comp.lang.python