[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/datetime picklesize.py,NONE,1.1

Michael Hudson mwh@python.net
04 Dec 2002 11:29:10 +0000


Tim Peters <tim.one@comcast.net> writes:

> [Michael Hudson]
> > Presumably there's a possibility of an optimization for pickling
> > homogeneous (i.e. all the same type) lists (in pickle.py, not here).
> >
> > Hard to say whether it would be worth it, though.
> 
> I don't really care about lists of date objects.  The intent was just to see
> how much of the administrative pickle bloat could be saved by pickle's
> internal memo facility when multiple date objects appear in a structure for
> *whatever* reason (be it a list or tuple of dates, or a dict keyed by dates,
> or an object with multiple date-valued attributes, or ...).

OK.

> The administrative pickle bloat for a single date instance is severe:
> 
>     pickling 2000-12-13 via Python -- pickle length 80
>     pickling 2000-12-13 via C -- pickle length 43

Yow.

> The internal pickle memo saves a lot if more than one date instance appears,
> via "remembering" parts of the overhead scaffolding.  But in the end a date
> object has a 4-byte state string, and there's still a lot more overhead than
> state stored in the pickles.

Seems so.
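
The memo effect is easy to reproduce.  What follows is a minimal sketch,
assuming a modern Python 3 with the standard datetime and pickle modules, so
the byte counts will not match the 2002 sandbox figures quoted here.  The
helper name pickle_sizes is made up for illustration, and peeking at
__reduce__() relies on a CPython implementation detail:

```python
import pickle
from datetime import date, timedelta


def pickle_sizes(n):
    """Return (single, together, separately) pickle sizes in bytes.

    single     -- one date pickled alone
    together   -- n dates pickled as one list (one shared memo)
    separately -- n dates pickled one at a time (no sharing)
    """
    start = date(2000, 12, 13)
    dates = [start + timedelta(days=i) for i in range(n)]
    single = len(pickle.dumps(dates[0]))
    together = len(pickle.dumps(dates))
    separately = sum(len(pickle.dumps(d)) for d in dates)
    return single, together, separately


# Assumption: CPython's date.__reduce__() returns (class, (state_bytes,)).
state = date(2000, 12, 13).__reduce__()[1][0]

single, together, separately = pickle_sizes(100)
print("state string: %d bytes" % len(state))
print("single date: %d bytes" % single)
print("list of 100 dates: %d bytes, %.2f bytes/obj" % (together, together / 100))
print("100 separate pickles: %d bytes" % separately)
```

The gap between the last two lines is what the memo saves; the gap between
bytes/obj and the state length is the overhead that remains.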

> > ...
> > Here's a fairly simple-minded patch to the pickling side of pickle.py:
> > it seems to save about 6 bytes per object in the good cases.
> >
> > with:
> > list of 100 dates via      C -- 1236 bytes, 12.36 bytes/obj
> >
> > without:
> > list of 100 dates via      C -- 1871 bytes, 18.71 bytes/obj
> 
> The "without" number is most curious.  When I run the checked-in code, I get
> 
>   list of 100 dates via      C -- 1533 bytes, 15.33 bytes/obj
> 
> This was w/ CVS Python, Win2K and MSVC6.  Ah!  You must have fiddled the
> script to use pickle instead of cPickle.

Yep.

> I get 1871 bytes then.  I'm surprised they're so different.

I know cPickle does some tricks if refcounts are 1 that aren't
duplicated in pickle.py.
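
As far as can be told (this is an assumption, not something confirmed in this
thread), the trick amounts to not emitting memo PUT opcodes for objects that
nothing else references.  A rough stand-in on a modern Python 3 is
pickletools.optimize, which strips PUT opcodes that no GET ever uses.  The
sketch below is not cPickle's code; put_savings is a hypothetical helper name:

```python
import pickle
import pickletools
from datetime import date


def put_savings(obj, protocol=2):
    """Return (raw_len, slim_len) before and after dropping unused PUTs."""
    raw = pickle.dumps(obj, protocol)
    slim = pickletools.optimize(raw)
    # The optimized pickle must still round-trip to an equal object.
    assert pickle.loads(slim) == obj
    return len(raw), len(slim)


raw_len, slim_len = put_savings(date(2000, 12, 13))
print("with PUTs: %d bytes, without unused PUTs: %d bytes" % (raw_len, slim_len))
```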

> > I'm not going to pursue this further unless someone thinks it's a
> > worthwhile move.
> 
> I don't personally have large pickles of homogeneous lists, so hard to say.

Me neither.

> The C implementation of pickle would also need to be fiddled.

That was part of "pursuing this further" (as is, e.g., unpickling
support).

> Cutting the pickle overheads for instances of new-style classes
> would more clearly be worthwhile.

Hard to see how to do that.  You could reduce the overhead by giving
_datetime_unpickler a shorter name :-)
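
The joke has a real basis: objects pickled by reference carry their module
and name as text in the stream.  A small sketch on a modern Python 3, using
two builtins as stand-ins (they are not what the datetime sandbox pickles,
and global_pickle_len is a made-up helper name):

```python
import pickle


def global_pickle_len(obj, protocol=2):
    """Length of the pickle of an object that is stored by reference."""
    return len(pickle.dumps(obj, protocol))


# Both live in the same module, so only the name length differs.
short_name = global_pickle_len(len)
long_name = global_pickle_len(isinstance)
extra = long_name - short_name
print("extra bytes for the longer name:", extra)
```

Every extra character in the name costs one byte per pickle, though the memo
means it is paid only once per pickle, not once per instance.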

In general, though, any improvements we can add here are likely to be
blown away by ramming the pickle through zlib, so I'm not sure this is
worth worrying about too much.
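
A quick way to check this, sketched for a modern Python 3 with the standard
zlib module (the sizes it prints are for that setup, not for the 2002 code):

```python
import pickle
import zlib
from datetime import date, timedelta

start = date(2000, 12, 13)
dates = [start + timedelta(days=i) for i in range(100)]

raw = pickle.dumps(dates)
packed = zlib.compress(raw)

# Compression must be lossless for the comparison to mean anything.
assert pickle.loads(zlib.decompress(packed)) == dates

print("raw pickle: %d bytes" % len(raw))
print("zlib-compressed: %d bytes" % len(packed))
```

The repeated per-object scaffolding is exactly the kind of redundancy zlib
removes, which is why shaving a few bytes per object in the pickler matters
less once the output is compressed.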

Cheers,
M.

-- 
  And then the character-only displays went away (leading to
  increasingly silly graphical effects and finally to ads on
  web pages).                      -- John W. Baxter, comp.lang.python