[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/datetime picklesize.py,NONE,1.1

Michael Hudson mwh@python.net
03 Dec 2002 18:27:59 +0000


Michael Hudson <mwh@python.net> writes:

> tim_one@users.sourceforge.net writes:
> 
> > New program just to display pickle sizes.  This makes clear that the
> > copy_reg based C implementation is much more space-efficient in the
> > end than the __getstate__/__setstate__ based Python implementation,
> > but that 4-byte date objects still suffer > 10 bytes of overhead each
> > no matter how many of them you pickle in one gulp.
> 
> Presumably there's a possibility of an optimization for pickling
> homogeneous (i.e. all the same type) lists (in pickle.py, not here).
> 
> Hard to say whether it would be worth it, though.

Here's a fairly simple-minded patch to the pickling side of pickle.py
(nothing on the unpickling side yet). It seems to save about 6 bytes
per object in the good cases.

With the patch:
list of 100 dates via      C -- 1236 bytes, 12.36 bytes/obj

Without the patch:
list of 100 dates via      C -- 1871 bytes, 18.71 bytes/obj
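
For anyone who wants to reproduce this kind of measurement, here is a
rough sketch of how the figures can be taken. It is not the
picklesize.py from the sandbox. It assumes a current Python with the
standard datetime module, and the helper name pickle_size_report is
made up for illustration. The byte counts will not match the figures
above, since the pickle protocol and the date type have both changed.

```python
import pickle
from datetime import date, timedelta


def pickle_size_report(objs, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle objs as a single list; return (total_bytes, bytes_per_object).

    Hypothetical helper, not part of pickle or of picklesize.py.
    """
    objs = list(objs)
    total = len(pickle.dumps(objs, protocol))
    return total, total / len(objs)


# 100 distinct dates, pickled "in one gulp" as a single list.
start = date(2002, 12, 3)
dates = [start + timedelta(days=i) for i in range(100)]

total, per_obj = pickle_size_report(dates)
print(f"list of {len(dates)} dates -- {total} bytes, {per_obj:.2f} bytes/obj")
```

Dividing the total by the list length is what gives the bytes/obj
column, so the fixed cost of the list wrapper is spread across all the
objects.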

I'm not going to pursue this further unless someone thinks it's a
worthwhile move.
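
The idea behind the patch can be tried outside pickle.py, roughly as
sketched here. This is an illustration under assumptions, not the
patch itself: it calls each object's __reduce__() instead of looking
the type up in dispatch_table, it only handles objects that reduce to
a plain (callable, args) pair, and the names pack_homogeneous and
unpack_homogeneous are invented. The unpacking half is a guess at what
a loader would have to do, since the patch only covers pickling.

```python
import pickle
from datetime import date, timedelta


def pack_homogeneous(items):
    """Return (constructor, [args, ...]) if every item has the same type
    and reduces to the same callable; otherwise return None.

    Hypothetical helper sketching the homogeneous-list idea.
    """
    if not items:
        return None
    first_type = type(items[0])
    if any(type(o) is not first_type for o in items):
        return None  # mixed types: fall back to ordinary pickling

    ctor = None
    args_list = []
    for o in items:
        reduced = o.__reduce__()
        if len(reduced) != 2:
            return None  # carries extra state; not handled in this sketch
        c, a = reduced
        if ctor is None:
            ctor = c
        elif c is not ctor:
            return None
        args_list.append(a)
    return ctor, args_list


def unpack_homogeneous(packed):
    """Rebuild the original list from the output of pack_homogeneous."""
    ctor, args_list = packed
    return [ctor(*a) for a in args_list]


start = date(2002, 12, 3)
dates = [start + timedelta(days=i) for i in range(100)]

packed = pack_homogeneous(dates)
plain_size = len(pickle.dumps(dates))
packed_size = len(pickle.dumps(packed))
print(f"plain: {plain_size} bytes, packed: {packed_size} bytes")
```

The saving comes from naming the constructor once and storing only the
per-object arguments, instead of emitting a reference to the
constructor and a REDUCE opcode for every element.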

Cheers,
M.

Index: pickle.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/pickle.py,v
retrieving revision 1.72
diff -c -r1.72 pickle.py
*** pickle.py   13 Nov 2002 22:01:26 -0000      1.72
--- pickle.py   3 Dec 2002 18:24:37 -0000
***************
*** 109,114 ****
--- 109,115 ----
  INST            = 'i'
  LONG_BINGET     = 'j'
  LIST            = 'l'
+ HOM_LIST        = 'k'
  EMPTY_LIST      = ']'
  OBJ             = 'o'
  PUT             = 'p'
***************
*** 439,445 ****
--- 440,478 ----
      def save_empty_tuple(self, object):
          self.write(EMPTY_TUPLE)
  
+     def save_hom_list(self, object):
+         reduce = dispatch_table[type(object[0])]
+         
+         write = self.write
+         save  = self.save
+         memo  = self.memo
+ 
+         write(HOM_LIST)
+         
+         o = object[0]
+         
+         c, a = reduce(o)
+ 
+         l = [a]
+         
+         for o in object[1:]:
+             l.append(reduce(o)[1])
+ 
+         save(c)
+         save(l)
+ 
      def save_list(self, object):
+         t = {}
+ 
+         for o in object:
+             t[type(o)] = 1
+             if len(t) > 1:
+                 break
+         else:
+             if t and dispatch_table.has_key(t.iterkeys().next()):
+                 self.save_hom_list(object)
+                 return
+ 
          d = id(object)
  
          write = self.write

-- 
  Unfortunately, nigh the whole world is now duped into thinking that 
  silly fill-in forms on web pages is the way to do user interfaces.  
                                        -- Erik Naggum, comp.lang.lisp