[Python-3000] Simplifying pickle for Py3k

Alexandre Vassalotti alexandre at peadrop.com
Sat Oct 6 06:35:39 CEST 2007


On 10/5/07, Neil Schemenauer <nas at arctrix.com> wrote:
> On Thu, Oct 04, 2007 at 02:49:16AM -0400, Alexandre Vassalotti wrote:
> > Could you elaborate on what you are trying to do?
>
> I'm trying to efficiently pickle a 'unicode' subclass.  I'm
> disappointed that it's not possible to be as efficient as the
> built-in unicode class, even when using an extension code.

There are a few things you could do to produce smaller pickle streams.
If you are certain that the objects you will pickle are not
self-referential, then you can set Pickler.fast to True. This will
disable the memo, which adds a 2-byte overhead to each object pickled.
Depending on the input, this may or may not shorten the resulting
stream: without the memo, an object referenced more than once is
written out in full each time. If this isn't enough, then you could
subclass Pickler and Unpickler and define a custom rule for your
unicode subclass.
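The two suggestions above can be sketched as follows, assuming Python 3 and protocol 2. Pickler.fast is still accepted, although the current documentation marks it as deprecated. Name is a hypothetical stand-in for your unicode subclass, and routing it through the persistent-ID hooks (persistent_id/persistent_load, which are intended for objects stored outside the pickle) is only one possible way to define such a custom rule.

```python
import io
import pickle


def dumps(obj, fast=False):
    """Pickle obj with protocol 2, optionally with the memo disabled."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)
    pickler.fast = fast  # fast mode: no memo, so no BINPUT/BINGET records
    pickler.dump(obj)
    return buf.getvalue()


words = [str(n) for n in range(100)]
with_memo = dumps(words)
without_memo = dumps(words, fast=True)  # shorter: no memo PUT opcodes

shared = ["spam"]
copy_normal = pickle.loads(dumps([shared, shared]))
copy_fast = pickle.loads(dumps([shared, shared], fast=True))
# With the memo, both slots refer to one list after loading;
# in fast mode the list is written twice and loads as two separate lists.


class Name(str):
    """Hypothetical str subclass standing in for the unicode subclass."""


class NamePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Store a Name as just its text, behind a BINPERSID opcode.
        if type(obj) is Name:
            return str(obj)
        return None  # everything else is pickled normally


class NameUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Every persistent ID written by NamePickler is a Name's text.
        return Name(pid)


buf = io.BytesIO()
NamePickler(buf, protocol=2).dump([Name("alpha"), "beta"])
buf.seek(0)
restored = NameUnpickler(buf).load()
```

A stream written by NamePickler can only be read back by NameUnpickler (or another Unpickler with a matching persistent_load); a plain pickle.loads() raises UnpicklingError when it reaches the persistent-ID opcode.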

An obvious optimization for pickle in Py3k would be to add support
for short unicode strings. Currently, there is a 4-byte overhead per
string (its length prefix). Since Py3k is unicode throughout, this
overhead can become quite large.
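The 4-byte length prefix is visible with pickletools. Later CPython releases did add a short form: pickle protocol 4 (Python 3.4) has a SHORT_BINUNICODE opcode with a 1-byte length for strings shorter than 256 bytes in UTF-8. The helper string_record is a hypothetical name, and the sketch assumes Python 3.4 or later.

```python
import pickle
import pickletools


def string_record(text, protocol):
    """Return (opcode name, bytes used) for the opcode that stores `text`."""
    data = pickle.dumps(text, protocol=protocol)
    ops = list(pickletools.genops(data))  # (opcode, arg, start position)
    for (op, arg, pos), (_, _, next_pos) in zip(ops, ops[1:]):
        if arg == text:
            return op.name, next_pos - pos
    raise ValueError("no opcode carries %r" % (text,))


print(string_record("abc", 3))  # → ('BINUNICODE', 8): 1 opcode + 4 length + 3 data
print(string_record("abc", 4))  # → ('SHORT_BINUNICODE', 5): 1 opcode + 1 length + 3 data
```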

> > Could point out specific examples of the "old code" that you are referring to?
>
> I don't have time right now to point at specific code.  How about
> the code that implements all the different versions of __reduce__
> and code for __getinitargs__, __getstate__, __setstate__?

At first glance, __reduce__ seems to be useful only for instances of
subclasses of built-in types. However, __getnewargs__ could easily
replace it for that purpose. So removing __reduce__ (and __reduce_ex__)
is probably a good idea.
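A sketch of __getnewargs__ doing that job for a subclass of a built-in type, assuming Python 3 and protocol 2. Point is a hypothetical class. Rather than round-tripping through pickle, the example inspects what the default object.__reduce_ex__ produces, which is the information pickle stores with its NEWOBJ opcode.

```python
import copyreg


class Point(tuple):
    """Hypothetical tuple subclass whose constructor takes x and y."""

    def __new__(cls, x, y):
        return super().__new__(cls, (x, y))

    def __getnewargs__(self):
        # Arguments handed back to Point.__new__ at unpickling time.
        return (self[0], self[1])


p = Point(1, 2)
func, args = p.__reduce_ex__(2)[:2]
# func is copyreg.__newobj__ and args is (Point, 1, 2): the class followed
# by the result of __getnewargs__. pickle writes this pair as NEWOBJ.
clone = func(*args)  # equivalent to Point.__new__(Point, 1, 2)
```

Without the override, Point would inherit tuple.__getnewargs__, which returns (tuple(self),), so unpickling would call Point.__new__ with a single argument and fail.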

As far as I know, the current pickle module doesn't use
__getinitargs__ (this is one of the things the documentation is
totally wrong about).

As for __getstate__ and __setstate__, I think they are essential.
Without them, you couldn't pickle objects with __slots__ or save the
I/O state of certain objects.
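The I/O case can be sketched with a hypothetical LineReader class (not part of any library) that wraps an open file. An open file object cannot be pickled, so __getstate__ replaces it with the current read offset, and __setstate__ reopens the file and seeks back. The sketch assumes the file still exists at the same path when the pickle is loaded.

```python
import os
import pickle
import tempfile


class LineReader:
    """Hypothetical reader that remembers how far into a file it has read."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, encoding="utf-8")

    def readline(self):
        return self.file.readline()

    def __getstate__(self):
        # An open file cannot be pickled: keep the path and the read offset.
        state = self.__dict__.copy()
        state["offset"] = self.file.tell()
        del state["file"]
        return state

    def __setstate__(self, state):
        # Reopen the file and seek back to the saved offset.
        offset = state.pop("offset")
        self.__dict__.update(state)
        self.file = open(self.path, encoding="utf-8")
        self.file.seek(offset)


fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

reader = LineReader(path)
first = reader.readline()
restored = pickle.loads(pickle.dumps(reader))
second = restored.readline()  # resumes where `reader` stopped

reader.file.close()
restored.file.close()
os.remove(path)
```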

It would certainly be possible to simplify the algorithm used for
pickling class instances a little. Sketched in Python, it would look
something like this:

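    from pickle import UnpicklingError

    def save_obj(obj):
        # Let obj be an instance of an ordinary user-defined class.
        cls = type(obj)
        # Arguments for cls.__new__ (empty unless the class provides them).
        getnewargs = getattr(obj, "__getnewargs__", None)
        args = getnewargs() if getnewargs is not None else ()
        # Since Python 3.11 every object inherits object.__getstate__,
        # which returns the instance __dict__ (None if it is empty).
        getstate = getattr(obj, "__getstate__", None)
        state = getstate() if getstate is not None else obj.__dict__
        return (cls, args, state)

    def load_obj(cls, args, state):
        obj = cls.__new__(cls, *args)
        setstate = getattr(obj, "__setstate__", None)
        if setstate is not None:
            setstate(state)              # the class restores its own state
        elif isinstance(state, dict):
            obj.__dict__.update(state)   # default: plain attribute dict
        elif state is not None:
            # Custom state that only a __setstate__ method could interpret.
            raise UnpicklingError(
                "cannot restore %s state without __setstate__" % cls.__name__)
        return obj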

The main difference between this and the current method used to
pickle instances is the use of __getnewargs__ instead of __reduce__.

-- Alexandre

