Re: [Python-ideas] new pickle semantics/API

Jan. 25, 2007

"tomer filiba" <tomerfiliba@gmail.com> wrote:
...
On 1/25/07, Josiah Carlson <jcarlson@uci.edu> wrote:
...
Overall, I like the idea; I'm a big fan of simplifying object
persistence and/or serialization.  A part of me also likes how the
objects can choose to lie about their types.
But another part of me says: the basic objects that you specified
already have an unambiguous format, repr(obj).  They can also be
reconstructed from their component parts via eval(repr(obj)),
or even via the 'unrepr' function in the ConfigObj module.  It doesn't
handle circular references.
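A minimal Python 3 sketch of both halves of that point: eval(repr(obj)) rebuilds ordinary containers, but a self-referencing list comes back with its cycle replaced by the Ellipsis object, whereas pickle keeps the cycle. The data values are made up for illustration.

```python
import pickle

# Basic objects: repr() produces source text that eval() turns back into
# an equal object.
data = {"name": "spam", "sizes": [1, 2.5, 3j], "flags": (True, None)}
assert eval(repr(data)) == data

# A self-referencing list: repr() elides the cycle as "[...]".
loop = []
loop.append(loop)
print(repr(loop))  # [[...]]

# eval() reads "..." as the Ellipsis literal, so the cycle is lost.
rebuilt = eval(repr(loop))
print(rebuilt)  # [[Ellipsis]]

# pickle records the shared reference and restores the cycle.
restored = pickle.loads(pickle.dumps(loop))
print(restored[0] is restored)  # True
```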
Well, repr is fine for most simple things, but you don't use repr to
serialize objects, right? It's not powerful/introspective enough.
Besides, repr is meant to be readable, while __getstate__ can return
any object. Imagine this:
I use repr to serialize objects all the time.  ConfigObj is great when I
want to handle python-based configuration information, and/or I don't
want to worry about the security implications of 'eval(arbitrary string)',
or 'import module'.
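On the security side, Python's standard library offers ast.literal_eval, which evaluates only literal syntax. It is used here purely as a stand-in for that idea; this is not ConfigObj's unrepr API, and the settings string is invented.

```python
import ast

# Configuration values stored as Python literals (hypothetical settings).
config_text = "{'port': 8080, 'hosts': ['a.example', 'b.example'], 'debug': False}"

# Plain literal data parses back into Python objects.
settings = ast.literal_eval(config_text)
print(settings["port"])  # 8080

# Anything that would execute code is rejected instead of being run.
malicious = "__import__('os').system('echo pwned')"
try:
    ast.literal_eval(malicious)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```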

With a proper __repr__ method, I can even write towards your API:

class mylist(object):
    def __repr__(self):
        state = ...  # the object's state, built from primitive objects
        return 'mylist.__setstate__(%r)' % (state,)
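The mylist snippet can be fleshed out into something runnable. This sketch rests on assumptions not in the original: the class wraps a plain list, and __setstate__ is a classmethod that returns a new instance when called on the class, which is what the repr string implies. That differs from how pickle calls __setstate__ today (on an already-created instance), so it illustrates the API under discussion, not the current protocol.

```python
class mylist(object):
    """Hypothetical list wrapper whose repr() is a call that rebuilds it."""

    def __init__(self, items=()):
        self.items = list(items)

    def __repr__(self):
        # The state is expressed with primitive objects (here, a list).
        state = self.items
        return 'mylist.__setstate__(%r)' % (state,)

    @classmethod
    def __setstate__(cls, state):
        # Assumption: called on the class, it builds a new instance from state.
        return cls(state)

    def __eq__(self, other):
        return isinstance(other, mylist) and self.items == other.items


original = mylist([1, 'two', 3.0])
text = repr(original)
print(text)  # mylist.__setstate__([1, 'two', 3.0])
clone = eval(text)
print(clone == original)  # True
```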
...
class complex:
    def __repr__(self):
        return "(%f+%fj)" % (self.real, self.imag)
I would use 'return "(%r+%rj)"% (self.real, self.imag)', but it doesn't
much matter.
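One concrete difference between the two format strings is precision: %f rounds each part to six decimal places, while %r uses the float's repr, which round-trips exactly. MyComplex is a hypothetical stand-in, named so it doesn't shadow the built-in complex type.

```python
class MyComplex(object):
    """Hypothetical stand-in for the complex class in the example above."""

    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

    def repr_f(self):
        # %f rounds each part to six decimal places.
        return "(%f+%fj)" % (self.real, self.imag)

    def __repr__(self):
        # %r uses repr(float), which round-trips exactly.
        return "(%r+%rj)" % (self.real, self.imag)


z = MyComplex(1 / 3, 2.0)
print(z.repr_f())  # (0.333333+2.000000j)
print(repr(z))     # (0.3333333333333333+2.0j)
print(eval(z.repr_f()) == complex(1 / 3, 2.0))  # False
print(eval(repr(z)) == complex(1 / 3, 2.0))     # True
```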
...
Repr is made for humans, of course, while serialization is
made for machines. They serve different purposes,
so they need different APIs.
I happen to disagree.  The only reason to use a different representation
or API is if there are size and/or performance benefits to offering a
machine readable vs. human readable format.

I know that there are real performance advantages to using (c)Pickle
over repr/unrepr, but I use it also so that I can change settings with
notepad (as has been necessary on occasion).
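A rough way to look at that trade-off, with ast.literal_eval standing in for unrepr and a made-up settings dict. Sizes and timings depend on the data and the machine, so no particular numbers are claimed here.

```python
import ast
import pickle
import timeit

# Hypothetical settings-like structure.
settings = {"window": {"width": 800, "height": 600},
            "recent": ["a.txt", "b.txt", "c.txt"] * 50,
            "ratio": 0.75}

# Both formats reproduce the same object.
assert pickle.loads(pickle.dumps(settings)) == settings
assert ast.literal_eval(repr(settings)) == settings

# Only the repr text is meant to be edited by hand (e.g. in notepad).
text = repr(settings)
blob = pickle.dumps(settings, protocol=pickle.HIGHEST_PROTOCOL)
print("repr bytes:", len(text.encode("utf-8")), "pickle bytes:", len(blob))

# Compare the cost of a full dump/load round trip for each format.
t_pickle = timeit.timeit(lambda: pickle.loads(pickle.dumps(settings)), number=1000)
t_repr = timeit.timeit(lambda: ast.literal_eval(repr(settings)), number=1000)
print("pickle: %.4fs  repr/literal_eval: %.4fs" % (t_pickle, t_repr))
```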
...
...
Even better, it has three native representations: repr(a).encode('zlib'),
repr(a), pprint.pprint(a); each offering a different amount of user
readability.  I digress.
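repr(a).encode('zlib') relies on Python 2's string codecs. On Python 3 the zlib module does the compression explicitly, and pprint.pformat returns the pretty text instead of printing it. A sketch of the three representations, assuming a made-up value for a and using ast.literal_eval to parse each one back:

```python
import ast
import pprint
import zlib

a = {"name": "example", "values": list(range(12)),
     "nested": {"x": 1.5, "y": [True, None]}}

compact = zlib.compress(repr(a).encode("utf-8"))  # least readable: compressed bytes
plain = repr(a)                                   # one line of Python source
pretty = pprint.pformat(a, width=40)              # wrapped across several lines

# All three decode back to the same object.
for text in (zlib.decompress(compact).decode("utf-8"), plain, pretty):
    assert ast.literal_eval(text) == a
print(pretty)
```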
You may have digressed, but that's a good point. That's exactly
why I do NOT specify how objects are encoded as a stream of bytes.
All I'm after is the state of the object (which is expressed in terms of
other, more primitive objects).
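Describing an object's state in terms of more primitive objects is already what pickle's existing __getstate__/__setstate__ hooks do, with the byte encoding left to pickle. A small sketch using today's protocol (not the proposed API) and a hypothetical Point class:

```python
import pickle


class Point(object):
    """Hypothetical class whose state is a tuple of two numbers."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Describe the object purely in terms of more primitive objects.
        return (self.x, self.y)

    def __setstate__(self, state):
        # pickle calls this on a fresh instance created without __init__.
        self.x, self.y = state


p = Point(3, 4)
state = p.__getstate__()
print(state)  # (3, 4)

# The byte encoding is pickle's business; the class only supplies the state.
q = pickle.loads(pickle.dumps(p))
print(q.x, q.y)  # 3 4
```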
Right, but as 'primitive objects' go, you can't get significantly more
primitive than producing a string that can be naively understood by
someone familiar with Python *and* by the built-in Python parser.
Never mind that it works *today* with all of the types you specified
earlier (with the exception of file objects, which you discover when
parsing/reproducing the object).
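The claim can be spot-checked on Python 3, again with ast.literal_eval standing in for ConfigObj's unrepr. The sample values are an assumption (the earlier list of types is elided from this message), and io.StringIO substitutes for a real file object.

```python
import ast
import io

samples = [42, 2.5, 1 + 2j, "text", b"bytes", True, None,
           (1, 2), [3, 4], {"k": "v"}, {5, 6}]

# Each value's repr is Python source that rebuilds an equal value.
for value in samples:
    assert ast.literal_eval(repr(value)) == value

# A file-like object's repr is a description, not source code, so parsing fails.
f = io.StringIO("contents")
print(repr(f))  # e.g. <_io.StringIO object at 0x...>
try:
    ast.literal_eval(repr(f))
    parsed = True
except (ValueError, SyntaxError):
    parsed = False
print(parsed)  # False
```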
...
You can think of repr as, to some extent, a textual serializer that
can use the proposed __getstate__ API. pprint is yet another
form of serializer.
Well, pprint is more or less a pretty repr.
...
...
I believe the biggest problem with the proposal, as specified, is that
changing the semantics of __getstate__ and __setstate__ is a bad idea.
Add a new pair of methods and ask the twisted people what they think.
My only criticism will then be the strawman repr/unrepr.
I'll try to come up with new names, but I don't have any ideas
at the moment.
Like Colin, I also like __rebuild__.
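A sketch of the "new pair of methods" suggestion: pickle's __getstate__/__setstate__ keep their current meaning, and a separate pair carries the new semantics. __rebuild__ is the name floated in the thread; __simplify__, dump_state and load_state are invented here purely for illustration, and the Temperature class is hypothetical.

```python
import pickle


class Temperature(object):
    """Hypothetical class supporting both today's pickle hooks and a new pair."""

    def __init__(self, celsius):
        self.celsius = celsius

    # Existing pickle protocol, semantics unchanged.
    def __getstate__(self):
        return {"celsius": self.celsius}

    def __setstate__(self, state):
        self.celsius = state["celsius"]

    # Hypothetical new pair: __simplify__ is an invented name for the method
    # that reports primitive state; __rebuild__ is the name liked in the thread.
    def __simplify__(self):
        return self.celsius

    @classmethod
    def __rebuild__(cls, state):
        return cls(state)


def dump_state(obj):
    """Hypothetical serializer front end: returns (class, primitive state)."""
    return type(obj), obj.__simplify__()


def load_state(cls, state):
    """Hypothetical counterpart that rebuilds via the class's __rebuild__."""
    return cls.__rebuild__(state)


t = Temperature(21.5)
print(pickle.loads(pickle.dumps(t)).celsius)  # 21.5
print(load_state(*dump_state(t)).celsius)     # 21.5
```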

 - Josiah