[Python-ideas] new pickle semantics/API
Josiah Carlson
jcarlson at uci.edu
Fri Jan 26 00:29:02 CET 2007
"tomer filiba" <tomerfiliba at gmail.com> wrote:
> On 1/25/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > Overall, I like the idea; I'm a big fan of simplifying object
> > persistence and/or serialization. A part of me also likes how the
> > objects can choose to lie about their types.
> >
> > But another part of me says: the basic objects that you specified
> > already have a format that is unambiguous, repr(obj). They also are
> > able to be reconstructed from their component parts via eval(repr(obj)),
> > or even via the 'unrepr' function in the ConfigObj module. It doesn't
> > handle circular references.
>
> well, repr is fine for most simple things, but you don't use repr to
> serialize objects, right? it's not powerful/introspective enough.
> besides, repr is meant to be readable, while __getstate__ can return
> any object. imagine this:
I use repr to serialize objects all the time. ConfigObj is great when I
want to handle Python-based configuration information, and/or I don't
want to worry about the security implications of 'eval(arbitrary string)',
or 'import module'.
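A minimal sketch of that repr-based approach, using only the standard library. ast.literal_eval stands in here for ConfigObj's unrepr (a swapped-in function, not ConfigObj's API): it rebuilds plain literals from their repr and refuses anything else, so a settings string cannot run code the way eval(arbitrary string) could.

```python
import ast

# Plain literals only: dicts, lists, tuples, strings, numbers, booleans, None.
settings = {"host": "localhost", "port": 8080, "ratio": 0.75,
            "tags": ("a", "b"), "debug": None}

text = repr(settings)              # human-readable serialized form
restored = ast.literal_eval(text)  # rebuilds the object without calling eval()

def is_rejected(source):
    """Return True if literal_eval refuses to evaluate `source`."""
    try:
        ast.literal_eval(source)
    except (ValueError, SyntaxError):
        return True
    return False

print(restored == settings)                                   # True
print(is_rejected("__import__('os').system('echo hacked')"))  # True
```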
With a proper __repr__ method, I can even write towards your API:
class mylist(object):
    def __repr__(self):
        state = ...
        return 'mylist.__setstate__(%r)'%(state,)
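A runnable version of the mylist sketch, under two assumptions not stated above: the state is simply the list of items, and __setstate__ is treated as a classmethod that builds and returns a new instance (one reading of the proposed API). eval() with a namespace holding only mylist then reconstructs the object from its repr.

```python
class mylist(object):
    def __init__(self, items=()):
        self.items = list(items)

    @classmethod
    def __setstate__(cls, state):
        # Assumed proposal-style semantics: build and return a new instance
        # from primitive state, rather than mutating an existing one.
        return cls(state)

    def __repr__(self):
        state = self.items  # assumption: the state is just the list of items
        return 'mylist.__setstate__(%r)' % (state,)

m = mylist([1, 2, 3])
text = repr(m)  # 'mylist.__setstate__([1, 2, 3])'

# Only mylist is visible to the expression. Emptying __builtins__ narrows
# what eval can reach, but it is not a security boundary.
clone = eval(text, {"__builtins__": {}}, {"mylist": mylist})
print(type(clone).__name__, clone.items)  # mylist [1, 2, 3]
```

Note that today's pickle calls __setstate__ on an already-allocated instance, so a classmethod with this name would not cooperate with it.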
> class complex:
>     def __repr__(self):
>         return "(%f+%fj)" % (self.real, self.imag)
I would use 'return "(%r+%rj)"% (self.real, self.imag)', but it doesn't
much matter.
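Why %r is preferable there, shown with a small stand-in class (ComplexLike is hypothetical, named to avoid shadowing the builtin complex): %f rounds to six decimal places, while repr() of a float round-trips exactly in Python 3.

```python
class ComplexLike:
    """Hypothetical stand-in for the quoted `complex` class."""
    def __init__(self, real, imag):
        self.real, self.imag = real, imag

    def repr_f(self):
        return "(%f+%fj)" % (self.real, self.imag)  # fixed 6 decimal places

    def repr_r(self):
        return "(%r+%rj)" % (self.real, self.imag)  # shortest exact float repr

z = ComplexLike(0.1, 1e-10)
exact = complex(0.1, 1e-10)

print(z.repr_f())                 # (0.100000+0.000000j)
print(z.repr_r())                 # (0.1+1e-10j)
print(eval(z.repr_f()) == exact)  # False: the imaginary part was rounded away
print(eval(z.repr_r()) == exact)  # True
```

Neither format handles inf or nan, whose reprs are not valid Python expressions.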
> repr is made for humans of course, while serialization is
> made for machines. they serve different purposes,
> so they need different APIs.
I happen to disagree. The only reason to use a different representation
or API is if there are size and/or performance benefits to offering a
machine readable vs. human readable format.
I know that there are real performance advantages to using (c)Pickle
over repr/unrepr, but I also use repr/unrepr so that I can change settings
with Notepad (as has been necessary on occasion).
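A sketch of that hand-editable workflow, assuming a settings dict of plain literals and a hypothetical settings.txt file: pprint.pformat writes a readable file, any text editor (simulated here with a string replace) can change it, and ast.literal_eval reads it back.

```python
import ast
import os
import pprint
import tempfile

settings = {"window": {"width": 800, "height": 600}, "recent": ["a.txt", "b.txt"]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "settings.txt")  # hypothetical file name

    # Save: pformat produces valid Python literal syntax, wrapped for humans.
    with open(path, "w", encoding="utf-8") as f:
        f.write(pprint.pformat(settings))

    # Simulate a manual edit made in a plain text editor.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace("800", "1024"))

    # Load: literal_eval accepts only literals, so the file cannot run code.
    with open(path, encoding="utf-8") as f:
        edited = ast.literal_eval(f.read())

print(edited["window"]["width"])  # 1024
```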
> > Even better, it has 3 native representations: repr(a).encode('zlib'),
> > repr(a), pprint.pprint(a); each offering a different amount of user
> > readability. I digress.
>
> you may have digressed, but that's a good point -- that's exactly
> why i do NOT specify how objects are encoded as a stream of bytes.
>
> all i'm after is the state of the object (which is expressed in terms of
> other, more primitive objects).
Right, but as 'primitive objects' go, you can't get significantly more
primitive than producing a string that can be naively understood by
someone familiar with Python *and* the built-in Python parser.
Never mind that it works *today* with all of the types you specified
earlier (with the exception of file objects, which you only discover
when parsing/reproducing the object).
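A sketch of the two cases where repr round-tripping breaks down, using ast.literal_eval as the parser (an assumption, chosen as a stdlib stand-in). On Python 3.8 and later, a circular list's repr contains '...', which parses back as a literal Ellipsis rather than a cycle; a file object's repr is descriptive text, not an expression, so parsing fails.

```python
import ast
import os

# Circular reference: the repr elides the cycle as '...'.
a = []
a.append(a)
text = repr(a)                    # '[[...]]'
rebuilt = ast.literal_eval(text)  # parses, but '...' becomes the Ellipsis object
print(rebuilt)                    # [[Ellipsis]]
print(rebuilt[0] is rebuilt)      # False: the cycle is lost

# File object: the repr is descriptive text, not an expression.
with open(os.devnull) as f:
    file_text = repr(f)           # e.g. "<_io.TextIOWrapper name='/dev/null' ...>"
try:
    ast.literal_eval(file_text)
    parsed = True
except (ValueError, SyntaxError):
    parsed = False
print(parsed)                     # False
```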
> you can think of repr as a textual serializer to some extent, that
> can use the proposed __getstate__ API. pprint is yet another
> form of serializer.
Well, pprint is more or less a pretty repr.
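The three representations mentioned earlier, sketched for Python 3, where str.encode('zlib') no longer exists; zlib.compress on the UTF-8 bytes is the substitute assumed here. All three decode back to an equal object with ast.literal_eval, which is the sense in which pprint is just a prettier repr.

```python
import ast
import pprint
import zlib

a = {"name": "example", "values": list(range(20)), "nested": {"x": 1.5, "y": None}}

compact = repr(a)                                # one line, readable
pretty = pprint.pformat(a)                       # wrapped for humans
packed = zlib.compress(repr(a).encode("utf-8"))  # compressed bytes, not readable

# Every form round-trips to an equal object.
from_compact = ast.literal_eval(compact)
from_pretty = ast.literal_eval(pretty)
from_packed = ast.literal_eval(zlib.decompress(packed).decode("utf-8"))
print(from_compact == from_pretty == from_packed == a)  # True
```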
> > I believe the biggest problem with the proposal, as specified, is that
> > changing the semantics of __getstate__ and __setstate__ is a bad idea.
> > Add a new pair of methods and ask the Twisted people what they think.
> > My only criticism will then be the strawman repr/unrepr.
>
> i'll try to come up with new names... but i don't have any ideas
> at the moment.
Like Colin, I also like __rebuild__.
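A sketch of why a new method avoids breaking existing code. Today's pickle calls __setstate__ on an instance it has already allocated, so a class written against that contract keeps working; a separate classmethod (here __rebuild__, the name favored above, with a hypothetical signature) can return a brand-new instance under the proposed semantics. Only the rebuilding half of the pair is shown, reusing __getstate__ for brevity.

```python
import pickle

class Settings:
    def __init__(self, values):
        self.values = dict(values)

    # Existing pickle protocol: any picklable state...
    def __getstate__(self):
        return {"values": self.values}

    # ...restored onto an already-allocated instance.
    def __setstate__(self, state):
        self.values = state["values"]

    # Hypothetical new method, not part of pickle: builds a new instance.
    @classmethod
    def __rebuild__(cls, state):
        return cls(state["values"])

s = Settings({"debug": True})
via_pickle = pickle.loads(pickle.dumps(s))            # existing protocol untouched
via_rebuild = Settings.__rebuild__(s.__getstate__())  # new method coexists
print(via_pickle.values, via_rebuild.values)  # {'debug': True} {'debug': True}
```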
- Josiah