
On 1/26/07, Josiah Carlson <jcarlson@uci.edu> wrote:
*I* use pickle when I want speed. I use repr and unrepr (from configobj) when I want to be able to change things by hand. None of the objects I transfer between processes/disk/whatever are ever more than base Python types.
As for whatever *others* do, I don't know, I haven't done a survey. They may do as you say.
well, since this proposal originated from an RPC point-of-view, i, for one, also need to transfer full-blown user-defined objects and classes. for that, i need to be able to get the full state of the object, and the means to reconstruct it later. pickling the "primitives" was never a problem, because they have a well-defined interface. cyclic references, though, call for something stronger than repr... and this is what i mean by "repr is for humans".
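The point about cyclic references can be checked with a small sketch. It uses only the standard library under Python 3; the variable names are purely illustrative.

```python
import pickle

# A list that contains itself: the simplest cyclic reference.
cyclic = [1]
cyclic.append(cyclic)

# repr() marks the cycle with a "[...]" placeholder instead of encoding it.
text = repr(cyclic)

# In Python 3 the placeholder parses as a list holding Ellipsis,
# so eval() succeeds but the self-reference is gone.
rebuilt = eval(text)

# pickle tracks object identity, so the cycle survives a round trip.
restored = pickle.loads(pickle.dumps(cyclic))
```

Here `restored[1] is restored` holds, while `rebuilt[1]` is just an unrelated list, which is the sense in which repr output is meant for people rather than for reconstruction.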
I never claimed it was a substitute. I stated quite clearly up front: "My only criticism will then be the strawman repr/unrepr." The alternate strawman is repr/eval, which can be made to support basically everything except for self-referential objects. It still has that horrible security issue, but that also exists in pickle, and could still exist in the simplification/rebuilding routines specified by Tomer.
well, pickle is unsafe for one main reason -- it's omnipotent. it performs arbitrary imports and object instantiation, which are equivalent to eval of arbitrary strings. BUT, that has nothing to do with the way it gets or sets the *state* of objects.

to solve that, we can have a "capability-based pickle": dumping objects was never a security issue, it's the loading part that's dangerous. we can add a new function, loadsec(), that takes both the string to load and a set of classes it may use to __rebuild__ the object.

    capabilities = {"list" : list, "str" : str, "os.stat_result" : os.stat_result}
    loadsec(data, capabilities)

that way, you control which objects may be instantiated, limiting them to ones you trust, so no arbitrary code can be executed behind the scenes.

for the "classic" unsafe load(), we can pass a magical dict-like thing that imports names via __getitem__.

if i had a way to control what pickle.loads has access to, i wouldn't need to write my own serializer..
http://sebulbasvn.googlecode.com/svn/trunk/rpyc/core/marshal/brine.py
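The loadsec() idea above can be approximated in present-day Python 3 by overriding pickle.Unpickler.find_class, which is the hook the standard library documents for restricting globals. The following is only a sketch under assumptions, not the proposed API: the class names CapabilityUnpickler and ImportingCapabilities and the key convention (bare names for builtins, "module.name" otherwise) are invented here to match the example dict.

```python
import datetime
import importlib
import io
import pickle


class CapabilityUnpickler(pickle.Unpickler):
    """Unpickler that resolves globals only through a caller-supplied mapping."""

    def __init__(self, file, capabilities):
        super().__init__(file)
        self._capabilities = capabilities

    def find_class(self, module, name):
        # Assumed key convention: "list" for builtins, "module.name" otherwise.
        key = name if module == "builtins" else f"{module}.{name}"
        try:
            return self._capabilities[key]
        except KeyError:
            raise pickle.UnpicklingError(f"global {key!r} is not permitted") from None


def loadsec(data, capabilities):
    """Load pickled bytes, instantiating only what `capabilities` grants."""
    return CapabilityUnpickler(io.BytesIO(data), capabilities).load()


class ImportingCapabilities:
    """Dict-like object that imports any requested name: the "classic" unsafe load.

    Nested names such as "mod.Outer.Inner" are not handled in this sketch.
    """

    def __getitem__(self, key):
        module, _, name = key.rpartition(".")
        return getattr(importlib.import_module(module or "builtins"), name)


# Plain lists, ints and strs have dedicated opcodes and never reach find_class;
# only the date needs a capability.
blob = pickle.dumps([1, "two", datetime.date(2007, 1, 26)])
capabilities = {"datetime.date": datetime.date}
result = loadsec(blob, capabilities)
```

Loading the same bytes with an empty mapping raises pickle.UnpicklingError, and passing ImportingCapabilities() gives back the usual permissive behaviour. Note that restricting which classes may be looked up narrows the attack surface but does not by itself make every granted class safe to instantiate.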
you may have digressed, but that's a good point -- that's exactly why i do NOT specify how objects are encoded as a stream of bytes.
all i'm after is the state of the object (which is expressed in terms of other, more primitive objects).
Right, but as 'primitive objects' go, you can't get significantly more primitive than producing a string that can be naively understood by someone familiar with Python *and* the built-in Python parser.
but that's not the issue. when i send my objects across a socket back and forth, i want something that is fast, compact, and safe. i don't care for anyone sniffing my wire to "easily understand" what i'm sending... i mean, it's not like i encrypt the data, but readability doesn't count here.

again, __simplify__ and __rebuild__ offer a mechanism for serializer-implementors, which may choose different encoding schemes for their internal purposes. this API isn't meant for the end user.

i'll write a pre-pep to try to clarify it all in an orderly manner.

-tomer