PEP for RFE 46738 (first draft)

Hello Chaps,

The attached PEP (pep.txt) is for RFE 46738, which you can view here:

http://sourceforge.net/tracker/index.php?func=detail&aid=467384&group_id=5470&atid=355470

It provides a safe, documented class for serialization of simple Python types. A sample implementation is also attached (gherkin.py).

Criticism and comments on the PEP and the implementation are appreciated.

Simon Wittber.

Please don't invent new serialization formats. I think we have enough of those already.

The RFE suggests that "the protocol is specified in the documentation, precisely enough to write interoperating implementations in other languages". If interoperability with other languages is really the issue, use an existing format like JSON.

If you want an efficient binary format you can use a subset of the pickle protocol supporting only basic types. I tried this once. I ripped out all the fancy parts from pickle.py and left only binary pickling (protocol version 2) of basic types. It took less than an hour and I was left with something only marginally more complex than your new proposed protocol.

Oren
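
The "subset of pickle" idea can be sketched as follows. This is not the stripped-down pickle.py described above, which is not shown in the thread. It is a minimal sketch that assumes Python 3's standard pickle module and takes a different route: it keeps the stock unpickler but refuses every global lookup. The names BasicTypesUnpickler and safe_loads are hypothetical.

```python
import io
import pickle


class BasicTypesUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any class or function."""

    def find_class(self, module, name):
        # None, bool, int, float, str, list, tuple and dict are rebuilt by
        # dedicated opcodes and never reach this method. Anything that needs
        # a global lookup (instances, sets, dates, ...) is rejected here.
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def safe_loads(data):
    """Load a pickle that contains only basic types."""
    return BasicTypesUnpickler(io.BytesIO(data)).load()


record = ("this is a record", 1, 2, 1000001, "(08)123123123",
          [1.5, None, {"k": True}])
payload = pickle.dumps(record, protocol=2)
restored = safe_loads(payload)
```

Blocking globals only stops the unpickler from importing and calling arbitrary objects. It does not bound memory or CPU use on hostile input, so it should be read as an illustration rather than a complete safety guarantee.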

> The RFE suggests that "the protocol is specified in the documentation, precisely enough to write interoperating implementations in other languages". If interoperability with other languages is really the issue, use an existing format like JSON.

JSON is slow (this is true of the Python version, at least). Whether it is slow because of the implementation, or because of its textual nature, I do not know. The implementation I tested also failed to encode {1:2}. I am not sure if this is a problem with JSON or the implementation.

> If you want an efficient binary format you can use a subset of the pickle protocol supporting only basic types. I tried this once. I ripped out all the fancy parts from pickle.py and left only binary pickling (protocol version 2) of basic types. It took less than hour and I was left with something only marginally more complex than your new proposed protocol.

I think you are missing the point. Is your pickle hack available for viewing? If it, or JSON, is a better choice, then so be it. The point of the PEP is not the protocol, but the need for a documented, efficient, _safe_ serialization module in the standard library. Do you disagree?

Simon Wittber.
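
On the {1:2} question: JSON object keys must be strings, so an integer key cannot round-trip whatever the implementation. The sketch below uses the json module from today's Python 3 standard library, which is not the implementation tested above. It coerces the key to a string instead of failing.

```python
import json

original = {1: 2}

encoded = json.dumps(original)   # the int key is written as a string
decoded = json.loads(encoded)    # ...and comes back as a string

print(encoded)                   # {"1": 2}
print(decoded)                   # {'1': 2}
print(decoded == original)       # False
```

An encoder that refuses {1:2} outright and one that silently converts the key are both consistent with the format. Either way, the limitation lies in JSON itself.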

Why this discussion of yet another serialization format? The wire-encoding for XML-RPC is quite stable, handles all the basic Python types proposed in the proto-PEP, and is highly interoperable. If performance is an issue, make sure you have a C-based accelerator module like sgmlop installed. If size is an issue, gzip it before sending it over the wire or to a file.

Skip
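
The XML-RPC plus gzip suggestion might look like the sketch below. It assumes Python 3, where xmlrpclib is named xmlrpc.client. The sample record and the variable names are illustrative only.

```python
import gzip
import xmlrpc.client

record = ("this is a record", 1, 2, 1000001, "(08)123123123", "some more text")
params = record * 1000   # dumps() expects a tuple of parameters

# Encode to the XML-RPC wire format, then compress for transport or storage.
xml_text = xmlrpc.client.dumps(params)
packed = gzip.compress(xml_text.encode("utf-8"))

# Reverse the two steps on the receiving side.
unpacked = gzip.decompress(packed).decode("utf-8")
restored, method_name = xmlrpc.client.loads(unpacked)

print(len(xml_text), "characters of XML,", len(packed), "bytes gzipped")
```

This addresses size only. The cost of building and parsing the XML is still paid on both ends.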

> Why this discussion of yet another serialization format?

Pickle is stated to be unsafe. Marshal is also stated to be unsafe. XML can be bloated, and XML+gzip is quite slow.

Do size, speed, and security features have to be mutually exclusive? No, that possibly is why people have had to invent their own formats. I can list four off the top of my head:

bencode (bittorrent)
jelly (twisted)
banana (twisted)
tofu (soya3d, looks like it is using twisted now... hmmm)

XML is simply not suitable for database applications, real time data capture and game/entertainment applications.

I'm sure other people have noticed this... or am I alone on this issue? :-)

Have a look at this contrived example:

    import time

    value = (("this is a record",1,2,1000001,"(08)123123123","some more text")*10000)

    import gherkin
    t = time.clock()
    s = gherkin.dumps(value)
    print 'Gherkin encode', time.clock() - t, 'seconds'
    t = time.clock()
    gherkin.loads(s)
    print 'Gherkin decode', time.clock() - t, 'seconds'

    import xmlrpclib
    t = time.clock()
    s = xmlrpclib.dumps(value)
    print 'XMLRPC encode', time.clock() - t, 'seconds'
    t = time.clock()
    xmlrpclib.loads(s)
    print 'XMLRPC decode', time.clock() - t, 'seconds'

Which produces the output:

    pythonw -u "bench.py"
    Gherkin encode 0.120689361357 seconds
    Gherkin decode 0.395871262968 seconds
    XMLRPC encode 0.528666352847 seconds
    XMLRPC decode 9.01307819849 seconds

Simon Wittber <simonwittber@gmail.com> wrote:
>> Why this discussion of yet another serialization format?
>
> Pickle is stated to be unsafe. Marshal is also stated to be unsafe. XML can be bloated, and XML+gzip is quite slow.
>
> Do size, speed, and security features have to be mutually exclusive? No, that possibly is why people have had to invent their own formats. I can list four off the top of my head:
...

Looks to me like the eval(repr(obj)) loop spanks XMLRPC. It likely also spanks Gherkin, but I'm not one to run untrusted code. Give it a shot on your own machine.

As for parsing, repr() of standard Python objects is pretty easy to parse, and if you want something a bit easier to read, there's always pprint (though it may not be quite as fast).

- Josiah

    Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> value = (("this is a record",1,2,1000001,"(08)123123123","some moretext")*10000)
    >>> import time
    >>> t = time.time();y = repr(value);time.time()-t
    0.030999898910522461
    >>> t = time.time();z = eval(y);time.time()-t
    0.26600003242492676
    >>> import xmlrpclib
    >>> t = time.time();n = xmlrpclib.dumps(value);time.time()-t
    0.42100000381469727
    >>> t = time.time();m = xmlrpclib.loads(n);time.time()-t
    4.4529998302459717
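
eval() executes whatever expression it is given, which is the safety concern running through this thread. A minimal sketch of the same repr() round trip using ast.literal_eval, which accepts only Python literals, is below. It assumes a current Python 3, and no timings are claimed for it.

```python
import ast

value = ("this is a record", 1, 2, 1000001, "(08)123123123", "some more text") * 1000

# repr() gives a plain-text encoding of the basic types.
text = repr(value)

# literal_eval() parses literals only; it does not call functions.
restored = ast.literal_eval(text)

# Anything that is not a literal is refused instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(restored == value, rejected)   # True True
```

literal_eval() still runs the input through the Python parser, so very large or deeply nested input can exhaust memory or the C stack. It removes code execution, not every risk.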
participants (4)
- Josiah Carlson
- Oren Tirosh
- Simon Wittber
- Skip Montanaro