[Python-ideas] An idea for a new pickling tool
alexandre at peadrop.com
Wed Apr 22 23:14:58 CEST 2009
On Tue, Apr 21, 2009 at 6:02 PM, Raymond Hettinger <python at rcn.com> wrote:
> Python's pickles use a custom format that has evolved over time
> but they have five significant disadvantages:
> * it has lost its human readability and editability
This is not one of pickle's design goals. Also, I don't think the
pickle protocol has ever been a human-friendly format. Even though
protocol 0 is ASCII-based, that doesn't mean anyone would want to edit
it by hand.
> * it doesn't compress well
Do you have numbers to support this? The last time I tested
compression on pickle data, it worked fairly well. In fact, I get a
2.70 compression ratio for some pickles using gzip.
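For anyone who wants to check this on their own data, here is a minimal
sketch. The sample data and the helper name gzip_ratio are made up for
illustration, and the ratio will vary a lot with what is being pickled.

```python
import gzip
import pickle


def gzip_ratio(obj, protocol=2):
    """Return uncompressed size / gzip-compressed size for a pickle of obj."""
    raw = pickle.dumps(obj, protocol)
    packed = gzip.compress(raw)
    return len(raw) / len(packed)


# Hypothetical sample: a list of small, similar records.
sample = [{"id": i, "name": "user%d" % i, "tags": ["a", "b", "c"]}
          for i in range(1000)]

ratio = gzip_ratio(sample)
print("gzip compression ratio: %.2f" % ratio)
```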
From my experience with pickle, I doubt you can significantly improve
the size of pickled data without using static schemata (like Google
Protocol Buffers and Thrift). The only inefficiency in pickle that I
am aware of is the handling of PUT and GET opcodes.
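A small sketch of the PUT/GET overhead, using the standard pickletools
module. The sample data is arbitrary; protocol 2 is chosen here only
because it emits explicit PUT opcodes. The pickler memoizes every
container it writes, whether or not the memo entry is ever read back
with a GET, and pickletools.optimize() drops the unused PUTs.

```python
import pickle
import pickletools

# Arbitrary sample: many small tuples, none of which is referenced twice.
data = [("x", i) for i in range(100)]
raw = pickle.dumps(data, protocol=2)

# Count memo writes (PUT) and memo reads (GET) in the opcode stream.
put_ops = [op.name for op, arg, pos in pickletools.genops(raw)
           if "PUT" in op.name]
get_ops = [op.name for op, arg, pos in pickletools.genops(raw)
           if "GET" in op.name]

# optimize() removes PUT opcodes whose memo slot is never fetched.
optimized = pickletools.optimize(raw)

print("PUT opcodes:", len(put_ops), "GET opcodes:", len(get_ops))
print("bytes before:", len(raw), "bytes after:", len(optimized))
```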
> * it isn't interoperable with other languages
> * it doesn't have the ability to enforce a schema
Again, these are not part of pickle's design goals.
> * it is a major security risk for untrusted inputs
There are ways to fix this without replacing pickle. See the recipe in
> New idea
> Develop a solution using a mix of PyYAML, a python coded version of
> Kwalify, optional compression using bz2, gzip, or zlib, and pretty
> printing using pygments.
> YAML ( http://yaml.org/spec/1.2/ ) is a language independent standard
> for data serialization.
> PyYAML ( http://pyyaml.org/wiki/PyYAML ) is a full implementation of
> the YAML standard. It uses the YAML's application-specific tags and
> Python's own copy/reduce logic to provide the same power as pickle itself.
But how are you going to handle serialization of class instances in a
language independent manner?
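To illustrate why I ask, here is a quick sketch of what pickle itself
records for an instance. This shows pickle's behaviour only, not
PyYAML's; Fraction is just a convenient stdlib class for the example.
The stream names the class by its Python import path, which a reader
written in another language has no way to resolve.

```python
import pickle
import pickletools
from fractions import Fraction

raw = pickle.dumps(Fraction(3, 4), protocol=2)

# Collect the arguments of GLOBAL opcodes: "module name" pairs that the
# unpickler must import in order to rebuild the object.
global_refs = [arg for op, arg, pos in pickletools.genops(raw)
               if op.name == "GLOBAL"]

print(global_refs)
```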