[Python-ideas] An idea for a new pickling tool

Alexandre Vassalotti alexandre at peadrop.com
Wed Apr 22 23:14:58 CEST 2009


On Tue, Apr 21, 2009 at 6:02 PM, Raymond Hettinger <python at rcn.com> wrote:
> Motivation
> ----------
>
> Python's pickles use a custom format that has evolved over time
> but they have five significant disadvantages:
>
>   * it has lost its human readability and editability
>

This is not one of pickle's design goals. Also, I don't think the
pickle protocol has ever been a human-friendly format. Even though
protocol 0 is ASCII-based, that doesn't mean anyone would want to edit
it by hand.
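To make this concrete, here is a small sketch that dumps an object with
protocol 0. The sample dictionary is made up for illustration. For this
data the output is plain ASCII, but it is a stream of opcodes and memo
markers for the unpickler, not something one would want to edit:

```python
import pickle

# Hypothetical sample object, used only for illustration.
data = {"name": "spam", "values": [1, 2, 3]}

# Protocol 0 is the original text-based pickle protocol.
text = pickle.dumps(data, protocol=0)

# For this data every byte is ASCII, yet what prints is a sequence of
# unpickler opcodes and memo indices rather than a readable document.
print(text.decode("ascii"))
```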

>   * it doesn't compress well

Do you have numbers to support this? The last time I tested
compression on pickle data, it worked fairly well. In fact, I get a
2.70 compression ratio for some pickles using gzip.
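For anyone who wants to repeat this kind of measurement, a rough sketch
follows. The records are hypothetical sample data, so the ratio it prints
will not match the 2.70 figure above; results depend heavily on what is
being pickled:

```python
import gzip
import pickle

# Hypothetical sample data; real-world ratios vary with the content.
records = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(1000)]

raw = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)
packed = gzip.compress(raw)

# Compression ratio = uncompressed size / compressed size.
ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes, ratio {ratio:.2f}")
```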

From my experience with pickle, I doubt you can significantly reduce
the size of pickled data without using static schemata (like Google
Protocol Buffers and Thrift). The only inefficiency in pickle that I
am aware of is the handling of the PUT and GET opcodes.
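Presumably one visible part of this is that the pickler memoizes (PUTs)
objects that are never fetched back with a GET. The standard library's
pickletools.optimize() strips those unused PUTs. The sketch below uses
made-up data and protocol 2 so the BINPUT/BINGET opcodes are easy to spot
in the disassembly; it only illustrates the overhead and is not a
proposal for changing pickle itself:

```python
import pickle
import pickletools

# Made-up data: one list referenced twice, plus unshared values.
shared = [1, 2, 3]
data = {"a": shared, "b": shared, "c": "text"}

# Protocol 2 writes memo entries with explicit BINPUT/BINGET opcodes.
raw = pickle.dumps(data, protocol=2)

# optimize() removes PUTs whose memo slot is never read by a GET.
slim = pickletools.optimize(raw)

print(len(raw), "bytes before,", len(slim), "bytes after")

# Only the shared list keeps a BINPUT, and "b" fetches it with BINGET.
pickletools.dis(slim)
```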

>   * it isn't interoperable with other languages
>   * it doesn't have the ability to enforce a schema

Again, these are not part of pickle's design goals.

>   * it is a major security risk for untrusted inputs
>

There are ways to fix this without replacing pickle. See the recipe in
the pickle documentation:

http://docs.python.org/3.0/library/pickle.html#restricting-globals
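A condensed version of that recipe might look like the sketch below.
The allow-list mirrors the example in the documentation and is an
assumption to be tailored to each application; any global not on it is
rejected in find_class():

```python
import builtins
import io
import pickle

# Example allow-list (taken from the docs recipe); adjust per application.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only resolve a few harmless builtins; refuse everything else.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

For example, restricted_loads(pickle.dumps([1, 2, range(15)])) succeeds,
while a pickle that references os.system raises UnpicklingError instead
of importing it.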

> New idea
> --------
>
> Develop a solution using a mix of PyYAML, a python coded version of
> Kwalify, optional compression using bz2, gzip, or zlib, and pretty
> printing using pygments.
>
> YAML ( http://yaml.org/spec/1.2/ ) is a language independent standard
> for data serialization.
>
> PyYAML ( http://pyyaml.org/wiki/PyYAML ) is a full implementation of
> the YAML standard.  It uses the YAML's application-specific tags and
> Python's own copy/reduce logic to provide the same power as pickle itself.
>

But how are you going to handle serialization of class instances in a
language independent manner?
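For context, a pickle does not describe an instance in neutral terms: it
names the class by its Python module and name and expects the unpickler
to import it. The sketch below (Point and class_references() are
hypothetical names used only for illustration) scans a protocol 2 pickle
for those references:

```python
import pickle
import pickletools


class Point:
    """Hypothetical user-defined class, used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def class_references(data):
    """List the (module, name) pairs a protocol 0-3 pickle asks to import.

    Protocols 0-3 name classes with a GLOBAL opcode whose argument is
    "module name"; protocols 4 and 5 use STACK_GLOBAL, not handled here.
    """
    refs = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            refs.append((module, name))
    return refs


data = pickle.dumps(Point(1, 2), protocol=2)
print(class_references(data))  # [('__main__', 'Point')] when run as a script
```

A reader in another language would have to map "__main__.Point" onto
something of its own, which is exactly the open question.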

Regards,
-- Alexandre


