Secure Pickle-like module

Jean-Paul Calderone exarkun at divmod.com
Thu May 25 17:02:49 EDT 2006


On 25 May 2006 13:22:21 -0700, jiba at tuxfamily.org wrote:
>Hi all,
>
>I'm currently working on a secure Pickle-like module, Cerealizer,
>http://home.gna.org/oomadness/en/cerealizer/index.html
>Cerealizer has a pickle-like interface (load, dump, __getstate__,
>__setstate__,...), however it requires you to register each class you
>want to "cerealize", by calling cerealizer.register(YourClass).
>Cerealizer doesn't import other modules (contrary to pickle), and the
>only methods it may call are YourClass.__new__, YourClass.__getstate__
>and YourClass.__setstate__ (Cerealizer keeps its own reference to these
>three methods, so that YourClass.__setstate__ = cracked_method is
>harmless).
>Thus, as long as __new__, __getstate__ and __setstate__ are not
>dangerous, Cerealizer should be secure.
>
>The performance is quite good and, with Psyco, it is about as fast as
>cPickle. However, Cerealizer is written in less than 300 lines of
>pure-Python code.
>
>I would appreciate any comments, especially if there are some security
>gurus here :-)

There are a couple of factual inaccuracies on the site that I'd like to clear up first:

Trivial benchmarks put cerealizer and banana/jelly on the same level as far as performance goes:

$ python -m timeit -s 'from cereal import dumps; L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
10000 loops, best of 3: 84.1 usec per loop

$ python -m timeit -s 'from twisted.spread import banana, jelly; dumps = lambda o: banana.encode(jelly.jelly(o)); L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
10000 loops, best of 3: 89.7 usec per loop

This is with cBanana though, which has to be explicitly enabled and, of course, is written in C.  So Cerealizer looks like it has the potential to do pretty well, performance-wise.

Similar benchmarks show jelly/banana actually produces shorter encoded forms:

    >>> len(banana.encode(jelly.jelly(())))
    9
    >>> len(cereal.dumps(()))
    21
    >>> len(banana.encode(jelly.jelly(["Hello", " ", ("w", "o", "r", "l", "d", ".")])))
    45
    >>> len(cereal.dumps(["Hello", " ", ("w", "o", "r", "l", "d", ".")]))
    67

I think the mistake you may have made was thinking that repr(jelly()) is the final output form, when really it's banana.encode(jelly()).

You talked about _Tuple and _Dereference on the website as well.  These are internal implementation details.  They don't show up in the final decoded output at all:

    >>> from twisted.spread import jelly
    >>> output = jelly.unjelly(jelly.jelly([()] * 2))
    >>> output
    [(), ()]
    >>> output[0] is output[1]
    True
    >>> type(output[0]) is tuple
    True
    >>> 

jelly also supports extension types, by way of setUnjellyableForClass and similar functions.

As far as security goes, no obvious problems jump out at me, either
from the API or from skimming the code.  I think early-binding
__new__, __getstate__, and __setstate__ may be going further than
is necessary.  If someone can find code to set attributes on classes
in your process space, they can probably already do anything they
want to your program and don't need to exploit security problems in
your serializer.  On the other hand, early-binding may lead to
confusing bugs, albeit only in nasty cases where people are expecting
changes they make to class objects to have an effect on every part
of the system.
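
To make the early-binding trade-off concrete, here is a minimal sketch.
It is not Cerealizer's code: Registry, register, restore_early,
restore_late and Point are hypothetical names invented for illustration.
It only shows how capturing __new__ and __setstate__ at registration
time differs from looking them up on the class at load time.

```python
# Hypothetical sketch; none of these names come from Cerealizer itself.

class Registry:
    def __init__(self):
        self._handlers = {}

    def register(self, cls):
        # Early binding: capture the functions once, at registration time.
        self._handlers[cls.__name__] = (cls, cls.__new__, cls.__setstate__)

    def restore_early(self, name, state):
        # Uses the captured functions, so later changes to the class
        # are ignored.
        cls, new, setstate = self._handlers[name]
        obj = new(cls)
        setstate(obj, state)
        return obj

    def restore_late(self, name, state):
        # Late binding: look the methods up on the class at load time.
        cls = self._handlers[name][0]
        obj = cls.__new__(cls)
        obj.__setstate__(state)
        return obj


class Point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def __getstate__(self):
        return {"x": self.x, "y": self.y}

    def __setstate__(self, state):
        self.__dict__.update(state)


def patched_setstate(self, state):
    # Stands in for any replacement made after registration, whether
    # malicious or a legitimate monkey-patch.
    self.__dict__.update(state)
    self.patched = True


registry = Registry()
registry.register(Point)
Point.__setstate__ = patched_setstate  # change made after registration

early = registry.restore_early("Point", {"x": 1, "y": 2})
late = registry.restore_late("Point", {"x": 1, "y": 2})

print(hasattr(early, "patched"))  # False: the captured original ran
print(hasattr(late, "patched"))   # True: the replacement ran
```

The early-bound path silently ignores the replacement, which is the
protection Cerealizer is after, and also the source of the confusing
bugs mentioned above when the replacement was intentional.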

Jean-Paul


