[Python-ideas] Fwd: Re: Secure unpickle

Andrew Barnert abarnert at yahoo.com
Wed Jul 22 22:54:31 CEST 2015

On Jul 22, 2015, at 13:27, Ryan Gonzalez <rymg19 at gmail.com> wrote:
> A further idea: hashes.
> Each Pickle database (or whatever it's called) would contain a hash made up
> of:
> a) The types used to pickle the data.
> b) The hash of the data itself, prefixed with 2 bytes that have some sort
> of hard-to-get meaning (the length of the call stack?).
> c) The seconds since epoch, or another 64-bit value.

A type pickled and unpickled in a different interpreter instance isn't necessarily going to have the same hash value. And if you don't mean a Python hash, how do you hash an arbitrary class object? Or, if you mean just the name, how does that secure anything?
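
To make the first point concrete: str hashes are randomized per interpreter process (controlled by PYTHONHASHSEED), and the default hash of a class object is identity-based, so neither is stable across runs. Below is a minimal sketch that asks two fresh interpreters for the hash of the same class name. The name mymodule.MyClass and the helper str_hash_in_new_interpreter are hypothetical, chosen only for illustration.

```python
import hashlib
import os
import subprocess
import sys


def str_hash_in_new_interpreter(text: str, seed: str) -> int:
    """Return hash(text) as computed by a fresh Python process with the given hash seed."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


name = "mymodule.MyClass"  # hypothetical class name
h1 = str_hash_in_new_interpreter(name, "1")
h2 = str_hash_in_new_interpreter(name, "2")
print(h1 == h2)  # almost certainly False: same name, different interpreter

# A stable digest such as SHA-256 of the name is reproducible, but by anyone,
# including an attacker, so on its own it authenticates nothing.
print(hashlib.sha256(name.encode()).hexdigest())
```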

For that matter, it's often important for an updated version of the code to be able to load pickles created with yesterday's version. This is easy to do with the pickle protocol, but hashing would presumably break that (unless it didn't protect anything at all).
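
A minimal sketch of the usual versioning approach, assuming a hypothetical Point class that gained a z attribute after some pickles had already been written: a __setstate__ method fills in the missing field, so yesterday's pickles keep loading. A scheme that baked the exact types or layout into a hash stored with each pickle would presumably reject such files.

```python
import pickle


class Point:
    """Version 2 of a hypothetical class that gained a `z` field after old pickles existed."""

    def __init__(self, x, y, z=0):
        self.x, self.y, self.z = x, y, z

    def __setstate__(self, state):
        # Pickles written by version 1 have no "z"; supply a default so
        # older data still loads under the new code.
        state.setdefault("z", 0)
        self.__dict__.update(state)


old = Point(1, 2)
del old.z                      # simulate an instance pickled by version 1
data = pickle.dumps(old)
restored = pickle.loads(data)
print(restored.x, restored.y, restored.z)  # 1 2 0
```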

> The three values would likely be merged via bitwise or.

Why would you merge three hash values with bitwise or instead of one of the usual hash combining mechanisms? This just throws away most of your entropy.
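
The entropy loss is easy to measure. A sketch using random 64-bit stand-ins for the three values (an assumption; the proposal only fixes the size of the third): each bit of a | b | c is 1 with probability 7/8, so it carries only about 0.54 bits of entropy instead of 1. For contrast, combine (a hypothetical helper) shows one common alternative: feed every part, length-prefixed, into a single cryptographic hash.

```python
import hashlib
import random

rng = random.Random(0)  # fixed seed so the experiment is repeatable
samples = 10_000
ones = 0
for _ in range(samples):
    a, b, c = (rng.getrandbits(64) for _ in range(3))
    ones += bin(a | b | c).count("1")

# A bit of a | b | c is 0 only when all three inputs have a 0 there (1/8),
# so the combined value is heavily biased toward 1s.
ratio = ones / (samples * 64)
print(ratio)  # close to 0.875


def combine(*parts: bytes) -> bytes:
    """Feed every part into one SHA-256 hash instead of OR-ing separate hashes."""
    h = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so that (b"ab", b"c") and (b"a", b"bc") differ.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()
```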

> This has the advantage that there are three different elements making up
> the hash, some of which are harder to locate. Unless two of the values are
> known, the third can't be.
> The types would be extracted from the hash via some kind of magic,

That really _would_ be magic. The whole point of a hash is that it's one-way. If the hashed values can be recovered from it, it's not a hash.

Also, "harder to locate" is useless, unless you plan to continually update your code as attackers locate the things you've hidden. (And, for something with as many high-profile uses as Python's pickler, any security by obscurity would be attacked very frequently.)

> and then
> it would validate the data in the database based on the types, like Neil
> said.
> If someone wanted to change the types, they would need to regenerate the
> whole hash.

And... So what? Unless the checker has some secure way of knowing which timestamp, etc. to use in checking the hash, all you have to do is give it the timestamp, etc. that go along with your regenerated hash, and it will pass.
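
A sketch of why regenerating the hash defeats the check. unkeyed_tag is a hypothetical stand-in for the proposed scheme (SHA-256 over the type names, the data's hash, and a timestamp); the exact recipe doesn't matter as long as every input is public. For contrast, the usual way to give the checker a secret the attacker lacks is a keyed MAC such as HMAC from the standard library's hmac module. Note that a MAC only shows a trusted party wrote the pickle; it does not make unpickling untrusted data safe.

```python
import hashlib
import hmac
import json
import os


def unkeyed_tag(type_names, payload: bytes, timestamp: int) -> str:
    """Hypothetical stand-in for the proposal: a digest of public inputs only."""
    h = hashlib.sha256()
    h.update(json.dumps(sorted(type_names)).encode())
    h.update(hashlib.sha256(payload).digest())
    h.update(timestamp.to_bytes(8, "big"))
    return h.hexdigest()


def unkeyed_check(record: dict) -> bool:
    # With no secret, the checker can only recompute the tag from values
    # that the record itself supplies.
    return record["tag"] == unkeyed_tag(record["types"], record["payload"], record["timestamp"])


# Attacker: change the types and payload, then simply regenerate the tag.
forged = {"types": ["os.system"], "payload": b"evil", "timestamp": 0}
forged["tag"] = unkeyed_tag(forged["types"], forged["payload"], forged["timestamp"])
print(unkeyed_check(forged))  # True: the forged record passes

# Contrast: a keyed MAC needs a secret the attacker does not have.
SECRET_KEY = os.urandom(32)  # assumption: shared only by trusted writer and reader


def keyed_tag(payload: bytes, key: bytes = SECRET_KEY) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()


def keyed_check(payload: bytes, tag: bytes, key: bytes = SECRET_KEY) -> bool:
    return hmac.compare_digest(tag, keyed_tag(payload, key))


attacker_tag = keyed_tag(b"evil", key=os.urandom(32))  # attacker has to guess the key
print(keyed_check(b"evil", attacker_tag))  # False
```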

> Further security could be obtained by prefixing the first value
> with another special byte sequence that, although easier to find, would be
> used for validation purposes.
> Point 2's prefixing bytes and point 3's value would be especially tricky
> to find, since a few seconds may pass before the data is written to disk.
> It's still a bit insecure, but much better than the current situation. I
> think.

I think it's much worse than the current situation, because it adds illusory security while still being effectively just as crackable.

>> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar <mistersheik at gmail.com> wrote:
>> I've heard it said that pickle is a security hole, and so it's better to
>> write your own serialization routine.  That's unfortunate because pickle
>> has so many advantages such as automatically tying into copy/deepcopy.
>> Would it be possible to make unpickle secure, e.g., by having the caller
>> create a context in which all calls to unpickle are limited to unpickling a
>> specific set of types?  (When these types unpickle their sub-objects, they
>> could potentially limit the set of types further.)
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
> -- 
> Ryan
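
Neil's original suggestion, limiting unpickling to a caller-chosen set of types, is close to the "Restricting Globals" pattern in the pickle module documentation: subclass pickle.Unpickler and override find_class. Below is a minimal sketch of that pattern. RestrictedUnpickler, restricted_loads and Evil are illustrative names, and the allow-list is fixed per call rather than narrowed per sub-object as Neil proposed. It only controls which globals can be looked up; the allowed types' own reconstruction code still runs, so treat it as a mitigation, not a guarantee that loading untrusted data is safe.

```python
import datetime
import io
import os
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that resolves globals only from a caller-supplied allow-list."""

    def __init__(self, file, allowed):
        super().__init__(file)
        # allowed: set of (module, name) pairs the caller is willing to load.
        self.allowed = frozenset(allowed)

    def find_class(self, module, name):
        # Every class or function reference in the pickle stream goes through here.
        if (module, name) in self.allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes, allowed) -> object:
    return RestrictedUnpickler(io.BytesIO(data), allowed).load()


class Evil:
    # Classic malicious payload: plain pickle.loads would call os.system.
    def __reduce__(self):
        return (os.system, ("echo pwned",))


# Built-in containers and primitives need no globals at all.
print(restricted_loads(pickle.dumps([1, 2.0, "x"]), allowed=set()))

# Allow exactly one extra type for this call.
day = restricted_loads(pickle.dumps(datetime.date(2015, 7, 22)),
                       allowed={("datetime", "date")})

try:
    restricted_loads(pickle.dumps(Evil()), allowed={("datetime", "date")})
    evil_rejected = False
except pickle.UnpicklingError:
    evil_rejected = True  # os.system was never looked up, so never called
```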
