[Python-ideas] Fwd: Re: Secure unpickle

Andrew Barnert abarnert at yahoo.com
Thu Jul 23 00:17:01 CEST 2015


> On Jul 22, 2015, at 13:58, Ryan Gonzalez <rymg19 at gmail.com> wrote:
> 
> Disclaimer: I know virtually *nothing* about cryptography, so this is probably worse than it seems.

It's always better to look for an existing cryptosystem than to try to invent a new one.

Briefly, I think what you're looking for here is a way to sign pickles and verify their signatures, which is a well-known problem.

If you have some secure way to store keys (e.g., the only code that ever touches the pickles runs on your backend servers), everything is easy: just use, say, OpenSSL to sign and verify your pickles (e.g., using a key kept on a server that isn't publicly accessible). If publicly accessible code needs to create and use pickles, there is no solution. (That's a slight oversimplification; a better way to put it is that if there's no existing cert-management and key-exchange system that can get the keys to your software securely, that probably means what you need is impossible.)
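
To make the "sign and verify" idea concrete, here is a minimal sketch. It swaps in an HMAC from the standard library (a shared-secret tag) for an OpenSSL signature, and it assumes the key lives somewhere an attacker can't read. The names sign_pickle and load_signed_pickle are made up for illustration.

```python
import hashlib
import hmac
import pickle

# Length in bytes of a SHA-256 digest; the tag is stored in front of the payload.
DIGEST_SIZE = hashlib.sha256().digest_size


def sign_pickle(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed with the secret key."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed_pickle(blob, key):
    """Check the tag first; only unpickle if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

Note that the check happens before pickle.loads ever sees the bytes. Anyone who holds the key can still produce a malicious pickle that verifies, which is why everything hinges on where the key is stored.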

Tossing in a bunch of other stuff--a manifest listing the types, a timestamp, or other nonrandom salt--or tricks like obfuscating where the key is are ultimately irrelevant. If the signature is tamper-proof, adding more stuff to it doesn't make it any more so; if it's tamperable, adding more stuff doesn't make it less so. Of course you may want to add on extra features (e.g., timestamps can be useful for key revocation schemes to mitigate damage from a crack), or some of that information may be useful for its own sake (e.g., being able to extract the list of types without running the pickle could be very handy for debugging, logging, etc.), but it doesn't increase the security of the signature.
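
As an illustration of the debugging case (listing the types without running the pickle), here is a rough sketch using pickletools.genops, which walks the opcode stream without executing it. It only looks at GLOBAL opcodes, so it assumes a pickle written with protocol 3 or lower. Protocol 4 and later use STACK_GLOBAL, which takes its names from the stack, and this sketch does not handle that. The function name referenced_globals is made up.

```python
import pickle
import pickletools


def referenced_globals(data):
    """Return the (module, name) pairs named by GLOBAL opcodes, without unpickling."""
    found = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # genops reports the GLOBAL argument as "module name" in one string.
            module, name = arg.split(" ", 1)
            found.add((module, name))
    return found
```

This is a logging aid, not a security check: the list is only as trustworthy as the bytes it came from.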

Anyway, I think what Neil is trying to solve is something different: assuming the data is insecure and there's no way to secure it, how do we write code that doesn't use it in an unsafe way?

They're really separate problems. I don't think Python should do anything to solve yours (anything Python could do, OpenSSL probably can already do for you, better); it might be useful for Python to solve his (although I think picking and stdlibifying or copying a good third-party solution may be a better idea than trying to design one).
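
For reference, the pickle module already has a hook that gets partway to what Neil describes: a pickle.Unpickler subclass can override find_class to refuse any global that isn't on a whitelist. The sketch below is only that, a sketch. RestrictedUnpickler, restricted_loads, and the allowed attribute are illustrative names, not a proposed API, and it does nothing about a whitelisted type whose own construction is unsafe.

```python
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals listed in self.allowed."""

    # Set of (module, name) pairs; empty by default, so only plain
    # containers, numbers, strings, etc. can be loaded.
    allowed = frozenset()

    def find_class(self, module, name):
        if (module, name) not in self.allowed:
            raise pickle.UnpicklingError(
                "global %s.%s is not allowed" % (module, name))
        return super().find_class(module, name)


def restricted_loads(data, allowed=()):
    """Like pickle.loads, but limited to the given (module, name) pairs."""
    unpickler = RestrictedUnpickler(io.BytesIO(data))
    unpickler.allowed = frozenset(allowed)
    return unpickler.load()
```

Something like restricted_loads(data, {("fractions", "Fraction")}) would then load Fraction instances and plain built-in data, and raise UnpicklingError on anything else that needs a global lookup.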

>>> On July 22, 2015 3:54:31 PM CDT, Andrew Barnert <abarnert at yahoo.com> wrote:
>>> On Jul 22, 2015, at 13:27, Ryan Gonzalez <rymg19 at gmail.com> wrote:
>>> 
>>> A further idea: hashes.
>>> 
>>> Each Pickle database (or whatever it's called) would contain a hash made up of:
>>> 
>>> a) The types used to pickle the data.
>>> b) The hash of the data itself, prefixed with 2 bytes that have some sort of hard-to-get meaning (the length of the call stack?).
>>> c) The seconds since epoch, or another 64-bit value.
>> 
>> A type pickled and unpickled in a different interpreter instance isn't
>> necessarily going to have the same hash value. And if you don't mean a
>> Python hash, how do you hash an arbitrary class object? Or, if you mean
>> just the name, how does that secure anything?
>> 
>> For that matter, it's often important for an updated version of the
>> code to be able to load pickles created with yesterday's version. This
>> is easy to do with the pickle protocol, but hashing would presumably
>> break that (unless it didn't protect anything at all).
>> 
>>> The three values would likely be merged via bitwise or.
>> 
>> Why would you merge three hash values with bitwise or instead of one of
>> the usual hash combining mechanisms? This just throws away most of your
>> entropy.
> 
> Uhhhh...I have no clue. It just came off the top of my head.
> 
>> 
>>> This has the advantage that there are three different elements making up the hash, some of which are harder to locate. Unless two of the values are known, the third can't be.
>>> 
>>> The types would be extracted from the hash via some kind of magic,
>> 
>> That really _would_ be magic. The whole point of a hash is that it's
>> one-way. If the hashed values can be recovered from it, it's not a
>> hash.
> 
> Well, I again know nothing about cryptography, so I guess "key" is a better phrase. :O
> 
>> 
>> Also, "harder to locate" is useless, unless you plan to continually
>> update your code as attackers locate the things you've hidden. (And,
>> for something with as many high-profile uses as Python's pickler,
>> any security by obscurity would be attacked very frequently.)
>> 
>>> and then it would validate the data in the database based on the types, like Neil said.
>>> 
>>> If someone wanted to change the types, they would need to regenerate the whole hash.
>> 
>> And... So what? Unless the checker has some secure way of knowing which
>> timestamp, etc. to use in checking the hash, all you have to do is give
>> it the timestamp, etc. that go along with your regenerated hash, and it
>> will pass.
>> 
>>> Further security could be obtained by prefixing the first value with another special byte sequence that, although easier to find, would be used for validation purposes.
>>> 
>>> Point 2's prefixing bytes and point 3's value would be especially trickier to find, since a few seconds may pass before the data is written to disk.
>>> 
>>> It's still a bit insecure, but much better than the current situation. I think.
>> 
>> I think it's much worse than the current situation, because it adds
>> illusory security while still being effectively just as crackable.
>> 
>>> 
>>> 
>>>> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar <mistersheik at gmail.com> wrote:
>>>> 
>>>> I've heard it said that pickle is a security hole, and so it's better to write your own serialization routine.  That's unfortunate because pickle has so many advantages such as automatically tying into copy/deepcopy. Would it be possible to make unpickle secure, e.g., by having the caller create a context in which all calls to unpickle are limited to unpickling a specific set of types?  (When these types unpickle their sub-objects, they could potentially limit the set of types further.)
>>>> 
>>>> _______________________________________________
>>>> Python-ideas mailing list
>>>> Python-ideas at python.org
>>>> https://mail.python.org/mailman/listinfo/python-ideas
>>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>> 
>>> 
>>> 
>>> -- 
>>> Ryan
>>> [ERROR]: Your autotools build scripts are 200 lines longer than your
>>> program. Something’s wrong.
>>> http://kirbyfan64.github.io/
>>> Currently listening to: Death Egg Boss theme (Sonic Generations)
>>> -- 
>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
> 

