[Python-ideas] Fwd: Re: Secure unpickle

Neil Girdhar mistersheik at gmail.com
Thu Jul 23 02:27:27 CEST 2015


On Wed, Jul 22, 2015 at 6:17 PM, Andrew Barnert <abarnert at yahoo.com> wrote:

> > On Jul 22, 2015, at 13:58, Ryan Gonzalez <rymg19 at gmail.com> wrote:
> >
> > Disclaimer: I know virtually *nothing* about cryptography, so this is
> > probably worse than it seems.
>
> It's always better to look for an existing cryptosystem than to try to
> invent a new one.
>
> Briefly, I think what you're looking for here is a way to sign pickles and
> verify their signatures, which is a well-known problem.
>
> If you have some secure way to store keys (e.g., the only code that ever
> touches the pickles runs on your backend servers), everything is easy; just
> use, say, OpenSSL to sign and verify your pickles (e.g., using a key on
> some non-public-accessible server). If you need public-accessible code to
> create and use pickles, there is no solution. (That's a slight
> oversimplification; a better way to put it is that if there's no existing
> cert-management and key-exchange system that can get the keys to your
> software securely, that probably means what you need is impossible.)
>
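
A minimal sketch of the "sign and verify" idea, for the case where the key can be kept private. Note the swap: this uses a shared-secret HMAC from the standard library rather than OpenSSL public-key signatures, and SECRET_KEY, dumps_signed and loads_signed are hypothetical names made up for illustration.

```python
import hashlib
import hmac
import pickle

# Hypothetical key for illustration only; a real key would be loaded from
# somewhere the party supplying the pickle cannot read.
SECRET_KEY = b"replace-with-a-real-secret"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key=SECRET_KEY):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

The important property is that pickle.loads is never reached for data whose tag does not verify. This only helps if the key really is out of the attacker's reach, which is exactly the condition described above.
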
> Tossing in a bunch of other stuff--a manifest listing the types, a
> timestamp, or other nonrandom salt--or tricks like obfuscating where the
> key is are ultimately irrelevant. If the signature is tamper-proof, adding
> more stuff to it doesn't make it any more so; if it's tamperable, adding
> more stuff doesn't make it less so. Of course you may want to add on extra
> features (e.g., timestamps can be useful for key revocation schemes to
> mitigate damage from a crack), or some of that information may be useful
> for its own sake (e.g., being able to extract the list of types without
> running the pickle could be very handy for debugging, logging, etc.), but
> it doesn't increase the security of the signature.
>
> Anyway, I think what Neil is trying to solve is something different:
> assuming the data is insecure and there's no way to secure it, how do we
> write code that doesn't use it in an unsafe way?
>
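
For reference, one mechanism in this direction already exists: pickle.Unpickler.find_class can be overridden to refuse any global that is not on an allow-list. The sketch below assumes a small, made-up allow-list (ALLOWED) and hypothetical names RestrictedUnpickler and restricted_loads. It limits which classes and functions a pickle may reference; it is not a claim that the result is safe against every attack.

```python
import io
import pickle

# Hypothetical allow-list of (module, name) pairs; a real application
# would list exactly the types it expects to receive.
ALLOWED = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed globals may be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings and numbers still load, since they do not go through find_class; anything that needs a global outside ALLOWED raises UnpicklingError instead of being imported.
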
> They're really separate problems. I don't think Python should do anything
> to solve yours (anything Python could do, OpenSSL probably can already do
> for you, better); it might be useful for Python to solve his (although I
> think picking and stdlibifying or copying a good third-party solution may
> be a better idea than trying to design one).
>

Thanks Andrew, totally agree with what you said.  For the record, I don't
know exactly what the problem is.  I just noticed people on some projects
talking about writing their own unpickling code because of security
problems in pickle, and it made me think: "why should you have to?"  E.g.,

https://github.com/matplotlib/matplotlib/issues/3424
https://github.com/matplotlib/matplotlib/issues/4756

People explicitly say: "get the ability to dump/return our figures to *any*
serialization format other than pickle"!
That is so unfortunate.  Pickle is such a good solution except for the
security.  Why can't we have security too?  It doesn't seem to me to be
right for a project like matplotlib to be writing their own serialization
library.  It would be awesome if Python had secure serialization built-in.

Best,

Neil



> >>> On July 22, 2015 3:54:31 PM CDT, Andrew Barnert <abarnert at yahoo.com> wrote:
> >>> On Jul 22, 2015, at 13:27, Ryan Gonzalez <rymg19 at gmail.com> wrote:
> >>>
> >>> A further idea: hashes.
> >>>
> >>> Each Pickle database (or whatever it's called) would contain a hash
> >>> made up of:
> >>>
> >>> a) The types used to pickle the data.
> >>> b) The hash of the data itself, prefixed with 2 bytes that have some
> >>> sort of hard-to-get meaning (the length of the call stack?).
> >>> c) The seconds since epoch, or another 64-bit value.
> >>
> >> A type pickled and unpickled in a different interpreter instance isn't
> >> necessarily going to have the same hash value. And if you don't mean a
> >> Python hash, how do you hash an arbitrary class object? Or, if you mean
> >> just the name, how does that secure anything?
> >>
> >> For that matter, it's often important for an updated version of the
> >> code to be able to load pickles created with yesterday's version. This
> >> is easy to do with the pickle protocol, but hashing would presumably
> >> break that (unless it didn't protect anything at all).
> >>
> >>> The three values would likely be merged via bitwise or.
> >>
> >> Why would you merge three hash values with bitwise or instead of one of
> >> the usual hash combining mechanisms? This just throws away most of your
> >> entropy.
> >
> > Uhhhh...I have no clue. It just came off the top of my head.
> >
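
To make the point about bitwise OR concrete, here is a small experiment. It is only a sketch: h64, combine_or, combine_hash and average_set_bits are hypothetical helpers, and SHA-256 truncated to 64 bits stands in for whatever hash the proposal had in mind.

```python
import hashlib


def h64(data: bytes) -> int:
    """First 64 bits of SHA-256, used here as a stand-in 64-bit hash."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def combine_or(parts):
    """Merge the per-part hashes with bitwise OR, as in the proposal."""
    out = 0
    for part in parts:
        out |= h64(part)
    return out


def combine_hash(parts):
    """A more usual approach: hash the length-prefixed parts together."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return int.from_bytes(h.digest()[:8], "big")


def average_set_bits(combine, trials=2000):
    """Average number of 1 bits in the 64-bit result over many inputs."""
    total = 0
    for i in range(trials):
        parts = [f"{label}-{i}".encode() for label in ("types", "data", "time")]
        total += bin(combine(parts)).count("1")
    return total / trials
```

If the three hashes behave like independent uniform values, each bit of their OR is 1 with probability 7/8, so around 56 of the 64 bits end up set, against around 32 for a properly mixed value. Comparing average_set_bits(combine_or) with average_set_bits(combine_hash) shows that skew, which is the entropy being thrown away.
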
> >>
> >>> This has the advantage that there are three different elements making
> >>> up the hash, some of which are harder to locate. Unless two of the
> >>> values are known, the third can't be.
> >>>
> >>> The types would be extracted from the hash via some kind of magic,
> >>
> >> That really _would_ be magic. The whole point of a hash is that it's
> >> one-way. If the hashed values can be recovered from it, it's not a
> >> hash.
> >
> > Well, I again know nothing about cryptography, so I guess "key" is a
> > better phrase. :O
> >
> >>
> >> Also, "harder to locate" is useless, unless you plan to continually
> >> update your code as attackers locate the things you've hidden. (And,
> >> for something used in as many high-profile uses as Python's pickler,
> >> any security by obscurity would be attacked very frequently.)
> >>
> >>> and then
> >>> it would validate the data in the database based on the types, like
> >>> Neil said.
> >>>
> >>> If someone wanted to change the types, they would need to regenerate
> >>> the whole hash.
> >>
> >> And... So what? Unless the checker has some secure way of knowing which
> >> timestamp, etc. to use in checking the hash, all you have to do is give
> >> it the timestamp, etc. that go along with your regenerated hash, and it
> >> will pass.
> >>
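
The objection above can be shown in a few lines. This is a sketch under assumptions: make_record and check_record are hypothetical, and the "hash" is an unkeyed SHA-256 over a timestamp plus the payload, which is one reading of the proposal rather than a precise implementation of it.

```python
import hashlib


def make_record(payload: bytes, timestamp: int) -> dict:
    """Bundle a payload with a timestamp and an unkeyed digest of both."""
    digest = hashlib.sha256(timestamp.to_bytes(8, "big") + payload).hexdigest()
    return {"payload": payload, "timestamp": timestamp, "digest": digest}


def check_record(record: dict) -> bool:
    """Recompute the digest from the fields stored in the record itself."""
    expected = hashlib.sha256(
        record["timestamp"].to_bytes(8, "big") + record["payload"]
    ).hexdigest()
    return expected == record["digest"]


# The legitimate writer produces a record...
original = make_record(b"trusted pickle bytes", 1437500000)

# ...and an attacker, who knows the (public) algorithm, produces another one
# with a different payload and whatever timestamp they like.
forged = make_record(b"attacker-controlled pickle bytes", 1437500123)
```

Both check_record(original) and check_record(forged) return True, because nothing in the check depends on a secret the attacker lacks. Extra fields change what has to be recomputed, not who is able to recompute it.
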
> >>> Further security could be obtained by prefixing the first value
> >>> with another special byte sequence that, although easier to find,
> >>> would be used for validation purposes.
> >>>
> >>> Point 2's prefixing bytes and point 3's value would be especially
> >>> trickier to find, since a few seconds may pass before the data is
> >>> written to disk.
> >>>
> >>> It's still a bit insecure, but much better than the current
> >>> situation. I think.
> >>
> >> I think it's much worse than the current situation, because it adds
> >> illusory security while still being effectively just as crackable.
> >>
> >>>
> >>>
> >>>> On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar
> >>>> <mistersheik at gmail.com> wrote:
> >>>>
> >>>> I've heard it said that pickle is a security hole, and so it's
> >>>> better to write your own serialization routine.  That's unfortunate
> >>>> because pickle has so many advantages such as automatically tying
> >>>> into copy/deepcopy.  Would it be possible to make unpickle secure,
> >>>> e.g., by having the caller create a context in which all calls to
> >>>> unpickle are limited to unpickling a specific set of types?  (When
> >>>> these types unpickle their sub-objects, they could potentially limit
> >>>> the set of types further.)
> >>>>
> >>>> _______________________________________________
> >>>> Python-ideas mailing list
> >>>> Python-ideas at python.org
> >>>> https://mail.python.org/mailman/listinfo/python-ideas
> >>>> Code of Conduct: http://python.org/psf/codeofconduct/
> >>>
> >>>
> >>>
> >>> --
> >>> Ryan
> >>> [ERROR]: Your autotools build scripts are 200 lines longer than your
> >>> program. Something’s wrong.
> >>> http://kirbyfan64.github.io/
> >>> Currently listening to: Death Egg Boss theme (Sonic Generations)
> >>> --
> >>> Sent from my Android device with K-9 Mail. Please excuse my brevity.