A further idea: hashes.
Each Pickle database (or whatever it's called) would contain a hash made up of:
a) The types used to pickle the data.
b) The hash of the data itself, prefixed with 2 bytes that have some sort of hard-to-get meaning (the length of the call stack?).
c) The seconds since epoch, or another 64-bit value.
The three values would likely be merged via bitwise or.
This has the advantage that there are three different elements making up the hash, some of which are harder to locate. Unless two of the values are known, the third can't be.
The types would be extracted from the hash via some kind of magic, and then it would validate the data in the database based on the types, like Neil said.
If someone wanted to change the types, they would need to regenerate the whole hash.
Further security could be obtained by prefixing the first value with another special byte sequence that, although easier to find, would be used for validation purposes.
Point 2's prefixing bytes and point 3's value would be especially trickier to find, since a few seconds may pass before the data is written to disk.
It's still a bit insecure, but much better than the current situation. I think.
On Wed, Jul 22, 2015 at 3:03 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
I've heard it said that pickle is a security hole, and so it's better to write your own serialization routine. That's unfortunate because pickle has so many advantages such as automatically tying into copy/deepcopy. Would it be possible to make unpickle secure, e.g., by having the caller create a context in which all calls to unpickle are limited to unpickling a specific set of types? (When these types unpickle their sub-objects, they could potentially limit the set of types further.)
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ Currently listening to: Death Egg Boss theme (Sonic Generations) -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Jul 22, 2015, at 13:27, Ryan Gonzalez <rymg19@gmail.com> wrote:
A further idea: hashes.
Each Pickle database (or whatever it's called) would contain a hash made up of:
a) The types used to pickle the data.
b) The hash of the data itself, prefixed with 2 bytes that have some sort of hard-to-get meaning (the length of the call stack?).
c) The seconds since epoch, or another 64-bit value.
A type pickled and unpickled in a different interpreter instance isn't necessarily going to have the same hash value. And if you don't mean a Python hash, how do you hash an arbitrary class object? Or, if you mean just the name, how does that secure anything? For that matter, it's often important for an updated version of the code to be able to load pickles created with yesterday's version. This is easy to do with the pickle protocol, but hashing would presumably break that (unless it didn't protect anything at all).
The three values would likely be merged via bitwise or.
Why would you merge three hash values with bitwise or instead of one of the usual hash combining mechanisms? This just throws away most of your entropy.
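The entropy point can be checked with a small experiment. The sketch below is only an illustration: it assumes the three inputs behave like independent uniformly random 64-bit values (random stand-ins, not real hashes), and the helper names are made up for this example. With bitwise or, each output bit is set unless all three input bits are clear, so roughly 7/8 of the bits end up as 1 and each bit carries well under one bit of information.

```python
import math
import random


def fraction_of_one_bits(values, width=64):
    """Return the share of bits set to 1 across all values."""
    ones = sum(bin(v).count("1") for v in values)
    return ones / (len(values) * width)


def bit_entropy(p):
    """Shannon entropy, in bits, of a single bit that is 1 with probability p."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 2000

# Stand-ins for three independent 64-bit hash values (an assumption).
single = [rng.getrandbits(64) for _ in range(trials)]
merged = [
    rng.getrandbits(64) | rng.getrandbits(64) | rng.getrandbits(64)
    for _ in range(trials)
]

single_ratio = fraction_of_one_bits(single)  # close to 1/2
merged_ratio = fraction_of_one_bits(merged)  # close to 7/8

print(f"one value:    {single_ratio:.3f} of bits set")
print(f"three OR-ed:  {merged_ratio:.3f} of bits set")
print(f"entropy per merged bit: {bit_entropy(7 / 8):.3f}")
```

A conventional combiner, such as feeding all three values into one cryptographic hash, would not skew the output bits this way.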
This has the advantage that there are three different elements making up the hash, some of which are harder to locate. Unless two of the values are known, the third can't be.
The types would be extracted from the hash via some kind of magic,
That really _would_ be magic. The whole point of a hash is that it's one-way. If the hashed values can be recovered from it, it's not a hash. Also, "harder to locate" is useless, unless you plan to continually update your code as attackers locate the things you've hidden. (And, for something used in as many high-profile uses as Python's pickler, any security by obscurity would be attacked very frequently.)
and then it would validate the data in the database based on the types, like Neil said.
If someone wanted to change the types, they would need to regenerate the whole hash.
And... So what? Unless the checker has some secure way of knowing which timestamp, etc. to use in checking the hash, all you have to do is give it the timestamp, etc. that go along with your regenerated hash, and it will pass.
Further security could be obtained by prefixing the first value with another special byte sequence that, although easier to find, would be used for validation purposes.
Point 2's prefixing bytes and point 3's value would be especially trickier to find, since a few seconds may pass before the data is written to disk.
It's still a bit insecure, but much better than the current situation. I think.
I think it's much worse than the current situation, because it adds illusory security while still being effectively just as crackable.
Disclaimer: I know virtually *nothing* about cryptography, so this is probably worse than it seems.
On July 22, 2015 3:54:31 PM CDT, Andrew Barnert <abarnert@yahoo.com> wrote:
Why would you merge three hash values with bitwise or instead of one of the usual hash combining mechanisms? This just throws away most of your entropy.
Uhhhh...I have no clue. It just came off the top of my head.
That really _would_ be magic. The whole point of a hash is that it's one-way. If the hashed values can be recovered from it, it's not a hash.
Well, I again know nothing about cryptography, so I guess "key" is a better phrase. :O
On Jul 22, 2015, at 13:58, Ryan Gonzalez <rymg19@gmail.com> wrote:
Disclaimer: I know virtually *nothing* about cryptography, so this is probably worse than it seems.
It's always better to look for an existing cryptosystem than to try to invent a new one.
Briefly, I think what you're looking for here is a way to sign pickles and verify their signatures, which is a well-known problem.
If you have some secure way to store keys (e.g., the only code that ever touches the pickles runs on your backend servers), everything is easy; just use, say, OpenSSL to sign and verify your pickles (e.g., using a key on some non-public-accessible server). If you need public-accessible code to create and use pickles, there is no solution. (That's a slight oversimplification; a better way to put it is that if there's no existing cert-management and key-exchange system that can get the keys to your software securely, that probably means what you need is impossible.)
Tossing in a bunch of other stuff--a manifest listing the types, a timestamp, or other nonrandom salt--or tricks like obfuscating where the key is are ultimately irrelevant. If the signature is tamper-proof, adding more stuff to it doesn't make it any more so; if it's tamperable, adding more stuff doesn't make it less so. Of course you may want to add on extra features (e.g., timestamps can be useful for key revocation schemes to mitigate damage from a crack), or some of that information may be useful for its own sake (e.g., being able to extract the list of types without running the pickle could be very handy for debugging, logging, etc.), but it doesn't increase the security of the signature.
Anyway, I think what Neil is trying to solve is something different: assuming the data is insecure and there's no way to secure it, how do we write code that doesn't use it in an unsafe way?
They're really separate problems. I don't think Python should do anything to solve yours (anything Python could do, OpenSSL probably can already do for you, better); it might be useful for Python to solve his (although I think picking and stdlibifying or copying a good third-party solution may be a better idea than trying to design one).
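The "sign and verify" idea can be sketched briefly. This is not the OpenSSL setup described above: it swaps in a symmetric HMAC from Python's standard library, which only fits the case where the same trusted code both writes and reads the pickles and the key is stored somewhere an attacker cannot reach. The function names `dumps_signed` and `loads_signed` are hypothetical, chosen for this illustration.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # length of the tag in bytes


def dumps_signed(obj, key: bytes) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob: bytes, key: bytes):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

The important detail is the order of operations: the payload is never handed to `pickle.loads` until the tag has been verified. As the message says, adding timestamps or type manifests to the signed data would not make the tag itself any harder to forge.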
On Wed, Jul 22, 2015 at 6:17 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Anyway, I think what Neil is trying to solve is something different: assuming the data is insecure and there's no way to secure it, how do we write code that doesn't use it in an unsafe way?
They're really separate problems. I don't think Python should do anything to solve yours (anything Python could do, OpenSSL probably can already do for you, better); it might be useful for Python to solve his (although I think picking and stdlibifying or copying a good third-party solution may be a better idea than trying to design one).
Thanks Andrew, totally agree with what you said.
For the record, I don't know exactly what the problem is. I just noticed on some projects people talking about writing their own unpickling code because of insecurities in pickle, and it made me think: "why should you have to?"
E.g.,
https://github.com/matplotlib/matplotlib/issues/3424
https://github.com/matplotlib/matplotlib/issues/4756
People explicitly say: "get the ability to dump/return our figures to *any* serialization format other than pickle"!
That is so unfortunate. Pickle is such a good solution except for the security. Why can't we have security too? It doesn't seem to me to be right for a project like matplotlib to be writing their own serialization library. It would be awesome if Python had secure serialization built-in.
Best,
Neil
On Jul 22, 2015, at 17:27, Neil Girdhar <mistersheik@gmail.com> wrote:
Thanks Andrew, totally agree with what you said. For the record, I don't know exactly what the problem is. I just noticed on some projects people talking about writing their own unpickling code because of insecurities in pickle, and it made me think: "why should you have to?"
The problem is inherent to the design of pickle: it's a virtual machine that can make Python import arbitrary modules and call arbitrary globals (with arbitrary literals and/or already-constructed objects as arguments). You can't fix that without replacing the whole design. And that's what they're asking for in your second link: they want explicit imperative code in matplotlib, rather than the data, to drive the process.
Also, the reason pickle is so convenient is that classes can opt in just by adding the right methods, but that's the same reason that not anticipating everything your code might do can mean an invisible security hole instead of a "can't pickle that type" error, so you can't fix that either without giving up that convenience.
Of course the other problem is FUD. Despite the fact that there are plenty of use cases for which pickle is safe, there are people who would rather teach you that it's never ever safe than teach you how to recognize and understand potential problems. And there are people who believe pickle is slow and space-wasteful and can't handle large data, either because they read a blog post from 15 years ago, or because they're still using 2.7 and haven't read far enough down the docs page to see that they don't have to use format 0. And people who dogmatically insist that all serialization formats should be interchange formats (a pickle can only be unpickled by the exact same program, or a carefully-updated newer version of the same program) even when interchange isn't relevant. And so on. Changing pickle wouldn't get rid of the FUD unless you completely replaced it.
So, it might be useful to build a little PyPI module that offered a pickle loader that didn't allow new modules to be imported and didn't allow any globals to be called except the ones specified in an explicit tuple specified in the constructor. But you still have to understand the issues to know when that will and won't solve your problems. And it still wouldn't satisfy the people posting in those bug reports.
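A loader along those lines might look roughly like the sketch below. It builds on the standard `pickle.Unpickler.find_class` hook; the class name `AllowlistUnpickler`, the helper `restricted_loads`, and the choice to express the allowed globals as `(module, name)` pairs are all assumptions made for this illustration, not an existing PyPI package.

```python
import collections
import io
import pickle
import sys


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only explicitly allowed globals."""

    def __init__(self, file, allowed=()):
        super().__init__(file)
        # Each entry is a (module_name, global_name) pair.
        self._allowed = frozenset(allowed)

    def find_class(self, module, name):
        if (module, name) not in self._allowed:
            raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")
        if module not in sys.modules:
            # Refuse to trigger a fresh import on behalf of the pickle.
            raise pickle.UnpicklingError(f"module {module} is not already imported")
        return super().find_class(module, name)


def restricted_loads(data, allowed=()):
    """Load a pickle from bytes using the allowlist above."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


# Usage: plain containers need no globals; OrderedDict must be listed.
plain = restricted_loads(pickle.dumps([1, "x", {"k": 2.0}]))
ordered = restricted_loads(
    pickle.dumps(collections.OrderedDict(a=1)),
    allowed=[("collections", "OrderedDict")],
)
```

This only narrows which callables a pickle can reach. Whether that is enough still depends on what the allowed callables themselves do with attacker-chosen arguments, which is the caveat raised above.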
* https://github.com/jsonpickle/jsonpickle (keep code and data separate)
* https://pypi.python.org/pypi/dill (IPython)
* https://github.com/zopefoundation/zodbpickle/issues/2 (cwe links)
...
Alternatives to unserializing code: https://wrdrd.com/docs/consulting/knowledge-engineering#distributed-computin... #json-ld
On Jul 22, 2015 8:48 PM, "Andrew Barnert via Python-ideas" <python-ideas@python.org> wrote:
On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
That is so unfortunate. Pickle is such a good solution except for the security. Why can't we have security too? It doesn't seem to me to be right for a project like matplotlib to be writing their own serialization library. It would be awesome if Python had secure serialization built-in.
The reason you can pickle/unpickle arbitrary Python objects is that the pickle format is basically a structured, optimized way of generating and then evaluating arbitrary Python code. Which is great because it's totally general -- that's why we love pickle, you can pickle anything -- but that exact feature is what makes it insecure. If you want to make something secure, that means making some explicit decisions about what kinds of things can be put into your data format and which cannot, and write some explicit code to handle each of these things instead of just handing the file format direct access to your interpreter. But by the time you've done that you've done the hard part of implementing a new format anyway...
-n
-- Nathaniel J. Smith -- http://vorpus.org
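A short, harmless demonstration of that point: the class below (a made-up name, `LooksLikeData`) uses `__reduce__` to tell pickle which callable to invoke when the data is loaded. Here the callable is `eval` with a fixed arithmetic string as a stand-in; an attacker could name any importable callable and any arguments instead.

```python
import pickle
import pickletools


class LooksLikeData:
    def __reduce__(self):
        # Harmless stand-in for an attacker-chosen callable and arguments.
        return (eval, ("6 * 7",))


payload = pickle.dumps(LooksLikeData(), protocol=4)

# The stream holds instructions for the unpickling machine, not an object dump.
opcodes = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]

# Loading the bytes runs eval("6 * 7"); no LooksLikeData instance comes back.
result = pickle.loads(payload)
print(opcodes)
print(result)
```

The `STACK_GLOBAL` opcode looks up `builtins.eval` and `REDUCE` calls it, which is the "direct access to your interpreter" described above.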
On Wed, Jul 22, 2015 at 9:46 PM, Nathaniel Smith <njs@pobox.com> wrote:
The reason you can pickle/unpickle arbitrary Python objects is that the pickle format is basically a structured, optimized way of generating and then evaluating arbitrary Python code. Which is great because it's totally general -- that's why we love pickle, you can pickle anything -- but that exact feature is what makes it insecure. If you want to make something secure, that means making some explicit decisions about what kinds of things can be put into your data format and which cannot, and write some explicit code to handle each of these things instead of just handing the file format direct access to your interpreter. But by the time you've done that you've done the hard part of implementing a new format anyway...
Wouldn't it be easier to just tell unpickle which code it's allowed to run (by passing a list of modules and classes)? Then your serializer can be reused by deepcopy and other Python routines that tie into "reduce". I think that's easier than implementing (yet another) new format.
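As an illustration of the "reduce" tie-in mentioned here, the following sketch (not from the thread; `Polygon` and its `reduce_calls` counter are hypothetical) shows `copy.deepcopy` consulting the same `__reduce__` hook that pickle uses:

```python
import copy


class Polygon:
    """Hypothetical class that describes its own reconstruction."""

    reduce_calls = 0  # counts how often the hook is consulted

    def __init__(self, points):
        self.points = points

    def __reduce__(self):
        # Return (callable, args): the same recipe pickle would record.
        Polygon.reduce_calls += 1
        return (Polygon, (self.points,))


original = Polygon([(0, 0), (1, 0), (1, 1)])

# deepcopy finds no __deepcopy__ method, so it falls back to the
# reduce protocol and rebuilds the object from deep-copied arguments.
clone = copy.deepcopy(original)

print(clone.points == original.points)   # equal contents
print(clone.points is original.points)   # but a separate list
```

A class that defines the hook once gets copying and pickling from the same code, which is the reuse a hand-written format would have to give up.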
-n
-- Nathaniel J. Smith -- http://vorpus.org
On 07/23/2015 09:54 AM, Neil Girdhar wrote:
On Wed, Jul 22, 2015 at 9:46 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
That is so unfortunate. Pickle is such a good solution except for the security. Why can't we have security too? It doesn't seem to me to be right for a project like matplotlib to be writing their own serialization library. It would be awesome if Python had secure serialization built-in.
The reason you can pickle/unpickle arbitrary Python objects is that the pickle format is basically a structured, optimized way of generating and then evaluating arbitrary Python code. Which is great because it's totally general -- that's why we love pickle, you can pickle anything -- but that exact feature is what makes it insecure. If you want to make something secure, that means making some explicit decisions about what kinds of things can be put into your data format and which cannot, and write some explicit code to handle each of these things instead of just handing the file format direct access to your interpreter. But by the time you've done that you've done the hard part of implementing a new format anyway...
Wouldn't it be easier to just tell unpickle which code it's allowed to run (by passing a list of modules and classes)?
unpickle can already do that, via Unpickler.find_class. There's an example in the docs.
Eric.
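For reference, a minimal sketch of that approach, along the lines of the restricted-globals example in the pickle documentation. The names `SAFE_BUILTINS`, `RestrictedUnpickler`, and `restricted_loads` are illustrative, and the allow-list is an assumption to be adapted per application:

```python
import builtins
import io
import pickle

# Illustrative allow-list: only these builtins may be looked up while loading.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else is refused before it can be called.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but restricted to the allow-list above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Allowed: plain containers plus an allow-listed builtin.
print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# Refused: a hand-written stream that asks for os.system.
try:
    restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

This narrows what a stream can reach, but the result is only as safe as every callable on the allow-list, so each entry needs to be reviewed with untrusted arguments in mind.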
Right, I forgot that that was mentioned in this thread. Then I don't see the problem with unpickle. Is it still not secure enough for a project like matplotlib?
On Thu, Jul 23, 2015 at 10:26 AM, Eric V. Smith <eric@trueblade.com> wrote:
On 07/23/2015 09:54 AM, Neil Girdhar wrote:
On Wed, Jul 22, 2015 at 9:46 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Wed, Jul 22, 2015 at 5:27 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
That is so unfortunate. Pickle is such a good solution except for the security. Why can't we have security too? It doesn't seem to me to be right for a project like matplotlib to be writing their own serialization library. It would be awesome if Python had secure serialization built-in.
The reason you can pickle/unpickle arbitrary Python objects is that the pickle format is basically a structured, optimized way of generating and then evaluating arbitrary Python code. Which is great because it's totally general -- that's why we love pickle, you can pickle anything -- but that exact feature is what makes it insecure. If you want to make something secure, that means making some explicit decisions about what kinds of things can be put into your data format and which cannot, and write some explicit code to handle each of these things instead of just handing the file format direct access to your interpreter. But by the time you've done that you've done the hard part of implementing a new format anyway...
Wouldn't it be easier to just tell unpickle which code it's allowed to run (by passing a list of modules and classes)?
unpickle can already do that, via Unpickler.find_class. There's an example in the docs.
Eric.
participants (6)
- Andrew Barnert
- Eric V. Smith
- Nathaniel Smith
- Neil Girdhar
- Ryan Gonzalez
- Wes Turner