On Wed, Jul 15, 2020 at 11:24:17AM +1000, Chris Angelico wrote:
It's correct far more often than you might think. There's a LOT of code out there where the Python source code has the exact same external access permissions as its config files - often because there's no access to either.
Um, yes? Safe use-cases are not the issue here. It's the unsafe use-cases that are important. Especially the use-cases that people may think are safe but actually aren't.
To stick to the seat belt analogy for a moment... we don't reject seat belts in cars because most of the time cars are safely parked in a garage. We add them for the times that cars are in motion at speed.
Improving the security of pickle shouldn't be done for the sake of cases where the security of pickle is irrelevant. It should be done for the sake of cases where it is necessary, especially for those cases where the developer thinks that security isn't necessary, but they are mistaken.
So if you're distributing your code, then maybe you don't use pickle.
Sure. What do I use to serialise my complex data structure? I guess I could write out the repr and then call eval on it, that should be fine... *wink*
Okay, let's say that somebody else did the work. Some awfully clever chappy found a way to add a magical "pickle.safeload()" function that did everything needed, safely. Would you oppose it?
(The old unsafe one would presumably have to remain for backwards compatibility, or for the cases which are inherently unsafe.)
I would ask them which laws of physics they violated, since pickle inherently has to be able to execute arbitrary code in order to be able to do everything it needs to.
I'm not a pickle expert, but I don't think that's quite right. pickle has to be able to execute arbitrary code in order to be able to de-serialise arbitrary pickles, but that doesn't mean it has to de-serialise arbitrary pickles if you aren't expecting arbitrary pickles.
Random beat it to me by suggesting a white-list, but I was thinking the same way. The pickle protocol has to be able to deal with arbitrary instances, but very few apps using pickle need to, or want to, accept arbitrary instances. If my app serialised Widgets and Gadgets, then it ought to be an error to attempt to deserialise anything else.
Then all I need do is ensure that the Widget and Gadget classes are secure, not the entire Python universe :-)
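A minimal sketch of that idea, assuming the documented pickle.Unpickler.find_class hook is an acceptable place to enforce the list. The names ALLOWED, AllowListUnpickler and safe_loads are made up for illustration, and Widget and Gadget are stand-in classes; none of this is an existing stdlib API.

```python
import io
import pickle


class Widget:
    def __init__(self, name):
        self.name = name


class Gadget:
    def __init__(self, size):
        self.size = size


# Hypothetical allow-list: the only (module, name) pairs this app accepts.
ALLOWED = {
    (Widget.__module__, "Widget"),
    (Gadget.__module__, "Gadget"),
}


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not named in ALLOWED."""

    def find_class(self, module, name):
        # find_class is called whenever the pickle stream asks for a global.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not allowed")


def safe_loads(data):
    """Hypothetical helper: like pickle.loads, but restricted to ALLOWED."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

With this in place, safe_loads(pickle.dumps(Widget("w"))) should round-trip as usual, while a pickle that references anything else (even a builtin such as len) raises pickle.UnpicklingError. Plain containers, strings and numbers never go through find_class, so they are unaffected. This narrows what can be imported; it does not by itself make the allowed classes safe.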
As I said, I'm not an expert, but five minutes reading this:
allows me to confidently pontificate on the subject *wink*
The depickling virtual machine (pickle machine or PM) is not Turing complete. It has no loops or conditionals. It's a dumb machine that takes a sequence of op-codes, executes them in order, and then halts.
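One way to see that flat op-code sequence is the stdlib pickletools module. This is only an illustrative sketch; the sample data and the variable names are arbitrary.

```python
import pickle
import pickletools

# Arbitrary sample data; protocol 2 keeps the op-code listing short.
data = pickle.dumps({"spam": [1, 2, 3]}, protocol=2)

# genops yields (opcode, argument, position) triples in stream order.
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]
print(opcode_names)

# pickletools.dis prints a more readable, annotated listing to stdout.
pickletools.dis(data)
```

The listing starts with PROTO and ends with STOP, with the op-codes in between executed one after another.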
The GLOBAL op-code (by default) will import any module, and use any function from that module. That's dangerous; an option to restrict what modules and functions can be called by the PM would go a long way to reducing the attack surface of pickle. (I think.)
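To make the GLOBAL behaviour concrete, here is a hand-written protocol 0 pickle that uses a deliberately harmless function. The payload is an illustration only; a hostile pickle would name something far less benign than math.sqrt.

```python
import pickle

# Hand-written protocol 0 pickle (illustrative only):
#   c  GLOBAL  -> import module "math", push its attribute "sqrt"
#   (  MARK
#   F  FLOAT   -> push 16.0
#   t  TUPLE   -> collect everything since the mark into (16.0,)
#   R  REDUCE  -> call math.sqrt(*(16.0,))
#   .  STOP
payload = b"cmath\nsqrt\n(F16.0\ntR."

result = pickle.loads(payload)
print(result)  # 4.0
```

Nothing in the loading program imported math or asked for sqrt; the pickle stream chose both the module and the function to call.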
Random's idea of white-listing seems like a promising approach to me. Even if it doesn't make pickle "safe" in some absolute sense, it will make it *less unsafe* and reduce the attack surface for people using pickle.
Security is always about tradeoffs, and we shouldn't let the idea of some unattainable perfectly secure pickle get in the way of improving the safety of pickle.
If someone claims they've created a way to allow untrusted users to insert code into your Python programs and have it execute, but they've made it safe, would you oppose its inclusion in the stdlib?
But that's not really what we're asking for. We're asking for a way to *avoid* executing arbitrary code, while still allowing *trusted* objects to be depickled.
You want "pickle but magically able to know what's safe and what's not"?
Of course not. But maybe I want to be able to tell pickle what I think is safe, and have everything else fail.