On Wed, Jul 15, 2020 at 9:37 PM Steven D'Aprano email@example.com wrote:
On Wed, Jul 15, 2020 at 11:24:17AM +1000, Chris Angelico wrote:
So if you're distributing your code, then maybe you don't use pickle.
Sure. What do I use to serialise my complex data structure? I guess I could write out the repr and then call eval on it, that should be fine... *wink*
Maybe don't HAVE an arbitrarily complex data structure for serialization. Maybe have a way to turn the in-memory representation into a much simpler structure, serialize that, and then load from your saved form.
It'll make your code a lot easier to reason about and refactor, since you're no longer intrinsically binding your code to your save format.
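To make that suggestion concrete, here is a minimal sketch, assuming a hypothetical Widget class with two fields (the class, the field names and the "version" key are all invented for illustration, not taken from anyone's real code). The in-memory object is reduced to plain dicts and lists, that simple structure is written as JSON, and loading rebuilds the objects from the saved form.

```python
import json
from dataclasses import dataclass


@dataclass
class Widget:
    # Hypothetical application class, invented for this example.
    name: str
    size: int

    def to_plain(self) -> dict:
        # Reduce the in-memory object to plain, JSON-friendly data.
        return {"name": self.name, "size": self.size}

    @classmethod
    def from_plain(cls, data: dict) -> "Widget":
        # Rebuild from the saved form, coercing each field as we go.
        return cls(name=str(data["name"]), size=int(data["size"]))


def save(widgets) -> str:
    # The save format is a versioned document, independent of class layout.
    return json.dumps({"version": 1, "widgets": [w.to_plain() for w in widgets]})


def load(text: str):
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError("unsupported save format")
    return [Widget.from_plain(d) for d in doc["widgets"]]
```

Because the save format only ever sees `to_plain()` output, the class can be renamed or refactored without touching files already on disk.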
I'm not a pickle expert, but I don't think that's quite right. pickle has to be able to execute arbitrary code in order to be able to de-serialise arbitrary pickles, but that doesn't mean it has to de-serialise arbitrary pickles if you aren't expecting arbitrary pickles.
Random beat it to me by suggesting a white-list, but I was thinking the same way. The pickle protocol has to be able to deal with arbitrary instances, but very few apps using pickle need to, or want to, accept arbitrary instances. If my app serialised Widgets and Gadgets, then it ought to be an error to attempt to deserialise anything else.
Then all I need do is ensure that the Widget and Gadget classes are secure, not the entire Python universe :-)
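For reference, the white-list idea can be sketched with the documented `pickle.Unpickler.find_class` hook, roughly along the lines of the "Restricting Globals" example in the pickle docs. Widget and Gadget here are stand-in classes invented for illustration, and `restricted_loads` is a hypothetical helper name. Note that this only controls which globals may be looked up; it still relies on the allowed classes themselves behaving safely when reconstructed.

```python
import io
import pickle


class Widget:
    # Stand-in class, invented for this example.
    def __init__(self, name):
        self.name = name


class Gadget:
    # Stand-in class, invented for this example.
    def __init__(self, power):
        self.power = power


# The only globals the unpickler may resolve.
ALLOWED = {(Widget.__module__, "Widget"), (Gadget.__module__, "Gadget")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global by name.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    # Same role as pickle.loads(), but with the allow-list applied.
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings and numbers need no global lookup, so they pass through unchanged; anything that references a global outside `ALLOWED` fails with `UnpicklingError`.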
If that's what you want, then have a way to serialize Widgets and Gadgets, and *not* a way to serialize arbitrary objects. That, to me, sounds more like "enhanced JSON" than "magically safe pickle".
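One way the "enhanced JSON" approach might look, as a rough sketch only: a closed registry of types, with a type tag stored next to the fields. The Widget and Gadget dataclasses, the `REGISTRY` name and the `dump_item`/`load_item` helpers are all hypothetical. Field values are not validated here beyond what the constructors accept, which a real format would want to tighten.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Widget:
    name: str


@dataclass
class Gadget:
    power: int


# The only types the format knows about; everything else is an error.
REGISTRY = {"Widget": Widget, "Gadget": Gadget}


def dump_item(obj) -> str:
    tag = type(obj).__name__
    if REGISTRY.get(tag) is not type(obj):
        raise TypeError(f"cannot serialise {tag}")
    return json.dumps({"type": tag, "fields": asdict(obj)})


def load_item(text: str):
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object")
    cls = REGISTRY.get(doc.get("type"))
    if cls is None:
        raise ValueError(f"unknown type tag: {doc.get('type')!r}")
    # Unexpected field names make the constructor raise TypeError.
    return cls(**doc["fields"])
```

The loader never looks anything up by import path; a tag that is not in `REGISTRY` simply cannot be constructed.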
Security is always about tradeoffs, and we shouldn't let the idea of some unattainable perfectly secure pickle get in the way of improving the safety of pickle.
Nor should we let the idea of a secure pickle get in the way of improving the functionality of safer options.
If someone claims they've created a way to allow untrusted users to insert code into your Python programs and have it execute, but they've made it safe, would you oppose its inclusion in the stdlib?
But that's not really what we're asking for. We're asking for a way to *avoid* executing arbitrary code, while still allowing *trusted* objects to be depickled.
Except that you are. It's equivalent to trying to create a safe version of eval() instead of building a simple arithmetic expression parser. You're starting from danger and trying to patch until it's safe, instead of starting from safety and adding functionality until it's usable.
Remember: If you have insufficient functionality, you'll know about it; if you are insufficiently secure, you won't know till it's too late.
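As an illustration of the "start from safety" side of that analogy, here is a small sketch of an arithmetic evaluator built on the `ast` module. It accepts only numeric literals and a handful of operators, and rejects every other kind of node. The `evaluate` name and the choice of operators are assumptions made for this example; exponentiation is left out deliberately so that a short input cannot request an enormous computation. Very deeply nested input can still exhaust the parser, so this is a sketch rather than a hardened implementation.

```python
import ast
import operator

# Everything not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression: str):
    def walk(node):
        # Numeric literals only; bool and str constants are refused.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval").body)
```

Names, calls and attribute access are never whitelisted, so something like `__import__('os')` fails with `ValueError` instead of being executed.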
You want "pickle but magically able to know what's safe and what's not"?
Of course not. But maybe I want to be able to tell pickle what I think is safe, and have everything else fail.
That's fair, but are you actually guaranteeing that it will never read arbitrary attributes from objects? Can pickle grab a module or function, pick up a dunder from it, and go to town? Are you able to give a total 100% guarantee that it cannot? If not, how do you know that it's safe?
Edwin has given further information on the inherently unsafe nature of pickle. It should be used for trusted pickles, NOT as a basis for some magical "safe" parser.
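A deliberately harmless sketch of why an unrestricted pickle has to be trusted: through `__reduce__`, an object can tell pickle to rebuild it by calling any importable callable with any arguments. The `Sneaky` class is invented for this example and uses `len` as a benign stand-in for whatever a hostile pickle might name instead.

```python
import pickle


class Sneaky:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call len('abc')".
        return (len, ("abc",))


payload = pickle.dumps(Sneaky())

# Loading does not produce a Sneaky instance; it produces whatever
# the named callable returned.
result = pickle.loads(payload)
print(result)  # 3
```

The call happens inside `pickle.loads()` itself, before the caller gets a chance to inspect the result.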