On Wed, Jul 15, 2020 at 09:55:03AM +1000, Chris Angelico wrote:
At that point, you are NOT running it with the "exact same access permissions", are you? :)
Indeed, and I did acknowledge that you were probably thinking about a different scenario. But I was challenging your assertion that anyone who can write a malicious pickle could just as easily inject malicious code into my source code. That's not always correct.
But a large amount of code is indeed run with the same access permissions as its temporary files (which may be incredibly restrictive or incredibly generous, either way).
Again, this is true. But we don't counter risks by pointing at the times that it's not a risk:
"Seat belts in cars? Ludicrous, most of the time the car is sitting still, not even moving, with nobody inside it! Why does it need seat belts?"
You are absolutely correct that most code (whether rightly or wrongly) doesn't consider, or maybe even doesn't *need* to consider, the security of pickle. If I personally write out a pickle, and then read it back in, what am I worried about? That I personally will inject malicious code into my own pickle, to grant myself access to my own computer? I don't think so.
But if I'm distributing my code to others, the responsible thing to do is to think about the potential security risks of using pickle in my app or library. What if they use it in ways that I didn't foresee, ways which *ought to be* safe except for my choice to use pickle?
I'm not demanding that developers be omniscient, but I do think that they should not willfully ignore known security risks.
"All care, no responsibility" is only meaningful if we do actually take care.
They've probably been thinking about ways to exploit pickle for months. I've spent three minutes reading the docs. Who is likely to win?
This is why an *inherently safe* serialization format is a necessary thing. I don't want to spend even three minutes thinking about exploits, I just want to write the data out and read it back in, no issues, no worries, and not have to think about it.
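To make the risk concrete, here is a minimal, deliberately harmless sketch of why loading a pickle is never "just reading data". The class name `Payload` and the string `"6 * 7"` are illustrative only; the mechanism (`__reduce__` naming a callable that `pickle.loads` invokes) is standard pickle behaviour.

```python
import pickle


class Payload:
    """Any object can tell pickle which callable to invoke when it is loaded."""

    def __reduce__(self):
        # pickle records "call eval with this argument" instead of any data.
        # An attacker would supply something far less friendly than "6 * 7".
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading the bytes calls eval(); no Payload instance is ever created,
# and the loading side does not even need the Payload class to exist.
result = pickle.loads(data)
print(result)  # 42
```

Nothing in the bytes distinguishes this from a legitimate pickle until it has already run, which is why reading the docs for three minutes doesn't help much.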
And that's why we have JSON and various others,
How do I use JSON to serialise an arbitrary instance of some class?
Instances are just data. (Well, usually.) I should be able to serialise instances (well, most of them) and safely read them back again. Of course the gap between *should* and *can* is quite large, and Python really doesn't make it easy. I'm not saying this is an easy problem to solve.
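For what it's worth, the stdlib alone can get part of the way there. A minimal sketch, assuming the classes are dataclasses and that an explicit allowlist is acceptable; the `REGISTRY` dict and the `"__type__"` tag key are hypothetical conventions, not anything `json` defines. It uses the real `default=` hook of `json.dumps` and `object_hook=` of `json.loads`.

```python
import json
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Line:
    start: Point
    end: Point


# Hypothetical allowlist: only these classes may be rebuilt on load.
REGISTRY = {cls.__name__: cls for cls in (Point, Line)}


def default(obj):
    """Called by json.dumps for objects it cannot serialise natively."""
    cls = type(obj)
    if is_dataclass(obj) and REGISTRY.get(cls.__name__) is cls:
        # Shallow copy of the fields; nested dataclasses come back through here.
        data = {f.name: getattr(obj, f.name) for f in fields(obj)}
        data["__type__"] = cls.__name__
        return data
    raise TypeError(f"{cls.__name__} is not JSON serialisable")


def object_hook(data):
    """Called by json.loads for every decoded JSON object (dict)."""
    name = data.pop("__type__", None)
    if name is None:
        return data  # an ordinary dict
    cls = REGISTRY[name]  # KeyError for anything not on the allowlist
    return cls(**data)


line = Line(Point(0, 0), Point(3, 4))
text = json.dumps(line, default=default)
round_tripped = json.loads(text, object_hook=object_hook)
```

The worst a hostile document can do here is name a class that isn't registered (a `KeyError`) or pass bad arguments to an allowlisted `__init__`. It doesn't handle plain classes, ordinary dicts that happen to contain a `"__type__"` key, or non-JSON types like sets, which is rather the point about the gap between *should* and *can*.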
which are not pickle and are not vulnerable the way that pickle is. I don't think we need a "safe pickle".
So they're vulnerable in other ways? :-)
What we need is to not use pickle when it's not the right tool.
How do I know when it's not the right tool?
How do I know which other serialisation format is right?
What about those -- and they are a significant minority -- who are restricted to only what's in the stdlib?
I'm highly sympathetic to the requests for "JSON but able to encode more types", but not so sympathetic to "pickle but magically able to be safe".
Okay, let's say that somebody else did the work. Some awfully clever chappy found a way to add a magical "pickle.safeload()" function that did everything needed, safely. Would you oppose it?
(The old unsafe one would presumably have to remain for backwards compatibility, or for the cases which are inherently unsafe.)
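The closest thing available today is the "Restricting Globals" pattern from the `pickle` documentation: subclass `pickle.Unpickler` and override `find_class()` so that only allowlisted globals can be loaded. The sketch below follows that pattern; the names `RestrictedUnpickler`, `restricted_loads` and the contents of `SAFE_BUILTINS` are illustrative choices, not a proposed `pickle.safeload()`.

```python
import builtins
import io
import pickle

# Hypothetical allowlist of harmless builtins that pickles may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global a pickle refers to (classes, functions such as eval)
        # is resolved through find_class, so this is the single choke point.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but refuses any global outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(3)])))
```

Even so, this is only as safe as the allowlist: every permitted callable is something an attacker can call with arguments of their choosing, and the documentation itself still suggests looking at alternatives when security is a concern. That is roughly the amount of care a real `safeload()` would have to bake in.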
If not, then it seems to me you don't really care about this issue and could sit it out :-)
If you do *actively oppose* adding a safe version of pickle, perhaps you should explain why.