On Mon, Jul 13, 2020 at 09:56:45PM +1000, Chris Angelico wrote:
A pickle file (or equivalent blob in a database, or whatever) should be considered equally as trusted as your source code. If you're writing out a file that has the exact same access permissions as your own source code, and then reading it back, you shouldn't have to worry about pickle's safety any more than you worry about your code's safety - anyone who could maliciously craft something for you to unpickle could equally just edit the source code directly.
If I worry about the security of my source code, I can put a known good copy on read-only media, or lock it down with more restrictive permissions so that the user running the code cannot modify it. In either case, if my code needs to write data out to a pickle file and later read it back in, that file can't live in the same location as my source code, because that location is read-only.
So it isn't correct that a malicious user having the ability to craft a pickle file could just edit the source code. These are independent threats.
There is a scenario where what you say is correct: as the application developer, I create my data structures for my app and store them in pickles *at build time*, distributing the pickles as part of my app. In that case they can be read-only, and are effectively compiled source code. I guess you were thinking of a similar scenario?
But where security is concerned, the safe scenarios really don't matter. It doesn't matter if there are a million safe use-cases for pickle ("what if I'm running on a single-user system with no internet, a malicious user can only hurt themselves...") if the user mistakes their actually unsafe scenario for a safe one.
And that's the risk: can I guarantee that there is no clever scheme by which an attacker can fool me into unpickling malicious code? I need to be smarter than the attacker, and more imaginative, and to have thought as long and hard about the problem as they have.
They've probably been thinking about ways to exploit pickle for months. I've spent three minutes reading the docs. Who is likely to win?
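To make the risk concrete, here is a minimal and deliberately harmless sketch of why unpickling counts as running code. The `Payload` class and the expression passed to `eval` are made up for illustration; a real attacker would substitute any callable and arguments they liked.

```python
import pickle


class Payload:
    """Hypothetical class: its pickle says "call eval(...)" on load."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it
        # when the data is loaded. Here the call is harmless.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The reader of the blob never asked to evaluate anything, but
# loading it calls eval("6 * 7") and returns whatever that returns.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Note that what comes back is not a `Payload` instance at all: the loader simply ran the callable the blob told it to run.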
This is why an *inherently safe* serialization format is a necessary thing. I don't want to spend even three minutes thinking about exploits, I just want to write the data out and read it back in, no issues, no worries, and not have to think about it.
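As a rough illustration of what "write the data out and read it back in" could look like with a data-only format, here is a sketch using the standard library's `json` module. JSON is my choice of example, not something named above, and it handles only basic types (dicts, lists, strings, numbers, booleans, None). The `settings` and `hostile` values are invented for the demonstration.

```python
import json

# Made-up application data, restricted to types JSON can represent.
settings = {"name": "example", "retries": 3, "tags": ["a", "b"]}

text = json.dumps(settings)   # write the data out
restored = json.loads(text)   # read it back in

# A document written by someone else can contain surprising values,
# but decoding it only ever builds plain data. Nothing gets called.
hostile = json.loads('{"__reduce__": "eval", "args": ["6 * 7"]}')

print(restored == settings)
print(type(hostile).__name__)
```

The trade-off is expressiveness: tuples come back as lists, and custom classes need explicit conversion to and from plain data.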
Victims and authors of viruses and malware in the 1980s and 1990s may disagree.