
On Thu, Feb 21, 2013 at 3:01 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
I've been noticing a lot of security-related issues being discussed in the Python world since the Ruby YAML problem came out. Is it time to consider adding an alternative to pickle that is safe(r) by default?
Pickle is usable in situations few other things are, because it can handle cyclic references and virtually any python object. The only stdlib alternative I'm aware of is json, which can do neither of those things. (Or at least, not without significant extra serialization code.) I would imagine that any alternative supplied should be easy enough to use that pickle users would seriously consider switching, and include at least those features.
Pickle is unsafe if you give it untrusted input. It's safe if you pickle something yourself and then unpickle it. If the problem is that you want to pickle something and store it in some unsafe place (like a cookie or a db under user control) and then read it back in later and unpickle it, then you can mitigate the risk by using an HMAC or some other mechanism to prevent tampering and may want to consider encrypting it too.

That said, there is one risk in pickling something yourself and unpickling it later that you need to watch out for. If your objects change, then unpickling might produce unexpected and even potentially unsafe results. You can mitigate this by adding object versions to your objects (as long as you don't forget to update that when the object changes).

There's another problem - pickling is not guaranteed to work across Python versions. So you may find yourself having to read pickles that are no longer readable in a future python version. Not a problem for cookies, but a potential headache with long-lived pickles.

All of this leads me to suggest using a better format for this problem. Json is a reasonable choice (I've used it myself) although I would still use an HMAC. If you encrypt it then that makes attacking the object that much harder. I'd advise against using your own format.

I wrote a tutorial on hacking web sites called Gruyere <http://j.mp/gruyere-security>. I suggest reading the section on cookies http://j.mp/learn-state-manipulation (although to be honest, I recommend reading the whole thing :-)

Aside from security, using a format like json encourages you to think about what belongs in the persisted object and what doesn't. Suppose your object includes a url. If you pickle it, you may end up persisting the parsed url with a dictionary of parameters and other unnecessary overhead. When you convert to json, you're going to just copy the url.

On Thu, Feb 21, 2013 at 7:50 AM, Dustin J. Mitchell <dustin@v.igoro.us> wrote:
This conversation worries me. The security community has shown that safety isn't something you can add to a powerful tool. With great power comes great expressivity, and correspondingly more difficulty reasoning about it. Not to mention reasoning about the implementation. JSON is probably secure against code-execution exploits, but only probably.
When you put something in the stdlib and call it "safe", even with caveats, people will make even more brazen mistakes than with a documented-unsafe tool like pickle.
Yes indeed.

--- Bruce
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
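
For readers who want to see the JSON-plus-HMAC suggestion above in concrete form, here is a minimal sketch using only the standard library. It is one possible way to do it under stated assumptions, not a vetted design: the names SECRET_KEY, sign and verify, the "tag.payload" token layout, and the "version" field are all hypothetical choices made for illustration, and the sketch only detects tampering (it does not encrypt anything).

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real one should be long, random,
# and kept out of source control.
SECRET_KEY = b"replace-with-a-long-random-secret"


def sign(obj):
    """Serialize obj to JSON and prepend an HMAC-SHA256 tag."""
    # json.dumps escapes non-ASCII by default, so the payload is plain ASCII.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # The hex tag never contains ".", so the first "." separates tag and payload.
    return tag + "." + payload


def verify(token):
    """Check the tag first; parse the JSON only if the tag matches."""
    tag, sep, payload = token.partition(".")
    if not sep:
        raise ValueError("malformed token")
    expected = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes in constant time to avoid leaking where they differ.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("ascii")):
        raise ValueError("signature mismatch")
    return json.loads(payload)


# An explicit version field is one way to notice when the stored shape changes.
token = sign({"version": 1, "user": "alice"})
print(verify(token))
```

If the stored value is altered in any way (say, a cookie edited to change the user), verify raises ValueError before the payload is parsed. The version field lets the reading code reject or migrate data written by an older release instead of trusting it blindly.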