On Sat, Feb 23, 2013 at 7:37 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Devin Jeanpierre writes: Nobody's saying we shouldn't bother with security. Any answer needs to be informed by the recognition that nothing we can design is proof against the Sufficiently Stupid/Lazy User, that's all I'm trying to say.
Sorry. Fair enough.
But security probably does have a cost in terms of inconvenience and restriction on capabilities. My question is "given that people can and will do stupid things with relatively safe libraries like json, what is the point of providing something intermediate between json and pickle?" In more detail, what features can we provide that don't involve the known risks of pickle that would be sufficiently attractive to users that they don't go to pickle anyway?
I believe that the features I'm suggesting meet that criterion (but see below for discussion of risk). Nothing will ever be sufficient to drive away all unwarranted use of pickle, but I feel like these two features are really big ones that would go a long way towards making the secure thing almost as easy in almost every circumstance. They cover everything I've ever personally wanted, at least, although I can't speak for others.
You mention handling cycles, which adds minimal risk (unprepared code could infloop on the unpacked data, but that's not the serializer's fault), but also "new" types which isn't clear to me. If you mean new built-in types, can't the json module be extended? (That would apply to cycles as well, since we know it's possible it should be automatable.)
It can. This brings up an interesting point. YAML already extends JSON with cycle support (via aliases) and support for a notation for marking up nonstandard types (via tagging). For example:

    >>> yaml.load('&mydict {"a": !!python/tuple ["b", *mydict]}')
    {'a': ('b', {...})}

PyYAML is useless security-wise, but if we're going to extend the json module, this would probably be the direction to go.
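For comparison, the json module as it stands can already carry tagged values through its `default` and `object_hook` hooks, though it has nothing like aliases, so cycles are still out of reach. A rough sketch of the tagging half only: the `__type__` key and the names `encode_default`, `decode_hook`, and `DECODERS` are made up for illustration and are not part of the json module.

```python
import json

TAG = "__type__"  # hypothetical marker key, not a json convention


def encode_default(obj):
    """Called by json.dumps only for objects it cannot serialize natively."""
    if isinstance(obj, complex):
        return {TAG: "complex", "value": [obj.real, obj.imag]}
    if isinstance(obj, set):
        return {TAG: "set", "value": list(obj)}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


# Only tags listed here are ever reconstructed; everything else is rejected.
DECODERS = {
    "complex": lambda pair: complex(*pair),
    "set": set,
}


def decode_hook(d):
    """Called by json.loads for every decoded JSON object, innermost first."""
    if TAG not in d:
        return d
    tag = d[TAG]
    if tag not in DECODERS:
        raise ValueError(f"unknown type tag: {tag!r}")
    return DECODERS[tag](d["value"])


text = json.dumps({"z": 1 + 2j, "s": {1, 2, 3}}, default=encode_default)
restored = json.loads(text, object_hook=decode_hook)
```

An ordinary dict that happens to contain the marker key would be misread as a tagged value, which is one reason a real notation (like YAML's tags) is nicer than an in-band key.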
If you mean user-defined types, we're back where we started, with merely unpacking data running code whose provenance we don't know.
That actually isn't where we started. We started with a serialization format that includes such data as "c__builtin__\neval\n(c__builtin__\nraw_input\n(S'py> '\ntRtR." (try running pickle.loads on that in Python 2).

What I had in mind from the start was something where only whitelisted constructors are used to reconstitute Python values from the serialized code. Then we've moved from trusting the input to trusting the competence of the authors of the objects in the modules that we imported.

In cerealizer there is a global registry of classes that profess to handle input securely. Obviously, they might be wrong, and maybe a user of a serialization library would want to provide a much smaller whitelist. Maybe even the bigger whitelist should be disabled by default, if we really want to be careful, and there should be a security warning in the docs if you try to use the global registry. So for example, there are the following options:

    # nominally safe; module authors only register if they believe
    # their deserialization code is safe even with untrusted input
    my_unserializer.loads("...", whitelist=my_unserializer.PSEUDOSAFE_GLOBAL_REGISTRY)

    # nominally safe; if not, then a security bug in python
    my_unserializer.loads("...", whitelist=set())

-- Devin
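To make the whitelist idea concrete, here is a minimal sketch in Python 3 that gets a similar effect by overriding `pickle.Unpickler.find_class`, which is the hook the pickle documentation describes for restricting globals. This is not cerealizer's API and not the hypothetical `my_unserializer` above; the names `WhitelistUnpickler` and `loads` are invented for illustration, and the whitelist is assumed to be a set of `(module, name)` pairs.

```python
import io
import pickle
from fractions import Fraction


class WhitelistUnpickler(pickle.Unpickler):
    """Unpickler that resolves only globals named in an explicit whitelist."""

    def __init__(self, file, whitelist):
        super().__init__(file)
        # Assumed shape: a set of (module, name) pairs.
        self._whitelist = frozenset(whitelist)

    def find_class(self, module, name):
        # Every global lookup requested by the pickle stream goes through here.
        if (module, name) not in self._whitelist:
            raise pickle.UnpicklingError(
                f"global {module}.{name} is not whitelisted"
            )
        return super().find_class(module, name)


def loads(data, whitelist=frozenset()):
    """Hypothetical helper mirroring the loads(..., whitelist=...) call shape."""
    return WhitelistUnpickler(io.BytesIO(data), whitelist).load()


# The payload quoted earlier: it is rejected at its first global lookup,
# before eval or raw_input is ever resolved.
evil = b"c__builtin__\neval\n(c__builtin__\nraw_input\n(S'py> '\ntRtR."
try:
    loads(evil, whitelist=set())
except pickle.UnpicklingError as exc:
    print("rejected:", exc)

# Plain containers need no globals; Fraction needs one whitelist entry.
plain = loads(pickle.dumps([1, "two", {"three": (4.0, None)}]), whitelist=set())
third = loads(pickle.dumps(Fraction(1, 3)), whitelist={("fractions", "Fraction")})
```

Even with the hook in place, this only moves the trust to the whitelisted constructors, which is exactly the trade described above; a whitelisted class with a careless constructor is still a hole.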