On Thu, Jul 16, 2020, at 02:36, Stephen J. Turnbull wrote:
I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals,
Callables are globals in this sense.
not all callables are globals, as has been pointed out attributes of objects (methods) can also be callable.
this does require calling getattr which is itself global, but you can't exercise any fine-grained control over this without substituting your own getattr function, which creates problems if you are unpickling an object which contains a reference to the getattr function. Ultimately, *this* is the problem that made me realize that find_class isn't an adequate hook - that you cannot block or substitute a callable [whether that's a class, a global function, or a method] without also impacting the ability to unpickle objects that contain references to the same callable *as data*.
So overriding Unpickler.find_class will allow you to restrict to specified callables. It's not clear to me why you would want more fine-grained control: why not put the argument checking in the object constructor? In most cases that's probably where it's most useful, anyway, by DRY.
to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.
I would guess you can already have this by overriding Unpickler.load_reduce and patching Unpickler.dispatch[REDUCE] to the new load_reduce. Is there any other way for a pickle to specify the code to invoke on data supplied by that pickle?
I think I got all of them, but if you think there may be others feel free to be an extra pair of eyes. But these overrides are not available for the C version, and may well not be available on other python implementations.
The idea that the pickle format is "inherently unsafe" and cannot be made safe is magical thinking.
I think you're quite wrong. Pickle format itself is inherently unsafe because it allows a pickle to specify code to be executed on data that pickle specifies.
Which is why an unpickler that *does not directly execute the specified code* is necessary and sufficient to be safe.
Of course they already exist, and can be overriden. I guess what you're asking for is a promise that the interface won't change in future versions of pickle.py. That's the only difference between overriding Unpickler.find_class and overriding Unpickler.load_reduce (or any other method).
The other difference is that overriding find_class is supported by _pickle.c
[The dispatch mechanism is also unreasonably annoying to override, and having to override four separate methods, one of which is underscore-prefixed, to do the same thing, is not ideal]
On a mostly unrelated note I also have to admit I am baffled why the NEWOBJ opcodes are defined to call __new__ instead of __newobj__, when the latter is expected to exist and be a valid reduce function. Having a dunder name for a method that expects to be called from pickle seems like it would have been useful for security, since then you could add classes to a whitelist without allowing unrestricted calls to their constructor. Is this just a performance micro-optimization, or was there some other reason to prefer calling __new__ directly?