Mailman 3 Pickle security improvements - Python-ideas


Pickle security improvements

Random832

July 11, 2020

10:58 a.m.

The current practice, by overriding find_class, is limited to overriding what globals get loaded. This makes it impossible to distinguish globals that will be used as data from globals that will be called as constructors, along with similar concerns with object attributes [especially methods] obtained by loading builtins.getattr as global. I would suggest also exposing for overrides the points where a callable loaded from the pickle is called - on the pure-python _Unpickler these are _instantiate, load_newobj, load_newobj_ex, and load_reduce, though it might be worthwhile to make a single method that can be overridden and use it at the points where each of these call a loaded object.
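To make the limitation concrete, here is a minimal sketch (not part of the original message). The class name `RecordingUnpickler`, the helper `load_and_record`, and the hand-written protocol 0 payload are illustrative assumptions. Both pickles below result in the same `find_class` call, although the first returns the `int` class as data and the second calls it.

```python
import io
import pickle


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical helper: record every global that find_class is asked for."""

    def __init__(self, file):
        super().__init__(file)
        self.seen = []

    def find_class(self, module, name):
        # find_class only receives the names. It is not told whether the
        # resulting object will be kept as data or called later on.
        self.seen.append((module, name))
        return super().find_class(module, name)


def load_and_record(data):
    """Unpickle `data` and return (result, list of requested globals)."""
    unpickler = RecordingUnpickler(io.BytesIO(data))
    return unpickler.load(), unpickler.seen


# The class `int` pickled as a plain value.
as_data, seen_data = load_and_record(pickle.dumps(int))

# A hand-written protocol 0 pickle that loads `int` and then calls it
# through the REDUCE opcode, i.e. int('42').
as_call, seen_call = load_and_record(b"cbuiltins\nint\n(S'42'\ntR.")

print(as_data, seen_data)
print(as_call, seen_call)
```

An override that wanted to allow `int` as a value but not as a constructor has nothing to distinguish the two cases by, which is the gap the proposed call hook would close.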


Wes Turner

July 2020

1:31 p.m.

Would this accomplish something like:

pickle.load(safe=True)  # or pickle.safe_loads()

Is there already a way to load data and not code *with pickle*? https://docs.python.org/3/library/pickle.html

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BB2TLA...
Code of Conduct: http://python.org/psf/codeofconduct/

Christopher Barker

4:24 p.m.

On Sat, Jul 11, 2020 at 10:33 AM Wes Turner <wes.turner@gmail.com> wrote:

Is there already a way to load data and not code *with pickle*? https://docs.python.org/3/library/pickle.html

I'm not sure if this is what you mean, but there is ast.literal_eval(), which I *think* is safe.

NOTE: I've wanted for ages to make a "PYSON" format / module for when JSON is not quite enough, e.g. distinction between lists and tuples, dict keys that aren't strings ....

-CHB
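A small sketch of what `ast.literal_eval()` offers for the cases mentioned here (tuples versus lists, non-string dict keys). The sample data is made up for illustration. One known caveat from the standard library documentation: a sufficiently large or deeply nested input string can still exhaust memory or crash the interpreter, so "safe" here means "does not execute code", not "safe against every hostile input".

```python
import ast

# Made-up sample data: tuple and int keys, a tuple value, and a set.
original = {(1, 2): [3, 4], 5: ("a", None), "key": {True, 2.5}}

# repr() of plain literals is text that literal_eval can read back.
text = repr(original)
restored = ast.literal_eval(text)

print(restored == original)
print(type(restored[(1, 2)]).__name__, type(restored[5]).__name__)

# Anything that is not a literal, such as a function call, is rejected
# instead of being evaluated.
try:
    ast.literal_eval("__import__('os').system('echo hi')")
    rejected = False
except ValueError:
    rejected = True

print("call rejected:", rejected)
```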


-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

David Mertz

4:43 p.m.

On Sat, Jul 11, 2020 at 4:24 PM Christopher Barker <pythonchb@gmail.com> wrote:

NOTE: I've wanted for ages to make a "PYSON" format / module for when JSON is not quite enough. e.g. distinction between lists and tuples, dict keys that aren't strings ....

https://github.com/jsonpickle/jsonpickle

You're not the first one.

-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

Wes Turner

4:54 p.m.

AFAIU, jsonpickle (and dill, cloudpickle) will still execute arbitrary Python (and ctypes) code.

Isn't pickle faster than C JSON?

Would it be feasible to just NOP callables when safe=True? Or would that be pointless?

JSON5 is great but still doesn't handle e.g. complex fractions


Greg Ewing

8:45 p.m.

On 12/07/20 8:54 am, Wes Turner wrote:

Would it be feasible to just NOP callables when safe=True?

This would break pickle, because calling constructors is the way many objects are unpickled. And it's not easy to tell which callables are safe to use as constructors and which aren't.

-- Greg
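The point about constructors can be checked without unpickling anything, by listing the opcodes in a pickle with `pickletools.genops`. The sketch below is illustrative only: the helper name `opcode_names` and the choice of `datetime.date` as the example object are assumptions, not something from the thread.

```python
import datetime
import pickle
import pickletools


def opcode_names(data):
    """Return the opcode names found in a pickle, without executing it."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


# A list of ints is rebuilt from dedicated opcodes; nothing is called.
plain = opcode_names(pickle.dumps([1, 2, 3]))

# A date is rebuilt by loading the `date` class and calling it, which
# appears in the stream as a REDUCE opcode.
dated = opcode_names(pickle.dumps(datetime.date(2020, 7, 11)))

print("list uses REDUCE:", "REDUCE" in plain)
print("date uses REDUCE:", "REDUCE" in dated)
```

Turning every call into a no-op would therefore leave an ordinary object like a date unreconstructable.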

Edwin Zimmerman

9:01 p.m.

As I see it, the unsafe callables (eval, exec, os.system, etc.) are generally functions, and safe ones (int, list, dict) are generally classes, though there certainly would be exceptions. Would it be too great of a breaking change to block function callables by default? That might be an incremental step towards better security.

--Edwin
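A rough sketch of the class-versus-function heuristic described above. The helper name `is_class` is hypothetical, and `subprocess.Popen` is an example chosen here (it is not mentioned in the thread) of the kind of exception the message allows for: it is a class, yet constructing it runs a command.

```python
import os
import subprocess


def is_class(obj):
    """Hypothetical heuristic: treat classes as 'safe', everything else as not."""
    return isinstance(obj, type)


candidates = {
    "int": int,
    "list": list,
    "dict": dict,
    "eval": eval,
    "exec": exec,
    "os.system": os.system,
    "subprocess.Popen": subprocess.Popen,  # a class that launches processes
}

for name, obj in candidates.items():
    print(f"{name:18} passes heuristic: {is_class(obj)}")
```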


Wes Turner

10:19 p.m.

If there were a configurable allow list of "safe" types, what in the stdlib would and wouldn't be on the list?

Bruce Leban

10:21 p.m.

The security problem arises from the fact that pickle will call arbitrary functions and that it will unpickle arbitrary classes, not just the ones that you might intend it to.

It seems to me that the way to make pickle safe is to limit what it can call. Unpickle can take a list of classes and it will only unpickle objects in those classes plus the built-in types (list, tuple, etc.). I imagine that in most cases, when you are unpickling, you have some idea of what the thing is that you are unpickling. If an unlisted class or arbitrary function reference is found, it raises an UnpicklingError.

There's even an example of this in the docs, but it's left to individual developers to copy the code from the documentation: https://docs.python.org/3.8/library/pickle.html. Why isn't this built in?

This is still vulnerable to a class being implemented in a way that doesn't take into account how malicious unpickling might be used on it, and then someone unknowingly pickling it. We can go one step further by adding an __unpickle__ method that, if present, is the only method that is used to load a class. We would also want to add a __pickle__ method.

--- Bruce
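The allow-list idea can be sketched on top of the existing `find_class` hook, along the lines of the documentation example referred to above. The names `AllowListUnpickler` and `safe_loads` are hypothetical, not an existing pickle API, and `Fraction` is just a sample class to allow. Built-in containers such as list, tuple and dict have their own opcodes and never reach `find_class`, so they load without being listed.

```python
import io
import pickle
from fractions import Fraction


class AllowListUnpickler(pickle.Unpickler):
    """Hypothetical sketch: only resolve globals that were explicitly allowed."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = {
            (cls.__module__, cls.__qualname__): cls for cls in allowed
        }

    def find_class(self, module, name):
        try:
            return self._allowed[(module, name)]
        except KeyError:
            raise pickle.UnpicklingError(
                f"global '{module}.{name}' is not allowed"
            ) from None


def safe_loads(data, allowed=()):
    """Unpickle `data`, permitting only the classes listed in `allowed`."""
    return AllowListUnpickler(io.BytesIO(data), allowed).load()


value = [1, (2, 3), {"a": Fraction(1, 3)}]
print(safe_loads(pickle.dumps(value), allowed=[Fraction]) == value)

# A payload that refers to os.system is refused before anything is called.
try:
    safe_loads(b"cos\nsystem\n(S'echo hello world'\ntR.", allowed=[Fraction])
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print("os.system blocked:", blocked)
```

As the message notes, this still trusts whatever the allowed classes do when they are constructed.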

Greg Ewing

11:17 p.m.

On 12/07/20 1:01 pm, Edwin Zimmerman wrote:

As I see it, the unsafe callables (eval, exec, os.system, etc) are generally functions, and safe ones (int, list, dict) are generally classes, though there certainly would be exceptions.

Where security is concerned, "there certainly would be exceptions" are not words you want to hear.

-- Greg

Edwin Zimmerman

6:53 a.m.

On 7/11/2020 11:17 PM, Greg Ewing wrote:

Where security is concerned, "there certainly would be exceptions" are not words you want to hear.

Agreed, that is why pickle should almost never be used. In the past, I have looked long and hard at using pickle in my own projects, but was always turned away because of its potential for security issues. I've thought for years that pickle is a major security foot gun, and I think that not allowing this by default:

pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")

would be a step in the right direction.
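What that payload asks the unpickler to do can be read off without running it. The sketch below only walks the opcodes with `pickletools.genops`; nothing is loaded or called.

```python
import pickletools

# The payload from the message above, inspected but never unpickled.
payload = b"cos\nsystem\n(S'echo hello world'\ntR."

ops = [(opcode.name, arg) for opcode, arg, pos in pickletools.genops(payload)]

for name, arg in ops:
    print(name, repr(arg))
```

The stream names the global `os system` (GLOBAL), builds a one-element argument tuple (MARK, STRING, TUPLE), and then calls the global with it (REDUCE).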

--Edwin

Chris Angelico

7:56 a.m.

On Mon, Jul 13, 2020 at 8:58 PM Edwin Zimmerman <edwin@211mainstreet.net> wrote:

I've thought for years that pickle is a major security foot gun, and I think that not allowing this by default:

pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")

would be a step in the right direction.

A pickle file (or equivalent blob in a database, or whatever) should be considered equally as trusted as your source code. If you're writing out a file that has the exact same access permissions as your own source code, and then reading it back, you shouldn't have to worry about pickle's safety any more than you worry about your code's safety - anyone who could maliciously craft something for you to unpickle could equally just edit the source code directly.

ChrisA

Wes Turner

12:52 p.m.

Looks like pyro4 (python remote objects) has moved to the serpent library (ast.literal_eval) [4]:

defaults to a safe serializer (serpent https://pypi.python.org/pypi/serpent ) that supports many Python data types.

supports different serializers (serpent, json, marshal, msgpack, pickle, cloudpickle, dill)

And pyro5 has removed support for unsafe serializers [5]:

no support for unsafe serializers AT ALL (pickle, dill, cloudpickle) - only safe serializers (serpent, marshal, json, msgpack)

for now, requires msgpack to be installed as well as serpent.

TBH, I'm not sure how much any of the serialization protocols have been fuzzed for safety purposes, if at all.

[4] https://github.com/irmen/Pyro4#feature-overview

[5] https://pyro5.readthedocs.io/en/latest/intro.html#what-has-been-changed-sinc...

(FWIW, PyArrow has very fast support for "Streaming, Serialization, and IPC" https://arrow.apache.org/docs/python/ipc.html and support for "Arbitrary Object Serialization" https://arrow.apache.org/docs/python/ipc.html#arbitrary-object-serialization w/ pyarrow.serialize() and pyarrow.deserialize())

Christopher Barker

2:47 p.m.

I'm no security expert, but we've got a big pile of serialization code that is kind of like JSON-pickly, but it will only deserialize known objects. It's a bit of a pain to declare what you want to work with, but it seems safer.

I also have a newer system (built on top of dataclasses) that serializes to JSON, but it's "pure" JSON -- it does not store any info about object types or anything in the JSON. Rather, the deserializer knows what it expects, and can only unpack that. E.g., you set a class attribute to the type "List_of_Object_A" -- and then it will use Object_A's deserializer to try to unpack the JSON.

It does mean you can't "just save" an arbitrary object, but you can use a decorator on a dataclass to make any object savable.

I did it this way because I want the JSON to be plain old JSON, but I also think it's substantially more secure.

It's part of a non-released code base, but if folks think it would be handy, I can pull it out into its own package.

-CHB
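The unreleased code is not shown in the thread, so the following is only a guess at the general shape of such a system: a minimal sketch in which the JSON carries no type information and the loader is driven entirely by the dataclass annotations. All names (`from_plain`, `to_json`, `from_json`, `ObjectA`, `Container`) are hypothetical, and only `int`, `float`, `str`, `bool`, nested dataclasses and `List[...]` fields are handled.

```python
import json
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import List, get_args, get_origin


def from_plain(expected, data):
    """Rebuild `expected` from plain JSON data, rejecting anything unexpected."""
    if get_origin(expected) is list:
        (item_type,) = get_args(expected)
        if not isinstance(data, list):
            raise TypeError(f"expected a list, got {type(data).__name__}")
        return [from_plain(item_type, item) for item in data]
    if is_dataclass(expected):
        if not isinstance(data, dict):
            raise TypeError(f"expected an object, got {type(data).__name__}")
        spec = {f.name: f.type for f in fields(expected)}
        if set(data) != set(spec):
            raise ValueError(f"unexpected or missing keys for {expected.__name__}")
        return expected(
            **{name: from_plain(tp, data[name]) for name, tp in spec.items()}
        )
    if expected in (int, float, str, bool):
        if type(data) is not expected:
            raise TypeError(f"expected {expected.__name__}, got {type(data).__name__}")
        return data
    raise TypeError(f"unsupported type: {expected!r}")


@dataclass
class ObjectA:
    x: int
    label: str


@dataclass
class Container:
    name: str
    items: List[ObjectA]


def to_json(obj):
    """Serialize a dataclass instance to plain JSON with no type markers."""
    return json.dumps(asdict(obj))


def from_json(expected, text):
    """Parse JSON and unpack it as `expected`, the type the caller asked for."""
    return from_plain(expected, json.loads(text))


saved = to_json(Container("demo", [ObjectA(1, "a"), ObjectA(2, "b")]))
print(saved)
print(from_json(Container, saved))
```

Because the caller names the type to build, the input can never select a class on its own; it can only fail to match the expected shape.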


Edwin Zimmerman

3:03 p.m.

I would have interest in it.

--Edwin

Christopher Barker

1:14 a.m.

On Mon, Jul 13, 2020 at 12:03 PM Edwin Zimmerman <edwin@211mainstreet.net> wrote:

I would have interest in it.

OK -- I'll see what I can do about pulling it out and putting it on GitHub. Not sure I'll have the time to clean it up and make a nice package out of it, but maybe there are some ideas in there worth sharing.

-CHB


João Santos

9:35 a.m.

...

I would have interest in it.

--Edwin

I'm no security expert, but we've got a big pile of serialization code that is kind of like JSON-pickly, but it will only deserialize known objects. it's a bit of pain to declare what you want to work with, but it seems safer.

I also have a newer system (built on top of dataclasses) that serializes to JSON, but it's "pure" JSON -- it does not store any info about object types or anything in the JSON. Rather, the deserializer knows what it expects, and can only unpack that. e.g., you set a class attribute to the type "List_of_Object_A" -- and then it will use Object_A's deserializer to try to unpack the JSON.

It does mean you can't "just save" an arbitrary object, but you can use a decorator on a datacalsss to make any object savable.

I did it this way because I want the JSON to be plain old JSON, but I also think it's substantially more secure.

part of a non-released code base, but iof folks think it would be handy, I can pull it out into its own package.

-CHB

On Mon, Jul 13, 2020 at 9:54 AM Wes Turner <wes.turner@gmail.com <mailto:wes.turner@gmail.com> > wrote:

Looks like pyro4 (python remote objects) has moved to the serpent library (as.literal_eval) [1]

...
defaults to a safe serializer (serpent https://pypi.python.org/pypi/serpent ) that supports many Python data types.

supports different serializers (serpent, json, marshal, msgpack, pickle, cloudpickle, dill) And pyro5 has removed support for unsafe serializers [5]: no support for unsafe serializers AT ALL (pickle, dill, cloudpickle) - only safe serializers (serpent, marshal, json, msgpack)

for now, requires msgpack to be installed as well as serpent.

TBH, I'm not sure how much any of the serialization protocols have been fuzzed for safety purposes if at all.

[4] https://github.com/irmen/Pyro4#feature-overview

[5] https://pyro5.readthedocs.io/en/latest/intro.html#what-has-been-changed-sin ce-pyro4

(FWIW, PyArrow has very fast support for "Streaming, Serialization, and IPC"

https://arrow.apache.org/docs/python/ipc.html and support for "Arbitrary Object Serialization" https://arrow.apache.org/docs/python/ipc.html#arbitrary-object-serializatio n w/ pyarrow.serialize() and pyarrow.deserialize())

On Mon, Jul 13, 2020, 7:57 AM Chris Angelico <rosuav@gmail.com <mailto:rosuav@gmail.com> > wrote: On Mon, Jul 13, 2020 at 8:58 PM Edwin Zimmerman <edwin@211mainstreet.net <mailto:edwin@211mainstreet.net> > wrote:

...
On 7/11/2020 11:17 PM, Greg Ewing wrote:

On 12/07/20 1:01 pm, Edwin Zimmerman wrote:

As I see it, the unsafe callables (eval, exec, os.system, etc) are generally functions, and safe ones(int, list, dict) are generally classes, though there certainly would be exceptions.

Where security is concerned, "there certainly would be exceptions" are not words you want to hear.

Agreed, that is why pickle should almost never be used. In the past, I have looked long and hard at using pickle in my own projects, but was always turned away because of its potential for security issues. I've thought for years that pickle is a major security foot gun, and I think that not allowing

Pydantic (https://pydantic-docs.helpmanual.io/) can already do that. On Monday, 13 July 2020 21:03:14 CEST Edwin Zimmerman wrote: this by default:

...

...
...
...
...
pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")

would be a step in the right direction.

A pickle file (or equivalent blob in a database, or whatever) should be considered equally as trusted as your source code. If you're writing out a file that has the exact same access permissions as your own source code, and then reading it back, you shouldn't have to worry about pickle's safety any more than you worry about your code's safety - anyone who could maliciously craft something for you to unpickle could equally just edit the source code directly.

ChrisA


Steven D'Aprano

7:45 p.m.

On Mon, Jul 13, 2020 at 09:56:45PM +1000, Chris Angelico wrote:

...

A pickle file (or equivalent blob in a database, or whatever) should be considered equally as trusted as your source code. If you're writing out a file that has the exact same access permissions as your own source code, and then reading it back, you shouldn't have to worry about pickle's safety any more than you worry about your code's safety - anyone who could maliciously craft something for you to unpickle could equally just edit the source code directly.

If I worry about the security of my source code, I can put a known good copy on read-only media, or lock it down with more restrictive permissions so that the user running the code cannot modify it. In either case, if my code needs to write data out and then later back in to a pickle file, it can't be written to the same location as my source code. (As it is read-only.)

So it isn't correct that a malicious user having the ability to craft a pickle file could just edit the source code. These are independent threats.

There is a scenario where what you say is correct: as the application developer, I create my data structures for my app and store them in pickles *at build time*, distributing the pickles as part of my app. In that case they can be read-only, and are effectively compiled source code. I guess you were thinking of a similar scenario?

But in the case of security, it really doesn't matter about the safe scenarios. It doesn't matter if there are a million safe use-cases for pickle ("what if I'm running on a single-user system with no internet, a malicious user can only hurt themselves..."[1]) if the user mistakes their actually unsafe scenario for a safe one.

And that's the risk: can I guarantee that there is no clever scheme by which an attacker can fool me into unpickling malicious code? I need to be smarter than the attacker, and more imaginative, and to have thought as long and hard about the problem as they have.

They've probably been thinking about ways to exploit pickle for months. I've spent three minutes reading the docs. Who is likely to win?

This is why an *inherently safe* serialization format is a necessary thing. I don't want to spend even three minutes thinking about exploits, I just want to write the data out and read it back in, no issues, no worries, and not have to think about it.

[1] Victims and authors of viruses and malware in the 1980s and 1990s may disagree.

-- Steven

Chris Angelico

7:55 p.m.

On Wed, Jul 15, 2020 at 9:46 AM Steven D'Aprano <steve@pearwood.info> wrote:

...

On Mon, Jul 13, 2020 at 09:56:45PM +1000, Chris Angelico wrote:

...
A pickle file (or equivalent blob in a database, or whatever) should be considered equally as trusted as your source code. If you're writing out a file that has the exact same access permissions as your own source code, and then reading it back, you shouldn't have to worry about pickle's safety any more than you worry about your code's safety - anyone who could maliciously craft something for you to unpickle could equally just edit the source code directly.

If I worry about the security of my source code, I can put a known good copy on read-only media, or lock it down with more restrictive permissions so that the user running the code cannot modify it. In either case, if my code needs to write data out and then later back in to a pickle file, it can't be written to the same location as my source code. (As it is read-only.)

At that point, you are NOT running it with the "exact same access permissions", are you? :) But a large amount of code is indeed run with the same access permissions as its temporary files (which may be incredibly restrictive or incredibly generous, either way).

...

They've probably been thinking about ways to exploit pickle for months. I've spent three minutes reading the docs. Who is likely to win?

This is why an *inherently safe* serialization format is a necessary thing. I don't want to spend even three minutes thinking about exploits, I just want to write the data out and read it back in, no issues, no worries, and not have to think about it.

And that's why we have JSON and various others, which are not pickle and are not vulnerable the way that pickle is. I don't think we need a "safe pickle".

What we need is to not use pickle when it's not the right tool.

I'm highly sympathetic to the requests for "JSON but able to encode more types", but not so sympathetic to "pickle but magically able to be safe".

ChrisA

Steven D'Aprano

8:58 p.m.

On Wed, Jul 15, 2020 at 09:55:03AM +1000, Chris Angelico wrote:

...

At that point, you are NOT running it with the "exact same access permissions", are you? :)

Indeed, and I did acknowledge that you were probably thinking about a different scenario. But I was challenging your assertion that anyone who can write a malicious pickle could just as easily inject malicious code into my source code. That's not always correct.

...

But a large amount of code is indeed run with the same access permissions as its temporary files (which may be incredibly restrictive or incredibly generous, either way).

Again, this is true. But we don't counter risks by pointing at the times that it's not a risk:

"Seat belts in cars? Ludicrous, most of the time the car is sitting still, not even moving, with nobody inside it! Why does it need seat belts?"

You are absolutely correct that most code (whether rightly or wrongly) doesn't consider, or maybe even doesn't *need* to consider, the security of pickle. If I personally write out a pickle, and then read it back in, what am I worried about? That I personally will inject malicious code into my own pickle, to grant myself access to my own computer? I don't think so.

But if I'm distributing my code to others, the responsible thing to do is to think of the potential security risks about using pickle in my app, or library. What if they use it in ways that I didn't foresee, ways which *ought to be* safe except for my choice to use pickle?

I'm not demanding that developers be omniscient, but I do think that they should not willfully ignore known security risks.

"All care, no responsibility" is only meaningful if we do actually take care.

...

...
They've probably been thinking about ways to exploit pickle for months. I've spent three minutes reading the docs. Who is likely to win?

This is why an *inherently safe* serialization format is a necessary thing. I don't want to spend even three minutes thinking about exploits, I just want to write the data out and read it back in, no issues, no worries, and not have to think about it.

And that's why we have JSON and various others,

How do I use JSON to serialise an arbitrary instance of some class?

Instances are just data. (Well, usually.) I should be able to serialise instances (well, most of them) and safely read them back again. Of course the gap between *should* and *can* is quite large, and Python really doesn't make it easy. I'm not saying this is an easy problem to solve.

...

which are not pickle and are not vulnerable the way that pickle is. I don't think we need a "safe pickle".

So they're vulnerable in other ways? :-)

...

What we need is to not use pickle when it's not the right tool.

How do I know when it's not the right tool?

How do I know which other serialisation format is right?

What about those -- and they are a significant minority -- who are restricted to only what's in the stdlib?

...

I'm highly sympathetic to the requests for "JSON but able to encode more types", but not so sympathetic to "pickle but magically able to be safe".

Okay, let's say that somebody else did the work. Some awfully clever chappy found a way to add a magical "pickle.safeload()" function that did everything needed, safely. Would you oppose it?

(The old unsafe one would presumably have to remain for backwards compatibility, or for the cases which are inherently unsafe.)

If not, then it seems to me you don't really care about this issue and could sit out of it :-)

If you do *actively oppose* adding a safe version of pickle, perhaps you should explain why.

-- Steven

Chris Angelico

9:24 p.m.

On Wed, Jul 15, 2020 at 11:00 AM Steven D'Aprano <steve@pearwood.info> wrote:

...

On Wed, Jul 15, 2020 at 09:55:03AM +1000, Chris Angelico wrote:

...
At that point, you are NOT running it with the "exact same access permissions", are you? :)

Indeed, and I did acknowledge that you were probably thinking about a different scenario. But I was challenging your assertion that anyone who can write a malicious pickle could just as easily inject malicious code into my source code. That's not always correct.

It's correct far more often than you might think. There's a LOT of code out there where the Python source code has the exact same external access permissions as its config files - often because there's no access to either.

...

...
But a large amount of code is indeed run with the same access permissions as its temporary files (which may be incredibly restrictive or incredibly generous, either way).

Again, this is true. But we don't counter risks by pointing at the times that it's not a risk:

"Seat belts in cars? Ludicrous, most of the time the car is sitting still, not even moving, with nobody inside it! Why does it need seat belts?"

And if it's not moving, you don't have to wear them. I see this as a perfect parallel. When you are in a risky situation, you take care. When you have other reasons for not worrying about the risk (a seatbelt won't save you from meteor strike), you don't need to.

...

But if I'm distributing my code to others, the responsible thing to do is to think of the potential security risks about using pickle in my app, or library. What if they use it in ways that I didn't foresee, ways which *ought to be* safe except for my choice to use pickle?

I'm not demanding that developers be omniscient, but I do think that they should not willfully ignore known security risks.

"All care, no responsibility" is only meaningful if we do actually take care.

So if you're distributing your code, then maybe you don't use pickle.

...

...
And that's why we have JSON and various others,

How do I use JSON to serialise an arbitrary instance of some class?

Instances are just data. (Well, usually.) I should be able to serialise instances (well, most of them) and safely read them back again. Of course the gap between *should* and *can* is quite large, and Python really doesn't make it easy. I'm not saying this is an easy problem to solve.

One very VERY good option is to keep your code and data separate. In a lot of my projects, I do this very consciously and deliberately, ensuring that all my persistent data is JSON-safe. It makes things a lot easier to reason about when you don't have to concern yourself with refactoring breaking your saved data, which can certainly happen with pickle.

...

...
which are not pickle and are not vulnerable the way that pickle is. I don't think we need a "safe pickle".

So they're vulnerable in other ways? :-)

Well, sure. Go ahead and point out JSON's vulnerabilities. :)

...

...
What we need is to not use pickle when it's not the right tool.

How do I know when it's not the right tool?

How do I know which other serialisation format is right?

What about those -- and they are a significant minority -- who are restricted to only what's in the stdlib?

Very good questions, and those are part of why we have multiple options. If you're restricted to the stdlib and distributing your code, I would generally recommend defaulting to JSON, because it's a well-known format that anyone can parse. If you need more functionality than JSON offers, but you're still restricted to the stdlib, you'll probably end up having to roll your own JSONEncoder subclass that handles what you need, or doing what I say above and keeping data separate from code. Neither is terribly difficult, but either way, you have to think slightly differently about what gets persisted. That's not a 100% ideal situation by any means, but it isn't as terrible as you might think.
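As an illustration of the "roll your own JSONEncoder subclass" route, here is a minimal stdlib-only sketch that round-trips datetime values through a tagged JSON object. The tag key "__datetime__" and the names TaggedEncoder and decode_tagged are assumptions made for this example, not an established convention.

```python
import json
from datetime import datetime

TAG = "__datetime__"  # hypothetical tag key, chosen for this sketch


class TaggedEncoder(json.JSONEncoder):
    """Encode datetime objects as a one-key tagged dict; reject anything else."""

    def default(self, o):
        if isinstance(o, datetime):
            return {TAG: o.isoformat()}
        # Fall back to the base class, which raises TypeError.
        return super().default(o)


def decode_tagged(d):
    """object_hook: turn tagged dicts back into datetime, leave others alone."""
    if set(d) == {TAG}:
        return datetime.fromisoformat(d[TAG])
    return d


record = {"name": "backup", "when": datetime(2020, 7, 15, 9, 55, 3)}
text = json.dumps(record, cls=TaggedEncoder)
restored = json.loads(text, object_hook=decode_tagged)
```

The decoder only ever constructs types that the application listed explicitly, so the data cannot name code to run; the cost is that every new type has to be added by hand.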

...

...
I'm highly sympathetic to the requests for "JSON but able to encode more types", but not so sympathetic to "pickle but magically able to be safe".

Okay, let's say that somebody else did the work. Some awfully clever chappy found a way to add a magical "pickle.safeload()" function that did everything needed, safely. Would you oppose it?

(The old unsafe one would presumably have to remain for backwards compatibility, or for the cases which are inherently unsafe.)

I would ask them which laws of physics they violated, since pickle inherently has to be able to execute arbitrary code in order to be able to do everything it needs to. If someone claims they've created a way to allow untrusted users to insert code into your Python programs and have it execute, but they've made it safe, would you oppose its inclusion in the stdlib? How much security hardening would it take before you can confidently say that it really is safe?

...

If not, then it seems to me you don't really care about this issue and could sit out of it :-)

If you do *actively oppose* adding a safe version of pickle, perhaps you should explain why.

I actively oppose it because it isn't possible. Anything that is safe will not have all of pickle's functionality. A nerfed version of pickle that can only unpickle a tiny handful of core data types is no better than other options that already exist. The entire point of pickling arbitrary objects is that you can unpickle arbitrary objects. That's inherently unsafe if there is any possibility that the pickle file came from an untrusted user, and I do indeed oppose plans to try to make pickle what it isn't.

You want "JSON but with a tagging system so it can unpickle dates and times"? No problem.

You want "an encoder that can save int/str/list/dict and dataclasses"? There'd be no end of bikeshedding on how it handles mismatched classes, but that seems pretty doable.

You want "pickle but magically able to know what's safe and what's not"? No.

ChrisA

Random832

9:47 p.m.

On Tue, Jul 14, 2020, at 21:24, Chris Angelico wrote:

...

I actively oppose it because it isn't possible. Anything that is safe will not have all of pickle's functionality. A nerfed version of pickle that can only unpickle a tiny handful of core data types is no better than other options that already exist. The entire point of pickling arbitrary objects is that you can unpickle arbitrary objects.

I don't understand why no-one's engaging with what I actually suggested. I was not asking for a magically safe or arbitrarily restricted pickle function.

I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals, to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.

...

That's inherently unsafe if there is any possibility that the pickle file came from an untrusted user, and I do indeed oppose plans to try to make pickle what it isn't.

We already have one whitelist hook, why not another?

The idea that the pickle format is "inherently unsafe" and cannot be made safe is magical thinking.

What I am asking for is the ability for application code subclassing Unpickler to control how certain opcodes are evaluated by overriding methods... something that *already exists*, just not for the right ones needed to be adequately expressive.

To the person who ran a fuzzer against it and found inputs that can cause segfaults or MemoryError, those are no more inherent to pickle than an equivalent bug in the JSON parser would be to JSON. They are bugs which can and should be fixed.
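For reference, the existing whitelist hook looks roughly like this. The sketch follows the "restricting globals" pattern from the pickle documentation; the particular allow-list and the name restricted_loads are illustrative choices.

```python
import builtins
import io
import pickle

# Illustrative allow-list of builtins that may be loaded as globals.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called once per global lookup, whether the global is later
        # called or merely stored as data.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only allow-listed globals can be loaded."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

This hook works with the C unpickler, but it decides per global and never sees the arguments, which is the gap the proposal is about.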

Steven D'Aprano

7:40 a.m.

On Tue, Jul 14, 2020 at 09:47:15PM -0400, Random832 wrote:

...

I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals, to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.

Could you provide a proof of concept subclass? -- Steven

Random832

9:08 p.m.

On Wed, Jul 15, 2020, at 07:40, Steven D'Aprano wrote:

...

On Tue, Jul 14, 2020 at 09:47:15PM -0400, Random832 wrote:

...
I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals, to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.

Could you provide a proof of concept subclass?

I was thinking of something like this... this is largely a trivial modification of the pure-python unpickler, but there are no methods that can be overridden for this effect in the C one.

    import io
    import pickle
    import sys

    class MyUnpickler(pickle._Unpickler):
        # this method is intended to be overridden by subclasses
        def do_call(self, func, *a, **k):
            #print(f"blocked call {func}(*{a}, **{k})")
            #return None
            raise NotImplementedError("This unpickler can't handle this pickle")

        # these methods are defined the same as in _Unpickler except for the use of do_call
        def _instantiate(self, klass, args):
            if (args or not isinstance(klass, type) or
                    hasattr(klass, "__getinitargs__")):
                try:
                    value = self.do_call(klass, *args)
                except TypeError as err:
                    raise TypeError("in constructor for %s: %s" %
                                    (klass.__name__, str(err)), sys.exc_info()[2])
            else:
                value = self.do_call(klass.__new__, klass)
            self.append(value)

        def load_newobj(self):
            args = self.stack.pop()
            cls = self.stack.pop()
            obj = self.do_call(cls.__new__, cls, *args)
            self.append(obj)

        def load_newobj_ex(self):
            kwargs = self.stack.pop()
            args = self.stack.pop()
            cls = self.stack.pop()
            obj = self.do_call(cls.__new__, cls, *args, **kwargs)
            self.append(obj)

        def load_reduce(self):
            stack = self.stack
            args = stack.pop()
            func = stack[-1]
            stack[-1] = self.do_call(func, *args)

        dispatch = pickle._Unpickler.dispatch.copy()
        # load_inst and load_obj use _instantiate and don't need to be overridden directly
        dispatch[pickle.NEWOBJ[0]] = load_newobj
        dispatch[pickle.NEWOBJ_EX[0]] = load_newobj_ex
        dispatch[pickle.REDUCE[0]] = load_reduce

    def loads(s, /, *, fix_imports=True, encoding="ASCII", errors="strict",
              buffers=None, unpickler=pickle.Unpickler):
        if isinstance(s, str):
            raise TypeError("Can't load pickle from unicode string")
        file = io.BytesIO(s)
        return unpickler(file, fix_imports=fix_imports, buffers=buffers,
                         encoding=encoding, errors=errors).load()

Edwin Zimmerman

7:54 a.m.

Random832 [mailto:random832@fastmail.com] wrote:

...

On Tue, Jul 14, 2020, at 21:24, Chris Angelico wrote:

...
I actively oppose it because it isn't possible. Anything that is safe will not have all of pickle's functionality. A nerfed version of pickle that can only unpickle a tiny handful of core data types is no better than other options that already exist. The entire point of pickling arbitrary objects is that you can unpickle arbitrary objects.

I don't understand why no-one's engaging with what I actually suggested. I was not asking for a magically safe or arbitrarily restricted pickle function.

I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals, to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.

...
That's inherently unsafe if there is any possibility that the pickle file came from an untrusted user, and I do indeed oppose plans to try to make pickle what it isn't.

We already have one whitelist hook, why not another?

The idea that the pickle format is "inherently unsafe" and cannot be made safe is magical thinking.

The idea that the pickle module can be made "safe" is magical thinking. Pickle's attack surface is just too large and too powerful.

As I said in a previous message, a stupid pickle fuzzer I wrote several years ago took about 60 seconds to start finding bugs (on an old slow-as-molasses single-core Intel Atom processor). A more intelligent fuzzer, on a much more powerful machine would probably do just as well today. It would help slightly to throw out the _pickle module and default to the pure Python version, but even then I wouldn't consider it anywhere close to secure.

That said, I agree with the idea of giving users an easier way to control what pickle does. I think that any such modifications should continue to make clear that pickle has not magically become "safe".

--Edwin

Random832

9:06 p.m.

On Wed, Jul 15, 2020, at 07:54, Edwin Zimmerman wrote:

...

The idea that the pickle module can be made "safe" is magical thinking. Pickle's attack surface is just too large and too powerful.

I don't think that makes something *inherently* unsafe, it just makes it difficult to make it safe. The problem I have is with the idea that it is *conceptually inevitable* [in the same way as, say, eval] for it to be unsafe, and therefore that it's not worth fixing bugs or adding whitelist features or doing anything other than saying "oh well it's their fault for using pickle" if/when an exploit is found.

[that said, it might also be a worthwhile project to make an alternate "advanced de/serializer" that primarily works by creating empty objects [i.e. with object.__new__(cls)] and populating their slots/dictionaries by assignment rather than by executing any constructor code, though it would need special support for extension types with C structures]
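A rough sketch of what such a constructor-free deserializer could look like, restricted to plain Python classes whose state is JSON-compatible. The registry, the "class"/"state" layout and the function names dump and restore are all assumptions made for this illustration; nothing like this exists in the stdlib under these names.

```python
import json


class Point:
    init_calls = 0  # counts constructor runs, to show restore() skips it

    def __init__(self, x, y):
        Point.init_calls += 1
        self.x = x
        self.y = y


# Hypothetical allow-list: only these classes may be restored.
REGISTRY = {"Point": Point}


def dump(obj):
    """Serialize an instance of a registered class as JSON text."""
    name = type(obj).__name__
    if REGISTRY.get(name) is not type(obj):
        raise TypeError(f"{name} is not a registered class")
    return json.dumps({"class": name, "state": vars(obj)})


def restore(text):
    """Rebuild an instance without running __init__ or any other user code."""
    doc = json.loads(text)
    cls = REGISTRY[doc["class"]]      # KeyError for unknown classes
    obj = object.__new__(cls)         # empty instance, constructor not called
    obj.__dict__.update(doc["state"])  # plain assignment into the instance dict
    return obj


p = Point(1, 2)
q = restore(dump(p))
```

Nested objects, __slots__ and extension types with C-level state are deliberately left out here; they are where the "special support" mentioned above would be needed.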

...

As I said in a previous message, a stupid pickle fuzzer I wrote several years ago took about 60 seconds to start finding bugs (on an old slow-as-molasses single-core Intel Atom processor). A more intelligent fuzzer, on a much more powerful machine would probably do just as well today. It would help slightly to throw out the _pickle module and default to the pure Python version, but even then I wouldn't consider it anywhere close to secure.

That said, I agree with the idea of giving users an easier way to control what pickle does. I think that any such modifications should continue to make clear that pickle has not magically become "safe".

Stephen J. Turnbull

2:36 a.m.

Random832 writes:

...

I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals,

Callables are globals in this sense. So overriding Unpickler.find_class will allow you to restrict to specified callables. It's not clear to me why you would want more fine-grained control: why not put the argument checking in the object constructor? In most cases that's probably where it's most useful, anyway, by DRY.

...

to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.

I would guess you can already have this by overriding Unpickler.load_reduce and patching Unpickler.dispatch[REDUCE[0]] to the new load_reduce. Is there any other way for a pickle to specify the code to invoke on data supplied by that pickle?
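A minimal sketch of that approach, assuming the pure-Python pickle._Unpickler is available (it is a private name, so this relies on an implementation detail of CPython's pickle.py). The class name, the allow-list and guarded_loads are hypothetical.

```python
import io
import pickle


class ReduceGuardUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that vets every REDUCE call (illustrative)."""

    # Hypothetical allow-list of callables that REDUCE may invoke.
    allowed_calls = (set, frozenset)

    def load_reduce(self):
        args = self.stack.pop()
        func = self.stack[-1]
        # Identity comparison, so unhashable callables cannot break the check.
        if not any(func is ok for ok in self.allowed_calls):
            raise pickle.UnpicklingError(f"call to {func!r} is not allowed")
        self.stack[-1] = func(*args)

    # Copy the dispatch table so the base class is left untouched.
    dispatch = pickle._Unpickler.dispatch.copy()
    dispatch[pickle.REDUCE[0]] = load_reduce


def guarded_loads(data):
    return ReduceGuardUnpickler(io.BytesIO(data)).load()
```

This only covers REDUCE. The NEWOBJ, NEWOBJ_EX, INST and OBJ opcodes also call objects supplied by the pickle and are not guarded by this sketch.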

...

The idea that the pickle format is "inherently unsafe" and cannot be made safe is magical thinking.

I think you're quite wrong. Pickle format itself is inherently unsafe because it allows a pickle to specify code to be executed on data that pickle specifies. Of course that assumes that the problem code somehow got into a Python file on your PYTHONPATH, but pickle format surely allows that in the absence of a specified threat model that rules it out.

I'm sure it's true that some particular uses of pickle format can be safe. But you need to say what you need the code using pickle to do, and what threats you need to be safe against.

Note that "pickle format is unsafe" is documented in many places, and the responsibility for security explicitly left up to the user. Not only that but an earlier attempt at "safe pickling" was removed in Python 2.3 (IIRC) since it didn't guarantee safety, and it wasn't considered worth the considerable effort to audit the pickle code for vulnerabilities.

...

What I am asking for is the ability for application code subclassing Unpickler to control how certain opcodes are evaluated by overriding methods... something that *already exists*, just not for the right ones needed to be adequately expressive.

Of course they already exist, and can be overridden. I guess what you're asking for is a promise that the interface won't change in future versions of pickle.py. That's the only difference between overriding Unpickler.find_class and overriding Unpickler.load_reduce (or any other method).

Random832

9:56 a.m.

On Thu, Jul 16, 2020, at 02:36, Stephen J. Turnbull wrote:

...

Random832 writes:

...
I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals,

Callables are globals in this sense.

Not all callables are globals; as has been pointed out, attributes of objects (methods) can also be callable. This does require calling getattr, which is itself a global, but you can't exercise any fine-grained control over this without substituting your own getattr function, which creates problems if you are unpickling an object which contains a reference to the getattr function.

Ultimately, *this* is the problem that made me realize that find_class isn't an adequate hook - that you cannot block or substitute a callable [whether that's a class, a global function, or a method] without also impacting the ability to unpickle objects that contain references to the same callable *as data*.
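The data-versus-call distinction is visible in the opcode stream. A small sketch using pickletools; protocol 2 is chosen here only because it keeps the GLOBAL and REDUCE opcodes explicit, and the helper name is illustrative.

```python
import pickle
import pickletools


def opcode_names(data):
    """List the opcode names in a pickle without executing it."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)]


# getattr stored as plain data inside a list: loaded, never called.
as_data = opcode_names(pickle.dumps([getattr], protocol=2))

# A set under protocol 2: the global `set` is loaded and then called.
as_call = opcode_names(pickle.dumps({1, 2}, protocol=2))

print("as data:", as_data)
print("as call:", as_call)
```

Both pickles go through find_class for their global, so that hook alone cannot tell the two situations apart; only the second one contains a REDUCE.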

...

So overriding Unpickler.find_class will allow you to restrict to specified callables. It's not clear to me why you would want more fine-grained control: why not put the argument checking in the object constructor? In most cases that's probably where it's most useful, anyway, by DRY.

...
to be modified to also have a whitelist hook so that an application can provide a function that looks at a callable and its arguments that the pickle proposes to call, and can choose to either evaluate it, raise an error, or return a substitute value.

I would guess you can already have this by overriding Unpickler.load_reduce and patching Unpickler.dispatch[REDUCE[0]] to the new load_reduce. Is there any other way for a pickle to specify the code to invoke on data supplied by that pickle?

I think I got all of them, but if you think there may be others feel free to be an extra pair of eyes. But these overrides are not available for the C version, and may well not be available on other python implementations.

...

...
The idea that the pickle format is "inherently unsafe" and cannot be made safe is magical thinking.

I think you're quite wrong. Pickle format itself is inherently unsafe because it allows a pickle to specify code to be executed on data that pickle specifies.

Which is why an unpickler that *does not directly execute the specified code* is necessary and sufficient to be safe.

...

Of course they already exist, and can be overridden. I guess what you're asking for is a promise that the interface won't change in future versions of pickle.py. That's the only difference between overriding Unpickler.find_class and overriding Unpickler.load_reduce (or any other method).

The other difference is that overriding find_class is supported by _pickle.c

[The dispatch mechanism is also unreasonably annoying to override, and having to override four separate methods, one of which is underscore-prefixed, to do the same thing, is not ideal]

On a mostly unrelated note I also have to admit I am baffled why the NEWOBJ opcodes are defined to call __new__ instead of __newobj__, when the latter is expected to exist and be a valid reduce function. Having a dunder name for a method that expects to be called from pickle seems like it would have been useful for security, since then you could add classes to a whitelist without allowing unrestricted calls to their constructor. Is this just a performance micro-optimization, or was there some other reason to prefer calling __new__ directly?

Stephen J. Turnbull

12:54 p.m.

Random832 writes:

...

On Thu, Jul 16, 2020, at 02:36, Stephen J. Turnbull wrote:

...
Random832 writes:

...
I was asking for the current Unpickler class, which currently has a whitelist hook for loading globals,

Callables are globals in this sense.

not all callables are globals, as has been pointed out attributes of objects (methods) can also be callable.

this does require calling getattr which is itself global, but you can't exercise any fine-grained control

OK. Conceded. Since you have control over load_reduce (if you use the pure Python version), you can parse the stack and get the argument to getattr, but that's a hill I am quite unwilling to die on. That kind of messing around should indeed be enabled by the pickle API with well-defined semantics -- if the use cases justify it.

...

I think I got all of them, but if you think there may be others feel free to be an extra pair of eyes. But these overrides are not available for the C version,

That's going to be a sticking point, as many pickle use cases want to be as fast as possible. Additional overhead is likely to be unwelcome, although I guess the default would be minimal (I guess checking for the default of None and only calling if non-None would be fastest and do the job).

...

...
...
The idea that the pickle format is "inherently unsafe" and cannot be made safe is magical thinking.

I think you're quite wrong. Pickle format itself is inherently unsafe because it allows a pickle to specify code to be executed on data that pickle specifies.

Which is why an unpickler that *does not directly execute the specified code* is necessary and sufficient to be safe.

It certainly is *not* sufficient to be safe, if the threat model includes, say, a zero-day in the code being called, or an extended attack in which one allowed call sets the stage for another allowed call to blow up. And it may be sufficient to be useless, depending on the use case. You *are* going to die on *that* hill, you know.

...

On a mostly unrelated note I also have to admit I am baffled why the NEWOBJ opcodes are defined to call __new__ instead of __newobj__, when the latter is expected to exist and be a valid reduce function.

A lot of these decisions have implications for backward compatibility of pickles. If you add code to check versions and decide whether to call __new__ or __newobj__, that has performance and complexity implications that may have been judged not worth the marginal[1] improvement in security. Again, a request to change this seems likely to get pushback.

In any case, to get fairly definitive answers to your questions, you should try writing Antoine and/or Tim off-list.

Footnotes:
[1] I do believe the use cases you have in mind are marginal and the kind of code you'll need to write to take advantage of the features is vulnerability-prone. I don't matter, but I suspect the core devs will feel that way too.

Random832

11:25 p.m.

On Sat, Jul 18, 2020, at 12:54, Stephen J. Turnbull wrote:

...

...
I think I got all of them, but if you think there may be others feel free to be an extra pair of eyes. But these overrides are not available for the C version,

That's going to be a sticking point, as many pickle use cases want to be as fast as possible. Additional overhead is likely to be unwelcome, although I guess the default would be minimal (I guess checking for the default of None and only calling if non-None would be fastest and do the job).

The *default* would be to just pass the call through as-is, e.g. def do_call(self, f, *a, **k): return f(*a, **k); or whatever is the equivalent C - my proposal is all just about having an internally called method that *can* be overridden, not defining anything special with it by default. I guess part of where I'm not sure I'm on solid ground is... is the pure-python version guaranteed to always exist and always be available under the name _Unpickler, or is that an implementation detail? I've been assuming that there was no such guarantee and any change would have to be clearly defined and ultimately available in both versions.
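A minimal sketch of that overridable call point, assuming the pure-Python pickle._Unpickler (a private CPython name) is importable. HookedUnpickler, RangeOnlyUnpickler and the wiring of do_call into load_reduce are hypothetical illustrations of the proposal, not existing pickle API. Only REDUCE is routed through the hook here; NEWOBJ and the other call points would need the same treatment.

```python
import io
import pickle


class HookedUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that routes REDUCE calls through do_call()."""

    # Copy the dispatch table so the stock _Unpickler is left untouched.
    dispatch = dict(pickle._Unpickler.dispatch)

    def do_call(self, f, *a, **k):
        # Default behaviour: pass the call through unchanged.
        return f(*a, **k)

    def load_reduce(self):
        stack = self.stack
        args = stack.pop()
        func = stack[-1]
        stack[-1] = self.do_call(func, *args)

    dispatch[pickle.REDUCE[0]] = load_reduce


class RangeOnlyUnpickler(HookedUnpickler):
    """Example override: the decision is made at call time, not load time."""

    def do_call(self, f, *a, **k):
        if f is not range:
            raise pickle.UnpicklingError(f"call to {f!r} is not allowed")
        return super().do_call(f, *a, **k)


def hooked_loads(data, cls=HookedUnpickler):
    return cls(io.BytesIO(data)).load()
```

With this shape the base class behaves like the stock unpickler for REDUCE, and a subclass only pays for whatever checks it adds in do_call.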

...

It certainly is *not* sufficient to be safe, if the threat model includes, say, a zero-day in the code being called, or an extended attack in which one allowed call sets the stage for another allowed call to blow up. And it may be sufficient to be useless, depending on the use case. You *are* going to die on *that* hill, you know.

Well, sure, anything can have bugs. I meant it's sufficient for it not to have any special vulnerabilities vs anything else you might ever do with python. I'm basically just trying to push back against "this is, like eval, the keys to the kingdom and thus not worth hardening in any way at all".

...

...
On a mostly unrelated note I also have to admit I am baffled why the NEWOBJ opcodes are defined to call __new__ instead of __newobj__, when the latter is expected to exist and be a valid reduce function.

A lot of these decisions have implications for backward compatibility of pickles. If you add code to check versions and decide whether to call __new__ or __newobj__, that has performance and complexity implications that may have been judged not worth the marginal[1] improvement in security. Again, a request to change this seems likely to get pushback.

Sure - it'd have to be a new opcode at this point, and almost certainly isn't worth it... I just think the wrong decision was made in the first place, and we'd have more solid ground to design a version that doesn't require every application to provide its own specific filters if the decision had gone the other way. It doesn't matter at this point, I was just mentioning it as an aside.

Wes Turner

2:48 a.m.

Tragic! Pickle is relatively (?) fast and could be made more secure while making any performance regression due to additional security optional.

Perhaps it is the objectives of pickle which are desirable:
- serialize/deserialize arbitrary objects
- binary representation

Or perhaps the docs could include clarifications from this thread regarding the unsuitability of pickle for anything and the module should be underscored: _pickle

On Sat, Jul 18, 2020, 11:28 PM Random832 <random832@fastmail.com> wrote:

...

On Sat, Jul 18, 2020, at 12:54, Stephen J. Turnbull wrote:

...
...
I think I got all of them, but if you think there may be others feel free to be an extra pair of eyes. But these overrides are not available for the C version,

That's going to be a sticking point, as many pickle use cases want to be as fast as possible. Additional overhead is likely to be unwelcome, although I guess the default would be minimal (I guess checking for the default of None and only calling if non-None would be fastest and do the job).

The *default* would be to just pass the call through as-is, e.g. def do_call(self, f, *a, **k): return f(*a, **k); or whatever is the equivalent C - my proposal is all just about having an internally called method that *can* be overridden, not defining anything special with it by default.

I guess part of where I'm not sure I'm on solid ground is... is the pure-python version guaranteed to always exist and always be available under the name _Unpickler, or is that an implementation detail? I've been assuming that there was no such guarantee and any change would have to be clearly defined and ultimately available in both versions.

...
It certainly is *not* sufficient to be safe, if the threat model includes, say, a zero-day in the code being called, or an extended attack in which one allowed call sets the stage for another allowed call to blow up. And it may be sufficient to be useless, depending on the use case. You *are* going to die on *that* hill, you know.

Well, sure, anything can have bugs. I meant it's sufficient for it not to have any special vulnerabilities vs anything else you might ever do with python. I'm basically just trying to push back against "this is, like eval, the keys to the kingdom and thus not worth hardening in any way at all".

...
...
On a mostly unrelated note I also have to admit I am baffled why the NEWOBJ opcodes are defined to call __new__ instead of __newobj__, when the latter is expected to exist and be a valid reduce function.

A lot of these decisions have implications for backward compatibility of pickles. If you add code to check versions and decide whether to call __new__ or __newobj__, that has performance and complexity implications that may have been judged not worth the marginal[1] improvement in security. Again, a request to change this seems likely to get pushback.

Sure - it'd have to be a new opcode at this point, and almost certainly isn't worth it... I just think the wrong decision was made in the first place, and we'd have more solid ground to design a version that doesn't require every application to provide its own specific filters if the decision had gone the other way. It doesn't matter at this point, I was just mentioning it as an aside.

Stephen J. Turnbull

12:42 p.m.

Random832 writes:

...

I guess part of where I'm not sure I'm on solid ground is... is the pure-python version guaranteed to always exist and always be available under the name _Unpickler, or is that an implementation detail? I've been assuming that there was no such guarantee and any change would have to be clearly defined and ultimately available in both versions.

IIRC, it's not guaranteed, but it's generally considered best practice to have a pure Python version in the stdlib, even though that adds maintenance overhead. There are some things that can't be done in Python without C assistance, so it can't be a hard rule.

...

I'm basically just trying to push back against "this is, like eval, the keys to the kingdom and thus not worth hardening in any way at all".

And I'm basically trying to push back on the notion that this kind of discussion can be useful without talking about benefits (applications that can use the hardened pickle but not unhardened pickle) and threats (which define the "can use ... but not" phrase).

...

Sure - it'd have to be a new opcode at this point,

Why? The REDUCE opcode invokes load_reduce which ... oh heck, just post it:

    def load_reduce(self):
        stack = self.stack
        args = stack.pop()
        func = stack[-1]
        stack[-1] = func(*args)
    dispatch[REDUCE[0]] = load_reduce

So, why not this?

    # in _pickle this variable needs to be exposed to Python
    call_restricting_callable_and_args = None

    def load_reduce(self):
        stack = self.stack
        args = stack.pop()
        func = stack[-1]
        if call_restricting_callable_and_args is None:
            stack[-1] = func(*args)
        else:
            stack[-1] = call_restricting_callable_and_args(func, args)
    dispatch[REDUCE[0]] = load_reduce

which would allow raising an error, substituting a value, or calling func on args, as you suggested, but also allow substituting in args or even for func, and value substitution or editing after checking the value of func(*args). In C that would be as fast as you can get in the default case.

Yes, there'd probably be pushback, even for that little overhead. But if you've got any good use cases, there'd be a shot, I think. The problem for me is I don't know of any use cases for any of that flexibility beyond whitelisting func, which we already have.
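For illustration only, a callable with the (func, args) signature discussed here might look like the sketch below. The names restrict, apply_reduce, ALLOWED_CALLS and MAX_RANGE_ARG are hypothetical, and apply_reduce merely mimics the proposed None check outside of pickle; nothing here is an existing pickle hook.

```python
import collections
import pickle

# Hypothetical allowlist of callables the application is willing to invoke.
ALLOWED_CALLS = (range, collections.OrderedDict)
MAX_RANGE_ARG = 10**6  # arbitrary limit, chosen only for the example


def restrict(func, args):
    """Decide at call time, with access to both the callable and its arguments."""
    if not any(func is allowed for allowed in ALLOWED_CALLS):
        raise pickle.UnpicklingError(f"call to {func!r} is not allowed")
    if func is range and any(abs(a) > MAX_RANGE_ARG for a in args):
        raise pickle.UnpicklingError("range argument out of bounds")
    return func(*args)


def apply_reduce(func, args, hook=None):
    """Stand-in for the proposed check: call directly unless a hook is set."""
    if hook is None:
        return func(*args)
    return hook(func, args)
```

The point of the (func, args) shape is that the hook can veto, rewrite the arguments, or inspect the result, whereas find_class only ever sees the module and name.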

...

I just think the wrong decision was made in the first place,

Which "first place", the earlier pickle that had a restricted mode, or the "modern" pickle which based on that experience removed restricted mode?

Steve

Random832

12:13 a.m.

On Sun, Jul 19, 2020, at 12:42, Stephen J. Turnbull wrote:

...

...
Sure - it'd have to be a new opcode at this point,

Why? The REDUCE opcode invokes load_reduce which ... oh heck, just post it:

...
I just think the wrong decision was made in the first place,

Which "first place", the earlier pickle that had a restricted mode, or the "modern" pickle which based on that experience removed restricted mode?

er, I think you've lost some context - these particular statements were regarding an aside about a hypothetical opcode that would be defined to call __newobj__, which could be defined other than by calling __new__. The function could be written to inspect its own arguments more carefully than a general constructor, and having that name could constitute advertisement by the class that the function is safe to be called by unpickle. It was the decision to call __new__ rather than __newobj__ in the NEWOBJ opcode that I was questioning.
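To make the __new__ versus __newobj__ distinction concrete, the following sketch inspects how CPython reduces a plain object() at protocol 4. It assumes CPython's current behaviour: the reduce function reported is copyreg.__newobj__, but the stream carries a NEWOBJ opcode, which the unpickler executes as cls.__new__(cls, *args) without going through that function.

```python
import copyreg
import pickle
import pickletools

obj = object()

# What the object says should be called to rebuild it.
func, args = obj.__reduce_ex__(4)[:2]

# What actually ends up in the pickle stream.
data = pickle.dumps(obj, protocol=4)
opcode_names = [op.name for op, arg, pos in pickletools.genops(data)]

print(func is copyreg.__newobj__, args)
print(opcode_names)
```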

Steven D'Aprano

7:35 a.m.

On Wed, Jul 15, 2020 at 11:24:17AM +1000, Chris Angelico wrote:

...

It's correct far more often than you might think. There's a LOT of code out there where the Python source code has the exact same external access permissions as its config files - often because there's no access to either.

Um, yes? Safe use-cases are not the issue here. It's the unsafe use-cases that are important. Especially the use-cases that people may think are safe but actually aren't.

To stick to the seat belt analogy for a moment... we don't reject seat belts in cars because most of the time cars are safely parked in a garage. We add them for the times that cars are in motion at speed.

Improving the security of pickle shouldn't be done for the sake of cases where the security of pickle is irrelevant. It should be done for the sake of cases where it is necessary, especially for those cases where the developer thinks that security isn't necessary, but they are mistaken.

...

So if you're distributing your code, then maybe you don't use pickle.

Sure. What do I use to serialise my complex data structure? I guess I could write out the repr and then call eval on it, that should be fine... *wink* [...]

...

...
Okay, let's say that somebody else did the work. Some awfully clever chappy found a way to add a magical "pickle.safeload()" function that did everything needed, safely. Would you oppose it?

(The old unsafe one would presumably have to remain for backwards compatibility, or for the cases which are inherently unsafe.)

I would ask them which laws of physics they violated, since pickle inherently has to be able to execute arbitrary code in order to be able to do everything it needs to.

I'm not a pickle expert, but I don't think that's quite right. pickle has to be able to execute arbitrary code in order to be able to de-serialise arbitrary pickles, but that doesn't mean it has to de-serialise arbitrary pickles if you aren't expecting arbitrary pickles.

Random beat me to it by suggesting a white-list, but I was thinking the same way. The pickle protocol has to be able to deal with arbitrary instances, but very few apps using pickle need to, or want to, accept arbitrary instances. If my app serialised Widgets and Gadgets, then it ought to be an error to attempt to deserialise anything else.

Then all I need do is ensure that the Widget and Gadget classes are secure, not the entire Python universe :-)

As I said, I'm not an expert, but five minutes reading this: https://rushter.com/blog/pickle-serialization-internals/ allows me to confidently pontificate on the subject *wink*

The depickling virtual machine (pickle machine or PM) is not Turing complete. It has no loops or conditionals. It's a dumb machine that takes a sequence of op-codes, executes them in order, and then halts.

The GLOBAL op-code (by default) will import any module, and use any function from that module. That's dangerous; an option to restrict what modules and functions can be called by the PM would go a long way to reducing the attack surface of pickle. (I think.)

Random's idea of white-listing seems like a promising approach to me. Even if it doesn't make pickle "safe" in some absolute sense, it will make it *less unsafe* and reduce the attack surface for people using pickle. Security is always about tradeoffs, and we shouldn't let the idea of some unattainable perfectly secure pickle get in the way of improving the safety of pickle.
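The white-list idea can already be approximated with the documented Unpickler.find_class override. The sketch below follows the pattern from the pickle documentation; ALLOWED_GLOBALS, AllowlistUnpickler and restricted_loads are hypothetical names, and collections.OrderedDict stands in for the Widget and Gadget classes mentioned above.

```python
import collections
import io
import pickle

# Hypothetical allowlist: (module, qualified name) pairs the app expects.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse before importing anything that is not explicitly allowed.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads(), but only allowlisted globals can be resolved."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

This restricts which globals are loaded, not how they are later called, which is the limitation the thread is about.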

...

If someone claims they've created a way to allow untrusted users to insert code into your Python programs and have it execute, but they've made it safe, would you oppose its inclusion in the stdlib?

But that's not really what we're asking for. We're asking for a way to *avoid* executing arbitrary code, while still allowing *trusted* objects to be depickled.

...

You want "pickle but magically able to know what's safe and what's not"?

Of course not. But maybe I want to be able to tell pickle what I think is safe, and have everything else fail.

-- Steven

Chris Angelico

8:14 a.m.

On Wed, Jul 15, 2020 at 9:37 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

On Wed, Jul 15, 2020 at 11:24:17AM +1000, Chris Angelico wrote:

...
So if you're distributing your code, then maybe you don't use pickle.

Sure. What do I use to serialise my complex data structure? I guess I could write out the repr and then call eval on it, that should be fine... *wink*

Maybe don't HAVE an arbitrarily complex data structure for serialization. Maybe have a way to turn the in-memory representation into a much simpler structure, serialize that, and then load from your saved form. It'll make your code a lot easier to reason about and refactor, since you're no longer intrinsically binding your code to your save format.
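As a rough sketch of that approach, the example below flattens an application object into plain JSON types and rebuilds it explicitly on load. Widget, widget_to_json and widget_from_json are hypothetical names invented for the illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Widget:  # hypothetical application class
    name: str
    size: int


def widget_to_json(widget):
    # The saved form is a plain dict, independent of the class layout.
    return json.dumps({"type": "Widget", **asdict(widget)})


def widget_from_json(text):
    data = json.loads(text)
    if data.get("type") != "Widget":
        raise ValueError("unexpected record type")
    # Only the expected fields are read, and each is coerced explicitly.
    return Widget(name=str(data["name"]), size=int(data["size"]))
```

Nothing in the saved data names a callable, so the loader decides what gets constructed.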

...

I'm not a pickle expert, but I don't think that's quite right. pickle has to be able to execute arbitrary code in order to be able to de-serialise arbitrary pickles, but that doesn't mean it has to de-serialise arbitrary pickles if you aren't expecting arbitrary pickles.

Random beat me to it by suggesting a white-list, but I was thinking the same way. The pickle protocol has to be able to deal with arbitrary instances, but very few apps using pickle need to, or want to, accept arbitrary instances. If my app serialised Widgets and Gadgets, then it ought to be an error to attempt to deserialise anything else.

Then all I need do is ensure that the Widget and Gadget classes are secure, not the entire Python universe :-)

If that's what you want, then have a way to serialize Widgets and Gadgets, and *not* a way to serialize arbitrary objects. That, to me, sounds more like "enhanced JSON" than "magically safe pickle".

...

Security is always about tradeoffs, and we shouldn't let the idea of some unattainable perfectly secure pickle get in the way of improving the safety of pickle.

Nor should we let the idea of a secure pickle get in the way of improving the functionality of safer options.

...

...
If someone claims they've created a way to allow untrusted users to insert code into your Python programs and have it execute, but they've made it safe, would you oppose its inclusion in the stdlib?

But that's not really what we're asking for. We're asking for a way to *avoid* executing arbitrary code, while still allowing *trusted* objects to be depickled.

Except that you are. It's equivalent to trying to create a safe version of eval() instead of building a simple arithmetic expression parser. You're starting from danger and trying to patch until it's safe, instead of starting from safety and adding functionality until it's usable. Remember: If you have insufficient functionality, you'll know about it; if you are insufficiently secure, you won't know till it's too late.

...

...
You want "pickle but magically able to know what's safe and what's not"?

Of course not. But maybe I want to be able to tell pickle what I think is safe, and have everything else fail.

That's fair, but are you actually guaranteeing that it will never read arbitrary attributes from objects? Can pickle grab a module or function, pick up a dunder from it, and go to town? Are you able to give a total 100% guarantee that it cannot? If not, how do you know that it's safe?

Edwin has given further information on the inherent unsafe nature of pickle. It should be used for trusted pickles, NOT as a basis for some magical "safe" parser.

ChrisA

Random832

9:13 p.m.

On Wed, Jul 15, 2020, at 08:14, Chris Angelico wrote:

...

That's fair, but are you actually guaranteeing that it will never read arbitrary attributes from objects?

First of all, reading an attribute of an object in a pickle requires the getattr function. Even currently, you can substitute your own function for getattr in find_class, and with my proposal you wouldn't have to because you could control attempts to evaluate even the real getattr function. Second of all, with no way to exfiltrate, why is reading arbitrary attributes from objects problematic?
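A sketch of the substitution that is possible today: find_class returns a stricter stand-in whenever the pickle asks for builtins.getattr. guarded_getattr, GuardedUnpickler, guarded_loads and the underscore rule are hypothetical choices for the example, not a recommended policy.

```python
import io
import pickle


def guarded_getattr(obj, name, *default):
    # Example policy only: refuse private and dunder attributes.
    if name.startswith("_"):
        raise pickle.UnpicklingError(f"attribute {name!r} is not allowed")
    return getattr(obj, name, *default)


class GuardedUnpickler(pickle.Unpickler):
    ALLOWED = {("builtins", "str")}  # hypothetical allowlist

    def find_class(self, module, name):
        if (module, name) == ("builtins", "getattr"):
            return guarded_getattr  # substitute, never the real getattr
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def guarded_loads(data):
    return GuardedUnpickler(io.BytesIO(data)).load()
```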

Chris Angelico

9:16 p.m.

On Thu, Jul 16, 2020 at 11:13 AM Random832 <random832@fastmail.com> wrote:

...

On Wed, Jul 15, 2020, at 08:14, Chris Angelico wrote:

...
That's fair, but are you actually guaranteeing that it will never read arbitrary attributes from objects?

First of all, reading an attribute of an object in a pickle requires the getattr function. Even currently, you can substitute your own function for getattr in find_class, and with my proposal you wouldn't have to because you could control attempts to evaluate even the real getattr function.

Are you sure of that? I don't have any examples to hand, but are you able to pickle something identified as pkg.module.cls(x)?

...

Second of all, with no way to exfiltrate, why is reading arbitrary attributes from objects problematic?

Because the moment you can read arbitrary attributes from arbitrary objects, Python becomes impossible to sandbox.

ChrisA

Random832

9:26 p.m.

On Wed, Jul 15, 2020, at 21:16, Chris Angelico wrote:

...

Are you sure of that? I don't have any examples to hand, but are you able to pickle something identified as pkg.module.cls(x)?

This produces find_class('pkg.module', 'cls'). Doing pkg.module.cls.method produces find_class('builtins', 'getattr')(find_class('pkg.module', 'cls'), 'method')
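This can be observed by recording what find_class is asked for. The sketch uses the built-in method descriptor str.upper, which, as far as I know, CPython reduces to getattr(str, 'upper'); TracingUnpickler and trace_globals are hypothetical helper names.

```python
import io
import pickle


class TracingUnpickler(pickle.Unpickler):
    """Records every (module, name) pair requested, then resolves it normally."""

    def __init__(self, file):
        super().__init__(file)
        self.seen = []

    def find_class(self, module, name):
        self.seen.append((module, name))
        return super().find_class(module, name)


def trace_globals(data):
    unpickler = TracingUnpickler(io.BytesIO(data))
    result = unpickler.load()
    return unpickler.seen, result


seen, method = trace_globals(pickle.dumps(str.upper, protocol=4))
print(seen)
```

The attribute name itself ('upper') never passes through find_class; it is just a string argument to the getattr call.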

...

...
Second of all, with no way to exfiltrate, why is reading arbitrary attributes from objects problematic?

Because the moment you can read arbitrary attributes from arbitrary objects, Python becomes impossible to sandbox.

Not if you can't call them.

Stephen J. Turnbull

2:29 a.m.

Steven D'Aprano writes:

...

But if I'm distributing my code to others, the responsible thing to do is to think of the potential security risks about using pickle in my app, or library. What if they use it in ways that I didn't foresee, ways which *ought to be* safe except for my choice to use pickle?

If you have a choice, don't use pickle. If you don't have a choice, label your product "Uses pickle. If you care about security, find out what security issues that entails."

...

How do I use JSON to serialise an arbitrary instance of some class?

Ask Wes Turner about semantic JSON or whatever it is he frequently advocates for providing more type information to JSON codecs.

...

How do I know which other serialisation format is right?

That's your problem. We can advise you once you present your application and threat model(s). It does depend on both.

...

What about those -- and they are a significant minority -- who are restricted to only what's in the stdlib?

What about them?

...

Some awfully clever chappy found a way to add a magical "pickle.safeload()" function that did everything needed, safely. Would you oppose it?

I never oppose magic, for wizards are subtle and quick to anger. I have no desire to be turned into a newt. I just don't believe claims like "function that does everything needed, safely."

...

If you do *actively oppose* adding a safe version of pickle, perhaps you should explain why.

Define "safe": what is pickle allowed to do, what applications will be provided such "safety", and what threat model(s) is(are) being considered? Until somebody answers that, we all may as well sit this one out.

Wes Turner

11:28 a.m.

On Thu, Jul 16, 2020, 2:30 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

[...]

...
How do I use JSON to serialise an arbitrary instance of some class?

Ask Wes Turner about semantic JSON or whatever it is he frequently advocates for providing more type information to JSON codecs.

"Re: Improvement: __json__" (2020-04) https://mail.python.org/archives/list/python-ideas@python.org/thread/ISSUQVY... - https://mail.python.org/archives/list/python-ideas@python.org/message/D7WZX6...

...

...
How do I know which other serialisation format is right?

That's your problem. We can advise you once you present your application and threat model(s). It does depend on both.

If you're sharing data across process and/or system boundaries, pickle is the wrong format. (As evidenced by pyro4/5 completely removing pickle from their library for remote objects for security reasons).

...

Antoine Pitrou

6:15 a.m.

On Wed, 15 Jul 2020 09:45:06 +1000 Steven D'Aprano <steve@pearwood.info> wrote:

...

And that's the risk: can I guarantee that there is no clever scheme by which an attacker can fool me into unpickling malicious code? I need to be smarter than the attacker, and more imaginative, and to have thought as long and hard about the problem as they have.

A rather straightforward way to guarantee it would be to sign pickles cryptographically. Of course, the private signing key should not be compromised :-)

Regards

Antoine.
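A minimal stdlib-only sketch of that idea follows. Note the swap: it uses an HMAC with a shared secret key rather than a public/private key signature, since the standard library offers no asymmetric signing. dumps_signed and loads_signed are hypothetical helper names.

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj, key):
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key):
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Verify before unpickling; compare_digest avoids timing leaks.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pickle signature mismatch")
    return pickle.loads(payload)
```

This only establishes that the pickle came from a holder of the key; it does nothing about what a legitimately signed pickle may do when loaded.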

Edwin Zimmerman

5:02 p.m.

The bottom line is that pickle should never be used in a security sensitive context. Several years ago I spent about 5 minutes writing a custom pickle fuzzer. It ran for about 60 seconds before segfaulting. Fortunately, the last time I ran my fuzzer (about a year ago), all I could produce was a MemoryError traceback.

Even with all the improvements pickle has seen, I think it would be unwise to imply (via pickle module names or flags) that pickle is "safe".

--Edwin

On 7/11/2020 1:31 PM, Wes Turner wrote:

...

Would this accomplish something like:

pickle.load(safe=True) # or pickle.safe_loads()

Is there already a way to load data and not code *with pickle*? https://docs.python.org/3/library/pickle.html

On Sat, Jul 11, 2020, 11:01 AM Random832 <random832@fastmail.com> wrote:

The current practice, by overriding find_class, is limited to overriding what globals get loaded. This makes it impossible to distinguish globals that will be used as data from globals that will be called as constructors, along with similar concerns with object attributes [especially methods] obtained by loading builtins.getattr as global.

I would suggest also exposing for overrides the points where a callable loaded from the pickle is called - on the pure-python _Unpickler these are _instantiate, load_newobj, load_newobj_ex, and load_reduce, though it might be worthwhile to make a single method that can be overridden and use it at the points where each of these call a loaded object.


Greg Ewing

8:15 p.m.

On 12/07/20 5:31 am, Wes Turner wrote:

...

Is there already a way to load data and not code *with pickle*?

As far as I know, pickle has never been able to load code objects. The security problems come from the fact that by default a pickle is able to *call* any module-level callable object that it has access to, with arbitrary data as arguments. Since this includes eval() and exec(), it can effectively run arbitrary code.

The set of callables that can be considered "safe" depends on the application, so there can't really be a generic "safe" option. If that were possible, it would no doubt already exist and be the default.

-- Greg
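To see this concretely, the hand-written protocol 0 pickle below names builtins.eval and a string argument, then asks for the call with REDUCE. The payload is an illustration with a deliberately harmless expression; pickletools.genops lists the opcodes without executing them, while pickle.loads performs the call.

```python
import pickle
import pickletools

# GLOBAL builtins.eval, MARK, UNICODE "1+1", TUPLE, REDUCE, STOP
payload = b"cbuiltins\neval\n(V1+1\ntR."

# Inspect the stream without running it.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(payload)]
for name, arg in ops:
    print(name, arg)

# Loading it calls eval("1+1") with the default unpickler.
result = pickle.loads(payload)
print(result)
```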

Random832

11:44 p.m.

On Sat, Jul 11, 2020, at 20:15, Greg Ewing wrote:

...

The set of callables that can be considered "safe" depends on the application, so there can't really be a generic "safe" option. If that were possible, it would no doubt already exist and be the default.

My main concern is wanting to make the, yes, application specific decision on whether calling a callable is safe *at call time* [and with access to the arguments e.g to determine if getattr is safe], rather than simply at the time a global is loaded.

43 comments

12 participants

participants (12)

Antoine Pitrou
Bruce Leban
Chris Angelico
Christopher Barker
David Mertz
Edwin Zimmerman
Greg Ewing
João Santos
Random832
Stephen J. Turnbull
Steven D'Aprano
Wes Turner
