
I originally proposed serialization methods along the lines of to_json(), but that idea was rightly shot down: the language would have to support most formats, which could lead to confusion or complication. This is a newer version of that idea. I think object should provide a __serializable__ method that lets the user define how the object is to be serialized. The default should be something along the lines of return self.__dict__, but that is just semantics. With such a method, an object could be passed directly to any formatter in Python that offers a .dump method and be formatted immediately the way the end user wants, without writing a middle-layer formatter for it. Is this another terrible idea from me? Or is there some ground in this?
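A minimal sketch of how the proposal above might look in practice. The __serializable__ name comes from the proposal; it is not a real Python protocol, and the Point class and to_serializable helper are hypothetical names used only for illustration. The existing default= argument of json.dumps is used here as the place where a formatter could call such a hook.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    # Hypothetical hook from the proposal; Python does not call this itself.
    def __serializable__(self):
        return {"x": self.x, "y": self.y}


def to_serializable(obj):
    """Fallback for json.dumps: use the hypothetical hook when it exists."""
    hook = getattr(obj, "__serializable__", None)
    if hook is None:
        raise TypeError(f"{type(obj).__name__} is not serializable")
    return hook()


# json.dumps calls default= only for objects it cannot encode on its own.
encoded = json.dumps({"origin": Point(0, 0)}, default=to_serializable)
```

Note that the hook returns a plain structure rather than a string, so the caller still decides which format is used.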

On Fri, Mar 09, 2012 at 05:05:51PM +0000, Jakob Bowyer wrote:
Do you mean __getstate__?
http://docs.python.org/library/pickle.html#the-pickle-protocol

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.

Pickle is insecure, unfortunately, so a generic module to serialize and deserialize Python objects (or data containers) securely, without needing a constructor, would be awesome. However, magic methods are evil: it will be hard to find the source of an error if the logic in your magic method fails.
-- anatoly t.
On Fri, Mar 9, 2012 at 8:27 PM, Jakob Bowyer <jkbbwr@gmail.com> wrote:

Improve the documentation to point users to the json module? http://docs.python.org/library/json.html I haven't analysed whether it is secure, but it seems a good starting point. The API seems a little hackish; perhaps there should be a recipe book. There is also cerealizer, http://home.gna.org/oomadness/en/cerealizer/index.html, linked from the comments on this research into pickle's insecurity, which may be handy: http://nadiana.com/python-pickle-insecure
-- anatoly t.
On Sat, Mar 10, 2012 at 1:40 AM, Guido van Rossum <guido@python.org> wrote:

On 2012-03-09, at 23:58 , Jim Rollenhagen wrote:
> This idea was actually started because we were talking about how not all objects/types are JSON-serializable. The example at hand was the bytes type.
Technically, all object types are JSON-serializable, since you can plug in custom encoding schemes. In practice, of course, few types or libraries provide a `JSONEncoder` and an object hook, so you'll have to build your own if you want to serialize and deserialize non-core types. On the other hand, since I'm not sure there's any community standard for the JSON serialization of e.g. a datetime, it's probably for the best that providing one is your job, because the library would very likely provide something you don't want or can't work with.
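As an illustration of plugging in a custom encoding scheme, here is a minimal sketch using json.JSONEncoder and the object_hook argument of json.loads. The "__type__" tagging convention and the names TaggedEncoder and tagged_hook are assumptions made for this example, not any standard; choosing ISO 8601 for datetime and base64 for bytes is exactly the kind of decision that has no community-wide answer.

```python
import base64
import json
from datetime import datetime


class TaggedEncoder(json.JSONEncoder):
    """Encode datetime and bytes as tagged dicts (an ad-hoc convention)."""

    def default(self, o):
        if isinstance(o, datetime):
            return {"__type__": "datetime", "value": o.isoformat()}
        if isinstance(o, bytes):
            encoded = base64.b64encode(o).decode("ascii")
            return {"__type__": "bytes", "value": encoded}
        # Let the base class raise TypeError for anything else.
        return super().default(o)


def tagged_hook(d):
    """Reverse TaggedEncoder; untagged dicts pass through unchanged."""
    kind = d.get("__type__")
    if kind == "datetime":
        return datetime.fromisoformat(d["value"])
    if kind == "bytes":
        return base64.b64decode(d["value"])
    return d


original = {"when": datetime(2012, 3, 9, 23, 58), "payload": b"\x00\xff"}
text = json.dumps(original, cls=TaggedEncoder)
restored = json.loads(text, object_hook=tagged_hook)
```

The encoder and the hook have to agree on the tagging scheme, which is why both ends usually end up being written by the same application.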

On Sat, 10 Mar 2012 01:36:53 +0300 anatoly techtonik <techtonik@gmail.com> wrote:
> Pickle is insecure,
http://docs.python.org/dev/library/pickle.html#restricting-globals
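The linked section describes overriding Unpickler.find_class to control which globals may be loaded. A short sketch along the lines of that documentation follows; the particular allow-list and the names RestrictedUnpickler and restricted_loads are choices made for this example, and a real application would need its own list.

```python
import builtins
import io
import pickle

# Only these builtins may be referenced by a pickle (example allow-list).
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a global object.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


safe = restricted_loads(pickle.dumps([1, 2, range(15)]))
```

This limits what a hostile pickle can reach, but it only helps if the allow-list is kept small and every permitted callable is itself harmless.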

On Fri, Mar 09, 2012 at 05:05:51PM +0000, Jakob Bowyer wrote:
> I think that object should provide an __serializable__ method which in-turn allows the user to define in it how the object is to be serialized,

I don't think this is a sensible approach. In general, you don't serialise a single object, you serialise a bunch of them. Suppose you serialise three objects spam, ham, and eggs to a single file. Unfortunately, spam uses pickle, ham uses JSON, and eggs uses plist. How would you read the data back later? How do you know which de-serialiser you should use for each object? What happens if you end up with ambiguous content? You would need some sort of meta-serialiser that recorded not just each serialised string, but also the format of that string.

I don't think that it is helpful to ask objects to serialise themselves, giving them the choice of what serialisation scheme to use. While freedom of choice is good, it should be the *caller* who chooses the scheme, not the individual objects.

So at the very least, for this idea to have legs, you would have to mandate a serialisation scheme which well-behaved objects ought to support. But Python already has that: pickle. If you want to mandate a second scheme, to overcome the known deficiencies of pickle, that's a separate issue.

Returning __dict__ can't work, because not all objects have a self.__dict__, and for those that do, it is not a string but a dict. [Aside: I don't understand why people say "that is just semantics" to dismiss something as trivial. Semantics is the *meaning* of something. It is the most fundamental, critical property of language. Without semantics, if I say "I got in the car and drove to work" you would not know if I actually got in the car and drove to work, or stayed home to watch television.] -- Steven
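A small demonstration of the point about __dict__, using two hypothetical classes. Instances of a class that defines __slots__ (and whose bases do too) have no __dict__, and where a __dict__ does exist it is a dict, not a string.

```python
class WithDict:
    def __init__(self):
        self.a = 1


class WithSlots:
    # __slots__ suppresses the per-instance __dict__.
    __slots__ = ("a",)

    def __init__(self):
        self.a = 1


plain_has_dict = hasattr(WithDict(), "__dict__")
slots_has_dict = hasattr(WithSlots(), "__dict__")
builtin_has_dict = hasattr(object(), "__dict__")
state = WithDict().__dict__
```

Here plain_has_dict is True, while slots_has_dict and builtin_has_dict are False, so a default of return self.__dict__ would fail for the latter two.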

On 2012-03-10, at 00:22 , Steven D'Aprano wrote:
If we consider the example of __getstate__ or custom JSON encoders, the output is not a string but a serializable structure (usually some sort of tagged dictionary; for pickle, the protocol itself does the tagging via the type name). The object tree is first converted to fully serializable structures, then serialized in one pass using a single format, which avoids this issue. But then there's the trouble of what a "serializable structure" is for a given serialization format: JSON will accept arbitrary dicts, but I think Protocol Buffers only accepts predefined structures, and XML or S-expressions will require some sort of schema to encode how they represent key:value maps for this document. So a single "serializable" protocol likely won't work at the end of the day, as it can have but one output, which may or may not match the capabilities of the format the user wants to serialize to and from.
See above, other serializers could hook themselves onto __getstate__ (I originally thought this was Oleg's suggestion, I must have been mistaken since nobody else interpreted it that way) but it still ends up with the format's semantics not necessarily mapping 1:1 to Python semantics.
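To make the "convert to a serializable structure first, then serialize once" idea concrete, here is a rough sketch in which a JSON fallback reuses the __getstate__ that pickle would call. The Account class, the state_with_tag helper and the "__class__"/"state" tagging layout are all hypothetical; this is one possible arrangement, not an existing convention.

```python
import json


class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self._cache = {}  # transient data that should not be persisted

    def __getstate__(self):
        # Return a plain structure, not a string; the caller picks the format.
        return {"owner": self.owner, "balance": self.balance}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._cache = {}


def state_with_tag(obj):
    """Hypothetical json.dumps fallback: reuse __getstate__, add a type tag."""
    return {"__class__": type(obj).__name__, "state": obj.__getstate__()}


accounts = [Account("ann", 10), Account("bob", 25)]
# Every object in the tree is converted, then the whole tree is written
# in a single format chosen by the caller.
as_json = json.dumps(accounts, default=state_with_tag)
```

This works here only because the state happens to consist of strings and integers; state containing, say, bytes or non-string dict keys would not map onto JSON, which is the mismatch described above.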

participants (8)
- anatoly techtonik
- Antoine Pitrou
- Guido van Rossum
- Jakob Bowyer
- Jim Rollenhagen
- Masklinn
- Oleg Broytman
- Steven D'Aprano