[Python-ideas] Serializable method

Masklinn masklinn at masklinn.net
Sat Mar 10 12:59:35 CET 2012


On 2012-03-10, at 00:22 , Steven D'Aprano wrote:

> On Fri, Mar 09, 2012 at 05:05:51PM +0000, Jakob Bowyer wrote:
> 
>> I think that object should provide an __serializable__ method which in-turn
>> allows the user to define in it how the object is to be serialized, 
> 
> I don't think this is a sensible approach. In general, you don't 
> serialise a single object, you serialise a bunch of them. Suppose you 
> serialise three objects spam, ham, and eggs to a single file. 
> Unfortunately, spam uses pickle, ham uses JSON, and eggs uses plist. How 
> would you read the data back later? How do you know which de-serialiser 
> you should use for each object? What happens if you end up with 
> ambiguous content?

Consider __getstate__ or custom JSON encoders: their output is not a
string but a serializable structure, usually some sort of tagged
dictionary (for pickle, the protocol itself does the tagging via the
type name). The object tree is first converted to fully serializable
structures, then serialized in one pass using a single format, which
avoids this issue.
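
As a rough sketch of that two-step approach with the json module (the
Point class, the "__type__" tag and the to_serializable helper are names
invented for illustration, not part of any library):

```python
import json

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getstate__(self):
        # Return a plain structure, not a string: the object does not
        # pick the wire format.
        return {"x": self.x, "y": self.y}

def to_serializable(obj):
    # Hypothetical hook passed to json.dumps(default=...). It tags the
    # state with a type name, since JSON (unlike pickle) does not do
    # that tagging itself.
    if isinstance(obj, Point):
        return {"__type__": "Point", **obj.__getstate__()}
    raise TypeError(f"cannot serialize {type(obj).__name__}")

# The whole tree is converted and written in one pass, in one format.
tree = {"origin": Point(0, 0), "path": [Point(1, 2), Point(3, 4)]}
encoded = json.dumps(tree, default=to_serializable, sort_keys=True)
print(encoded)
```

The caller chooses JSON here; the objects only describe their state.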

But then there is the trouble of what a "serializable structure" is for
a given serialization format. JSON will accept arbitrary dicts, but I
think Protocol Buffers only accepts predefined structures, and XML or
S-expressions need some sort of schema describing how key:value maps are
represented in a given document. So a single "serializable" protocol
likely won't work at the end of the day: it can have only one output,
which may or may not match the capabilities of the format the user wants
to serialize to and from.

> I don't think that it is helpful to ask objects to serialise themselves, 
> giving them the choice of what serialisation scheme to use. While 
> freedom of choice is good, it should be the *caller* who chooses the 
> scheme, not the individual objects.

See above: other serializers could hook onto __getstate__ (I originally
thought this was Oleg's suggestion, but I must have been mistaken, since
nobody else interpreted it that way). That still leaves the format's
semantics not necessarily mapping 1:1 to Python semantics.
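
A small illustration of that mismatch, assuming a JSON serializer that
reuses the pickle hook (Segment and dump_via_getstate are hypothetical
names made up for this example):

```python
import json

class Segment:
    def __init__(self, start, end):
        self.start = start  # a tuple such as (0, 0)
        self.end = end

    def __getstate__(self):
        return {"start": self.start, "end": self.end}

def dump_via_getstate(obj):
    # Hypothetical helper: feed the state produced for pickle to json.
    return json.dumps(obj.__getstate__())

seg = Segment((0, 0), (3, 4))
restored = json.loads(dump_via_getstate(seg))

# JSON has no tuple type, so the tuple comes back as a list.
print(restored["start"])               # [0, 0]
print(restored["start"] == seg.start)  # False
```

The state is valid input for both pickle and json, but only pickle
round-trips it without changing the types.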



