Hi all,

This is a proposal to enable UUID object serialization / deserialization by the json module out of the box. UUID objects cast to string:
>>> import uuid
>>> example = uuid.uuid4()
>>> str(example)
'b8bcbfaa-d54f-4f33-9d7e-c91e38bb1b63'
They can be cast back from a string:
>>> example == uuid.UUID(str(example))
True
But are not serializable out of the box:
>>> import json
>>> json.dumps(example)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UUID is not JSON serializable
Wouldn't it be pythonically possible to make this work out of the box, without going through the string typecast?

If that discussion goes well, perhaps we can also talk about datetimes... I know there's nothing about datetime formats in the JSON specification, and that users are free to choose, but could we pick a standard format by default that would just work for people who don't care which format they get?

Thank you in advance for your replies. Have a great day.

-- ∞
Hi,

The issue with custom types serialized/deserialized with JSON is that they don't exist in the JSON format, so you need to find a way to represent them without breaking other types (like a prefix in a string, or a specific JSON object with fields that identify its type). But they still don't exist in JSON, and you usually want to parse real JSON, not over-interpret objects contained in the document, so it's not a good idea to add custom types by default.

The solution is to use custom encoders/decoders, which is possible with the Python json module (https://docs.python.org/3/library/json.html#encoders-and-decoders). So you use them when you need custom types, and otherwise you don't have to deal with them.

On Wed, Jun 10, 2020 at 14:19, J. Pic <jpic@yourlabs.org> wrote:
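To illustrate the approach described above, a minimal sketch using the json module's documented ``default`` hook (the ``UUIDEncoder`` name is illustrative, not an existing API):

```python
import json
import uuid

class UUIDEncoder(json.JSONEncoder):
    """Render UUID objects as their canonical string form."""
    def default(self, o):
        if isinstance(o, uuid.UUID):
            return str(o)
        # Let the base class raise TypeError for anything else.
        return super().default(o)

example = uuid.uuid4()
encoded = json.dumps({"id": example}, cls=UUIDEncoder)
# Decoding back to a UUID stays explicit, on the caller's side.
decoded = uuid.UUID(json.loads(encoded)["id"])
assert decoded == example
```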
-- Antoine Rozo
I don't know how, or even whether, this would be possible out of the box. On one hand, I really like the idea; on the other, it seems like going down a bumpy road (as you said: datetimes? file paths? URLs?). And the deserialization would not be easy.

What if we added a function argument for the serialization, something like json.dumps(object, method=str)? (I don't know if that is an option at the moment.)

On Wed, Jun 10, 2020 at 9:19 AM J. Pic <jpic@yourlabs.org> wrote:
I understand. Do you think the Python standard library should provide a JSONEncoder and JSONDecoder that support Python standard library objects? It would be optional to use, but if you use it then any object from the Python standard library will just work.
On Wed, Jun 10, 2020 at 3:42 PM J. Pic <jpic@yourlabs.org> wrote:
I understand, do you think the python standard library should provide a JSONEncoder and JSONDecoder that supports python standard library objects ?
It would be optional to use, but if you use it then any object from the python standard library will just work.
I think a JSONEncoder for standard types would be nice to have. A JSONDecoder is much more complicated. What would it do? Convert every string that looks like an ISO datetime into a Python datetime object? Or convert objects like `{"type": "datetime", "value": "2020-01-01T12:34:56"}`?
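The second option Alex describes can already be sketched with the existing ``object_hook`` parameter; the tagged-object convention below is hypothetical, not an existing standard:

```python
import json
from datetime import datetime

# Hypothetical convention: objects shaped like
# {"type": "...", "value": "..."} carry a Python type.
def tagged_hook(obj):
    if obj.get("type") == "datetime":
        return datetime.fromisoformat(obj["value"])
    return obj

doc = '{"when": {"type": "datetime", "value": "2020-01-01T12:34:56"}}'
data = json.loads(doc, object_hook=tagged_hook)
assert data["when"] == datetime(2020, 1, 1, 12, 34, 56)
```

The downside, as discussed below, is that any document that happens to contain such an object gets over-interpreted.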
On 6/10/2020 9:48 AM, Alex Hall wrote:
On Wed, Jun 10, 2020 at 3:42 PM J. Pic <jpic@yourlabs.org> wrote:
I understand, do you think the python standard library should provide a JSONEncoder and JSONDecoder that supports python standard library objects ?
It would be optional to use, but if you use it then any object from the python standard library will just work.
I think a JSONEncoder for standard types would be nice to have.
A JSONDecoder is much more complicated. What would it do? Convert every string that looks like an ISO datetime into a Python datetime object? Or convert objects like `{"type": "datetime", "value": "2020-01-01T12:34:56"}`?
The general problem here is that if you're encoding things for a non-Python program to use, you can't do something non-standard like the above. And if you're looking for something that only other Python programs can use, you're better off using pickle or some other non-JSON format. That's why these ideas never result in any changes to the stdlib json library.

Eric
Good point, but then I'm not sure the decoder could be used for untrusted JSON anymore. Another solution would be to generate a schema in a separate variable, which would represent the JSON structure with Python types. Also, simple regexp pattern matching might be good enough.
On Wed, Jun 10, 2020 at 4:09 PM J. Pic <jpic@yourlabs.org> wrote:
Good point, but then I'm not sure the decoder could be used for untrusted json anymore.
Another solution would be to generate a schema in a separate variable, which would represent the JSON structure with Python types.
For that there are many existing libraries. You might like one of my recent projects: https://github.com/alexmojaki/datafunctions or: https://pydantic-docs.helpmanual.io/ https://github.com/lidatong/dataclasses-json https://github.com/ltworf/typedload/
On Wednesday, June 10, 2020, at 08:48 -0500, Alex Hall wrote:
On Wed, Jun 10, 2020 at 3:42 PM J. Pic <jpic@yourlabs.org> wrote:
I understand, do you think the python standard library should provide a JSONEncoder and JSONDecoder that supports python standard library objects ?
It would be optional to use, but if you use it then any object from the python standard library will just work.
I think a JSONEncoder for standard types would be nice to have.
Perhaps.
A JSONDecoder is much more complicated. What would it do? Convert every string that looks like an ISO datetime into a Python datetime object? Or convert objects like `{"type": "datetime", "value": "2020-01-01T12:34:56"}`?
IMO, it's worse than that. If you control both the producers and the consumers, and they're both written in Python, then you may as well use pickle and base64 (and an HMAC!) to convert your python data to an opaque ASCII string and just transmit that string. Why bother with JSON and all of its verbosity and restrictions in the first place? If interoperability is a concern, then how much does this sort of thing complicate your JSON and all of the other producers/consumers? Will their applications, standard libraries, and best practices "just work"?
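The pickle + base64 + HMAC combination Dan mentions can be sketched like this (the key and helper names are illustrative; this assumes both sides share the secret):

```python
import base64
import hashlib
import hmac
import pickle

KEY = b"shared-secret"  # hypothetical key shared by producer and consumer

def encode(obj):
    payload = pickle.dumps(obj)
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()
    return base64.b64encode(tag + payload).decode("ascii")

def decode(text):
    raw = base64.b64decode(text)
    tag, payload = raw[:32], raw[32:]  # sha256 digest is 32 bytes
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message failed authentication")
    return pickle.loads(payload)

# Round-trips arbitrary picklable Python objects, sets included.
assert decode(encode({"id": 123, "tags": {1, 2}})) == {"id": 123, "tags": {1, 2}}
```

The HMAC check matters because unpickling untrusted data can execute arbitrary code; only authenticated payloads reach pickle.loads.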
On Thu, Jun 11, 2020 at 12:45 AM Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
If you control both the producers and the consumers, and they're both written in Python, then you may as well use pickle and base64 (and an HMAC!) to convert your python data to an opaque ASCII string and just transmit that string. Why bother with JSON and all of its verbosity and restrictions in the first place?
If interoperability is a concern, then how much does this sort of thing complicate your JSON and all of the other producers/consumers? Will their applications, standard libraries, and best practices "just work"?
What if it's to be produced and consumed by your app (so, no interoperability), but you want it to be human-readable and human-editable? JSON is pretty good for that.

ChrisA
On 6/10/2020 11:00 AM, Chris Angelico wrote:
On Thu, Jun 11, 2020 at 12:45 AM Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
If you control both the producers and the consumers, and they're both written in Python, then you may as well use pickle and base64 (and an HMAC!) to convert your python data to an opaque ASCII string and just transmit that string. Why bother with JSON and all of its verbosity and restrictions in the first place?
If interoperability is a concern, then how much does this sort of thing complicate your JSON and all of the other producers/consumers? Will their applications, standard libraries, and best practices "just work"?

What if it's to be produced and consumed by your app (so, no interoperability), but you want it to be human-readable and human-editable? JSON is pretty good for that.
True, but I don't think the stdlib needs to cater to that requirement when there are hooks to write your own customizations.

Eric
On Thu, Jun 11, 2020 at 1:35 AM Eric V. Smith <eric@trueblade.com> wrote:
On 6/10/2020 11:00 AM, Chris Angelico wrote:
On Thu, Jun 11, 2020 at 12:45 AM Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
If you control both the producers and the consumers, and they're both written in Python, then you may as well use pickle and base64 (and an HMAC!) to convert your python data to an opaque ASCII string and just transmit that string. Why bother with JSON and all of its verbosity and restrictions in the first place?
If interoperability is a concern, then how much does this sort of thing complicate your JSON and all of the other producers/consumers? Will their applications, standard libraries, and best practices "just work"?

What if it's to be produced and consumed by your app (so, no interoperability), but you want it to be human-readable and human-editable? JSON is pretty good for that.
True, but I don't think the stdlib needs to cater to that requirement when there are hooks to write your own customizations.
I agree in general, but it might be worth having a few recipes in the docs or something. Make it clear that the Python json module *can* encode these kinds of things, but it's up to you as the app designer to decide how (among a number of equally viable options) you want to represent them.

ChrisA
On Wednesday, June 10, 2020, at 10:37 -0500, Chris Angelico wrote:
On Thu, Jun 11, 2020 at 1:35 AM Eric V. Smith <eric@trueblade.com> wrote:
On 6/10/2020 11:00 AM, Chris Angelico wrote:
On Thu, Jun 11, 2020 at 12:45 AM Dan Sommers <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
If you control both the producers and the consumers, and they're both written in Python, then you may as well use pickle and base64 (and an HMAC!) to convert your python data to an opaque ASCII string and just transmit that string. Why bother with JSON and all of its verbosity and restrictions in the first place?
If interoperability is a concern, then how much does this sort of thing complicate your JSON and all of the other producers/consumers? Will their applications, standard libraries, and best practices "just work"?

What if it's to be produced and consumed by your app (so, no interoperability), but you want it to be human-readable and human-editable? JSON is pretty good for that.
Readable, yes. Mostly. Editable? YMMV. Both reading and writing definitely get worse, however, as soon as you have to add your own type annotations and sub-structures. Then again, I used to read/write/edit assembly language programs happily with ed, so all of this is fairly subjective.
True, but I don't think the stdlib needs to cater to that requirement when there are hooks to write your own customizations.
I agree in general, but it might be worth having a few recipes in the docs or something. Make it clear that the Python json module *can* encode these kinds of things, but it's up to you as the app designer to decide how (among a number of equally viable options) you want to represent them.
Hooks and non-trivial examples in the documentation are both excellent.
I published a lib on PyPI that does that, which pushed me to write a complete README, which I reproduce here if anybody is interested in more discussion, along with my conclusions.

Overall, it seems like the cost of maintenance is going to be insignificant. While the value is reduced with objects that can't be patched at runtime, such as datetime (TypeError: can't set attributes of built-in/extension type 'datetime.datetime'), which you need to import from the library instead of from datetime, it still brings value in making serialization and deserialization into the versatile and popular JSON format easier and more reusable than with the current object_hook, by leveraging typical object-oriented programming in a very boring way that makes it easy for anyone to grasp.

The README looks like:

Instead of:

    from json import loads, dumps
    from uuid import UUID, uuid4

    obj = uuid4()
    encoded = dumps(str(obj))
    decoded = UUID(loads(encoded))
    assert obj == decoded

We can do:

    from jsonlight import loads, dumps
    from uuid import UUID, uuid4

    obj = uuid4()
    encoded = dumps(obj)
    decoded = loads(UUID, encoded)
    assert obj == decoded

This is because jsonlight patches the uuid.UUID class to add the following methods:

- ``__jsondump__``: return a representation of self with JSON data types
- ``__jsonload__``: instantiate an object based on the result of ``__jsondump__``

You can see that the main difference from ``json.loads`` is that ``jsonlight.loads`` requires a type as the first argument. This is because ``jsonlight.loads`` will first call ``json.loads`` to convert the string into a Python object with basic JSON types, and then pass that to the type's ``__jsonload__`` function.
Other types can't be monkey-patched, so you have to import them from jsonlight instead, which is the sad case of datetime:

    from jsonlight import loads, dumps, datetime

    obj = datetime.now()
    assert obj == loads(datetime, dumps(obj))

You may also define ``__jsondump__`` and ``__jsonload__`` methods on your own classes, for example:

    from uuid import UUID, uuid4
    from jsonlight import load

    class YourClass:
        def __init__(self, uuid=None):
            self.uuid = uuid or uuid4()

        def __jsondump__(self):
            return dict(uuid=self.uuid)

        @classmethod
        def __jsonload__(cls, data):
            return cls(load(UUID, data['uuid']))
            # This also works, but would not illustrate how to support recursion:
            # return cls(UUID(data['uuid']))

As you can see:

- you don't have to worry about calling ``__jsondump__`` on return values of your own ``__jsondump__``, because ``jsonlight.dumps`` will do that recursively,
- you have full control over deserialization, just like with ``__setstate__``, but if you call ``jsonlight.load`` in there yourself then you don't have to duplicate deserialization logic or bother calling ``__jsonload__`` on nested objects yourself.

Monkey-patched stdlib objects are:

- UUID
- Path

Feel free to add more. Stdlib objects that couldn't be monkey-patched, and that you have to import from jsonlight instead, are:

- datetime
I don't think the stdlib needs to cater to that requirement when there are hooks to write your own customizations.
If the stdlib offers such hooks, as well as objects that don't serialize by default, why not ship a usable hook that would serialize anything from the stdlib by default? It really seems like it "almost got there".

Perhaps the stdlib JSONEncoder could check for a new __json__ method on every object it serializes. Like __getstate__, __json__ should however return data that contains only JSON-compatible types. Then we could go on and add it to stdlib objects such as uuid and datetime, and have a rudimentary but failsafe JSON dumping function that works with any Python object from the stdlib, as well as your own objects where you add a __json__ magic method.
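A sketch of what the proposed __json__ protocol could look like, emulated today with the existing ``default`` hook rather than a stdlib change (the encoder name is illustrative, and the uuid/datetime branches stand in for the stdlib patches the proposal imagines):

```python
import json
import uuid
from datetime import datetime

class DunderJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # Hypothetical protocol: any object may expose __json__.
        method = getattr(o, "__json__", None)
        if callable(method):
            return method()
        # Stand-ins for the stdlib types the proposal would patch.
        if isinstance(o, uuid.UUID):
            return str(o)
        if isinstance(o, datetime):
            return o.isoformat()
        return super().default(o)

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __json__(self):
        return {"x": self.x, "y": self.y}

assert json.dumps(Point(1, 2), cls=DunderJSONEncoder) == '{"x": 1, "y": 2}'
```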
IMO, it's worse than that.
Agreed that JSON deserialization is a problem I would rather not even try to solve, actually. Choosing which type to deserialize with seems like a problem without an elegant solution:

- even with a schema: what if an attribute has different types within the same list? Then the schema will not work, or will have to be complex,
- storing the types in the encoded output, like pickle, changes the schema and might also be subject to the same security warnings that pickle has.

So, custom-typed deserialization doesn't look like something that could get into the stdlib. That said, rudimentary and failsafe JSON serialization seems reachable and still useful.
On 6/10/2020 11:59 AM, J. Pic wrote:

I don't think the stdlib needs to cater to that requirement when there are hooks to write your own customizations.

If the stdlib offers such hooks, as well as objects that don't serialize by default, why not ship a usable hook that would serialize anything from the stdlib by default ? It really seems like it "almost got there".

My reason: because it's yet something else to maintain in the standard library, and doesn't add enough value to justify its existence and ongoing maintenance cost.
Perhaps the stdlib JSONEncoder could check for a new __json__ method on every object it serializes. Similar to __getstate__, __json__ should however return data that only contains JSON-compatible types. Then we could go on and add it for stdlib objects such as uuid and datetime, and have a rudimentary but failsafe JSON dumping function that works with any Python object from the stdlib, as well as your own objects where you add a __json__ magic method.
There are many, many design decisions that would need to be made. Off the top of my head: what about recursive data structures? And basically every other decision ever made by pickle over the years. My suggestion would be to write a package to do this yourself, then upload it to PyPI. I think functools.singledispatch would be a good building block.
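Eric's functools.singledispatch suggestion could look roughly like this as a third-party building block (the ``jsonable`` name and the chosen representations are assumptions, not stdlib API):

```python
import functools
import json
import uuid
from datetime import datetime

@functools.singledispatch
def jsonable(obj):
    # Mirror the stdlib's error for unregistered types.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

@jsonable.register
def _(obj: uuid.UUID):
    return str(obj)

@jsonable.register
def _(obj: datetime):
    return obj.isoformat()

payload = {"id": uuid.uuid4(), "created": datetime(2020, 6, 10, 14, 19)}
# json.dumps calls `default` only for objects it can't serialize itself.
encoded = json.dumps(payload, default=jsonable)
decoded = json.loads(encoded)
assert decoded["created"] == "2020-06-10T14:19:00"
```

Third parties could then register their own types without subclassing the encoder, which is the main appeal of the dispatch approach.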
IMO, it's worse than that.
Agreed that JSON deserialization is a problem I would rather not even try to solve actually. Choosing what type to deserialize with seems like a problem that doesn't have an elegant solution:
- even with a schema: what if an attribute has different types within the same list, then the schema will not work or have to be complex - storing the types into the encoded output like Pickle, but that changes the schema and might also be subject to the same security warnings that Pickle has
So, custom-typed deserialization doesn't look like something that could get into the stdlib.
That said, rudimentary and failsafe JSON serialization seems reachable and still useful.
I'd be even more opposed to a "serialize but not deserialize" version going into the standard lib.

Eric
Or, there might be a way to get the best of both worlds. Consider this silly example:

    encoded = yourobject.__jsondump__()
    # this should work
    yourobject == YourClass.__jsonload__(encoded)

Basically, it's very similar to __getstate__ and __setstate__ with pickle, with the following limitations:

- these functions are limited to JSON types, which are rudimentary on one hand, but on the other hand it really seems like their current standard is here to stay,
- dict, list, set and other containers that may hold different types would not be able to do any type conversion: encoding list(UUID()) does not make it possible to get the UUID type back in the decoded list object, unless you implement your own list subclass, for example; then it's up to you as a user.

But with these limitations come the following advantages:

- you still have full control of deserialization logic in your own classes,
- it's easier than implementing an object_hook function, because you don't have to code type-detection logic: the class's __jsonload__ should know which type to apply to which attribute,
- serialization becomes easy to make safe, which is good for users who don't care / don't need anything particularly sophisticated, for whom the current hooks work,
- JSON is a popular format; PostgreSQL's JSON field is not the only really enjoyable thing we can do with JSON.

Here's a simple example to illustrate:

    from uuid import UUID, uuid4

    class YourClass:
        def __init__(self, uuid=None):
            self.uuid = uuid or uuid4()

        def __jsondump__(self):
            return dict(uuid=str(self.uuid))
            # if uuid had __jsondump__, then this would be possible too:
            # return dict(uuid=self.uuid.__jsondump__())
            #
            # if the encoder called __jsondump__ automatically, then this would
            # be enough:
            # return dict(uuid=self.uuid)

        @classmethod
        def __jsonload__(cls, data):
            return cls(UUID(data['uuid']))

        def __eq__(self, other):
            if isinstance(other, type(self)):
                return other.uuid == self.uuid
            return super().__eq__(other)

    obj = YourClass()
    encoded = obj.__jsondump__()
    decoded = YourClass.__jsonload__(encoded)
    assert obj == decoded

I'll try that on PyPI then (just have to monkey-patch the stdlib objects); if I find material to prove that it's worth the maintenance cost, then I'll come back with more.

Thank you very much for sharing some of your insight. Have a great day <3
Since there's no standard for this in JSON, deciding to serialize to str is quite arbitrary. I personally serialize UUIDs to binary data, as it takes less space. Some serialize them to hexadecimal. TBH, it sounds like a good idea but can have its pitfalls.

On Wed, Jun 10, 2020, 4:43 PM J. Pic <jpic@yourlabs.org> wrote:
I understand, do you think the python standard library should provide a JSONEncoder and JSONDecoder that supports python standard library objects ?
It would be optional to use, but if you use it then any object from the python standard library will just work.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/C3IW6R...
Code of Conduct: http://python.org/psf/codeofconduct/
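The alternative representations mentioned above all round-trip with the uuid module, which illustrates why picking one as "the" default would be arbitrary:

```python
import uuid

u = uuid.uuid4()
# Three equally valid wire representations:
assert uuid.UUID(bytes=u.bytes) == u  # 16 raw bytes, most compact
assert uuid.UUID(hex=u.hex) == u      # 32 hex characters, no dashes
assert uuid.UUID(str(u)) == u         # canonical dashed string, 36 chars
```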
participants (8)
- Alex Hall
- Antoine Rozo
- Bar Harel
- Chris Angelico
- Dan Sommers
- Eric V. Smith
- J. Pic
- Pablo Alcain