
I was playing with the codecs module and realized that there's untapped potential to use them for de/serialization. It's easy enough to register a codec for some case, but very few objects (I think only strings and bytes and their stream cousins) have native encode/decode methods. It seems to me that obj.encode("json") and str.decode("json"), for example, would be a powerful feature, if it were tied into the native codecs registry, enabling users to simplify a lot of serialization code and implement or tie in any codec that makes sense.

Right now, if I want to json.dumps a MappingProxyType, I believe I have to pass a custom JSONEncoder to json.dumps explicitly every time I call it. But I think I should be able to register one, and then just call thing.encode('json'). I could call codecs.encode(thing, 'json'), but I think maybe I shouldn't have to import codecs or json into my modules to do this. What do you think?

In case anyone is interested, here's a simple registration of json as a codec that works today:

import codecs
import json

def encode(obj):
    try:
        size = len(obj)
    except TypeError:
        size = 1
    return json.dumps(obj), size

def decode(obj):
    return json.loads(obj), len(obj)

codec_info = codecs.CodecInfo(name='json', encode=encode, decode=decode)

codecs.register({'json': codec_info}.get)

print(codecs.encode({'a': 1}, 'json'))  # etc.
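(Editor's note: for readers following the thread, the custom-JSONEncoder workaround Michael describes looks roughly like this; the class name is illustrative, not from the original message.)

```python
import json
from types import MappingProxyType

class ProxyEncoder(json.JSONEncoder):  # hypothetical name
    def default(self, o):
        # Convert read-only mapping proxies to plain dicts for JSON.
        if isinstance(o, MappingProxyType):
            return dict(o)
        return super().default(o)

# The encoder has to be passed explicitly on every call:
print(json.dumps(MappingProxyType({"a": 1}), cls=ProxyEncoder))  # {"a": 1}
```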

Michael A. Smith writes:
It seems to me that obj.encode("json") and str.decode("json"), for example, would be a powerful feature,
This idea comes up a lot in various forms. The most popular lately is an optional __json__ dunder, which really would avoid the complication of working with custom JSONEncoders. That hasn't gotten a lot of uptake, though. Perhaps we could broaden the appeal by generalizing it to obj.__serialize__(protocol='json'), but that looks like overengineering to me.

I think tying it to the codecs registry is probably going to get a lot of pushback, at least when you get to the point where you're discussing it with the senior core devs. In Python, terms like "codec" and methods like .encode and .decode are very deliberately tied to character encodings. In Python 2 there were "transcodings" like gzip, rarely used, discarded in Python 3, and not missed.
Right now, if I want to json.dumps a MappingProxyType, I believe I have to pass a custom JSONEncoder to json.dumps explicitly every time I call it.
That's exactly the kind of thing we have 'def' for, though.
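(Editor's note: a minimal sketch of the "just use def" alternative; the helper name dumps_proxy is illustrative.)

```python
import json
from types import MappingProxyType

def dumps_proxy(obj):
    # Wrap json.dumps once so callers never pass an encoder explicitly;
    # default= is called for any object json.dumps can't serialize.
    return json.dumps(obj, default=dict)

print(dumps_proxy(MappingProxyType({"a": 1})))  # {"a": 1}
```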
But I think I should be able to register one, and then just call thing.encode('json').
Think your MappingProxyType example through. It's going to be a lot more complicated than "register and just call", I think.
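(Editor's note: a quick illustration of the complication, assuming the registered codec's encode() simply wraps json.dumps as in the snippet earlier in the thread.)

```python
import json
from types import MappingProxyType

proxy = MappingProxyType({"a": 1})

# A 'json' codec whose encode() just calls json.dumps would raise
# here, exactly as plain json.dumps does:
try:
    json.dumps(proxy)
except TypeError as exc:
    print(exc)

# Even with a default= hook, nested proxies are only handled because
# json.dumps applies default= recursively as it walks the structure;
# a codec registry keyed on the *name* 'json' offers no per-type hook.
nested = [proxy, {"b": proxy}]
print(json.dumps(nested, default=dict))  # [{"a": 1}, {"b": {"a": 1}}]
```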
I could call codecs.encode(thing, 'json'), but I think maybe I shouldn't have to import codecs or json into my modules to do this.
That will never fly, I think. Text encoding is privileged in the open builtin and on str and bytes because every single Python program must do it (the source is *always* bytes and *always* has to be decoded to text), and because *only* text and bytes need .encode and .decode respectively.
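(Editor's note: the asymmetry being described, in concrete terms.)

```python
# Only str has .encode (text -> bytes) and only bytes has .decode
# (bytes -> text); every Python source file goes through this step.
s = "héllo"
b = s.encode("utf-8")
assert isinstance(b, bytes)
assert b.decode("utf-8") == s
assert not hasattr(s, "decode")  # str lost .decode in Python 3
assert not hasattr(b, "encode")  # bytes never had .encode in Python 3
print("round trip ok")
```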

On Thu, Jul 16, 2020, at 02:13, Stephen J. Turnbull wrote:
Michael A. Smith writes:
It seems to me that obj.encode("json") and str.decode("json"), for example, would be a powerful feature,
This idea comes up a lot in various forms. The most popular lately is an optional __json__ dunder, which really would avoid the complication of working with custom JSONEncoders. That hasn't gotten a lot of uptake, though. Perhaps we could broaden the appeal by generalizing it to obj.__serialize__(protocol='json'), but that looks like overengineering to me.
This kind of thing [especially having objects directly call their referenced objects' serialize methods rather than calling back to some other method to serialize them] seems like it would limit the ability of the protocol to work with serialization formats that do anything to handle recursive or shared references. Particularly if we're serious about Pickle not being a viable foundation for secure deserialization, I think a new serialization protocol needs to be flexible enough to handle these cases.
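(Editor's note: the shared- and recursive-reference problem in concrete stdlib terms; this is an illustration, not a proposed API.)

```python
import json
import pickle

shared = [1, 2]
data = [shared, shared]  # one object, referenced twice

# pickle memoizes object identity, so sharing survives a round trip:
restored = pickle.loads(pickle.dumps(data))
assert restored[0] is restored[1]

# JSON has no notion of references, so the copies come back distinct:
decoded = json.loads(json.dumps(data))
assert decoded[0] is not decoded[1]

# Recursive references are worse: json.dumps refuses them outright.
data.append(data)
try:
    json.dumps(data)
except ValueError as exc:
    print(exc)  # circular reference detected
```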
participants (3)
- Michael A. Smith
- Random832
- Stephen J. Turnbull