[issue19837] Wire protocol encoding for the JSON module

Nick Coghlan report at bugs.python.org
Sat Nov 30 03:30:45 CET 2013


New submission from Nick Coghlan:

In the Python 3 transition, we had to choose whether to treat the JSON module as a text transform (with load[s] reading Unicode code points and dump[s] producing them) or as a text encoding (with load[s] reading binary sequences and dump[s] producing them).

To minimise the changes to the module API, the decision was made to treat it as a text transform, with the text encoding handled externally.

This API design decision doesn't appear to have worked out that well in the web development context, since JSON is typically encountered as a UTF-8 encoded wire protocol, not as already decoded text.
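To illustrate what "text encoding handled externally" means in practice, here is a minimal sketch of a round trip over a UTF-8 wire format. The payload and variable names are made up for illustration; only json.dumps, json.loads, str.encode and bytes.decode are used.

```python
import json

# Illustrative payload (not from the original report).
payload = {"greeting": "hello", "name": "Zoë"}

# dumps() returns str (text), so the caller has to encode it
# before the data can be written to a socket or HTTP body.
wire_bytes = json.dumps(payload).encode("utf-8")

# Bytes read from the wire are decoded explicitly before
# loads() parses them as text.
decoded = json.loads(wire_bytes.decode("utf-8"))
```

Every caller that talks to the network ends up repeating the encode/decode steps around the json calls.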

It also makes the module inconsistent with most of the other modules that offer "dumps" APIs, as those *are* specifically about wire protocols (Python 3.4):

>>> import json, marshal, pickle, plistlib, xmlrpc.client
>>> json.dumps('hello')
'"hello"'
>>> marshal.dumps('hello')
b'\xda\x05hello'
>>> pickle.dumps('hello')
b'\x80\x03X\x05\x00\x00\x00helloq\x00.'
>>> plistlib.dumps('hello')
b'<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">\n<plist version="1.0">\n<string>hello</string>\n</plist>\n'

The only other module with a dumps function that returns a string (as the json module does) is the XML-RPC client module:

>>> xmlrpc.client.dumps(('hello',))
'<params>\n<param>\n<value><string>hello</string></value>\n</param>\n</params>\n'

And that's nonsensical, since that XML-RPC API *accepts an encoding argument*, which it now silently ignores:

>>> xmlrpc.client.dumps(('hello',), encoding='utf-8')
'<params>\n<param>\n<value><string>hello</string></value>\n</param>\n</params>\n'
>>> xmlrpc.client.dumps(('hello',), encoding='utf-16')
'<params>\n<param>\n<value><string>hello</string></value>\n</param>\n</params>\n'

I now believe that an "encoding" parameter should have been added to the json.dump API in the Py3k transition (defaulting to UTF-8), allowing all of the dump/load APIs in the standard library to be consistently about converting to and from a binary wire protocol.
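As a rough sketch of the behaviour described above, the following wrappers show what a bytes-oriented API with an "encoding" parameter could look like. The names dumps_bytes and loads_bytes are hypothetical and do not exist in the json module; this is an illustration of the idea, not a proposed implementation.

```python
import json


def dumps_bytes(obj, encoding="utf-8", **kwargs):
    """Hypothetical helper: serialise obj to JSON text, then encode to bytes."""
    return json.dumps(obj, **kwargs).encode(encoding)


def loads_bytes(data, encoding="utf-8", **kwargs):
    """Hypothetical helper: decode bytes to text, then parse as JSON."""
    return json.loads(data.decode(encoding), **kwargs)


# Round trip through a binary wire representation.
wire = dumps_bytes("hello")
round_tripped = loads_bytes(wire)
```

With helpers like these, the result of a dump call is bytes, in line with marshal, pickle and plistlib, while the existing str-returning functions are left unchanged.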

Unfortunately, I don't have a solution to offer at this point (since backwards compatibility concerns rule out the simple solution of just changing the return type). I just wanted to get it on record as a problem (and an internal inconsistency within the standard library for dump/load protocols) with the current API.

----------
components: Library (Lib)
messages: 204764
nosy: chrism, ncoghlan
priority: normal
severity: normal
status: open
title: Wire protocol encoding for the JSON module
versions: Python 3.5

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19837>
_______________________________________

