[Python-Dev] Dropping bytes "support" in json

Stephen J. Turnbull stephen at xemacs.org
Fri Apr 10 20:13:35 CEST 2009

"Martin v. Löwis" writes:

 > > (3) The default transfer encoding syntax is UTF-8.
 > Notice that the RFC is partially irrelevant. It only applies
 > to the application/json mime type, and JSON is used in various
 > other protocols, using various other encodings.

Sure.  That's their problem.  In Python, Unicode is the native string
type, and we have codecs to deal with the outside world, no?  That
happens to match very well not only RFC 4627, but also the sidebar on
json.org that defines JSON.
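
To make the division of labor concrete, here is a minimal sketch of
what I mean, assuming the json API only ever sees str and the caller
already knows the encoding on the wire.  The sample payload and the
choice of Shift JIS are mine, purely for illustration.

```python
import json

# Bytes as they might arrive from the outside world.  The payload and
# the Shift JIS encoding are illustrative assumptions.
raw = '{"name": "日本語"}'.encode("shift_jis")

# Decode at the boundary with the codec suite, then parse text only.
text = raw.decode("shift_jis")
obj = json.loads(text)

# On the way out: serialize to str, then encode with whatever transfer
# encoding the enclosing protocol calls for (UTF-8 here).
out = json.dumps(obj, ensure_ascii=False).encode("utf-8")
```

The JSON code never touches bytes; swapping Shift JIS for Big5 or
KOI8-R changes only the two codec calls.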

 > > I think it's a bad idea for any of the core JSON API to accept or
 > > produce bytes in any language that provides a Unicode string type.
 > So how do you integrate the encoding detection that the RFC suggests
 > to be done?

I suggest you don't.  That's mission creep.  Think about writing tests
for it, and remember that out in the wild those "various other
encodings" almost certainly include Shift JIS, Big5, and KOI8-R.  Both
those considerations point to "er, let's delegate detection and
en/decoding to the nice folks who maintain the codec suite."  Where
JSON is embedded in some other protocol that specifies a TES, the TES
can be implemented there, too.

As I wrote earlier, I don't see anything wrong with providing a
wrapper module that deals with some default/common/easy cases.  But
I'd stick it in the contrib directory.
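
For what it's worth, such a wrapper could be quite small.  The
following is only a sketch under my own assumptions: the names
detect_encoding and loads_bytes are hypothetical, and the detection
covers nothing beyond the null-octet heuristic from RFC 4627, section
3 (UTF-8, UTF-16, UTF-32).  Anything else has to be named explicitly
by the caller.

```python
import json

# Hypothetical helper names; not part of the json module.

def detect_encoding(data):
    """Guess a Unicode encoding from the null-octet pattern of the
    first four octets, per RFC 4627 section 3.  The heuristic relies
    on a JSON text starting with two ASCII characters."""
    head = data[:4]
    if len(head) == 4:
        nulls = tuple(octet == 0 for octet in head)
        if nulls == (True, True, True, False):
            return "utf-32-be"
        if nulls == (False, True, True, True):
            return "utf-32-le"
        if nulls == (True, False, True, False):
            return "utf-16-be"
        if nulls == (False, True, False, True):
            return "utf-16-le"
    # Fewer than four octets, or no nulls in a matching pattern.
    return "utf-8"

def loads_bytes(data, encoding=None):
    """Decode bytes with the codec suite, then hand str to json.loads.
    An explicit encoding always wins over the guess."""
    if encoding is None:
        encoding = detect_encoding(data)
    return json.loads(data.decode(encoding))
```

Usage would look like loads_bytes(payload) for the easy cases and
loads_bytes(payload, encoding="shift_jis") for everything else, which
keeps the guessing out of the core API.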
